## 12. Multiple Regression

##### 1. A study was conducted to assess the influence of various factors on the start of new firms in the agricultural industry. For a sample of 70 countries the following model was estimated:

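$$\hat{y} = -59.31 + \underset{(1.156)}{4.983}x_1 + \underset{(0.210)}{2.198}x_2 + \underset{(2.063)}{3.816}x_3 - \underset{(0.330)}{0.310}x_4 - \underset{(3.055)}{0.886}x_5 + \underset{(1.568)}{3.215}x_6 + \underset{(0.354)}{0.85}x_7$$

$$R^2 = 0.766$$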

where:

*y = new business starts in the industry*

*x1 = population in millions*

*x2 = industry size*

*x3 = measure of economic quality of life*

*x4 = measure of political quality of life*

*x5 = measure of environmental quality of life*

*x6 = measure of health and educational quality of life*

*x7 = measure of social quality of life*

The numbers in parentheses under the coefficients are the estimated coefficient standard errors.

a. Interpret the estimated regression coefficients.

b. Interpret the coefficient of determination.

c. Find a 90% confidence interval for the increase in new business starts resulting from a one-unit increase in the economic quality of life, with all other variables unchanged.

d. Test, against a two-sided alternative at the 5% level, the null hypothesis that, all else remaining equal, the environmental quality of life does not influence new business starts.

e. Test, against a two-sided alternative at the 5% level, the null hypothesis that, all else remaining equal, the health and educational quality of life does not influence new business starts.

f. Test the null hypothesis that, taken together, these seven independent variables do not influence new business starts.
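The arithmetic behind parts c to f can be sketched in Python. This is a minimal sketch, not a worked solution: it assumes the standard errors read (1.156), (0.210), (2.063), (0.330), (3.055), (1.568), and (0.354), and it leaves the critical values to be looked up in t and F tables. The names `t_ratios`, `conf_interval`, and `f_stat` are illustrative.

```python
# Coefficients and standard errors as read from the estimated model.
coefs = {"x1": 4.983, "x2": 2.198, "x3": 3.816, "x4": -0.310,
         "x5": -0.886, "x6": 3.215, "x7": 0.85}
ses = {"x1": 1.156, "x2": 0.210, "x3": 2.063, "x4": 0.330,
       "x5": 3.055, "x6": 1.568, "x7": 0.354}

n, k = 70, 7            # sample size and number of independent variables
df_error = n - k - 1    # error degrees of freedom
r2 = 0.766

# t ratio for the null hypothesis that a single coefficient is 0
t_ratios = {name: coefs[name] / ses[name] for name in coefs}


def conf_interval(name, t_crit):
    """Return b_j -/+ t_crit * s_bj; t_crit must be taken from a t table."""
    half_width = t_crit * ses[name]
    return coefs[name] - half_width, coefs[name] + half_width


# F statistic for the null hypothesis that all seven slopes are 0
f_stat = (r2 / k) / ((1 - r2) / df_error)

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.3f}")
print(f"F({k}, {df_error}) = {f_stat:.2f}")
```

Each t ratio is compared with a t critical value on `df_error` degrees of freedom, and `f_stat` with an F critical value on `k` and `df_error` degrees of freedom.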

##### 2. Based on 25 years of annual data, an attempt was made to explain savings in Japan. The model fitted was as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

where

*y = change in real deposit rate*

*x1 = change in real per capita income*

*x2 = change in real interest rate*

The least squares parameter estimates (with standard errors in parentheses) were (Ghatak and Deadman 1989) as follows:

$$b_1 = 0.0974 \; (0.0215) \qquad b_2 = 0.374 \; (0.209)$$

The adjusted coefficient of determination was as follows:

$$\bar{R}^2 = .91$$

a. Find and interpret a 99% confidence interval for $\beta_1$.

b. Test, against the alternative that it is positive, the null hypothesis that $\beta_2$ is 0.

c. Find the coefficient of determination.

d. Test the null hypothesis that $\beta_1 = \beta_2 = 0$.

e. Find and interpret the coefficient of multiple correlation.
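Parts c to e all follow from the adjusted coefficient of determination. The sketch below assumes the usual relation between the adjusted and unadjusted measures, with n = 25 observations and K = 2 independent variables. The variable names are illustrative, and the F critical value still has to come from a table.

```python
import math

n, k = 25, 2      # years of annual data, number of independent variables
adj_r2 = 0.91     # adjusted coefficient of determination from the exercise

# adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1), solved for R2
r2 = 1 - (1 - adj_r2) * (n - k - 1) / (n - 1)

# F statistic for the null hypothesis that both slope coefficients are 0
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

# Coefficient of multiple correlation: correlation between y and fitted y
multiple_r = math.sqrt(r2)

print(f"R2 = {r2:.4f}, F({k}, {n - k - 1}) = {f_stat:.2f}, R = {multiple_r:.4f}")
```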

##### 3. Based on data from 63 countries, the following model was estimated by least squares:

$$\hat{y} = 0.58 - \underset{(.019)}{.052}x_1 - \underset{(.042)}{.005}x_2 \qquad R^2 = .17$$

where:

*y = growth rate in real gross domestic product*

*x1 = real income per capita*

*x2 = average tax rate, as a proportion of gross national product*

The numbers in parentheses under the coefficients are the estimated coefficient standard errors.

a. Test against a two-sided alternative the null hypothesis that $\beta_1$, the population coefficient on x1, is 0. Interpret your result.

b. Test against a two-sided alternative the null hypothesis that $\beta_2$, the population coefficient on x2, is 0. Interpret your result.

c. Interpret the coefficient of determination.

d. Find and interpret the coefficient of multiple correlation.

## 13. Additional Topics in Regression Analysis

**1**. The following model was fitted to data on 90 French technical companies:

$$\hat{y} = 0.819 + \underset{(1.79)}{2.11}x_1 + \underset{(1.94)}{0.96}x_2 - \underset{(0.144)}{0.059}x_3 + \underset{(4.08)}{5.87}x_4 + \underset{(0.00115)}{0.00226}x_5$$

$$R^2 = .410$$

where the numbers in parentheses are estimated coefficient standard errors and

*y = share price*

*x1 = earnings per share*

*x2 = funds flow per share*

*x3 = dividends per share*

*x4 = book value per share*

*x5 = a measure of growth*

a. Test at the 10% level the null hypothesis that the coefficient on x1 is 0 in the population regression against the alternative that the true coefficient is positive.

b. Test at the 10% level the null hypothesis that the coefficient on x2 is 0 in the population regression against the alternative that the true coefficient is positive.

c. The variable X2 was dropped from the original model, and the regression of Y on (X1, X3, X4, X5) was estimated. The estimated coefficient on X1 was 2.95 with standard error 0.63. How can this result be reconciled with the conclusion of part a?
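The contrast between parts a and c shows up when the two t ratios for x1 are set side by side. The sketch below assumes a standard error of 1.79 for x1 in the full model. As a simplification, it uses a standard normal critical value in place of the t value, which is only an approximation at these degrees of freedom.

```python
from statistics import NormalDist


def t_ratio(estimate, std_error):
    """t ratio for the null hypothesis that the coefficient is 0."""
    return estimate / std_error


# One-sided 10% critical value, normal approximation to the t distribution
z_crit = NormalDist().inv_cdf(0.90)

t_full = t_ratio(2.11, 1.79)      # x1 in the model that includes x2
t_reduced = t_ratio(2.95, 0.63)   # x1 after x2 is dropped

print(f"critical value ~ {z_crit:.3f}")
print(f"full model:    t = {t_full:.3f}, reject H0: {t_full > z_crit}")
print(f"reduced model: t = {t_reduced:.3f}, reject H0: {t_reduced > z_crit}")
```

The standard error of the x1 coefficient falls sharply once x2 is dropped. That pattern is typical when two regressors are highly correlated.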

**2**. A market researcher is interested in the average amount of money per year spent by students on books. From 30 years of annual data, the following regression was estimated by least squares:

$$\hat{y}_t = 40.93 + \underset{(0.106)}{0.253}x_t + \underset{(0.134)}{0.546}y_{t-1} \qquad d = 1.86$$

where

$y_t$ = expenditure per student, in dollars, on books

$x_t$ = disposable income per student, in dollars, after payment of tuition, fees, and room and board

The numbers below the coefficients are the coefficient standard errors.

a. Find a 95% confidence interval for the coefficient on $x_t$ in the population regression.

b. What would be the expected impact over time of a $1 increase in disposable income per student on book expenditure?

c. Test the null hypothesis of no autocorrelation in the errors against the alternative of positive autocorrelation.
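Parts b and c can be explored numerically. The sketch below reads the coefficients as 0.253 on current income and 0.546 on lagged expenditure, with standard error 0.134 for the latter. Because the model contains a lagged dependent variable, it uses Durbin's h statistic rather than d directly. That choice, and the use of n = 30, are assumptions made here and are not stated in the exercise.

```python
import math

b_x, b_lag = 0.253, 0.546   # coefficients on x_t and y_{t-1}
se_lag = 0.134              # standard error of the lagged coefficient
d = 1.86                    # Durbin-Watson statistic
n = 30                      # number of observations (assumed)


def cumulative_effect(periods):
    """Total expected change in y after `periods` years of a $1 rise in x."""
    return sum(b_x * b_lag ** j for j in range(periods))


# Limit of the cumulative effect as the number of periods grows
long_run = b_x / (1 - b_lag)

# Durbin's h: r estimated as 1 - d/2, compared with a standard normal value
r = 1 - d / 2
h = r * math.sqrt(n / (1 - n * se_lag ** 2))

print(f"effect after 1, 2, 5 years: "
      f"{cumulative_effect(1):.3f}, {cumulative_effect(2):.3f}, {cumulative_effect(5):.3f}")
print(f"long-run effect = {long_run:.3f}")
print(f"Durbin h = {h:.3f}")
```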

## 15. Analysis of Variance

**1**. In a study to estimate the effects of drinking alcohol on routine health risk, employees were classified as heavy drinkers, people who recently cut back on alcohol, long-term drinkers, and those who never drank alcohol. Samples of 96, 34, 86, and 206 members of these groups were taken. Sample mean health risk rates per month were found to be 2.15, 2.21, 1.47, and 1.69, respectively.

The *F* ratio calculated from these data was 2.56.

a. Prepare the complete analysis of variance table.

b. Test the null hypothesis of equality of the four population mean health risk rates.
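The full one-way analysis of variance table can be recovered from the group sizes, the group means, and the reported F ratio. The sketch below computes the between-groups sum of squares directly and backs out the within-groups quantities from F. The variable names are illustrative.

```python
sizes = [96, 34, 86, 206]
means = [2.15, 2.21, 1.47, 1.69]
f_ratio = 2.56

n = sum(sizes)          # total number of observations
k = len(sizes)          # number of groups
grand_mean = sum(ni * m for ni, m in zip(sizes, means)) / n

# Between-groups sum of squares from the group means
ssg = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(sizes, means))
df_g, df_w = k - 1, n - k

msg = ssg / df_g        # mean square between groups
msw = msg / f_ratio     # F = MSG / MSW, so MSW = MSG / F
ssw = msw * df_w        # within-groups sum of squares
sst = ssg + ssw         # total sum of squares

print(f"{'Source':<10}{'SS':>10}{'df':>6}{'MS':>10}{'F':>8}")
print(f"{'Between':<10}{ssg:>10.2f}{df_g:>6}{msg:>10.3f}{f_ratio:>8.2f}")
print(f"{'Within':<10}{ssw:>10.2f}{df_w:>6}{msw:>10.3f}")
print(f"{'Total':<10}{sst:>10.2f}{n - 1:>6}")
```

The reported F ratio is then compared with an F critical value on `df_g` and `df_w` degrees of freedom.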

**2**. For the two-way analysis of variance model with one observation per cell, write the observation from the ith group and jth block as

$$X_{ij} = \mu + G_i + B_j + \varepsilon_{ij}$$

Refer to Exercise 15.65 and consider the observation on agent B and house 1 ($x_{21} = 218$).

a. Estimate $\mu$.

b. Estimate and interpret $G_2$.

c. Estimate and interpret $B_1$.

d. Estimate $\varepsilon_{21}$.
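The estimates asked for here are all differences of sample means. The sketch below shows the decomposition on a made-up table, since the data of Exercise 15.65 are not reproduced on this page. Only the cell for agent B and house 1 is set to the quoted value of 218; every other number is hypothetical, as is the function name `two_way_estimates`.

```python
def two_way_estimates(x):
    """Estimate mu, group effects, block effects, and errors for one obs per cell.

    x[i][j] is the observation for group i and block j (0-based indices).
    """
    h, k = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (h * k)
    group_means = [sum(row) / k for row in x]
    block_means = [sum(x[i][j] for i in range(h)) / h for j in range(k)]
    g = [gm - grand for gm in group_means]   # estimated group effects
    b = [bm - grand for bm in block_means]   # estimated block effects
    e = [[x[i][j] - group_means[i] - block_means[j] + grand
          for j in range(k)] for i in range(h)]
    return grand, g, b, e


# HYPOTHETICAL data: rows are agents A, B, C; columns are houses 1 to 4.
x = [
    [210.0, 192.0, 183.0, 227.0],
    [218.0, 190.0, 187.0, 223.0],
    [226.0, 198.0, 185.0, 237.0],
]

mu_hat, g_hat, b_hat, e_hat = two_way_estimates(x)
print(f"mu = {mu_hat:.3f}, G2 = {g_hat[1]:.3f}, "
      f"B1 = {b_hat[0]:.3f}, e21 = {e_hat[1][0]:.3f}")
```

By construction the four pieces add back to the observation: the estimate of the mean, plus the group effect, plus the block effect, plus the residual equals $x_{21}$.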

## 16. Time-Series Analysis and Forecasting

**1**. In some experiments with several observations per cell the analyst is prepared to assume that there is no interaction between groups and blocks. Any apparent interaction found is then attributed to random error.

When such an assumption is made, the analysis is carried out in the usual way, except that what were previously the interaction and error sums of squares are now added together to form a new error sum of squares. Similarly, the corresponding degrees of freedom are added. If the assumption of no interaction is correct, this approach has the advantage of providing more error degrees of freedom and, hence, more powerful tests of the equality of group and block means.

For the study of Exercise 15.47, suppose that we now make the assumption of no interaction between dormitory ratings and student years.

a. State, in your own words, what is implied by this assumption.

b. Given this assumption, set up the new analysis of variance table.

c. Test the null hypothesis that the population mean ratings are the same for all dormitories.

d. Test the null hypothesis that the population mean ratings are the same for all four student years.
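The pooling step described in this exercise is mechanical: add the interaction and error sums of squares, add their degrees of freedom, and use the pooled mean square as the denominator of both F ratios. The sketch below uses hypothetical sums of squares, because the table from Exercise 15.47 is not reproduced on this page. The function name `pooled_error_f_tests` is illustrative.

```python
def pooled_error_f_tests(ss_groups, df_groups, ss_blocks, df_blocks,
                         ss_interaction, df_interaction, ss_error, df_error):
    """Fold interaction into error and recompute the group and block F ratios."""
    ss_pooled = ss_interaction + ss_error
    df_pooled = df_interaction + df_error
    ms_pooled = ss_pooled / df_pooled
    return {
        "ss_error": ss_pooled,
        "df_error": df_pooled,
        "ms_error": ms_pooled,
        "f_groups": (ss_groups / df_groups) / ms_pooled,
        "f_blocks": (ss_blocks / df_blocks) / ms_pooled,
    }


# HYPOTHETICAL sums of squares and degrees of freedom, for illustration only
table = pooled_error_f_tests(
    ss_groups=24.0, df_groups=3,
    ss_blocks=10.0, df_blocks=2,
    ss_interaction=6.0, df_interaction=6,
    ss_error=30.0, df_error=24,
)
print(table)
```

Each F ratio is then compared with a critical value that uses the pooled error degrees of freedom in the denominator.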

**2**. In a study to estimate the effects of smoking on routine health risk, employees were classified as continuous smokers, recent ex-smokers, long-term ex-smokers, and those who never smoked. Samples of 96, 34, 86, and 206 members of these groups were taken. Sample mean health risk rates per month were found to be 2.15, 2.21, 1.47, and 1.69, respectively.

The *F* ratio calculated from these data was 2.56.

a. Prepare the complete analysis of variance table.

b. Test the null hypothesis of equality of the four population mean health risk rates.

## 17. Additional Topics in Sampling

##### 1. A hospital has 100 doctors' offices. Information was obtained from the individuals responsible for managing correspondence in 61 of these offices. Of these, 38 specified a minimum number of complaints that must be received on an issue before action is undertaken.

a. Assume these observations constitute a random sample from the population, and find a 90% confidence interval for the proportion of all doctors' offices with this policy.

b. In fact, information was not obtained from a random sample of doctors' offices. Questionnaires were sent to all 100 offices, but only 61 responded. How does this information influence your view of the answer to part (a)?
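For part a, the sample is a large fraction of a small population, so a finite population correction matters. The sketch below assumes N = 100 offices, n = 61 respondents, and one common form of the variance estimator for a sample proportion under sampling without replacement. The estimator is an assumption here, so check it against the form used in your course.

```python
import math
from statistics import NormalDist

N, n, successes = 100, 61, 38    # population size, sample size, offices with the policy

p_hat = successes / n

# Estimated variance of p_hat with the finite population correction (N - n) / N
var_p = p_hat * (1 - p_hat) / (n - 1) * (N - n) / N

z = NormalDist().inv_cdf(0.95)   # two-sided 90% interval leaves 5% in each tail
half_width = z * math.sqrt(var_p)
lower, upper = p_hat - half_width, p_hat + half_width

print(f"p_hat = {p_hat:.4f}, 90% CI = ({lower:.4f}, {upper:.4f})")
```

The interval is valid only under random sampling, which is the issue part b raises.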

##### 2. Discuss the advantages and disadvantages of various sampling designs that might be used to select ballots to be recounted in a close election.
