What is multiple regression? – Chapter 11

11.1 What does a multiple regression model look like?

A multiple regression model has more than one explanatory variable, and sometimes also one or more control variables: E(y) = α + β1x1 + β2x2. The explanatory variables are numbered x1, x2, etc. Each added explanatory variable extends the equation with an extra term, such as β2x2. The parameters are α, β1 and β2. In a three-dimensional graph with y on the vertical axis, x1 on the horizontal axis and x2 perpendicular to x1, the multiple regression equation describes a flat surface, called a plane.

A partial regression equation describes only part of the possible observations, namely those for which the other explanatory variable is fixed at a certain value.

In multiple regression a coefficient indicates the effect of an explanatory variable on a response variable, while controlling for the other variables. Bivariate regression completely ignores the other variables; multiple regression only sets them aside for a moment. This is the basic difference between bivariate and multiple regression. The coefficient (like β1) of a predictor (like x1) gives the change in the mean of y when the predictor increases by one unit, controlling for the other variables (like x2). In that case, β1 is a partial regression coefficient. The parameter α is the mean of y when all explanatory variables are 0.

The multiple regression model has its limitations. An association doesn't automatically mean that there is a causal relationship; there may be other factors at play. Some researchers are more careful and call statistical control 'adjustment'. The regular multiple regression model assumes that there is no statistical interaction: the slope β of each explanatory variable is the same at every combination of values of the other explanatory variables.

The multiple regression equation that holds in the population is estimated by the prediction equation: ŷ = a + b1x1 + b2x2 + … + bpxp, in which p is the number of explanatory variables.

Just like the bivariate model, the multiple regression model uses residuals to measure prediction errors. For a predicted response ŷ and an observed response y, the residual is the difference between them: y – ŷ. The SSE (Sum of Squared Errors, also called Residual Sum of Squares) is the same as for bivariate models: SSE = Σ(y – ŷ)², the only difference being that the estimate ŷ is now shaped by multiple explanatory variables. Multiple regression also uses the method of least squares: the estimates with the smallest possible SSE (which indicates how good or bad ŷ is at estimating y).
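The residual and SSE calculations above can be sketched in a few lines of Python. The data and the fitted coefficients (a, b1, b2) below are made up for illustration, not taken from a real dataset:

```python
# Hypothetical prediction equation: y-hat = a + b1*x1 + b2*x2
a, b1, b2 = 2.0, 0.5, 1.5

x1 = [1, 2, 3, 4]
x2 = [0, 1, 0, 1]
y  = [2.4, 4.6, 3.4, 5.8]           # observed responses

# Predicted values, residuals, and the Sum of Squared Errors
y_hat = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
sse = sum(e ** 2 for e in residuals)

print(y_hat)
print(sse)
```

The least squares estimates are, by definition, the values of a, b1 and b2 that make this SSE as small as possible.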

To check for linearity, multiple regression data are plotted in a scatterplot matrix, a mosaic of scatterplots for the different pairs of variables. Another option is to mark the different pairs in a single scatterplot. Software can create a partial regression plot, also called an added-variable plot. This graph plots residuals against residuals and shows the relationship between the response variable and an explanatory variable after removing the effects of the other predictors.

11.2 How do you interpret the coefficient of determination for multiple regression?

For multiple regression, the sample multiple correlation, R, is the correlation between the observed and predicted y-values. R is between 0 and 1. When the correlation increases, so does the strength of the association between y and the explanatory variables. Its square, the multiple coefficient of determination, R², measures the proportion of the variance in y that is explained by the predictive power of all explanatory variables. It has elements similar to the bivariate coefficient of determination:

  • Rule 1: y is predicted without using the explanatory variables. Then the best prediction is the sample mean ȳ.

  • Rule 2: y is predicted using the explanatory variables, with the prediction equation ŷ = a + b1x1 + b2x2 + … + bpxp.

  • The multiple coefficient of determination is the proportional reduction in prediction error: R² = (TSS – SSE) / TSS, in which TSS = Σ(y – ȳ)² and SSE = Σ(y – ŷ)².
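The proportional reduction in error can be illustrated with a small hand computation. The observed and predicted values below are made up for illustration:

```python
y     = [3.0, 5.0, 4.0, 8.0]        # observed responses
y_hat = [3.5, 4.5, 4.5, 7.5]        # predictions from the regression (rule 2)

y_bar = sum(y) / len(y)                                  # rule 1: predict every y by the mean
tss = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # residual sum of squares

r_squared = (tss - sse) / tss
print(r_squared)
```

Here TSS = 14 and SSE = 1, so the regression removes 13/14 ≈ 93% of the prediction error that rule 1 makes.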

Software like SPSS shows the output in an ANOVA table. The TSS is listed in the row Total under Sum of Squares, and the SSE in the row Residual under Sum of Squares.

Characteristics of R² are:

  • R² is between 0 and 1.

  • When SSE = 0, then R² = 1 and the predictions are perfect.

  • When b1, b2, …, bp = 0, then R² = 0 and the explanatory variables have no predictive power.

  • The bigger R², the better the explanatory variables predict y.

  • R² can't decrease when explanatory variables are added.

  • R² is at least as big as the r²-values of the separate bivariate models.

  • R² usually overestimates the population value, so software also offers an adjusted R².

Multicollinearity occurs when explanatory variables are strongly correlated with each other. When the model already contains a set of strongly intercorrelated explanatory variables, adding another one changes R² little. Problems with multicollinearity are smaller for larger samples. Ideally the sample size is at least ten times the number of explanatory variables.

11.3 How do you test the significance of multiple regression coefficients?

Significance tests for multiple regression can either check whether the collective of explanatory variables is related to y, or check whether an individual explanatory variable significantly affects y. In a collective significance test H0 : β1 = β2 = … = βp = 0 and Ha : at least one βi ≠ 0. This test measures whether the multiple correlation in the population is 0 or something else. The F-distribution is used for this significance test, resulting in the test statistic:

F = (R² / p) / ((1 – R²) / (n – p – 1))

In this, p is the number of predictors (explanatory variables) and n is the sample size. The F-distribution only has positive values, is skewed to the right and has a mean of approximately 1. The bigger R², the bigger F and the stronger the evidence against H0.

The F-distribution depends on two kinds of degrees of freedom: df1 = p (the number of predictors) and df2 = n – (p + 1). SPSS indicates F separately in the ANOVA table and P under Sig. (in R under p-value, in Stata under Prob > F and in SAS under Pr > F).
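Computing the collective F statistic from R² is a one-line calculation; the values of R², n and p below are illustrative:

```python
def f_statistic(r2, n, p):
    """Collective F test: F = (R²/df1) / ((1 - R²)/df2),
    with df1 = p and df2 = n - (p + 1)."""
    df1 = p
    df2 = n - (p + 1)
    return (r2 / df1) / ((1 - r2) / df2)

print(f_statistic(r2=0.60, n=40, p=3))
```

With R² = 0.60, n = 40 and p = 3, this gives F = 18 on df1 = 3 and df2 = 36, far in the right tail of the F-distribution, so strong evidence against H0.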

A significance test of whether an individual explanatory variable xi has a partial effect on y tests H0 : βi = 0 against Ha : βi ≠ 0, using the test statistic t = bi / se. The confidence interval for βi is bi ± t(se), in which t is the critical value from the t-distribution. In case of multicollinearity the separate P-values may not indicate associations, while a collective significance test would clearly indicate one.

For given values of the explanatory variables, the conditional standard deviation of y is estimated by:

s = √(SSE / (n – (p + 1)))

Software also calculates its square, the conditional variance, called the error mean square (MSE) or residual mean square.

An alternative calculation of F uses the mean squares from the ANOVA table in SPSS: F = regression mean square / MSE, in which regression mean square = regression sum of squares / df1.

The t-distribution and the F-distribution are related (for a single coefficient, t² = F), but F lacks information about the direction of an association and is not appropriate for one-sided alternative hypotheses.

11.4 How does a statistical model represent interaction effects?

Statistical interaction often occurs in multiple regression: there is interaction between x1 and x2 in their effect on y when the effect of x1 on y changes for different x2-values. A model with a cross-product term captures this interaction: E(y) = α + β1x1 + β2x2 + β3x1x2. A significance test with null hypothesis H0 : β3 = 0 shows whether there is interaction. When there is little interaction, the cross-product term is better left out. When there is much interaction, it no longer makes sense to perform significance tests for the separate main effects of the explanatory variables.

Coefficients often have limited use because they only indicate the effect of a variable when the other variables are 0. Coefficients become more interpretable by centering the explanatory variables around 0, subtracting the mean from each. Centered variables are indicated by the subscript C:

E(y) = α + β1x1C + β2x2C + β3x1Cx2C, in which x1C = x1 – x̄1 and x2C = x2 – x̄2

Now the coefficient of x1 (so β1) shows the effect of x1 when x2 is at its mean. These effects are similar to the effects in a model without interaction. The advantages of centering are that the estimates of x1 and x2 give more information and that the standard errors are similar to those of a model without interaction.

11.5 How do you compare possible regression models?

Reduced models (containing only some of the variables) can be better than complete models (containing all variables). For a complete model E(y) = α + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3 + β6x2x3, the reduced version is: E(y) = α + β1x1 + β2x2 + β3x3. The null hypothesis says that the models are identical: H0 : β4 = β5 = β6 = 0.

A comparison method subtracts the complete model SSE (SSEc) from the reduced model SSE (SSEr). Because the reduced model is more restricted, its SSE is always at least as big, making it the poorer fit. Another comparison method uses the difference between the R²-values. The test statistic is:

F = ((SSEr – SSEc) / df1) / (SSEc / df2) = ((R²c – R²r) / df1) / ((1 – R²c) / df2)

Here df1 is the number of extra predictors in the complete model and df2 is the residual degrees of freedom of the complete model, n – (number of parameters in the complete model). A big difference in SSE or in R² means a bigger F and a smaller P, so more evidence against H0.
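The model-comparison F test is easy to compute once both models have been fitted; the sums of squared errors below are made up for illustration:

```python
def compare_models(sse_r, sse_c, n, p_complete, p_reduced):
    """F test comparing a reduced model with a complete model:
    F = ((SSE_r - SSE_c)/df1) / (SSE_c/df2)."""
    df1 = p_complete - p_reduced        # extra predictors in the complete model
    df2 = n - (p_complete + 1)          # residual df of the complete model
    return ((sse_r - sse_c) / df1) / (sse_c / df2)

print(compare_models(sse_r=500.0, sse_c=400.0, n=54, p_complete=5, p_reduced=3))
```

Here the two extra terms reduce the SSE from 500 to 400, giving F = 6 on df1 = 2 and df2 = 48: evidence that the complete model predicts better.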

11.6 How do you calculate the partial correlation?

The partial correlation measures the strength of the association between y and the explanatory variable x1 while controlling for x2:

ryx1·x2 = (ryx1 – ryx2 · rx1x2) / √((1 – r²yx2)(1 – r²x1x2))

In the notation ryx1·x2, the variable on the right side of the dot is the control variable. A first-order partial correlation has one control variable, a second-order partial correlation has two. The characteristics are similar to those of regular correlations: the value is between -1 and 1, and the bigger its absolute value, the stronger the association.
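A first-order partial correlation can be computed directly from the three pairwise correlations; the correlation values below are illustrative:

```python
import math

def partial_corr(r_yx1, r_yx2, r_x1x2):
    """Correlation between y and x1, controlling for x2."""
    return (r_yx1 - r_yx2 * r_x1x2) / math.sqrt((1 - r_yx2 ** 2) * (1 - r_x1x2 ** 2))

# The raw correlation of 0.60 weakens once the overlap with x2 is removed.
print(partial_corr(r_yx1=0.60, r_yx2=0.50, r_x1x2=0.40))
```

Because part of the association between y and x1 runs through x2, the partial correlation (about 0.50) is smaller than the raw correlation of 0.60.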

The partial correlation also has a squared version:

r²yx1·x2 = (R² – r²yx2) / (1 – r²yx2)

The squared partial correlation is the proportion of the variance in y left unexplained by x2 that is explained by x1. The variance in y consists of a part explained by x1, a part explained by x2, and a part that is not explained by these variables. Together the parts explained by x1 and x2 form R². Also when more variables are added, R² remains the proportion of the variance in y that is explained.

11.7 How do you compare the coefficients of variables with different units of measurement by using standardized regression coefficients?

The standardized regression coefficient (β*1, β*2, etc.) is the change in the mean of y, measured in standard deviations of y, for an increase of one standard deviation in an explanatory variable, controlling for the other explanatory variables. This makes it possible to compare whether an increase in x1 has a bigger effect on y than an increase in x2. The standardized regression coefficient is estimated by standardizing the regular coefficients:

b*1 = b1 (sx1 / sy)

In this, sy is the sample standard deviation of y and sx1 is the sample standard deviation of an explanatory variable. In SPSS and other software, the standardized regression coefficients are sometimes called BETA (beta weights). Just like the correlation, they indicate the strength of an association, but in a comparative way. When the absolute value exceeds 1, this is usually a sign that the explanatory variables are highly correlated (multicollinearity).
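Standardizing makes slopes on very different measurement scales directly comparable; the coefficients and standard deviations below are made up for illustration:

```python
def standardized_coef(b, s_x, s_y):
    """b* = b * (s_x / s_y): change in y (in SDs of y) per one-SD increase in x."""
    return b * s_x / s_y

s_y = 8.0
b1_star = standardized_coef(b=2.0,  s_x=3.0, s_y=s_y)   # predictor on a wide scale
b2_star = standardized_coef(b=10.0, s_x=0.2, s_y=s_y)   # predictor on a narrow scale
print(b1_star, b2_star)
```

Although the raw slope of the second predictor (10.0) looks much bigger than the first (2.0), the standardized coefficients (0.75 versus 0.25) show that a one-SD increase in x1 actually moves y further.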

For a variable y, zy is its standardized version: the value expressed as a number of standard deviations from the mean. When zy = (y – ȳ) / sy, its estimate is ẑy = (ŷ – ȳ) / sy. The prediction equation then estimates how far an observation falls from the mean, measured in standard deviations:

ẑy = b*1 z1 + b*2 z2 + … + b*p zp

 

 
