Bullets Statistics for Business and Economics

12. Multiple Regression

  • Regression objectives are either to predict the value of the dependent variable, or to estimate the marginal effect of each independent variable.
  • A population multiple regression model is a model that includes multiple independent variables.
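  • Standard multiple regression assumptions include the four standard simple regression assumptions, plus a fifth one: it is not possible to find a set of nonzero numbers c0, c1, ..., cK such that c0 + c1x1i + c2x2i + ... + cKxKi = 0 for every observation i. In other words, there is no exact linear relationship among the independent variables.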
  • Multiple regression models include an error term, ε, that represents variability caused by variables not included in the model.
  • In multiple regression, coefficients are estimated using least squares, but these estimates become less reliable as the correlations between the independent variables increase.
  • Each regression coefficient in a multiple regression model depends on all the independent variables in the model, so the coefficients are referred to as conditional coefficients.
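  • Mean square regression (MSR) is the sum of squares regression divided by the number of independent variables, K: MSR = SSR/K. The proportion of the variability in the dependent variable that the regression model explains is measured by the coefficient of determination, not by MSR.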
  • In a multiple regression model, the total sum of squares (SST; or total sample variability) can be split into the sum of squares regression (SSR; or explained variability) and the sum of squares error (SSE; or unexplained variability): SST = SSR + SSE. This is referred to as sum-of-squares decomposition.
  • The coefficient of determination, R², describes the strength of the linear relationship between the independent variables and the dependent variable, and is calculated as R² = 1 − SSE/SST.
  • Adding more independent variables never decreases R², even when the added variables have little explanatory power, so a high R² can be misleading. The adjusted coefficient of determination corrects for this by taking the number of independent variables into account.
  • The coefficient variance estimator, s²b, estimates the variance of a coefficient estimate. The square root of s²b is the coefficient standard error.
  • Non-linear relationships, namely quadratic models and logarithmic models, can be transformed into a linear form so that they can be estimated with multiple regression.
  • Dummy variables can be used to represent categorical data in a regression model, and have a value of either 0 or 1.
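
The least squares fit, the sum-of-squares decomposition, R², adjusted R², and the coefficient standard errors from the bullets above can be sketched in a few lines. This is only an illustration: the data are made up, NumPy is an assumed tool, and the standard errors use the general matrix form s²e (X'X)⁻¹, which is a standard result but is not taken from these notes.

```python
import numpy as np

# Hypothetical data (not from the notes): 8 observations, 2 independent variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 15.8])

n = len(y)  # number of observations
K = 2       # number of independent variables

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), x1, x2])

# Least squares estimates b = (b0, b1, b2).
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b
residuals = y - y_hat

# Sum-of-squares decomposition: SST = SSR + SSE.
sst = np.sum((y - y.mean()) ** 2)      # total variability
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variability
sse = np.sum(residuals ** 2)           # unexplained variability

# Coefficient of determination and its adjusted version.
r2 = 1 - sse / sst
adj_r2 = 1 - (sse / (n - K - 1)) / (sst / (n - 1))

# Mean square regression and the estimated error variance.
msr = ssr / K
s2_e = sse / (n - K - 1)

# Assumed general matrix form of the coefficient variance estimator:
# the diagonal of s2_e * (X'X)^-1; square roots are the standard errors.
cov_b = s2_e * np.linalg.inv(X.T @ X)
se_b = np.sqrt(np.diag(cov_b))

print("coefficients:", b)
print("SST, SSR, SSE:", sst, ssr, sse)
print("R2, adjusted R2:", r2, adj_r2)
print("standard errors:", se_b)
```

Because x1 and x2 are strongly correlated in this made-up sample, the standard errors come out larger than they would for uncorrelated predictors, which is the reliability problem described above.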


13. Additional Topics in Regression Analysis

  • Models are developed through four steps: model specification (selecting the variables, the algebraic form, and the data), coefficient estimation, model verification (checking whether the model is still accurate), and interpretation and inference.
  • Dummy variables can be used to represent more than two categories by using multiple dummy variables. The rule is: number of dummy variables = number of categories − 1. The category without its own dummy variable serves as the reference category.
  • In time series data the values of the dependent variable are related, this is then referred to
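
The dummy-variable rule for more than two categories can be illustrated with a small sketch. The variable name `regions` and its values are hypothetical, and choosing the alphabetically first category as the reference is an arbitrary assumption.

```python
# Hypothetical categorical variable with three categories.
regions = ["north", "south", "west", "south", "north", "west"]

categories = sorted(set(regions))  # distinct categories in a fixed order
baseline = categories[0]           # reference category, gets no dummy variable
dummy_names = categories[1:]       # one dummy variable per remaining category

# Each observation becomes a row of 0/1 indicators, one per dummy variable.
# An observation in the reference category is coded as all zeros.
dummies = [[1 if r == name else 0 for name in dummy_names] for r in regions]

print("reference category:", baseline)
print("dummy variables:", dummy_names)
for region, row in zip(regions, dummies):
    print(region, row)
```

With three categories this gives two dummy variables; including a third alongside an intercept would create an exact linear relationship among the independent variables.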
