## 13. Several Aspects of Regression Analysis

This chapter focusses on topics that deepen the understanding of regression analysis, including alternative model specifications and what happens when the basic regression assumptions are violated.

## 13.1. Developing models

The goal when developing a model is to approximate a complex reality as closely as possible with a relatively simple model, which can then be used to provide insight into that reality. It is impossible to represent every influence of the real situation in a model; instead, only the most influential variables are selected.

Building a statistical model has 4 stages:

1. Model Specification: This step involves selecting the variables (dependent and independent), the algebraic form of the model, and the required data. To do this correctly, it is important to understand the underlying theory and context of the model, which may require serious study and analysis. This step is crucial to the integrity of the model.
2. Coefficient Estimation: This step involves using the available data to estimate the coefficients and/or parameters in the model. The desired values are dependent on the objective of the model. Roughly there are two goals:
   1. Predicting the mean of the dependent variable: In this case it is desirable to have a small standard error of the estimate, $s_e$. The correlations between the independent variables need to be steady, and the independent variables need to have a wide spread, as this keeps the prediction variance small.
   2. Estimating one or more coefficients: In this case a number of problems arise, because there is always a trade-off between estimator bias and variance, and a proper balance must be found. Including an independent variable that is highly correlated with other independent variables decreases bias but increases variance. Excluding the variable decreases variance but increases bias. This is because both these correlations and the spread of the independent variables influence the standard deviation of the slope coefficients, $s_b$.
3. Model Verification: This step involves checking whether the model is still accurate in its portrayal of reality. This is important because simplifications and assumptions are often made while constructing the model, which can make the model too inaccurate. It is important to examine the regression assumptions, the model specification, and the selected data. If something is wrong here, we return to step
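
The bias and variance trade-off described under coefficient estimation can be made concrete with a small sketch. It is not taken from the chapter. It assumes a linear model with exactly two independent variables, fixed regressors, and errors with constant standard deviation `sigma`, and it uses the standard textbook expressions for the slope standard deviation and for omitted-variable bias. The function names and the example numbers are hypothetical.

```python
import math


def slope_sd_included(sigma, n, sd_x1, r12):
    """Std. dev. of the x1 slope when the correlated variable x2 is also in the model.

    Assumed formula: sigma / sqrt(sum((x1 - mean)^2) * (1 - r12^2)),
    where sum((x1 - mean)^2) = (n - 1) * sd_x1^2.
    """
    sum_sq_x1 = (n - 1) * sd_x1 ** 2
    return sigma / math.sqrt(sum_sq_x1 * (1.0 - r12 ** 2))


def slope_sd_excluded(sigma, n, sd_x1):
    """Std. dev. of the x1 slope when x2 is left out (same sigma, regressors held fixed)."""
    return sigma / math.sqrt((n - 1) * sd_x1 ** 2)


def omitted_variable_bias(beta2, r12, sd_x1, sd_x2):
    """Bias in the x1 slope caused by leaving out x2, whose true coefficient is beta2."""
    return beta2 * r12 * sd_x2 / sd_x1


if __name__ == "__main__":
    # Hypothetical setting: 101 observations, error std. dev. 1, true beta2 = 2.
    sigma, n, beta2 = 1.0, 101, 2.0
    for r12 in (0.0, 0.6, 0.9):
        sd_in = slope_sd_included(sigma, n, sd_x1=1.0, r12=r12)
        sd_out = slope_sd_excluded(sigma, n, sd_x1=1.0)
        bias_out = omitted_variable_bias(beta2, r12, sd_x1=1.0, sd_x2=1.0)
        print(f"r12={r12:.1f}  sd(include x2)={sd_in:.3f}  "
              f"sd(exclude x2)={sd_out:.3f}  bias(exclude x2)={bias_out:.3f}")
```

Under these assumptions, a higher correlation between the two variables inflates the slope standard deviation when both are included. Leaving one out keeps the standard deviation low but introduces a bias that grows with the correlation. A wider spread in `sd_x1` lowers the standard deviation in both cases.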
