Bias can affect three things: (1) the parameter estimates, (2) the standard errors and confidence intervals, and (3) the test statistics and p-values. Outliers and violations of assumptions are sources of bias.
An outlier is a score that is very different from the rest of the data. Outliers bias parameter estimates and affect the error associated with those estimates. They have a strong effect on the sum of squared errors, which in turn biases the standard deviation.
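A small numeric illustration may help here. The sketch below uses made-up scores (an assumption, not data from the text) to show how a single extreme score inflates the sum of squared errors and, through it, the standard deviation.

```python
from statistics import mean, stdev

def sum_of_squared_errors(scores):
    """Sum of squared deviations of each score from the mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)

# Hypothetical scores, chosen only for illustration.
scores = [1, 2, 3, 4, 5]
with_outlier = scores + [21]  # same data plus one extreme score

for label, data in [("without outlier", scores), ("with outlier", with_outlier)]:
    print(label,
          "mean:", mean(data),
          "SS:", sum_of_squared_errors(data),
          "SD:", round(stdev(data), 2))
```

In this example the mean moves from 3 to 6 and the sum of squared errors from 10 to 280, so the standard deviation grows from about 1.58 to about 7.48.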
There are several assumptions of the linear model:
- Additivity and linearity
The scores on the outcome variable are linearly related to any predictors. If there are multiple predictors, their combined effect is best described by adding them together.
- Normality
Violations of normality influence the parameter estimates, and the residuals of the model should be normally distributed. What matters is normality at each level of the predictor variable. Normality is also important for confidence intervals and for null hypothesis significance testing.
- Homoscedasticity / homogeneity of variance
This assumption affects the parameter estimates and null hypothesis significance testing. It means that the variance of the outcome variable should not change between levels of the predictor variable. Violating it biases the standard error.
- Independence
This assumption means that the errors in the model are not related to each other. The data has to be independent.
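The assumption of normality matters mainly in small samples, because in large samples the central limit theorem makes the sampling distribution approximately normal. Outliers can be spotted using graphs (e.g. histograms or boxplots). Z-scores can also be used to find outliers.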
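Finding outliers with z-scores can be sketched as follows. The cutoff of 3.29 is a common rule of thumb and an assumption here, not a value given in the text, and the scores are invented for illustration. In very small samples z-scores cannot become that large, which is why the cutoff is left as a parameter.

```python
from statistics import mean, stdev

def z_scores(scores):
    """Convert each score to a z-score using the sample mean and SD."""
    m = mean(scores)
    s = stdev(scores)
    return [(x - m) / s for x in scores]

def flag_outliers(scores, cutoff=3.29):
    """Return the scores whose absolute z-score exceeds the cutoff."""
    return [x for x, z in zip(scores, z_scores(scores)) if abs(z) > cutoff]

# Hypothetical data: 19 ordinary scores and one extreme score.
data = [8, 9, 9, 10, 10, 10, 10, 11, 11, 12,
        8, 9, 9, 10, 10, 10, 10, 11, 11, 50]
print(flag_outliers(data))
```

Note that the outlier itself inflates the standard deviation used to compute the z-scores, so this approach can understate how extreme a score is.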
The P-P plot can be used to check whether a distribution is normal. It plots the expected z-score of each score against its actual z-score. If the points fall on the diagonal, so that the expected and actual z-scores coincide, the data are normally distributed. The Q-Q plot is like the P-P plot, but it plots the quantiles of the data instead of every individual score.
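The pairs of points behind such a plot can be computed with the standard library alone. This is a minimal sketch: the function name is hypothetical, and the plotting position (rank - 0.5) / n used to obtain the expected z-scores is one common convention, assumed here rather than taken from the text.

```python
from statistics import NormalDist, mean, stdev

def expected_vs_actual_z(scores):
    """Return (expected z, actual z) pairs for the ranked scores."""
    ordered = sorted(scores)
    n = len(ordered)
    m, s = mean(ordered), stdev(ordered)
    standard_normal = NormalDist()  # mean 0, SD 1
    points = []
    for rank, x in enumerate(ordered, start=1):
        # z-score a normal distribution would place at this rank (assumed convention)
        expected = standard_normal.inv_cdf((rank - 0.5) / n)
        # z-score of the observed score
        actual = (x - m) / s
        points.append((expected, actual))
    return points

for expected, actual in expected_vs_actual_z([3, 1, 2, 5, 4]):
    print(round(expected, 2), round(actual, 2))
```

Plotting the first value of each pair against the second gives the diagonal pattern described above when the data are close to normal.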
Skewness and kurtosis are two measures of the shape of a distribution. Positive values of skewness indicate a pile-up of scores on the left side of the distribution. Negative values of skewness indicate a pile-up of scores on the right side of the distribution. The further either value is from zero, the more likely it is that the data are not normally distributed.
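Both measures can be computed from the moments of the data. The sketch below uses the simple moment-based definitions, with kurtosis expressed as excess kurtosis so that a normal distribution scores zero; statistical packages often apply small-sample corrections, so their values may differ slightly. The scores are invented for illustration.

```python
def central_moment(scores, order):
    """Average of the deviations from the mean raised to the given power."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** order for x in scores) / len(scores)

def skewness(scores):
    """Moment-based skewness: zero for a symmetric distribution."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3: zero for a normal distribution."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]             # hypothetical, no skew
piled_left = [1, 1, 1, 2, 2, 3, 10]     # hypothetical, long right tail

print(skewness(symmetric), skewness(piled_left))
```

The second data set has most scores on the left and one long tail to the right, so its skewness is positive.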
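Normality can be checked by converting the skewness and kurtosis values to z-scores. Each value is divided by its standard error, where $S$ is the skewness and $K$ the kurtosis:

$$z_{\text{skewness}} = \frac{S - 0}{SE_{\text{skewness}}}, \qquad z_{\text{kurtosis}} = \frac{K - 0}{SE_{\text{kurtosis}}}$$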
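As a rough sketch, the z-score of a skewness or kurtosis value is the value minus zero, divided by its standard error. The standard error formulas below are the ones commonly reported by statistical packages for sample-adjusted skewness and kurtosis; they are an assumption here, since the text does not give them, and the function names are hypothetical.

```python
import math

def se_skewness(n):
    """Commonly used standard error of skewness for a sample of size n (n > 2)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Commonly used standard error of kurtosis for a sample of size n (n > 3)."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

def z_from_statistic(statistic, standard_error):
    """z = (statistic - 0) / SE, where 0 is the value expected under normality."""
    return (statistic - 0) / standard_error

# Hypothetical example: skewness of 0.5 in a sample of 100 scores.
z = z_from_statistic(0.5, se_skewness(100))
print(round(z, 2))
```

By the usual convention, an absolute z-score above 1.96 is significant at p < .05. In large samples the standard errors become very small, so even minor deviations from normality reach significance.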
Homogeneity of variance can be tested using Levene’s test or by evaluating a plot of the standardized predicted values against the standardized residuals. Levene’s test is a one-way ANOVA on the deviation scores, that is, on the absolute difference between each score and the mean of its group.
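The idea behind Levene's test can be sketched in two steps: turn every score into its absolute deviation from its own group mean, then run a one-way ANOVA on those deviations. The function below is a minimal illustration that returns only the F-ratio; it does not compute a p-value, and the groups are invented.

```python
def levene_f(groups):
    """F-ratio of a one-way ANOVA on absolute deviations from group means."""
    # Step 1: absolute deviation of each score from the mean of its group.
    deviations = []
    for group in groups:
        group_mean = sum(group) / len(group)
        deviations.append([abs(x - group_mean) for x in group])

    # Step 2: one-way ANOVA on the deviation scores.
    all_devs = [d for group in deviations for d in group]
    grand_mean = sum(all_devs) / len(all_devs)
    k = len(deviations)        # number of groups
    n_total = len(all_devs)    # total number of scores

    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in deviations
    )
    ss_within = sum(
        (d - sum(g) / len(g)) ** 2 for g in deviations for d in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical groups with the same mean but different spread.
print(round(levene_f([[4, 5, 6], [0, 5, 10]]), 3))
```

A larger F-ratio means the groups differ more in their average deviation, which points to unequal variances.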
REDUCING BIAS
There are four ways of correcting problems with the data:
- Trim the data
Delete a certain quantity of scores from the extremes.
- Winsorizing
Substitute outliers with the highest value that isn’t an outlier.
- Apply a robust estimation method
Use bootstrapping.
- Transform the data
Apply a mathematical function to scores to correct problems.
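Trimming and winsorizing can be sketched as follows. Both functions assume a percentage-based rule in which the same proportion of scores is treated at each end; the function names, the default proportion and the data are illustrative assumptions.

```python
def trim(scores, proportion=0.1):
    """Drop the given proportion of scores from each end of the sorted data."""
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)  # number of scores to drop per end
    return ordered[k:len(ordered) - k]

def winsorize(scores, proportion=0.1):
    """Replace the most extreme scores at each end with the nearest kept score."""
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)  # number of scores to replace per end
    if k == 0:
        return ordered
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(x, low), high) for x in ordered]

# Hypothetical data with one extreme score.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(trim(data))       # the lowest and highest score are removed
print(winsorize(data))  # the lowest and highest score are replaced
```

Trimming shortens the data set, whereas winsorizing keeps the number of scores the same.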
Trimming usually follows either (1) a percentage-based rule or (2) a standard-deviation-based rule. With bootstrapping, the sample data are treated as a population from which repeated samples (bootstrap samples) are drawn with replacement. When transforming the data, every score has to be transformed, not only the problematic ones.
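A percentile bootstrap confidence interval can be sketched as follows. This assumes the common variant in which each bootstrap sample is drawn with replacement and has the same size as the original sample; the function name, the number of resamples, the fixed seed and the data are illustrative choices.

```python
import random
from statistics import mean

def bootstrap_ci(scores, statistic=mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # Treat the sample as the population: resample it with replacement
    # and recompute the statistic for every bootstrap sample.
    estimates = sorted(
        statistic(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    # The interval limits are percentiles of the bootstrap estimates.
    lower_index = int(((1 - level) / 2) * n_resamples)
    upper_index = int((1 - (1 - level) / 2) * n_resamples) - 1
    return estimates[lower_index], estimates[upper_index]

# Hypothetical sample.
sample = [2, 4, 4, 5, 7, 9, 10, 12]
lower, upper = bootstrap_ci(sample)
print(lower, upper)
```

Because the interval comes from the resampled estimates themselves, it does not rely on the sampling distribution being normal.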