Lecture: Multiple Linear Regression

Summary and study notes

Which topics are covered in the lecture?

Multiple linear regression. A multiple linear regression (MLR) involves one outcome and multiple predictors. Two things are important to check: to what extent the model explains the variation in the data, and the slope of the regression line for each predictor. MLR examines a model in which multiple predictors are included in order to assess the unique linear effect of each predictor on Y.

The model of MLR. The observed outcome is the score of a participant. It consists of a predicted score, which is based on the model, plus some error in prediction.

Types of variables. The variables you can include in an MLR are measured at interval or ratio level; these are continuous variables. Continuous variables are distinguished from categorical variables, which are nominal or ordinal. MLR requires a continuous outcome and continuous predictors, but categorical predictors can be included as dummy variables (only as predictors, not as the outcome).

Hierarchical MLR. A hierarchical MLR is based on two or more models. The first model tests whether a first set of predictors predicts the outcome well. The second model tests whether additional variables are a good addition as predictors. The question is whether the second model is better than the first. This implies several hypotheses that you can test. For each model you can test whether there is a good fit.

Which topics are discussed that are not covered in the literature?

This lecture does not discuss any topics that are not covered in the literature.

Which recent developments in the field are discussed?

No recent developments are discussed.

Which remarks does the lecturer make during the lecture regarding the exam?

The assumptions of an MLR are not covered in the lecture but via Grasple. These assumptions are nevertheless important to know for the exam.

Which questions are discussed that could be asked on the exam?

No exam questions are discussed.

Lecture notes

Lecture 1 - Multiple Linear Regression

The birth order effect

Galton (1874) noticed that the number of firstborns among eminent scientists was remarkably large. Researchers then started to study the relation between birth order and IQ and observed a significant positive relation. But does this mean that being a firstborn is the reason for a high IQ? To investigate that, you have to critically review how the study was performed: was the sample representative, were the variables measured reliably, and were the analyses and interpretations correct? It is important to realize that association is not the same as causation. Another variable may actually have caused the relationship. This can be investigated with multiple linear regression.

Multiple linear regression (MLR)

Moving from simple to multiple linear regression is about adding predictors to your model.

  • Simple linear regression involves 1 outcome (Y) and 1 predictor (X). Outcome = dependent variable. Predictor = independent variable. 
  • Multiple linear regression involves 1 outcome and multiple predictors.

Two things are important to check when judging whether the regression is a good analysis:

  • To what extent does the model explain the variation in the data (R²)? If the dots are close to the line, R² is larger. If the dots are far from the line (large residuals), R² is smaller. How well does the model fit the data?
  • The slope of the regression line (B1). If the slope is larger, the line is steeper. If the slope is close to 0, the line is nearly horizontal. How important is the predictor?
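The two checks above can be illustrated with a minimal sketch. The data and the variable names (hours studied, exam score) are made up for illustration and do not come from the lecture. The second data set has the same slope but more scatter around the line, so only R² changes.

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam score (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])


def simple_regression(x, y):
    """Return intercept b0, slope b1 and R^2 of the least-squares line."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    residuals = y - (b0 + b1 * x)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    return b0, b1, 1.0 - ss_res / ss_tot


b0, b1, r2 = simple_regression(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, R^2 = {r2:.3f}")

# Add scatter that is unrelated to x: the residuals grow, so R^2 drops,
# while the slope of the fitted line stays the same.
y_noisy = y + np.array([3.0, -3.0, 3.0, -3.0, 3.0])
_, b1_noisy, r2_noisy = simple_regression(x, y_noisy)
print(f"noisy: b1 = {b1_noisy:.3f}, R^2 = {r2_noisy:.3f}")
```

The slope answers "how much does Y change per unit of X", while R² answers "how closely do the points follow the line"; the two can move independently.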

MLR examines a model where multiple predictors are included to check their unique linear effect on Y. Things you need to know about MLR:

1. The model of MLR

The observed outcome is the score of a participant. It consists of a predicted score, which is based on the model, plus some error in prediction. The predicted part is the statistical part. A multiple linear regression is sometimes called an additive linear model, because the contributions of the predictors are added together.
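Written out for participant i and k predictors, the model described above could look as follows. The notation (the index i, the number of predictors k, and e for the error) is an assumption chosen to match the B0 and B1 used elsewhere in these notes.

```latex
Y_i \;=\; \underbrace{B_0 + B_1 X_{1i} + B_2 X_{2i} + \dots + B_k X_{ki}}_{\hat{Y}_i \ \text{(predicted score)}} \;+\; \underbrace{e_i}_{\text{error in prediction}}
```

The error for participant i is therefore the difference between the observed and the predicted score, e_i = Y_i - Ŷ_i.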

2. Types of variables

The variables you can include in an MLR are measured at interval or ratio level; these are continuous variables. Continuous variables are distinguished from categorical variables, which are nominal or ordinal. MLR requires a continuous outcome and continuous predictors, but categorical predictors can be included as dummy variables (only as predictors, not as the outcome). Dummy variables take only the values 0 and 1. This works because, with the values 0 and 1, the coefficients of the model have a very clear interpretation. When you code all men as 1 and all women as 0, the predicted score is B0 + B1 for men and B0 for women. The difference between men and women is B1, which indicates how strongly gender predicts the outcome. When a categorical predictor has more than two levels, you create more dummies. A predictor with 4 levels needs 3 dummies, because 1 level serves as the reference group.
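A small sketch of dummy coding, using made-up scores. It shows that with a 0/1 dummy the intercept B0 equals the mean of the reference group and B1 equals the difference between the group means. The helper `dummy_code` and the region labels are hypothetical names introduced only for this illustration.

```python
import numpy as np

# Hypothetical scores; the dummy is 1 for men and 0 for women.
male = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
y = np.array([12.0, 14.0, 16.0, 9.0, 10.0, 11.0])

# Design matrix: a column of ones (intercept) plus the dummy.
design = np.column_stack([np.ones(len(y)), male])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
b0, b1 = coef

print(f"B0 = {b0:.2f} (mean of women), B0 + B1 = {b0 + b1:.2f} (mean of men)")


def dummy_code(labels, reference):
    """Turn a categorical variable into 0/1 columns, one per non-reference level."""
    levels = [lvl for lvl in sorted(set(labels)) if lvl != reference]
    columns = [[1.0 if lab == lvl else 0.0 for lab in labels] for lvl in levels]
    return levels, np.array(columns).T


# A predictor with 4 levels needs 3 dummies; "north" is the reference group.
labels = ["north", "south", "east", "west", "south", "north"]
levels, dummies = dummy_code(labels, reference="north")
print(levels)
print(dummies)
```

Participants in the reference group score 0 on every dummy, so each coefficient is the difference between that level and the reference group.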

3. MLR and hierarchical MLR

A hierarchical MLR is based on two or more models. The first model tests whether a first set of predictors predicts the outcome well. The second model tests whether additional variables are a good addition as predictors. The question is whether the second model is better than the first. This implies several hypotheses that you can test. For each model you can test whether there is a good fit, that is, whether the predictors of the model predict Y (H0: R² = 0 versus H1: R² ≠ 0). The second research question is whether the addition of variables 3 and 4 improves the model (H0: R²-change = 0 versus H1: R²-change ≠ 0). For each predictor it is important to look at the slope and to test whether it differs significantly from 0 (H0: B1 = 0 versus H1: B1 ≠ 0).

  • R² versus adjusted R² = sample value versus estimated population value. The R² of the sample is not an excellent estimate of the population value; it is a little too high (optimistic).
  • R² versus R²-change = fit of the model versus improvement of fit compared to the previous model. When R²-change is significant, the additional variables are an improvement for the model.
  • B versus Beta = unstandardized versus standardized. Beta puts all variables on the same scale, so that the effects can be compared. Beta tells you which variable is the strongest predictor.
  • Bivariate correlation (ignores other variables) versus unique contribution within a model (accounts for the other variables)
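The quantities in the list above can be computed by hand. The sketch below uses simulated data and the standard textbook formulas for adjusted R² and for the F statistic of R²-change; the function names are hypothetical. It stops at the F statistic: judging significance would additionally require the F distribution with (df1, df2) degrees of freedom, which statistical software reports as a p-value.

```python
import numpy as np


def fit_r2(X, y):
    """Fit a least-squares regression with intercept and return R^2."""
    design = np.column_stack([np.ones(len(y)), X])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    residuals = y - design @ coef
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


def adjusted_r2(r2, n, k):
    """Shrink R^2 for sample size n and number of predictors k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_change(r2_small, r2_large, n, k_small, k_large):
    """F statistic for the improvement of the larger (nested) model."""
    df1 = k_large - k_small
    df2 = n - k_large - 1
    return ((r2_large - r2_small) / df1) / ((1.0 - r2_large) / df2)


# Simulated data: predictors 1-3 matter, predictor 4 does not.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 1.0 + 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.4 * X[:, 2] + rng.normal(size=n)

r2_m1 = fit_r2(X[:, :2], y)   # model 1: predictors 1 and 2
r2_m2 = fit_r2(X, y)          # model 2: adds predictors 3 and 4

print(f"Model 1: R^2 = {r2_m1:.3f}, adjusted = {adjusted_r2(r2_m1, n, 2):.3f}")
print(f"Model 2: R^2 = {r2_m2:.3f}, adjusted = {adjusted_r2(r2_m2, n, 4):.3f}")
print(f"R^2-change = {r2_m2 - r2_m1:.3f}, "
      f"F-change = {f_change(r2_m1, r2_m2, n, 2, 4):.2f} (df = 2, {n - 4 - 1})")
```

Because model 1 is nested in model 2, R² can never decrease when predictors are added; that is why the change itself has to be tested and why adjusted R² is reported next to R².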

Note. R = multiple correlation coefficient. R² = proportion of explained variation.
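A short sketch of B versus Beta and of the note on R, using simulated data. The predictors `income` and `age` are hypothetical and chosen only because they are measured on very different scales. Beta is obtained here by rescaling B with the standard deviations of the predictor and the outcome, and R is computed as the correlation between observed and predicted scores.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
income = rng.normal(30000, 8000, size=n)   # large numbers, small B expected
age = rng.normal(40, 10, size=n)           # small numbers, larger B expected
y = 0.0005 * income + 0.2 * age + rng.normal(0, 3, size=n)
X = np.column_stack([income, age])

# Unstandardized coefficients B (the first entry of coef is the intercept).
design = np.column_stack([np.ones(n), X])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
b = coef[1:]
y_hat = design @ coef

# Standardized coefficients: Beta = B * SD(predictor) / SD(outcome).
beta = b * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Multiple correlation R and explained variation R^2.
R = np.corrcoef(y, y_hat)[0, 1]
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print("B    (income, age):", b)
print("Beta (income, age):", beta)
print(f"R = {R:.3f}, R squared = {R ** 2:.3f}, R^2 from residuals = {r2:.3f}")
```

In this simulation the B for income is much smaller than the B for age only because income is measured in larger units; on the standardized scale income is the stronger predictor.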

4. Exploration or theory evaluation

Imagine a data set with one outcome and 20 candidate predictors. There are different methods to choose which predictors will be used in the model. Researcher A, based on theory, chooses predictors 1, 7 and 12 as important predictors of Y (enter method). Researcher B explores all predictors for their contribution to predicting Y, and the final model is based on the relations observed in the data set (stepwise method).
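The contrast between the two researchers can be sketched with simulated data. Note the assumptions: the data-driven part is a simplified forward selection that adds the predictor with the largest gain in R² until the gain falls below a hypothetical threshold `min_gain`. Stepwise procedures in statistical software typically use significance-based entry and removal criteria instead, so this is an illustration of the idea rather than that procedure.

```python
import numpy as np


def r_squared(X, y, columns):
    """R^2 of a regression of y on the given columns of X (with intercept)."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in columns])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    residuals = y - design @ coef
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


def forward_selection(X, y, min_gain=0.02):
    """Greedy exploration: keep adding the predictor that raises R^2 the most."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = {c: r_squared(X, y, selected + [c]) - best_r2 for c in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break  # no remaining predictor improves the fit enough
        selected.append(best)
        remaining.remove(best)
        best_r2 += gains[best]
    return selected, best_r2


# Simulated data: 20 candidate predictors, of which predictors 1, 7 and 12
# (column indices 0, 6 and 11) truly relate to Y.
rng = np.random.default_rng(2)
n, p = 500, 20
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] + 0.8 * X[:, 6] + 0.6 * X[:, 11] + rng.normal(size=n)

# Researcher A: theory-driven choice, all chosen predictors entered at once.
theory = [0, 6, 11]
r2_enter = r_squared(X, y, theory)

# Researcher B: data-driven exploration of all 20 candidates.
selected, r2_step = forward_selection(X, y)

print("Enter method, columns", theory, f"R^2 = {r2_enter:.3f}")
print("Exploration, columns", selected, f"R^2 = {r2_step:.3f}")
```

Here both routes end at the same model because the simulated effects are strong. With weaker effects or a lower threshold, exploration can pick up predictors that are related to Y only by chance in this particular sample.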

5. The assumptions of MLR

The assumptions of MLR are covered on Grasple (not in the lecture, but they are important to know for the exam). Statistical inference is based on many assumptions.

  • Serious violations lead to incorrect results (e.g., wrong p-values, or wrong confidence intervals) 
  • Check model assumptions carefully (see Field and Grasple) 
  • Sometimes there are easy solutions (e.g. deleting a severe outlier; careful reporting is crucial) and sometimes not 

Most of the statistics we discuss are based on theoretical sampling distributions, which makes them parametric (that is, with distributional assumptions). Distribution-free methods also exist:

  • Non-parametric tests (not part of this course) 
  • Bootstrapping methods (applied and discussed next week) 
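To give an idea of what a distribution-free method looks like, here is a minimal sketch of a percentile bootstrap for regression slopes. It assumes resampling of whole cases with replacement and simulated data with skewed errors; the function names and settings (`n_boot`, `level`, `seed`) are hypothetical, and the variant used in the course may differ.

```python
import numpy as np


def slopes(X, y):
    """Least-squares slopes (intercept fitted but not returned)."""
    design = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0][1:]


def bootstrap_ci(X, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for each slope."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample cases with replacement
        draws[i] = slopes(X[idx], y[idx])
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [alpha, 1.0 - alpha], axis=0)
    return lower, upper


# Simulated data with skewed (non-normal) errors.
rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 2))
errors = rng.exponential(1.0, size=n) - 1.0
y = 2.0 + 1.0 * X[:, 0] + 0.0 * X[:, 1] + errors

estimate = slopes(X, y)
lower, upper = bootstrap_ci(X, y)
for j in range(X.shape[1]):
    print(f"B{j + 1} = {estimate[j]:.3f}, "
          f"95% bootstrap CI [{lower[j]:.3f}, {upper[j]:.3f}]")
```

The interval comes from the spread of the slopes across resampled data sets, not from a theoretical sampling distribution.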

Author: Britt van Dongen

Summary and Study Notes - Advanced Research Methods and Statistics (2019/2020 - UU)