MVDA - Multiple Regression Analysis

Multivariate Data Analysis

# Week 1: Multiple Regression Analysis

Multivariate analysis examines the relationships among three or more variables.

Multiple regression analysis (MRA) can be done when all variables are interval level (e.g. weight, height, IQ score).

The research question for MRA is: can Y be predicted from X1 and/or X2?

## What are the linearity, homoscedasticity and normality of residuals?

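Linearity: the relationship between the independent variables and the dependent variable is linear. This can be checked with scatterplots.

Homoscedasticity: the variance of the residuals is constant across the values of the predictors. This can be seen in a scatterplot of the residuals against the predicted values.

Normality: the residuals are normally distributed. On a P-P plot of the residuals, the points then follow an approximately straight line.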
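The three checks above are normally done by eye on plots, but the numbers behind those plots can be sketched in a few lines. The example below is a minimal sketch on simulated data (the data, the seed and all variable names are assumptions made for illustration, not part of the course material). It computes the residuals and predicted values, the coordinates of a normal P-P plot, and a rough comparison of residual spread for low versus high predicted values.

```python
import math
import numpy as np

# Simulated data (an assumption for illustration): Y depends linearly on X1 and X2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

# Fit Y = b0 + b1*X1 + b2*X2 by ordinary least squares.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ b
residuals = y - predicted

# Linearity / homoscedasticity: these pairs are what a residual scatterplot shows.
# A crude numeric check: compare residual spread below and above the median prediction.
median_pred = np.median(predicted)
low = residuals[predicted <= median_pred]
high = residuals[predicted > median_pred]
spread_ratio = high.std(ddof=1) / low.std(ddof=1)  # near 1 suggests constant variance

# Normality: coordinates of a normal P-P plot of the standardized residuals.
k = X.shape[1] - 1                                  # number of predictors
s_res = math.sqrt(np.sum(residuals ** 2) / (n - k - 1))
z = np.sort(residuals / s_res)
expected_cum = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
observed_cum = (np.arange(1, n + 1) - 0.5) / n
max_gap = np.max(np.abs(expected_cum - observed_cum))  # distance from the diagonal

print(f"spread ratio = {spread_ratio:.2f}, max P-P gap = {max_gap:.3f}")
```

Plotting `observed_cum` against `expected_cum` gives the P-P plot; plotting `residuals` against `predicted` gives the scatterplot used for linearity and homoscedasticity. The two summary numbers are only rough aids and do not replace looking at the plots.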

## What does multicollinearity mean?

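It means that there is a high intercorrelation between the predictors. Multicollinearity is indicated when the tolerance is lower than 0.10 and the VIF is higher than 10. A tolerance above 0.10 and a VIF below 10 indicate that multicollinearity is not a problem.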
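Tolerance and VIF follow directly from regressing each predictor on the other predictors: tolerance is 1 − R² of that regression, and VIF is 1 / tolerance. The sketch below illustrates this on simulated predictors (the data and the helper names `r_squared` and `tolerance_and_vif` are hypothetical, chosen for this example).

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS regression of y on the columns of X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def tolerance_and_vif(predictors):
    """Return a (tolerance, VIF) pair for every column of `predictors`."""
    results = []
    for j in range(predictors.shape[1]):
        others = np.delete(predictors, j, axis=1)
        tol = 1.0 - r_squared(others, predictors[:, j])
        results.append((tol, 1.0 / tol))
    return results

# Simulated predictors (an assumption for illustration).
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.3 * x1 + rng.normal(size=100)     # only mildly related to x1
x3 = x1 + 0.05 * rng.normal(size=100)    # almost a copy of x1

tol_ok, vif_ok = tolerance_and_vif(np.column_stack([x1, x2]))[0]
tol_bad, vif_bad = tolerance_and_vif(np.column_stack([x1, x3]))[0]

print(f"x1 with x2: tolerance = {tol_ok:.2f}, VIF = {vif_ok:.2f}")
print(f"x1 with x3: tolerance = {tol_bad:.4f}, VIF = {vif_bad:.1f}")
```

With only two predictors the tolerance reduces to 1 minus the squared correlation between them, which is why the near-duplicate predictor `x3` pushes the tolerance far below 0.10.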

## Are there outliers, influential points, or outliers on the predictors?

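Outliers on the dependent variable Y: check the standardized residuals. They should lie between −3 and 3; cases outside this range are outliers.

Influential points: check Cook's distance. It should be smaller than 1; cases with a larger value are influential.

Outliers on the predictors (independent variables X): check the leverage. It should be smaller than 3(k+1)/n, where k is the number of predictors and n is the sample size.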
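The three diagnostics can be computed from the design matrix and the residuals. The following is a minimal sketch on simulated data with two planted problem cases (the data, the seed and the variable names are assumptions for illustration). It uses the plain hat values as leverage; some statistical packages report a centered leverage (hat value minus 1/n) instead, so check which version your software uses before applying the cut-off.

```python
import numpy as np

# Simulated data (an assumption for illustration) with two planted problem cases.
rng = np.random.default_rng(2)
n, k = 50, 2                       # sample size and number of predictors
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x1[1] = 8.0                        # case 1: extreme value on a predictor
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)
y[0] += 10.0                       # case 0: extreme value on Y

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                      # raw residuals
s2 = (e @ e) / (n - k - 1)         # residual variance

# Standardized residuals: raw residual divided by the residual standard deviation.
z_resid = e / np.sqrt(s2)

# Leverage: diagonal of the hat matrix X (X'X)^-1 X'.
leverage = np.sum((X @ np.linalg.inv(X.T @ X)) * X, axis=1)

# Cook's distance combines residual size and leverage.
cooks_d = (e ** 2 / ((k + 1) * s2)) * leverage / (1.0 - leverage) ** 2

outliers_y = np.flatnonzero(np.abs(z_resid) > 3)
influential = np.flatnonzero(cooks_d > 1)
outliers_x = np.flatnonzero(leverage > 3 * (k + 1) / n)

print("outliers on Y:", outliers_y)
print("influential points:", influential)
print("outliers on predictors:", outliers_x)
```

The planted cases show that the three checks answer different questions: a case can have a large residual without unusual predictor values, and the reverse.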

## What are the null and the alternative hypotheses to test the regression model?

H0: b1 = b2 = ··· = bk = 0 (no relation between Y and X1, X2)

Ha: at least one bj ≠ 0

## When can the null hypothesis be rejected?

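When the F-test for the regression model is significant, that is, when its p-value is below the chosen significance level (e.g. .05). The hypothesis of no relation between Y and the predictors is then rejected (no relation would mean that all regression coefficients equal 0).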
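The overall test of the model is usually carried out with an F statistic, which can be written in terms of R²: F = (R² / k) / ((1 − R²) / (n − k − 1)), with k and n − k − 1 degrees of freedom. The sketch below computes it on simulated data (the data and variable names are assumptions for illustration). Looking up the p-value in the F distribution is left out, since that needs a statistics library or a table.

```python
import numpy as np

# Simulated data (an assumption for illustration).
rng = np.random.default_rng(0)
n, k = 200, 2                      # sample size and number of predictors
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b

ss_total = np.sum((y - y.mean()) ** 2)
ss_residual = np.sum(residuals ** 2)
r2 = 1.0 - ss_residual / ss_total

# F statistic for H0: b1 = b2 = ... = bk = 0.
df_model, df_residual = k, n - k - 1
f_stat = (r2 / df_model) / ((1.0 - r2) / df_residual)

print(f"R^2 = {r2:.3f}, F({df_model}, {df_residual}) = {f_stat:.1f}")
```

A large F relative to the F distribution with these degrees of freedom gives a small p-value, which is the condition for rejecting H0.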

## What are the null and the alternative hypotheses to test the individual coefficients?

H0: b1 = 0 and Ha: b1 ≠ 0

H0: b2 = 0 and Ha: b2 ≠ 0
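Each of these hypotheses is commonly tested with a t statistic: the estimated coefficient divided by its standard error, with n − k − 1 degrees of freedom. A minimal sketch on simulated data follows (the data and variable names are assumptions for illustration; the p-value lookup in the t distribution is omitted).

```python
import numpy as np

# Simulated data (an assumption for illustration).
rng = np.random.default_rng(0)
n, k = 200, 2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b

# Standard errors: square roots of the diagonal of s^2 * (X'X)^-1.
mse = (residuals @ residuals) / (n - k - 1)
se_b = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))

# One t statistic per coefficient (index 0 is the intercept b0).
t_stats = b / se_b

for name, coef, se, t in zip(["b0", "b1", "b2"], b, se_b, t_stats):
    print(f"{name} = {coef:6.3f}, SE = {se:.3f}, t = {t:6.2f}")
```

A t value far from 0 in either direction leads to rejecting H0 for that coefficient, given the other predictors in the model.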

## What are the unstandardized and standardized regression equations?

Unstandardized: Ŷ = b0 + b1X1 + b2X2

Standardized: Ŷst = β1X1st + β2X2st (there is no intercept, because all standardized variables have mean 0)
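The link between the two equations is that each standardized weight equals the unstandardized weight rescaled by standard deviations: βj = bj · s(Xj) / s(Y). The sketch below checks this on simulated data by fitting the model twice, once on raw scores and once on z-scores (the data and the helper name `zscore` are assumptions for illustration).

```python
import numpy as np

def zscore(v):
    """Standardize a variable to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std(ddof=1)

# Simulated data (an assumption for illustration).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

# Unstandardized equation: regression on the raw scores.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standardized equation: regression on the z-scores.
Z = np.column_stack([np.ones(n), zscore(x1), zscore(x2)])
bz, *_ = np.linalg.lstsq(Z, zscore(y), rcond=None)
beta = bz[1:]                      # bz[0] is the intercept, which is 0 here

# The same betas obtained by rescaling the unstandardized weights.
sd_x = np.array([x1.std(ddof=1), x2.std(ddof=1)])
beta_rescaled = b[1:] * sd_x / y.std(ddof=1)

print(f"Unstandardized: Y-hat = {b[0]:.3f} + {b[1]:.3f}*X1 + {b[2]:.3f}*X2")
print(f"Standardized:   Y-hat_st = {beta[0]:.3f}*X1st + {beta[1]:.3f}*X2st")
```

Because the betas are on a common scale, they can be compared across predictors, which the unstandardized weights do not allow when the predictors have different units.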

## How much variance of Y is explained in total by X1 and X2?

R² gives the proportion of variance in Y that is explained by X1 and X2 together.

## How much variance of Y is uniquely explained by X1? How much variance of Y is uniquely explained by X2? What is the best predictor?

We look at the squared part (semipartial) correlations. r²y(1.2) is the proportion of variance in Y uniquely explained by X1, and r²y(2.1) is the proportion uniquely explained by X2. The predictor with the highest value is the best predictor.
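Both the total and the unique explained variance can be obtained from R² values. The squared part correlation of a predictor equals the drop in R² when that predictor is removed from the model, and it also equals the squared correlation between Y and the part of the predictor that is unrelated to the other predictor. The sketch below shows both routes on simulated data (the data and the helper names `r_squared` and `residualize` are assumptions for illustration).

```python
import numpy as np

def r_squared(predictors, y):
    """R^2 of an OLS regression of y on a list of predictors (intercept added)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def residualize(target, other):
    """Part of `target` that cannot be predicted linearly from `other`."""
    design = np.column_stack([np.ones(len(target)), other])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

# Simulated data (an assumption for illustration).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

# Total explained variance.
r2_total = r_squared([x1, x2], y)

# Unique explained variance: drop in R^2 when the predictor is left out.
unique_x1 = r2_total - r_squared([x2], y)   # r^2 y(1.2)
unique_x2 = r2_total - r_squared([x1], y)   # r^2 y(2.1)

# Same quantity for X1 via the part correlation itself.
part_r1 = np.corrcoef(y, residualize(x1, x2))[0, 1]

print(f"R^2 total = {r2_total:.3f}")
print(f"unique X1 = {unique_x1:.3f}, unique X2 = {unique_x2:.3f}")
```

In this simulated example X1 has the larger squared part correlation, so it would be called the best predictor. The unique parts need not add up to R², because correlated predictors also share explained variance.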

