Discovering statistics using IBM SPSS statistics
Chapter 20
Categorical outcomes: logistic regression
This summary covers the information from chapter 20.8 onward; the rest of the chapter is not necessary for the course.
Logistic regression is a model for predicting categorical outcomes from categorical and continuous predictors.
Binary logistic regression is used when we're trying to predict membership of only two categories.
Multinomial logistic regression is used when we want to predict membership of more than two categories.
The linear model can be expressed as: Yi = b0 + b1Xi + errori
b0 is the value of the outcome when the predictors are zero (the intercept).
The bs quantify the relationship between each predictor and outcome.
X is the value of each predictor variable.
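The linear model above can be sketched in a few lines of code. This is a minimal illustration with made-up values for b0, b1, and X, not data from the book:

```python
# Linear model: Yi = b0 + b1*Xi (error term omitted for a deterministic sketch).
b0 = 2.0   # intercept: the outcome when the predictor is zero (hypothetical value)
b1 = 0.5   # slope: change in the outcome per unit change in the predictor (hypothetical)
X = [0, 2, 4]  # made-up predictor values

predicted = [b0 + b1 * x for x in X]
print(predicted)  # [2.0, 3.0, 4.0]
```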
One of the assumptions of the linear model is that the relationship between the predictors and outcome is linear.
When the outcome variable is categorical, this assumption is violated.
One way to solve this problem is to transform the data using the logarithmic transformation, which lets you express a non-linear relationship in a linear way.
In logistic regression, we predict the probability of Y occurring, P(Y), from known (log-transformed) values of X1 (or Xs).
The logistic regression model with one predictor is:
P(Y) = 1 / (1 + e^-(b0 + b1X1i))
The value of the model will lie between 0 and 1.
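The one-predictor logistic model can be computed directly. A minimal sketch with hypothetical coefficient values (the function name `logistic_p` is mine, not the book's):

```python
import math

def logistic_p(b0, b1, x):
    """P(Y) = 1 / (1 + e^-(b0 + b1*x)); the result always lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# With b0 = 0 and b1 = 1, a predictor value of 0 gives a linear predictor of 0,
# which the logistic function maps to a probability of exactly 0.5.
print(logistic_p(0.0, 1.0, 0.0))  # 0.5
```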
You need to test for:
- Linearity of the logit
You need to check that each continuous predictor is linearly related to the log of the outcome variable. This is done by including the interaction between each predictor and its own log transform in the model; if that interaction is significant, the main effect has violated the assumption of linearity of the logit.
- Multicollinearity
This has a biasing effect on the model.
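Both checks can be sketched in code. This is an illustration on made-up data, not an SPSS procedure: the X*ln(X) term is the interaction used to test linearity of the logit, and the variance inflation factor (VIF = 1 / (1 - R²), from regressing one predictor on the others) flags multicollinearity. The helper `vif` is my own, not from a library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical continuous predictors; x2 is deliberately collinear with x1.
x1 = rng.uniform(1.0, 10.0, size=200)
x2 = 0.9 * x1 + rng.normal(0.0, 0.5, size=200)

# Linearity of the logit: build the predictor-times-its-log interaction term.
# Adding this term to the logistic model and testing its significance is the check.
x1_ln_x1 = x1 * np.log(x1)

def vif(target, others):
    """VIF = 1 / (1 - R^2) from regressing `target` on the other predictors."""
    X = np.column_stack([np.ones_like(target)] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1.0 - resid.var() / target.var()
    return 1.0 / (1.0 - r2)

# A large VIF (a common rule of thumb is > 10) signals problematic collinearity.
print(vif(x1, [x2]))
```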
Multinomial logistic regression predicts membership of more than two categories.
The model breaks the outcome variable into a series of comparisons between two categories.
In practice, you have to set a baseline outcome category.
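This baseline idea can be sketched as follows: the baseline category's coefficients are fixed at zero, each other category gets its own b0 and b1 relative to the baseline, and the exponentiated linear predictors are normalised into probabilities. The category names and coefficient values below are made up for illustration:

```python
import math

# (b0, b1) per category, each compared against the baseline.
coefs = {
    "baseline": (0.0, 0.0),   # reference category: coefficients fixed at zero
    "cat_A":    (0.4, 0.8),   # hypothetical coefficients for category A vs baseline
    "cat_B":    (-0.2, 1.5),  # hypothetical coefficients for category B vs baseline
}

def category_probs(x):
    """Turn the per-category linear predictors b0 + b1*x into probabilities."""
    scores = {c: math.exp(b0 + b1 * x) for c, (b0, b1) in coefs.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(category_probs(1.0))  # probabilities for all three categories, summing to 1
```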