In logistic regression, the goal is the same as in linear regression (link): we wish to model a dependent variable (DV) in terms of one or more independent variables (IVs). However, OLS regression is for continuous (or nearly continuous) DVs; logistic regression is for DVs that are categorical. When the DV has exactly two categories (e.g., alive/dead; male/female; voted for McCain/Obama), we use dichotomous logistic regression.
WHY LOGISTIC REGRESSION IS NEEDED
One might try to use OLS regression with dichotomous DVs. There are several reasons why this is a bad idea:
1. The residuals cannot be normally distributed (as the OLS model assumes), since they can take on only two values for each combination of levels of the IVs.
2. The OLS model makes nonsensical predictions, since the DV is not continuous – e.g., it may predict that someone does something more than ‘all the time’.
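Point 2 is easy to demonstrate. The sketch below (function and variable names are illustrative, not from the original) fits an ordinary least-squares line to a 0/1 outcome and then predicts at IV values outside the observed range; the predicted "probabilities" fall above 1 and below 0.

```python
def ols_fit(xs, ys):
    """Simple least-squares line: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# A dichotomous DV: 0 for low values of the IV, 1 for high values.
xs = list(range(1, 11))
ys = [0] * 5 + [1] * 5

b0, b1 = ols_fit(xs, ys)

# Predictions at extreme IV values land outside [0, 1], the range a
# probability must lie in -- the "nonsensical predictions" above.
print(b0 + b1 * 15)  # greater than 1
print(b0 + b1 * 0)   # less than 0
```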
A VERY QUICK INTRODUCTION TO LOGISTIC REGRESSION
Logistic regression deals with these issues by transforming the DV. Rather than modelling the categorical responses directly, it models the log of the odds of being in a particular category for each combination of values of the IVs. Odds are the same as in gambling: e.g., odds of 3:1 indicate that the event is three times as likely to occur as not. To assess the effect of an IV, we take the ratio of the odds across its levels; a ratio of 1 means the IV has no effect. Taking the log of this ratio gives a number that ranges from negative infinity to infinity, so that 0 indicates no effect and the result is symmetric around 0 rather than around 1. The log of the odds is known as the logit.
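The transformation above can be sketched in a few lines. This is a minimal illustration (the helper name `logit` is mine, though it matches common usage): the logit maps probabilities in (0, 1) onto the whole real line, sends even odds (p = 0.5) to 0, and is symmetric around 0; on the log scale, an odds ratio becomes a simple difference of logits.

```python
import math

def logit(p):
    """Log odds of probability p: maps (0, 1) onto (-inf, inf)."""
    return math.log(p / (1.0 - p))

# Even odds (p = 0.5) map to 0.
print(logit(0.5))

# Symmetry around 0: logit(p) = -logit(1 - p).
print(logit(0.75), logit(0.25))

# An odds ratio becomes a difference on the log scale:
# log(odds_a / odds_b) = logit(p_a) - logit(p_b).
# Equal probabilities give a log odds ratio of 0, i.e. no effect.
p_a, p_b = 0.8, 0.8
print(logit(p_a) - logit(p_b))
```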