PROC LOGISTIC: Coding 0 and 1
The problem of coding 0 and 1 in PROC LOGISTIC
PROC LOGISTIC can be used to run logistic regression on a dichotomous dependent variable. Often, such variables are coded 0 and 1, with 0 for 'no' or the equivalent and 1 for 'yes' or the equivalent. In that case, we are usually interested in modeling the probability of a 'yes'. By default, however, SAS models the probability of the response value that comes first in sort order, which for 0/1 coding is 0 (a 'no').
For example, we might be interested in modeling the presence of a disease, with 0 meaning the person is not infected and 1 meaning he or she is infected. To keep it simple, I will use one independent variable: sex, coded as 1 for female and 0 for male. The variable weight acts as a count of people in each combination. So:
data today;
input disease female weight;
datalines;
0 0 100
1 0 200
0 1 200
1 1 100
;;;;
run;
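Before fitting a model, a weighted crosstab is a quick way to confirm what the data say. This sketch uses PROC FREQ with the same WEIGHT variable; the NOCOL and NOPERCENT options are only there to leave the counts and row percentages, that is, the infection rate within each sex.

```sas
/* Weighted 2x2 table of sex (rows) by disease status (columns). */
/* NOCOL and NOPERCENT suppress column and total percentages,    */
/* leaving counts plus the share infected within each sex.       */
proc freq data = today;
  tables female*disease / nocol nopercent;
  weight weight;
run;
```

With these data, the male row (female=0) has 200 of 300 infected (about 67%) and the female row has 100 of 300 (about 33%).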
We then run PROC LOGISTIC:
proc logistic data = today;
model disease = female;
weight weight;
run;
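and get, among other output, a parameter estimate of 1.39 for female, which corresponds to an odds ratio of 4.0. Read as the odds of infection, that would say women are far more likely to be infected, while it's clear from the data that men are: 200 of 300 men are infected versus 100 of 300 women.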
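The numbers can be checked by hand. Treating the weights as counts, a model with one binary predictor is saturated, so its fitted odds ratio equals the odds ratio from the 2×2 table:

```latex
\begin{aligned}
\text{odds}(\text{disease}=0 \mid \text{female}) &= \tfrac{200}{100} = 2, &
\text{odds}(\text{disease}=0 \mid \text{male}) &= \tfrac{100}{200} = 0.5,\\
\text{OR}_{\text{disease}=0} &= \tfrac{2}{0.5} = 4, &
\hat\beta_{\text{female}} &= \ln 4 \approx 1.386,\\
\text{OR}_{\text{disease}=1} &= \tfrac{0.5}{2} = 0.25, &
\hat\beta_{\text{female}} &= \ln 0.25 \approx -1.386.
\end{aligned}
```

So the 1.39 reported for female is the log odds ratio for disease=0, and switching the modeled event to disease=1 flips its sign.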
Evidence of a 0-1 coding problem in PROC LOGISTIC
The evidence that this is happening is one line in the output:
Probability modeled is disease=0
and several lines in the log:
NOTE: PROC LOGISTIC is modeling the probability that disease=0.
One way to change this to model the probability that disease=1
is to specify the response variable option EVENT='1'.
Solving 0-1 coding problems in PROC LOGISTIC
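There are several solutions. The simplest is not the one mentioned in the log, but the DESCENDING option on the PROC LOGISTIC statement, which reverses the sort order of the response so that 1 comes first and is the value modeled: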
proc logistic data = today descending;
model disease = female;
weight weight;
run;
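The same reversal can also be requested for the response variable alone, as an option in parentheses after its name in the MODEL statement. This is a sketch of that equivalent form; with these data, either version should report an estimate of about -1.39 for female (an odds ratio of 0.25), since women's odds of infection are 100/200 = 0.5 against 200/100 = 2 for men.

```sas
/* DESCENDING as a response-variable option: reverses the order  */
/* of disease's values, so disease=1 becomes the modeled event.  */
proc logistic data = today;
  model disease(descending) = female;
  weight weight;
run;
```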
Another method is the one suggested in the log. It is more general, because EVENT= names the response value to model directly instead of relying on sort order:
proc logistic data = today;
model disease(event = '1') = female;
weight weight;
run;
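One place the extra generality shows up is a formatted response. EVENT= matches the formatted value of the response, so once a format is attached, the event is named by its label. The sketch below assumes a hypothetical format yesno and a hypothetical data set today_fmt, neither of which is part of the original example.

```sas
/* Hypothetical format that labels the 0/1 codes */
proc format;
  value yesno 0 = 'No' 1 = 'Yes';
run;

/* Hypothetical copy of today with the format attached to disease */
data today_fmt;
  set today;
  format disease yesno.;
run;

/* EVENT= refers to the formatted value, so name the 'Yes' level */
proc logistic data = today_fmt;
  model disease(event = 'Yes') = female;
  weight weight;
run;
```

Because 'No' sorts before 'Yes', leaving out EVENT= here would again model the 'no' outcome, and EVENT='1' would no longer match any formatted value.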