PROC LOGISTIC: Coding 0 and 1

By , April 12, 2010 2:51 pm

The problem of coding 0 and 1 in PROC LOGISTIC
PROC LOGISTIC can be used to run logistic regression on a dichotomous dependent variable. Often, such a variable is coded 0 and 1, with 0 for 'no' or the equivalent, and 1 for 'yes' or the equivalent. In this case, we are usually interested in modeling the probability of a 'yes'. However, by default, SAS models the probability of the lowest ordered response value, which here is 0 (a 'no').

For example, we might be interested in modeling the presence of a disease, with 0 meaning the person is not infected, and 1 meaning he or she is infected. To keep it simple, I will use one independent variable: sex, coded as 1 for female and 0 for male. Each row also carries a weight, which is passed to the WEIGHT statement when the model is run. So:

data today;
input disease female weight;
datalines;
0 0 100
1 0 200
0 1 200
1 1 100
;
run;
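
Before fitting anything, a weighted cross-tabulation is one way to look at the raw pattern. This is only a sketch, and it assumes the weight column can be read as cell counts, which the original data step does not state:

```sas
/* Cross-tabulate sex by disease status, treating weight as cell counts */
proc freq data = today;
  tables female * disease / nocol nopercent;
  weight weight;
run;
```

Under that assumption, 200 of 300 men (about 67%) are infected, against 100 of 300 women (about 33%).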

We then run PROC LOGISTIC:


proc logistic data = today;
model disease = female;
weight weight;
run;

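and get, among other output, a parameter estimate of 1.39 for female (an odds ratio of about 4), which at first glance says that women are more likely to be infected, while it's clear from the data that men are much more likely to be infected.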
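
To see where the 1.39 comes from, the numbers can be worked by hand, treating the weights as counts (an assumption made for this illustration). Since SAS is modeling disease=0 here, the odds below are odds of not being infected:

```latex
\begin{align*}
\text{odds}(\text{disease}=0 \mid \text{female}=1) &= \frac{200}{100} = 2 \\
\text{odds}(\text{disease}=0 \mid \text{female}=0) &= \frac{100}{200} = 0.5 \\
\text{OR}_{\text{female vs. male}} &= \frac{2}{0.5} = 4, \qquad \ln 4 \approx 1.39
\end{align*}
```

So 1.39 is the log odds of being uninfected for women relative to men. Modeling disease=1 instead flips the sign: the odds ratio becomes 1/4 = 0.25 and the coefficient about -1.39.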

Evidence of a 0-1 coding problem in PROC LOGISTIC
The evidence that this is happening is one line in the output:

Probability modeled is disease=0

and several lines in the log:

NOTE: PROC LOGISTIC is modeling the probability that disease=0.
One way to change this to model the probability that disease=1
is to specify the response variable option EVENT='1'

Solving 0-1 coding problems in PROC LOGISTIC
There are several solutions. The simplest is not the one mentioned in the log, but rather the DESCENDING option.

proc logistic data = today descending;
model disease = female;
weight weight;
run;

Another method is the one mentioned in the log, which is more general:

proc logistic data = today;
model disease(event = '1') = female;
weight weight;
run;
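
A third possibility, offered here as a sketch rather than as something the log note suggests, is to name the reference level of the response instead of the event level. REF= is a response variable option on the MODEL statement; for a two-level response, making '0' the reference should be equivalent to EVENT='1':

```sas
/* Treat disease=0 as the reference level, so disease=1 is modeled */
proc logistic data = today;
  model disease(ref = '0') = female;
  weight weight;
run;
```

Whichever form is used, the output line should then read "Probability modeled is disease=1", and the estimate for female changes sign.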
