**The problem of coding 0 and 1 in PROC LOGISTIC**

PROC LOGISTIC can be used to run logistic regression on a dichotomous dependent variable. Often, such variables are coded 0 and 1, with 0 for 'no' or the equivalent, and 1 for 'yes' or the equivalent. In this case, we are usually interested in modeling the probability of a 'yes'. However, by default, SAS models the probability of the lower-sorted value, which here is 0 (a 'no').

For example, we might be interested in modeling the presence of a disease, with 0 meaning the person is not infected, and 1 meaning he or she is infected. To keep it simple, I will use one independent variable: sex, coded as 1 for female and 0 for male. So:

    /* disease: 1 = infected, 0 = not infected; female: 1 = female, 0 = male */
    data today;
      input disease female weight;
      datalines;
    0 0 100
    1 0 200
    0 1 200
    1 1 100
    ;;;;

We then run PROC LOGISTIC:

    proc logistic data = today;
      model disease = female;
      weight weight;
    run;

and get, among other output, a parameter estimate of 1.39 for female (an odds ratio of about 4), even though the data make it clear that men are much more likely to be infected.
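A weighted cross-tabulation is a quick way to see what the answer should be before trusting the model output. The sketch below assumes the `today` data set created above and reads `weight` as the number of people in each row, which is an assumption about how the toy data are meant to be interpreted. By hand, the odds of infection are 200/100 = 2 for men and 100/200 = 0.5 for women, so the odds ratio for female should be 0.5/2 = 0.25 when disease=1 is modeled and its reciprocal, 4, when disease=0 is modeled. Note that ln 4 ≈ 1.39, so the default output is describing the odds of not being infected.

```sas
/* Cross-tabulate sex by disease status, counting each row `weight` times. */
proc freq data = today;
  weight weight;   /* in PROC FREQ, WEIGHT treats the value as a cell count */
  tables female * disease / nocol nopercent;   /* show counts and row percentages only */
run;
```

The row percentages should show about 67% of men and about 33% of women infected, which is the direction the logistic model ought to reflect.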

**Evidence of a 0-1 coding problem in PROC LOGISTIC**

The evidence that this is happening is one line in the output:

    Probability modeled is disease=0

and several lines in the log:

    NOTE: PROC LOGISTIC is modeling the probability that disease=0.
          One way to change this to model the probability that disease=1
          is to specify the response variable option EVENT='1'

**Solving 0-1 coding problems in PROC LOGISTIC**

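There are several solutions. The simplest is not the one mentioned in the log, but rather the DESCENDING option on the PROC LOGISTIC statement, which reverses the sort order of the response levels so that 1 is modeled instead of 0.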

    proc logistic data = today descending;
      model disease = female;
      weight weight;
    run;

Another method is the one mentioned in the log. It is more general, because it names the event level explicitly rather than relying on sort order:

    proc logistic data = today;
      model disease(event = '1') = female;
      weight weight;
    run;
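A third variant, sketched here as an illustration rather than as a recommendation, names the reference level instead of the event by using the REF= response variable option. For a binary response this should be equivalent to specifying EVENT='1'. The sketch assumes the `today` data set from the example above.

```sas
/* Assumes the `today` data set created earlier in this post. */
proc logistic data = today;
  /* REF='0' makes 0 the reference level, so the event modeled is disease=1 */
  model disease(ref = '0') = female;
  weight weight;
run;
```

Whichever option is used, it is worth checking that the output now reads "Probability modeled is disease=1". With these data, the odds ratio for female should then come out at about 0.25 (200/100 = 2 for men against 100/200 = 0.5 for women), matching the fact that men are more likely to be infected.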

