Description of separation in PROC LOGISTIC
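If you picture the data as a 2 x 2 crosstab of the outcome against a predictor, quasi-complete separation occurs when exactly one of the cells is 0. Complete separation occurs when two cells are 0, one in each row and one in each column, so that the predictor determines the outcome perfectly.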
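To see why a zero cell causes trouble, it may help to write out the estimated odds ratio for the 2 x 2 table. The cell counts n_{ij} below (row i of the predictor, column j of the outcome) are notation introduced here for illustration only.

```latex
\[
\widehat{\mathrm{OR}} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}},
\qquad
\log \widehat{\mathrm{OR}} = \log n_{11} + \log n_{22} - \log n_{12} - \log n_{21}.
\]
```

If one cell is 0 (or two cells on the same diagonal), the estimated odds ratio is either 0 or infinite. For a single binary predictor, the logistic regression slope is a multiple of the log odds ratio that depends on the coding, so it has no finite maximum likelihood estimate.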
An example of quasi-complete separation in PROC LOGISTIC
An example of quasi-complete separation is:
data today7a;
input x $ y $ @@;
datalines;
A C A C A C A C A C
B C B C B C B C B C B C B C B C B C B C
B D B D B D B D B D
;
run;
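One way to confirm the pattern before fitting anything is to tabulate the data. The following is a minimal sketch, assuming the data step above created a data set named today7a (the name used in the PROC LOGISTIC call later in this post). The TABLES options only suppress percentages.

```sas
/* Crosstab of predictor by outcome; an empty cell signals possible separation */
proc freq data = today7a;
  tables x * y / norow nocol nopercent;
run;
```

With these data, the A row has no D observations, which is the single zero cell that defines quasi-complete separation.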
Evidence of separation problems in PROC LOGISTIC
It varies. Confidence intervals will be extremely wide. Sometimes there is a warning that there was complete or quasi-complete separation; sometimes there is only a note. Sometimes you get a warning that ‘convergence was not attained’. With a WEIGHT statement, you sometimes get a note that ‘observations with nonpositive frequencies or weights were excluded’. For example, if you run the data step above and then
proc logistic data = today7a;
class x; /* x is a character variable, so it is declared in CLASS */
model y = x;
run;
you get a warning in the log. But if you run this data step:
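data today7;
input x $ y $ weight @@;
datalines;
A C 5 A D 0 B C 10 B D 5
;
proc logistic data = today7;
class x;
weight weight; /* use the cell counts; the zero-weight row is excluded */
model y = x;
run;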
you get a note in the log.
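The two data sets describe the same table, which is what makes the different log messages surprising. As a sketch (not part of the original post), the weighted version can be tabulated with PROC FREQ. The ZEROS option on the WEIGHT statement asks PROC FREQ to keep the zero-weight cell rather than drop it, and the variable named weight is the one read in the data step above.

```sas
/* Tabulate the weighted data, keeping the cell whose weight is 0 */
proc freq data = today7;
  weight weight / zeros;
  tables x * y / norow nocol nopercent;
run;
```

The resulting counts should match the crosstab of the unweighted data from the first example.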
Solutions to separation problems in PROC LOGISTIC
Sometimes you can delete the offending variable. In the example there was only one independent variable, but usually there will be more, and one of them will be the problematic one. Alternatively, if the variable has multiple categories, it may be sensible to combine some of them. A third possibility is to leave the offending variable in the model and simply report the results for the other variables; these are still correct. You can then report the coefficient for the offending variable as infinite (∞). Finally, you can use exact inference with the EXACT statement.
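As a rough sketch of the last option, exact conditional inference is requested with an EXACT statement naming the effect of interest. The example assumes the unweighted data set today7a from earlier. PARAM=REF and ESTIMATE=BOTH are choices made here for illustration, not settings taken from the original analysis.

```sas
/* Exact conditional logistic regression for the separated predictor x */
proc logistic data = today7a;
  class x / param = ref;        /* reference coding for the CLASS variable */
  model y = x;
  exact x / estimate = both;    /* request exact parameter and odds ratio estimates */
run;
```

Exact methods can be slow or memory-hungry on larger data sets, so this approach is most practical for small samples like the one here.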
Paul Allison’s paper at SGF 2008 explains this problem lucidly and at length, although his statement that PROC LOGISTIC gives clear diagnostic messages is, as we have seen, not always true.