PROC LOGISTIC: Reference coding and effect coding

April 13, 2010 10:09 am

Description of the problem with effect coding
When you have a categorical independent variable with more than 2 levels, you need to define it with a CLASS statement. In PROC GLM the default coding for this is dummy coding. In PROC LOGISTIC, it’s effect coding. To me, effect coding is quite unnatural.

Effect coding compares each level to the grand mean and mirrors ANOVA coding; this seems natural to me in ANOVA, but very counterintuitive here. Reference (or dummy) coding compares each level to one “reference” level. I find this easier to understand, and it mirrors what I see in most reports of logistic regression results.

Example of the problem of effect coding
Continuing with the same example of modeling probability of infection, suppose you now have race/ethnicity as an IV, with 6 categories, as defined by the Census Bureau: White, Black/African American, Hispanic/Latino, American Indian/Alaskan Native, Native Hawaiian or other Pacific Islander, and Asian. You run

proc logistic data = today2;
class race;
model disease(event = '1') = race;
weight weight;
run;

and get parameter estimates that include (among much else):

Parameter     DF  Estimate
Intercept      1   -0.8527
Race AIAN      1   -0.0636
Race AfrA      1   -0.7568
Race Asian     1    0.1595
Race Latino    1    0.5650
Race NHPI      1    0.1565

and OR estimates

Effect                 Point estimate
race AIAN vs White     1.000
race AfrA vs White     0.500
race Asian vs White    1.250
race Lat vs White      1.875
race NHPI vs White     1.250

but we might expect each OR estimate to be $e^{\text{estimate}}$, where the estimate is the corresponding parameter estimate, and, for example, $e^{-0.06} = 0.94$, not 1.
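
The mismatch can be traced with a short worked calculation. This is a sketch that assumes the default effect coding with White (the last level in sort order) as the level coded -1, so that its implied coefficient is minus the sum of the other five. The symbol $\hat\beta$ is notation introduced here for the parameter estimates; it is not SAS output.

```latex
\begin{aligned}
\hat\beta_{\text{White}}
  &= -(-0.0636 - 0.7568 + 0.1595 + 0.5650 + 0.1565) = -0.0606 \\
\text{OR}_{\text{AIAN vs White}}
  &= e^{\hat\beta_{\text{AIAN}} - \hat\beta_{\text{White}}}
   = e^{-0.0636 + 0.0606} = e^{-0.0030} \approx 0.997 \\
\text{OR}_{\text{AfrA vs White}}
  &= e^{\hat\beta_{\text{AfrA}} - \hat\beta_{\text{White}}}
   = e^{-0.7568 + 0.0606} = e^{-0.6962} \approx 0.498
\end{aligned}
```

These are close to the reported 1.000 and 0.500; the small gaps are presumably rounding in the tables above.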

Evidence of effect coding problems
The design matrix. With the default, the design matrix looks like this:

AIAN     1  0  0  0  0
AfrA     0  1  0  0  0
Asian    0  0  1  0  0
Latino   0  0  0  1  0
NHPI     0  0  0  0  1
White   -1 -1 -1 -1 -1

and each parameter estimate is the difference between that level and the average over all six levels (the unweighted mean of the level effects, on the log-odds scale).

On the other hand, with dummy (or reference) coding, it looks like

race
AIAN     1  0  0  0  0
AfrA     0  1  0  0  0
Asian    0  0  1  0  0
Latino   0  0  0  1  0
NHPI     0  0  0  0  1
White    0  0  0  0  0

and each parameter estimates the difference between that level and the reference group (in this case, White).
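
To check which design matrix a given run actually used, one option is to look at the "Class Level Information" table that PROC LOGISTIC prints for CLASS variables. The sketch below assumes the same today2 data set and that the ODS table name is ClassLevelInfo; it restricts the output to that table for each parameterization.

```sas
/* Print only the design variables built for each level of race. */
/* Assumption: the ODS table is named ClassLevelInfo.            */
ods select ClassLevelInfo;
proc logistic data = today2;
class race / param = effect;   /* the default, stated explicitly */
model disease(event = '1') = race;
run;

ods select ClassLevelInfo;
proc logistic data = today2;
class race / param = ref;      /* reference (dummy) coding */
model disease(event = '1') = race;
run;
```

Under effect coding the reference level's row should be all -1s; under reference coding it should be all 0s.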

Solution to the effect coding problem in PROC LOGISTIC
Use the param = reference option on the class statement:

proc logistic data = today2;
class race/param = ref;
model disease(event = '1') = race;
weight weight;
run;

For more information (and other possible parameterizations), see the SAS documentation for PROC LOGISTIC, in particular the section on CLASS variable parameterization under Details.
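
By default the reference level is the last one in sort order (White in this example). If a different reference group is wanted, the ref= option on the variable can be combined with param = ref. A minimal sketch; the choice of Latino here is purely hypothetical and only for illustration:

```sas
/* Reference coding with an explicitly chosen reference level. */
/* 'Latino' is an illustrative choice, not from the analysis.  */
proc logistic data = today2;
class race (ref = 'Latino') / param = ref;
model disease(event = '1') = race;
weight weight;
run;
```

Each parameter estimate then compares a level with the chosen reference, and the odds ratios equal the exponentiated estimates.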

15 Responses to “PROC LOGISTIC: Reference coding and effect coding”

  1. [...] This post was mentioned on Twitter by Samuel Allende, Peter Flom. Peter Flom said: I updated my website with #SAS PROC LOGISTIC effect coding and reference coding http://ow.ly/1xRul #statistics [...]

  2. Jeremy Miles says:

    Is your effect coding matrix right? I think you’ve missed the -1s.

  3. Peter Flom says:

    Thanks Jeremy, you are right. I will fix that

  4. Paul Swank says:

    I think you should explain the difference between the coding schemes. Effect (or trinary) coding compares each level to the grand mean instead of to a reference category and basically mirrors the ANOVA model.

  5. Peter Flom says:

    Paul: Good point. I will add that.

  9. Logistic says:

    What happens when you don’t specify Param=ref, and merely include the reference level in the CLASS statement? For example:
    CLASS race (Ref="White");

    This document: http://www.nesug.org/proceedings/nesug07/sa/sa11.pdf says that the reference level is then assigned -1s as coefficients. The odds ratios presented are all in reference to this level. Is the interpretation of these odds any different than the odds presented had the Param=ref option also been included in the CLASS statement?

  10. Peter Flom says:

    Hi Jyoti

    Yes, if you don’t include the PARAM = REF coding, you will get different results than if you do include it. That is

    proc logistic data = today2;
    class race (ref = "White");
    model disease(event = '1') = race;
    weight weight;
    run;

    gives different results for the parameter estimates than

    proc logistic data = today2;
    class race/param = ref;
    model disease(event = '1') = race;
    weight weight;
    run;

    but they give the same results for the odds ratios. In the former code, the ORs are not equal to exp(parameter).

    Peter

  11. Claire says:

    Hello,

    I really liked your post.

    I would like to ask you something. Running a logistic regression where the dependent variable can only take two values, does the result change if the event is ‘A’ or ‘B’ as in the efficiency of the model and the interpretation?

    Thank you very much,

    Claire

  12. Peter Flom says:

    Hi Claire

    The signs of the parameters change, and the OR of one is the inverse of the other, but the meaning, efficiency, etc. will be identical.

    Peter

  13. Aida says:

    Hi Peter,

    Your post is very useful.

    I have a query. I’m building a model using PROC LOGISTIC at the moment and I want to score each of the levels in every characteristic without any reference level. Is there any way that I could get the result I want using SAS?

    Thank you

    Aida

  14. Peter Flom says:

    You need to have SOME reference level for a categorical variable.

    Peter

  15. Daniel says:

    Hi Peter,
    Great post, it’s very helpful.

    I have a query. In a logistic model for data with more than one observation per individual, the SAS code I’m running is able to estimate the parameters for time-variant variables but fails for the time-invariant ones, which are removed, according to the SAS notes, “because of its redundancy”. Is it possible to obtain those parameters, and which option should I call in the PROC LOGISTIC statement? Thank you.

    Daniel
