How to analyze Likert type dependent variables

By , April 15, 2010 5:15 pm

Suppose your dependent variable (DV) is a Likert scale or something similar. That is, it’s some sort of rating, from 1 to 5 or 1 to 7 or some such. And suppose you want to regress that on several independent variables. What should you do?

There are three broad categories of regression models that might be applicable. A lot of people routinely use linear regression (often simply called regression). Others routinely say this is incorrect, and that you should use ordinal logistic regression. And yet others will do things such as multinomial logistic regression, or collapsing the DV into two categories, and then doing binary logistic. Which is right?

The short answer is to quote Sir David Cox:

“There are no routine statistical questions, only questionable statistical routines.”

Let’s get more specific. Suppose you are a doctor studying back pain, and suppose your DV is response to a scale:

How much pain are you in on a typical day?
1 – None
2 – Barely noticeable
3 – Moderate
4 – Severe
5 – Excruciating

and your independent variables are things like age, sex, injury status, time since injury and so on.

If one is strict about it, linear regression requires a continuous DV, and we do not have one, at least as we’ve measured it, although it could be argued that there is a continuous latent variable underlying the responses. But you’d be hard pressed to prove that the difference between “none” and “barely noticeable” is the same as that between (say) “moderate” and “severe”. Technically, if you follow Stevens’s levels of measurement (nominal, ordinal, interval, ratio), your DV is ordinal and should be analyzed with some form of ordinal logistic regression.

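But the most common type (by far) of ordinal logistic regression is the proportional odds model, which assumes proportional odds: each independent variable has the same effect on the odds at every cut point of the scale (this is also called the parallel lines assumption). That assumption might be violated, in which case you might want to use multinomial logistic regression.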
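To make the proportional odds assumption concrete, here is a minimal sketch in Python. The cut points and coefficients are hypothetical values picked for illustration, not estimates from any data, and the parameterization logit P(Y <= j) = theta_j - x·beta is one common convention (software packages differ in sign). The point is that the odds ratio comparing two patients is identical at every cut point of the scale.

```python
import math

def linear_predictor(x, beta):
    """Weighted sum of the predictors (no intercept; the cut points play that role)."""
    return sum(b * v for b, v in zip(beta, x))

def cumulative_odds(x, beta, thresholds):
    """Odds of Y <= j at each cut point j, using logit P(Y <= j) = theta_j - x.beta."""
    eta = linear_predictor(x, beta)
    return [math.exp(theta - eta) for theta in thresholds]

def category_probs(x, beta, thresholds):
    """Probabilities of categories 1..K implied by the cumulative odds."""
    cum = [o / (1.0 + o) for o in cumulative_odds(x, beta, thresholds)] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical values, chosen only for illustration (not estimates from data).
thresholds = [-1.0, 0.5, 2.0, 3.5]   # four cut points for the 5-point pain scale
beta = [0.03, 0.8]                   # age in years, injured (0 = no, 1 = yes)

uninjured = [40, 0]
injured = [40, 1]

probs_uninjured = category_probs(uninjured, beta, thresholds)
probs_injured = category_probs(injured, beta, thresholds)

# Under proportional odds, this ratio is the same at every cut point.
odds_ratios = [
    a / b
    for a, b in zip(
        cumulative_odds(uninjured, beta, thresholds),
        cumulative_odds(injured, beta, thresholds),
    )
]

print("P(pain = 1..5), uninjured:", [round(p, 3) for p in probs_uninjured])
print("P(pain = 1..5), injured:  ", [round(p, 3) for p in probs_injured])
print("odds ratio at each cut point:", [round(r, 3) for r in odds_ratios])
```

If the data suggest that the odds ratio differs noticeably from one cut point to the next, the assumption is in doubt.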

Since those are relatively unusual methods, some people just collapse the categories into (say) “severe” or “excruciating” vs. anything less than that.

Which is right?

The great advantages of linear regression are its ease of interpretation and its familiarity. But it might be wrong.
Ordinal logistic is more likely to be correct, but is less known and harder to understand.
Multinomial logistic is even harder to understand, and is a very complex model, with many parameters to estimate.
Collapsing the variable will only very rarely be correct. It throws away information, and that’s rarely a good thing to do.
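
As a rough illustration of the difference in complexity, the sketch below counts the regression parameters each model estimates. It assumes the usual textbook forms of the three models and that each independent variable enters as a single numeric column; the function names are made up for this example.

```python
def n_params_linear(n_predictors):
    """Linear regression: one intercept plus one slope per predictor
    (the residual variance is not counted here)."""
    return n_predictors + 1

def n_params_ordinal(n_categories, n_predictors):
    """Proportional odds model: one cut point per category boundary
    plus one slope per predictor, shared across all cut points."""
    return (n_categories - 1) + n_predictors

def n_params_multinomial(n_categories, n_predictors):
    """Multinomial logit: a separate intercept and set of slopes
    for every category except the reference category."""
    return (n_categories - 1) * (n_predictors + 1)

# The back pain example: a 5-point scale and four predictors
# (age, sex, injury status, time since injury), each assumed to be one column.
k, p = 5, 4
print(n_params_linear(p), n_params_ordinal(k, p), n_params_multinomial(k, p))
# prints: 5 8 20
```

The gap widens quickly as the number of categories or predictors grows, which is part of why the multinomial model is harder to estimate and to report.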

So, here’s what I recommend:
Do ordinal logistic regression and test the assumptions. If the assumptions are met, also do linear regression and compare the results by making a scatterplot of one set of predicted values vs. the other. If they are very similar (YOU decide; statistical analysis requires thought and judgment), go with linear regression. If the assumptions are NOT met, also do multinomial logistic regression and compare those two sets of results, opting for the simpler ordinal model if the results are very similar.
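
One way to set up that comparison is sketched below in Python. It assumes you have already fit both models in your software of choice and exported the predictions; the numbers here are made up for four hypothetical patients. It also assumes that the “predicted value” from the ordinal model is taken to be the expected score (the probability-weighted mean category), which is one reasonable choice among several. The correlation and largest gap are only numeric companions to the scatterplot, not a substitute for looking at it.

```python
import math

def expected_score(probs):
    """Mean category (1..K) implied by predicted category probabilities."""
    return sum(k * p for k, p in enumerate(probs, start=1))

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up predictions for four hypothetical patients (illustration only).
# Each row: predicted P(pain = 1), ..., P(pain = 5) from the ordinal model.
ordinal_probs = [
    [0.50, 0.30, 0.15, 0.04, 0.01],
    [0.20, 0.35, 0.30, 0.10, 0.05],
    [0.05, 0.15, 0.40, 0.30, 0.10],
    [0.02, 0.08, 0.20, 0.40, 0.30],
]
# Predicted pain scores for the same patients from the linear model.
linear_preds = [1.8, 2.4, 3.2, 3.9]

ordinal_preds = [expected_score(p) for p in ordinal_probs]
r = pearson(ordinal_preds, linear_preds)
max_gap = max(abs(o - l) for o, l in zip(ordinal_preds, linear_preds))

print("ordinal expected scores:", [round(v, 2) for v in ordinal_preds])
print("correlation with linear predictions:", round(r, 3))
print("largest absolute difference:", round(max_gap, 2))
```

How close is close enough remains a judgment call that depends on what the predictions will be used for.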

21 Responses to “How to analyze Likert type dependent variables”

  1. Shawn L says:

    Hi Peter, thanks for the information. I was wondering how to go about regressing likert variables with multinomial logistic as the ordinal regression’s parallel lines assumption is being violated in my data. Despite the quote, is there a standard way to do this testing/reporting or should I just get odds ratios for every single comparison?

  2. Peter Flom says:

    Hi Shawn
    Multinomial logistic regression is a pain to report on – you more or less have to report each OR. But are you sure the assumptions are severely violated? Do the two models make similar predictions?

    Peter

  3. Shawn L says:

    Thanks for the response Peter,
    It looks like the linear and ordinal regressions seem to make reasonably similar predictions for all three of my response variables:

    http://penguooo.com/scattertest.jpg
    http://penguooo.com/scattertest2.jpg
    http://penguooo.com/scattertest3.jpg

    And the predictors seem to be mostly the same, so I should do LR in this case. That saves me a big hassle (hopefully). Since my research is for an undergraduate thesis, I’ll have a lot of space to report diagnostics. Should I cite this procedure, and if so, how (reference this blog post?)?

    Best wishes,
    Shawn

  4. Ramnath Vaidyanathan says:

    Hi Shawn,

    I use ordinal regressions very extensively in my work. Violations of the parallel lines assumption are extremely common, and there are basically two ways to overcome them, both using advanced ordinal regression models:

    1. Generalized Threshold Model
    2. Heteroskedastic Ordinal Model

    Both models provide similar results, but the interpretations are different. If you would like to know how to apply these models, here is a link to one of my papers:

    opim.wharton.upenn.edu/~senthilv/papers/seat_value_J.pdf

    Let me know if you find it useful and if you have additional questions.

    Best,
    Ramnath

  5. Peter Flom says:

    Shawn
    Referencing the blog post would be fine

    thanks

    Peter

  6. Peter Flom says:

    Thanks for this Ramnath!

    Peter

  7. Shawn L says:

    Hi Ramnath,
    Thank you very much for sharing your expertise. I’m not quite skilled enough that I feel confident to program and apply either advanced ordinal model within the next two weeks (which is when my thesis is technically due) but I’ll certainly reference your paper in the discussion should I not be able to implement either.

    My initial query would just be the interpretation of my data in reference to the assumptions of the models. My response variables are assessments of participants’ moods and my predictors are Fourier coefficients that describe participants’ walking motions.

    So, it turns out that the parallel lines assumption is NOT violated (p = .264) if I only evaluate males, but it is when I evaluate females (p < .001). Excluding improbable random error, according to what I gather, this means one of three possibilities:

    1. Females are more likely to misrepresent their mood.
    2. Females' walks have nothing to do with their moods.
    3. Females are more varied in mood reporting thresholds while men have similar thresholds.
    4. Something about women who walk a certain way disposes them to a particular range of mood rating.

  8. Shawn L says:

    “Excluding improbable random error, according to what I gather, this means one of three possibilities:”

    Sorry, that should be four possibilities. I tacked one possibility on at the end and forgot to alter the preceding statement haha.

    Best wishes to all,
    Shawn

  9. Shawn L says:

    Oh by the way, when mood is linearly regressed onto the Fourier coefficients, a stepwise model accounts for 56.3% of the variance in mood for males, while it only accounts for 8.3% of the variance for females.

  10. Tal Galili says:

    Thank you for the post Peter.

    I recently wrote a post about how to present A Correlation scatter-plot matrix for ordered-categorical data (with code on how to do it with R), thought it might be interesting for you and your readers:
    http://www.r-statistics.com/2010/04/correlation-scatter-plot-matrix-for-ordered-categorical-data/

    Cheers,
    Tal

  11. Keep posting stuff like this i really like it

  12. found your site on del.icio.us today and really liked it.. i bookmarked it and will be back to check it out some more later

  13. MarkSpizer says:

    great post as usual!

  14. Pretty nice post. I just stumbled upon your blog and wanted to say that I have really enjoyed browsing your blog posts. In any case I’ll be subscribing to your feed and I hope you write again soon!

  15. dejja says:

    hello intellectuals! i’m in a position to be crazy of identifying the value of the dependent variable on the Likert type of questionnaire to apply for SPSS analysis. eg. i want to know the value of company performance as expressed by several independent variables. So what should i do? help me!

  16. Peter Flom says:

    Ummm…. that is what the article is about. What is your question exactly?

  17. mic says:

    hi, my question is somewhat like dejja’s. I created a Likert scale survey (Strongly Agree to Strongly Disagree) to analyze the attitudes and actions of clients looking for a new home. I asked several opinion (attitude) questions and several questions about things that they actually do (actions, e.g. look for a house on their own, contact an agent first, talk to a lender first, etc.). In this whole process I’m thinking that attitude and action are my dependent variables; however, those terms are not used specifically in the questions, and to do a regression you have to provide a dependent variable. Just what would that be? I am using SPSS, and I’m stuck.

  18. Peter Flom says:

    Hi Mic
    You would probably sum the action questions and call that action, and sum the attitude questions and call that attitude. If you wanted to do something fancier, you could do factor analysis first

    Peter

  19. kin says:

    Dear peter,
    I’m confused about the statistical analysis when the dependent variable (consumers’ overall housing preference) is on a Likert scale and all five independent variables are also on Likert scales.

  20. Peter Flom says:

    If the DV has only a few levels, you probably want ordinal logistic regression. The independent variables might be treated as continuous or categorical variables; probably continuous

  21. Rachel says:

    Can anyone recommend literature that reports ordinal regression results? In psychology, preferably. Thanks!
