How to analyze Likert type dependent variables

By , April 15, 2010 5:15 pm

Suppose your dependent variable (DV) is a Likert scale or something similar. That is, it’s some sort of rating, from 1 to 5 or 1 to 7 or some such. And suppose you want to regress that on several independent variables. What should you do?

There are three broad categories of regression models that might be applicable. A lot of people routinely use linear regression (often simply called regression). Others routinely say this is incorrect, and that you should use ordinal logistic regression. And yet others will do things such as multinomial logistic regression, or collapsing the DV into two categories, and then doing binary logistic. Which is right?

The short answer to this is to quote Sir David Cox:

There are no routine statistical questions, only questionable statistical routines

Let’s get more specific. Suppose you are a doctor studying back pain, and suppose your DV is response to a scale:

How much pain are you in on a typical day
1 – None
2 – Barely noticeable
3 – Moderate
4 – Severe
5 – Excruciating

and your independent variables are things like age, sex, injury status, time since injury and so on.

If one is strict about it, linear regression requires a continuous DV, and we do not have one, at least as we’ve measured it, although it could be argued that there is a latent underlying variable here that is continuous. But you’d be hard pressed to prove that the difference between “none” and “barely noticeable” is the same as that between (say) “moderate” and “severe”. Technically, if you follow Stevens’s categories of nominal, ordinal, interval, ratio, your DV is ordinal, and should be analyzed with some form of ordinal logistic regression.

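But the most common type (by far) of ordinal logistic regression is the proportional odds model, which assumes proportional odds: each independent variable has the same effect on the odds of being at or below every cutpoint of the scale (this is also called the parallel lines assumption). That assumption might be violated, in which case you might want to use multinomial logistic.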
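To make the proportional odds (parallel lines) assumption concrete, here is a minimal sketch in Python. The cutpoints and linear predictors are made-up numbers, and the sign convention P(Y <= j) = logistic(theta_j - eta) is an assumption; statistical packages differ on the sign, so check your software's documentation.

```python
import math

def logistic(z):
    """Standard logistic function, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(thresholds, eta):
    """Category probabilities under a proportional odds model.

    thresholds: increasing cutpoints theta_1 < ... < theta_{K-1}
    eta: linear predictor for one person (the same slopes at every cutpoint)
    Assumed parameterization: P(Y <= j) = logistic(theta_j - eta).
    """
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        previous = c
    return probs

def cumulative_odds(probs, j):
    """Odds of being in category j or lower (j is 1-based)."""
    p = sum(probs[:j])
    return p / (1.0 - p)

# Hypothetical cutpoints for the five pain categories, and two hypothetical patients
thresholds = [-2.0, -0.5, 1.0, 2.5]
low_risk = category_probs(thresholds, eta=0.0)
high_risk = category_probs(thresholds, eta=1.5)

# Under proportional odds, this ratio is identical at every cutpoint
ratios = [cumulative_odds(low_risk, j) / cumulative_odds(high_risk, j)
          for j in range(1, 5)]
print([round(r, 3) for r in ratios])
```

If the ratios estimated from real data differ a lot from one cutpoint to the next, that is the kind of violation the assumption test is meant to detect.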

Since those are relatively unusual methods, some people just collapse the categories into (say) “severe” or “excruciating” vs. anything less than that.
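As a small illustration of what collapsing does, the sketch below recodes some made-up responses on the pain scale into two groups. The responses and the cutoff of 4 are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical responses on the 1-5 pain scale
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

def collapse(score, cutoff=4):
    """Return 1 for 'severe' or 'excruciating' (score >= cutoff), else 0."""
    return 1 if score >= cutoff else 0

binary = [collapse(r) for r in responses]

original_counts = Counter(responses)   # five distinct levels
collapsed_counts = Counter(binary)     # only two levels remain

print(sorted(original_counts.items()))
print(sorted(collapsed_counts.items()))
```

After collapsing, someone who answered "none" and someone who answered "moderate" are indistinguishable, as are "severe" and "excruciating".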

Which is right?

The great advantages of linear regression are its ease of interpretation and its familiarity. But it might be wrong.
Ordinal logistic is more likely to be correct, but is less known and harder to understand.
Multinomial logistic is even harder to understand, and is a very complex model, with many parameters to estimate.
Collapsing the variable will only very rarely be correct. It throws away information, and that’s rarely a good thing to do.
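One way to see the difference in complexity is to count regression parameters. The sketch below assumes a 5-category DV and four predictors that each enter as a single column (age, sex, injury status, time since injury); those numbers are only for illustration.

```python
def linear_param_count(n_predictors):
    """Linear regression: one intercept plus one slope per predictor
    (the error variance is not counted here)."""
    return n_predictors + 1

def ordinal_param_count(n_categories, n_predictors):
    """Proportional odds model: K-1 cutpoints plus one slope per predictor."""
    return (n_categories - 1) + n_predictors

def multinomial_param_count(n_categories, n_predictors):
    """Multinomial logistic: an intercept and a full set of slopes
    for each of the K-1 non-reference categories."""
    return (n_categories - 1) * (n_predictors + 1)

K, p = 5, 4
print(linear_param_count(p))          # 5
print(ordinal_param_count(K, p))      # 8
print(multinomial_param_count(K, p))  # 20
```

The multinomial count grows with both the number of categories and the number of predictors, which is why reporting it gets unwieldy.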

So, here’s what I recommend:
Do ordinal logistic regression and test the assumptions. Then if the assumptions are met, also do linear regression and compare the results by making a scatterplot of one set of predicted values vs. the other. If they are very similar (YOU decide. Statistical analysis requires thought and judgment) then go with linear regression. If the assumptions are NOT met, then also do multinomial logistic regression, and compare those two sets of results, opting for the simpler ordinal model if results are very similar.
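Here is a rough sketch of the comparison step, assuming both models have already been fitted in whatever package you use. Turning the ordinal model's category probabilities into an expected score (categories scored 1 to K) is one assumed way of putting the two sets of predictions on the same scale; the numbers are invented, and the summary statistics are a supplement to the scatterplot, not a replacement for looking at it.

```python
import math

def expected_score(probs):
    """Mean category implied by an ordinal model, scoring categories 1..K."""
    return sum((k + 1) * p for k, p in enumerate(probs))

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def compare_predictions(linear_pred, ordinal_probs):
    """Return ordinal expected scores, their correlation with the linear
    predictions, and the mean absolute difference between the two."""
    ordinal_pred = [expected_score(p) for p in ordinal_probs]
    r = pearson(linear_pred, ordinal_pred)
    mad = sum(abs(a - b) for a, b in zip(linear_pred, ordinal_pred)) / len(linear_pred)
    return ordinal_pred, r, mad

# Hypothetical output from the two fitted models for three people
linear_pred = [1.8, 3.1, 4.0]
ordinal_probs = [
    [0.6, 0.2, 0.1, 0.1, 0.0],
    [0.1, 0.2, 0.4, 0.2, 0.1],
    [0.0, 0.1, 0.1, 0.3, 0.5],
]

ordinal_pred, r, mad = compare_predictions(linear_pred, ordinal_probs)
print(ordinal_pred, round(r, 3), round(mad, 3))
```

How close is close enough is still a judgment call; there is no cutoff on the correlation or the mean difference that makes the decision for you.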

67 Responses to “How to analyze Likert type dependent variables”

  1. Shawn L says:

    Hi Peter, thanks for the information. I was wondering how to go about regressing likert variables with multinomial logistic as the ordinal regression’s parallel lines assumption is being violated in my data. Despite the quote, is there a standard way to do this testing/reporting or should I just get odds ratios for every single comparison?

  2. Peter Flom says:

    Hi Shawn
    Multinomial logistic regression is a pain to report on – you more or less have to report each OR. But are you sure the assumptions are severely violated? Do the two models make similar predictions?

    Peter

  3. Shawn L says:

    Thanks for the response Peter,
    It looks like the linear and ordinal regressions seem to make reasonably similar predictions for all three of my response variables:

    http://penguooo.com/scattertest.jpg
    http://penguooo.com/scattertest2.jpg
    http://penguooo.com/scattertest3.jpg

    And the predictors seem to be mostly the same so I should do LR in this case. That saves me a big hassle (hopefully). Since my research is for an undergraduate thesis, I’ll have a lot of space to report diagnostics. Should/how can I cite this procedure (reference this blog post?)?

    Best wishes,
    Shawn

  4. Ramnath Vaidyanathan says:

    Hi Shawn,

    I use ordinal regressions very extensively in my work. Violations of the parallel lines assumption are extremely common and there are basically two ways to overcome this, by using advanced ordinal regression models of which there are two types:

    1. Generalized Threshold Model
    2. Heteroskedastic Ordinal Model

    Both models provide similar results, but the interpretations are different. If you would like to know how to apply these models, here is a link to one of my papers:

    opim.wharton.upenn.edu/~senthilv/papers/seat_value_J.pdf

    Let me know if you find it useful and if you have additional questions.

    Best,
    Ramnath

  5. Peter Flom says:

    Shawn
    Referencing the blog post would be fine

    thanks

    Peter

  6. Peter Flom says:

    Thanks for this Ramnath!

    Peter

  7. Shawn L says:

    Hi Ramnath,
    Thank you very much for sharing your expertise. I’m not quite skilled enough that I feel confident to program and apply either advanced ordinal model within the next two weeks (which is when my thesis is technically due) but I’ll certainly reference your paper in the discussion should I not be able to implement either.

    My initial query would just be the interpretation of my data in reference to the assumptions of the models. My response variables are assessments of participants’ moods and my predictors are fourier coefficients which describe participants’ walking motions.

    So, it turns out that the parallel lines assumption is NOT broken, p=.264, if I only evaluate males; but it is when I evaluate females, p<.001. Excluding improbable random error, according to what I gather, this means one of three possibilities:

    1. Females are more likely to misrepresent their mood.
    2. Females' walks have nothing to do with their moods.
    3. Females are more varied in mood reporting thresholds while men have similar thresholds.
    4. Something about women who walk a certain way disposes them to a particular range of mood rating.

  8. Shawn L says:

    “Excluding improbable random error, according to what I gather, this means one of three possibilities:”

    Sorry, that should be four possibilities. I tacked one possibility on at the end and forgot to alter the preceding statement haha.

    Best wishes to all,
    Shawn

  9. Shawn L says:

    Oh by the way, when mood is linearly regressed onto the fourier coefficients, a stepwise model accounts for 56.3% of the variance in mood for males while it only accounts for 8.3% of the variance for females.

  10. Tal Galili says:

    Thank you for the post Peter.

    I recently wrote a post about how to present A Correlation scatter-plot matrix for ordered-categorical data (with code on how to do it with R), thought it might be interesting for you and your readers:
    http://www.r-statistics.com/2010/04/correlation-scatter-plot-matrix-for-ordered-categorical-data/

    Cheers,
    Tal

  11. found your site on del.icio.us today and really liked it.. i bookmarked it and will be back to check it out some more later

  12. MarkSpizer says:

    great post as usual!

  13. Pretty nice post. I just stumbled upon your blog and wanted to say that I have really enjoyed browsing your blog posts. In any case I’ll be subscribing to your feed and I hope you write again soon!

  14. dejja says:

    hello intellectuals! i’m in a position to be crazy of identifying the value of the dependent variable on the Likert type of questionnaire to apply for SPSS analysis. eg. i want to know the value of company performance as expressed by several independent variables. So what should i do? help me!

  15. Peter Flom says:

    Ummm…. that is what the article is about. What is your question exactly?

  16. mic says:

    hi, my question is somewhat like dejja’s. I created a Likert scale survey (Strongly Agree to Strongly Disagree) to analyze the attitudes and actions of clients looking for a new home. I asked several opinion (attitude) questions and questions about things that they actually do (actions, e.g. look for a house on their own, contact an agent 1st, talk to a lender 1st, etc.). In this whole process I’m thinking that attitude and action are my dependent variables; however, those terms are not used specifically in the questions, and to do a regression you have to provide a dependent variable. Just what would that be? I am using SPSS, and I’m stuck.

  17. Peter Flom says:

    Hi Mic
    You would probably sum the action questions and call that action, and sum the attitude questions and call that attitude. If you wanted to do something fancier, you could do factor analysis first

    Peter
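The summing described in the reply above might look like the following sketch. The item names and responses are hypothetical, and it assumes every item is coded in the same direction (1 = Strongly Disagree to 5 = Strongly Agree); reverse-worded items would need recoding before being summed.

```python
# Hypothetical survey items; names are placeholders, not from any real survey
ATTITUDE_ITEMS = ["att1", "att2", "att3"]
ACTION_ITEMS = ["act1", "act2"]

# Hypothetical responses, one dict per respondent
respondents = [
    {"att1": 4, "att2": 5, "att3": 3, "act1": 2, "act2": 1},
    {"att1": 2, "att2": 2, "att3": 1, "act1": 5, "act2": 4},
]

def scale_score(row, items):
    """Sum one respondent's answers over the items that make up a scale."""
    return sum(row[item] for item in items)

scores = [
    {
        "attitude": scale_score(row, ATTITUDE_ITEMS),
        "action": scale_score(row, ACTION_ITEMS),
    }
    for row in respondents
]
print(scores)
```

Each summed score can then serve as a dependent (or independent) variable in the regression.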

  18. kin says:

    Dear peter,
    I’m confused about the stat. analysis when the dependent variable (consumers’ overall housing preference) is on a Likert scale and all five independent variables are on a Likert scale

  19. Peter Flom says:

    If the DV has only a few levels, you probably want ordinal logistic regression. The independent variables might be treated as continuous or categorical variables; probably continuous

  20. Rachel says:

    Can anyone recommend literature that reports ordinal regression results? In psychology, preferably. Thanks!

  21. Derrick says:

    Dear professionals,
    I’m working on a research project in which I have used a survey that consists of a dependent variable on a Likert scale. All my independent variables are also on Likert scales. My research question is to identify whether the summer carnival has any influence on the image of Rotterdam. How could I best do the analysis? Which model should I use?

  22. Peter Flom says:

    Hi Derrick
    If your dependent variable is from a single Likert type scale, you probably want ordinal logistic regression.

  23. Derrick says:

    Yes, I have used the 5-point Likert scale. When I run the normal regression I find that my data are not normally distributed. My supervisor told me to use the probit model (regressions for categorical and limited dependent variables) in Stata, because this model gives the best results. But I don’t know how to work with it. Even the manual does not go that far. In the book by Long & Freese they write about the model, but I could not find how to run it and read the output. I have searched the internet for it, but without result. Help!

  24. Peter Flom says:

    I don’t use Stata, so I can’t help. Sorry.

  25. Derrick says:

    Do you know someone who you can recommend that can help me?

  26. Peter Flom says:

    Sorry but I don’t. There are probably Stata user groups out there on the net, though

  27. Sadie says:

    I am analyzing survey data using SPSS. All of my data is categorical. I analyzed some of my data using a one-way ANOVA and have done further ordinal regression analysis using the logit link function. The results from the regression analysis supported the results of the ANOVA… my question relates more to reporting the results and whether I should report one over the other or both.

  28. Peter Flom says:

    ANOVA would be used if the dependent variable is continuous. Ordinal regression would be used if the DV is ordinal.

  29. Dan Aniz says:

    Hello. Good day! Please, I am doing research on factors (7 factors, to be precise) which are likely to impact upon mobile banking, and it’s on a 5-point Likert scale (1 most important …. 5 least important). Please, how do I deduce the DV, and which regression model should I adopt? Also, do I need to use a parametric test for each of the factors? Thanks

  30. Peter Flom says:

    You probably want ordinal logistic regression

  31. Dan Aniz says:

    Hello. So please how do I deduce the DV if I were to make use of ordinal logistic regression? Thanks

  32. Peter Flom says:

    You don’t “deduce” the DV. You know it before you start the analysis. In this case, it’s something about mobile banking, I think, but that’s something you would have to tell me, I can’t tell you. The DV is the variable you think “depends” on the other variables.

  33. Dan Aniz says:

    Good day! Thanks for your responses so far! Perhaps I should start by saying that I am looking at these mobile banking adoption factors (Perceived Trust, Perceived Risk, Technology Failure, Perceived Usefulness, Perceived Cost, Customer Service, and Convenience, with ranking choices of 1 = most important to 5 = least important). I am interested in formulating hypotheses about them and checking how significant each ranked Likert item (adoption factor, in this instance) is to the usage of mobile banking. What is the necessary test to employ for the hypothesis? Also, do I still need to build a regression model based on these factors (ranked items) given age group as a DV, and how, please? Thanks

  34. Ramesh Thapa says:

    Hello Everyone,

    Thanks for sharing your wisdom. I am also writing a research paper to assess university research productivity and have similar issues to those discussed above. I am confused over the selection of an appropriate model for the type of data I have. I have dependent variables on a 5-point Likert scale (strongly disagree to strongly agree) as well as binary ones (yes and no). My questions are below:

    1. I have 2 binary (yes and no) dependent variables and 2 binary (yes and No) + 25 ordinal (5-point likert, strongly disagree-strongly agree)predictors. In this case is running binary logistic regression appropriate? which other regression is possible.

    2. I have 4 ordinal (5-point likert, strongly disagree-strongly agree)dependent variables and 2 binary (yes and No) + 25 ordinal (5-point likert, strongly disagree-strongly agree)predictors. Is running ordinal regression appropriate for this. Which other regression would be more appropriate for my data.

  35. Peter Flom says:

    Yes, it probably is.

  36. Ramesh Thapa says:

    Hello Peter Flom; Thank you so much for your wisdom. If you look at my variables above, looks like I would have to run both binary logistic regression for binary dependent variables and ordinal logistic regression for ordinal dependent variables.

    Is it possible to run my two binary type dependent variables as ordinal logistic regression, or is there any method where I can use both types of variables in one model. I am asking this because looks like it is going to be painful when I have to use two models (binary logistic and ordinal logistic regression) in one paper. Please help

  37. Peter Flom says:

    Yes. if you have dichotomous/binary dependent variables you would need (regular) logistic regression.

  38. Bereket.Y says:

    Can I apply a count data model when the outcome of the dependent variable is 0, 1, 2, 3, 4? The dependent variable is the age-grade gap of primary school children.

  39. Peter Flom says:

    Hi Bereket

    If your dependent variable is some sort of grade, you may want ordinal logistic regression.
    Peter

  40. Rose H. says:

    Hi! I am new in stat but I need it for my thesis. I hope you can help me. I am looking into the effect of ethnic media use on the construction of ethnic identity. My independent variable is dichotomous: use or non-use of ethnic media and my dependent variable is a 5-point Likert Scale. What statistical test can I use for this? Thanks! :)

  41. Peter Flom says:

    You probably want ordinal logistic regression and you almost surely want more than one independent variable

    Peter

  42. Ramesh Thapa says:

    Dear Peter Flom,

    I want to ask what is the way to find the overall prediction accuracy in the classification table of the ordinal regression output. For the percentage accuracy we look at the diagonal elements but I wanted to find out the overall accuracy.

  43. Peter Flom says:

    What do you mean by “overall accuracy”? If I read that phrase I would assume that it meant “percentage accuracy”.

  44. Ramesh Thapa says:

    Yes, that is actually to find the total percentage accuracy?
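For what it is worth, total percentage accuracy from a classification table is usually computed as the sum of the diagonal cells divided by the total number of cases. A minimal sketch with an invented 3-category table (rows are observed categories, columns are predicted categories):

```python
def overall_accuracy(table):
    """Share of cases on the diagonal of a square classification table.

    table[i][j] = number of cases observed in category i and predicted in j.
    """
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total

# Hypothetical classification table, 100 cases in all
table = [
    [20, 5, 0],
    [6, 30, 4],
    [1, 9, 25],
]

accuracy = overall_accuracy(table)
print(accuracy)  # 0.75
```

Note that this treats every misclassification the same, whether the prediction was off by one category or by several.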

  45. Kancha says:

    Hello,

    I don’t find much literature on interpreting ordinal regression coefficients. How do we usually interpret the negative and positive coefficients in ordinal regression output? What does a negative sign indicate?

  46. Peter Flom says:

    Hi Kancha – Too long to really answer here; see e.g. my paper on ordinal regression http://www.nesug.org/Proceedings/nesug10/sa/sa03.pdf

  47. Naveen says:

    Hi Peter, Thanks for sharing your wisdom. Post attends the topic meticulously.

  48. Naveen says:

    Peter, if my data includes 5 DVs measured on a 5-point Likert scale, 5 IVs and a few dummy variables, then which type of regression will suit this data? Is a structural equation model the only option available?

  49. Peter Flom says:

    Hi Naveen

    I am not a fan of SEM – I think it usually builds a castle on a weak foundation. But I’d need to know a lot more about your study, and it sounds too complex to give a full answer here.

  50. Naveen says:

    Hi Peter
    Please just advise: if I have to tackle 5 DVs measured on a Likert scale, do I need 5 different regression equations in ordinal logistic regression?
