How to analyze Likert type dependent variables
Suppose your dependent variable (DV) is a Likert scale or something similar. That is, it’s some sort of rating, from 1 to 5 or 1 to 7 or some such. And suppose you want to regress that on several independent variables. What should you do?
There are three broad categories of regression models that might be applicable. A lot of people routinely use linear regression (often simply called regression). Others routinely say this is incorrect, and that you should use ordinal logistic regression. And yet others will do things such as multinomial logistic regression, or collapsing the DV into two categories, and then doing binary logistic. Which is right?
The short answer is to quote Sir David Cox:
“There are no routine statistical questions, only questionable statistical routines.”
Let’s get more specific. Suppose you are a doctor studying back pain, and suppose your DV is response to a scale:
How much pain are you in on a typical day?
1 – None
2 – Barely noticeable
3 – Moderate
4 – Severe
5 – Excruciating
and your independent variables are things like age, sex, injury status, time since injury and so on.
If one is strict about it, linear regression requires a continuous DV, and we do not have one, at least as we’ve measured it, although it could be argued that there is a latent underlying variable here that is continuous. But you’d be hard pressed to prove that the difference between “none” and “barely noticeable” is the same as that between (say) “moderate” and “severe”. Technically, if you follow Stevens’s categories of nominal, ordinal, interval and ratio, your DV is ordinal, and should be analyzed with some form of ordinal logistic regression.
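But the most common type (by far) of ordinal logistic regression is the proportional odds model, which assumes proportional odds (also called the parallel lines assumption): the effect of each independent variable is the same at every cut point of the DV. That assumption might be violated, in which case you might want to use multinomial logistic regression.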
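To make the assumption concrete, here is a minimal sketch of the usual cumulative logit (proportional odds) form, in which P(Y <= j) = logistic(threshold_j - eta). The thresholds and the slope for injury status are made-up numbers for illustration, not estimates from any data, and some software uses the opposite sign convention.

```python
import math

def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(eta, thresholds):
    """Category probabilities under a cumulative logit model.

    P(Y <= j) = logistic(threshold_j - eta); each category's probability
    is the difference between successive cumulative probabilities.
    """
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs

def cumulative_odds(eta, thresholds):
    """Odds of Y <= j versus Y > j at each cut point."""
    return [math.exp(t - eta) for t in thresholds]

# Hypothetical values for a 5-point pain scale: four cut points, one slope.
THRESHOLDS = [-1.0, 0.5, 2.0, 3.5]
BETA_INJURED = 1.2  # made-up log odds ratio, injured vs. not injured

probs_not_injured = category_probs(0.0, THRESHOLDS)
probs_injured = category_probs(BETA_INJURED, THRESHOLDS)

# Under proportional odds this ratio is identical at every cut point.
odds_ratios = [
    a / b
    for a, b in zip(cumulative_odds(0.0, THRESHOLDS),
                    cumulative_odds(BETA_INJURED, THRESHOLDS))
]

print([round(p, 3) for p in probs_not_injured])
print([round(p, 3) for p in probs_injured])
print([round(r, 3) for r in odds_ratios])
```

Every entry of `odds_ratios` equals exp(1.2). If the data seem to need a different ratio at different cut points, the assumption is in doubt.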
Since those are relatively unusual methods, some people just collapse the categories into (say) “severe” or “excruciating” vs. anything less than that.
Which is right?
The great advantages of linear regression are its ease of interpretation and its familiarity. But it might be wrong.
Ordinal logistic is more likely to be correct, but is less known and harder to understand.
Multinomial logistic is even harder to understand, and is a very complex model, with many parameters to estimate.
Collapsing the variable will only very rarely be correct. It throws away information, and that’s rarely a good thing to do.
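As a rough illustration of what collapsing discards, the sketch below dichotomizes some made-up ratings at “severe or worse” and compares the Shannon entropy of the responses before and after. Entropy is only one possible way to quantify the loss, and the data and cutoff are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy (in bits) of the observed distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def collapse(rating, cutoff=4):
    """Dichotomize: 1 if the rating is at or above the cutoff, else 0."""
    return 1 if rating >= cutoff else 0

# Made-up responses on the 1-5 pain scale.
ratings = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5]
collapsed = [collapse(r) for r in ratings]

print(collapsed)
print(round(entropy_bits(ratings), 2), round(entropy_bits(collapsed), 2))
```

After collapsing, “none” and “moderate” are indistinguishable, as are “severe” and “excruciating”, so any relationship that lives within those groups can no longer be detected.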
So, here’s what I recommend:
Do ordinal logistic regression and test the assumptions. Then, if the assumptions are met, also do linear regression and compare the results by making a scatterplot of one set of predicted values vs. the other. If they are very similar (YOU decide. Statistical analysis requires thought and judgment) then go with linear regression. If the assumptions are NOT met, then also do multinomial logistic regression, and compare those two sets of results, opting for the simpler ordinal model if results are very similar.
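One way to put the two sets of predictions on the same footing is sketched below. It assumes you have already fit both models in your own software; the predicted probabilities and predicted values here are made up. Turning the ordinal model's category probabilities into an expected rating is my choice for the illustration, and it treats the categories as equally spaced, so it is a convenience for comparison rather than part of the ordinal model. The numeric summaries supplement the scatterplot and do not replace it.

```python
import math

CATEGORIES = [1, 2, 3, 4, 5]

def expected_rating(probs, categories=CATEGORIES):
    """Probability-weighted mean rating from one row of category probabilities."""
    return sum(p * c for p, c in zip(probs, categories))

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predicted probabilities from an ordinal model, one row per person.
ordinal_probs = [
    [0.50, 0.30, 0.15, 0.04, 0.01],
    [0.20, 0.35, 0.30, 0.10, 0.05],
    [0.05, 0.15, 0.40, 0.30, 0.10],
    [0.02, 0.08, 0.25, 0.40, 0.25],
]
# Hypothetical predicted values from a linear regression for the same people.
linear_pred = [1.9, 2.4, 3.2, 3.9]

ordinal_pred = [expected_rating(row) for row in ordinal_probs]
r = pearson_r(ordinal_pred, linear_pred)
largest_gap = max(abs(a - b) for a, b in zip(ordinal_pred, linear_pred))

print([round(p, 2) for p in ordinal_pred])
print(round(r, 3), round(largest_gap, 2))
```

Whether a given correlation or largest gap counts as “very similar” is still a judgment call that depends on what the predictions will be used for.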
Hi Peter, thanks for the information. I was wondering how to go about regressing likert variables with multinomial logistic as the ordinal regression’s parallel lines assumption is being violated in my data. Despite the quote, is there a standard way to do this testing/reporting or should I just get odds ratios for every single comparison?
Hi Shawn
Multinomial logistic regression is a pain to report on – you more or less have to report each OR. But are you sure the assumptions are severely violated? Do the two models make similar predictions?
Peter
Thanks for the response Peter,
It looks like the linear and ordinal regressions seem to make reasonably similar predictions for all three of my response variables:
http://penguooo.com/scattertest.jpg
http://penguooo.com/scattertest2.jpg
http://penguooo.com/scattertest3.jpg
And the predictors seem to be mostly the same, so I should do LR in this case. That saves me a big hassle (hopefully). Since my research is for an undergraduate thesis, I’ll have a lot of space to report diagnostics. Should/how can I cite this procedure (reference this blog post?)?
Best wishes,
Shawn
Hi Shawn,
I use ordinal regressions very extensively in my work. Violations of the parallel lines assumption are extremely common, and there are basically two ways to overcome them, using one of two types of advanced ordinal regression model:
1. Generalized Threshold Model
2. Heteroskedastic Ordinal Model
Both models provide similar results, but the interpretations are different. If you would like to know how to apply these models, here is a link to one of my papers:
opim.wharton.upenn.edu/~senthilv/papers/seat_value_J.pdf
Let me know if you find it useful and if you have additional questions.
Best,
Ramnath
Shawn
Referencing the blog post would be fine
thanks
Peter
Thanks for this Ramnath!
Peter
Hi Ramnath,
Thank you very much for sharing your expertise. I’m not quite skilled enough that I feel confident to program and apply either advanced ordinal model within the next two weeks (which is when my thesis is technically due) but I’ll certainly reference your paper in the discussion should I not be able to implement either.
My initial query would just be the interpretation of my data in reference to the assumptions of the models. My response variables are assessments of participants’ moods and my predictors are fourier coefficients which describe participants’ walking motions.
So, it turns out that the parallel lines assumption is NOT broken (p = .264) if I only evaluate males, but it is when I evaluate females (p < .001). Excluding improbable random error, according to what I gather, this means one of three possibilities:
1. Females are more likely to misrepresent their mood.
2. Females' walks have nothing to do with their moods.
3. Females are more varied in mood reporting thresholds while men have similar thresholds.
4. Something about women who walk a certain way disposes them to a particular range of mood rating.
“Excluding improbable random error, according to what I gather, this means one of three possibilities:”
Sorry, that should be four possibilities. I tacked one possibility on at the end and forgot to alter the preceding statement haha.
Best wishes to all,
Shawn
Oh by the way, when mood is linearly regressed onto the fourier coefficients, a stepwise model accounts for 56.3% of the variance in mood for males while it only accounts for 8.3% of the variance for females.
Thank you for the post Peter.
I recently wrote a post about how to present A Correlation scatter-plot matrix for ordered-categorical data (with code on how to do it with R), thought it might be interesting for you and your readers:
http://www.r-statistics.com/2010/04/correlation-scatter-plot-matrix-for-ordered-categorical-data/
Cheers,
Tal
Hello intellectuals! I’m going crazy trying to identify the value of the dependent variable in a Likert-type questionnaire for SPSS analysis. E.g., I want to know the value of company performance as expressed by several independent variables. So what should I do? Help me!
Ummm…. that is what the article is about. What is your question exactly?
Hi, my question is somewhat like dejja’s. I created a Likert scale survey (Strongly Agree to Strongly Disagree) to analyze the attitudes and actions of clients looking for a new home. I asked several opinion (attitude) questions and questions about things that they actually do (actions, e.g. look for a house on their own, contact an agent first, talk to a lender first, etc.). In this whole process I am thinking that attitude and action are my dependent variables; however, those terms are not used specifically in the questions, and to do a regression you have to provide a dependent variable. Just what would that be? I am using SPSS, and I’m stuck.
Hi Mic
You would probably sum the action questions and call that action, and sum the attitude questions and call that attitude. If you wanted to do something fancier, you could do factor analysis first
Peter
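The summing suggested above can be sketched as follows. The item names and responses are hypothetical, and the sketch assumes every item is coded in the same direction (a higher number always means more of the attitude or action).

```python
def scale_score(responses, items):
    """Sum one respondent's answers over the items that make up a scale."""
    return sum(responses[item] for item in items)

# Hypothetical item names; use the variable names from your own survey.
ATTITUDE_ITEMS = ["att1", "att2", "att3"]
ACTION_ITEMS = ["act1", "act2", "act3"]

# One made-up respondent, each item answered on a 1-5 scale.
respondent = {"att1": 4, "att2": 5, "att3": 3, "act1": 2, "act2": 1, "act3": 4}

attitude = scale_score(respondent, ATTITUDE_ITEMS)  # 4 + 5 + 3 = 12
action = scale_score(respondent, ACTION_ITEMS)      # 2 + 1 + 4 = 7
print(attitude, action)
```

A sum of several items takes many more distinct values than a single 1 to 5 rating, which is part of why summed scales are often analyzed with linear regression.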
Dear peter,
I’m confused about which statistical analysis to use when the dependent variable (consumers’ overall housing preference) is on a Likert scale and all five independent variables are also on a Likert scale.
If the DV has only a few levels, you probably want ordinal logistic regression. The independent variables might be treated as continuous or categorical variables; probably continuous
Can anyone recommend literature that reports ordinal regression results? In psychology, preferably. Thanks!
Dear professionals,
I’m working on a research project with a survey that consists of a dependent variable on a Likert scale. All my independent variables are also on a Likert scale. My research question is whether the summer carnival has any influence on the image of Rotterdam. How could I best do the analysis? Which model should I use?
Hi Derrick
If your dependent variable is from a single Likert type scale, you probably want ordinal logistic regression.
Yes, I have used the 5-point Likert scale. When I run the normal regression I find that my data is not normally distributed. My supervisor told me to use the probit model (regressions for categorical and limited dependent variables) in Stata, since this model gives the best results. But I don’t know how to work with it. Even the manual does not go that far. Long & Freese wrote about the model in their book, but I could not find how to run it and read the output. I have searched the internet for it, but without result. Help!
I don’t use Stata, so I can’t help. Sorry.
Do you know someone who you can recommend that can help me?
Sorry but I don’t. There are probably Stata user groups out there on the net, though
I am analyzing survey data using SPSS. All of my data in categorical. I analyzed some of my data using a one-way ANOVA and have done further ordinal regression analysis using the link logit function. the results from the regression analysis supported the results of the ANOVA…my question relates more to reporting the results and whether I should report one over the other or both.
ANOVA would be used if the dependent variable is continuous. Ordinal regression would be used if the DV is ordinal.
Hello. Good day! I am doing research on factors (7 factors, to be precise) which are likely to impact mobile banking, and it is on a 5-point Likert scale (1 most important … 5 least important). How do I deduce the DV, and which regression model should I adopt? Moreover, do I need to use a parametric test for each of the factors? Thanks
You probably want ordinal logistic regression
Hello. So please how do I deduce the DV if I were to make use of ordinal logistic regression? Thanks
You don’t “deduce” the DV. You know it before you start the analysis. In this case, it’s something about mobile banking, I think, but that’s something you would have to tell me, I can’t tell you. The DV is the variable you think “depends” on the other variables.
Good day! Thanks for your responses so far! Perhaps I should start by saying that I am looking at these mobile banking adoption factors (Perceived Trust, Perceived Risk, Technology Failure, Perceived Usefulness, Perceived Cost, Customer Service, and Convenience, with ranking choices of 1 = most important to 5 = least important). I am interested in formulating hypotheses about them and checking how significant each ranked Likert item (adoption factor, in this instance) is to the usage of mobile banking. What is the necessary test to employ for the hypotheses? Moreover, do I still need to build a regression model based on these factors (ranked items) given age group as a DV, and if so, how? Thanks
Hello Everyone,
Thanks for sharing your wisdom. I am also writing a research paper to assess university research productivity and have similar issues to those discussed above. I am confused about selecting an appropriate model for the type of data I have. I have dependent variables on a 5-point Likert scale (strongly disagree to strongly agree) as well as binary ones (yes and no). My questions are below:
1. I have 2 binary (yes and no) dependent variables and 2 binary (yes and No) + 25 ordinal (5-point likert, strongly disagree-strongly agree)predictors. In this case is running binary logistic regression appropriate? which other regression is possible.
2. I have 4 ordinal (5-point likert, strongly disagree-strongly agree)dependent variables and 2 binary (yes and No) + 25 ordinal (5-point likert, strongly disagree-strongly agree)predictors. Is running ordinal regression appropriate for this. Which other regression would be more appropriate for my data.
Yes, it probably is.
Hello Peter Flom; Thank you so much for your wisdom. If you look at my variables above, looks like I would have to run both binary logistic regression for binary dependent variables and ordinal logistic regression for ordinal dependent variables.
Is it possible to run my two binary type dependent variables as ordinal logistic regression, or is there any method where I can use both types of variables in one model. I am asking this because looks like it is going to be painful when I have to use two models (binary logistic and ordinal logistic regression) in one paper. Please help
Yes. If you have dichotomous/binary dependent variables you would need (regular) logistic regression.
Can I apply a count data model when the outcome of the dependent variable is 0, 1, 2, 3, 4? The dependent variable is the age-grade gap of primary school children.
Hi Bereket
If your dependent variable is some sort of grade, you may want ordinal logistic regression.
Peter
Hi! I am new in stat but I need it for my thesis. I hope you can help me. I am looking into the effect of ethnic media use on the construction of ethnic identity. My independent variable is dichotomous: use or non-use of ethnic media and my dependent variable is a 5-point Likert Scale. What statistical test can I use for this? Thanks!
You probably want ordinal logistic regression and you almost surely want more than one independent variable
Peter
Dear Peter Flom,
I want to ask what is the way to find the overall prediction accuracy in the classification table of the ordinal regression output. For the percentage accuracy we look at the diagonal elements but I wanted to find out the overall accuracy.
What do you mean by “overall accuracy”? If I read that phrase I would assume that it meant “percentage accuracy”.
Yes, that is actually to find the total percentage accuracy?
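If “overall accuracy” means the share of all cases that the model classifies correctly, it can be computed from the classification table as the sum of the diagonal divided by the grand total. A minimal sketch with a made-up 3 x 3 table (rows are observed categories, columns are predicted categories):

```python
def overall_accuracy(table):
    """Share of cases on the diagonal of a square classification table."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total

# Made-up counts: rows = observed category, columns = predicted category.
classification_table = [
    [30, 8, 2],
    [10, 25, 5],
    [3, 7, 10],
]

accuracy = overall_accuracy(classification_table)  # 65 correct out of 100
print(accuracy)
```

This treats every misclassification as equally bad. With an ordinal DV, being off by one category is arguably less serious than being off by two, which this single number does not show.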
Hello,
I don’t find much literature on interpreting ordinal regression coefficients. How do we usually interpret negative and positive coefficients in ordinal regression output? What does a negative sign indicate?
Hi Kancha – Too long to really answer here; see e.g. my paper on ordinal regression http://www.nesug.org/Proceedings/nesug10/sa/sa03.pdf
Hi Peter, Thanks for sharing your wisdom. Post attends the topic meticulously.
Peter, if my data includes 5 DVs measured on a 5-point Likert scale, 5 IVs and a few dummy variables, then which type of regression will suit this data? Is a structural equation model an available option?
Hi Naveen
I am not a fan of SEM – I think it usually builds a castle on a weak foundation. But I’d need to know a lot more about your study, and it sounds too complex to give a full answer here.
Hi Peter
just advise if I have to tackle 5 DVs measured on likerts scale, do I need to take 5 different reg. equations in ordinal logistic reg.?