The Ordinal vs. Interval Debate in Psychology

In this video I describe how psychologists and social scientists often treat ordinal data as if it were interval data, which is a cause of some debate between pure statisticians and more pragmatic researchers. I discuss why this may or may not be appropriate in varying situations, and consider the possible justifications for taking this approach.

Stevens, S. S. (1946). On the theory of scales of measurement. Science, New Series, 103(2684), 677–680. https://www.fpce.uc.pt/niips/novoplan…

Kemp, S., & Grace, R. C. (2021). Using ordinal scales in psychology. Methods in Psychology, 5, 100054. ISSN 2590-2601. https://doi.org/10.1016/j.metip.2021…

Video Transcript

Hi, I’m Michael Corayer and this is Psych Exam Review. In the video on scales of measurement or types of data I mentioned that in practice things are not always as clear-cut as they seem in theory. This brings us to a debate in psychology and social sciences when it comes to ordinal versus interval data. We’ll see examples where statisticians would regard the data as ordinal but social scientists would often treat it as if it were interval.

Part of this debate relates to whether psychological variables can even be expressed numerically at all; maybe they’re just too personal or too subjective. Can we really measure something like life satisfaction from one to ten? What does it mean for a person to say 7 or 10? And isn’t this based just on their own subjective experience? Maybe there’s no way to even think about comparing these things numerically, and maybe that means we can never talk about something like average life satisfaction.

Or, for a more fun example, we could think about rating attractiveness from one to ten. These numbers would certainly be better than simple nominal categories like “hot or not,” but what would they really mean? Would the distance from two to three be the same as the distance from nine to ten? Could we talk about the average attractiveness of a room? And certainly there are going to be personal and subjective elements in what you think makes someone a 10.

But what if we had 10,000 people all rate the same person, and let’s say they all gave scores between 8 and 10? Now we could be pretty confident that this person is really attractive, even if there’s some subjectivity to those assessments. And then we could start thinking that if we did this with enough people, we could talk about a meaningful average. Maybe we could have numbers that accurately represent a psychological variable.

So we could apply this idea to something like the use of Likert scales, which are common in psychological assessment, economics, and consumer research, and which measure people’s attitudes or preferences, which are of course subjective. Generally in a Likert scale we’ll have five options ranging from strongly disagree to strongly agree, and of course we don’t have a way to do things like calculate the average with these nominal categories. But what if instead we numbered the responses from one to five? Now instead of saying one participant said strongly agree and one said agree, we could say one gave an answer of 5 and one gave an answer of 4. And we might be tempted to think that we could take the average and say it’s 4.5.
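To make the mechanics concrete, here is a minimal sketch in Python of that numeric coding and the tempting-but-debatable average. The label-to-number mapping is the conventional 1–5 scheme, not from any particular study:

```python
from statistics import mean

# Conventional 1-5 coding for a five-point Likert item
CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def code_responses(responses):
    """Map verbal Likert responses to their numeric codes."""
    return [CODES[r] for r in responses]

# Two participants answer the same item
coded = code_responses(["strongly agree", "agree"])  # -> [5, 4]
print(mean(coded))  # 4.5 -- meaningful only if we treat the codes as interval
```

The arithmetic is trivial; the entire debate is over whether that 4.5 means anything, which is exactly the point made next.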

Now for an individual item or statement, hopefully you’ll see this isn’t really appropriate. But a single item or statement doesn’t make a Likert scale: to have a true Likert scale we need multiple items all assessing the same preference or attitude. So we might think that if the same individual responds to 10 statements, maybe we could look at their average score for that particular attitude. And then we might think that even if this doesn’t work perfectly for every individual, if we do it for enough people the differences will start to average out.

So we could think about one person who rarely chooses strongly agree because they see it as very distant from agree, and somebody else who chooses it more often because they see it as closer. Maybe everybody differs a little on that distance, but if we measure enough people we can come up with an average distance between agree and strongly agree, an average between neutral and agree, and so on. And then we might think that if we measure enough people, all those distances on average end up being basically the same. That would mean we’re effectively talking about interval data, because we’d have equal distances between all the points. This is also going to depend on which traits we’re assessing and whether or not we think they fall on a normal distribution, but that’s something we’ll talk about in future topics.

For many traits this is the general assumption that social scientists are willing to make. If they think they have a normal distribution, and they have enough items and enough responses, then they start thinking of their data as interval even if it’s technically just ordinal. This is something that Stevens actually pointed out in his original 1946 paper on scales of measurement, where he wrote: “As a matter of fact, most of the scales used widely and effectively by psychologists are ordinal scales. In the strictest propriety the ordinary statistics involving means and standard deviations ought not to be used with these scales, for these statistics imply a knowledge of something more than the relative rank order of data. On the other hand, for this illegal statisticizing there can be invoked a kind of pragmatic sanction: in numerous instances it leads to fruitful results. While the outlawing of this procedure would probably serve no good purpose, it is proper to point out that means and standard deviations computed on an ordinal scale are in error to the extent that the successive intervals on the scale are unequal in size. When only the rank order of data is known, we should proceed cautiously with our statistics, and especially with the conclusions we draw from them.”

So most social scientists have taken Stevens’s pragmatic sanction and feel justified in treating their data as interval, provided they get fruitful results. Why would they do this? Because treating your data as interval lets you run more powerful analyses that are either impossible or very difficult with ordinal data. The justification really depends on how close we think we are to having equal distances in our data. So how do we assess that?

For this idea that small differences will average out across many responses, we need to make sure we’re actually measuring the same thing each time. This brings us to what’s called Cronbach’s alpha, an assessment of the reliability of the different items on a scale. If I have 10 ways of measuring somebody’s conscientiousness, I want to see whether the responses are actually related to each other, whether they’re actually reliable. If they are, that suggests all 10 items really are measuring the same trait. And if they’re all measuring the same trait, we can feel a little safer in assuming that small differences will balance out.
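As a sketch of how that reliability check works, here is a minimal stdlib implementation of the standard Cronbach’s alpha formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores). The example scores are invented for illustration:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).

    `items` is a list of k lists, one per item, each holding the same
    respondents' scores in the same order.
    """
    k = len(items)
    item_var_sum = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))

# Three items that agree perfectly across four respondents -> alpha = 1.0
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))

# Items that agree only loosely -> a noticeably lower alpha
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]))
```

In practice researchers usually want alpha somewhere above roughly 0.7 before treating a set of items as one scale, though the exact cutoff is itself a judgment call.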

Another justification is simply to take the time to run the more demanding ordinal statistics alongside the interval statistics and see whether the results really differ. Often there’s no fundamental difference in the outcomes, and that means we can feel justified in using the easier and more powerful interval statistics, since the ordinal ones would probably have given us similar results anyway.
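One stdlib sketch of this “run both and compare” check: Pearson correlation on the raw codes treats them as interval, while Spearman correlation (Pearson on the ranks) uses only their order. The data below are invented for illustration:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(x):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1  # extend over a run of tied values
        avg = (i + j) / 2 + 1  # average of the tied positions, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

likert = [1, 2, 2, 3, 4, 4, 5]   # hypothetical item codes
outcome = [2, 3, 3, 4, 4, 5, 5]  # hypothetical second measure

print(round(pearson(likert, outcome), 3))   # interval treatment
print(round(spearman(likert, outcome), 3))  # ordinal treatment
```

For this toy data the two coefficients land within a fraction of a percent of each other, which is exactly the kind of agreement that makes researchers comfortable with the interval treatment; when the two diverge, that’s a warning sign.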

Finally, there’s an argument that doing only ordinal-level statistics actually loses some of the nuance captured in our data. If we think our data is better than simple rank order, even if it’s not quite equal-interval, then we might feel justified in at least attempting some interval statistics. By stopping at ordinal statistics, we’d be ignoring robust analyses with minimal chance of error in favor of rigid, dichotomous thinking about ordinal versus interval data.

So what should we think? Should we be purists or pragmatists? Statistical sticklers or measurement manipulators? I don’t really have a definitive answer for this debate; it depends on what we’re assessing, how we’re assessing it, who the people are that we’re assessing, how many of them there are, and so on. But even without a definitive answer, we can say that you should be aware enough to know when it might be happening, when you might be mistreating your data or making assumptions that aren’t really justified. As long as we have that knowledge and awareness, we can help ensure that we’re treating our data in the most appropriate way.

I hope you found this helpful. If so, please like the video and subscribe to the channel for more, and make sure to check out the hundreds of other psychology tutorials on the channel. Thanks for watching!
