In this video I explain how to find the precise median for a continuous variable with many repeated scores. I walk through an example of this that can be solved visually, then use this example to derive the formula for an interpolated median.
Video Transcript
Hi, I’m Michael Corayer and this is Psych Exam Review. In the video on central tendency I mentioned that there’s a more precise way of calculating the median other than just taking the mean of the middle two scores. This is only for large data sets of a continuous variable where we have lots of repeated scores near the median. This can create a situation where the precise middle point of our data, where 50 of scores are below and 50 are above, is not quite the same as the mean of the two middle scores. So let’s walk through the conceptual logic of this with an example, then we’ll see if we can derive a formula for calculating this more precise median.
First this only applies to continuous variables that’s because they can be seen as having infinite fractional parts. So if I measure time in whole seconds technically each second could be divided up to an infinite number of decimal places. So let’s say that I measured time in seconds for a task for eight participants and I got the scores 3, 4, 5, 6, 6, 6, 6, 6 and 8. Our simple median method would give us a median of 6, the mean of the two middle scores, 6 and 6. But this actually isn’t the precise middle of our data because we have all these repeated scores at 6. So how do we find this precise point?
This is a case where I think visualizing the data can help us to understand so let’s draw a picture here where we have our data represented on a line and each block will represent a participant’s score in whole seconds. So we have a score at 3 seconds, 4 seconds, 5 seconds, then 4 scores at 6 seconds and one score at 8 seconds. Now our median should be the point where 50% of scores are below and 50% are above. That means we should have four boxes to the left and four boxes to the right. And if we draw a line right at 6 we see that this isn’t quite right. The area of the boxes to the left is larger than the area of the boxes to the right. This tells us that our median is actually somewhere between 5 and 6. So how do we find this point?
First we need to recognize that our scores are in whole seconds that means that a score of 5 technically could be anywhere from 4.5 up to 5.5. This is called the class interval. And we can also see that if we go to 5.5 here we have 3 boxes to the left that tells us that our median must be at least 5.5, but if we go all the way to 6 we’ve gone too far. So our median is going to be somewhere between 5.5 and 6.
So we don’t need all those boxes that are stacked up as 6, we actually just need a portion of them in order to reach four boxes to the left of our line and four boxes to the right. And you might see that since there’s four boxes stacked up at 6 we could just take a quarter of each of those boxes. That would give us four boxes to the left four boxes to the right and so from 5.5, the lower limit of our median, we move up one quarter of a box, one quarter of our class interval, and that’s going to give us a precise median of 5.75.
This precise median can also be called an interpolated median and interpolated here means that you’re finding a value that’s between your known values. So our known values were whole seconds like 5 or 6, and our interpolated median is going to be between those points. Now we’d never actually take the time to do this with most samples because with smaller samples a slight over or underestimate of the median really won’t matter very much. But if we had a really large data set then we might have hundreds or even thousands of scores all stacked up at the median. Now in order to find the precise median there, we wouldn’t want to take the time to draw a diagram and slice up all the boxes. Instead we probably want to use a formula.
So let’s see if we can derive the formula based on what we’ve already done here. What we did was we started at the lower limit of our median, 5.5. We said okay we know we’re not at the median yet but if we go all the way to 6 we’ve gone too far. So we know it’s going to be at least 5.5 plus some fractional portion of these scores that are stacked up at 6. So next we found the fractional portion. We said okay we’re trying to get to four boxes so that’s our n over 2; that’s 8 divided by 2. That’s the midpoint of our data we’re trying to get to. But we already had three boxes to the left so we don’t need to worry about those; we can take those away. We’re really only looking for one more box.
And then in order to get that we need to know, well, how many scores were stacked up at 6? How many boxes are we dividing up? And so we’re going to divide by 4 and then this gives us one-fourth. And then the last question is well, one-fourth of what? So what was our class interval? In our case it was one second so we need to add a quarter of a second and so we add 5.5 plus 0.25 this gives us a total of 5.75 seconds.
So that’s the formula that we used even though we did it visually. So let’s turn it into a real formula. We had the lower limit of the median L, plus the fractional portion, which is n divided by 2 minus the frequency of scores that are below the lower limit of the median. You’ve already counted those, and then you divide that by how many scores you’re dividing up. That’s the frequency of scores at the median, f of m, and then we multiply that times our class interval h.
So this is the formula for the interpolated median. Now you’ll probably never have to calculate this by hand but if you’re using software that gives you this interpolated median hopefully you’ll have a better understanding of where it’s coming from. I hope you found this helpful, if so, let me know in the comments, like the video, subscribe, and don’t forget to check out the hundreds of other psychology tutorials that I have on the channel. Thanks for watching!