Mean Absolute Deviation, Variance, & Standard Deviation

In this video I explain how to use mean deviation, mean absolute deviation, variance, and standard deviation to assess dispersion in interval or ratio level data. I discuss each concept and practice calculating with a sample set of data, and then consider why it’s so important to estimate the parameters of the population.

Video Transcript

Hi, I’m Michael Corayer and this is Psych Exam Review. In this video we’re going to start looking at measures of dispersion that can be applied to interval and ratio level data. Unlike the variation ratio or the interquartile range, these measures of dispersion are going to take all of our scores into account, and so this is going to give us a more detailed understanding of the spread of our data. And the way that they’re going to do this is by comparing each score to the mean. And so we get a sense of how spread out scores are around the mean, but we can also get an understanding of how spread out our scores are in relation to one another.

Now in order to do this we’re going to look at a sample of 12 scores here and we’ll practice each of these calculations using this set of scores. So the first thing that we’re going to need to do is know the mean for this set of scores and so we add up all of our scores and then we divide by the number of scores and this is going to give us a mean of 10.
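The transcript doesn’t list the 12 scores shown on screen in the video, so as a sketch, here’s a hypothetical set chosen to match the totals quoted later (a mean of 10, absolute deviations summing to 40, squared deviations summing to 160):

```python
# Hypothetical set of 12 scores (the video's actual scores aren't in the
# transcript; these are chosen to match its summary statistics).
scores = [5, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 15]

# Add up all of the scores, then divide by the number of scores.
mean = sum(scores) / len(scores)
print(mean)  # 10.0
```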

At first glance, when we’re thinking of finding the average deviation from the mean, we might think that we can just take the distance from each score to the mean and then add those up and divide by the number of scores. But we’ll run into a problem because we have positive and negative values. So a score of 8 is -2 from the mean and a score of 12 is +2 from the mean, and if we treat these as -2 and +2 what will happen is they’ll just cancel each other out.

And then we might realize that actually all of our scores will cancel each other out because the mean is the balancing point of the distribution. Now this is more obvious if you have a perfectly symmetrical distribution because every score above the mean has a counterpart below the mean, but even in an asymmetrical distribution the mean is the balancing point. So if you have one extremely high score it’s going to pull the mean in that direction. But if the mean gets pulled in that direction you now have more scores below the mean that are going to cancel out that high score. So no matter what we do we’ll get a mean deviation of 0, and that means we can’t divide by our number of scores and we basically have a useless statistic.

The only real purpose of the mean deviation is to make sure that you’ve done your calculations correctly. If you get a mean deviation that’s not zero then either your mean is wrong or you’ve added the deviations incorrectly. So we can check this with our sample set of data here and we’ll see that if we take the mean of 10 and then we find all the distances from that and we add them up they’ll all cancel out and we get a mean deviation of 0.
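We can check this cancellation directly, using a hypothetical set of 12 scores with a mean of 10 (the video’s actual scores aren’t in the transcript):

```python
scores = [5, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 15]  # hypothetical sample
mean = sum(scores) / len(scores)  # 10.0

# Signed deviations from the mean cancel out: the mean is the balancing point.
mean_deviation = sum(x - mean for x in scores) / len(scores)
print(mean_deviation)  # 0.0
```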

So instead of using positive and negative values we might realize that we could use the absolute value of the deviation from the mean. So we take each score’s distance to the mean, regardless of whether it’s positive or negative, we make it positive by using absolute value. Then we add up all of those deviations, we divide by our number of scores, and this will give us what’s called the mean absolute deviation.

So we can try this with our set of scores: we take the distance to the mean for each score, we add up all of these deviations, treating them all as positive, and then we divide by our number of scores, which is 12. So we get 40 divided by 12, and we get a mean absolute deviation of about 3.3.
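With the same hypothetical set of scores (chosen so the absolute deviations sum to 40, as in the video), the calculation looks like this:

```python
scores = [5, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 15]  # hypothetical sample
mean = sum(scores) / len(scores)  # 10.0

# Take each score's absolute distance from the mean, sum them, divide by n.
mad = sum(abs(x - mean) for x in scores) / len(scores)  # 40 / 12
print(round(mad, 2))  # 3.33
```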

The mean absolute deviation is logical, it’s practical, it’s fairly easy to calculate, and it gives us a very accurate representation of how our scores differ from the mean on average. It seems to be exactly what we’re looking for and yet it’s not commonly reported. It’s not frequently used. And the reason for this is that it’s only about our sample. It’s too specific. It’s just telling us about how each of the scores in our sample compared to the mean of our sample and it doesn’t give us a good sense of the population that the sample is coming from. It doesn’t allow us to make estimates for the population and it doesn’t really allow us to compare scores within our sample. And so for that we’re going to turn to a different calculation. We want to get an estimate of the parameters of the population and for that we’re going to use the variance.

Now before we get to actually calculating variance for our set of scores, I want to go over a few different versions of the formula for variance that you might see. And this depends on whether you’re talking about the variance of the population, the variance of a sample, or if you’re using the variance of a sample to estimate the variance of the population, which is going to be the most common choice.

So let’s start by looking at the formula for the variance of the population, Sigma squared. In this case, if we actually have access to the whole population, we can actually measure everyone, then we can find the deviation from mu, which is our population mean. We square the deviation and then we add up all of these squared deviations and we divide by n. This will give us the variance of the population. But usually we don’t have access to the full population and that means we don’t even know what mu is. We don’t know the population mean. Instead all we have is the mean of our sample, X bar.

So this brings us to the other equation here for a sample. You can say, well, if we’re calculating a sample then we just replace mu with X bar. So we have S squared equals the sum of squared deviations from X bar divided by n. But it turns out that this is actually going to be an underestimate. When we replace mu with X bar here we have a tendency to underestimate the variance of the population and so this is giving us a biased estimate, and so we generally don’t use this formula.

What we do instead is we use a slightly altered version of it and this allows us to estimate the variance of the population using a sample. So here we see S squared equals the sum of squared deviations from X bar, divided by n minus 1. So because our previous estimate was an underestimate, it was biased to be an underestimate, what we’re going to do is divide by n minus 1. By making our denominator a little bit smaller we increase our overall estimate and this will hopefully bring us closer to the actual variance of the population. So this middle equation here is going to give us what we call an unbiased estimate of the population variance, and this is what we’re most commonly going to use.
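In symbols, the three versions described above are:

```latex
\sigma^2 = \frac{\sum (X - \mu)^2}{N}
\qquad \text{(population variance)}

s^2 = \frac{\sum (X - \bar{X})^2}{n}
\qquad \text{(sample variance: a biased estimate)}

s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}
\qquad \text{(unbiased estimate of the population variance)}
```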

That’s because we generally don’t have access to the population, so we can’t use the first version, and we don’t want to use a biased version, which is the second version, and so instead we use this version here to try to get an accurate estimate of the variance in the population. You can also notice a difference in notation here: when we’re talking about the actual population variance we use Sigma squared, but in these other cases we’re using an estimate from a sample and so we use S squared. Now it can be hard to distinguish between the biased and the unbiased estimate, and sometimes the biased estimate is indicated with an n here to indicate that you’re dividing by n rather than n minus 1. But most of the time we’re just going to be using this middle version. We’re going to be dividing by n minus 1 in order to get an unbiased estimate of the population variance.

And finally, this version of the formula is called the definitional formula. There’s another version called the computational formula, but they both give you exactly the same answer. I recommend that you focus on the definitional formula. This one shows that we’re using deviations from the mean in order to estimate the variance, and that’s the key concept that I want you to get from this. The computational formula gets to the exact same answer but it does it without calculating all of the deviations, so we’ll come back to this later when we talk about analysis of variance. But for now just focus on the definitional formula for calculating the variance.

So we have our formula for an unbiased estimate of the population variance using our sample. So in order to find this, first we’re going to find each score’s deviation from the mean and then we’re going to square that deviation and then we’re going to add up all of those squared deviations. This gives us what’s called a sum of squared deviations or sometimes just a sum of squares. And then we’re going to divide our sum of squared deviations by n minus 1. And this will give us an estimate of the variance for the population.

Now you’ll notice with this formula that rather than using absolute value we’re squaring our deviations from the mean in order to get rid of negatives, because a negative value that’s squared will become a positive value. Many introductory textbooks will tell you that the squaring is done to get rid of negatives and that’s true, but there’s something else to this. When we square the deviations what we’re also doing is we’re placing a greater weight on scores that are farther from the mean.

You’ll notice that if we have a deviation of 1 from the mean, when we square that it’s just going to add 1 to our sum of squared deviations. But if we’re 3 points from the mean then we’re going to add 9 to our sum of squared deviations. And if we’re 10 points from the mean we’ll add 100 to our sum of squared deviations. And yet we’re still just going to divide by n minus 1. So the scores that are farther and farther from the mean are adding more to the numerator than just their pure distance from the mean. We didn’t do this with the mean absolute deviation. In that case, if you’re 1 point away then we add 1 to the numerator; if you’re 3 points away we add 3 to the numerator, right? We treat all scores equally. So the question is, why would we want to place a greater weight on scores that are farther from the mean?
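A quick sketch of this weighting: the same deviations contribute proportionally to the mean absolute deviation’s numerator but more than proportionally to the sum of squares.

```python
deviations = [1, 3, 10]  # example distances from the mean

abs_contrib = [abs(d) for d in deviations]  # MAD numerator: grows linearly
sq_contrib = [d ** 2 for d in deviations]   # sum of squares: grows quadratically

print(abs_contrib)  # [1, 3, 10]
print(sq_contrib)   # [1, 9, 100]
```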

The reason that we want to weight scores that are farther from the mean more heavily is that we’re assuming that the population follows a normal distribution. If the population follows a normal distribution then you have lots of scores near the mean and that means those scores are very likely to end up in our sample. But as we move farther and farther from the mean what happens is scores become less and less frequent, and that means they’re less likely to show up in our sample. So if we have some then we want to weight them more heavily because we’re assuming there’s probably other scores out there too, but the chance of all of those showing up in our sample is very low.

So if I have a score that’s 10 points away from the mean, well, there’s probably a score that’s 9 points away from the mean, and there’s probably a score that’s 11 points away from the mean in the population. But the odds of getting all of those in my sample are very low. So instead I’m going to weight that 10 more heavily to try to make up for some of the missing data that isn’t going to show up in my sample, but that I think exists in the population. And this is part of why the variance is giving us an estimate for the population, rather than something like the mean absolute deviation, which is only telling us about the exact data we have in our sample.

So let’s practice calculating the variance using our set of scores here. So we’re going to take all of our deviations from the mean and square them and then we’re going to get the sum of squared deviations by adding all of those up, and this will give us 160. And then we’re going to divide by n minus 1, which in our case would be 11, and that gives us a variance of about 14.5.
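With the hypothetical set of scores used above (chosen so the squared deviations sum to 160, matching the video’s totals):

```python
scores = [5, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 15]  # hypothetical sample
mean = sum(scores) / len(scores)  # 10.0

# Sum of squared deviations, then divide by n - 1 for the unbiased estimate.
ss = sum((x - mean) ** 2 for x in scores)  # 160.0
variance = ss / (len(scores) - 1)          # 160 / 11
print(round(variance, 1))  # 14.5
```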

Now right away you might see one of the challenges of the variance. We have a variance of 14.5 and you might ask, “well, what do I do with this? You know? How does it fit in with my data, you know?” It seems to be on a different scale of the data because we did all that squaring so we end up with this larger number. It’s like “okay, so I have a mean of 10, but I have a variance of 14.5. I don’t really know what that means”.

And so to bring this back to terms that we can compare more directly with our mean and with our data we’re going to take the square root of the variance. This is going to give us the standard deviation. So the standard deviation is just the square root of the variance. And so now we’re back in terms that we can compare with our mean or with our other data. So in our case we take the square root of 14.5 and we get a standard deviation of about 3.8. So now we can sort of think about, okay, a mean of 10 and a standard deviation of 3.8 we can sort of make sense of that.
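Continuing with the same hypothetical scores:

```python
import math

scores = [5, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 15]  # hypothetical sample
mean = sum(scores) / len(scores)
variance = sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)

# The standard deviation is just the square root of the variance.
sd = math.sqrt(variance)  # sqrt(160 / 11)
print(round(sd, 1))  # 3.8
```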

Now we’ll also notice that the standard deviation tends to be larger than the mean absolute deviation. This is because it’s an estimate for the population. So it’s assuming that there’s more variability than just in our sample, whereas the mean absolute deviation is specifically about our sample. And so the mean absolute deviation will tend to be about 80 percent of the size of the standard deviation. Or another way to say that is that the standard deviation will be about 1.25 times the mean absolute deviation. Now it won’t always be exactly that ratio, it’s going to vary depending on the distribution of your data, but that should give you a rough estimate of what to expect for the mean absolute deviation compared to the standard deviation, because the standard deviation is estimating for the population and so it tends to be larger.
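A rough simulation of that ratio, assuming normally distributed data (for a true normal distribution the mean absolute deviation is sqrt(2/pi), or about 0.798, times the standard deviation):

```python
import random
import statistics

random.seed(0)
# Hypothetical normally distributed data; the ratio varies with the distribution.
sample = [random.gauss(0, 1) for _ in range(100_000)]

m = statistics.fmean(sample)
mad = sum(abs(x - m) for x in sample) / len(sample)
sd = statistics.stdev(sample)

# For normal data the mean absolute deviation runs about 80% of the SD.
print(round(mad / sd, 2))
```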

A final question that you might have is, “why do we care so much about the population? Why don’t we just stick to analyzing our samples and limit ourselves to that?” And the reason that we care so much about the population, the reason we want to estimate the parameters for the population, is that if we can do that then we can think about the frequency of different scores in the population. And that means we can think about the probability of getting those scores in our samples. And then we think about the probability of getting a sample.

We say “what are the odds of getting a sample that looks like this, based on what we know about the population?” And that’s going to be really important when it comes to comparing samples. So I have two samples and they differ in some way and I want to know, what are the odds that this difference is just random chance from drawing two samples from the population? Versus maybe there was some intervention or some manipulation for one sample and I want to know if it had an effect. Well, in order to assess that I need to know the odds of getting those samples from the population. And that’s why estimating the parameters of the population is so important.

I hope you found this helpful, if so, let me know in the comments, like and subscribe, and make sure to check out the hundreds of other psychology tutorials that I have on the channel. Thanks for watching!
