How Are Moments Used in Statistics?

In this video I explain how moments are used in statistics in order to describe the characteristics of a distribution. The first four moments help to tell us about the mean, variance, skew, and kurtosis of a distribution and allow us to make comparison between distributions. I define raw or crude moments and how changing reference points can provide centralized and standardized moments to describe the shape of a distribution.

Video Transcript

Hi, I’m Michael Corayer and this is Psych Exam Review. In the videos on skew and kurtosis, I mentioned that these are related to the third and fourth moments of a distribution. So you might have been wondering: what are moments? In this video we’re going to get an overview of what moments are and we’re going to see how they’re used in statistics in order to get a sense of the shape of a distribution.

So moments are used in other disciplines like physics, and if you search online about moments you’ll probably come across some additional notation using integrals, but we’re not going to focus on this. We’re going to just focus on the formulas that are used when calculating these moments for a sample set of scores. So we can think of each moment as answering a different question about the distribution, telling us something new about the shape. We’re going to look at the first four moments, which relate to the mean, the variance, the skew, and the kurtosis, and we’ll see how each of these tells us something different about the shape of the distribution.

So let’s look at our first four moments here, and for each of these what we’re going to do is we’re going to think about the question it addresses related to the shape of the distribution. We’re going to think about the reference point that we might want to use for it, and then we’re going to look at the formula if we were calculating it for a population and then the adjustments we need to make if we were calculating it for a sample, which, of course, would be the most common situation. So if we start by looking at these raw or crude moments, we can see that the first moment is the expected value of x, the second moment is the expected value of x squared, the third moment is the expected value of x cubed, and the fourth moment is the expected value of x to the 4th. But as we’ll see, for most of these we’re not going to use the raw or crude moment. Instead we’re going to make some adjustments to it in order to make it more applicable to our data.
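As a concrete sketch, the raw or crude k-th moment of a sample is just the average of each score raised to the k-th power. Here’s a minimal Python illustration (the scores are a made-up sample for the example):

```python
# Raw (crude) k-th moment: the average of x**k across the scores.
def raw_moment(scores, k):
    return sum(x ** k for x in scores) / len(scores)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
for k in range(1, 5):
    print(f"raw moment {k}: {raw_moment(scores, k)}")
```

As described below, only the first of these gets used directly; the others are adjusted by shifting the reference point and standardizing.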

So we’ll start with our first moment here, which we said is the expected value of x. So the question that this is addressing related to the distribution is: where is the center? If we want to know the expected value of x, what we really want to know is the average, and that’s going to be in reference to zero. So how far are scores from zero, on average? And as we know, that’s the mean. So to calculate the mean, what we’re doing is adding up each score’s difference from zero and dividing by n, and for a population this will give us mu. Generally we don’t have the full population, but the formula for calculating this for a sample is exactly the same. In this case we call it x-bar, the mean of our sample. And that’s our first moment. So the first moment tells us about the mean; it tells us the center of the distribution.
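In code, the first moment is just the familiar mean; writing the zero reference explicitly shows how it fits the moment pattern (a sketch with a made-up sample):

```python
scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
# Each score's distance from zero, summed and divided by n;
# the "- 0" just makes the zero reference point explicit.
mean = sum(x - 0 for x in scores) / len(scores)
print(mean)  # 5.0
```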

And you’ll notice here that in this case we are actually using the raw or crude moment. We’re just using the expected value of x and that’s because we actually do want it to be in reference to 0. And we’ll see for the subsequent moments that we’re going to be changing our reference point.

So if we look at our raw or crude second moment, we can say it’s the expected value of x squared. For the raw or crude version we would just square each value of x, add them up, and divide by n. But what we’ll see is that’s not really what we want. The question we want to answer about the distribution is: how spread out are the scores? And this refers to the variance. An important thing to note here is that we don’t want to know their spread in relation to 0; we want to change our reference point, or shift our origin. We want to think about the spread of scores around the mean, or about the mean. And so we’re not looking for the raw or crude second moment, we’re actually looking for the central or centralized second moment. In order to find this, first we have to compare each value of x to the mean, and then we can think about squaring and finding the average.

And so in the case of a population we would compare each score to mu, we would square it and add up all those deviations, and then we would divide by n. But the problem that we have for a sample is that we don’t know mu. We only have x-bar, and when we substitute x-bar for mu, this introduces some bias and gives us a tendency to underestimate. And so we’re also going to make an adjustment to the denominator and divide by n minus one rather than n. If you want more information about this switch to using n minus 1, which is known as Bessel’s correction, I’ll put a link to a video where I explain it in more detail.
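The centralized second moment with Bessel’s correction can be sketched like this (again with a hypothetical sample):

```python
def sample_variance(scores):
    # Shift the reference point from zero to the sample mean (x-bar),
    # square each deviation, sum, and divide by n - 1 (Bessel's correction).
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(sample_variance(scores))  # ~4.571 (32 / 7)
```

Dividing by n instead of n - 1 would give the population formula, which for a sample tends to underestimate the true spread.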

And so our third raw or crude moment would be the expected value of x cubed. For the raw version we would just cube each value of x, add them all up, and divide by n. But you might guess that this is probably not what we actually want. Again we’re going to be changing our reference point. What we want to know about is the symmetry of the distribution. So we want to know about the skew, but we don’t want to know the skew in terms of 0; we want to know it in terms of the center of the distribution.

In addition, we want to know how much that value compares to how spread the scores are on average. And so we’re going to be comparing it to the standard deviation, which is the square root of the variance, the square root of our second central moment, and so this means we’re looking for the standardized third moment. And so for a population we’re thinking about each score’s deviation from mu in terms of sigma, the standard deviation, we’re taking those to the third power and then we’re adding them all up and dividing by n. But for a sample of course, we don’t know these true population parameters; we only have estimates of these. And that means that we lose degrees of freedom and we have to make some adjustments so our formula will look like this. And this will give us the adjusted Fisher-Pearson standardized third moment coefficient for skew.
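A sketch of that adjusted Fisher-Pearson coefficient, built from the second and third central moments; this should match the bias-corrected skewness reported by most statistics packages (the data are made up):

```python
import math

def sample_skewness(scores):
    # Adjusted Fisher-Pearson standardized third moment (G1):
    # the third central moment over the standard deviation cubed,
    # times sqrt(n(n-1)) / (n-2) to adjust for sample bias.
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

print(sample_skewness([2, 4, 4, 4, 5, 5, 7, 9]))  # positive: right-skewed
```

A perfectly symmetrical set of scores gives a skewness of zero, since the positive and negative cubed deviations cancel.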

And finally we come to the fourth moment which in its raw or crude form is the expected value of x to the 4th. But as you might guess we’re going to be making some adjustments to this as well. And the question that we want to answer here is; how heavy are the tails of the distribution? And this will tell us about the kurtosis of the distribution. And in terms of reference here we want to know about the tails in relation to the mean and also in relation to the standard deviation. And so that means here we’re going to be looking for the standardized fourth moment. And so for a population we’d be comparing each score to mu, dividing that by sigma, the standard deviation, taking those to the fourth power then adding them all up and dividing by n. And once again if we’re thinking about a sample then we don’t know these population parameters and so we’re going to have to make some adjustments based on our estimates and so we’ll get a formula that looks like this.

So this is a bit more complex, and unfortunately we’re going to add one more thing to it. Generally we’re not interested in the total kurtosis of the distribution, which is what this will give us; we want to know about the excess kurtosis. That is the kurtosis compared to a normal distribution. A normal distribution has a kurtosis of 3, and so we’re going to subtract 3 here, but we’re also going to have to adjust that based on our sample size. And so now with all of this we’ll have a formula for the excess kurtosis, or the heaviness of the tails in relation to the tails of a normal distribution. A positive value would indicate positive kurtosis, or a leptokurtic distribution, and a negative value would indicate negative kurtosis, or a platykurtic distribution.
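Putting those pieces together, here is a sketch of the bias-adjusted excess kurtosis, with the subtract-3 step and the sample-size adjustment described above; this should match the G2 statistic reported by common software (the data are made up):

```python
def sample_excess_kurtosis(scores):
    # Standardized fourth moment, with 3 subtracted (excess kurtosis,
    # relative to a normal distribution) and bias-adjusted for sample size.
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    g2 = m4 / m2 ** 2 - 3  # biased excess kurtosis
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(sample_excess_kurtosis([1, 2, 3, 4, 5]))  # ~ -1.2: platykurtic
```

The negative result here makes sense: a flat, evenly spaced set of scores has lighter tails than a normal distribution.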

And so now we have all four of our moments here and each of them is telling us something different about the distribution. And so with all of these together we can have a pretty good sense of what a distribution looks like. We know where the center is from the first moment, how spread out scores are around that center from the second moment, the third moment tells us whether the spread is symmetrical or asymmetrical, and in which direction, and the fourth moment tells us how heavy or light the tails are in relation to a normal distribution. And knowing all these moments also allows us to easily compare distributions.

Two distributions could have the same mean but different variance, the same variance but different skew, or the same skew but different kurtosis, and so on. By calculating all four moments for distributions, we have a clear sense of what they look like and how they compare to one another.

Well, I hope this gave you a better sense of what moments are and, more importantly, how they’re adjusted and used in statistics in order to get a sense of the shape of a distribution. Let me know in the comments if this was helpful for you or if there are questions you still have, and I’ll try my best to answer them. Don’t forget to like and subscribe to the channel for more, and be sure to check out the hundreds of other tutorial videos that I have on a range of topics in psychology and statistics. Thanks for watching!
