How to Calculate Skew

In this video I explain two formulas for calculating skew: Pearson’s Median Skewness or Pearson’s 2nd equation for skew, and the Adjusted Fisher-Pearson Standardized 3rd Moment Coefficient for skew. I also explain some general guidelines for interpreting skew values, as well as some caveats for drawing conclusions.

Doane & Seward (2011) Measuring Skewness: A Forgotten Statistic? http://jse.amstat.org/v19n2/doane.pdf

Video Transcript

Hi I’m Michael Corayer and this is Psych Exam Review. In most introductory statistics courses you’ll learn about the concept of skew and how to recognize it visually, but what’s conspicuously absent is discussion of how to calculate skew. And so in this video I thought we’d look at two formulas for calculating a coefficient for skew. One of these is fairly simple and you can do it by hand and the other is a bit more complicated and you’d probably want to use software in order to calculate it.

So when we learned about the concept of skew we learned how to recognize a positively or negatively skewed distribution when looking at a histogram. But there are two problems with relying on visual inspection for thinking about skew. The first is that it's influenced by the bin size of the histogram: we can take the same set of data and, by changing the bin size, change the apparent skew that we see in the visualization. And for each set of data there's not necessarily a best bin size that we should use, which means there isn't a definitive visual picture of what the data look like and how much skew there is. The second problem is that if we're just looking at a visualization it's hard to get a sense of the influence of the sample size on the skew. And so that's another consideration that we're going to look at when we're calculating skew.

When we learned about skew we learned the idea that in a symmetrical distribution the mean, the median, and the mode are all equal to each other. But as we have asymmetry what happens is the mean gets pulled away from the median and the mode. And so if this happens in the positive direction then the mean gets pulled upward by that long positive tail and we end up with positive skew. And in negative skew the mean is getting pulled below the median and the mode, right? The long tail is pulling it downward.

So our first way of calculating skew is going to look at this pull on the mean and how much it's being separated from the mode or from the median. And so we're going to look at Pearson's median skewness, and this is a formula that was mentioned in a 2011 paper by David Doane and Lori Seward. They suggested that this should be included in more introductory statistics courses because it can help you get a stronger sense of the concept of skew, and so I'm doing my part here to try to help make that the case. And I'll post a link in the video description where you can find the original paper if you're interested in reading it.

Okay so here’s this equation for Pearson’s median skewness or Pearson’s 2nd equation for skew and what we’re going to do is look at each part of this to try to get a sense of what it’s doing and what that’s telling us about the skew of a set of scores. So we’ll start by looking at this x-bar minus median. So what this is telling us is the distance between the mean, x-bar, and the median. And it’s telling us how far the mean is being pulled away from the median. So remember we said that in skew the mean gets pulled either positively or negatively away from the median and so this is telling us how far is it being pulled. Now if we have a symmetrical distribution then the mean and the median will be equal to each other and that means this part of the equation will give us an answer of 0 and that means we’ll end up showing that we have no skew or a skew of 0.

Now let's look at the next part of the equation. We're dividing this by the standard deviation. So why are we dividing this by the standard deviation? The reason for this is that we don't just want to know the distance between the mean and the median; we want to know what that distance means in relation to how spread out the data are. So how far is that distance relative to the standard deviation? And so this is sort of giving us some context for what that distance means. We can imagine that if our data are all very close together, then a certain distance between the mean and the median might indicate more skew than if our data were very spread out. And so that's why we're going to divide by the standard deviation. And these two parts of the equation together, the distance between the mean and the median divided by the standard deviation, are going to give us a result that's going to range from minus 1 to positive 1. And the reason for this is that the distance between the mean and the median is generally going to be less than a standard deviation, because as that distance between the mean and the median gets very large, the standard deviation is also going to get larger. And so this part of the equation is going to be sort of bounded by that idea, and that means we're going to expect to get a result here between minus 1 and positive 1.

And now this brings us to the last part of the equation, which is multiplying by 3. And so you may wonder why are we multiplying by 3? What's the purpose of this, what is this accomplishing? It seems somewhat arbitrary. And the answer to this is that it has to do with Pearson's 1st equation for skew. Pearson's 1st equation for skew looks at the difference between the mean and the mode. And so in order to make this comparable to that result, what we have to do is multiply by 3. And the reason for this is that the distance between the mean and the median will tend to be about a third of the distance between the mean and the mode, and so this allows these two equations to give us similar results.

And so with all these parts together we’ll see that we’ll get an overall result that’s going to range between minus 3 and positive 3, where minus 3 or positive 3 is going to indicate very extreme skew and scores that are closer to 0 are going to indicate less skew.

So we can see that Pearson's median skewness is a fairly simple formula, but this simplicity does come at a cost. Because we're only looking at summary statistics, the mean, the median, and the standard deviation, this measure will lack statistical power. That means it's not able to differentiate well: we could have two distributions with similar means, medians, and standard deviations that actually differ in their skew, and this formula isn't able to detect that. And so in order to see that we'd need to use a more complex formula, one that takes all of our individual scores into account. And so if you're calculating skew using software like Excel, or Minitab, or SPSS, then this is the formula that you're going to be seeing.

So here we have the adjusted Fisher-Pearson standardized 3rd moment coefficient for skew. This equation has a very long name, but luckily each part of the name tells us exactly what the equation is doing, and so that makes it a little bit easier to remember. It's named after Ronald Fisher and Karl Pearson, and what we're doing is finding the standardized third moment and then adjusting it based on our sample size. And if you look at the equation you'll see there are sort of two parts to it; one part that's finding the standardized third moment and another part that's adjusting for the sample size. And so if you look at this you can probably guess which part is which.

If we look at this first part here with all these “n”s in it you might correctly guess that this is the part that’s adjusting for the sample size. Then we have sort of the second half of this equation, and what this is doing is finding the standardized third moment, okay? So I’m going to make a separate video about moments in the future for those who want to know more detail, but even without really knowing what moments exactly are, we can still sort of look at this equation and figure out what’s going on. We can see what it’s doing in order to get an estimate of the skew.

So let's start by looking at the numerator here. We have the difference between each score and the mean, cubed, and then we're adding all of those up and dividing by n, so what this is doing is finding the average cubed distance from the mean. And then if we look at the denominator here, what we're doing is finding the distance from the mean for each score, squaring it, adding all of them up, and dividing by n. And what you might realize is that's actually calculating the variance. And so we're taking the variance, and then we're taking the square root of it, which you might recognize as the standard deviation. The standard deviation is the square root of the variance. And then we're taking that to the third power, and so essentially what we have here is the cubed standard deviation.

Now one thing to note about this is that when we calculate the variance here you’ll notice that we’re just dividing by n so that means we’re not making Bessel’s correction here, okay? And that’s because we’re going to be making the correction later when we do the sample size adjustment. So just note you can’t just drop in your corrected standard deviation cubed here, you have to calculate it without dividing by n minus one and instead dividing by n, which is the biased sample variance.
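The standardized third moment described above can be sketched like this, using the biased variance (dividing by n) per the caveat just mentioned. The function name is my own:

```python
import math

def standardized_third_moment(scores):
    """Average cubed deviation from the mean, divided by the cubed (biased) SD."""
    n = len(scores)
    mean = sum(scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n  # average cubed distance from the mean
    m2 = sum((x - mean) ** 2 for x in scores) / n  # biased variance (divides by n, not n - 1)
    return m3 / math.sqrt(m2) ** 3                 # cubed biased standard deviation

print(standardized_third_moment([1, 2, 3, 4, 5]))  # 0.0 for a symmetric sample
```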

Okay so before we get to the sample size adjustment let’s think about what this cubing is doing. So we’re finding these cubed distances from the mean. Now when we learned about variance we talked about squared distances from the mean, and what I said was the squaring accomplished two things; one is that it got rid of negatives, because a negative value that’s squared becomes a positive value, and we also said that it exaggerates larger distances from the mean. So in the case of variance if you’re 2 points away from the mean that ends up adding 4 to the numerator or if you’re 10 points away you add 100. Well now we’re cubing so we realize this is going to be even more exaggerated. So if you’re 2 points away from the mean here you’re going to add 8 to this numerator and if you’re 10 points away from the mean you’re going to add 1000. So what we’re doing is we’re weighting these scores that are very far from the mean very strongly, because those are going to play a strong role in the amount of skew. When you have a score that’s very far from the mean it’s pulling in that direction and we’re trying to figure out the overall pull on the mean in either direction, positive or negative.

And the other thing that we’ll notice is when we’re cubing we bring back negatives, right? That’s how we can tell if the pull is happening more in the positive direction or more in the negative direction, because when you cube a negative number you’re going to get a negative result. And so that’s what’s going to allow us to see the direction of the skew.

So now let's look at the first half where we have this sample size adjustment, what's going on here? If we look in the numerator we have n * (n - 1), and we could say that that's n^2 - n, and then we're taking the square root of that. So that means we're going to end up with a value here that's close to the value of n, but it's always going to be a bit smaller than n. And then we have the denominator where we have n minus 2. Now again, this is very close to the value of n but it's always going to be a little bit smaller. And so what that means is, with these two parts together, as we get a larger and larger value of n, this section here is going to get closer and closer to a value of 1, right? So we say that it approaches unity; that means it gets closer and closer to a value of 1, although it will never actually get there.

Okay, so what does this mean in terms of thinking about our skew? Well, the adjustment that we’re making, if we have a very small sample size, we need to make a larger adjustment. This is just like we saw with variance; if you have a small sample then you’re probably not capturing the variation that might exist in the population, right? Because you only have a small number of scores and you’re unlikely to get some of the extreme values in that small sample. The same is true for skew, right? If we have a small sample there’s probably more skew in the population than there is in our sample, because our sample is unlikely to have those rare extreme scores, and those contribute a lot to the skew.

So if we have a small sample we have to increase our estimate. If we have a very small value for n, like 5, then this part is going to end up being equal to almost 1 and a half, about 1.49. Whereas if we have a very large value for n then this is going to be much closer to a value of 1. And so what that means is when you have a large enough sample you don’t need to make very much adjustment. The estimate for the standardized third moment that you have in this half of the equation is probably pretty close if you have a large sample size. But if you have a smaller sample size you need to increase your overall estimate.
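As a quick check of that behavior, here's the adjustment factor computed for a few sample sizes (a short sketch; the function name is my own):

```python
import math

def adjustment(n):
    """Sample-size adjustment factor: sqrt(n * (n - 1)) / (n - 2)."""
    return math.sqrt(n * (n - 1)) / (n - 2)

# Small samples get a large correction; large samples barely change.
for n in (5, 30, 1000):
    print(n, round(adjustment(n), 4))  # n = 5 gives about 1.49; larger n approaches 1
```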

Okay so hopefully that gives you a sense of what this equation is doing. I should note that if you look this up in the documentation for Excel or something like this to see how it’s calculating skew, you’re not going to see this version. Instead you’re going to see a slightly altered version of this but it is an equivalent formula; it gives you the same result. It’s just written a little bit differently.

So this is the version that you’ll see if you look in the documentation for Excel or some other software programs and it’s essentially doing the same thing. So I thought I’d just point out what looks slightly different. You’ll notice that this part here looks a little bit different, and you’ll notice that we’re not dividing by n in the numerator here on the right hand side of the equation, and that’s related to why this part looks a little bit different. And the most important thing to note is just that we have the standard deviation here cubed, right? Written as just a standard deviation rather than this more complex thing we had before. So what’s going on here?

The key point is that this estimate of the standard deviation here is using the unbiased variance, and that means it's using Bessel's correction: it's using n - 1 when it calculates the variance. So when you put this into Excel and Excel finds the standard deviation, it does it using n - 1, and that changes the adjustment that we need to make over here. Whereas in this first version of the equation we didn't adjust until we got over here. So that's why these equations look slightly different, but they're doing the exact same thing; they're finding this standardized 3rd moment and they're adjusting it based on our sample size.
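Here's a sketch of both forms side by side: the adjusted-third-moment version from earlier, and a rearranged form matching the one Excel's documentation gives for SKEW, n / ((n - 1)(n - 2)) times the sum of cubed z-scores with Bessel's correction. The function names are my own, and the two agree to floating-point precision:

```python
import math
import statistics

def skew_adjusted(scores):
    """Adjustment factor times the standardized third moment (biased variance)."""
    n = len(scores)
    mean = sum(scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n      # biased variance (divides by n)
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

def skew_excel_style(scores):
    """Rearranged form: n / ((n - 1)(n - 2)) * sum of cubed z-scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)                        # Bessel-corrected SD (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in scores)

data = [2, 3, 3, 4, 4, 5, 12]
print(skew_adjusted(data), skew_excel_style(data))  # the two forms give the same result
```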

So now you're probably wondering, what does this value for skew mean? I've calculated a coefficient for skew, how do I interpret it? So now we're going to look at some general guidelines for interpreting skew, and it's important to remember that these are just general considerations, not the kind of detailed analysis of a skew coefficient that you might want to do in some situations.

Once we’ve calculated our skew coefficient it’s going to tell us a few things. The first thing it will tell us is whether we have any skew or not. So if we get a result of 0 then that would mean that we have no skew. But if we get a positive or a negative result then that’s going to tell us the direction of skew. And then the value of the coefficient will tell us something about the strength of the skew. And so sort of a rough way of thinking about these coefficient values is that if we get a value between about -0.5 and +0.5 then we can say that means that our distribution is approximately symmetric. If we get a value between -1 and -0.5 or +0.5 and +1 then we might say that our distribution is moderately skewed. And if we get values that are below negative 1 or above positive 1 then we might say that our distribution is highly skewed.
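Those rough bands could be encoded in a small helper like this (my own naming; the cutoffs are approximate guidelines, not hard rules, and how exact boundary values are classified is arbitrary):

```python
def interpret_skew(coefficient):
    """Rough interpretation bands for a skew coefficient (approximate guidelines only)."""
    magnitude = abs(coefficient)
    if magnitude < 0.5:
        return "approximately symmetric"
    if magnitude <= 1:
        return "moderately skewed"
    return "highly skewed"

print(interpret_skew(0.2))   # approximately symmetric
print(interpret_skew(-0.8))  # moderately skewed
print(interpret_skew(1.6))   # highly skewed
```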

So this gives us a general sense of the strength of our skew, but there's a few caveats we want to keep in mind. The first is that skew is just one aspect of normality. So skew tells us if a distribution is symmetrical but it doesn't tell us if it's normal, because in order to be a normal distribution it also needs to be bell-shaped. So it's possible to have a distribution that is perfectly symmetrical but it's not a normal distribution. And so in order to look at the shape of the curve and the shape of the peak in relation to the tails, we'll need to look at another measure, which is kurtosis.

The other important caveat is that these are just rough guidelines and to truly interpret a skew value we actually have to do some inferential statistics and start thinking about the probability of getting different values. Because even if we have a perfectly normally distributed population, not every sample that we draw from that is going to have a skew of 0. And so we have to start thinking about the probability of getting a particular skew value from that population. And so we haven’t covered how this type of probability works yet. We are going to cover it in the future, mostly looking at other things like thinking about the probability of getting a particular mean for a sample, but we could apply exactly the same logic to the calculation of a skew coefficient, although you’ll find in most introductory textbooks there won’t be a table for looking up the probability of different skew values based on different sample sizes. However, if you wanted to do that there would be ways of interpreting skew at that level of detail.

So with all this in mind you might be wondering how much does skew really matter? So you’ve collected some data, you’ve calculated a skew coefficient, maybe you see that your sample is highly skewed or moderately skewed, and you want to know how much that matters. Unfortunately I can’t give you a definitive answer to that. What I can say is that it’s going to depend on the situation; it’s going to depend on the type of data that you have, it’s going to depend on your sample size, and it’s going to depend on what kind of analysis you want to do. So rather than trying to go through the range of possible situations where skew may or may not matter, what we’ll say is as we introduce some tests in the future we’ll just make a note of times where skew might be an important consideration.

And that said, even if we have a highly skewed sample we’ll see that there’s other ways of getting around this. So we might decide to remove some outliers from our data, we might trim the data, we might decide to transform the data so that there’s less of an influence of the skew on some of our calculations. And so all these are ways of getting around a highly skewed sample and still being able to do some of the analyses that we want to do.

A final consideration here is to always remember to think about our sample size. So let’s say that I have a small sample and it turns out to be highly skewed. Now I might be worried about this because I see that it’s highly skewed, but I should probably be more concerned with the fact that I have a very small sample, because if I have a small sample then our estimates of skew tend not to be very good. And so if I have a small sample and I don’t find any evidence of skew that’s actually not all that reassuring because my assessment of skew isn’t very good with a small sample.

In contrast, if I have a very large sample and it turns out to be skewed then that actually might not be much of a problem. And the reason it might not be much of a problem is because I have a very large sample size. And what we’ll see is that’s going to make up for some of the problems that skew might introduce. So we have this strange situation where in order for a test for skew to be accurate we need a large sample size, and once we have a very large sample size then actually we don’t really need the information that the test provides; skew isn’t necessarily all that relevant. And we’re going to see this type of situation again for some other tests in the future.

So that’s two formulas for calculating skew, I hope that both of these made sense to you. Let me know in the comments if there were areas that were particularly helpful or if there’s areas that you still have questions about and I’ll try my best to answer them. Make sure to like and subscribe and don’t forget to check out the hundreds of other psychology tutorials that I have on the channel. As always, thanks for watching!
