The Logic of Null Hypothesis Significance Testing

In this video I explain the logic of null hypothesis significance testing and how researchers use a null hypothesis and a comparison distribution or sampling distribution in order to assess the probability of a particular test statistic or calculation.

Video Transcript

In this video we’re going to cover a fundamental concept in the approach known as null hypothesis significance testing: the generation of research and null hypotheses. We’ll see why the null hypothesis is so important for understanding the logic of many different statistical tests, and we’ll also see how it helps researchers to draw conclusions from their data. We’ll start by looking at these two hypotheses and see how they serve different purposes in the research process. First we’ll consider the research hypothesis.


The research hypothesis, also called the experimental hypothesis or the alternative hypothesis and often abbreviated H1, is a prediction about the relationship between the groups or conditions in a study and a particular outcome. When designing a study, researchers may have a general theoretical explanation that they want to investigate, but they need to turn this into a testable situation in which they can make a prediction about what will happen if that theoretical explanation is correct. So the research hypothesis is a prediction about how the intervention, manipulation, or difference between conditions in a study will relate to the measurement of some outcome. Researchers might predict that one group will perform better than another, or worse than another, or that they’ll observe some difference between certain conditions in the study.

In addition to the research hypothesis, researchers must also generate a null hypothesis, often abbreviated H0. Null comes from the Latin nullus, meaning none or not at all, and what the null hypothesis always predicts is that no difference will be found between the groups or conditions in a study, or that any difference that is found is simply the result of random chance. So now we have these two hypotheses and they’re directly opposed to each other: the research hypothesis predicts that a difference will be found, while the null hypothesis proposes that no difference will be found. What we can do is determine whether our data fit better with one of these hypotheses.


So how do we determine if our data fit better with the research hypothesis or with the null hypothesis? This is where we turn to what’s called a comparison distribution or you may see this called a sampling distribution. The comparison distribution or the sampling distribution is the distribution that our calculation or test statistic comes from. We have some calculation which is based on a sample from the population and we have to compare that to the distribution of possible values we might have gotten when taking a sample like that from the population. So if we’re looking at a mean score for 50 participants, we need to find the comparison distribution of the mean, or the sampling distribution of the mean, which would be the distribution of all possible means that we might get when taking a sample of 50 from the population. By estimating the population and then estimating the relevant sampling distribution for that population, we can now estimate the probability that we would get that calculation for that group or condition.
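As a concrete, purely illustrative sketch of that idea in Python: assume a hypothetical population of scores with a mean of 100 and a standard deviation of 15, and build the sampling distribution of the mean for samples of 50 by simulation. The specific numbers here are assumptions made up for the example.

import numpy as np

# Purely illustrative population: assume scores have mean 100 and SD 15.
rng = np.random.default_rng(0)
pop_mean, pop_sd, n = 100, 15, 50

# Build an approximate sampling distribution of the mean by repeatedly
# drawing samples of 50 from this population and recording each sample's mean.
sample_means = np.array([rng.normal(pop_mean, pop_sd, size=n).mean()
                         for _ in range(10_000)])

print(sample_means.mean())   # close to the population mean, 100
print(sample_means.std())    # close to 15 / sqrt(50), about 2.12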

And this is where the null hypothesis allows us to sidestep an annoying problem: we don’t know what the comparison distribution looks like when the research hypothesis is true. In other words, we don’t know in advance what the effect looks like in the population, so we can’t really compare our data to that. The comparison distribution when the research hypothesis is true is an unknown distribution. If we already knew exactly what to expect when someone takes a particular drug, receives some intervention, or experiences some manipulation, then we wouldn’t really need to do the study in the first place; that’s exactly what we’re trying to figure out.

So instead we turn to the null hypothesis and ask: what would the comparison distribution look like if the null hypothesis is true? And this is something that we can actually estimate, because a population that receives an intervention with no effect will look the same as it did before. If an intervention has no effect, then the population won’t change; it will look just like it did before the intervention. So now we can estimate what the population would look like if there’s no effect, and we can compare our data to that comparison distribution. Then we can ask: how likely is it that we’d get data like this if the null is true, if our data is coming from that comparison distribution?

If we look at our data after an intervention and compare it to a comparison distribution where the null is true, where there’s no effect, and it’s highly probable that our data came from that distribution, then that suggests that the null might be true. There might not be any effect. In that case we have failed to reject the null hypothesis: the null hypothesis is a sufficient explanation for what we’ve observed. The distribution shows no effect and it looks like our data came from that distribution.


On the other hand, if it’s highly unlikely that this set of data came from the comparison distribution where the null is true, that suggests that maybe the null isn’t true; maybe there’s a different explanation. This means that we can reject the null hypothesis. We can say that if there’s no difference, if there’s no effect, it would be very unlikely to get data like what we observed, and that means there might be an alternative explanation. Maybe our data is coming from a different distribution, one where the null isn’t true and where perhaps the research hypothesis is true, or some other explanation might be needed to explain why our data is so extreme, so different from the distribution where the null is true.


This use of a null hypothesis and a comparison distribution in which there’s no effect is the basis of what’s known as null hypothesis significance testing, and it’s foundational for understanding a lot of different statistical analyses. What researchers do is assume that the null hypothesis is true and then aim their analyses at rejecting that assumption. Only when that assumption becomes statistically improbable can we consider other alternatives. This approach is also sometimes called a frequentist approach, because what we’re doing is thinking about the frequency of observing data at least as extreme as ours when the null hypothesis is true. Only when that frequency is sufficiently low can we reject the null hypothesis and consider that there might be other explanations.
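To make that idea of “the frequency of observing data at least as extreme as ours” concrete, here is a small illustrative Python sketch. The null-population values (mean 100, SD 15), the sample size of 50, and the observed sample mean of 105 are all invented for the example.

import numpy as np

rng = np.random.default_rng(0)
pop_mean, pop_sd, n = 100, 15, 50    # assumed null-population values
observed_mean = 105.0                # hypothetical mean from our study sample

# Sampling distribution of the mean when the null is true (no effect):
# 10,000 simulated samples of 50 drawn from the unchanged population.
null_means = rng.normal(pop_mean, pop_sd, size=(10_000, n)).mean(axis=1)

# Frequentist p-value: how often does a sample mean under the null land at
# least as far from the population mean as the mean we actually observed?
p_value = np.mean(np.abs(null_means - pop_mean) >= abs(observed_mean - pop_mean))
print(p_value)   # a small value means data like ours are rare if the null is true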

Now this might seem a bit backwards, but it’s actually fundamental for understanding how science works. The reason for this is not just that scientists like to be skeptical; it’s because we can never directly prove a particular hypothesis. We can never say for sure that the research hypothesis is definitely true, so aiming our analyses at doing that would be a waste of time. Instead we focus on rejecting the null hypothesis and saying that the null hypothesis is an unlikely explanation for what we’ve observed. So it’s important to remember that in either case we’ll never be able to say whether the research hypothesis or the null hypothesis is definitely true. All we can say is whether the null hypothesis is likely or unlikely. If we fail to reject the null hypothesis, that doesn’t mean that it’s true; it could still be the case that there is an effect, and it’s just too hard to separate it from random chance. It could also be the case, if we reject the null hypothesis, that the null hypothesis is actually still true: we observed something very unlikely, but it was just a coincidence, still just random chance, and we rejected the null hypothesis even though it was true.

So the bad news is that we can’t ever say for sure whether the research hypothesis or the null hypothesis is actually true. Truth may be our aim but it’s not something that we can find in a single set of data or even in multiple sets of data. All we can really say is whether it’s likely or unlikely that the null hypothesis is true.

But the good news is once you get used to this sort of backwards way of thinking you’ll realize that nearly all statistical tests are doing the same thing. It’s easy to get overwhelmed by formulas and calculations, but what you should realize is we’re always following the same logic. So we assume the null hypothesis is true, then we have to calculate what the comparison distribution looks like when the null is true, and then we have to estimate the probability that our data came from that distribution. That’s always what we’re doing. And the different calculations and different formulas are really just for getting us to the appropriate comparison distribution. So different measurements will require different comparison distributions and that means we’ll have to do some different calculations in order to get there.
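That same three-step logic can be written as one generic recipe. The sketch below is only an illustration under simplifying assumptions: null_sampler stands for any function that generates a dataset as it would look if the null were true, statistic is whatever calculation the test uses, and the statistic is assumed to be centered on zero when the null is true (as a difference or change score would be).

import numpy as np

def nhst_by_simulation(observed_stat, statistic, null_sampler, reps=10_000, seed=0):
    # 1) Assume the null is true and simulate the comparison distribution.
    rng = np.random.default_rng(seed)
    null_stats = np.array([statistic(null_sampler(rng)) for _ in range(reps)])
    # 2) Estimate the probability of a result at least as extreme as ours
    #    (two-tailed, assuming the statistic is centered on zero under the null).
    p_value = np.mean(np.abs(null_stats) >= abs(observed_stat))
    return null_stats, p_value

# Example: the mean change score for 50 people, where the null says the
# average change is 0 (all values here are hypothetical).
_, p = nhst_by_simulation(observed_stat=0.35,
                          statistic=np.mean,
                          null_sampler=lambda rng: rng.normal(0, 1, size=50))
print(p)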

So if we wanted to know the probability of randomly selecting a single score from the population that falls within a particular range, then we would use the probabilities from the comparison distribution of individual scores in the population. But if we wanted to know the probability of getting a particular mean or average score for a group of a particular size, then we’d need to calculate the probabilities for the distribution of means for groups of that size. If we took a measurement, then we had some intervention, then we measured again and we wanted to know if the change we observed was likely due to chance, then we’d need to make a comparison to a distribution of how much change could be expected by random chance when remeasuring. Or if we looked at the difference between the means of two separate groups, we would need to compare that difference to the distribution of differences between means of groups that might occur by chance.
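As an illustration of how those four situations map onto different comparison distributions, here is a sketch using scipy.stats. All of the data and population values below are invented for the example; the point is only that each situation calls for a different distribution while the logic stays the same.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# 1) A single score: probability that one score from an assumed population
#    (mean 100, SD 15) falls between 90 and 110.
p_range = stats.norm.cdf(110, loc=100, scale=15) - stats.norm.cdf(90, loc=100, scale=15)

# 2) A group mean: compare a sample mean for 50 people to the distribution
#    of means, whose spread is the standard error 15 / sqrt(50).
group = rng.normal(103, 15, size=50)        # hypothetical group scores
z = (group.mean() - 100) / (15 / np.sqrt(50))
p_mean = 2 * stats.norm.sf(abs(z))          # two-tailed probability

# 3) Measure, intervene, remeasure: a paired t-test compares the observed
#    change to the change expected by chance when remeasuring.
before = rng.normal(100, 15, size=30)
after = before + rng.normal(2, 5, size=30)  # hypothetical change scores
t_paired, p_paired = stats.ttest_rel(after, before)

# 4) Two separate groups: an independent-samples t-test compares the observed
#    difference between means to the distribution of such differences by chance.
control = rng.normal(100, 15, size=40)
treatment = rng.normal(105, 15, size=40)
t_ind, p_ind = stats.ttest_ind(treatment, control)

print(p_range, p_mean, p_paired, p_ind)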

The underlying logic is the same in all of these cases, and this idea of testing the null hypothesis forces us to consider: what does the appropriate comparison distribution look like? What does the distribution look like when the null is true? We have to figure that out, and then we can estimate the probabilities and decide whether or not we can reject the null hypothesis. So to review these concepts, I have a few questions that you’ll hopefully now have a better sense of how to answer. If you can answer all of these questions, then you have a good sense of the logic of null hypothesis significance testing and you’ll be ready to look at this approach in more detail.


Why do researchers generate a null hypothesis?

What’s the default assumption when thinking about the null and research hypotheses?

What is a comparison distribution or a sampling distribution?

When using a null hypothesis significance testing approach and drawing a conclusion from data, what are the two options that researchers have?

What does it mean if researchers fail to reject the null hypothesis? What does it mean if researchers reject the null hypothesis?


If you aren’t sure about the answers to some of these you might want to review certain sections of the video or feel free to ask in the comments and I’ll try my best to help you out. (*suggested answers below)


So in order to reject the null hypothesis we’ve said that we have to have data that is unlikely to have come from the comparison distribution in which the null is true. So now the question arises, well, just how unlikely does our data have to be in order to reject the null hypothesis? And this brings us to what we’ll look at in the next video, which is the concept of statistical significance and the calculation of probability values or p-values. Let me know in the comments if this was helpful for you, feel free to ask other questions that you still have, make sure to like and subscribe, and check out the hundreds of other psychology and statistics tutorials that I have on the channel.

Thanks for watching!

*Sample Answers to Review Questions

Why do researchers generate a null hypothesis?

Researchers generate a null hypothesis in order to compare a result to a distribution in which there is no difference between conditions, or where the manipulation or intervention has not had any effect.

What’s the default assumption when thinking about the null and research hypotheses?

The default assumption is always that the null hypothesis is true. This assumption must be shown to be statistically unlikely before any other alternative hypotheses can be considered.

What is a comparison distribution or a sampling distribution?

A comparison distribution or sampling distribution is the distribution of possible values for a calculation or test statistic that is created using estimates for the population when the null hypothesis is true, or when there is no effect. This allows for the estimation of the probability of getting a particular result for that test statistic if the null hypothesis is true.

When using an NHST approach and drawing a conclusion from data, what are the two options that researchers have?

Researchers can either reject the null hypothesis or fail to reject the null hypothesis.

What does it mean if researchers fail to reject the null hypothesis?

If researchers fail to reject the null hypothesis, this means that the test statistic result is likely to have come from the comparison distribution in which the null is true. This implies that the null hypothesis is sufficient for explaining the observed result and that other explanations are not needed. This does not mean that the null hypothesis is definitely true (or that the research hypothesis is not true), but that the null being true is a likely explanation for what was observed.

What does it mean if researchers reject the null hypothesis?

If researchers reject the null hypothesis, this means that the test statistic result is unlikely to have come from the comparison distribution in which the null is true. This doesn’t mean that the null isn’t true (or that the research hypothesis is true), but it implies the possibility that the test statistic result might be coming from a different distribution, perhaps one where the research hypothesis (or some other alternative) might provide a more likely explanation for what was observed.
