Statistical significance is an important concept for understanding when conclusions can (or cannot) be drawn from psychological research.
Significance can be calculated in a number of different ways depending on the type of data we have collected, and the calculations are based on the number of participants in our sample as well as the effect size, or how large the difference is between our experimental group and our control group.

For example, suppose I claimed to have developed a smart drug, randomly gave one student the drug and another a placebo, and then told you that the student who took the drug scored 95 on an exam while the placebo student scored 80. You might be intrigued, but you would also realize that with only two students there is a good chance the naturally stronger student just happened to receive the drug. For this reason, we generally want as many participants as possible, which reduces the role of these kinds of coincidences and lets us be more confident in our conclusions. This is the Law of Large Numbers at work: as a sample grows, its average gets closer and closer to the true average of the population, so chance differences matter less and less.
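To see the Law of Large Numbers in action, here is a quick Python sketch (my own illustration, not a calculation from any actual study; the population average of 80 and standard deviation of 10 are made-up numbers). It draws exam scores at random and shows that the average of a larger sample typically lands much closer to the true population average than the average of a small one.

```python
import random
import statistics

random.seed(1)

TRUE_MEAN = 80   # made-up population average exam score
TRUE_SD = 10     # made-up spread of individual scores

def sample_average(n):
    """Average exam score of n randomly drawn students."""
    return statistics.mean(random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n))

for n in (2, 10, 100, 1000):
    # Repeat the sampling many times to see how far a sample average
    # of this size typically strays from the true average of 80.
    typical_error = statistics.mean(
        abs(sample_average(n) - TRUE_MEAN) for _ in range(1000)
    )
    print(f"sample size {n:>4}: average is typically off by {typical_error:.2f} points")
```

The bigger the sample, the smaller the typical gap between the sample average and the truth, which is why a two-person comparison tells us almost nothing.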
Similarly, if I randomly assigned 100 students to take the drug and 100 students to take a placebo, and found that the experimental group’s average exam score was 87 while the control group’s average was 86, you should still be skeptical, even though technically I found a difference. The problem here is that the effect size is too small. A difference of just 1 point could easily be a coincidence: if we take the average score of 100 random students and compare it to the average of another random 100 students, we won’t get exactly the same averages every time anyway. So a 1-point gap is not convincing evidence that the drug is having an effect.
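Here is a rough way to check that intuition (again, just an illustrative sketch; the population average of 85 and standard deviation of 10 are invented for the example). The simulation repeatedly draws two groups of 100 students from the same population, so the drug is doing nothing at all, and counts how often their averages still differ by a point or more purely by chance.

```python
import random
import statistics

random.seed(42)

POP_MEAN = 85    # made-up population average (no drug effect at all)
POP_SD = 10      # made-up spread of exam scores
TRIALS = 10_000

def group_average(n=100):
    """Average score of n students drawn from the same population."""
    return statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))

# How often do two groups from the SAME population differ by >= 1 point?
big_gaps = sum(abs(group_average() - group_average()) >= 1 for _ in range(TRIALS))
print(f"Chance of a 1-point gap with no real effect: {big_gaps / TRIALS:.0%}")
```

With these made-up numbers, a gap that size shows up close to half the time, so a 1-point difference on its own shouldn’t convince anyone.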
In calculating significance we come up with a p-value. You can think of a p-value as telling you how likely your data would be to occur by chance alone. We want to collect data that is unlikely to “just happen” on its own. For example, imagine I told you that I could mentally control a fair coin so it always lands on heads, and you want to test this. In testing me, you wouldn’t be satisfied with a single coin flip landing on heads, because you know a single heads is fairly likely to happen anyway, so the p-value would be high. If you flipped the coin 1,000 times and it came up heads every time, this would be very unlikely to occur on its own, so you might start thinking that this wasn’t just chance, and in this case the p-value would be low.
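For the coin example the relevant probability is easy to work out exactly, so here is a small sketch (my own illustration) that prints the chance of a fair coin landing heads on every one of n flips:

```python
from fractions import Fraction

def all_heads_probability(n):
    """Probability that a fair coin lands heads on every one of n flips."""
    return Fraction(1, 2) ** n

for n in (1, 5, 10, 1000):
    print(f"{n:>4} flips, all heads: p = {float(all_heads_probability(n)):.3g}")
```

A single heads has a probability of 0.5, so it proves nothing, while 1,000 heads in a row has a probability of about 1 in 10^301: astronomically unlikely, which is exactly what a low p-value is capturing here.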
Data that is unlikely to have occurred by random chance suggests that we probably have a real effect, and so a low p-value is a good thing. Usually we want a p-value of 0.05 or lower; when we get one, we say that the results are statistically significant. Because we can never completely eliminate the possibility that our data is a chance occurrence (even 1,000 identical flips could happen by chance), we will never have a p-value of 0.
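Continuing the coin sketch (again, my own illustration), the snippet below finds the shortest run of consecutive heads whose probability under a fair coin drops below the usual 0.05 cutoff:

```python
ALPHA = 0.05  # conventional significance threshold

# Find the smallest number of consecutive heads whose probability
# under a fair coin falls below the 0.05 cutoff.
n = 1
while 0.5 ** n >= ALPHA:
    n += 1
print(f"{n} heads in a row: p = {0.5 ** n} < {ALPHA}")
```

Five heads in a row already gives p = 0.03125, which would count as statistically significant; notice that the probability keeps shrinking as the streak grows but never actually reaches 0.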
If our p-value is 0.05, this means that if there were no real effect, the probability of getting data that looks like one anyway, just by chance, is 5%. To be clear, the p-value doesn’t tell us the probability that our hypothesis is correct; it tells us the odds of randomly observing data like ours if nothing but chance were at work. In the coin example, a p-value wouldn’t tell us how or why the streak of heads is occurring, but it would tell us that it’s a very, very unlikely event for a fair coin.
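One way to see what the p-value is (and is not) telling us is to simulate the “nothing but chance” scenario directly. The sketch below (my own illustration) flips a genuinely fair coin in sessions of 5 flips and counts how often a session comes up all heads anyway; the fraction lands near 0.03, matching the exact value above.

```python
import random

random.seed(7)

SESSIONS = 100_000
FLIPS = 5

# Simulate a world with no real effect: a fair coin flipped in
# sessions of 5, counting how often a session is all heads anyway.
all_heads = sum(
    all(random.random() < 0.5 for _ in range(FLIPS)) for _ in range(SESSIONS)
)
print(f"Fraction of fair-coin sessions that were all heads: {all_heads / SESSIONS:.4f}")
```

That fraction is how often data like ours would appear by chance alone; it says nothing about how or why any particular all-heads streak happened, and nothing about whether mind control is real.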
(This post was an excerpt from my book Master Introductory Psychology: Volume 1, which covers history and approaches to psychology, research methods, biological bases of behavior, and sensation and perception.)