Real and Illusory Correlations, Scatterplots, and Causation

In this video I explain the dangers of illusory correlations and confirmation bias, correlational studies, patterns of variation, r-values, scatterplots and conclusions about causation.

Try guessing the r-value from a scatterplot at www.guessthecorrelation.com

Don’t forget to subscribe to the channel to see future videos! Have questions or topics you’d like to see covered in a future video? Let me know by commenting or sending me an email!

Need more explanation? Check out my full psychology guide: Master Introductory Psychology: http://amzn.to/2eTqm5s

Video transcript:

Hi, I’m Michael Corayer and this is Psych Exam Review.

In this video I’m going to talk about correlations. Now we want to know how the world works, and that means we want to know how things are related to one another. What are the relationships between things?

And the problem is that when we just look for relationships we might find relationships that aren’t really there. This is known as an illusory correlation. So an illusory correlation is a false pattern. So we happen to notice two things and we think that they’re related to one another when in fact they aren’t.

So let’s say I take my psychology exam and I get a good score and I notice that I’m wearing this blue shirt. Hmmm. I wonder if these two are related. You know, now that I think of it, last week I took a math exam and I was also wearing this shirt and I also got a good score. Maybe there is something here, maybe this is my lucky shirt.

This is an illusory correlation, it’s a false pattern. The problem with this false pattern is that once I’ve noticed it, I might become more and more convinced of it. This is a bias known as confirmation bias.

Confirmation bias is the idea that when we think we have a pattern, we start looking for examples of it, we seek confirmation. So we start seeking out confirming evidence. I’m going to start thinking of other times I was wearing this shirt and I did well on an exam. That’s going to make me more convinced that this pattern is real.

And along the way I’m going to start disregarding things that don’t fit. So I’m going to disregard contradictory evidence. I’m going to say “you know what, I was wearing this blue shirt when I wrote my English paper and I got a good grade on the paper, there’s really something here”.

And I’m going to disregard things that don’t fit. So I say “ok, well actually for my physics test I was wearing a red shirt and I got a perfect score on that test. But you know what, that physics test was so easy that it doesn’t matter what I was wearing. Lucky shirt? I didn’t need the lucky shirt that day, so that doesn’t count”.

Or I say “You know I took my history exam and I was wearing my blue shirt but I got a D. Hmmm. Well, you know, it’s a great thing I was wearing my lucky shirt that day, otherwise I obviously would have failed. You know, I needed all the luck I could get that day. It’s a good thing I had my lucky shirt on”.

So this would be an example of an illusory correlation which becomes stronger based on confirmation bias. Now this is a silly example, a lucky shirt and exam performance, and you probably don’t believe this sort of thing. But illusory correlations and confirmation bias become a lot more dangerous when we start making assumptions and we start seeing false patterns in people’s behavior based on their gender or their race or some other trait.

So we can’t trust our own minds to find correlations. Our minds are prone to error and they’re prone to bias so we need to have a better way of ensuring that a relationship is real. And to do that we do a correlational study.

So what do we do in a correlational study? What we do is we measure our two variables and then we look for a pattern of variation. This is just saying that we look for a relationship between the variables: as one variable changes, does the other variable also change in some predictable way?

That’s a pattern of variation. And we don’t just want to know that there is a pattern, we also want to know the strength of this pattern. For that we need to calculate a correlation coefficient and this is gonna tell us the strength of the relationship.

So what is a correlation coefficient? A correlation coefficient is an r-value, so we represent this with a lower case r, and the r-value can range from -1 all the way up to +1.

So it’s going to be somewhere between these two extremes, and how close it is to either extreme tells us the strength of the relationship. The sign tells us the direction. In a positive correlation, as one variable increases the other variable also increases. And in the case of a negative correlation, as one variable increases the other variable decreases.

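So the idea of correlation was introduced by Sir Francis Galton, who was a statistician and who is considered the father of psychometrics, which we’ll talk about in the section on intelligence and also personality. The coefficient we use today was later formalized by Karl Pearson, which is why it’s often called Pearson’s r.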

The calculation of correlation is a little bit complicated so we’re not going to go through and do this by hand but to give you an idea of what this r-value means we’re going to look at some scatter plots.

So a scatter plot is just a graph showing our two variables. So we have the X value and we have the Y value. So let’s say I go out and I measure people for two variables, X and Y. Each dot here would be one person’s measurements: here’s their X score, here’s their Y score. So I go through and let’s say I get something like this. Now here we can see that this is a positive correlation, right? The r-value is going to be positive here because as X increases, Y also increases.

That means it’s a positive correlation. And here the data is very neat, it falls perfectly on a line. So that means if I know the X value then I can perfectly predict the Y value.

In this case, that means we’d have an r-value of positive one because it’s perfect, it’s as positive as we get here, it’s very predictable.

Now in real life data isn’t often going to look this way. Instead what’s gonna happen is the data is going to get a little bit spread out, we’re going to have some exceptions so we’re not going to be at positive one anymore.

So let’s say we get something a little bit more like this. So OK, if I know the X value I can predict the Y value, but not perfectly. Not anymore. Now I can still see it’s a positive correlation, I can still see there’s definitely a relationship between these two things, but it’s not a perfect relationship. I can’t predict perfectly. So in this case we might have something like r equals positive 0.8.

Now as we get more and more spread out data, let’s say I look at my graph and I have something like this. People are all over the place on the X and Y values. Well, now if I know the X value I can’t predict the Y value at all, it could be anything, and if I know the Y value, I can’t predict the X value at all. In this case the r-value is gonna be zero. So as the data get more and more spread out away from a line, the r-value gets closer and closer to zero.

And then as we get into negative values, this is where we start seeing the relationship where as one variable increases the other variable decreases. So if I get into negative r-values, let’s say I have something like this. We can see there is a pattern here, it is moving downward in a sort of predictable way, but it’s not very close to a line. So in this case we might say this is probably an r-value of maybe -0.6.
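If you’d like to see these four pictures as numbers, here is a rough Python sketch. It is only an illustration: the data are randomly generated rather than taken from any real study, the helper names `r_value` and `with_target_r` are made up for this example, and it assumes NumPy is installed. Because the data are random, the r-values it prints will land near the targets (+1, +0.8, 0, -0.6) rather than exactly on them.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable
n = 500                              # number of simulated "people"
x = rng.normal(size=n)               # made-up X scores
noise = rng.normal(size=n)           # random scatter, unrelated to X


def r_value(a, b):
    """Pearson r between two equal-length arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def with_target_r(x, noise, target):
    """Mix X with unrelated noise so that r comes out near `target`."""
    return target * x + np.sqrt(1 - target**2) * noise


perfect = 2 * x + 1                            # every dot exactly on a line
strong_pos = with_target_r(x, noise, 0.8)      # a bit spread out
unrelated = rng.normal(size=n)                 # dots all over the place
moderate_neg = with_target_r(x, noise, -0.6)   # sloping downward, loosely

for name, y in [
    ("perfect line", perfect),
    ("some spread", strong_pos),
    ("no pattern", unrelated),
    ("downward, loose", moderate_neg),
]:
    print(f"{name}: r = {r_value(x, y):+.2f}")
```

Changing the seed or the number of people will shift the printed values a little, which is a nice reminder that an r-value from a sample is itself an estimate.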

Now if you want to practice this, get a feel for different correlations, you can go to www.guessthecorrelation.com and I’ve put a link in the video description and on that site there’s a game where they show you a scatter plot and you try to guess what the r-value is.

I said that calculating the r-value is rather complicated, so I’m not gonna go into how you actually do this by hand, but if you’re curious I’ll give a brief explanation of what the r-value is and where it comes from. If this doesn’t make any sense to you, don’t worry about it, you won’t have to calculate it by hand.

So for those of you that are interested, the way that the r-value is actually calculated is this. You go through all of the X values in your data and you calculate a mean and a standard deviation for them, then you do the same thing for all of your Y values. Then you look at each point on the graph and you ask how far its X value is from the mean of X, measured in standard deviations, and you do the same for its Y value. That gives you two values, which you then multiply. You do that for every single point on your graph, you add up all of those products, and you divide by the number of dots that you have on your graph minus 1. That’s how you get the r-value.

As you can see, that’s a very drawn-out process. It’s very tedious and most people aren’t going to be doing this by hand. Usually they’ll use statistical software where they just plug in their data and it just spits out the r-value.

But if you were curious about where it comes from, that’s how it’s actually calculated.
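For anyone who wants to see those steps written out, here is a minimal Python sketch of the calculation. It assumes the sample standard deviation (the one that divides by the number of points minus 1), and the function names and the tiny data set are invented just for this illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    """r-value computed step by step, as described in the transcript."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 points")
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable never changes")
    # For each point: how many standard deviations is X from its mean,
    # how many is Y from its mean, then multiply the two.
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    # Add up the products and divide by the number of points minus 1.
    return sum(products) / (len(xs) - 1)


# Made-up example data: five people measured on X and Y.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
print(round(pearson_r(x_scores, y_scores), 3))  # prints 0.775
```

In practice, statistical software does exactly this kind of arithmetic for you, just on much larger data sets.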

OK so you’ve probably heard this expression that correlation is not causation. So what does this mean? What this means is that a correlation doesn’t tell us about causation. It just tells us that there’s a relationship between these two variables but it doesn’t tell us what kind of relationship.

Now if you have causation then those things will be correlated. So causation implies correlation. You can say that because if one thing causes another you will find a correlation between them. But just because you found a correlation doesn’t mean that one causes the other. So let’s look at one of these examples. Let’s say we have this r = +0.8, so we look at this graph and we say ok, as X increases Y increases in a predictable manner, so what could explain this? Well, it’s possible that X causes Y. That as the X value increases, that is causing the Y value to increase.

Well, you say, all we have is measurements here, we don’t know for sure, it could be the other way around. It could be that as the Y value increases, that causes the X value to increase. And that’s just as plausible. Again, all we’ve done is measure. All we know is there’s a relationship, we don’t know how the relationship works, and so we don’t know which of these it is.

In fact, there could be a third possibility, and that is that Z causes X and Y. What does that mean? Mike, what is this Z? I only have X and Y here, where did this Z come from? That’s the point, the point of Z is that Z is something you didn’t measure.
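To make this concrete, here is a small simulated sketch in Python. Everything in it is invented for illustration: Z is a random variable, and X and Y are each built from Z plus their own random noise, so neither one has any effect on the other. It assumes NumPy is installed. Subtracting Z directly only works here because the simulation was set up that way; with real data you would need proper statistical controls.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so the sketch is repeatable
n = 1000

z = rng.normal(size=n)               # the variable nobody measured
x = z + 0.5 * rng.normal(size=n)     # X depends on Z, not on Y
y = z + 0.5 * rng.normal(size=n)     # Y depends on Z, not on X

# X and Y look strongly related even though neither causes the other.
r_xy = float(np.corrcoef(x, y)[0, 1])

# Take Z's contribution back out and look at what is left over.
x_leftover = x - z
y_leftover = y - z
r_leftover = float(np.corrcoef(x_leftover, y_leftover)[0, 1])

print(f"r between X and Y:              {r_xy:+.2f}")
print(f"r after removing Z's influence: {r_leftover:+.2f}")
```

The first r-value comes out strongly positive and the second comes out close to zero, so a study that only measured X and Y would see a clear correlation with no way of knowing that Z was behind it.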

This Z is what we call a third variable, something that was not in the study. We only measured two things, and it could be the case that some third thing that we didn’t even study is causing both X and Y to increase in this case. This is a very important idea. In fact, this idea of a third variable is going to be so important that it’s going to get its own video. So that’s what I’ll talk about next.

I hope you found this helpful. If so, please like the video and subscribe to the channel for more.

Thanks for watching!
