Assessing Assessments – Types of Validity

In this video I explain how we assess the validity of assessments that have been created for a particular property (such as intelligence). There are several types of validity that can be assessed including construct validity, face validity, content validity, and criterion-related validity (concurrent and predictive). I briefly describe each of these types of validity and how they tell us about the relationship between a property, behaviors associated with that property, and assessment results.

Don’t forget to subscribe to the channel to see future videos! Have questions or topics you’d like to see covered in a future video? Let me know by commenting or sending me an email!

Check out my book, Master Introductory Psychology, an alternative to a traditional textbook: http://amzn.to/2eTqm5s

Video Transcript

Hi, I’m Michael Corayer and this is Psych Exam Review. In the previous video we talked about assessing intelligence and in this video we’re going to look at how we assess those assessments. So, in other words, we’ve created some assessment for intelligence and then we want to know, is this a valid assessment? Is this really assessing intelligence or not? How do we determine that?

So when we think about something like intelligence, remember it’s a property. It’s sort of a hypothetical idea. It’s something we think exists and then the reason that we think it exists is because there’s certain behaviors that we think are associated with it. So the reason we think that there’s something like intelligence that varies amongst people is that we see that people vary in their performance on certain types of tasks and then we create an assessment and hopefully that assessment is related to this hypothetical property of intelligence that we’re thinking about.

So if we look at these three parts here: we have some property, in this case that would be intelligence, and then we have associated behaviors. And in this case that might be things like grades in school or job performance, right? These are things that make us think intelligence exists. And then we have assessment, which would be things like IQ scores on some sort of intelligence test. So the idea here is that we think there’s some relationship between these behaviors and this property that we’ve created. We say “ok, some people do well in school, some people don’t do as well; maybe that means there’s this thing called intelligence”.

And then we say ok, well we want to assess that, so we create some sort of test that we think is related to this idea of intelligence, and then we also want to see that there’s a relationship between the behaviors that we associate with intelligence and the IQ results, the assessment results that we get from this test that we’ve created. And so we have this little triangle here of different possible relationships, and what we’re doing when we assess validity is we’re sort of testing the strength of these relationships. And since we have multiple ways to think about these relationships, that means we’re going to have multiple types of validity.

Now I’m going to give you a bunch of types, it might seem like sort of a laundry list of validity types, but don’t feel overwhelmed by it. The important point is that we’re assessing different relationships and each different name for a type of validity gives you a more specific way to talk about it. That way when you’re talking about the validity you don’t just say this test isn’t valid. You say “well why exactly isn’t it valid? Is it that it’s not really associated with the property, or does it not allow us to learn things about the associated behaviors?”. And so each type of validity will tell you exactly why it is or is not valid.

So let’s take a look at some of these different types of validity. The first one is one that I talked about in the research methods unit and this is construct validity. This is sort of a fundamental type of validity. If you don’t have this then there’s really no point to doing your assessment. So what does construct validity refer to? Well this is the idea that there’s a relationship between the property and the assessment. So we want to have a clear relationship between our property and how we’ve measured it, right? So if we look at our diagram here, construct validity is sort of this line right here saying “Ok, is there actually some clear relationship between intelligence and assessment results?”.

So again, you know, you could pick some silly example and say “ok, I want to know about intelligence and my assessment is to ask you your favorite color” or something and we might say “I don’t think there’s really a clear relationship between someone’s favorite color and their intelligence”. So in that case, we’d say it probably has low construct validity. Or if I wanted to know about your intelligence and I say “well, intelligent people maybe earn more money and therefore they have more cars and so I’m going to count how many cars you have and that’s going to tell me your intelligence”. Again we look at that and say there’s not a clear relationship between those two things, therefore we would criticize it for having low construct validity.

Now another way to think about an assessment is to look at the assessment and say “does it appear to measure what it says it’s measuring?”. This is called face validity and it’s a surface-level assessment. It’s not a detailed analysis. So I design some intelligence test and if you look at it and there’s a bunch of sort of different types of problem-solving questions you might look at and say “yeah, that looks like it could be an intelligence test”. So it has this surface-level validity where I think intelligence is associated with problem-solving ability and this is a test of problem-solving ability therefore it appears to be a good way to do it.

One way you could think about this is if I wanted to assess an artist’s technical ability. I might give them a test of brushstrokes, ok, there’s different types of brush techniques I want to see if you can do them; here’s a test of all these different brush techniques. Now, on the surface that looks like a good test of artistic technique, right? It sort of has the general appearance that it’s a test that would tell me something about an artist’s ability. So it’s a surface-level assessment, and the idea is that it appears to be related but that doesn’t mean that it’s fully related. It doesn’t mean that it’s the best assessment of something like artistic talent.

So to continue with this sort of analogy I’m going to get to the next type of validity which is content validity and what content validity asks is not just does it appear to be related but is it comprehensive? In other words, does it cover all the things that are associated with that property? So if I took the results from my brush stroke test and then I said this test tells you everything you need to know about your artistic talent then it would be criticized for content validity. You’d say “ok, it has face validity. It appears to be related, but in terms of content there’s a lot more to being an artist than just brush stroke technique.” And therefore we’d say this doesn’t have content validity. It’s not comprehensive enough to really tell you about artistic talent. It doesn’t cover all the aspects associated with that. So we’re going to need to have other types of things in our assessment if we really want to be assessing artistic talent rather than just brush stroke technique.

Now this idea of content validity is important for thinking about intelligence because as we saw in the video on defining intelligence, researchers disagree on what should be considered to be all of the aspects of intelligence. So a common criticism would be content validity. If I look at an IQ test and it has all sorts of tests of problem-solving but it maybe doesn’t assess creativity very much, then somebody might say well it appears to be related to intelligence but it’s not comprehensive enough because it doesn’t test divergent thinking. Or maybe if we think that there’s some sort of thing like emotional intelligence then we could criticize the test for not assessing it and therefore not having content validity because it doesn’t cover all the aspects that could be associated with intelligence. Ok, so that’s the difference between face validity and content validity.

Now the next two types of validity are related. They’re both called criterion-related validity and the reason for this is that they are both seeing if the assessment is actually related to some other criterion. So the idea is we want to know if there’s a relationship with other outcomes and there’s two ways that we could assess that.

The first way is what’s called concurrent validity. In concurrent validity we ask “does it relate to other outcomes at the same time?”. So if you remember that concurrent means occurring simultaneously then all we’re really saying is that we’re looking at things at the same time. So in the case of an IQ test you might say ok, let’s say I give an IQ test to a bunch of children then I might ask “does the IQ test that they’ve just taken now tell me about their performance in school right now as well?”. So if we go back to our diagram here, you can think of it as being this line right here. It’s saying “do these assessment results have a relationship with other associated behaviors like grades in school or current job performance?”. And for concurrent validity it’s asking is there a relationship right now. You take the IQ test today and based on that maybe I can predict something about your current grades in school. And the idea might be if grades in school are associated with intelligence and my IQ test is associated with intelligence then there should be some relationship. Doesn’t mean it’ll be a perfect relationship. We can imagine, you know, the sort of lazy student in school who doesn’t do his homework but actually has a very high IQ score, or we can imagine the opposite, the student who has a lower IQ score but works really hard and manages to get good grades in school, so it’s not going to be a perfect relationship. We want to see some relationship between the behaviors and the assessment right now.

Now you can probably guess that predictive validity is similar. It’s also a criterion-related validity but it’s looking at other outcomes in the future. So can our test predict future outcomes? So in this case I might wonder “ok, the IQ test you take when you’re 10 years old, do your results on that IQ test allow me to make predictions about your SAT score or about your college performance?”. And that would be assessing predictive validity. I’d say okay, there’s some behaviors associated with intelligence, like high SAT scores, let’s say, or, you know, high-level college performance, and can I predict those if you take an IQ test when you’re 10 years old? And so that would be an assessment of the predictive validity of my IQ test.
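To make the two criterion-related checks a bit more concrete, here is a minimal Python sketch of how someone might put a number on them. It assumes that the strength of the relationship is summarized with a Pearson correlation, which is a common choice but not something specified in the video. All of the scores are made up for illustration, and the names `pearson_r`, `iq_at_10`, `gpa_at_10`, and `sat_at_17` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical, made-up scores for six children (not real data).
iq_at_10 = [85, 95, 100, 105, 115, 130]           # assessment results
gpa_at_10 = [2.4, 3.1, 2.8, 3.0, 3.6, 3.3]        # grades at the time of testing
sat_at_17 = [900, 1010, 1100, 1050, 1250, 1400]   # an outcome measured years later

# Concurrent validity: assessment vs. an associated behavior measured now.
concurrent_r = pearson_r(iq_at_10, gpa_at_10)

# Predictive validity: assessment vs. an associated behavior measured later.
predictive_r = pearson_r(iq_at_10, sat_at_17)

print(f"concurrent (IQ vs. current grades): r = {concurrent_r:.2f}")
print(f"predictive (IQ vs. later SAT):      r = {predictive_r:.2f}")
```

In this sketch both correlations come out positive but below 1, which matches the point that we expect some relationship between the assessment and the associated behaviors, not a perfect one.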

Ok, so those are the different types of validity and in the next video we’ll start talking a little more about reliability of tests. I hope you found this helpful, if so, please like the video and subscribe to the channel for more. Thanks for watching!
