Validity - a summary of chapter 8 of Testleer en Testconstructie by Stouthard

Critical thinking
Article: Stouthard, M, E, A

Validity: the extent to which test results can be interpreted in terms of the construct the test tries to measure.

Concept, test, and validity

A test is administered to make an inference about a construct that lies outside the measuring instrument itself and that the instrument is supposed to measure.
The meaning of the test results lies in the extent to which they are an indication of that construct.

Validity is an overarching concept. It is a term for a number of possible properties of a test.
Often multiple sources of empirical evidence are needed to establish the validity of a test.

Which sources of empirical evidence are important for a test depends on the intended use of the test.

  • Descriptive use of a test
    When a test is meant to measure a specific behaviour or property.
    The validation process then focuses on finding support for the underlying theoretical concept.
  • Decision-making use of a test
    When a test is meant for selection, classification, or diagnosis.
    Here, support is needed that the test predicts an external criterion.

Two kinds of validity:

  • criterion-oriented validity
  • construct validity

The difference between these two isn’t absolute.

Criterion-oriented validity

When a test is meant to predict behaviour outside the test situation, it is relevant to ask whether the instrument is a good predictor of that behaviour.
The better the test predicts variation in the criterion, the higher the validity of the test.

The criterion

Like a test, a criterion is an operationalization of an underlying concept.
Several criteria are possible.
There are different ways to distinguish between criteria.

Kinds of criteria

A distinction between:

  • Specific/closed criteria
    In selection situations.
  • Global/open criteria
    In classification.

A distinction in time:

  • Predictive criterion-oriented validity
    The criterion lies in the future.
    Criterion performance isn’t measured at the same time as test performance, but later.
  • Concurrent criterion-oriented validity
    The criterion is measured at the same time as the test.
    The criterion lies in the present or the past.
    Mostly for diagnostic use of the test.

Distinction between criteria in the future

  • Final criterion
    Typically has high criterion relevance.
    Reflects the criterion behaviour most fully.
  • Intermediate criterion
  • Immediate criterion

Relation between test and criterion

The relation between a test and a criterion is usually expressed as the correlation between the two.
This indicates an association, not causality.

A condition for interpreting a relation between a test and a criterion as support for criterion-oriented validity is that there is at least one substantive explanation for the relation.

Research on criterion-oriented validity

Research on the criterion-oriented validity of a test focuses on determining the relation between the test score and the criterion measure.
If this relation is known, the test score can serve as a predictor of the criterion measure and the criterion score can be estimated.
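
The idea of estimating a criterion score from a test score can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the chapter: it assumes a linear relation, and the data, the function names `pearson_r` and `fit_line`, and the new test score of 16 are all hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


# Hypothetical data: test scores and a criterion measure collected later.
test_scores = [10, 12, 15, 18, 20, 23, 25]
criterion_scores = [2.1, 2.4, 3.0, 3.1, 3.9, 4.2, 4.8]

# The correlation expresses the strength of the test-criterion relation.
validity_coefficient = pearson_r(test_scores, criterion_scores)
slope, intercept = fit_line(test_scores, criterion_scores)

# Estimated criterion score for a new person with a test score of 16.
estimated_criterion = intercept + slope * 16

print(f"correlation between test and criterion: {validity_coefficient:.2f}")
print(f"estimated criterion score: {estimated_criterion:.2f}")
```

The correlation only describes how well the line fits. It says nothing about why test and criterion are related.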

Choice of criterion

Research on the criterion-oriented validity of a test requires a clear definition of the criterion and a good operationalization of the criterion in a criterion measure.

The criterion must fit the goal of the test.
This choice must be well justified.

Choice of criterion measure

The criterion is measured with a criterion measure.

  • This is an explicit, unambiguous specification of the score that corresponds to the criterion behaviour or criterion performance.

The choice of the criterion measure is difficult because it depends on the availability of relevant information.
A criterion measure can be a quantitative operationalization of the criterion behaviour.

  • The criterion measure is then measured on a metric scale.
    The unit of measurement corresponds to a distance on the measurement scale, and the scale points have numeric meaning.

Often criterion measures are not quantifiable, and their scale scores have no numeric meaning.

  • A criterion measure at the ordinal level has two or more ordered classes or categories.
  • A criterion measure at the nominal level has only distinguishable classes or categories, which have no order.

The level at which the criterion is measured has consequences for the measure of association used to relate it to the test.

A criterion measure can be used to group persons into unordered (nominal) or ordered (ordinal) groups.
These are criterion groups.
Criterion group: a group that is representative for the intended use of the test, whose members all show the same criterion behaviour and whose criterion scores are all known.
The test must then predict group membership better than chance.
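
What "better than chance" could mean in practice is sketched below. The data, the cut-off of 50, and the two group labels are hypothetical. As chance level the sketch uses the hit rate obtained by always guessing the largest group, which is one possible definition and not necessarily the one the chapter has in mind.

```python
from collections import Counter


def hit_rate(predicted, actual):
    """Proportion of persons whose predicted group equals their actual group."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def majority_baseline(actual):
    """Hit rate reached by always guessing the largest group (assumed chance level)."""
    counts = Counter(actual)
    return max(counts.values()) / len(actual)


# Hypothetical test scores and known criterion groups for ten persons.
test_scores = [35, 42, 48, 51, 55, 58, 63, 67, 72, 80]
actual_group = ["fail", "fail", "fail", "pass", "fail",
                "pass", "pass", "pass", "pass", "pass"]

CUTOFF = 50  # hypothetical cut-off score, chosen for illustration only
predicted_group = ["pass" if score >= CUTOFF else "fail" for score in test_scores]

test_hit_rate = hit_rate(predicted_group, actual_group)
chance_hit_rate = majority_baseline(actual_group)

print(f"hit rate of the test: {test_hit_rate:.2f}")
print(f"hit rate by always guessing the largest group: {chance_hit_rate:.2f}")
print("better than this chance level:", test_hit_rate > chance_hit_rate)
```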

Reliability and validity of the criterion measure

For a criterion measure, the same requirements of reliability and validity apply as for a test.
The measurement of the chosen criterion must be reliable and valid enough to draw meaningful conclusions.

For an adequate measurement of the underlying criterion, the validity of the criterion score must be plausible.
This rests more on personal judgment than on empirical support.

Composition of the sample

Research on the criterion-oriented validity of a test aims to make general judgements about validity that reach beyond the sample studied.
The research design must make generalization possible.

  • This requires a sample that is large enough and representative for the intended use of the test, or multiple samples that together are representative.

A sample must be chosen in such a way that one can expect a wide enough spread of criterion scores.

  • Lack of spread leads to low reliability of the criterion measure, and so to a low correlation between test scores and criterion measures.
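
The effect of too little spread in the criterion scores can be illustrated with a small sketch. It shows only the end result, a lower correlation in a sample with a narrow range of criterion scores, and does not model the reliability step. The data are hypothetical and deliberately deterministic: the criterion equals the test score plus a fixed alternating error.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: criterion = test score + 3 for odd scores, - 3 for even ones.
test_scores = list(range(1, 21))
criterion_scores = [x + (3 if x % 2 == 1 else -3) for x in test_scores]

# Correlation in the full sample, with a wide spread of criterion scores.
r_full = pearson_r(test_scores, criterion_scores)

# Keep only persons whose criterion score falls in a narrow band (9 to 12).
narrow = [(x, y) for x, y in zip(test_scores, criterion_scores) if 9 <= y <= 12]
r_narrow = pearson_r([x for x, _ in narrow], [y for _, y in narrow])

print(f"full sample: n = {len(test_scores)}, r = {r_full:.2f}")
print(f"narrow sample: n = {len(narrow)}, r = {r_narrow:.2f}")
```

The same underlying relation produces a much weaker correlation once the criterion scores barely differ between persons.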

Predictive criterion-oriented validity

Research on predictive criterion-oriented validity involves longitudinal research, which brings several problems:

  • (Selective) dropout of test subjects, and changes in test subjects with regard to the test or the criterion.
  • Structural changes can impair validity over time.
  • Selection, because only test subjects who are deemed suitable are admitted to a certain treatment.

Concurrent criterion-oriented validity

  • There can be contamination between the test and the criterion measure when both are collected at the same time by the same person.
  • Selection on another variable that is related to the criterion.


Construct validity

The construct validity of a test indicates how far the test is a good measure of the underlying theoretical construct.

The question is whether the test results can be interpreted as an indication of the concept that the test tries to measure, and not of another concept or of only a part of a much wider concept.

The construct

Construct validity is about the relation between a test and an underlying theoretical construct that is not directly measurable.
This relation can’t be established directly.

Support for the construct validity of a test must be derived from a combination of empirical data and theoretical notions.
This does require a clear description of the construct:

  • mapping the content domain of the construct
  • researching the internal structure
  • specifying the nomological network

Content domain

The content domain of a construct includes a description of the universe of phenomena that the construct is about.
The test must consist of items that are representative of that universe.

The content domain of a construct describes the breadth of the phenomena.

Internal structure

From the description of the content domain it can be inferred whether the construct consists of several separate constructs.
From this, the hypothesized internal structure of the test can be determined.

The nomological network

The nomological network consists of relations between the construct and other constructs, relations with operationalizations of these constructs, and relations among the operationalizations themselves.

A nomological network doesn’t have to be a full theory; it can consist of hypotheses about empirical relations.
It must be possible to derive testable statements from it.
On a theoretical level, you can describe similarities and differences in order to position the construct and demarcate it from other, related constructs.

Typically, a test must be related to already existing tests of the same construct.
This relation must be strong if the tests use the same method, but also if they use different methods.
The new test must show the same pattern of relations with related and unrelated constructs as already existing tests of the same construct.

Research on construct validity

Research on construct validity focuses on finding empirical support for the coverage of the content of the construct, the internal structure of the test, and the relations with other constructs.

Coverage of the content domain

The content domain is usually determined on the basis of theory, literature research, and examination of other measurement instruments for the same construct.
The coverage of the content domain by the test is usually inferred from the way in which the test was constructed: the content domain is described and items are written on the basis of this domain.

Besides partial coverage of the construct (the items aren’t a representative sample of the universe of phenomena), wrong coverage of the construct is also possible (the items measure something other than the intended construct).

Research on the internal structure

A measurement instrument that is supposed to measure one construct must fit a unidimensional model.

  • The internal structure of the test can be examined by fitting a unidimensional model.
  • A good fit of a unidimensional model only supports construct validity if it may be assumed that the test measures the construct as intended.

When a measurement instrument measures a multidimensional construct, the separate aspects of the construct must each be measured unidimensionally.

  • Here too, the relations between the separate constructs must be specified in advance.
  • The internal structure of the test must show these expected mutual relations.

Relations with other constructs and tests

Further support that the test represents the construct as intended can be found by investigating testable implications of the nomological network.
The correlations of the test with other tests of related or unrelated constructs can be calculated.

  • With tests of related constructs, substantial but not high correlations are expected.
  • With tests of unrelated constructs, negligible correlations are expected.

The more closely the constructs are related, the higher the correlations must be.

Multitrait-multimethod matrix by Campbell and Fiske

Campbell and Fiske designed a strategy to evaluate the construct validity of a test.

  • They see every test as a combination of a trait and a method.
  • A test measures a trait according to a method.
    If people differ on the trait, this produces systematic variance in test scores.
  • Method variance: systematic variance that results from the measurement procedure with which the trait is measured.

In the validation process, both convergence and divergence are needed.
Convergence: a test is associated with other measurements of the same construct or of related constructs.
Divergence: the test isn’t associated with measurements of unrelated constructs.

The multitrait-multimethod approach to validation: a process in which support for the construct validity of a test is sought by measuring different traits with separate, independent measurement procedures.

  • An assumption is that the measurement procedures or methods are independent.

To show convergence, the same trait must be measured with different methods.
To show divergence, several traits must be measured with multiple methods in the validation process.

The correlations must meet four requirements to support the construct validity of a test:

  • The correlations between tests of the same construct measured with different methods must differ significantly from zero and be large enough to support construct validity.
    Tests that measure the same construct must be associated.
  • These correlations must be larger than the correlations between tests of different traits measured with different methods.
    Tests that measure the same trait with different methods must be more strongly associated than tests that measure different traits with different methods.
  • Tests that measure the same trait with different methods must be more strongly associated than tests that share only the method but measure different traits.
  • The pattern of correlations between traits must be the same within the same method as across different methods.
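
The four requirements above can be checked mechanically once the correlations are known. The sketch below does so for three traits (A, B, C) and two methods (m1, m2). All correlations are invented for illustration. The cut-off `MIN_CONVERGENT` is an assumed stand-in for "significantly different from zero and large enough", because a real significance test would need the sample size. The fourth check compares the rank order of trait pairs, which is only one way to operationalize "the same pattern".

```python
from itertools import combinations

TRAITS = ["A", "B", "C"]
METHODS = ["m1", "m2"]
MIN_CONVERGENT = 0.50  # assumed cut-off for "large enough"; not from the source

R = {}  # correlations keyed by an unordered pair of (trait, method) measures


def set_r(t1, m1, t2, m2, value):
    R[frozenset([(t1, m1), (t2, m2)])] = value


def r(t1, m1, t2, m2):
    return R[frozenset([(t1, m1), (t2, m2)])]


# Hypothetical correlations, invented for illustration only.
for trait, value in [("A", 0.60), ("B", 0.62), ("C", 0.58)]:
    set_r(trait, "m1", trait, "m2", value)   # same trait, different methods
for method, ab, ac, bc in [("m1", 0.40, 0.30, 0.20), ("m2", 0.42, 0.31, 0.22)]:
    set_r("A", method, "B", method, ab)      # different traits, same method
    set_r("A", method, "C", method, ac)
    set_r("B", method, "C", method, bc)
for t1, t2, value in [("A", "B", 0.25), ("B", "A", 0.24), ("A", "C", 0.18),
                      ("C", "A", 0.17), ("B", "C", 0.10), ("C", "B", 0.11)]:
    set_r(t1, "m1", t2, "m2", value)         # different traits, different methods

validity = {t: r(t, "m1", t, "m2") for t in TRAITS}
others = {t: [o for o in TRAITS if o != t] for t in TRAITS}

# 1. Same-trait, different-method correlations are large enough.
check_1 = all(v >= MIN_CONVERGENT for v in validity.values())

# 2. They exceed the different-trait, different-method correlations
#    that involve the same trait.
check_2 = all(validity[t] > r(t, ma, o, mb)
              for t in TRAITS for o in others[t]
              for ma, mb in [("m1", "m2"), ("m2", "m1")])

# 3. They exceed the different-trait, same-method correlations.
check_3 = all(validity[t] > r(t, m, o, m)
              for t in TRAITS for o in others[t] for m in METHODS)


def ranking(block):
    """Trait pairs ordered from highest to lowest correlation."""
    return sorted(block, key=block.get, reverse=True)


pairs = list(combinations(TRAITS, 2))
blocks = [{p: r(p[0], m, p[1], m) for p in pairs} for m in METHODS]
blocks.append({p: (r(p[0], "m1", p[1], "m2") + r(p[0], "m2", p[1], "m1")) / 2
               for p in pairs})

# 4. The rank order of trait pairs is the same within and across methods.
check_4 = all(ranking(b) == ranking(blocks[0]) for b in blocks)

print(check_1, check_2, check_3, check_4)
```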

Author: SanneA