Test construction and test research - a summary of an article by Oosterveld & Vorst (2010)

Critical thinking
Article: Oosterveld & Vorst 2010
Testconstructie en testonderzoek (Test construction and test research)


Validity theory

Theories about validity are problematic, and they differ sharply from one another.

Examples of viewpoints

Borsboom (2003)

According to Borsboom, a mercury thermometer is plausibly a valid measure of the temperature of objects, because differences in the true temperature cause differences in the reading of the instrument.
If the causal chain is described exactly, and this description is a plausible representation of reality, then the instrument is valid in reality.
True validity remains unknown as long as not all relevant knowledge is available.
Because it is in principle unknown to what extent the relevant knowledge is available, validity is hypothetical and uncertain.
Even if the causal chain between true variation in the trait and the measured variation is well known, knowledge about causal chains can change as new knowledge emerges. This is why true validity is hypothetical.
However, people can form a judgment about the validity of measurement instruments. This validity judgment has nothing to do with true validity.
In psychology, establishing true causal chains is (as yet) impossible.
That is why psychology, for the time being, works with hypothetical validity judgments, pending more precise and true causal chains between true trait variation and measurement variation.
The quality of measurement, not the validity, must be demonstrated through psychometric analysis (reliability, unidimensionality, representative content of the measurement instrument, relations with external criteria, support for theoretically expected relations).

Philosophy-of-science viewpoint

  • Whether a test is valid depends on the state of affairs in reality (ontological)

Description of validity

  • Validity: the assumed trait varies in value in the population, and differences in trait values cause differences in the measurement.
  • No validity if: differences in measurement results cannot be explained by differences in the trait (because the trait does not exist, its values do not vary, or there is no causal relation)

Derived statements

  • Validity is either present or absent
  • Because it depends on our (incomplete) knowledge of reality, the true validity of an instrument is hypothetical and provisional.
  • The validity judgment is a subjective estimate of the true validity of an instrument
  • Validity can be assumed if causal relations in reality are applied in the construction of the measurement instrument
  • Validity has nothing to do with relations between the measured property and criteria
  • Validity concerns only the measurement instrument
  • The distinction between forms of validity and forms of validity research is pointless

Research into measurement quality/validity

  • Research into the causal relations between variance in the property and variance in the measurement is central
  • Existing validity research is really research into the quality of measurement
  • Impression validity is a superficial, subjective judgment of measurement quality
  • Content validity is a judgment of the measurement quality of the content
  • Criterion validity is research into the predictive value of the measurement
  • Construct validity is research into the theoretical measurement quality

Messick

According to Messick, temperature measurement is valid because a great deal of research with thermometers has provided strong empirical support for theoretically expected relations between the outcomes of temperature measurements and other criterion measurements.
According to Messick, the validity of measurements must become apparent from research: reliability, unidimensionality, representative content of the measurement, relations with external criteria, and support for theoretically expected relations in the nomological network.

Philosophy-of-science viewpoint

  • Whether a test interpretation is valid depends on empirical support for that interpretation (epistemological)

Description of validity

  • Validity: the assumed trait or property varies in value in the population, and differences in trait values correlate with differences in the measurement.
  • No validity if: the measurement has no relation with other measurements or criteria

Derived statements

  • Validity is a matter of degree
  • Because validity research is always limited, the validity of an instrument is provisional.
  • The validity judgment is the validity
  • Validity is a judgment based on empirical properties established after the construction of the instrument
  • Validity depends on empirical support for the nomological network
  • Validity concerns the interpretation of scores and the decisions based on scores
  • Forms of validity and forms of validity research provide insight into different aspects of validity

Research into measurement quality/validity

  • Research into causal relations between variation in traits and variation in measurement is relevant, but a side issue
  • The various measurement qualities concern different aspects of validity
  • Impression validity makes a weak contribution to the validity of score interpretation
  • Content validity makes an important contribution to the validity of score interpretation
  • Criterion validity makes an important contribution to the validity of score interpretation
  • Construct validity makes an important contribution to the validity of score interpretation

Philosophy-of-science viewpoint and description of validity

According to Borsboom, a measurement procedure is valid only if the measured trait exists in reality, the values of the trait vary in the population, and this variation causes the variation in the measurement.
These requirements can only be met in reality, so validity is an ontological matter.

Messick describes validity as a judgment based on scientific research: validity is a judgment about knowledge (epistemological), in which theory and empirical testing play a role.
It is not required that variation in trait values causes variation in measurement values, but variation in trait values must correlate with measurement values.
If these correlations match the expected values, this supports the assumed trait, the variation of its values in the population, and the validity of interpretations of measurement results.

Summary

The concept of validity is relatively narrow in Borsboom's view: a measurement instrument either is or is not valid. The quality of measurement, by contrast, is a more or less complex evaluation of various aspects.
For most psychological instruments, validity cannot be claimed, because the causal link from trait values to measurement variance is (as yet) unknown for these instruments.
Even though the true validity of an instrument is negative or uncertain, the instrument can still have good measurement qualities.

The concept of validity is broad in Messick's view: measurement results and their uses are valid to a certain extent. Measurement quality is included in the concept of validity.
The findings on measurement quality contribute to a combined validity judgment about the interpretations of measurement results.

Validity and measurement quality of measurement instruments

Impression validity (face validity) – a subjective judgment of measurement quality

Impression validity: a subjective judgment of the usability of a measurement instrument based on directly observable properties of the test material.

  • This concerns the judgment of test takers and other people without specific expertise, but it can also concern test users who are unfamiliar with the manual of the measurement instrument.
  • The impression can be formed by the external quality of the test material, the nature of the questions asked, the nature of the answer options, etcetera.
  • Impression validity can be measured by asking test takers and users for a subjective judgment of the usability of the measurement instrument based on the external test material.
  • A high subjective rating can improve the usability of the measurement instrument.

Content validity – the substantive measurement quality

Content validity: the judgment of how representative the observations, exercises and questions are for a certain goal.

  • It is assumed that the measurement goal covers a domain of observations, exercises or questions, and that the items form a random, representative sample of that domain. This can also concern the difficulty level or the nature of the items.
  • It can be determined by giving the items of the measurement instrument, together with domain descriptions, to potential respondents or experts, and asking them to sort the items into the domain descriptions.
  • If the judges show high agreement between items and domain descriptions, content validity is high.
    This is especially important for tests and exams.
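The summary does not name a specific agreement index for the sorting task. As a minimal sketch, assuming a simple proportion-of-judges measure, the code below computes for each item the share of judges who sorted it into its intended domain, plus the mean over items. The item names, domains and sortings are hypothetical.

```python
from statistics import mean

def item_agreement(intended: dict[str, str],
                   sortings: list[dict[str, str]]) -> dict[str, float]:
    """Share of judges who sorted each item into its intended domain."""
    agreement = {}
    for item, domain in intended.items():
        # Count judges whose sorting of this item matches the intended domain.
        hits = sum(1 for judge in sortings if judge.get(item) == domain)
        agreement[item] = hits / len(sortings)
    return agreement

# Hypothetical items, intended domains and three judges' sortings.
intended = {"q1": "worry", "q2": "worry", "q3": "avoidance"}
sortings = [
    {"q1": "worry", "q2": "worry", "q3": "avoidance"},
    {"q1": "worry", "q2": "avoidance", "q3": "avoidance"},
    {"q1": "worry", "q2": "worry", "q3": "avoidance"},
]

agreement = item_agreement(intended, sortings)
overall = mean(agreement.values())  # mean agreement over all items
print(agreement, round(overall, 2))
```

Items with low agreement (here q2) are the ones whose wording or domain assignment would deserve a second look.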

Criterion validity – predictive value of measurement

Criterion validity: the (cor)relation between the test score and a psychological or social criterion.

  • The criterion can be, for example, a psychological or medical judgment.
  • Criterion validity can be established in the present (concurrent validity), in the past (postdiction) or in the future (prediction)
  • This validity can be established by relating test scores to criterion scores. The results can be presented in expectancy tables, prediction tables, or prediction figures.
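To make the correlation and the expectancy table concrete, here is a small sketch on made-up data. It assumes a Pearson correlation as the criterion-validity coefficient and a simple expectancy table that, per test-score band, reports the share of people who met a criterion. The scores, the 1-6 rating scale, the cut-off of 4 and the band limits are all hypothetical choices, not taken from the article.

```python
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between test scores x and criterion scores y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def expectancy_table(scores, met_criterion, bands):
    """Per score band [low, high): number of people and share who met the criterion."""
    rows = []
    for low, high in bands:
        outcomes = [ok for s, ok in zip(scores, met_criterion) if low <= s < high]
        share = sum(outcomes) / len(outcomes) if outcomes else None
        rows.append(((low, high), len(outcomes), share))
    return rows

# Hypothetical data: test scores and later criterion ratings on a 1-6 scale.
test_scores = [12, 15, 18, 22, 25, 28, 30, 35]
criterion = [2, 4, 3, 3, 5, 4, 5, 6]
# Assumption: a rating of 4 or higher counts as meeting the criterion.
met = [rating >= 4 for rating in criterion]

r = pearson_r(test_scores, criterion)
table = expectancy_table(test_scores, met, [(10, 20), (20, 30), (30, 40)])
print(f"criterion validity r = {r:.2f}")
for (low, high), n, share in table:
    print(f"scores {low}-{high - 1}: n = {n}, met criterion: {share:.0%}")
```

The expectancy table tells a test user more directly than r does what a given score means: in this toy data the chance of meeting the criterion rises from one in three in the lowest band to everyone in the highest band.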

Process validity – procedural measurement quality

Process validity: the manner in which the test taker's response comes about.

  • An observation, exercise or question has high process validity if the respondent performs the behavior the test constructor intended.
  • This validity can be established with think-aloud protocols or with experiments that vary the instructions.

Construct validity – theoretical measurement quality

Construct validity: the judgment of the agreement between the hypothesized relations of the construct with other constructs and the empirically observed relations between instruments that measure these constructs.

  • The expected relations must match the empirical correlations between the instruments that measure the constructs: where a strong relation is expected, a high correlation should be found.
  • The nomological network: the system of hypothesized relations around the construct.
    This can be part of a theory.
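A crude way to picture "comparing hypothesized with observed relations" is to write the nomological network as expected directions (+, - or roughly 0) and check each observed correlation against them. The sketch below does only that; the measures, the correlations and the 0.2 threshold for "roughly zero" are hypothetical, and real construct-validity research weighs the size and pattern of correlations far more carefully than a sign count.

```python
def direction(r: float, threshold: float = 0.2) -> str:
    """Classify a correlation as positive, negative or (near) zero."""
    if r >= threshold:
        return "+"
    if r <= -threshold:
        return "-"
    return "0"

def network_support(expected: dict[str, str], observed: dict[str, float],
                    threshold: float = 0.2):
    """Per measure: does the observed correlation have the hypothesized direction?"""
    matches = {m: direction(observed[m], threshold) == sign
               for m, sign in expected.items()}
    return matches, sum(matches.values()) / len(matches)

# Hypothetical nomological network around a test-anxiety scale.
expected = {"general_anxiety": "+", "exam_grade": "-", "shoe_size": "0"}
observed = {"general_anxiety": 0.55, "exam_grade": -0.12, "shoe_size": 0.04}

matches, share = network_support(expected, observed)
print(matches, f"{share:.0%} of hypotheses supported")
```

Here the expected negative relation with exam grades is too weak to count as support, so only two of the three hypotheses are confirmed.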

Homogeneity or internal-consistency reliability

Homogeneity or internal-consistency reliability: the cohesion between the different items of a scale.
In psychological measurement, it is assumed that the items are repeated, independent measurements of a trait.

  • The size of homogeneity indices depends on the size of the inter-item correlations and the number of items.
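The summary does not say which homogeneity index is meant. One common index, standardized Cronbach's alpha, shows the dependence on both factors directly: alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r the mean inter-item correlation. The sketch below evaluates this formula for a few illustrative values of k and r.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    if k < 2:
        raise ValueError("a scale needs at least two items")
    return k * mean_r / (1 + (k - 1) * mean_r)

# Alpha rises with both the mean inter-item correlation and the number of items.
for k in (5, 10, 20):
    row = [f"r={r}: {standardized_alpha(k, r):.2f}" for r in (0.2, 0.4)]
    print(f"{k:>2} items ->", ", ".join(row))
```

Doubling the number of items raises alpha even when the items themselves correlate only modestly, which is why a high alpha on a long scale says little by itself about how strongly individual items cohere.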

Generalizability

The generalizability of measurement instruments can concern persons, circumstances and goals.
Groups of people may differ to such an extent that the same instrument may give different results for different groups.
Measurement instruments can be administered in such different circumstances that the same instrument measures different properties.
The goal of the measurement can influence the measured traits.

In principle, validity and reliability (measurement quality) depend on the population and the sample.

  • For each group of people that differs in one or more respects, validity and reliability must be established separately.

Borsboom: if the regularities on which the instrument is constructed are universal, the validity of the instrument generalizes across groups.

Messick: research into measurement quality or validity must be repeated for each group. It must be investigated whether the measurement model and the structural model of relations between measurements are equal across all groups.

Summary author: SanneA