Item Response Theory - summary of a part of the science of psychological measurement by Cohen

Critical thinking
Article: Cohen

Item response theory (IRT)

The procedures of item response theory provide a way to model the probability that a person with X ability will be able to perform at a level of Y.

Because so often the psychological or educational construct being measured is physically unobservable (latent), and because the construct being measured may be a trait, a synonym for IRT is latent-trait theory.

IRT is not a term used to refer to a single theory or method.
It refers to a family of theories and methods, and quite a large family at that, with many other names used to distinguish specific approaches.

Difficulty: the attribute of not being easily accomplished, solved, or comprehended.
Discrimination: the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever is being measured.
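
To make these two item properties concrete, here is a minimal sketch that assumes a two-parameter logistic (2PL) model. The summary does not name a specific model, so the function name `p_endorse`, the parameter names `a` (discrimination) and `b` (difficulty), and the item values are illustrative assumptions.

```python
import math

def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL logistic model (an assumed form): probability of the keyed
    response for a person at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items with the same difficulty (b = 0) but
# different discrimination. Compare how much the probability changes
# between a low-trait person (theta = -1) and a high-trait person (theta = +1).
weak = p_endorse(1.0, a=0.5, b=0.0) - p_endorse(-1.0, a=0.5, b=0.0)
strong = p_endorse(1.0, a=2.0, b=0.0) - p_endorse(-1.0, a=2.0, b=0.0)

print(f"low-discrimination item separates the two people by {weak:.3f}")
print(f"high-discrimination item separates the two people by {strong:.3f}")
```

Under this assumed model, difficulty is the trait level at which the probability equals 0.5, and discrimination controls how sharply the probability changes around that point.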

A number of different IRT models exist to handle data resulting from the administration of tests with various characteristics and in various formats.

  • Dichotomous test items: test items or questions that can be answered with only one of two alternative responses (for example, true-false or yes-no).
  • Polytomous test items: test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct.

Other IRT models exist to handle other types of data.

In general, latent-trait models differ in some important ways from CTT.

  • In CTT, no assumptions are made about the frequency distribution of test scores.

Such assumptions are inherent in latent-trait models.
Rasch model: an IRT model with very specific assumptions about the underlying distribution.
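
For reference, the dichotomous Rasch model is commonly written as below. The formula is not given in the summary. The notation is an assumption added here: X_ij is the response of person j to item i, theta_j is that person's level on the latent trait, and b_i is the difficulty of the item.

```latex
P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}}
```

In this form every item has the same discrimination, so items differ only in difficulty.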

Assumptions in using IRT

Three assumptions are made about data to be analysed within an IRT framework:

  • Unidimensionality
  • Local independence
  • Monotonicity

Unidimensionality

The unidimensionality assumption: the set of items measures a single continuous latent construct.
This construct is referred to by the Greek letter theta (θ).
It is a person’s theta level that gives rise to a response to the items in the scale.
Theta level: a reference to the degree of the underlying ability or trait that the test-taker is presumed to bring to the test.

The assumption of unidimensionality does not preclude that the set of items may have a number of minor dimensions (which, in turn, may be measured by subscales).
It does assume that one dominant dimension explains the underlying structure.

Local independence

Local dependence: items are all dependent on some factor that is different from what the test as a whole is measuring. Items are locally dependent if they are more related to each other than to the other items on the test.
Locally dependent items have high inter-item correlations.
In an effort to control for such local dependence, test developers may sometimes combine the responses to a set of locally dependent items into a separate subscale within the test.

The assumption of local independence: a) there is a systematic relationship between all of the test items and b) that relationship has to do with the theta level of the test-taker.
When the assumption is met, it means that differences in responses to items are reflective of differences in the underlying trait or ability.
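
One practical consequence of local independence can be sketched as follows: once theta is fixed, the probability of a whole response pattern is simply the product of the item probabilities. The 2PL form, the function names, and the item parameters are assumptions made for illustration, not taken from the summary.

```python
import math

def p_endorse(theta, a, b):
    """Assumed 2PL item model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pattern_probability(theta, items, responses):
    """Probability of a full response pattern at a given theta,
    assuming the items are locally independent."""
    prob = 1.0
    for (a, b), x in zip(items, responses):
        p = p_endorse(theta, a, b)
        prob *= p if x == 1 else 1.0 - p
    return prob

# Hypothetical (discrimination, difficulty) pairs for three items.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]

# Probability that a person at theta = 0 answers the two easier items
# in the keyed direction and the hardest item in the other direction.
print(pattern_probability(0.0, items, [1, 1, 0]))
```

If items were locally dependent, this product would misstate the pattern probability, because the items would share variance that theta does not explain.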

Monotonicity

The assumption of monotonicity: the probability of endorsing or selecting an item response indicative of higher levels of theta should increase as the underlying level of theta increases.

IRT models tend to be robust. They tend to be resistant to minor violations of these three assumptions.
In the ‘real world’ it is difficult, if not impossible, to find data that rigorously conforms to these assumptions.
The better the data meets these three assumptions, the better the IRT model will fit the data and shed light on the construct being measured.

IRT in practice

Item characteristic curve (ICC), also called an item response curve, a category response curve, or an item trace line: the expression in graphic form of the probabilistic relationship between a test-taker’s response to a test item and that test-taker’s level on the latent construct being measured.
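
As a rough illustration, the sketch below tabulates an ICC for one hypothetical item over a grid of theta values and draws it as a text bar chart. The 2PL form and the parameter values (a = 1.2, b = 0.5) are assumptions. The final check reflects the monotonicity assumption: the curve should never fall as theta rises.

```python
import math

def icc(theta, a=1.2, b=0.5):
    """Hypothetical 2PL item characteristic curve (assumed form and values)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

thetas = [t / 2 for t in range(-6, 7)]   # grid from -3.0 to 3.0 in steps of 0.5
curve = [icc(t) for t in thetas]

# Crude text plot: theta on the left, probability as a bar.
for t, p in zip(thetas, curve):
    print(f"theta={t:+.1f}  P={p:.3f}  " + "#" * round(p * 40))

# Monotonicity: each value is at least as large as the one before it.
is_monotone = all(p1 <= p2 for p1, p2 in zip(curve, curve[1:]))
print("monotone:", is_monotone)
```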

In theory, theta scores could range from negative infinity to positive infinity.

A useful feature of IRT is that it enables test users to better understand the range of theta over which an item is most useful in discriminating among groups of test-takers.
Information function: the IRT tool used to make such determinations.
Graphs of the information function provide insight into what items work best with test-takers at a particular theta level as compared to other items on the test.
Traditionally in such graphs, theta is set on the horizontal axis and information magnitude (precision) on the vertical axis.

Information in IRT: the precision of measurement.
The more information, the better the predictions made.
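
A minimal sketch of an item information curve, assuming a 2PL item, for which the information at theta is a² · P · (1 − P). The formula choice and the item parameters are assumptions for illustration. Under this model an item is most informative at the theta level equal to its difficulty.

```python
import math

def item_information(theta, a, b):
    """Information of an assumed 2PL item: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Evaluate a hypothetical item (a = 1.5, b = 1.0) from theta = -3 to 3.
thetas = [t / 10 for t in range(-30, 31)]
info = [item_information(t, a=1.5, b=1.0) for t in thetas]

# Locate the theta level where this item measures most precisely.
peak_theta = thetas[info.index(max(info))]
print("most informative at theta =", peak_theta)
print("information at the peak =", max(info))
```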
An item information curve can be a very useful tool for test developers.

  • To reduce the total number of test items in a ‘long form’ of a test and so create a new and effective ‘short form’. Shorter versions of tests are created through selection of the most informative set of items that are relevant for the population under study.
  • To raise ‘red flags’ regarding test items that are particularly low in information, that is, test items that show relatively little ability to discriminate between test-takers.
    Items with low information prompt the test developer to consider the possibility that:
    - the content of the item does not match the construct measured by the other items in the scale
    - the item is poorly worded and needs to be rewritten
    - the item is too complex for the educational level of the population
    - the placement of the item in the test is out of context
    - cultural factors may be operating to weaken the item’s ability to discriminate between groups
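
Both uses can be sketched in a few lines, assuming 2PL items and a single target theta level for the population of interest. The item bank, the item names, and the 0.1 flagging threshold are hypothetical values chosen only for illustration.

```python
import math

def item_information(theta, a, b):
    """Information of an assumed 2PL item: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical item bank: name -> (discrimination a, difficulty b).
bank = {
    "i1": (0.4, 0.0),
    "i2": (1.8, 0.2),
    "i3": (1.2, -0.1),
    "i4": (1.5, 2.5),
    "i5": (0.9, 0.0),
}

def short_form(bank, target_theta, k):
    """Return the k item names with the most information at target_theta."""
    ranked = sorted(
        bank,
        key=lambda name: item_information(target_theta, *bank[name]),
        reverse=True,
    )
    return ranked[:k]

# Build a two-item short form for test-takers around theta = 0.
selected = short_form(bank, target_theta=0.0, k=2)

# Flag items below an arbitrary information threshold at the same level.
flagged = [name for name in bank if item_information(0.0, *bank[name]) < 0.1]

print("short form:", selected)
print("red flags:", flagged)
```

A flag here only says that the item contributes little precision at the chosen theta level. Deciding why (content mismatch, wording, complexity, placement, or cultural factors) still requires the test developer's judgment.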

Under the IRT framework, the precision of a scale varies depending on what levels of the construct are being measured.
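
To illustrate, the sketch below sums item information into test information and converts it to a standard error of the theta estimate, using the standard relation SE = 1 / sqrt(information). The 2PL form and the three item parameter pairs are assumptions, not values from the summary.

```python
import math

def item_information(theta, a, b):
    """Information of an assumed 2PL item: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical three-item scale: (discrimination, difficulty) pairs.
items = [(1.5, -0.5), (1.2, 0.0), (1.8, 0.5)]

def standard_error(theta):
    """Standard error of the theta estimate: 1 / sqrt(test information)."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)

# Precision is best near the item difficulties and worse at the extremes.
for theta in (-3.0, 0.0, 3.0):
    print(f"theta={theta:+.1f}  SE={standard_error(theta):.2f}")
```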
