What is the Item Response Theory (IRT) and which models are there? - Chapter 14

What is IRT?
What is item discrimination?
Which IRT models are there?
Which parameters can you estimate?
How can you describe the characteristics of the test as a whole?
For which purposes can IRT be applied?

What is IRT?

Item Response Theory (IRT) is an alternative to classical test theory (CTT) for analyzing measurements in the behavioral sciences. According to IRT, an individual's response to a test item is influenced by characteristics of the individual (their trait level) and properties of the item (its difficulty level).

For a difficult item (question), a person needs a high trait level to answer correctly.
Conversely, for an easy item, a low trait level is enough to answer correctly.

Example:

Statement 1: I like to chat with my friends.
Statement 2: I like to speak to a large audience.

Agreeing with statement 1 requires only a low level of extraversion (the trait level).
Agreeing with statement 2 requires a high level of extraversion.

In IRT analyses, trait levels and item difficulties are usually expressed on a standardized scale with a mean of 0 and a standard deviation of 1.

So if an item has a difficulty level of 0:

An individual with an average trait level (0) has a 50% chance of answering correctly.
An individual with a high trait level (above 0) has a greater than 50% chance of answering correctly.
An individual with a low trait level (below 0) has a less than 50% chance of answering correctly.

What is item discrimination?

Item discrimination refers to how well an item distinguishes between individuals with low and high trait levels. The discrimination value of an item indicates how strongly the item is related to the trait being measured.

Positive discrimination (> 0): the item is consistent with the trait being measured. Higher trait levels give a greater chance of answering the item correctly, and lower trait levels give a smaller chance.
Negative discrimination (< 0): the item is inconsistent with the trait. Higher trait levels give a smaller chance of answering the item correctly.
Discrimination value = 0: the item is unrelated to the trait measured by the test.

In short: the larger the positive discrimination value, the more consistently the item reflects the trait, and the better the item.
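To make the effect of the sign concrete, here is a minimal sketch assuming the logistic two-parameter form $P = 1 / (1 + e^{-\alpha(\theta - \beta)})$. The three items and their discrimination values are hypothetical; they share the same difficulty ($\beta = 0$) and differ only in $\alpha$.

```python
import math

def p_correct(theta: float, alpha: float, beta: float) -> float:
    """Probability of a correct answer, assuming the logistic 2PL form."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

# Hypothetical items: same difficulty (beta = 0), different discrimination values
discriminations = {"positive": 1.5, "zero": 0.0, "negative": -1.5}

for label, alpha in discriminations.items():
    p_low = p_correct(-2.0, alpha, 0.0)   # respondent with a low trait level
    p_high = p_correct(2.0, alpha, 0.0)   # respondent with a high trait level
    print(f"{label:>8}: P(low) = {p_low:.2f}, P(high) = {p_high:.2f}")
```

With $\alpha = 1.5$ the high scorer's probability is about .95 and the low scorer's about .05; with $\alpha = 0$ both are .50; with $\alpha = -1.5$ the pattern reverses.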

A third component to take into account is guessing. On multiple-choice or true/false questions, people who don't know the answer may guess, so they sometimes give the correct answer without actually knowing it. IRT can include guessing as a component in the analysis.

Which IRT models are there?

From the IRT perspective, we can identify the components that influence the probability that a person will respond to a certain item in a certain way. A measurement model expresses the relationship between the outcome (an individual's response to an item) and the components that influence it (characteristics of the person, such as their trait level, and properties of the item). Different measurement models express this link in different ways. In other words, IRT models describe the mathematical link between the observed responses and the characteristics of the individual and of the item that influence them. This section discusses the most common IRT models.

The one-parameter model (1PL): The Rasch model

The Rasch model, or one-parameter logistic (1PL) model, includes only two components that influence the responses: the trait level of the individual and the difficulty of the item.

$$P(X_{is} = 1 \mid \theta_s, \beta_i) = \frac{e^{(\theta_s - \beta_i)}}{1 + e^{(\theta_s - \beta_i)}}$$

P = the probability that respondent s gives a correct answer to item i.

$X_{is}$ = the response of respondent s to item i; $X_{is} = 1$ indicates a correct answer.

$\theta_s$ = the trait level of respondent s.

$\beta_i$ = the difficulty of item i.

e = the base of the natural logarithm (approximately 2.718); the $e^x$ function is available on most calculators.
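The Rasch formula translates directly into a few lines of code. This is a minimal sketch; the function name `rasch_prob` is hypothetical and the trait levels are illustrative values.

```python
import math

def rasch_prob(theta: float, beta: float) -> float:
    """P(X_is = 1 | theta_s, beta_i) under the Rasch (1PL) model."""
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))

# An item with difficulty 0 answered by respondents with low, average and high trait levels
for theta in (-1.0, 0.0, 1.0):
    print(f"theta = {theta:+.1f}: P = {rasch_prob(theta, beta=0.0):.3f}")
```

Whenever $\theta_s = \beta_i$ the exponent is 0 and the probability is exactly .50, which is the difficulty-0 example from the start of this chapter; one unit above or below the difficulty gives about .73 or .27.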

The two-parameter (2PL) model

The two-parameter model (2PL) has three components that influence the scores, namely the characteristics of the individual, the characteristics of the item and the item discrimination.

The formula here is:

$$P(X_{is} = 1 \mid \theta_s, \beta_i, \alpha_i) = \frac{e^{\alpha_i(\theta_s - \beta_i)}}{1 + e^{\alpha_i(\theta_s - \beta_i)}}$$

$\alpha_i$ = the discrimination of item i.
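A short sketch of the 2PL formula shows what the discrimination parameter does: it controls how steeply the probability rises around the item difficulty. The two items below are hypothetical, with equal difficulty and different $\alpha$ values.

```python
import math

def two_pl_prob(theta: float, alpha: float, beta: float) -> float:
    """P(X_is = 1 | theta_s, beta_i, alpha_i) under the 2PL model."""
    z = alpha * (theta - beta)
    return math.exp(z) / (1.0 + math.exp(z))

# Two hypothetical items with equal difficulty but different discrimination
flat_item = {"alpha": 0.5, "beta": 0.0}
steep_item = {"alpha": 2.0, "beta": 0.0}

for theta in (-1.0, 0.0, 1.0):
    p_flat = two_pl_prob(theta, **flat_item)
    p_steep = two_pl_prob(theta, **steep_item)
    print(f"theta = {theta:+.1f}: flat item {p_flat:.3f}, steep item {p_steep:.3f}")
```

Both items give .50 at $\theta = 0$, but the steep item separates respondents just below and just above that point much more sharply. Setting $\alpha_i = 1$ turns the formula back into the Rasch model.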

The three-parameter (3PL) model

The three-parameter model (3PL) also takes guessing into account. The 3PL model can be seen as an extension of the 2PL model with one extra component, the guessing parameter $c_i$: the lower limit of the probability of answering item i correctly, i.e. the chance that even a respondent with a very low trait level answers correctly. According to the 3PL model, the probability of a correct answer is therefore influenced by:

the trait level of the individual, $\theta$;
the item difficulty, $\beta$;
the item discrimination, $\alpha$;
the guessing parameter, $c$.
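The 3PL formula itself is not written out above; its standard form is $P(X_{is} = 1 \mid \theta_s, \beta_i, \alpha_i, c_i) = c_i + (1 - c_i)\,\frac{e^{\alpha_i(\theta_s - \beta_i)}}{1 + e^{\alpha_i(\theta_s - \beta_i)}}$. The sketch below implements that form; the item (a four-option multiple-choice question with $c = 0.25$) is hypothetical.

```python
import math

def three_pl_prob(theta: float, alpha: float, beta: float, c: float) -> float:
    """P(X_is = 1) under the 3PL model: c + (1 - c) * logistic(alpha * (theta - beta))."""
    z = alpha * (theta - beta)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))

# Hypothetical four-option multiple-choice item: c = 0.25 (one in four by blind guessing)
alpha, beta, c = 1.2, 0.5, 0.25
for theta in (-4.0, 0.5, 4.0):
    print(f"theta = {theta:+.1f}: P = {three_pl_prob(theta, alpha, beta, c):.3f}")
```

Even at a very low trait level the probability does not drop below $c$ (.25 here), and at $\theta = \beta$ it is $(1 + c)/2 = .625$ rather than .50. With $c = 0$ the 3PL reduces to the 2PL.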

Graded Response Model

The 1PL, 2PL and 3PL models are designed for items with two response options (binary items). The Graded Response Model (GRM) is designed for items with more than two ordered response options, such as rating scales. As in the previous models, the GRM assumes that a person's response to an item is affected by the person's trait level, the item difficulty and the item discrimination. However, in the GRM each item has several difficulty parameters.

If an item has m response options (categories), there are m − 1 boundaries between adjacent options. For example, an item with five response options (strongly disagree, disagree, neutral, agree, totally agree) has four boundaries, such as the one between 'agree' and 'totally agree'. The probability of responding at or above each boundary can be written as:

$$P(X_{is} \geq j \mid \theta_s, \beta_{ij}, \alpha_i) = \frac{e^{\alpha_i(\theta_s - \beta_{ij})}}{1 + e^{\alpha_i(\theta_s - \beta_{ij})}}$$

j = the response option.

$\beta_{ij}$ = the difficulty parameter for response option j on item i.

The other parameters are the same as in the previous models.

P is the probability that respondent s, with trait level $\theta_s$, chooses response option j or higher on item i.

There are m − 1 difficulty parameters ($\beta_{ij}$) for each item.

You can also calculate the chance that someone will choose a specific answer to a certain item:

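$$P(X_{is} = j \mid \theta_s, \beta_{ij}, \alpha_i) = P(X_{is} \geq j \mid \theta_s, \beta_{ij}, \alpha_i) - P(X_{is} \geq j + 1 \mid \theta_s, \beta_{ij}, \alpha_i)$$

j = the response option (e.g. agree).

j + 1 = the next higher response option (e.g. totally agree). For the lowest option, $P(X_{is} \geq j) = 1$; for the highest option, $P(X_{is} \geq j + 1) = 0$.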
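The two GRM steps (cumulative "option j or higher" probabilities, then differences between neighbouring cumulative probabilities) can be sketched as follows. This assumes the identity $P(X_{is} = j) = P(X_{is} \geq j) - P(X_{is} \geq j + 1)$ with $P(X_{is} \geq 1) = 1$ for the lowest option and 0 beyond the highest option; the item parameters and the function name `grm_category_probs` are hypothetical.

```python
import math

def grm_category_probs(theta, alpha, betas):
    """Category probabilities for one GRM item.

    betas: increasing list of m - 1 difficulty parameters; betas[k] is the
    boundary for choosing option k + 2 or higher (options numbered 1..m).
    """
    def logistic(z):
        return math.exp(z) / (1.0 + math.exp(z))

    # P(X >= 1) = 1, one cumulative probability per boundary, then P(X >= m + 1) = 0
    cumulative = [1.0] + [logistic(alpha * (theta - b)) for b in betas] + [0.0]
    # P(X = j) = P(X >= j) - P(X >= j + 1)
    return [cumulative[j] - cumulative[j + 1] for j in range(len(betas) + 1)]

labels = ["strongly disagree", "disagree", "neutral", "agree", "totally agree"]
probs = grm_category_probs(theta=0.0, alpha=1.2, betas=[-2.0, -0.5, 0.5, 2.0])
for label, p in zip(labels, probs):
    print(f"{label:>17}: {p:.3f}")
print("sum:", round(sum(probs), 6))
```

Because the differences telescope, the category probabilities always sum to 1, and with increasing difficulty parameters none of them is negative. For an average respondent and these symmetric boundaries, "neutral" is the most likely answer.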

Which parameters can you estimate?

Proportion of correct answers for each respondent: divide the number of items the respondent answered correctly by the total number of items they answered.
Trait level: $\theta_s = \ln\left(\frac{P_s}{1 - P_s}\right)$
$P_s$ = proportion of items answered correctly by respondent s.
ln = natural logarithm.
Proportion of correct responses for each item: divide the number of respondents who answered the item correctly by the total number of respondents who answered it.

Item difficulty: $\beta_i = \ln\left(\frac{1 - P_i}{P_i}\right)$
$P_i$ = proportion of respondents who answered item i correctly.
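Applied to a small response matrix, these estimates look like the sketch below. The 0/1 data are hypothetical. Note that a proportion of exactly 0 or 1 (someone who answers everything right or wrong, or an item nobody or everybody gets right) makes the logarithm infinite, so the data were chosen to avoid that; IRT software typically treats these values only as starting points and refines them iteratively.

```python
import math

# Hypothetical responses: rows = respondents, columns = items (1 = correct)
responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]

def logit(p):
    """ln(p / (1 - p)); undefined for p = 0 or p = 1."""
    return math.log(p / (1.0 - p))

n_persons = len(responses)
n_items = len(responses[0])

# Trait level: proportion correct per respondent, then theta_s = ln(P_s / (1 - P_s))
person_p = [sum(row) / n_items for row in responses]
thetas = [logit(p) for p in person_p]

# Item difficulty: proportion correct per item, then beta_i = ln((1 - P_i) / P_i)
item_p = [sum(row[i] for row in responses) / n_persons for i in range(n_items)]
betas = [-logit(p) for p in item_p]   # ln((1 - p) / p) equals -ln(p / (1 - p))

print("trait levels:", [round(t, 3) for t in thetas])
print("difficulties:", [round(b, 3) for b in betas])
```

Respondent 3 (three of four correct) gets $\theta = \ln 3 \approx 1.10$, while item 4, answered correctly by only one in four respondents, gets the highest difficulty, $\beta = \ln 3 \approx 1.10$.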

How can you describe the characteristics of the test as a whole?

Item characteristic curve (ICC)

An item characteristic curve shows the probability of a correct answer to an item as a function of a person's trait level.

x-axis: trait level (0.00 = average)
y-axis: probability of a correct answer (between 0.00 and 1.00)
From left to right, the curves run from the easiest item (left) to the hardest item (right).

An example of the item characteristic curves of four items from a test is shown below.

In this example, the item discrimination parameter is largest for item 1. For a person with a trait level of θ = 6, the probability of success (i.e. a correct answer) is high for item 1 but low for items 3 and 4 (almost 0). For a person with a trait level of θ = 5, the most likely score pattern (items 1, 2, 3, 4, where 1 = correct and 0 = incorrect) is 1, 1, 0, 0.
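The reasoning behind the "most likely score pattern" can be sketched with 2PL item characteristic curves. The item parameters below are hypothetical and are not the values behind the figure; they were chosen only so that the outcome resembles the description (item 1 most discriminating, items 3 and 4 hard). The sketch also assumes local independence, so the most likely pattern is simply the more likely outcome on each item.

```python
import math

def icc(theta: float, alpha: float, beta: float) -> float:
    """2PL item characteristic curve: probability of a correct answer at trait level theta."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

# Hypothetical (alpha, beta) values for items 1-4; NOT the values behind the book's figure
items = [(2.0, 4.0), (1.0, 4.5), (1.5, 8.0), (1.2, 9.0)]

def most_likely_pattern(theta: float) -> list:
    # Assuming local independence: take the more likely outcome on each item separately
    return [1 if icc(theta, a, b) >= 0.5 else 0 for a, b in items]

for theta in (5.0, 6.0):
    probs = [round(icc(theta, a, b), 2) for a, b in items]
    print(f"theta = {theta}: P = {probs}, most likely pattern = {most_likely_pattern(theta)}")
```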

Item information and test information

Perspective of the CTT: there is a single reliability for a test.

Perspective of IRT: reliability is not a single value. A test's psychometric quality can be better for some people than for others, so a test may provide better information at some trait levels than at others.

For example, suppose a test has two difficult items and four respondents: two have low trait levels and two have high trait levels. The test then provides more information about the two people with high trait levels. Both people with low trait levels answer the difficult items incorrectly, so even if their low trait levels differ, the test cannot show it. Of the two people with high trait levels, one may answer one item correctly and the other both items. The test therefore provides more information about people with high trait levels, because it picks up small differences in trait level within this group.

Item information can be calculated using the following formula:

$$I_i(\theta) = P_i(\theta)\,(1 - P_i(\theta))$$

$I_i(\theta)$ is the information of item i at a given trait level $\theta$ (this form applies to the Rasch/1PL model).

$P_i(\theta)$ is the probability that a respondent with trait level $\theta$ answers item i correctly.

Higher item information values indicate better psychometric quality of the item at that trait level.

If we calculate information values for different trait levels then we can display these in an item information curve. Higher curves indicate better quality. The top of a curve represents the trait level at which the item provides the most information.
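An item information curve can be traced by evaluating $P_i(\theta)(1 - P_i(\theta))$ over a grid of trait levels. This sketch assumes the Rasch model and a hypothetical item difficulty of 1.0; the grid from −4 to 4 is an arbitrary choice.

```python
import math

def rasch_prob(theta: float, beta: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def item_information(theta: float, beta: float) -> float:
    """Item information under the Rasch (1PL) model: I(theta) = P(theta) * (1 - P(theta))."""
    p = rasch_prob(theta, beta)
    return p * (1.0 - p)

beta = 1.0  # hypothetical item difficulty
grid = [x / 10 for x in range(-40, 41)]  # trait levels from -4.0 to 4.0
curve = [(theta, item_information(theta, beta)) for theta in grid]
peak_theta, peak_info = max(curve, key=lambda point: point[1])
print(f"peak at theta = {peak_theta}, information = {peak_info:.3f}")
```

Under the Rasch model the curve peaks where the trait level equals the item difficulty (here θ = 1.0), where P = .50 and the information reaches its maximum of .25.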

Item information values at a specific trait level can be added together to obtain the test information value at that trait level. If we calculate test information for multiple trait levels, we can display the values in a test information curve. From this you can read how much information the test provides at each trait level.
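Summing item information gives the test information at each trait level. The sketch below assumes Rasch items and a hypothetical test made up mostly of difficult items, which mirrors the earlier two-difficult-items example: information is concentrated at higher trait levels.

```python
import math

def item_information(theta: float, beta: float) -> float:
    """Rasch item information P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + math.exp(-(theta - beta)))
    return p * (1.0 - p)

def total_test_information(theta: float, betas: list) -> float:
    """Test information at theta: the sum of the item information values at that theta."""
    return sum(item_information(theta, b) for b in betas)

# Hypothetical test consisting mostly of difficult items
betas = [0.5, 1.0, 1.5, 2.0]
for theta in (-2.0, 0.0, 1.25, 3.0):
    print(f"theta = {theta:+.2f}: test information = {total_test_information(theta, betas):.3f}")
```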

For which purposes can IRT be applied?

IRT is a theoretical perspective that is used for different purposes in psychological measurements. A number of applications of IRT are:

Evaluation and improvement of psychometric properties of items and tests.
Evaluating the presence of differential item functioning (DIF). DIF occurs when the properties of an item differ between groups. For example, a man and a woman with the same trait level have different probabilities of answering the item correctly.
Analyzing person fit: identifying people whose response pattern does not match the pattern of responses expected on a set of items.
Computerized adaptive testing (CAT): a method for determining someone's trait level accurately and efficiently through computer-controlled testing. The test adapts the items to the person's trait level: after a correct answer the next item is more difficult, and after an incorrect answer the next item is easier. In this way a person's trait level can be determined more quickly.
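The harder-after-correct, easier-after-incorrect idea of CAT can be sketched as a simple up-down procedure. Everything here is a hypothetical illustration: the item bank, the halving step rule, the function name `run_cat`, and a noise-free respondent who answers correctly whenever the item is easier than their trait level. Picking the unused item whose difficulty is closest to the current estimate corresponds, under the Rasch model, to picking the most informative item; operational CAT systems usually estimate θ with maximum likelihood or Bayesian methods instead of a fixed step rule.

```python
def run_cat(answer, bank, n_items=5, start=0.0, step=1.0):
    """Minimal adaptive test: harder item after a correct answer, easier after an incorrect one.

    answer: function taking an item difficulty and returning True (correct) or False.
    bank: list of available item difficulties.
    """
    estimate = start
    available = list(bank)
    administered = []
    for _ in range(min(n_items, len(available))):
        # Choose the unused item whose difficulty is closest to the current estimate
        item = min(available, key=lambda b: abs(b - estimate))
        available.remove(item)
        correct = answer(item)
        administered.append((item, correct))
        # Move the estimate up after a correct answer and down after an incorrect one,
        # halving the step each time (hypothetical rule chosen for illustration)
        estimate += step if correct else -step
        step /= 2
    return estimate, administered

bank = [x / 4 for x in range(-12, 13)]  # hypothetical difficulties from -3.0 to 3.0
true_theta = 1.3
# Hypothetical, noise-free respondent: correct whenever the item is easier than their trait level
estimate, log = run_cat(lambda b: b < true_theta, bank)
print(estimate, log)
```

After five items the estimate ends at 1.3125, close to the simulated trait level of 1.3.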


Summary of Psychometrics: An Introduction by Furr - 3rd edition
