What is Item Response Theory (IRT) and which models are there? - Chapter 14

What is IRT?

Item Response Theory (IRT) is an alternative to classical test theory (CTT) for identifying and analyzing measurements in the behavioral sciences. A person's response to a particular test item is influenced by characteristics of the person (trait level) and properties of the item (difficulty level).

  • For a difficult item/question, someone needs a high trait level to be able to give a correct answer.
  • Conversely, for an easy item/question, a low trait level is enough to give a correct answer.

Example:

Statement 1: I like to chat with my friends. 
Statement 2: I like to speak to a large audience.

Agreeing with statement 1 requires only a low extraversion level (= trait level).
Agreeing with statement 2 requires a high extraversion level (= trait level).

In IRT analysis, trait levels are usually expressed on a standardized scale with a mean of 0 and a standard deviation of 1. Item difficulties are expressed on the same scale.

So if an item has a difficulty level of 0:

  • an individual with an average trait level (θ = 0) has a 50% chance of a correct answer;
  • an individual with a high trait level (θ above 0) has a chance greater than 50% of a correct answer;
  • an individual with a low trait level (θ below 0) has a chance smaller than 50% of a correct answer.
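To check these three cases numerically, the sketch below evaluates the one-parameter (Rasch) formula from later in this chapter for an item with difficulty 0. The helper name `rasch_probability` is hypothetical and was chosen for illustration only.

```python
import math

def rasch_probability(theta: float, beta: float) -> float:
    """Chance of a correct answer under the Rasch (1PL) model."""
    return math.exp(theta - beta) / (1 + math.exp(theta - beta))

# Item with difficulty 0; respondents with a low, an average and a high trait level.
for theta in (-1.0, 0.0, 1.0):
    print(f"theta = {theta:+.1f}: P(correct) = {rasch_probability(theta, 0.0):.2f}")
# prints 0.27, 0.50 and 0.73 respectively
```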

What is item discrimination?

Item discrimination is the degree to which an item distinguishes between individuals with low and high trait levels. The discrimination value of an item indicates how strongly the item is related to the trait being measured.

  • Positive discrimination (> 0): the item is related to the trait being measured. People with high trait levels have a greater chance of answering the item correctly, and people with low trait levels have a smaller chance.
  • Negative discrimination (< 0): the item is inconsistent with the trait. People with high trait levels have a smaller chance of answering the item correctly.
  • Discrimination value = 0: there is no relationship between the item and the trait measured by the test.

In short, the larger the (positive) discrimination value, the more consistently the item reflects the trait, and the better the item.
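The effect of the sign of the discrimination value can be made concrete with the two-parameter logistic formula introduced later in this chapter. The sketch compares a respondent with a low trait level (θ = -2) and one with a high trait level (θ = 2) on a hypothetical item with difficulty 0 and three discrimination values. The function name is an illustrative assumption.

```python
import math

def two_pl_probability(theta: float, beta: float, alpha: float) -> float:
    """Chance of a correct answer under the two-parameter logistic (2PL) model."""
    z = alpha * (theta - beta)
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical item with difficulty 0 and three possible discrimination values.
for alpha in (1.5, 0.0, -1.5):
    low = two_pl_probability(-2.0, 0.0, alpha)   # respondent with a low trait level
    high = two_pl_probability(2.0, 0.0, alpha)   # respondent with a high trait level
    print(f"alpha = {alpha:+.1f}: P(low) = {low:.2f}, P(high) = {high:.2f}")
# alpha = +1.5: P(low) = 0.05, P(high) = 0.95
# alpha = +0.0: P(low) = 0.50, P(high) = 0.50
# alpha = -1.5: P(low) = 0.95, P(high) = 0.05
```

With positive discrimination, the high scorer is favored. At zero discrimination, the trait makes no difference. With negative discrimination, the pattern is reversed.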

A third component that must be taken into account is guessing. On multiple-choice or true/false questions, people may guess when they don't know the answer. As a result, they sometimes give the correct answer without actually knowing it. IRT can include guessing as a component in the analysis.

Which IRT models are there?

From the IRT perspective, we can identify the components that influence the likelihood that a person will respond to a certain item in a certain way. A measurement model expresses the relationship between the outcome (a person's response to an item) and the components that influence that outcome (the person's trait level and the properties of the item). Different measurement models express this link in different ways. In other words, IRT models describe the mathematical link between the observed scores and the characteristics of the individual and of the item that influence those scores. This section discusses the most common IRT models.

The one-parameter model (1PL): The Rasch model 

The Rasch model (one-parameter logistic model, 1PL) includes only two components that influence the scores: the trait level of the individual and the difficulty of the item.

$$P(X_{is} = 1 \mid \theta_s, \beta_i) = \frac{e^{(\theta_s - \beta_i)}}{1 + e^{(\theta_s - \beta_i)}}$$

P = the chance of a particular answer by respondent s on item i.

Xis = the response of respondent s to item i. "Xis = 1" indicates a correct answer to this item.

θs = trait level of respondent s.

βi = difficulty of item i.

e = the base of the natural logarithm (approximately 2.718). Most calculators have an e^x key for it.
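The Rasch formula translates directly into code. The sketch below is a minimal version that writes e explicitly as `math.e`, with a hypothetical function name. It also shows a property of the formula: only the difference θs − βi matters. A respondent with θ = 2.0 on an item with β = 1.5 therefore has the same chance as a respondent with θ = 1.0 on an item with β = 0.5.

```python
import math

def rasch_probability(theta_s: float, beta_i: float) -> float:
    """P(X_is = 1 | theta_s, beta_i) under the Rasch (1PL) model."""
    numerator = math.e ** (theta_s - beta_i)   # e raised to (theta_s - beta_i)
    return numerator / (1 + numerator)

print(round(rasch_probability(1.0, 0.5), 2))  # 0.62
print(round(rasch_probability(2.0, 1.5), 2))  # same difference, so also 0.62
```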

The two-parameter (2PL) model 

The two-parameter model (2PL) has three components that influence the scores: the trait level of the individual, the difficulty of the item and the discrimination of the item.

The formula here is:

$$P(X_{is} = 1 \mid \theta_s, \beta_i, \alpha_i) = \frac{e^{\alpha_i(\theta_s - \beta_i)}}{1 + e^{\alpha_i(\theta_s - \beta_i)}}$$

αi = the discrimination of item i.
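A minimal sketch of the 2PL formula follows (the function name is hypothetical). It compares two items with the same difficulty (0) but different discrimination, for a respondent with θ = 1. The more discriminating item has a steeper curve, so the same one-point advantage over the difficulty produces a larger jump in the chance of a correct answer. With αi = 1 the formula reduces to the Rasch model.

```python
import math

def two_pl_probability(theta_s, beta_i, alpha_i):
    """P(X_is = 1 | theta_s, beta_i, alpha_i) under the 2PL model."""
    z = alpha_i * (theta_s - beta_i)
    return math.exp(z) / (1 + math.exp(z))

# Two hypothetical items with the same difficulty (0) but different discrimination.
for alpha_i in (0.5, 2.0):
    print(alpha_i, round(two_pl_probability(1.0, 0.0, alpha_i), 2))
# 0.5 0.62
# 2.0 0.88
```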

The three-parameter (3PL) model

The three-parameter model (3PL) also takes the chance of guessing into account. The 3PL model can be seen as an extension of the 2PL model with one extra component, the guessing parameter. This parameter is the lowest possible chance of answering the item correctly: even a respondent with a very low trait level has at least this chance of a correct answer by guessing. According to the 3PL model, the chance of a correct answer is therefore influenced by:

  1. the characteristics of the individual, i.e., the trait level θ;
  2. the item difficulty β;
  3. the item discrimination α;
  4. the guessing parameter.
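The list above names the components but gives no formula. This sketch assumes a common way of writing the 3PL model, P = c + (1 − c) · e^{α(θ − β)} / (1 + e^{α(θ − β)}), where c (an assumed symbol) is the guessing parameter. For a hypothetical four-option multiple-choice item, it also assumes c = 0.25, roughly the chance of picking the right option at random.

```python
import math

def three_pl_probability(theta, beta, alpha, c):
    """3PL (assumed standard form): guessing floor c plus (1 - c) times the 2PL curve."""
    z = alpha * (theta - beta)
    logistic = math.exp(z) / (1 + math.exp(z))
    return c + (1 - c) * logistic

# Hypothetical item: alpha = 1.2, beta = 0.5, assumed guessing parameter c = 0.25.
alpha, beta, c = 1.2, 0.5, 0.25
for theta in (-4.0, 0.5, 4.0):
    print(theta, f"{three_pl_probability(theta, beta, alpha, c):.3f}")
# -4.0 0.253  (close to the guessing floor)
# 0.5 0.625   (at theta = beta the chance is c + (1 - c) / 2, not 0.5)
# 4.0 0.989
```

Setting c = 0 gives back the 2PL model.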

Graded Response Model

The models above are designed for items with two answer options (binary items). The Graded Response Model (GRM) is designed for tests and questionnaires whose items have more than two ordered answer options. Like the previous models, the GRM assumes that a person's response to an item is affected by that person's trait level, the item difficulty and the item discrimination. Unlike those models, however, the GRM has several difficulty parameters for a single item.

If an item has m answer options (categories), there are m - 1 boundaries between adjacent options. For example, an item with five answer options (strongly disagree, disagree, neutral, agree, totally agree) has four boundaries, such as the boundary between 'agree' and 'totally agree'. Each boundary can be represented as follows:

$$P(X_{is} \geq j \mid \theta_s, \beta_{ij}, \alpha_i) = \frac{e^{\alpha_i(\theta_s - \beta_{ij})}}{1 + e^{\alpha_i(\theta_s - \beta_{ij})}}$$

j = the answer option.

βij = difficulty parameter for answer option j on item i.

The other parameters are the same as in the previous models.

P is the chance that a person with trait level θs will choose answer option j or higher on item i.

There are m - 1 difficulty parameters (βij) for each item.

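You can also calculate the chance that someone will choose one specific answer option on an item:

$$P(X_{is} = j \mid \theta_s, \beta_{ij}, \alpha_i) = P(X_{is} \geq j \mid \theta_s, \beta_{ij}, \alpha_i) - P(X_{is} \geq j + 1 \mid \theta_s, \beta_{i(j+1)}, \alpha_i)$$

j = the answer option (e.g., agree).

j + 1 = the next higher answer option (e.g., totally agree).

For the lowest answer option, P(Xis ≥ j) = 1. For the highest answer option, P(Xis ≥ j + 1) = 0.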
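The two GRM formulas can be combined in a short sketch. It computes the m - 1 boundary probabilities P(X ≥ j) and then takes differences of neighbouring boundaries to get the chance of each of the five answer options. The item parameters and function names are hypothetical. The boundary difficulties must increase from option to option, or some category probabilities would become negative.

```python
import math

def boundary_probability(theta_s, beta_ij, alpha_i):
    """P(X_is >= j): chance of choosing option j or a higher option."""
    z = alpha_i * (theta_s - beta_ij)
    return math.exp(z) / (1 + math.exp(z))

def category_probabilities(theta_s, betas_i, alpha_i):
    """Chance of each of the m answer options, given the m - 1 boundary difficulties."""
    # P(X >= lowest option) = 1, then one boundary per difficulty, then P(X >= m + 1) = 0.
    at_least = [1.0] + [boundary_probability(theta_s, b, alpha_i) for b in betas_i] + [0.0]
    return [at_least[j] - at_least[j + 1] for j in range(len(betas_i) + 1)]

labels = ["strongly disagree", "disagree", "neutral", "agree", "totally agree"]
betas = [-2.0, -1.0, 0.5, 1.5]   # hypothetical, increasing boundary difficulties
probs = category_probabilities(theta_s=0.0, betas_i=betas, alpha_i=1.3)
for label, p in zip(labels, probs):
    print(f"{label:>17}: {p:.3f}")
print(round(sum(probs), 6))  # the five probabilities add up to 1.0
```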

Which parameters can you estimate?

  • Proportion of correctly answered items for each respondent: divide the number of items the respondent answered correctly by the total number of items the respondent answered.
  • Trait level: $\theta_s = \ln\left(\frac{P_s}{1 - P_s}\right)$
    Ps = proportion of items answered correctly by respondent s
    ln = natural logarithm
  • Proportion of correct responses for each item: divide the number of respondents who answered the item correctly by the total number of respondents who answered it.
  • Item difficulty: $\beta_i = \ln\left(\frac{1 - P_i}{P_i}\right)$
    Pi = proportion of correct responses for item i
    ln = natural logarithm
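The four steps above can be applied to a small response matrix. The data below are hypothetical, and the function names are illustrative. This sketch only applies the simple log-odds formulas from the list. It does not perform the iterative (maximum-likelihood) estimation that IRT software typically uses. The formulas break down for anyone or any item with a proportion of exactly 0 or 1, so the toy data avoid those cases.

```python
import math

# Hypothetical responses: rows = respondents, columns = items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 0],
]

def log_odds_trait(row):
    """theta_s = ln(P_s / (1 - P_s)), with P_s the proportion correct for respondent s."""
    p_s = sum(row) / len(row)
    return math.log(p_s / (1 - p_s))   # undefined if P_s is 0 or 1

def log_odds_difficulty(column):
    """beta_i = ln((1 - P_i) / P_i), with P_i the proportion correct for item i."""
    p_i = sum(column) / len(column)
    return math.log((1 - p_i) / p_i)   # undefined if P_i is 0 or 1

thetas = [log_odds_trait(row) for row in responses]
items = list(zip(*responses))          # transpose: one tuple of responses per item
betas = [log_odds_difficulty(col) for col in items]
print([round(t, 2) for t in thetas])   # [1.1, -1.1, 0.0]
print([round(b, 2) for b in betas])    # [-0.69, -0.69, 0.69, 0.69]
```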

How can you describe the characteristics of the test as a whole?

Item characteristic curve (ICC)

An item characteristic curve gives the chance of a correct answer to an item for a person with a certain trait level.

  • x-axis: trait level (0.00 = average)
  • y-axis: chance of a correct answer (between 0.00 and 1.00)
  • position from left to right: curves further to the left belong to easier items, curves further to the right to harder items
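To see how the curves line up along the x-axis, this sketch tabulates item characteristic curves for three hypothetical items that differ only in difficulty. It uses the logistic form 1 / (1 + e^{-α(θ - β)}), which equals the e^{...} / (1 + e^{...}) form used above. At every trait level, the easiest item has the highest chance of a correct answer.

```python
import math

def icc(theta, beta, alpha=1.0):
    """Height of the item characteristic curve (2PL form; alpha = 1 gives the Rasch curve)."""
    return 1 / (1 + math.exp(-alpha * (theta - beta)))

# Hypothetical items, from easy (curve on the left) to hard (curve on the right).
difficulties = {"easy": -1.5, "medium": 0.0, "hard": 1.5}
grid = [-3, -2, -1, 0, 1, 2, 3]   # trait levels on the x-axis

print("theta " + " ".join(f"{name:>7}" for name in difficulties))
for theta in grid:
    row = " ".join(f"{icc(theta, beta):7.2f}" for beta in difficulties.values())
    print(f"{theta:5d} {row}")
```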

An example of the item characteristic curves of four items from a test is shown below.

In this example, the item discrimination parameter is largest for item 1. For a person with a trait level of θ = 6, the chance of success (i.e., a correct answer) is high for item 1 but low (almost 0) for items 3 and 4. For a person with θ = 5, the most likely response pattern (items 1 to 4 in order, with 1 = correct and 0 = incorrect) is 1, 1, 0, 0.
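A most likely response pattern can be derived item by item, because IRT models assume that responses to different items are independent at a given trait level. The likelier score on each item is 1 when the chance of a correct answer exceeds 0.5. Under the 2PL model with positive discrimination, that happens exactly when θ is above the item difficulty. The item parameters below are hypothetical and are not read off the figure.

```python
import math

def two_pl_probability(theta, beta, alpha):
    z = alpha * (theta - beta)
    return math.exp(z) / (1 + math.exp(z))

def most_likely_pattern(theta, items):
    """Score 1 where a correct answer is more likely than not, else 0."""
    return [1 if two_pl_probability(theta, beta, alpha) > 0.5 else 0
            for alpha, beta in items]

# Hypothetical (alpha, beta) pairs for four items; these are NOT the figure's items.
items = [(2.0, -1.0), (1.0, 0.0), (0.8, 1.0), (1.2, 2.0)]
print(most_likely_pattern(0.5, items))   # [1, 1, 0, 0]
```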

Item information and test information

Perspective of CTT: a test has a single reliability.

Perspective of IRT: the precision (reliability) of a test can differ across trait levels. A test can measure some people more precisely than others, so it may provide better information at some trait levels than at others.

For example, suppose a test has two difficult questions and is taken by four respondents: two with a low trait level and two with a high trait level. The two people with low trait levels both answer both difficult questions incorrectly, so even if their low trait levels differ, the test does not show it. Of the two people with high trait levels, one may answer one item correctly and the other both items. The test therefore provides more information about people with high trait levels, because small differences in trait level within this group show up in the scores.

Item information can be calculated using the following formula (this is the form for the Rasch/1PL model; in the 2PL model it is multiplied by αi²):

$$I_i(\theta) = P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr)$$

Ii(θ) is the information of item i at a given trait level θ.

Pi(θ) is the chance that a respondent with trait level θ answers item i correctly.

Higher item information values indicate better psychometric quality (more precise measurement) of the item at that trait level.

If we calculate information values for different trait levels, we can display them in an item information curve. Higher curves indicate better quality. The peak of a curve marks the trait level at which the item provides the most information.
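An item information curve can be traced by evaluating the formula above on a grid of trait levels. This sketch does that for a single Rasch item with a hypothetical difficulty of 0.5. The peak lies at θ = β, where P = 0.5, so the maximum information for a Rasch item is 0.5 × 0.5 = 0.25.

```python
import math

def rasch_probability(theta, beta):
    return math.exp(theta - beta) / (1 + math.exp(theta - beta))

def item_information(theta, beta):
    """I(theta) = P(theta) * (1 - P(theta)) for a Rasch (1PL) item."""
    p = rasch_probability(theta, beta)
    return p * (1 - p)

beta = 0.5                              # hypothetical item difficulty
grid = [x / 2 for x in range(-6, 7)]    # trait levels -3.0, -2.5, ..., 3.0
curve = [(theta, item_information(theta, beta)) for theta in grid]
peak_theta, peak_info = max(curve, key=lambda point: point[1])
print(peak_theta, round(peak_info, 2))  # 0.5 0.25
```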

Item information values at a specific trait level can be added together to obtain the test information value at that trait level. If we calculate test information for multiple trait levels, we can display these values in a test information curve. From this curve you can read how much information the test provides at each trait level.

For which purposes can IRT be applied? 

IRT is a theoretical perspective that is used for different purposes in psychological measurements. A number of applications of IRT are:

  • Evaluation and improvement of psychometric properties of items and tests.
  • Evaluation of the presence of differential item functioning (DIF). DIF occurs when the properties of an item differ between groups. For example, a man and a woman with the same trait level have different chances of answering the item correctly.
  • Analysis of person fit. This is an attempt to identify people whose response pattern does not match the pattern of responses expected on a set of items.
  • Computerized adaptive testing (CAT). CAT is a method for determining someone's trait level accurately and efficiently through computer-controlled testing. The test adapts the questions to the person's trait level: after a correct answer the next question is more difficult, and after an incorrect answer the next question is easier. In this way, a person's trait level can be determined with fewer questions.
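The CAT idea can be illustrated with a deliberately simplified sketch. The item bank, the simulated respondent and the update rule are all hypothetical. The rule moves the estimate up after a correct answer and down after an incorrect one, halving the step size each time. At every step, the sketch gives the unused Rasch item that is most informative at the current estimate. Operational CAT systems usually re-estimate the trait level with maximum-likelihood or Bayesian methods after each item instead of using a fixed step rule.

```python
import math

def rasch_information(theta, beta):
    p = 1 / (1 + math.exp(-(theta - beta)))
    return p * (1 - p)

def adaptive_test(item_bank, answer, n_items=5, start=0.0, step=1.0):
    """Simplified CAT (hypothetical step rule): give the most informative unused item,
    move the estimate up after a correct answer and down after an incorrect one,
    and halve the step size after every item."""
    theta_hat = start
    remaining = dict(item_bank)
    administered = []
    for _ in range(min(n_items, len(remaining))):
        # For a Rasch item, information is highest when the difficulty is closest to theta_hat.
        name = max(remaining, key=lambda k: rasch_information(theta_hat, remaining[k]))
        beta = remaining.pop(name)
        correct = answer(name, beta)
        theta_hat += step if correct else -step
        step /= 2
        administered.append((name, beta, correct))
    return theta_hat, administered

# Hypothetical item bank (name -> difficulty).
item_bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 0.4, "q5": 1.0, "q6": 1.5, "q7": 2.0}

def simulated_answer(name, beta):
    # Hypothetical respondent: answers every item easier than 1.2 correctly.
    return beta < 1.2

theta_hat, administered = adaptive_test(item_bank, simulated_answer)
print([name for name, _, _ in administered])  # ['q3', 'q5', 'q6', 'q7', 'q4']
print(theta_hat)                              # 1.1875
```

After five items, the estimate (1.1875) has moved close to the simulated respondent's threshold of 1.2, without administering the items far from that level.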

Follow the author: Psychology Supporter