A conceptual introduction to psychometrics by G. J. Mellenbergh - a summary
Chapter 4
Observed test scores
The aim of testing is to yield scores of test takers’ maximum or typical performance.
Two main types of test scores are distinguished:
Observed test scores
Computed after the separate test items are scored.
Derived from the item scores by taking the unweighted or weighted sum of the item scores.
The latent variable is unobserved and, in general, the latent variable is not a simple sum of item scores.
Latent variable (construct) scores
To compute the latent variable score, a model is needed that specifies the relation between the latent variable and item responses.
The latent variable score is derived from the item responses under the assumption of a latent variable item response model.
Conventionally, items are scored by assigning ordinal numbers to the responses.
The scoring differs slightly between maximum and typical performance tests.
Maximum performance items are scored by assigning 0 to the lowest category, and consecutive rank numbers to subsequent categories.
Typical performance items are indicative or contra-indicative of the latent variable that is measured by the test, and the scoring of contra-indicative items has to be reversed with respect to the scoring of indicative items.
Dichotomous indicative typical performance items are scored by assigning 0 to the ‘no’ (don’t agree) category, and 1 to the ‘yes’ (agree) category.
Dichotomous contra-indicative items are scored by assigning 0 to the ‘yes’ category, and 1 to the ‘no’ category.
The categories of ordinal-polytomous items are scored by assigning rank numbers to the categories.
Bounded-continuous items are scored in measurement...
Chapter 1
Introduction
Psychometric terminology sometimes differs depending on the types of test applications.
A psychological or educational test: an instrument for the measurement of a person’s maximum or typical performance under standardized conditions, where the performance is assumed to reflect one or more latent attributes.
Tests are distinguished from surveys. It is not assumed that survey questions reflect a latent attribute.
Subtest: an independent part of a test.
A (sub)test consists of one or more items.
Item: the smallest possible subtest of a test. The building blocks of a test.
A test that consists of n items is called an n-item test.
One or more latent attributes affect test performance.
The number of latent attributes is the dimensionality of the test.
Dimensionality: the number of latent attributes (variables) that affect test performance.
Unidimensional test: a test that predominantly measures one latent attribute.
Multidimensional test: a test that measures more than one latent attribute.
Two-dimensional test: a test that measures two latent attributes. And so on…
Psychological and educational measurement instruments are divided into:
Maximum performance tests
A performance can be considered maximum in two different respects: the performance is accurate, or the performance is fast.
Classified according to time:
Chapter 2
Developing maximum performance tests
Seven elements
The test developer must specify the latent variable of interest that has to be measured by the test.
Latent variable is a general term. The term construct is used when a substantive interpretation is given of the latent variable.
The latent variable (construct) is assumed to affect test takers’ item responses and test scores.
Constructs can vary in many different ways.
A good way to start a test development project is to define the construct that has to be measured by the test.
This definition describes the construct of interest, and distinguishes it from other, related constructs.
Usually, the literature on the construct needs to be studied before the definition can be given. Frequently, the definition can only be given when other elements of the test development plan are specified.
Different modes can be used to measure constructs.
The test developer must specify the objectives of the test. Tests are used for many different purposes.
Target population: the set of persons to whom the test has to be applied.
The test developer must define the target population, and must provide criteria for the inclusion and exclusion of persons.
A target population can be split into distinct subpopulations. The test developer must specify whether subpopulations need to be distinguished. And, if so, they need to define the subpopulations, and to provide criteria
Chapter 3
Typical performance tests
Typical performance tests assess behavior that is typical for the person.
These tests are used to measure attitudes, interests, values, opinions, and personality characteristics.
The test developer has to specify the latent variable of interest that is assumed to affect test takers’ item responses and test scores.
The usual constructs of interest of typical performance tests are:
The responses to typical performance tests are not evaluated on their correctness, but are considered to typify a person.
At the start of a test development project, the researcher needs information on the construct of interest. This information can be obtained from different sources.
A study of the literature on the construct and on existing measurement instruments is nearly always needed at the start of a test development project.
Different types of research can be done on the construct.
The test developer can use information from different sources to define the construct and, later in the test development process, he or she can use this information for item writing.
Each of these four modes can occur in two different varieties.
The reactive/nonreactive distinction is only used for typical performance measurements, and not for maximum performance measurements.
A maximum performance test asks test takers to do the best they can to perform the task.
Each of the four response modes can occur in two versions
Self-report mode
Test takers are asked to respond to questions or stimuli to assess their
Chapter 4
Observed test scores
The aim of testing is to yield scores of test takers’ maximum or typical performance.
Two main types of test scores are distinguished
Conventionally, items are scored by assigning ordinal numbers to the responses.
The scoring differs slightly between maximum and typical performance tests.
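The reversal of contra-indicative items described in this chapter can be sketched in code. This is a minimal illustration, assuming dichotomous typical performance items answered with 'yes' or 'no'; the function name `score_dichotomous_item` and the response labels are hypothetical and do not come from the book.

```python
def score_dichotomous_item(response, contra_indicative=False):
    """Score a yes/no typical performance item by convention.

    Indicative items: 'no' -> 0, 'yes' -> 1.
    Contra-indicative items: the scoring is reversed ('yes' -> 0, 'no' -> 1).
    """
    if response not in ("yes", "no"):
        raise ValueError(f"unexpected response: {response!r}")
    score = 1 if response == "yes" else 0
    # Reverse the score for contra-indicative items.
    return 1 - score if contra_indicative else score


# Hypothetical responses of one test taker to three items;
# the second item is assumed to be contra-indicative.
responses = ["yes", "yes", "no"]
contra = [False, True, False]
item_scores = [score_dichotomous_item(r, c) for r, c in zip(responses, contra)]
print(item_scores)  # [1, 0, 0]
```

After the reversal, a higher item score points in the same direction for every item, so the item scores can be summed.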
Measurement by fiat: the item scores are assigned to a test taker’s responses without any theoretical justification.
(For example, the scores 0 and 1 assigned to an incorrect and a correct answer, and the scores 1 to 5, are based on convention (by fiat) and not on psychometric theory.)
The score of the jth test taker on the kth item is indicated by X_{jk}. The conventional test score of the jth test taker on an n-item test is the unweighted sum of his (or her) item scores:
US_j = X_{j1} + X_{j2} + … + X_{jn}
It may be argued that items differ in importance, and that they should be weighted differently.
The weighted sum score of the jth test taker on an n-item test is:
WS_j = w_1 X_{j1} + w_2 X_{j2} + … + w_n X_{jn}
w_1 is the weight assigned to the first item, and so on.
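The unweighted and weighted sum scores defined above can be computed as follows. This is a small sketch with made-up item scores and weights; the function names are illustrative only.

```python
def unweighted_sum_score(item_scores):
    """Unweighted sum of one test taker's item scores."""
    return sum(item_scores)


def weighted_sum_score(item_scores, weights):
    """Weighted sum of one test taker's item scores, one weight per item."""
    if len(item_scores) != len(weights):
        raise ValueError("need exactly one weight per item")
    return sum(w * x for w, x in zip(weights, item_scores))


# Hypothetical scores of test taker j on a 4-item test, and hypothetical weights.
x_j = [1, 0, 1, 1]
w = [1, 2, 1, 3]

print(unweighted_sum_score(x_j))   # 3
print(weighted_sum_score(x_j, w))  # 1*1 + 2*0 + 1*1 + 3*1 = 5
```

With all weights equal to 1, the weighted sum score reduces to the unweighted sum score.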
A problem with
Chapter 5
Classical analysis of observed test scores
Test scores are used in practical applications.
Measurement precision has two different aspects:
The concept of measurement precision applies to observed test scores as well as to latent variable scores.
Functional thought experiment: fulfils a function within a theory.
True test score: the expected value of the observed test scores of the repeated test administrations in the thought experiment.
Test taker j’s true test score is the expected value of his (or her) independently distributed observed test scores from (hypothetical) repeated administrations of the test to the test taker.
The observed test score is a variable that varies across repeated test administrations.
The true score is constant.
Error of measurement: the difference between test taker j’s observed test score and his (or her) true score.
Test taker j’s error of measurement on an arbitrary measurement occasion is the difference between his (or her) observed test score and his (or her) true test score.
The expected value of the errors of measurement is 0.
The within-person error variance is an index for the precision of the measurement of a person’s true score.
Test taker j’s standard error of measurement: the square root of his (or her) within-person error variance.
Information: the reciprocal of a person’s within-person error variance.
A small amount of information means that test taker j’s observed test scores vary widely around j’s true score across repeated test administrations.
A large amount of information means that j’s observed test scores do not vary widely around j’s true score.
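The definitions above can be collected in symbols. The notation is chosen here for illustration and is not necessarily the book's: X_j is test taker j's observed test score, T_j the true score, E_j the error of measurement, \mathcal{E} the expected value over the hypothetical repeated administrations, \sigma^2_{E_j} the within-person error variance, and I_j the information.

```latex
\begin{align*}
T_j &= \mathcal{E}(X_j) && \text{(true test score)} \\
E_j &= X_j - T_j && \text{(error of measurement)} \\
\mathcal{E}(E_j) &= \mathcal{E}(X_j) - T_j = 0 && \text{(expected error is zero)} \\
\sigma_{E_j} &= \sqrt{\sigma^2_{E_j}} && \text{(standard error of measurement)} \\
I_j &= \frac{1}{\sigma^2_{E_j}} && \text{(information)}
\end{align*}
```

The third line follows directly from the first two, because the true score T_j is a constant. For the same reason, the within-person variance of the observed scores equals the within-person error variance.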
Reliability: the differentiation of test scores of different test takers from a population.
Psychometrics uses two definitions of reliability.
Reliability concerns the differentiation between the true test scores of different test takers from a population.
The differentiation is good if test takers’ true scores can be precisely predicted from their observed test
Chapter 6
Classical analysis of item scores
The conventional way of scoring items is by assigning ordinal numbers to the response categories.
Usually, these item scores are ordered with respect to the attribute that the item is assumed to measure. However, the assignment of these ordinal numbers lacks a theoretical justification.
Usually, the analysis of test scores is supplemented by an analysis of the item scores.
The scores of a given item have a distribution in a population of N persons.
Classical item difficulty and attractiveness
The location of the item score distribution is used to define the classical item difficulty (maximum performance tests) and classical item attractiveness (typical performance tests) concepts.
The two definitions are the same.
Classical item difficulty and attractiveness are defined in a population of persons.
They are population-dependent and may differ between populations.
The mean is mainly used for this.
The mean of a dichotomously scored item is called the item p-value.
Item score variance and standard deviation
The most common parameters that are used in classical item score analysis are the variance and the standard deviation of the item scores.
Items that have a small item score variance have little effect on the test score variance.
The variance of dichotomous item scores is a function of the item p-value.
For a given sample size, the variance has its maximum value at p=.5.
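For a dichotomous item scored 0 or 1, the variance equals p(1 - p), where p is the item p-value. The sketch below illustrates this with made-up scores; it uses the population form of the variance (dividing by N), and the function names are illustrative only.

```python
def item_p_value(scores):
    """Mean of dichotomous (0/1) item scores: the item p-value."""
    return sum(scores) / len(scores)


def dichotomous_item_variance(scores):
    """Variance of 0/1 item scores as a function of the p-value: p * (1 - p)."""
    p = item_p_value(scores)
    return p * (1 - p)


# Hypothetical scores of 8 persons on one dichotomous item.
scores = [1, 1, 0, 1, 0, 0, 1, 1]
p = item_p_value(scores)
var = dichotomous_item_variance(scores)
print(p, var)  # 0.625 0.234375

# The variance p * (1 - p) is largest at p = .5 on a grid of p-values.
grid = [k / 10 for k in range(1, 10)]
best_p = max(grid, key=lambda q: q * (1 - q))
print(best_p)  # 0.5
```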
Location and dispersion parameters yield useful information on the items of a test.
However, these parameters do not indicate the extent to which an item contributes to the aim of a test to assess individual differences in the attribute that is measured by the test.
Classical item discrimination: a parameter that indicates the extent to which the item differentiates between the true test scores of a population of persons.
It is defined in a population of persons and may vary between different populations.
The item-test and item-rest correlations
An appropriate index for discrimination between the true scores would be the product moment correlation between the item score and the true score in the population of persons.
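Because true scores are unobserved, observed scores are used in their place in practice. The sketch below assumes the common definitions: the item-test correlation is the product moment correlation between the item score and the observed test score, and the item-rest correlation is the correlation between the item score and the sum of the remaining items. The data matrix and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Product moment correlation between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_test_and_item_rest(data, k):
    """Return (item-test, item-rest) correlations for item k.

    data: one row of item scores per person.
    """
    item = [row[k] for row in data]
    test = [sum(row) for row in data]           # unweighted sum score
    rest = [t - x for t, x in zip(test, item)]  # sum score without item k
    return pearson(item, test), pearson(item, rest)


# Hypothetical 0/1 scores of 5 persons on a 3-item test.
data = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
]
r_item_test, r_item_rest = item_test_and_item_rest(data, 0)
print(round(r_item_test, 3), round(r_item_rest, 3))
```

In this example the item-test correlation is higher than the item-rest correlation, because the item score is itself part of the test score.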
Test taker j’s observed
This bundle collects the literature for the course Test Theory and Practice.