Classical analysis of observed test scores: a summary of chapter 5 of A conceptual introduction to psychometrics by G. J. Mellenbergh


Measurement precision of observed test scores
Information on a single observed score
Reliability of observed test scores in a population
Some properties of classical test theory
Parameter estimation

Measurement precision of observed test scores

Test scores are used in practical applications, so it matters how precisely they measure.

Measurement precision has two different aspects:

- Information: applies to the test score of a single person; this is the within-person aspect of measurement precision.
- Reliability: applies to a population of persons; this is the between-persons aspect of measurement precision.

The concept of measurement precision applies to observed test scores as well as to latent variable scores.

Information on a single observed score

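Functional thought experiment: a thought experiment that cannot be carried out in practice but fulfils a function within a theory. Here, the thought experiment is that the same test is administered repeatedly and independently to the same test taker.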

True test score: the expected value of the observed test scores of the repeated test administrations in the thought experiment.
Test taker j’s true test score is the expected value of his (or her) independently distributed observed test scores from (hypothetical) repeated administrations of the test to the test taker.

The observed test score is a variable that varies across repeated test administrations.
The true score is constant.

Error of measurement: the difference between test taker j’s observed test score and his (or her) true score.
Test taker j’s error of measurement on an arbitrary measurement occasion is the difference between his (or her) observed test score and his (or her) true test score.
The expected value of the errors of measurement is 0.
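The definitions above can be written compactly as follows. The symbols are chosen here for illustration and may differ from the book's notation: X_j is test taker j's observed score, T_j the true score, E_j the error of measurement, and \mathcal{E} the expectation over the repeated administrations of the thought experiment.

```latex
T_j = \mathcal{E}(X_j), \qquad
E_j = X_j - T_j, \qquad
\mathcal{E}(E_j) = \mathcal{E}(X_j) - T_j = T_j - T_j = 0
```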

The within-person error variance, the variance of test taker j’s errors of measurement across repeated test administrations, is an index of the precision with which j’s true score is measured.

Test taker j’s standard error of measurement: the square root of his (or her) within-person error variance.

Information: the reciprocal of a person’s within-person error variance.
A small amount of information means that test taker j’s observed test scores vary widely around j’s true score across repeated test administrations.
A large amount of information means that j’s observed test scores do not vary widely around j’s true score.
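The following is a minimal Python sketch of these within-person quantities. The thought experiment has infinitely many administrations. Here a short list of hypothetical observed scores stands in for them, and the true score is assumed to be known, which it never is in practice. The function name `within_person_stats` is illustrative and does not come from the book.

```python
import math
from statistics import fmean

def within_person_stats(observed, true_score):
    """Within-person error variance, standard error of measurement and
    information for one test taker (hypothetical data, known true score)."""
    errors = [x - true_score for x in observed]      # errors of measurement E = X - T
    error_variance = fmean(e * e for e in errors)    # mean squared error around T (errors have expected value 0)
    sem = math.sqrt(error_variance)                  # standard error of measurement
    information = 1.0 / error_variance               # reciprocal of the error variance
    return error_variance, sem, information

# Two hypothetical test takers with the same true score of 10
wide = within_person_stats([8, 12, 9, 11], true_score=10)         # scores spread widely
narrow = within_person_stats([9.5, 10.5, 10, 10], true_score=10)  # scores close to 10
print(wide)    # larger error variance, less information
print(narrow)  # smaller error variance, more information
```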

Reliability of observed test scores in a population

Reliability: the differentiation of test scores of different test takers from a population.

Psychometrics uses two definitions of reliability:

- A theoretical definition.
- An operational definition, which yields procedures to assess reliability.

Reliability concerns the differentiation between the true test scores of different test takers from a population.
The differentiation is good if test takers’ true scores can be precisely predicted from their observed test scores, and bad if test takers’ true scores cannot be precisely predicted from their observed test scores.

Theoretical reliability: the reliability of the observed test scores is the squared product moment correlation between observed and true test scores in a population of persons.
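The theoretical definition can be rewritten as a variance ratio. This uses two standard classical test theory results for the population of persons: X = T + E, and errors are uncorrelated with true scores. As a result, Cov(X, T) equals the true score variance. The notation (sigma for standard deviations, rho for correlations) is assumed here.

```latex
\rho_{XT}^2
  = \left(\frac{\operatorname{Cov}(X,T)}{\sigma_X \sigma_T}\right)^2
  = \left(\frac{\sigma_T^2}{\sigma_X \sigma_T}\right)^2
  = \frac{\sigma_T^2}{\sigma_X^2}
  = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```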

Parallel tests: tests that measure (1) the same true score with (2) equal within-person error variance, and (3) uncorrelated errors across (hypothetical) repeated test administrations for each of the test takers of a population.

Operational reliability: the reliability of the observed test score is the product moment correlation between observed test scores of parallel tests in a population of persons.

Some properties of classical test theory

Classical test theory is based on the definitions of test taker j’s true score and error of measurement, and on the generalization of these definitions to a randomly selected person from a population of persons.

The standard error of measurement of a test

Standard error of measurement of a test: the square root of the error variance in the population of persons.

Lower bounds to reliability

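A lower bound is useful because the theoretical reliability is at least as large as the lower bound. A high value of a lower bound therefore implies that the theoretical reliability is high.

A well-known lower bound to reliability is Cronbach’s coefficient alpha.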
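The summary does not give the formula. Coefficient alpha is commonly computed as alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The minimal sketch below assumes a person-by-item score matrix stored as a list of rows. The function name is illustrative.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Coefficient alpha for a list of rows (one row per person, one column per item)."""
    k = len(scores[0])                                   # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])  # variance of the total test score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical scores of three persons on two items
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))  # about 0.67
```

Using population variances throughout is a choice made here. Because alpha is a ratio of variances, the result is the same with sample variances, as long as one kind is used consistently.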

Test length and reliability

The reliability of a test depends on the number of test items.
Usually, a longer test is more reliable for measuring the same latent variable than a shorter test.
The length of an original n-item test can be doubled to a 2n-item test by adding an n-item parallel test to the original test.

The relation between the reliability of the doubled test and that of the original test is given by the Spearman-Brown formula for double test length.
This formula can also be used when a test is shortened. It is assumed that the test is shortened by removing parallel parts from the test.
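For double test length, the Spearman-Brown formula is rho_2n = 2 * rho / (1 + rho). The sketch below uses the general Spearman-Brown form for a length factor k, where k = 2 doubles the test and k = 0.5 halves it. At k = 2 it reduces to the double-length formula. As in the text, it assumes that the added or removed parts are parallel.

```python
def spearman_brown(reliability, k):
    """Predicted reliability after multiplying test length by k (parallel parts assumed)."""
    return k * reliability / (1 + (k - 1) * reliability)

print(spearman_brown(0.6, 2))     # doubling a test with reliability 0.6
print(spearman_brown(0.75, 0.5))  # halving a test with reliability 0.75
```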

Correlation corrected for attenuation

The correlation between two tests is attenuated by the errors of measurement in each of the two observed test scores.
Correlation corrected for attenuation: the product moment correlation between the true scores of the two tests in a population of persons.
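The usual correction divides the observed correlation by the square root of the product of the two tests' reliabilities. The minimal sketch below uses made-up numbers. When the reliabilities are themselves estimates, the corrected value can exceed 1, so it should be read as an estimate.

```python
import math

def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Correlation between true scores, from the observed correlation r_xy
    and the reliabilities rel_x and rel_y of the two tests."""
    return r_xy / math.sqrt(rel_x * rel_y)

print(disattenuated_correlation(0.4, 0.8, 0.5))  # observed 0.4, reliabilities 0.8 and 0.5
```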

Signal-to-noise ratio

Signal-to-noise ratio: the ratio of the true score variance to the error variance of the test in a population of persons.
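In classical test theory, the observed variance decomposes as true score variance plus error variance, and reliability is the ratio of true score variance to observed variance. Under these results, the signal-to-noise ratio can also be written as reliability / (1 - reliability). The minimal sketch below computes it both ways with hypothetical values.

```python
def signal_to_noise(var_true, var_error):
    """Ratio of true score variance to error variance."""
    return var_true / var_error

def signal_to_noise_from_reliability(reliability):
    # rel = var_T / (var_T + var_E) implies var_T / var_E = rel / (1 - rel)
    return reliability / (1 - reliability)

print(signal_to_noise(64.0, 16.0))
print(signal_to_noise_from_reliability(0.8))  # same test: 64 / (64 + 16) = 0.8
```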

Parameter estimation

The classical theory of psychometrics uses two different types of populations:

- The population of observed test scores across repeated test administrations. It is infinite and is defined for each test taker by means of a functional thought experiment.
- The population of persons. It is finite and consists of N persons.

Parameters are also defined at these two levels:
- Parameters at the level of the individual test taker
- Parameters at the level of the population of persons

In statistics, population parameters are estimated from sample data.
The sample data are summarized in a statistic that is used to estimate the corresponding population parameter.
For example, the sample mean is an estimator of the population mean.

Estimation of population parameters

The population parameters are:

Mean (expected value)
Variance
Reliability
Standard error of measurement of the test

These parameters have to be estimated from a sample of persons from the population of interest.
The number of persons in the sample is N_s.
The number of persons in the population is N.

The theoretical reliability can be estimated in two different ways:

Estimation from parallel test correlation
Estimation from split-half subtest correlation

The standard error of measurement of a test can be estimated in the following ways:

Estimation from the reliability
Estimation from the within-person error variances
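The minimal sketch below shows both routes, assuming the usual classical test theory relations. The first route uses SEM = SD of the observed scores * sqrt(1 - reliability), which follows from reliability = true variance / observed variance. The second uses SEM = square root of the mean within-person error variance, because the population error variance is the average of the within-person error variances when each person's errors have expected value 0. All numbers are hypothetical.

```python
import math
from statistics import fmean

def sem_from_reliability(sd_observed, reliability):
    """Standard error of measurement of the test from the observed SD and the reliability."""
    return sd_observed * math.sqrt(1 - reliability)

def sem_from_error_variances(within_person_error_variances):
    """Standard error of measurement of the test as the square root of the
    mean of the test takers' within-person error variances."""
    return math.sqrt(fmean(within_person_error_variances))

print(sem_from_reliability(10.0, 0.91))       # observed SD 10, reliability 0.91
print(sem_from_error_variances([4, 9, 16, 7]))  # four hypothetical test takers
```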
