Scientific & Statistical Reasoning – Summary interim exam 5 (UNIVERSITY OF AMSTERDAM)

This bundle contains everything you need to know for the fifth interim exam for the course "Scientific & Statistical Reasoning" given at the University of Amsterdam. It contains articles, book chapters and lectures. It consists of the following materials:

Furr & Bacharach – Chapter 8

Furr & Bacharach – Chapter 9

Article: Cohen

Article: Cohen (Item response theory)

Field: Chapter 18

Bundle items:

“Furr & Bacharach (2014). Estimating and evaluating convergent and discriminant validity evidence.” – Article summary

There are four procedures for presenting the implications of a correlation in terms of our ability to use it to make successful predictions:

Binomial effect size display (dichotomous): This illustrates the practical consequences of using correlations to make decisions. It shows how many successful and unsuccessful predictions can be made on the basis of a correlation, and it is computed directly from the correlation coefficient. The binomial effect size display can be used to translate a validity correlation into an intuitive framework. However, it assumes an ‘equal proportions’ situation.
Taylor-Russell tables (dichotomous): These tables inform selection decisions and provide the probability that a prediction will result in successful performance on a criterion. The tables require the size of the validity coefficient (1), the selection proportion (2) and the base rate (3).
Utility analysis: This frames validity in terms of a cost-benefit analysis of test use.
Analysis of test sensitivity and test specificity: A test is evaluated in terms of its ability to produce correct identifications of a categorical difference. This is useful for tests that are designed to detect a categorical difference.

Validity correlations can be evaluated in the context of a particular area of research or application.

A nomological network refers to the interconnections between a construct and other related constructs. There are several methods to evaluate the degree to which measures show convergent and discriminant associations:

Focused associations: This method focuses on a few highly relevant criterion variables. It can make use of validity generalization.
Sets of correlations: This method focuses on a broad range of criterion variables and computes the correlations between the test and many criterion variables. The degree to which the pattern of correlations ‘makes sense’ given the conceptual meaning of the construct is evaluated.
Multitrait-...
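As a rough illustration of two of the procedures named in this summary, the sketch below computes a binomial effect size display and a sensitivity/specificity pair. It assumes the standard form of the display, in which the success rate is 0.50 + r/2 for high scorers and 0.50 - r/2 for low scorers, and the usual definitions of sensitivity and specificity. The function names and the example counts are hypothetical and do not come from the chapter.

```python
def besd(r: float) -> dict[str, float]:
    """Cell proportions of a binomial effect size display for correlation r.

    Assumes the standard 'equal proportions' setup: half of the people score
    high on the test and half of the people succeed on the criterion.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    correct = 0.5 + r / 2    # high scorers who succeed, low scorers who fail
    incorrect = 0.5 - r / 2  # high scorers who fail, low scorers who succeed
    return {
        "high_success": correct,
        "high_failure": incorrect,
        "low_success": incorrect,
        "low_failure": correct,
    }


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity and specificity from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # share of true cases the test detects
    specificity = tn / (tn + fp)  # share of non-cases the test correctly clears
    return sensitivity, specificity


if __name__ == "__main__":
    # Illustrative values only: a validity correlation of .30 and made-up counts.
    print(besd(0.30))
    print(sensitivity_specificity(tp=40, fn=10, tn=45, fp=5))
```

With a validity correlation of .30, this display suggests that about 65 out of 100 predictions would be correct, compared with 50 out of 100 for a test that is unrelated to the criterion.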


“Furr & Bacharach (2014). Estimating practical effects: Binomial effect size display, Taylor-Russell tables, utility analysis and sensitivity / specificity.” – Article summary

Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses (e.g. to what degree does the test measure what it is supposed to measure). The items of a test cannot themselves be valid or invalid; only the interpretations can be valid or invalid.

Validity is a property of the interpretation (1), it is a matter of degree (2) and the validity of a test’s interpretation is based on evidence and theory (3). Validity influences the accuracy of our understanding of the world, as research conclusions are based on the validity of a measure.

Construct validity refers to the degree to which test scores can be interpreted as reflecting a particular psychological construct. Face validity refers to the degree to which a measure appears to be related to a specific construct, in the judgement of nonexperts, test takers and representatives of the legal system. Convergent validity refers to the degree to which test scores are correlated with tests of related constructs.

Validity is important for the accuracy of our understanding of the world (1), decisions on a societal level (2) (e.g. laws based on ‘invalid’ research) and decisions on an individual level (3) (e.g. college admissions).

The validity of a test score interpretation depends on five types of evidence: test content (1), consequences of use (2), association with other variables (3), response processes (4) and internal structure (5).

Test content can be seen as content validity. There are two threats to content validity:

A test including construct-irrelevant content: The inclusion of content that is not relevant to the construct of interest reduces validity.
Construct underrepresentation: A test should include the full range of content that is relevant to the construct.

Construct underrepresentation can result from practical constraints (e.g. the time available for a test). The internal structure of a test refers to the way the parts of a test are related to each other....


“Furr & Bacharach (2014). Scaling.” – Article summary

Scaling refers to assigning numerical values to psychological attributes. Individuals in a group should be similar to each other in that they share a psychological feature. There are rules to follow in order to put people in categories:

People in a category must be identical with respect to the feature that categorizes the group (e.g. hair colour).
The groups must be mutually exclusive.
The groups must be exhaustive (e.g. everyone in the population can fall into a category).

Each person should fall into one category and not more than one. If numerals are used to indicate order, then the numerals serve as labels indicating rank. If numerals have the property of quantity, then they convey information about the exact amounts of an attribute. Units of measurement are standardized quantities. The three properties of numerals are identity (1), order (2) and quantity (3).

There are two possible meanings of the number zero. It can be the absolute zero (1) (e.g. a reaction time of 0 ms) or it can be an arbitrary quantity of an attribute (2). This is called the arbitrary zero. The arbitrary zero does not represent the absence of anything; rather, it is a point on a scale used to measure that feature. A lot of psychological attributes use the arbitrary zero (e.g. social skill, self-esteem, intelligence).

A unit of measurement might be arbitrary because unit size may be arbitrary (1), some units of measurement are not tied to any one type of object (2) (e.g. centimetres can measure anything with a spatial property) and some units of measurement can be used to measure different features of the same object (3) (e.g. weight and length).

One assumption of counting is additivity. This requires that unit size does not change, meaning that an increase of one point is equal at every point on the scale. This is not always the case, as an IQ test asks increasingly difficult questions to increase IQ by one point. Therefore, the unit size changes.

Counting only qualifies as...
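The difference between an absolute and an arbitrary zero described above can be made concrete with a small sketch. Temperature is used here purely as an illustration and is not taken from the chapter: Celsius has an arbitrary zero, Kelvin an absolute one. Under that assumption, differences survive the change of zero point, but ratio statements such as "twice as much" do not.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Shift the zero point from the arbitrary Celsius zero to absolute zero."""
    return celsius + 273.15


# Two illustrative measurements on the arbitrary-zero scale.
low_c, high_c = 10.0, 20.0
low_k, high_k = celsius_to_kelvin(low_c), celsius_to_kelvin(high_c)

# Differences are unaffected by where the zero point sits.
diff_celsius = high_c - low_c
diff_kelvin = high_k - low_k

# Ratios depend on the zero point, so they are only meaningful
# when zero means a true absence of the attribute.
ratio_celsius = high_c / low_c  # 2.0, but "twice as warm" is not justified
ratio_kelvin = high_k / low_k   # about 1.035

if __name__ == "__main__":
    print(diff_celsius, diff_kelvin)
    print(ratio_celsius, ratio_kelvin)
```

The same reasoning would suggest that a self-esteem score of 40 cannot be read as "twice as much self-esteem" as a score of 20 when the scale's zero is arbitrary.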


“Cohen on item response theory” – Article summary

Item response theory (latent trait theory) provides a way to model the probability that a person with X ability will be able to perform at a level of Y. It models the probability that a person with X amount of a personality trait will exhibit Y amount of that trait on a test that is supposed to measure it. This theory focuses on the relationship between a test taker’s response to an individual test item and that test taker’s standing on the construct being measured. Discrimination signifies the degree to which an item differentiates among people with higher or lower levels of the trait. Items can be given different weights in item response theory. In classical test theory, there are no assumptions about the frequency distribution of test scores. There are several assumptions of item response theory:

Unidimensionality: This assumption states that the set of items measures a single continuous latent construct. This assumption does not rule out minor dimensions, although it assumes one dominant dimension underlying the structure.
Local independence: This assumption states that there is a systematic relationship between all of the test items and that this relationship has to do with the level of a person on the construct of interest. If this assumption is met, then the differences in responses to items are reflective of differences in the underlying trait or ability.
Monotonicity: This assumption states that the probability of endorsing or selecting an item response indicative of higher levels of the construct should increase as the level of the underlying construct increases.

Local dependence refers to the fact that items can be dependent on a factor other than what the test as a whole is measuring. Locally dependent items have higher inter-item correlations, and local dependence may be controlled for by combining the responses to a set of locally dependent items into a separate subscale within the test. The theta level refers to the level of the underlying...
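The ideas of discrimination and monotonicity described above can be sketched with a two-parameter logistic (2PL) item response function. The 2PL is one common item response model and is used here as an assumption; the summary does not say which model the article presents. The names `theta`, `a` (discrimination) and `b` (difficulty) follow common convention, and the parameter values are made up for illustration.

```python
import math


def p_endorse(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under a 2PL model (an assumed model).

    theta: the person's level on the latent trait
    a:     item discrimination (how sharply the item separates trait levels)
    b:     item difficulty (the trait level at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Monotonicity: for a positive discrimination, the probability rises with theta.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
curve = [p_endorse(t, a=1.5, b=0.0) for t in thetas]

# Discrimination: a steeper item separates two trait levels more strongly.
gap_steep = p_endorse(1.0, a=2.0, b=0.0) - p_endorse(-1.0, a=2.0, b=0.0)
gap_flat = p_endorse(1.0, a=0.5, b=0.0) - p_endorse(-1.0, a=0.5, b=0.0)

if __name__ == "__main__":
    for t, p in zip(thetas, curve):
        print(f"theta={t:+.1f}  P(endorse)={p:.3f}")
    print(gap_steep, gap_flat)
```

In this sketch the item with the larger `a` shows a bigger difference in endorsement probability between people one unit below and one unit above the item's difficulty, which is what a highly discriminating item should do.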


“Cohen on the science of psychological measurement” – Article summary

A utility analysis refers to a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment. It is done in order to see whether the benefits of using a test outweigh the costs of that test. The objective of a utility analysis determines the required information (1) and the specific methods that have to be used (2). One method of utility analysis uses expectancy data: the test data are converted to an expectancy table, which can provide the likelihood that a test taker will score within some interval of scores on a criterion measure. Taylor-Russell tables provide an estimate of the extent to which inclusion of a particular test in the selection system will improve selection. They give the increase in the base rate of successful performance that is associated with a particular level of criterion-related validity. The selection ratio is a numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired. The base rate refers to the percentage of people hired under the existing system. Top-down selection is a process of awarding available positions to applicants whereby the highest scorer is awarded the first position. A downside of top-down selection is that it may lead to unintended discriminatory effects.

Hit: A correct classification.
Miss: An incorrect classification.
Hit rate: The proportion of people that an assessment tool accurately identifies as possessing or exhibiting a particular trait, ability, behaviour or attribute.
Miss rate: The proportion of people that an assessment tool inaccurately describes as possessing or exhibiting a particular trait, ability, behaviour or attribute.
False positive: A specific type of miss whereby an assessment tool falsely indicates that the test taker possesses a trait.
False negative: A...
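The selection ratio and the hit and miss terminology above can be illustrated with a short sketch. It assumes that the selection ratio is the number of positions divided by the number of applicants, and it treats a hit as any correct classification and a miss as any incorrect one, following the short definitions in this summary. Other sources define hit rate more narrowly, so this is one possible reading. The function names and counts are hypothetical.

```python
def selection_ratio(n_positions: int, n_applicants: int) -> float:
    """Number of people to be hired relative to the number available."""
    if n_applicants <= 0:
        raise ValueError("n_applicants must be positive")
    return n_positions / n_applicants


def classification_rates(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Hit and miss rates over all classifications (an assumed reading).

    tp/tn: correct classifications (hits)
    fp:    false positives, a miss where the tool wrongly indicates the trait
    fn:    false negatives, the other kind of miss
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one classification is required")
    return {
        "hit_rate": (tp + tn) / total,
        "miss_rate": (fp + fn) / total,
        "false_positive_share": fp / total,
        "false_negative_share": fn / total,
    }


if __name__ == "__main__":
    # Illustrative numbers only: 50 openings, 100 applicants, made-up outcomes.
    print(selection_ratio(50, 100))
    print(classification_rates(tp=40, fp=5, tn=45, fn=10))
```

Separating the two kinds of miss matters in a utility analysis because a false positive and a false negative usually carry different costs.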