Summary - Psychological Testing and Assessment - Van der Molen - Chapter 1 & 2

Which applications and consequences are part of psychological testing? - Chapter 1

Topic 1A: The nature and uses of psychological testing

Consequences of testing

People are tested throughout their lives. Examples are the Apgar test for measuring the health of newborns, developmental tests in childhood, and school and driving examinations in adolescence. Tests are administered across many diverse contexts, and most people take so many of them that by the time of retirement testing has had a major impact on their life course. A thorough knowledge of tests is therefore necessary for everyone in the field of psychology. Someone who develops and evaluates tests within the fields of psychology or education is called a psychometrician. Personality and intelligence tests are currently the most essential tests in psychology.

Definition of a test

Tests can be very different in their purposes and format, but in general they share the following characteristics:

A test is a standardized procedure for identifying behavior and describing it by means of categories or scores. There are a number of defining characteristics of tests. Firstly, a test is standardized, which means that the procedure for administering it is the same across settings and conditions. Secondly, a test is based on a sample of the behavior that is to be measured. The items do not have to cover the entire behavior of interest, as long as they are relevant enough to permit inferences about that behavior. It is important that the sampled behavior is representative of the behavior the test is meant to predict. Thirdly, it must be possible to derive categories or scores from the test. A certain amount of measurement error must always be taken into account: X = T + e, where X is the observed score, T is the true score and e is the error. A test developer should try to make e as small as possible.

It should also be remembered that the abstract characteristic measured by a test is not a physical 'something' in the world. Fourthly, it is necessary to establish norms or standards with which the scores of participants can be compared. This is done by means of a standardization sample, which must be representative of the population for which the test is intended. The norms indicate when a person's score deviates from what is typical. Finally, tests are intended to predict non-test behaviors. A test can therefore have more than one goal, and these goals can differ from the actual behavior sampled by the test. Whether the test actually predicts the intended behavior is established through validation research, most of which is conducted after the test has been released.
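The measurement-error relation X = T + e described above can be illustrated with a small simulation. This is only a sketch under assumptions that the summary does not state: the error e is taken to be random, normally distributed with mean zero, and independent of the true score. The function name and all numbers are hypothetical.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_administrations, seed=0):
    """Return hypothetical observed scores X = T + e for one person.

    Assumption (not from the summary): e is drawn from a normal
    distribution with mean 0 and standard deviation `error_sd`.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_administrations)]


# Hypothetical person with true score T = 100, tested many times.
small_error = simulate_observed_scores(100, error_sd=2, n_administrations=1000)
large_error = simulate_observed_scores(100, error_sd=10, n_administrations=1000)

# Both sets of observed scores centre on T, but the spread grows with e.
print(statistics.mean(small_error), statistics.stdev(small_error))
print(statistics.mean(large_error), statistics.stdev(large_error))
```

Under these assumptions the observed scores scatter around the true score, and a smaller error term means that any single observed score is a better estimate of T.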

Distinctions in testing

The majority of tests are norm-referenced, whereby the score of each participant is interpreted in comparison with a relevant standardized sample. Other tests are criterion-referenced, where the aim is to determine where a participant stands with regard to clearly defined criteria. School exams are an example of the latter category, because students’ scores are classified into a predetermined grade system (instead of making a comparison with a reference group).
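The difference between the two kinds of interpretation can be sketched in a few lines of Python. The sample of scores, the cutoffs and the function names below are hypothetical and serve only as illustration; real tests use much larger standardization samples and published norm tables.

```python
import statistics


def norm_referenced(score, norm_sample):
    """Interpret a raw score relative to a (hypothetical) standardization sample."""
    mean = statistics.mean(norm_sample)
    sd = statistics.stdev(norm_sample)
    z_score = (score - mean) / sd
    # Percentage of the sample scoring at or below this score.
    percentile = 100 * sum(s <= score for s in norm_sample) / len(norm_sample)
    return z_score, percentile


def criterion_referenced(score, cutoffs):
    """Classify a raw score against fixed cutoffs, ignoring other test takers.

    `cutoffs` is a list of (minimum_score, label) pairs sorted from high to low.
    """
    for minimum, label in cutoffs:
        if score >= minimum:
            return label
    return "fail"


norm_sample = [12, 15, 18, 20, 21, 23, 25, 27, 30, 34]  # hypothetical data
cutoffs = [(30, "excellent"), (20, "pass")]              # hypothetical criteria

z_score, percentile = norm_referenced(27, norm_sample)
grade = criterion_referenced(27, cutoffs)
print(z_score, percentile, grade)
```

The same raw score of 27 is described in two ways: by its position in the reference group (norm-referenced) and by the predetermined category it falls into (criterion-referenced).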

Another important distinction is that between assessment and testing. Assessment is a term used for more comprehensive research and refers to the entire process of collecting information about a person, on the basis of which something can be said about characteristics and behaviour can be predicted. Tests are therefore only one source of information for an entire assessment process.

Types of tests

Tests can be roughly divided into group tests, which can largely be taken with pen and paper and with several participants at the same time, and individual tests, which are taken one-on-one. The various categories of tests are discussed below. They occur in different forms (norm-referenced, criterion-referenced, individual tests and group tests).

  • Intelligence tests: the general intellectual ability of an individual is measured, based on the skills that are important in a particular culture. There are sub-scores, but usually the general score is being used. The test generally consists of a heterogeneous combination of items that measure different aspects of intelligence.
  • Aptitude tests: one or more specific aspects of competence are measured. This kind of testing is often used to predict success in a particular job or study.
  • Achievement tests: the degree of learning, success, or achievement of an individual with regard to a specific task is measured. The difference with the aptitude test lies in the purpose and content of the test: aptitude tests are used to predict how the performance of individuals will develop, whereas achievement tests measure the abilities of someone at the moment of testing.
  • Creativity tests: the ability to develop new ideas, insights or creations is measured. For these tests one has to be able to think divergently: looking for different solutions to a complex problem. There are still doubts about whether creativity is a form of applied intelligence.
  • Personality tests: characteristics or behaviors are measured that determine the individuality and character of a person.
  • Interest inventories: the preference of an individual for certain activities or topics is measured, often to help with occupational choice.
  • Behavioral procedures: the antecedents and consequences of behavior are measured, or the frequency of the behavior is objectively described.
  • Neuropsychological tests: these are used to investigate persons with possible brain damage. For the most part, these are long and intensive one-on-one tests.

Different types of test use

There are 5 ways to use psychological tests:

  • Classification: assigning people to certain categories. This can be subdivided into placement (assigning people to different programs based on their skills), screening (short tests to identify persons with special needs or characteristics), certification (whereby passing a test confers certain privileges) and selection (whereby a test result provides access to restricted circles such as a university or an association).
  • Diagnosis and treatment planning: determining the nature and cause of abnormal behavior and classifying the behavior within an accepted diagnostic system. A diagnosis must be more than a label; it must also take the underlying information into account. The diagnosis is also used in planning a possible treatment.
  • Self-knowledge: gaining more insight into yourself through a test.
  • Program evaluation: evaluating the success of certain social or educational programs.
  • Research: testing hypotheses by means of tests.

The goals of testing often overlap, making a clear distinction difficult. Many tests can also be administered once and still be used for multiple purposes.

There are several factors that can affect the reliability of a test. These factors are discussed below.

Standardized procedures in test administration

Non-standardized administration of a test can significantly influence the results, rendering them unusable and invalid. In some cases, however, it is desirable, sometimes even necessary, to be flexible with regard to the test procedure. This is the case, for example, with participants with disabilities. Deviations from the standard procedure should, however, always be intentional and well thought through.

Desirable procedures of test administration

For individual testing it is important that the examiner is familiar with the material, the instructions he or she must give and the way in which details and scores are recorded. In addition, it is very important that all participants can understand the written and spoken instructions. Any limitations of the participant in, for example, hearing, vision, speech, or motor control must also be taken into account.

For people with reduced hearing, it is important that the examiner is aware of this and responds well to it, so that the test results are not influenced. The same applies to limited vision. Most adults report such limitations themselves, but children often do not mention them. Finally, possible limitations in motor control or speech must be taken into account, especially in tests with timed responses. Tests can be adapted slightly for people with disabilities without the validity or reliability of the test deteriorating. Sometimes there are also special forms of a test that are adjusted to a certain limitation.

There are also a number of important points for group testing that the examiner has to take into account. For example, when testing with a time limit, it is important that enough time is available and that the time limit is monitored closely. In addition, instructions must be read clearly and not too quickly, and must not be paraphrased. Background noise should also be limited as much as possible. Moreover, it is important to indicate clearly whether guessing, when the participant does not know the answer, has any consequences. Many tests have a built-in correction for guessing.
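The summary does not say which correction for guessing is meant. One widely used textbook formula for multiple-choice tests subtracts a fraction of the wrong answers from the number of right answers: corrected score = R - W / (k - 1), where R is the number right, W the number wrong (omitted items are not counted) and k the number of answer options per item. The sketch below assumes this formula; the function name and the example numbers are hypothetical.

```python
def corrected_score(n_right, n_wrong, n_options):
    """Apply the classic correction for guessing: R - W / (k - 1).

    Assumption: every item has the same number of answer options and
    omitted items are simply left out of both counts.
    """
    if n_options < 2:
        raise ValueError("an item needs at least two answer options")
    return n_right - n_wrong / (n_options - 1)


# Hypothetical 40-item test with 4 options per item.
# A participant who guesses blindly on every item expects about 10 right, 30 wrong.
print(corrected_score(10, 30, 4))

# A participant with 30 right, 8 wrong and 2 omitted items.
print(corrected_score(30, 8, 4))
```

With this formula, pure guessing is expected to yield a corrected score of about zero, which is why participants need to be told in advance whether wrong answers are penalized.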

Influences of the examiner

It is important that the examiner establishes rapport: good rapport with the participants creates a comfortable and motivating atmosphere. This increases the cooperativeness of the participant. Research shows conflicting results about the influence of the race, experience and gender of the examiner on the results. Only in a few isolated cases do these seem to be of influence.

Background and motivation of the examinee

Different aspects of the participant can influence the test results. Test anxiety refers to all behavioral reactions that accompany concerns about possibly failing a test. Research shows that test anxiety is both a cause and a consequence of poor performance on tests. Especially in tests with time pressure, the results of participants with test anxiety can be strongly affected.

In addition, participants sometimes deliberately distort their answers to obtain a particular test result. The motivation of the participant should also be taken into account: an unmotivated participant can produce unreliable results.

Topic 1B: Ethical and Social implications of testing

Professional standards for testing

Usually tests are carried out responsibly, but there are of course exceptions where the irresponsible application or interpretation of a test can sometimes have disastrous consequences. That is why guidelines for responsible test use have been developed by professional organizations such as the American Psychological Association (APA). The responsibilities of test developers and test users are described below.

Responsibilities of test publishers

Test publishers must take into account various factors. Firstly, tests must meet all guidelines before they are issued. For example, it is mandatory to provide technical and user manuals with the test. Secondly, any marketing and advertising of the test must take place in an accurate and sincere manner. A test may only be published when the reliability and validity have been investigated. The test must state in what way the reliability and validity were investigated and what the results were. It should also be clear who can use the test and what qualifications a person must have for this. Often certain certifications are required for use.

Responsibilities of test users

The APA, among others, has published ethical guidelines and professional standards for test use to ensure the well-being of the participants and their surroundings. This includes, for example, the guideline that testing must always be to the benefit of the client. Confidentiality is also a duty of the test leader, although it is mandatory to report serious threats to the participants or others.

In addition, the examiner must have the necessary expertise to administer a test. Informed consent is another important condition. This means that all participants are informed in advance about the research and give their permission. Furthermore, account must be taken of what the standard of care is for a specific case, i.e. which method or test is most frequently used at that moment and is the most accepted.

For example, one must be careful with the use of outdated material. In addition, test results should be communicated correctly to the participant, with effective and constructive feedback. This feedback should not go beyond the boundaries of the examiner's expertise. The psychological report written about the assessment should be direct and concrete.

This is important because the content of the report can have an impact on the life of the participant, for example when the report is requested by an employer. Finally, respect and recognition of individual differences is very important for test use.

Testing cultural and linguistic minorities

Psychological tests are mainly aimed at Western populations. It therefore cannot be assumed that existing tests are suitable for all population groups. Since the 1930s, there has been a rise in culture-sensitive testing, but the work is far from complete. Other cultures may have different norms, values or beliefs. This may cause them to look differently at a test or respond differently to the results.

The influence of cultural background on test results

Research shows that people from different cultural backgrounds complete and interpret tests in different ways. For example, indigenous peoples in the US show a different conception of time than the white middle class in America.

In addition, it appears that, for example, African Americans respond to testing in a qualitatively different way than Anglo-Americans; children of African American origin turned out to be less spontaneous in their answers. Similar differences are also visible in adults. Testing may also involve the danger that participants unconsciously confirm the negative stereotype that exists about their own group. This is called stereotype threat. Test scores are therefore not a fixed property of the individual, but come about within a social-psychological field that is influenced by different cultural factors.

Unintended effects of high-stakes testing

Another effect that can play a role in testing is fraud. This is particularly the case with tests whose results carry a lot of weight, for example when selecting for a job or study program. Mass fraud seldom occurs. However, fraud with the help of parents or teachers does occur.

Another aspect of fraud is described by the Lake Wobegon Effect, which refers to the fact that in many schools more than 50% of pupils have above-average grades. This is mainly because our society places great emphasis on performance and the excellence of schools. Teachers help pupils to cheat by, among other things, coaching them on key answers, changing answer forms or giving them more time on tests and exams.

It seems that the national trend towards performance tests for selection and evaluation helps to encourage unwanted behavior, but it is not clear how large and widespread the problem is.


What did psychological tests look like over the centuries? - Chapter 2

Topic 2A: The origin of psychological testing

The start of testing in China in 2200 BC

A long time ago in China, the government conducted psychological tests to assess the fitness of officials. Although there are some similarities with modern testing, the tests in ancient China were unnecessarily exhausting and, moreover, not validated.

Physiognomy, phrenology and the psychograph

Physiognomy is the idea that we can read the character of people from their appearance, especially the face. Although this idea is no longer tenable, it represents one of the earliest forms of psychological testing, dating back to the 4th century BC. Authors writing about this subject included Aristotle and Lavater. A special form of physiognomy is phrenology, the 'reading' of bumps on the skull. This theory was developed by Gall. In the 1930s Lavery developed a machine to read these bumps, which he called the psychograph.

The experimental era

The field of experimental psychology was growing quickly at the end of the 19th century in Europe. This period also constituted the beginning of using objective measurements. Although this was progress, there were also many misconceptions, such as that intelligence could be measured by sensory processes. Wundt founded the first psychological laboratory in 1879 in Leipzig. He made the first attempts to arrive at empirical analyses that could explain individual differences.

Galton introduced the new experimental psychology in Great Britain. He was inspired by the idea that everything is measurable and designed different techniques to demonstrate it. An important development in his work was the collection of large amounts of data from thousands of subjects. In addition, he focused on measuring and investigating individual differences in both physical and behavioral characteristics. For example, he suggested measuring intelligence from reaction times. This was of course too simple, but he did show that objective tests could be developed and that meaningful scores could be obtained through standardized procedures.

Cattell (1860-1944) brought the experimental psychology of Wundt and Galton to the US. Together with Galton he immersed himself in individual differences through various mental tests. Among his students was Wissler (1901), who would have a major influence on the early history of psychological testing. He collected mental test scores and academic data in an attempt to show that the test results predicted academic performance. However, his findings were not statistically significant. Another problem was that there were only very weak correlations between the mental tests. After Wissler's results, the use of reaction time and sensory discrimination as measures of intelligence was abandoned. The gap that arose after the Galtonian tradition was filled by Binet, who introduced his intelligence scale in 1905. From Binet on, sensitive and reliable measurements were used more often.

Rating scales

Rating scales are widely used in psychology. Their origin goes back to the time of Galen, a physician from the second century. He believed that there were various 'fluids' in the body (yellow bile, black bile, phlegm and blood), the balance of which determined the health of an individual. He used a 9-point scale for the presence of these fluids. The first to design rating scales and to apply them to psychological variables was the German lawyer Thomasius (1655-1728). He designed a personality theory based on four dimensions, on which he rated aspiring judges using a 12-point scale. This can be labelled as the first time in the history of psychology that empirical data was used systematically. After that, the application of rating scales slowly caught on and they were one of the reasons why the phrenologists were able to gain so much respect.

Changing notions of mental retardation

Before Binet developed his intelligence tests in the early 20th century to identify children with mental retardation, there was little interest in mental retardation in the context of education. The Western world of the late 19th century treated psychiatric and mentally disabled individuals in a hostile and careless manner. In the 19th century, a growing distinction was made between mental retardation (idiocy) and psychiatric disorders (dementia).

There was a new humanism with regard to individuals with intellectual disabilities. Esquirol (1772-1840) was the first to describe the difference between mental retardation and psychiatric disorders. He thought that mental retardation was a lifelong developmental phenomenon, whereas mental disorders had a more sudden onset during adulthood. He also thought that mental retardation was not treatable, while mental disorders were. He put a strong emphasis on language skills in the diagnosis of mental retardation, an emphasis that is reflected in later intelligence tests. He also proposed the first classification system for mental retardation:

  1. The use of only short sentences.
  2. The use of single-syllable words.
  3. No speech but only sounds.

Seguin may have achieved even more because he set up an educational program for people with mental retardation. He wrote a book about this treatment and even came close to what we now call behavior modification.

Binet's early research on intelligence testing

Binet designed the first modern intelligence test in 1905. Important influences on his work were his background in medicine, his determination to correct his earlier sloppy experimental procedures, and his scepticism regarding the spirit of experimental psychology. In addition, he was an avid experimentalist, and used his two daughters as subjects in his research on intelligence testing.

Binet and testing for higher mental processes

Binet and his assistant Henri published an article in 1896 on the importance of testing for intelligence by means of higher psychological processes instead of the elementary sensory processes. In 1904, the government in Paris instituted a committee to decide on educational measures for children who could not benefit from mainstream education. This committee decided that these children should be identified through research. As a result, in 1905 Binet and Simon developed the first formal scales for measuring intelligence. These tests were initially intended to classify children with very low intelligence. The test focused on verbal skills. It was only possible to obtain a total score and the test was therefore only intended for classification and not for measurements.

The revised scales and the advent of IQ

In 1908 Binet and Simon published a revision of the previous intelligence test. An important innovation of this test was the introduction of the concept of mental level. The test was administered to standardization groups consisting of normal children, which made it possible to gear the test to different age groups. Binet and Simon also set up a rough scoring system for each age group. In 1911 they also published a scale for adults. Although Binet warned against attaching too much value to the outcome of his tests, people soon came up with the idea of 'mental age'. This indicated the range in which a person of a certain age should score, so that a child of 6 with the mental level of a child of 3 was described as 'three years behind'. In 1916 Terman published the Stanford-Binet, a successful revision of the earlier scales. He also proposed multiplying the intelligence quotient (mental age divided by chronological age) by one hundred. With this the concept of IQ was born. During the development of this test, it was also discovered that the subtests were not suitable for all cultures and that intelligence was possibly associated with a cultural component.
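The arithmetic behind this early IQ can be shown in a short sketch. It assumes the classical ratio definition, in which the intelligence quotient is mental age divided by chronological age and the result is multiplied by one hundred; the function name is hypothetical, and modern tests no longer compute IQ in this way.

```python
def ratio_iq(mental_age, chronological_age):
    """Classical ratio IQ: 100 * mental age / chronological age (ages in years)."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age


# The child from the example above: 6 years old with a mental level of 3.
print(ratio_iq(3, 6))   # 50.0

# A child whose mental age equals his or her chronological age.
print(ratio_iq(6, 6))   # 100.0
```

Multiplying by one hundred simply removes the decimals: a child performing exactly at age level obtains an IQ of 100, and a child 'three years behind' at age 6 obtains an IQ of 50.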

Topic 2B: Testing from the Early 1900s to the Present

As a result of the Binet-Simon scales, people realized the significance of the invention of tests for other social contexts.

Early use and abuse of tests in the US

In 1906 Goddard adapted the Binet-Simon scale for use with mentally handicapped children in America. In 1911 he applied it to normal children, which showed that 3% (an improbably high percentage) of the respondents fell under his definition of 'mentally retarded'. According to him, people were feebleminded if their mental age lagged four years behind their actual age. Goddard was of the opinion that these children should be separated from society, so that they would not 'contaminate' anyone else.

In 1910, Goddard was invited to Ellis Island to make immigrant investigations more effective. He became convinced that the number of mentally retarded immigrants was much higher than initially thought. He therefore appointed experts to conduct intelligence tests on the immigrants.

Although Goddard was one of the most influential psychologists of the early 20th century, he is often criticized by modern authors. Perhaps this is because Goddard saw intelligence as hereditary and feeblemindedness as the cause of most social problems. The main reason, however, is the way in which Goddard abused the intelligence tests. The versions of the Binet-Simon that he applied had been translated several times and were completed by confused immigrants who had only just made the crossing over the Atlantic. He then interpreted the results according to the original French norms. The rates of mental deficiency that he found in this way amounted to 83% per cultural group. The results of his tests therefore closely matched the prevalent ideological beliefs of that time. By the time he withdrew his viewpoints (only to return to them again later), they had already contributed to the restriction of immigration. It is therefore important to remember that even leading figures who act within generally accepted social norms can abuse psychological tests. Furthermore, the ideologies and belief systems of the time in which statements were made must always be taken into account.

In the 1930s, Hollingworth introduced the use of the Stanford-Binet for testing children's giftedness. Hollingworth was an idealist and proposed measures for additional financial support for gifted children. She was also active in the feminist movement; she was of the opinion that gender differences in intelligence and performance were due to social and cultural influences.

Terman's (1877-1956) revised version of the Binet scales, the Stanford-Binet, was an improvement on many points. In addition to the introduction of the IQ as we know it now, the revision ensured that the test was suitable for people with mental retardation, for children, and for normal and gifted adults.

In addition, clear instructions were drawn up for the collection of data and a sample for standardization was carefully compiled. New tests were validated based on their correlation with the Stanford-Binet test. The Stanford-Binet remained the standard in intelligence testing for decades, even after the Wechsler scale had emerged. The Wechsler scale became a popular alternative because it gives more than the single, global IQ score of the Stanford-Binet test. The Wechsler scale yields both a verbal and a non-verbal score.

Group tests and the classification of WWI army recruits

Pyle (1913) was one of the first to develop group tests for schoolchildren in the US. The group tests, however, only slowly became popular, the main reason being the amount of work that scoring with paper and pencil entailed.

In 1917, Yerkes appointed a committee to develop a group intelligence test to test army recruits for intelligence, with the aim of classification and allocation. Two tests were developed: the Army Alpha and the Army Beta. The design and content of these tests had a great deal of influence on the field of group testing.

The Army Alpha and Beta exams

The Alpha was based on the work of Otis (1918) and consisted of eight verbal tests for average and high-functioning recruits. The Army Beta was non-verbal and meant for use with illiterate recruits and recruits with a mother tongue other than English. The Beta consisted of various visual-perceptual and motor tests. However, the enormous amount of data that emerged from both tests was hardly put to use. This was partly due to the resistance that existed in the army against scientific innovation. However, there were also good reasons to doubt the validity of the tests and the test conditions. On the other hand, the Army testing gave psychologists an enormous amount of experience in the psychometrics of test construction.

Early educational testing

After WWI there was a great demand for group tests from different institutions. The Army Alpha and Army Beta were released for public use and formed the prototypes for a large number of group tests, including the SATs in American secondary schools. Other important developments for group tests were the establishment of the College Entrance Examination Board (CEEB) and the rise of machine scoring. The CEEB was later included in the non-profit organization Educational Testing Service (ETS), which oversaw the development, standardization and validation of well-known tests.

Meanwhile, Terman and colleagues developed the standardized performance test called Stanford Achievement Test (SachT).

The development of aptitude tests

Aptitude tests measure one or more specific skills. Using the newly invented method of factor analysis, Thurstone (1938) concluded that general measures of intelligence failed to measure the intellectual strengths and weaknesses of a person. The development of aptitude tests lagged behind the development of more general intelligence tests. This was because factor analysis, which is needed to find out which primary abilities are involved, was only developed in the 1930s. In addition, there was a social reason: only by the time of the Second World War was there a need for aptitude tests to select candidates who were qualified for difficult and specialized tasks.

Personality and vocational tests after WWI

Personality tests only emerged during WWI. Woodworth (1919) developed his Personal Data Sheet to test recruits for susceptibility to psychoneurosis. Almost all modern personality tests find their basis in Woodworth's first tests.

The next big development was Thurstone's neurosis questionnaire, the Thurstone Personality Schedule. This was the first test that used the method of internal consistency. The Thurstone test gave rise to the Bernreuter Personality Inventory, which measured four personality dimensions in a sophisticated manner. An important point was that a single test item could apply to multiple personality dimensions. Finally, the Minnesota Multiphasic Personality Inventory (MMPI) was created, with scales that were compiled using Woodworth's method.

The origins of projective testing

Galton was the first to map out the projective approach in the late nineteenth century, through the association method. He stated that mental processing happens in the subconscious. This method was further extended by, among others, Jung (1910). His test included 100 stimulus words, to each of which the participant had to respond as quickly as possible with the first word that came to mind. Inspired by, among others, Jung, Rorschach (1884-1922) developed a projective personality test based on participants' reactions to ambiguous stimuli (inkblots). The Rorschach test was initially intended for discovering the inner workings of abnormal participants.

The Thematic Apperception Test was developed for studying normal personality. In this method, a participant is shown a picture and has to make up a story about it.

In the same period, Payne (1928) developed a test in which participants had to complete sentences. In Buck's (1948) House-Tree-Person Test, participants draw a house, a tree and a person, from which their personality is supposed to emerge.

The Szondi Test (1949) was based on the idea that, on the basis of the choice of a certain image, recessive genes for certain psychiatric disorders could be identified.

The development of interest inventories

The interest inventory has its origins in the work of Thorndike (1912), who researched the developmental trends in the interests of students. The Carnegie Interest Inventory was developed, tested and revised over a number of years until it was renamed the Strong Vocational Interest Blank (SVIB) in 1927. In the development of this test, a distinction was made for the first time between actual differences in results and error. For decades the only serious rival of the SVIB was the Kuder Preference Record. This test looked at the difference in strength of interests within a person, so the results were not compared with those of other participants. The interest tests were mainly used to see which profession best suited someone.

The emergence of structured personality tests

From the 1940s onward, personality tests became useful for clinical evaluation and for the assessment of normal functioning. The MMPI was especially important. Other tests used were the Sixteen Personality Factor Questionnaire (16PF), the California Psychological Inventory (CPI) and the Myers-Briggs Type Indicator (MBTI). Most recently, the 'big five' model, on which many tests are based, has come into use. It includes the five factors neuroticism, extraversion, openness to experience, agreeableness and conscientiousness.

The expansion and proliferation of testing

Nowadays tests are often used, both for clinical one-on-one use and for group tests with social purposes. In the clinical discipline, thousands of tests are available for different purposes, such as neuropsychology or forensic psychology.

Group tests are nowadays widely used for broad social goals, such as in education and admission to universities.

Evidence-based practice (EBP) has become important in recent years. EBP requires that treatments and interventions have measurable positive outcomes. Psychological tests are the best way to measure these outcomes. This led to evidence-based psychological practice (EBPP), which mainly provides empirically supported interventions.

Psychological Testing and Assessment - Van der Molen - Custom Edition, Leiden University

This summary is based on the customized edition of Psychological Testing and Assessment - Van der Molen - Leiden University. This book, and hence this summary, is composed of chapters from different books. The summary is based on the obligatory literature needed to prepare for the exam of the course "Psychodiagnostics (Psychology, Leiden University)".
