Summary of Introduction to Behavioral Research Methods by Leary - 6th edition

Research in the Behavioural Sciences - Chapter 1

What does the history of this research look like?

Many people think that psychologists are only concerned with treating people with mental problems. That is partly true, but psychologists also conduct research to find out more about people's behavior and mental processes. People have been explaining human behavior for centuries: Aristotle and Buddha, for example, systematically asked questions about why people behave in certain ways. In the past, however, explaining human behavior was not done scientifically. The statements that were made were mainly speculative, so their validity and accuracy could not be tested. Explanations were often based on, for example, religious dogmas. Scientific psychology originated in the last 25 years of the 19th century, when scientists such as Wundt, James, Watson and Hall began to see that psychological questions can be answered using the same scientific methods that are used in, for example, biology or physics.

What types of investigations can be distinguished?

Researchers distinguish between two types of research that are used for different purposes:

  1. 'Basic research': this type of research is conducted to gain a better understanding of psychological processes. In this context it is not important whether the knowledge can be applied immediately; the primary goal is simply to increase knowledge about a psychological process.

  2. 'Applied research': this type of research is carried out to find solutions to specific problems rather than to increase our general knowledge about certain processes. For example, psychologists are sometimes hired to detect and resolve problems in the workplace. In this context, research is a matter of understanding and eliminating problems.

  3. In addition to these two types of research, some scientists also speak of a third type, namely 'evaluation research' (also known as 'program evaluation'). This type of research is aimed at understanding the effects of programs on behavior using scientific research methods. Think, for example, of new school programs: it is important to investigate to what extent such programs are effective.

What is the purpose of these kinds of research?

Sometimes it is difficult to determine from the design of a scientific study whether it is 'basic' or 'applied' research. Moreover, the two types of research do not exclude each other: 'basic research' can also be applied in the long term, and 'applied research' can increase our knowledge about a phenomenon. It is also often the case that problem solving ('applied research') is only possible when there is sufficient knowledge about a phenomenon ('basic research'). The main difference between 'basic' and 'applied' research lies in the intention of the researcher, which is often difficult to infer from the research itself. Whether it concerns 'applied' or 'basic' research, research always has three goals:

  1. Description: some studies are mainly designed to describe behavioral patterns, thoughts or emotions. Consider, for example, opinion polls held just before elections. Developmental psychologists, as another example, are concerned with describing age-related changes in behavior.

  2. Prediction: in this case, scientists try to predict behavior. For example, some psychologists try to predict people's academic performance based on scores on standardized tests. It is important that these types of tests are critically analyzed and meet all kinds of statistical criteria. Predictions are checked against various other data to ensure that they are accurate. From this you can deduce that description is also involved when predicting behavior.

  3. Explanation: many scientists believe that explanation is the most important goal of scientific research. Scientists only feel that they understand a phenomenon if they can explain it. For example, we can describe how many prisoners reoffend after their release, but ultimately we want to be able to explain why some ex-prisoners go down the wrong path again while others do not.

What is meant by folk psychology?

In contrast to, for example, the natural sciences, the behavioral sciences often engage in research into phenomena that we all know.

The average person, for example, knows nothing about atoms, but knows a lot about memory, prejudice, sleep and emotion, because he or she has experience with these phenomena. This causes many people to think that the findings of the behavioral sciences are often obvious and that they could have thought of the theories themselves. However, this is not always true: something is not automatically true just because almost everyone believes it. Scientists have conducted many studies showing that much of our 'folk psychology' is wrong. For example, many people think that very intelligent people are often a bit stranger than the average person, whereas research shows that very intelligent people are often better adjusted to their environment than other people. Another example is that many people think that the biggest differences between men and women are biological in nature. This appears not to be true; the role of socialization is also very large. It is also important to consider that folk psychology can make scientists hesitant to investigate a question. Scientists often rely on common-sense knowledge (folk psychology) when it comes to explaining behavior, thoughts and emotions. If folk psychology is incorrect with regard to a phenomenon, this can cause scientists to explain psychological processes incorrectly.

Why is research in the behavioral sciences important?

It is important for us to learn more about conducting scientific research. There are four reasons for this:

  1. Knowledge ensures that people can understand research that is important to their own profession. This is important because we must always be up-to-date with regard to new findings. For example, teachers need to understand why some teaching methods are effective, while others are not.

  2. Knowledge about research methods also ensures that we can better analyze scientific findings in our daily lives. For example, if we want to buy a car, we can read several studies that describe the pros and cons of different cars. It is important that we can analyze such findings properly and critically.

  3. A third advantage is that knowledge about research methods makes us critical thinkers. Scientists ask critical questions, try to come up with alternative possibilities and explanations, improve their methods and aim to find strong evidence.

  4. A final advantage is that knowledge about research methods ensures that someone can become an expert; not only in the field of research methodology, but also in the field of specific subjects. In this way people can read and understand previous studies in their field of research, learn how to collect data and interpret results correctly.

What does the scientific method look like?

A method is scientific when it meets the following three criteria:

  1. Empiricism: the use of observation to draw conclusions about the world. However, it is necessary that empiricism is systematic: scientists structure their observations in a systematic way so that they can draw valid conclusions.

  2. Verification ('public verification'): this means that the research results of one researcher must be able to be observed, replicated and verified (confirmed) by other researchers. This ensures that other researchers can check that what one researcher has studied actually exists and is observable. In addition, this process makes it possible to improve research: other researchers can detect errors in a researcher's work, so that these errors can be corrected. Verification often involves publishing articles in scientific journals. Replication not only prevents errors; it also makes it possible for researchers to build upon and expand each other's research.

  3. Solvable problems: science only deals with solvable problems. For example, the question of whether angels exist is not scientific, because there is no way to study angels in an empirical and systematic way. This does not mean that angels do not exist, but it does mean that no statements can be made about this using a scientific method.

What are the tasks of a scientist?

Scientists focus on two different tasks:

  1. Discovering and recording new phenomena, patterns and relationships that they notice. Sometimes, however, it is not possible to conduct research based on a hypothesis, because there is no theory yet about the phenomenon being studied and therefore too little information to derive hypotheses from. In that case it is better to design a study that describes the phenomenon instead of testing hypotheses about it.

  2. Developing and evaluating explanations for the phenomena they notice. Once scientists have identified the phenomena to be explained, they focus on developing theories to explain the observed patterns and conduct research to test those theories.

What is the role of theories and models?

A theory consists of a set of propositions that tries to explain the relationships between a number of concepts. For example, with his contingency theory Fiedler tries to make a connection between concepts such as leadership effectiveness, task versus relationship-oriented leaders, leader-follower relationships, task structure and power. Scientific theories are only valid if they are supported by empirical findings. This means that a theory must be consistent with the facts discovered by scientists. A good theory meets the following criteria:

  1. A good theory suggests a causal relationship. It therefore describes how one or more variables lead to a certain cognitive, emotional, behavioral or physical response.

  2. A good theory is coherent, in the sense that it is clear, simple, logical and consistent.

  3. A good theory uses as few concepts and processes as possible to describe a phenomenon.

  4. A good theory generates testable, falsifiable hypotheses to test the theory.

  5. A good theory solves an existing theoretical question.

Researchers often use the concepts of 'model' and 'theory' interchangeably. However, a theory is different from a model. A model only describes how concepts are related to each other, while a theory also describes how and why concepts are related to each other. A model is therefore mainly descriptive in nature, while a theory is both descriptive and explanatory in nature.

What do research hypotheses look like?

Scientists mainly spend their time testing theories and models in order to discover whether they really describe and explain behavior correctly. People can often find explanations for events after they have occurred. Such explanations are referred to as post hoc explanations: explanations that are given after the fact. Scientists are very skeptical about these.

When a theory can explain phenomena afterwards, this says little about the accuracy of the theory. When a theory can predict in advance what will happen, this does say a lot about its correctness. For this reason, scientists formulate hypotheses before collecting data ('a priori'). Theories are often too broad to be tested directly. That is why they are always tested in an indirect way, using hypotheses.

  • A hypothesis is a proposition that logically follows from a theory.

  • Deduction is a process whereby a specific proposition (the hypothesis) is derived from a general proposition (the theory). In this process, the scientist is guided by the question of what would be observed if the theory were actually correct. Hypotheses are therefore formulated in an 'if A, then B' form.

  • Sometimes a hypothesis does not arise through deduction, but through induction. In that case, a hypothesis is derived from a set of facts instead of from a general theory. Hypotheses based only on previously observed results are also called empirical generalizations. A hypothesis must always be formulated in a way that makes it possible to test it and to disconfirm it ('falsify' it). For example, Freud's psychoanalysis is criticized because it is not possible to derive hypotheses from this theory that can actually be tested (and therefore falsified). For example, it is impossible to come up with testable hypotheses about the subconscious.

  • Some studies are stronger and better designed than others, providing stronger evidence for a hypothesis (and therefore for a theory). In addition, the more different (measurement) methods are used to test a theory in different experiments ('methodological pluralism'), the more confidence scientists have in their findings.

  • Sometimes there are two conflicting theories about a phenomenon. Scientists then design a study that tests both theories at the same time. Because the two theories are contradictory, they can never both be correct: if one theory is correct, the other (opposite) theory is automatically incorrect. This method is called the 'strategy of strong inference'. Its results lead to stronger conclusions about the relative value of the opposing theories than methods that test only one theory.

Which types of definitions are important?

In order to test a hypothesis and possibly falsify it, it must be clearly formulated. For example, if a researcher is researching the effects of hunger on our attention, then he or she must be able to define these concepts well. Scientists use two types of definitions: (1) conceptual definitions and (2) operational definitions:

  • A conceptual definition of a word is the definition that we could find in a dictionary. Hunger in this context is 'the desire for food'.

  • An operational definition shows how a concept is measured in a study. An operational definition converts an abstract, conceptual definition into concrete, situation-specific terms. For example, in a study we could say that someone is hungry if he or she has not eaten for twelve hours. Multiple operational definitions can be devised for the same concept. Operational definitions are necessary because they allow scientists to replicate each other's findings. Using these definitions forces researchers to describe their concepts clearly and to avoid ambiguity.

How do we find evidence?

Because theories can only be tested indirectly through hypotheses, theories can never be proven. Scientists never say that a theory is proven, but only that a theory is supported. A hypothesis can be confirmed, but a confirmed hypothesis does not mean that the theory behind it is also true.

An example: a murder has been committed and we develop a theory about who the perpetrator is. The murder was committed at a beach party. Imagine that Pete is a suspect. If Pete is the killer, then he must have been present at the beach party (this is a hypothesis in the 'if A, then B' form). It then indeed appears that he was present at the party. Does this mean that he is the killer because the hypothesis has been confirmed? Of course not. So we cannot prove a theory ('Pete is the killer') by confirming the hypotheses that follow from it ('Pete was at the beach party').

Proving that a hypothesis is not true is a logically valid operation: if it has been proven that Pete was not at the beach party, Pete logically cannot be the killer. In daily practice, however, things are different. The use of flawed measurement techniques can, for example, lead to a hypothesis being rejected while the theory is correct, and vice versa. Disconfirming a hypothesis ('Pete was not at the beach party') therefore does not necessarily mean that the theory ('Pete is the killer') is untrue. We may think that Pete was not at the party because we made mistakes, for example because his alibi is false. This is an example of an error in the measurement techniques. Because measurement techniques do not provide 100% certainty, a theory is never immediately rejected just because one study could not find evidence for it.

The conclusion is therefore that we can never prove a theory, but we can prove that a theory is wrong. Science develops because a lot of evidence accumulates for a theory, because dozens of studies support it. The more studies support a hypothesis, the greater the chance that the theory associated with that hypothesis is correct. You can also see science in terms of filters: first, many different possible explanations for a phenomenon are thought of. Then the more plausible explanations are retained and tested further, while the implausible explanations are rejected. The more filtering takes place, the fewer potential explanations remain.

Scientific studies that have been reviewed by other researchers and published in journals have at least passed a basic quality check. Other scientists can then begin to replicate the published findings.

Null findings are results that show that certain variables are not related to behavior. These results provide little information, because the data may fail to support a hypothesis for reasons that have nothing to do with the validity of that hypothesis. This makes null findings uninformative. When studies containing null findings are not published, the file-drawer problem occurs: researchers may set up a design to test a theory while that theory has already yielded null findings many times. Because all of these null findings were never published, future researchers do not know that 'their' research has already been done before.

What is meant by the scientific filter?

You can also see science in terms of the so-called 'scientific filter'. Imagine a tube that gets narrower toward the bottom. This tube consists of a number of layers, with a filter between each layer. The top layer consists of all unfiltered ideas, hunches and thoughts that the researcher comes up with. Then four filters follow:

  1. In the first filter, the researcher examines which ideas can and cannot be investigated. The researcher abandons ideas that he or she has learned (in training or education) to be impossible, and also considers his or her professional reputation. The ideas that pass filter 1 are not necessarily valid, but they are at least not obviously incorrect.

  2. The second filter consists of the researcher himself. At this stage, the researcher determines which ideas are worth investigating. If a study can lead to an interesting result and a scientific publication, the researcher will probably want to continue with it. But if there is a good chance that the research will lead to null findings, the researcher will not investigate the subject further.

  3. Filter 3 consists of peer review: other researchers check the research. They remove or improve research that has a poor methodology. Studies that are not useful for the scientific community are also removed at this stage. It is not the case that this filter removes all research that is not necessary; the filter mainly removes errors.

  4. The last filter consists of the use, replication and extension of the research by others. Only if a theory passes this filter does it become part of the established scientific literature.

If a theory has passed all four filters, this does not mean that the theory is automatically true. Scientists rarely present their theory as 'the only truth'. Through the scientific filter and by constantly testing new hypotheses, we can only come closer to the truth. It remains uncertain whether our findings really contain the whole truth.

What research techniques can be distinguished?

Scientists can use four types of research techniques to test hypotheses:

  1. Descriptive research: this type of research describes the behaviors, thoughts and attitudes of a group of individuals. For example, developmental psychologists try to describe the behavior of children of different ages. In clinical psychology, studies are conducted to describe the prevalence, symptoms and severity of certain psychological problems.
    Descriptive research forms the basis for all other research methods.

  2. Correlational research: in this type of research, the relationship between variables is studied. An example is a study of the relationship between self-confidence and shyness. In such a case, a correlation between the variables is calculated. However, correlational studies allow no statements about cause-effect relationships. For example, we do not know whether low self-confidence causes shyness or vice versa.

  3. Experimental research: in this case a variable is manipulated (the independent variable) to see if this causes changes in behavior (the dependent variable). If this is indeed the case, then we can conclude that the independent variable is the cause of the dependent variable. The most important thing about an experiment is that a variable is manipulated.

  4. Quasi-experimental research: this research technique is used when scientists cannot manipulate a variable. Consider in this context, for example, gender or age. The scientist then examines the effects of a variable or event that occurs naturally and cannot be manipulated. Quasi-experiments do not provide as much certainty as real experiments.

What role do animals play in these studies?

Most studies in psychology are conducted with humans, but animals are also used to find out more about psychological variables; often mice, rats and pigeons. The advantage of animal studies is that tightly controlled studies can be conducted and most environmental influences can be eliminated. These two things are often not possible in studies in which humans participate. In addition, drugs are tested on animals so that people do not have to take medical risks. Through animal studies we now know much more about, for example, hunger, thirst and sexual behavior. We have also learned a lot about vision, smell, taste and hearing. In addition, we know more about processes such as classical and operant conditioning, and about the functioning of the brain, through animal studies.

Behavioural variability and research - Chapter 2

What is a schema?

A schema is a cognitive frame of reference that enables us to process and store information in a certain way. You have a schema for many different events, people and other stimuli that you have encountered during your life. If you have a schema of something, you process information that is relevant to that schema more efficiently than information that is not. Many people find it difficult to develop a schema with regard to research methods and statistics, which makes it more difficult for them to process information about these subjects. In order to develop such a schema, it is important to understand that research is conducted to answer questions about differences in behavior ('behavioral variability'): how and why behavior differs across situations and individuals, and how it changes over time.

What role does variability play in research?

Doing research is about measuring variability. Five important points are distinguished in this regard:

  1. The behavioral sciences are concerned with studying variability in behavior. Behavioral scientists want to know why behavior varies (1) per situation, (2) per individual and (3) over time. People behave differently per situation, but different people also behave differently in the same situation. In addition, people's behavior changes over time. When they get older they behave differently than when they are young. Behavioral scientists are therefore concerned with investigating the causes that make behavior vary.

  2. Research questions in all behavioral sciences are questions about the differences in behavior. For example, if we want to know how performance on cognitive tasks is influenced by the amount of sleep, then we actually want to know how sleep causes changes in cognition.

  3. Investigations must be designed in such a way that it is possible to find answers about differences in behavior. Research designs must be strong and ensure that we can identify all factors associated with differences in behavior.

  4. By measuring behavior we can make statements about differences in behavior. All behavioral scientists try to measure behaviors, thoughts, emotions or physiological processes. We want differences in the numbers to reflect differences in behavior: if someone scores a four on a test, this should represent twice as much of the measured behavior as a score of two.

  5. Statistical analysis is used to describe and explain the observed variability in behavioral data. For every study it is important that the collected data are analyzed. After a study has been completed, researchers have a whole list of numbers that represent differences in behavior. These numbers are analyzed by means of statistics. This can be done with two types of statistics: (1) descriptive statistics and (2) inferential statistics. Descriptive statistics are used to summarize and describe the behavior of the participants; consider, for example, means and percentages. Inferential statistics are used to draw conclusions about the reliability and generalizability of the research results.

What is the variance used for?

Researchers calculate the variance of a dataset to get an idea of the observed spread in the behavior of participants. The statistical variance indicates the amount of variability in the behavior of the participants. One possibility is to calculate the 'range'. You calculate the range by subtracting the lowest score in the dataset from the highest score. However, this method is not recommended for measuring variability: the range only shows the difference between the largest and smallest scores, not how much the scores differ overall. When we talk about variance, it is in relation to a reference score, usually the mean. The mean is calculated by adding up all scores and dividing by the number of scores. The variance of a dataset shows how scores deviate from the mean. When the scores in a dataset are clustered close to the mean, the variance in the data is small. If the scores are more dispersed, the variance will be large (see Figure 2.1 for an illustration). The variance is therefore no more than an indication of how far from, or close to, the mean the scores lie. The variance can be found in five steps (a short Python sketch after the list walks through the same steps):

  1. Step 1: first, the mean of the dataset must be calculated. You do this by adding up the scores and dividing by the number of scores. The mean is denoted by the symbol ȳ ('y-bar') or x̄ ('x-bar').

  2. Step 2: then it is calculated to what extent each score in the dataset deviates from the mean. This yields the deviation score (y − ȳ): the difference between an individual score and the mean.

  3. Step 3: in this step, every deviation score is squared: (y − ȳ)². As a result, we no longer have problems with some deviation scores being negative; if you square a number, the answer is always positive.

  4. Step 4: then all squared deviation scores are added together: Σ(y − ȳ)². The outcome is also called the 'total sum of squares' (SS).

  5. Step 5: finally, the number found must be divided by the number of scores minus one (n − 1). We use the symbol s² for the variance. In short: s² = Σ(y − ȳ)² / (n − 1).
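The following minimal Python sketch walks through these five steps with a handful of made-up scores (the numbers and variable names are illustrative, not from the book); the range is included for comparison:

```python
# A hypothetical dataset of five test scores.
scores = [3, 5, 7, 4, 6]

range_ = max(scores) - min(scores)        # range: highest minus lowest score (here 4)

mean = sum(scores) / len(scores)          # Step 1: the mean (y-bar) = 5.0
deviations = [y - mean for y in scores]   # Step 2: deviation scores (y - y-bar)
squared = [d ** 2 for d in deviations]    # Step 3: squared deviation scores
ss = sum(squared)                         # Step 4: total sum of squares (SS) = 10.0
variance = ss / (len(scores) - 1)         # Step 5: s² = SS / (n - 1) = 2.5

print(range_, mean, variance)
```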

What is systematic variance and what is error variance?

The total variance in a dataset can be divided into (1) systematic variance and (2) error variance. The total variance = systematic variance + error variance. Scientists often conduct research to find out whether two or more variables are related to each other. Suppose a researcher wants to know if temperature and aggression are correlated. This scientist then wants to know whether variability in one variable (temperature) is systematically related to variability in the other variable (degree of aggression):

  • Systematic variance stands for that part of the total variance that is related in a predictable way to the variables that a scientist is investigating. If the behavior of a participant systematically changes when the variables change, then the scientist can conclude that these variables cause the changes in behavior.

  • Error variance arises when the behavior of participants is influenced by variables that the scientist does not investigate; for example the mood, concentration and traits of the test subject. If someone scores high on aggression, this may also be due to his or her bad mood instead of the temperature. Error variance therefore stands for the variance that arises from the influence of variables that are not included in the study; this part of the variance cannot be explained by the research. The more error variance there is in a dataset, the more difficult it is to determine whether the manipulated variables (independent variables) are really related to the behavior under investigation (the dependent variable). Researchers therefore want as little error variance in their research as possible.

It is important that scientists know how much of the variance in a dataset is error variance and how much of the variance is systematic in nature. Statistical methods are used to find this out.
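As a hedged illustration of this decomposition (all numbers are invented, and this simulation is not a procedure from the book), we can generate data in which aggression depends partly on temperature and partly on unmeasured influences, and check that the variance components add up:

```python
import random

random.seed(1)

# Hypothetical simulation: 'aggression' depends partly on temperature
# (systematic variance) and partly on unmeasured influences such as mood
# (error variance).
n = 1000
temperature = [random.uniform(15, 35) for _ in range(n)]
systematic_part = [0.5 * t for t in temperature]       # effect of the studied variable
error_part = [random.gauss(0, 2) for _ in range(n)]    # influences not in the study
aggression = [s + e for s, e in zip(systematic_part, error_part)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Because the two components are independent, the total variance is
# approximately the systematic variance plus the error variance.
print(var(aggression))                         # total variance
print(var(systematic_part) + var(error_part))  # systematic + error
```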

How can the strength of relationships be indicated?

Scientists are not only interested in whether variables are related, but also how strongly they are related.

For example, some variables are only weakly related to emotional or cognitive responses, while others are strongly related. It has already been explained that the total variance in a dataset consists of systematic variance and error variance. When we have calculated these values, we can easily calculate the proportion of systematic variance in a dataset: you divide the systematic variance by the total variance. The strength of a relationship between two variables can be expressed by means of 'effect sizes' (also called 'measures of strength of association').

  • The proportion of systematic variance is always expressed as a number between 0 and 1. If the proportion of systematic variance is .00, then the variables in the study are not related at all. However, if the proportion is 1.00, then all variability in the dataset can be attributed to the variables in the study. In that case there is only systematic variance and no error variance: there is a perfect relationship between the variables in the study. The greater the systematic variance relative to the error variance, the stronger the relationship between the variables studied.

  • Cohen states that the relationship between two variables is weak if the proportion of systematic variance is around .01, average if it is around .06 and strong if it is greater than .15. The .15 standing for a strong relationship may seem like a low number, because then 85% of the total variance is error variance. In this context, it is important to remember that many psychological variables (for example, self-confidence) are influenced by multiple variables. If a single variable accounts for 15% of the variance, that is substantial. Therefore, even low percentages can be meaningful and important (see the sketch below).
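In correlational research, the proportion of systematic variance can be estimated as the squared correlation coefficient (r²). A minimal sketch with invented data (the variable names are purely illustrative):

```python
from statistics import mean

# Invented paired observations; x could be temperature, y aggression scores.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

mx, my = mean(x), mean(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5

effect_size = r ** 2      # proportion of the total variance that is systematic
print(round(r, 2), round(effect_size, 2))
# Interpreted with Cohen's guidelines above: ~.01 weak, ~.06 average, >.15 strong.
```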

What is a meta-analysis?

Different studies that investigate the same variables may yield different estimates of the systematic variance. This is because every research design is different and every study uses different test subjects. For this reason, scientists often try to base the strength of the relationship between two variables on multiple scientific studies. By looking at all the results, an average estimate can be made of the relationship between two variables. Through meta-analysis, the results of many different studies are analyzed and integrated. The effect size is calculated or estimated per study; as explained above, the effect size describes the strength of the relationship between two variables. Once the effect size has been determined for each study, the mean effect size across the different studies is calculated.

A mean estimate based on multiple studies provides stronger evidence for the relationship between two variables than a single study.
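A minimal sketch of that averaging step (the studies, effect sizes and sample sizes below are invented; the weighting shown is a common refinement in real meta-analyses, not a method prescribed by this chapter):

```python
# Invented studies; each entry holds an effect size (proportion of systematic
# variance) and the study's sample size.
studies = [
    {"name": "Study A", "effect_size": 0.12, "n": 40},
    {"name": "Study B", "effect_size": 0.07, "n": 120},
    {"name": "Study C", "effect_size": 0.19, "n": 60},
]

# Simple (unweighted) mean effect size across the studies:
mean_es = sum(s["effect_size"] for s in studies) / len(studies)

# Meta-analyses commonly weight each study, for example by sample size,
# so that larger studies count more toward the overall estimate:
weighted_es = (sum(s["effect_size"] * s["n"] for s in studies)
               / sum(s["n"] for s in studies))

print(round(mean_es, 3), round(weighted_es, 3))
```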

Most meta-analyses do not only determine to what extent variables are related, but also look at which factors influence this relationship. For example, the relationship between two variables can be influenced by gender; such a variable plays a moderating role. Scientists first try to find out whether variables are related at all; only then do they look at how the variables are related. They are always looking for systematic variance, because this says something about the relationship between the variables studied.

Why is systematic variance so important?

Almost all research into behavior is a search for systematic variance. After all, researchers want to find as much systematic variance as possible, and as little error variance as possible. Only when it appears that there is systematic variance do researchers look at how exactly the variables are related.

The Measurement of Behavior - Chapter 3

What types of measurement can be distinguished?

The types of measurement that behavioral scientists use can be divided into three groups:

  1. Observational measures: in this case behavior is observed in a direct way. This can be done in any research where the behavior that is being investigated can be directly observed. Consider, for example, the measurement of eye contact between people, measuring how often a rat presses a lever and observing the amount of aggression between children during the school break. Researchers can directly observe the behavior, or can make audio or video recordings from which information about the test subjects can be derived.

  2. Physiological measures: scientists use this method when they are interested in the relationship between behavior and body processes that cannot be observed directly. These are processes that take place inside the body and usually cannot be seen with the naked eye, but they can be measured with certain instruments. Consider, for example, the measurement of heart rate, sweating, brain activity and hormonal changes.

  3. Self-report measures: in this case people provide answers to questionnaires and interviews. There are three types of self-reports: (1) cognitive self-reports, which measure what people think, (2) affective self-reports, which measure what people feel, and (3) behavioral self-reports, which measure what people do.

The success of an experiment is highly dependent on the quality of the measuring instrument that is used. Because measurement techniques are so important in the research process, psychometrics was developed: a science focused on the techniques of measurement in psychology. Psychometricians investigate the properties of the measurement techniques used in the behavioral sciences and strive to improve psychological measurement techniques.

What scales of measurement can be distinguished?

For all three of the above-mentioned types of measurement, it is necessary to assign numbers to people's responses.

With a self-report measurement it is for example possible that people can choose from five possible answers within the range from 'not at all true' to 'completely true'. The options can then be assigned the numbers 1, 2, 3, 4 and 5. What these numbers mean depends on the measuring scale that is used. In this context, a distinction is made between four types of scales ('scales of measurement'):

  1. Nominal: in this case, numbers serve only as labels. The number 1 can stand for 'male' and the number 2 for 'female'. With numbers on a nominal scale it is impossible to perform calculations, because they only serve as labels.

  2. Ordinal: in this case numbers represent behaviors or characteristics arranged in order. In this way it can be seen how one behavior or one characteristic relates (relatively) to other behaviors and characteristics. For example, we can put participants in a singing competition in order from 'best' to 'worst' based on the applause they receive. However, it is hard to perfectly judge how much more applause one candidate received compared to another.

  3. Interval: here 'real' numbers are involved: equal differences between the numbers represent equal differences between the participants. For example, on an IQ test, the difference between an IQ of 90 and 100 (10 points) is the same as the difference between 110 and 120. However, the interval scale does not have a true zero point. A zero would indicate that a certain property is completely absent, and we cannot make such statements on an interval scale. For example, we cannot say that someone has no intelligence whatsoever, and a temperature of zero degrees does not mean that there is no temperature. Because an interval scale has no true zero point, it is not logical to say that it is twice as warm at 50 degrees as at 25 degrees. Without a zero point, we cannot multiply or divide the numbers on an interval scale.

  4. Ratio: this is the highest possible measurement scale. Contrary to the interval scale, the ratio scale does have a true zero point, so numbers can be added, subtracted, divided and multiplied. For example, weight is measured on a ratio scale: someone weighing 50 kg weighs twice as much as someone weighing 25 kg. Weight has a true zero point because a weight of zero means the absence of weight.

Mathematical analyses

Different scales of measurement are important to scientists for two reasons:

  • First of all, measuring scales differ in the extent to which they provide us with information. A nominal scale, for example, says less than an ordinal, interval or ratio scale. If people are asked whether they agree or disagree with a statement, the outcome provides less information than knowing to what extent they agree or disagree. Often the variable itself dictates which measuring scale can be used; sex, for example, can only be measured on a nominal scale. If researchers are given the choice, they prefer the highest measurement scale possible, because higher measuring scales provide more, and more precise, information about the characteristics of the test subjects.

  • The second reason that the existence of different measuring scales is so important to scientists is that some mathematical analyses (such as t and F tests) require numbers measured on an interval or ratio scale. Researchers choose the measurement scale that makes it possible for them to use the test that gives them the most information.

How can we estimate reliability?

We want the numbers that we assign to behaviors, objects or events to correspond in a meaningful way to the characteristic that we are trying to measure. The intention of researchers is therefore that variability in the numbers reflects the spread in the characteristic being measured. But how do we know whether that is really the case? In this context, the term reliability comes into play. Reliability is about the consistency of a measurement technique. If you stand on a scale twice, and one time the scale says that you weigh 40 pounds and the next time 50, we call the scale unreliable. Measurement techniques should be reliable; if they are not, we cannot assume that they provide meaningful information about the participants in a study.

How do errors of measurement arise?

The score of a participant on a measurement is made up of two parts: (1) the true score of the participant and (2) measurement error. So: observed score = true score + measurement error. The true score is the score that a participant would obtain if the measurement technique were perfect and therefore contained no errors. However, the measurement techniques that scientists use are never completely flawless; all measurement techniques contain measurement error. Because of these measurement errors, scientists cannot find the true score of a participant.

Measurement errors can have five causes:

  1. Transient states: for example the mood, health, degree of fatigue and anxiety of the test subject during participation in the study.

  2. Stable characteristics: these are characteristics that are constantly present in a test subject. Think in this context of paranoid thoughts, motivation and level of intelligence.

  3. Situational factors: these are factors in the research setting that can cause measurement errors. For example, if a researcher is very nice, then a participant may want to work harder, but if a researcher is unsympathetic, a participant may perform less well. Temperature, light and noise can also influence the research results.

  4. Characteristics of the measure: this concerns, for example, unclear questions that test subjects must answer or the use of difficult language in a questionnaire, which results in the test subjects making unnecessary mistakes while completing it.

  5. Mistakes in recording: these are errors that the researcher makes while recording the answers of the participants. Consider, for example, entering answers incorrectly into the computer, or losing count while observing how often a rat presses a lever.

Measurement errors and reliability

Measurement errors reduce the reliability of a measurement. If a measurement has low reliability, then the measurement errors are large and the researcher only knows little about the true score of a participant. If a measurement has a high reliability, then there are few measurement errors. The observed score of a participant is then a good (but in no case perfect) reflection of the true score of a participant.

How can we view reliability as systematic variance?

Scientists never know exactly how many measurement errors there are in their study and what the true scores of participants are. They also do not know exactly how reliable their measurements are, but they can estimate this reliability on the basis of statistical analyses. If they see that their measurement is not reliable enough, they can try to make it more reliable. If that is not possible, they can decide not to use the measurement in the study at all.

  • The total variance in a dataset of scores consists of two parts: (1) variance of the true scores and (2) variance caused by measurement errors. Formulated in a formula: total variance = variance due to true scores + variance due to measurement errors.

  • We can also say that the proportion of total variance associated with the participants' true scores is systematic variance, because the true scores are systematically related to the measurement.

  • The variance that results from measurement errors is error variance because this variance is not related to the factors or variables that the scientist is investigating.

  • The reliability is therefore calculated by dividing the true-score variance by the total variance: reliability = true-score variance / total variance. The reliability of a measurement always lies between 0 and 1. A reliability of .00 tells us that there is no true-score variance in the data at all and that the scores only represent measurement error. With a reliability of 1.00 it is exactly the opposite: there is only true-score variance and there are no measurement errors. The rule of thumb for interpreting reliability is that a measurement is reliable enough if its reliability is at least .70, meaning that at least 70% of the variance in the data is true-score variance (see the simulation sketched below).
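A hedged simulation of this ratio (all numbers invented): because we generate true scores and measurement error separately, the reliability of the observed scores can be computed directly:

```python
import random

random.seed(42)

n = 5000
true_scores = [random.gauss(100, 15) for _ in range(n)]  # hypothetical true IQs
errors = [random.gauss(0, 7) for _ in range(n)]          # random measurement error
observed = [t + e for t, e in zip(true_scores, errors)]  # observed = true + error

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

reliability = var(true_scores) / var(observed)
print(round(reliability, 2))
# Expected to be close to 15**2 / (15**2 + 7**2) ≈ .82, above the .70 rule of thumb.
```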

What types of reliability can be distinguished?

Researchers use three types of reliability when analyzing their data: (1) test-retest reliability, (2) inter-item reliability and (3) interrater reliability. The correlation coefficient is a statistic that indicates how strong the relationship between two measurements is. Its absolute value always lies between .00 (indicating no relationship between the measurements) and 1.00 (a perfect relationship between the measurements); correlation coefficients can be either positive or negative. If the coefficient is squared, we see what proportion of the total variance of the two measurements is systematic. The higher the correlation, the more strongly two variables are related. The three types of reliability are discussed below:

1. Test-retest reliability

Test-retest reliability indicates the consistency of the responses of participants over time. Subjects take the same measurement twice, usually with a period of several weeks between the two measurements. If we assume that a characteristic is stable, then someone should obtain roughly the same score both times. If someone scores 110 on an IQ test the first time, then it is expected that he or she will also score around 110 on the same test the next time, because intelligence is a relatively stable characteristic. The scores will most likely not be exactly the same, because there are always measurement errors involved. If the two sets of IQ scores correlate highly (at least .70), then the measurement (in this case, the IQ test) has good test-retest reliability. We expect high test-retest reliability for intelligence, attitude and personality tests. With less stable characteristics, such as hunger or fatigue, measuring test-retest reliability is of course of no use.
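A small sketch of the test-retest check with invented scores for six participants measured twice (the helper function is defined inline so the example is self-contained):

```python
# Invented IQ scores for six participants, measured twice a few weeks apart.
time1 = [110, 95, 102, 128, 87, 115]
time2 = [112, 93, 105, 125, 90, 113]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(time1, time2)
print(round(r, 2))   # .70 or higher indicates good test-retest reliability
```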

2. Inter-item reliability

Inter-item reliability is important for measurements that consist of more than one item; it concerns the degree of consistency between multiple items on a scale. Personality questionnaires, for example, often consist of multiple items that each say something about, for instance, the level of extraversion or the amount of self-confidence of participants, and the responses to these items are combined into a single score. When scientists add up different responses from participants to obtain a single score, they must be sure that all these items measure the same construct (for example, extraversion). To see to what extent the items are interrelated, an item-total correlation can be calculated for each item: the correlation between that item and the sum of the remaining items. Each item on the scale should correlate with the rest of the items; an item-total correlation of .30 or higher per item is considered sufficient. In addition, it is necessary to calculate the reliability of all items together. In the past, split-half reliability was calculated for this.

  • With split-half reliability, the items are first divided into two sets. Sometimes the even-numbered items are put in one list and the odd-numbered items in another. Sometimes it is decided, for example if there are ten items, to put the first five items in one list and the last five in the other. Then the total score is calculated for each list (or 'set'), and the correlation between the two sets is calculated. If the items in both sets measure the same construct, there should be a high correlation between the sets: a correlation of .70 or higher. The disadvantage of split-half reliability is that the correlation found depends on the way the items are assigned to the sets. If you divide the sets slightly differently, you could obtain a completely different split-half reliability. Calculating the split-half reliability therefore provides relatively little certainty.

  • For this reason, nowadays Cronbach's alpha coefficient is calculated. With Cronbach's alpha you calculate (by means of a simple formula) the average of all possible split-half reliabilities. Scientists assume that inter-item reliability is good if Cronbach's alpha is .70 or higher (a small computational sketch follows below).
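One common computational form of Cronbach's alpha is α = (k / (k − 1)) × (1 − Σ item variances / variance of the total scores). A minimal sketch with made-up responses:

```python
# Made-up responses to a 4-item scale; each row is one participant,
# each column one item (scored 1-5).
data = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

k = len(data[0])                                   # number of items
item_vars = [var([row[i] for row in data]) for i in range(k)]
total_var = var([sum(row) for row in data])        # variance of the summed scale scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))   # .70 or higher indicates good inter-item reliability
```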

3. Interrater reliability

Interrater reliability is also called 'interjudge' or 'interobserver' reliability. It concerns the extent to which two or more researchers observe and interpret the behavior of the test subjects in the same way. If one researcher states that a rat pressed a lever 15 times while another researcher states that the same rat pressed the lever 20 times, then something is wrong with the interrater reliability. Researchers often use two methods to calculate interrater reliability. If researchers only have to note whether a behavior has occurred, we can calculate a percentage that represents how often the researchers agree with one another. However, if the researchers have to assess the behavior of the participants on a scale (for example, a score for anxiety between 1 and 5), then we can check whether the researchers give the individual participants the same assessments. If the researchers make similar assessments (i.e. with high interrater reliability), the correlation between their assessments should be .70 or higher.
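A sketch of both checks with invented observations (a hypothetical behavior checklist and a hypothetical 1-5 anxiety rating):

```python
# (a) Percentage agreement: two raters note whether a behavior occurred (1/0).
rater1 = [1, 0, 1, 1, 0, 1, 0, 0]
rater2 = [1, 0, 1, 0, 0, 1, 0, 0]
agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(round(agreement * 100), "% agreement")   # 88 % agreement here

# (b) Correlation: two raters score anxiety on a 1-5 scale for six participants.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rating1 = [3, 1, 4, 2, 5, 3]
rating2 = [3, 2, 4, 2, 4, 3]
print(round(pearson_r(rating1, rating2), 2))   # .70 or higher indicates good reliability
```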

Increasing the reliability of measurements

It is important that a researcher strives to maximize the reliability of a measurement. This is possible in four ways:

  1. Standardizing the administration of a measurement. Every participant must be tested under exactly the same conditions. Differences in the environment during measurement can lead to measurement errors and thus to a decrease in reliability.

  2. Clarifying instructions and questions. Measurement errors occur when participants do not fully understand instructions or questions. It is good practice for a researcher to test in advance whether the questions are understandable for participants.

  3. Training observers. If the behavior of participants must be observed and assessed, then it is necessary to train the observers properly.

  4. Minimizing errors in coding the data. No matter how reliable a measurement technique is, measurement errors occur when researchers make mistakes while coding or processing the data, for example when entering the data into the computer.

What is meant by validity?

Measurement techniques should not only be reliable, but also valid. Validity refers to the extent to which a measurement technique measures what it is supposed to measure: do we measure what we want to measure? It is important to note that reliability and validity are two different things. A measuring instrument can be reliable but invalid at the same time. High reliability tells us that the instrument measures something consistently, but not what it measures. To find that out, we must look at the validity. Validity is not a fixed property of a measurement technique: a measurement can be valid for one purpose while it is not valid in another context. Researchers distinguish between three types of validity: (1) face validity, (2) construct validity and (3) criterion-related validity.

1. Face validity

Face validity is about the extent to which a measurement appears to measure what it is supposed to measure. A measurement has face validity if people think it is valid. This form of validity can therefore not be calculated statistically; it is a subjective judgement of the people considering the measurement. Face validity is judged by the researcher, by the test subjects and/or by experts in a relevant field. If a measurement does not have face validity, test subjects often do not find it important to participate. If a personality test lacks face validity and test subjects are asked to complete it, they do not understand the usefulness of the test, which may reduce their motivation to participate actively in the study. It is important to remember three things: (1) that a measurement has face validity does not mean that it is really valid, (2) when a measurement does not have face validity, this does not necessarily mean that it is invalid, and (3) sometimes researchers deliberately want to mask their goals. For example, if they are worried that participants will not answer sensitive questions honestly, they can design measuring instruments that have no face validity.

2. Construct validity

Researchers are often interested in hypothetical constructs: constructs that cannot be observed directly on the basis of empirical evidence. Think of constructs such as intelligence, status, self-concept, morality and motivation. The question arises how we can know whether the measurement of a hypothetical construct (which is not directly observable) is valid. Cronbach and Meehl state that we can determine the validity of the measurement of a hypothetical construct by comparing it with other measurements; it is therefore important to look at the relationships between measurements. For example, scores on a self-confidence measuring instrument should correlate positively with optimism measurements, but negatively with instruments that measure uncertainty and anxiety. We determine construct validity by analyzing the extent to which a measuring instrument is related to other measuring instruments, often by calculating correlation coefficients.

In contrast to reliability (where the correlations must be above .70), there is no rule of thumb as to how large the correlations must be to determine the construct validity. A measuring instrument has construct validity if it (1) highly correlates with measuring instruments with which it should correlate (convergent validity) and (2) when it does not correlate (or correlates to a low degree) with measuring instruments with which it should not correlate (discriminant validity).

3. Criterion validity

Criterion validity is about the extent to which a measuring instrument enables us to distinguish between participants on the basis of a certain behavioral criterion. An example is the question whether scores on a motivation test during pre-university education say something about who will and will not do well during their university studies; the behavioral criterion in this case is performance at university. Criterion validity matters most in applied settings; think of educational settings or personnel selection. Researchers distinguish between two primary types of criterion validity: (1) concurrent validity and (2) predictive validity. The most important difference between these two types is the amount of time that passes between administering the measuring instrument and determining the behavioral criterion.

  • Concurrent validity is determined when the measuring instrument and the behavioral criterion are assessed at around the same time. The central question is whether the measuring instrument can properly distinguish between people who score high and people who score low on the behavioral criterion at that specific moment. If scores on the measuring instrument are related to the behaviors to which they should be related at that time, the measuring instrument has concurrent validity.

  • We speak of predictive validity when a measuring instrument is able to distinguish between people on a behavioral criterion in the future (for example, the motivation test during pre-university education and a performance measure while studying at university). Can a score on the motivation test during pre-university education predict how someone will perform during a university study? Predictive validity is especially important for studies conducted in an educational setting.

When are biases involved?

In recent years much attention has been paid to the idea that some measuring instruments are biased against certain population groups. This would particularly apply to intelligence tests and tests that measure academic abilities. Test bias arises when a specific measuring instrument is not equally valid for everyone who completes the test. This means that the test scores reflect the ability of one group better than that of another group. In that case it would seem as if one group is more skilled in that area than the other, while this does not have to be the case in reality.

It is often difficult to determine the presence of test bias. When one group performs worse on a test than another, it does not necessarily mean that there is a test bias. Test bias can be established by determining the predictive validity of a measuring instrument for different groups. If there is a bias, then future outcomes will be predicted better for one group than for the other.

Approaches to Psychological Measurement - Chapter 4

What different studies can be distinguished?

In their studies, researchers can use observational methods, physiological methods, self-reports and archive material. All of these measurement approaches can be combined with each of the four research designs discussed in Chapter 1 (descriptive, correlational, experimental or quasi-experimental).

In a correlational study, for example, a scientist can observe participants' shyness, measure their physiological responses during social interaction, ask them to answer questions (self-report), and have them keep a diary that can be studied later (archive material). This part successively examines (1) observational methods, (2) physiological methods, (3) self-reports and (4) archive material.

1. Observational methods

Many scientists observe behavior to answer their research questions. For example, they can observe whether people eat, blush, smile, help, blink or yawn. Scientists who use observational methods have to make three choices: (1) will the observation take place in a natural environment or in an artificial environment, (2) will the participants be aware that they are being observed and (3) how will the behavior of participants be measured?

Naturalistic observation and artificial observation

  • Naturalistic observation occurs when a scientist observes naturally occurring behavior without intervening. The scientist does not design an artificial situation in which to observe the behavior. Scientists who want to know how animals behave in their natural environment often use naturalistic observation.

  • Participant observation is a form of naturalistic observation. In this case the scientist joins the participants in the behavior he observes. For example, a scientist can join devil worshippers or gangs for research purposes. However, participating in the behavior of a group that a scientist is studying can cause problems. When scientists begin to identify with the group members, they may no longer be able to observe the group process from an objective point of view. Moreover, the researcher could (unintentionally) influence the behavior of the participants.

  • The opposite of naturalistic observation is artificial observation ('contrived observation'). With artificial observation, behavior is observed in situations that are specifically designed for this purpose. Often, the behavior is observed in a laboratory; so participants know that they are being observed. However, it is also possible to do artificial observations outside of a laboratory. Scientists can, for example, stage an 'emergency' on the street and observe whether people are helpful to the 'victims'.

Masked and unmasked observation

The second choice that scientists have to make is whether the participants are aware that they are being observed.

  • When the participants know that they are being observed, there is unmasked observation ('undisguised observation'). The problem with this method is that people often do not respond naturally when they know they are being observed. This is also called reactivity.

  • To prevent reactivity, the researcher can use masked observation ('disguised observation'): in this case, subjects do not know that they are being observed. The problem with masked observation, however, is that this form of observation can violate the privacy of the test subjects involved, who participate in an investigation without their knowledge. They cannot give informed consent in advance stating that they want to be a participant in the study. The fact that the test subjects are unable to choose for themselves whether they want to participate in the research causes an ethical dilemma. This problem can be partially solved by implementing the partial secrecy ('partial concealment') strategy. In a study that applies this method, the test subjects are aware that they are participating in a study, but do not know exactly which aspect of their behavior is being investigated.

  • Because people often do not behave naturally when they know they are being observed, scientists sometimes try to measure the desired behavior indirectly rather than directly. For example, researchers can ask people who know the test subject well ('knowledgeable informants') to observe and assess the behavior of the test subject in question; think, for example, of parents, best friends, colleagues and teachers. Another form of masked observation is the use of an unobtrusive measurement. This is a measurement that can be performed without participants knowing that they are being studied. With an unobtrusive measurement, the researcher does not ask the test subjects about their behavior directly, but observes the behavior indirectly. For example, if a researcher fears that test subjects will lie about their alcohol consumption, he or she can count the number of empty alcohol bottles in the waste bin in front of the test subject's house.

Measuring behavior

The third question that a scientist should think about is how behaviors should be measured and recorded. Some behaviors are very complex, making them relatively difficult to describe and record clearly. Researchers can make use of (1) a narrative record, (2) a checklist, (3) a temporal measurement or (4) a rating scale:

  1. A 'narrative' / 'specimen record' is a complete description of the behavior of a participant. The goal is to record everything the participant says and does as accurately as possible. In the past, these records were mostly written down, but nowadays they are more often audio- or video-recorded. Sometimes scientists do not record exactly everything the participants do and say, but mainly make short summaries of the behaviors they observe. To analyze a narrative record, a content analysis must be done. This will be discussed in more depth later on.

  2. A checklist can also be used. A checklist is structured, while a narrative record is not: with a narrative record, the researcher is free to use whatever language he or she wants. On a checklist, however, behaviors are described more specifically. All the scientist then has to do is indicate whether these behaviors have occurred. A checklist sounds simple, but it is often difficult to come up with good operational definitions for the behaviors that appear on the checklist. For example, if the checklist says 'has smiled', it may be difficult to state exactly what counts as a smile and which expressions should no longer be considered a smile.

  3. Sometimes researchers want to know when and for how long behaviors are carried out. In that case it is possible to take time measurements. Scientists are often interested in latency: the time that elapses before a behavior occurs. Latency can be measured as the response time, the time that passes between the appearance of a stimulus and the observation of and/or response to this stimulus. It can also be measured as the time required to complete a task ('task completion time'). Another form is 'interbehavior latency': the time that elapses between one behavior and another behavior that follows it. For example, a scientist can investigate the particular eye movements a person makes after he notices that he is bleeding. In addition to latency, a researcher may also be interested in the duration of a behavior. For example, a researcher may be interested in how long people maintain eye contact during a conversation.

  4. Sometimes researchers want to study the quality and intensity of a behavior. For example, a scientist may wonder how intensely a child cries when he or she is bullied. This can be done with observational rating scales. For example, the crying of a child can be rated on a three-point scale (1 = no crying, 2 = moderate crying and 3 = loud crying). When rating the crying of a child, there is of course always a degree of subjectivity. That is why it is important that there are unambiguous criteria that, for example, distinguish between 'moderate crying' and 'loud crying'.

Increase the reliability of observational methods

Observational systems must have interrater reliability, which concerns the extent to which the observations of two independent observers are similar. If the interrater reliability is low, this means that the observers do not use the observation system in the same way. The reliability of observation systems can be increased in two ways: (1) by using clearly defined operational definitions and (2) by having the observers discuss, prior to data collection, how behaviors will be coded.
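
One widely used chance-corrected index of interrater agreement for categorical codes is Cohen's kappa. Below is a minimal sketch; the observer codes are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal proportions
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical codes: did the child smile in each 10-second interval?
obs1 = ["smile", "none", "smile", "smile", "none", "none", "smile", "none"]
obs2 = ["smile", "none", "smile", "none", "none", "none", "smile", "smile"]
print(cohens_kappa(obs1, obs2))  # 0.5: observed agreement .75, chance .50
```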

2. Physiological measurements and neurological tests

Neuroscience is the interdisciplinary field that deals with the biochemical, anatomical, physiological, genetic and developmental processes that influence the nervous system. Many neuroscientists are curious about how changes in the brain and nervous system are related to psychological phenomena such as perception, thoughts and emotions. Psychophysiological and neuroscientific measurements can be divided into five categories: measurements of (1) neural electrical activity, (2) neuroimaging, (3) the activity of the autonomic nervous system, (4) blood and saliva and (5) external responses:

  1. Measurements of neural electrical activity: electrical activity in the brain can be measured with an EEG (electroencephalogram), which records brain waves via electrodes attached to the scalp. In this way the electrical activity of the brain can be analyzed. Sometimes scientists attach the electrodes to other parts of the body, for example to muscles, to see what the physiological responses are to e.g. stress and emotions.

  2. Neuroimaging: this method allows scientists to see how the brain works and what kind of activity occurs in the brain. Neuroimaging can be divided into two different types: structural neuroimaging and functional neuroimaging. Structural neuroimaging is used to investigate the physical structure of the brain. Functional neuroimaging is used to measure brain activity. An example of functional neuroimaging is fMRI. With fMRI, the person's head is placed in a large scanner that generates a strong magnetic field. Brain areas to which more oxygen-rich blood flows are more active than brain areas that receive less.

  3. Activity of the autonomic nervous system: for example heart rate, blood pressure or body temperature are measured.

  4. Blood and saliva: physiological processes are measured by analyzing blood and saliva. For example, an elevated level of testosterone may be found in the blood, which can be an indicator of aggression.

  5. External responses: in this case, physical responses are measured that can be seen with the naked eye, but that must be measured with a special measuring instrument. For example, when examining the facial expression associated with shame, special sensors can be applied to the face that detect when someone is blushing.

Neuroscientific and physiological measurements are rarely of interest in their own right. Researchers are usually not interested in the physiological response itself; the physiological reaction is only interesting to them if it is an indicator of another phenomenon. A change in the face, for example, is not measured to study the movement of the muscles, but rather to determine the corresponding emotion.

3. Self-reports

Researchers of behavior prefer to make direct observations. However, this is sometimes not possible, for example for ethical reasons. In such cases, self-reports are a good option to measure something like emotions or thoughts. There are two types of self-reports: (1) questionnaires, in which questions are answered in writing, and (2) interviews, in which the researcher asks questions verbally and the participant answers them verbally. For both types of self-reports, single-item and multi-item measuring instruments can be used. An item is a part of a questionnaire that elicits a certain response from the person completing it. It can be a statement, but also a question. With a single-item measuring instrument, the question (the item) is not linked to other items; it stands completely on its own. With a multi-item measuring instrument, the different items (which measure the same construct) are combined with each other. A multi-item measuring instrument is generally more reliable and valid than a single-item measuring instrument.

When compiling interviews or questionnaires, it is important to pay attention to the following:

  1. Questions must be specific and formulated in a precise manner. This ensures that every question is properly interpreted and understood.

  2. Questions must be formulated as simply as possible. Difficult words must be avoided as much as possible. It is not wise to use more than 20 words per item.

  3. No initial assumptions should be made about the participants. Researchers often think that other people are exactly like them. However, this does not have to be the case, and as a researcher you have to take this into account. The question "how is your relationship with your mother?" can feel very natural, for example because the researcher has a mother himself. However, other people may have been adopted or may have grown up without a mother.

  4. Conditional information must precede the most important part of the question. This means that it is better to ask "if a good friend is depressed for a long time, would you recommend him to a psychologist?" instead of "would you recommend a good friend to a psychologist if he is depressed for a long time?"

  5. Do not use items that contain more than one question. An example is 'do you eat healthy and do you exercise regularly?'

  6. Choose a good response format. The response format refers to how a participant is supposed to respond to the item. There are three types of response formats:

    1. An open question ('free-response format' or 'open-ended format'): the participant can give any answer he or she likes; he or she is not restricted to a fixed set of options. The disadvantages of open questions are (1) that the participant has to decide how extensive the answer should be and (2) that it is difficult for the researcher to encode and analyze the answers.

    2. A rating scale response format ('a rating response format'): questions according to this format contain multiple answer options that differ in intensity, for example five answer options ranging from 'totally disagree' to 'totally agree'. An example of this is a 7-point scale.

    3. The last possibility is multiple choice ('fixed-alternative response format'): these questions contain multiple answer options that clearly differ. Unlike the rating scale response format, the answer options differ not only in intensity, but in many more respects. An example is asking someone's opinion about abortion and offering the following options: (1) 'I am against abortion in all situations', (2) 'I only approve of abortion in certain situations, for example when the woman's life is in danger' and (3) 'I think a woman should decide that for herself'. A true-false response format is a variant of the multiple-choice option.

  7. Test the questions first. If possible, a researcher should first try to answer the questions him- or herself to see if they are formulated clearly enough.

4. Questionnaires

Questionnaires are often used in psychology. Through questionnaires, people can reveal more about their feelings, attitudes and lifestyle.

For example, employers try to get an idea of how applicants will function in the workplace by having them fill out questionnaires. Sometimes researchers have to design new questionnaires, but often they can use previously published ones. This is because previously published questionnaires have often been shown to be reliable and valid and have been used in several studies. Designing new questionnaires takes a lot of time and is risky. There are four main sources for finding existing questionnaires: (1) they are often published in scientific articles, (2) there are books that critically evaluate questionnaires, (3) there are databases on the internet that describe psychological measuring instruments and (4) some questionnaires can be purchased from publishers.

5. Interviews

An interview schedule is the set of questions used in an interview. A number of tips are given below to make an interview run as smoothly as possible; this increases the quality of the test subjects' answers.

  1. Create a pleasant atmosphere. This makes the test subject feel comfortable.

  2. Be friendly and interested in the participant.

  3. Do not react to the answers of the participant with facial expressions that reveal your own opinion.

  4. Make sure that there is a logical structure in the interview and start with the simplest questions.

  5. Ask questions exactly as they are formulated on paper.

  6. Do not lead the participant in a certain direction.

What are the benefits of questionnaires and interviews?

Both questionnaires and interviews have their advantages and disadvantages. Administering questionnaires requires less training of the researcher, is cheaper and takes less time than conducting interviews. If a study concerns sensitive subjects, a questionnaire is also more useful, because the anonymity of the participants can be guaranteed. Participants will therefore probably be more honest on questionnaires than during interviews. Interviews are more useful for illiterate people, children, people with cognitive deficits and others who cannot fill out questionnaires for some reason. In addition, during an interview the researcher can check whether the participant has understood the question. Interviews are also more suitable when more detailed information is needed about complex topics.

Which errors can occur in self-reports?

There are three types of errors (biases) that occur with self-reports:

  1. The social desirability response bias: sometimes people do not want to admit that they have certain thoughts or perform certain behaviors because they know that these thoughts or behaviors are not socially accepted. The social desirability bias makes test subjects less honest, which negatively influences the validity of the measuring instrument. Social desirability can never be completely eliminated, but the chance of this phenomenon occurring can be reduced. This can be done by asking neutral questions or by making it clear to participants that their answers will be processed completely anonymously. In observational research, the researchers must be as unobtrusive as possible, so that the test subjects behave more naturally.

  2. Approval ('acquiescence'): this is the tendency some people have to agree with propositions without looking at the content of these propositions.

  3. Rejection ('nay-saying'): this is the tendency of some people to disagree with propositions without looking at their content. Research shows that approval and rejection have only a small effect on the validity of self-reports, provided that people have as many opportunities to agree with propositions as to disagree with them.

When do researchers use archive material?

Archive material ('archival data') is used by scientists to analyze existing data (such as letters, court rulings and newspaper articles). Archive material is useful when (1) psychological and social events from the past have to be studied, (2) social and behavioral changes have to be studied over the years, (3) subjects are studied for which archive material is the only available source, (4) an event is studied that has already occurred (and it is not possible to wait until the event occurs again) or when (5) large amounts of data are needed about events that take place in the 'real' (non-manipulated) world.

The major disadvantage of archive research is that the researcher must be satisfied with the information that is available. However, often there is not enough information on the subject. And even if there is enough information available, the researcher is left with the question of whether the information he or she uses is sufficiently reliable and valid.

Content analysis

Often the content of what test subjects say during an interview must be studied. To do this, researchers must convert the test subjects' verbal responses into meaningful data. This can be done by means of content analysis: a set of procedures designed to convert text into numerical data. The purpose of content analysis is to classify words, sentences or other units of text into a number of meaningful categories. It is irrelevant whether the text is written or spoken.

  1. The first step is to decide what type of text unit to analyze. For example: will individual words or whole sentences be analyzed?

  2. The next step is to determine how the text is to be encoded. The researcher must make a choice between two options: (1) classifying the text into different categories or (2) assessing each piece of text on specific dimensions, for example by means of a five-point scale. Clear rules must be established to classify or assess the text, so that interrater reliability is as high as possible.

  3. Once these rules have been created, encoding can start. Encoding systems can be devised especially for the content analysis of a certain text, but often it is useful to search for existing encoding systems first. Moreover, a number of computer programs have been designed that can encode texts.
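
A minimal sketch of the counting step of such an analysis, with a hypothetical coding scheme that assigns words to emotion categories:

```python
from collections import Counter
import re

# Hypothetical coding scheme: words assigned to emotion categories
coding_scheme = {
    "happy": "positive", "great": "positive", "glad": "positive",
    "sad": "negative", "angry": "negative", "afraid": "negative",
}

def code_text(text):
    """Classify each word of a transcript into a category and count them."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(coding_scheme[w] for w in words if w in coding_scheme)

print(code_text("I was glad at first, then angry and sad."))
# Counter({'negative': 2, 'positive': 1})
```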

Selecting Research Participants - Chapter 5

How does sampling work?

Psychologists use descriptive studies less often than correlational and (quasi-)experimental studies. Nevertheless, descriptive studies remain important, and it is therefore necessary to make use of a reliable sample. A sample consists of a number of people selected from a population about whom the researcher wants to make statements. When drawing samples (sampling), a researcher aims to select a representative sample of participants from the population.

In what ways can samples be taken?

A representative sample is a sample from which accurate, error-free estimates about the population can be made. A sample is representative of the population if a certain characteristic occurs just as often in the sample as in the population. However, a sample is rarely a perfect reflection of the population. The difference between a sample and the corresponding population is called the 'sampling error'. For example, if a researcher studies the grade average of 100 students at a university, that average may differ from the average of all students at that university. Fortunately, researchers can estimate to what extent their results are influenced by sampling error. The margin of error ('error of estimation') gives an indication of the extent to which the data of a sample are expected to deviate from those of the entire population. For example, in an election poll the margin of error associated with choosing candidate X (45%) can be 3%, which means that we can state with 95% confidence that candidate X will receive between 42% and 48% of the votes (a small computational sketch of this margin follows the list below). The smaller the margin of error, the better the results from the sample resemble the population data. The margin of error is influenced by three factors: (1) the size of the sample, (2) the size of the population and (3) the spread in the data:

  • The larger a probability sample is, the better it resembles the population and the more representative it is of the population. So the larger the probability sample, the smaller the margin of error. However, scientists do not opt for the largest sample possible. They choose an economical sample, which gives a reasonably accurate picture of the population while costing as little effort and money as possible.

  • The size of the population is of course also of importance. A sample of 100 people drawn from a population of 500 people is more representative than a sample of 100 people drawn from a population of 100,000 people.

  • Finally, the more spread the data contain, the harder it is to accurately estimate the population value.
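
As promised above, a minimal sketch of the margin-of-error calculation for a poll proportion, assuming a simple random sample and a 95% confidence level (the poll numbers are hypothetical):

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

p, n = 0.45, 1000  # hypothetical poll: 45% support, 1,000 respondents
moe = margin_of_error(p, n)
print(f"{p:.0%} +/- {moe:.1%}")  # 45% +/- 3.1%
```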

A margin of error is only meaningful when we make use of a probability sample. This is a sample for which the researcher knows the mathematical probability that any specific individual from the population will be included in the sample. If a probability sample is not used, it is not clear whether the sample data really say anything about the population. Probability samples can be selected in three ways: by (1) 'simple random sampling', (2) 'stratified random sampling' and (3) 'cluster sampling'.

1. Simple random sampling

With 'simple random sampling' the sample is chosen in such a way that every possible sample has an equal chance of being selected from the population. For example, if a researcher wants to select a sample of 100 people from a population of 5,000 people and every combination of 100 people has the same chance of being selected, then it is a 'simple random sample'. To select such a sample, a researcher must make use of a 'sampling frame': a list of the entire population from which the sample will eventually be drawn. The test subjects are then selected from this list at random. With large populations it is difficult to make a list of all possible participants; in that case researchers use a table of random numbers. A disadvantage of 'simple random sampling' is that we need to know in advance how many individuals the population contains, and we also need a sampling frame. In some situations it is not possible to make a sampling frame. In such cases the method of 'systematic sampling' is chosen. With 'systematic sampling', a list (sampling frame) is not made in advance. Instead, participants are selected during the research itself: every kth person is chosen for the sample. For example, it can be decided that every eighth person may participate in the study.
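
A minimal sketch of both procedures, using Python's standard library (the sampling frame is hypothetical):

```python
import random

population = [f"person_{i}" for i in range(5000)]  # hypothetical sampling frame

# Simple random sampling: every combination of 100 people is equally likely
simple_sample = random.sample(population, k=100)

# Systematic sampling: no full frame needed in advance; select every k-th
# person encountered, starting from a random point within the first k
k = 8
start = random.randrange(k)
systematic_sample = population[start::k]
print(len(simple_sample), len(systematic_sample))  # 100 and 625
```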

2. Stratified random sampling

Stratified random sampling is a variant of simple random sampling. In this case, however, people are not directly selected from the population; the population is first divided into multiple strata. A stratum is a part of the population that shares a certain characteristic.

For example, we can divide the population into men and women or into three age groups (20-29, 30-39 and 40-49). Subjects are then randomly selected from each of these strata. This procedure allows researchers to be sure that the desired number of people is selected from each stratum. Researchers often use a 'proportionate sampling method': individuals from each stratum are selected in proportion, which means that the percentage of participants from a certain stratum corresponds to how often these people occur in the population. If 55% of the people in a population are male and 45% are female, the sample should contain these same percentages.
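
A minimal sketch of proportionate stratified sampling (the population and its 55/45 gender split are hypothetical; rounding may make the total deviate slightly from n):

```python
import random

# Hypothetical population records with a gender attribute: 55% male, 45% female
population = [{"id": i, "gender": "male" if i % 100 < 55 else "female"}
              for i in range(5000)]

def proportionate_stratified_sample(pop, strata_key, n):
    """Draw a random sample from each stratum, proportional to its share."""
    strata = {}
    for person in pop:
        strata.setdefault(person[strata_key], []).append(person)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(pop))  # proportional allocation
        sample.extend(random.sample(members, k))
    return sample

sample = proportionate_stratified_sample(population, "gender", 100)
print(sum(p["gender"] == "male" for p in sample))  # 55
```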

3. Cluster sampling

The major disadvantage of simple and stratified random sampling is that before a selection can be made, information must be known about how many (and which) individuals occur in a population. This is particularly a problem with populations that are very large (for example, in a study of 'the Dutch citizen'). When it is difficult to obtain the required information in advance, cluster sampling is used. In this case, the researcher does not draw individuals from the population directly, but instead composes clusters of potential test subjects. These clusters are often based on groupings that exist naturally, such as areas in a country. Cluster sampling often uses the method of 'multistage sampling': first, large clusters are established; then smaller clusters are established within these larger clusters, and this process continues until a sample is created containing test subjects who were randomly chosen from each cluster. Cluster sampling has two advantages: (1) it does not require a sampling frame (only a list of clusters, which is much easier to compose) and (2) each cluster consists of a group of participants who live close together geographically. As a result, it takes less time and effort to contact the test subjects.
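
A minimal sketch of a two-stage cluster sample, with hypothetical regions as naturally existing clusters:

```python
import random

# Hypothetical clusters: 50 regions with 200 residents each
regions = {f"region_{r}": [f"r{r}_person_{i}" for i in range(200)]
           for r in range(50)}

chosen_regions = random.sample(list(regions), k=5)   # stage 1: pick clusters
sample = [p for r in chosen_regions
          for p in random.sample(regions[r], k=20)]  # stage 2: pick people
print(len(sample))  # 100
```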

What is meant by non-response and misgeneralization?

The non-response problem arises when individuals who are selected for the sample fail to respond. For example, consider approaching the selected test subjects by phone. Not everyone who receives this phone call actually wants to participate in the study; there is a good chance that some will simply hang up. Non-response is a problem because people who do not want to participate in the study may differ significantly on certain characteristics from people who do want to participate. People who do not want to participate are not examined, so their characteristics are not taken into account in the results of the study. This reduces the benefits of the probability sample. The non-response problem can be addressed in two ways: (1) researchers can try to increase the response rate, for example by calling the test subjects again after the first contact to ask them to participate after all; (2) researchers can investigate whether the people who do and do not respond differ systematically from each other. For example, when researchers promise a small sum of money as a reward for participating, people with little money will be more sensitive to this than people with a lot of money, so relatively more poor people will participate in the study.

Even when a probability sample is used, the results can be misleading, and the researcher may draw incorrect conclusions. In such a case there is misgeneralization. An example of misgeneralization is a study of children in primary schools: to get a representative picture, both poor and rich children should be included in the study. If only children in private schools were examined, and a statement were then made about the entire population of primary school children, this would be a misgeneralization.

What other kinds of sampling can be distinguished?

In some situations it is hard or impossible to select a probability sample. In that case a 'nonprobability sample' is used. Here, researchers do not know what the chances are that a particular individual will be chosen for the sample. For this reason, the margin of error cannot be calculated either, so researchers do not know how representative their sample is. A lot of psychological research is done on the basis of samples that are not representative of the population. Nevertheless, these types of samples are suitable for certain studies: for example, nonprobability samples are suitable for studies in which testing hypotheses is the goal, rather than describing a population. The generalizability of results from nonprobability samples can be assessed in experimental studies by repeatedly replicating them: a second experiment can be performed with individuals who differ in age, education level and/or socio-economic status from the individuals who participated in the first experiment. There is more confidence in the validity of a finding when different samples (with different characteristics) produce similar results. There are three types of nonprobability samples:

  1. Convenience sampling: this is a sample for which researchers make use of the participants who are immediately available. For example, a researcher can approach the first 150 people on the street and ask them whether they want to participate in his or her experiment. A major advantage of the convenience sample is that it is much easier to recruit test subjects with this method than with representative samples.

  2. Quota sampling: in the case of a 'quota sample', the researcher determines what percentages must be met before doing the experiment. The sample is then selected based on these percentages. For example, a researcher could aim to select exactly 20 men and 20 women for his experiment instead of selecting 40 people randomly, without regard to gender. 

  3. Purposive sampling: with a 'purposive sample', researchers have a certain idea about what kind of test subjects they believe to be typical for the population. Based on this idea, they select people who they would like to participate in their experiment. The problem with purposive sampling, however, is that this process is very subjective. That is why, in general, it is better not to use this method for scientific research.

What role does power play?

Power refers to the extent to which a study is able to detect the effects of the variables that are studied. A study with a lot of power detects effects that are present, while a study with little power may miss them. The power of a study is influenced by many factors. One of these factors is the number of test subjects: in general, the more test subjects, the greater the power.

Strong effects are easier to detect in a study than weak ones. A study with little power will often detect the stronger effects, but not the weaker ones. The power of a study increases as more test subjects participate. Great power is needed to detect weak effects; to detect weak effects, it is therefore useful to include many test subjects.
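
The link between sample size and power can be illustrated with a small simulation (hypothetical effect size; requires NumPy and SciPy). Power is estimated here as the fraction of simulated studies that reach significance:

```python
import numpy as np
from scipy import stats

def simulated_power(n, effect=0.3, sims=2000, alpha=0.05):
    """Estimate power: the fraction of simulated two-group studies
    (true effect = 0.3 SD, a weak effect) whose t-test is significant."""
    rng = np.random.default_rng(0)
    hits = 0
    for _ in range(sims):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(effect, 1.0, n)
        if stats.ttest_ind(control, treated).pvalue < alpha:
            hits += 1
    return hits / sims

for n in (20, 80, 320):
    print(n, simulated_power(n))  # detection rate rises with sample size
```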

Descriptive Research - Chapter 6

Why do we conduct descriptive research?

The purpose of descriptive research is to systematically and accurately describe the characteristics or behaviors of a population. It provides information about the physical, social, behavioral, economic or psychological characteristics of a group of people. There are three types of descriptive research: (1) surveys, (2) demographic research and (3) epidemiological research.

1. Surveys

Surveys are the most common form of descriptive research. Many people think that surveys and questionnaires are the same, but that is not the case: a survey can be conducted with either a questionnaire or an interview. There are four types of surveys:

  1. Cross-sectional surveys ('cross-sectional survey design'): A group of participants from the population is studied. This method can provide important information about the characteristics of the group that is being studied. When more than one group is examined with a cross-sectional survey, information can be collected about how the groups differ from each other.

  2. Successive independent surveys ('successive independent samples survey design'): two or more samples from the population are asked the same questions at different points in time. Although both samples consist of different people, statements can be made about changes over time.

  3. Long-term surveys ('longitudinal' / 'panel survey design'): A group of participants fills out a survey multiple times. In this way, the researcher can see if the same people have changed in their characteristics over time. However, it often happens that people drop out; for example, they come to the first measurement moment, but not to the second. This causes the sample to change, and consequently less reliable statements can be made about the changes over time.

  4. Internet surveys: this method has both advantages and disadvantages. The advantages are that the method is cheap, the data are processed automatically, and people can be reached who are normally difficult to recruit for participation. The disadvantages are that the researcher has little control over who participates, and that people without an internet connection are automatically excluded from participation. In addition, it is difficult to check whether people have completed the questionnaire more than once.

2. Demographic research

With demographic research researchers try to describe and explain patterns of life events such as birth, divorce, migration, health and death. Sociologists are often the ones who carry out demographic research, but psychologists also occasionally use the method. For example, they may be interested in investigating marriage patterns and divorce rates in different populations.

3. Epidemiological research

Epidemiological research is used to analyze how many people in different groups suffer from a certain disease. Psychologists are interested in epidemiological research for two reasons. First, the occurrence of many diseases is influenced by the lifestyle of the patient. Through epidemiological research we can find out which groups are more at risk of contracting certain diseases; in this way, many diseases can be prevented. Second, this type of research describes how often mental disorders occur. Both prevalence and incidence are considered: prevalence refers to how many people have a disease at a certain point in time, while incidence refers to how many new cases of a disease arise in a certain period of time.

How are data described and presented?

Scientists need to think carefully about how to describe and present the data obtained in their research. Descriptions of data must meet three criteria: (1) accuracy, (2) conciseness and (3) comprehensibility. After the data have been accurately described, the researcher must therefore also consider how the data can be described concisely. Presenting the raw data is very accurate, but not concise: raw data contain the scores of all test subjects on all measurements, and presenting all of them is very unclear to the reader. The raw data must therefore be summarized, while remaining understandable to the reader. Data can be described and presented using two types of methods: (1) numerical methods and (2) graphical methods:

  • Numerical methods summarize data in the form of numbers, such as percentages and means.

  • With graphical methods, graphs and images are used to summarize the data.

Frequency distributions

Often data are first described on the basis of a frequency distribution. This is a table that summarizes the raw data. It shows how often certain scores occur in each category.

  1. A simple frequency distribution shows how often a specific score was obtained by the test subjects. A simple frequency distribution consists of two columns. In the first column the scores are arranged in order from lowest to highest score. The lowest scores are at the top, the highest at the bottom. The second column shows how often each of these scores was obtained (which is called the frequency of the score).

  2. If a lot of different scores are obtained, a simple frequency distribution may appear unclear to the reader. In that case it is better to use a grouped frequency distribution, in which you can see how many people have a score within a certain category of scores. Categories are made of, for example, all scores within the ranges 1-10, 11-20, 21-30, 31-40 and 41-50. We end up with an overview that shows how many people have a score within each category. These groups are called class intervals. Class intervals must always have the same size: it cannot be the case that one interval spans 5 possible scores and another spans 10. Scientists often also add the relative frequencies to the grouped frequency distribution. The relative frequency of each class is the proportion (percentage) of the total number of scores that falls in that class. A grouped frequency distribution has three characteristics: (1) the classes are mutually exclusive, which means that each score can fall into only one category, (2) the categories cover all possible scores, so every possible score is included in the frequency distribution, and (3) the categories are equal in size (a small sketch of building such a table follows this list).

  3. Often the information in a frequency distribution is more understandable when it is displayed graphically. This is often done in the form of a histogram or a bar graph. The classes are on the horizontal (x) axis of histograms and bar graphs, and the frequencies on the vertical (y) axis. The bar of each class shows the corresponding frequency: how often a score in that category occurred.

  4. There is a clear difference between histograms and bar graphs. A histogram is used when the variable on the x-axis is on a ratio or interval scale. Equal differences in the values then represent equal differences in the characteristic being measured; this is why the bars may touch each other in a histogram. If the variable on the x-axis is on a nominal or ordinal scale (and equal differences between scale values do not necessarily represent equal differences in the characteristic being measured), a bar graph is used, in which the bars are separated from each other.

  5. Researchers can also use a frequency polygon to present data. The axes represent the same as in histograms and bar charts. With a frequency polygon, however, no bars are used, but the different points are connected to each other with lines.
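
As announced above, a minimal sketch of building a grouped frequency distribution with equal-width, mutually exclusive class intervals (the scores are hypothetical):

```python
from collections import Counter

scores = [12, 7, 33, 41, 28, 15, 22, 9, 37, 18, 25, 31, 44, 3, 29]  # hypothetical

def interval(score, width=10):
    """Map a score to its class interval: 1-10, 11-20, 21-30, ..."""
    low = ((score - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"

freq = Counter(interval(s) for s in scores)
n = len(scores)
for cls in sorted(freq, key=lambda c: int(c.split("-")[0])):
    print(f"{cls:>6}: {freq[cls]:2d}  (rel. freq. {freq[cls] / n:.2f})")
```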

The midpoint

Aside from frequency distributions, researchers can also make use of descriptive statistics to describe their data. Descriptive statistics summarize the data of the entire group of test subjects.

Measurements of the center point ('measures of central tendency') provide information about the mean or typical score in a dataset. Three statistics are distinguished:

  1. The mean: this descriptive statistic is used most often. The mean is calculated by adding up all scores and dividing the sum by the number of scores. The corresponding formula is: x̄ = Σxi / n. The major disadvantage of the mean is that this statistic is strongly influenced by outliers: scores that differ markedly (much lower or much higher) from the other scores.

  2. The median: if we arrange all scores in order from lowest to highest, the median is the middle score. In other words, 50% of the scores are lower than the median and the remaining 50% are higher. The advantage of the median is that it is robust: unlike the mean, the median is hardly influenced by outliers. The median is easy to determine when there is an odd number of scores, because then it is simply the middle number. If there is an even number of scores, the median is the mean of the two middle numbers.

  3. The mode: this is the most common score in a data set. It is possible that there is more than one mode in a data set.
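
A minimal sketch of all three statistics using Python's statistics module (the scores are hypothetical and include one outlier, to show how it pulls the mean up while leaving the median largely untouched):

```python
import statistics

scores = [4, 8, 8, 5, 6, 7, 9, 8, 5, 40]  # hypothetical, with one outlier (40)

print(statistics.mean(scores))    # 10.0: pulled up by the outlier
print(statistics.median(scores))  # 7.5:  robust to the outlier
print(statistics.mode(scores))    # 8:    the most common score
```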

Error bar

Sometimes you come across graphs of means containing a vertical I-shaped line at the top of each bar (see Figure 6.7 in Leary). This is called the 'error bar'. The error bar provides information about the researcher's confidence in the value of each mean. We have already learned that we can only estimate the true value of the mean: no sample perfectly reflects the population. Presenting only the mean as a bar in a graph can therefore be misleading, because the sample mean is most likely not exactly equal to the true population mean. Because we know that the sample mean probably differs from the population mean, we need to calculate how accurately the mean in our study reflects the actual population mean. This is expressed by the confidence interval (CI). The most commonly used confidence interval is the 95% interval. Based on the mean and the confidence interval we can determine a lower limit and an upper limit, and state that the true population mean falls between these limits with 95% confidence. The confidence interval therefore gives us an idea of the limits within which the population mean is likely to fall. With a relatively small confidence interval, it is more likely that the sample mean closely approximates the population mean. The wider the confidence interval (i.e. the farther the two limits are apart), the smaller this chance becomes; a small confidence interval is therefore better. The confidence interval is indicated in histograms and bar graphs by the error bar.
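
A minimal sketch of a 95% confidence interval around a sample mean, using the normal approximation (z = 1.96); for a small sample like this hypothetical one, a t-value would strictly be more appropriate:

```python
import math
import statistics

scores = [72, 85, 78, 90, 66, 81, 77, 88, 69, 84]  # hypothetical sample

n = len(scores)
mean = statistics.mean(scores)
sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
z = 1.96                                       # approximate 95% level
print(f"mean = {mean:.1f}, 95% CI = [{mean - z*sem:.1f}, {mean + z*sem:.1f}]")
# mean = 79.0, 95% CI = [74.0, 84.0]
```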

What measures of spread can be distinguished?

It is not only important to be able to calculate the mean score; it is also important to determine how much spread there is in a data set. To express this spread, scientists use measures of variability. Three measures of variability can be distinguished:

  1. The 'range': this is the difference between the highest and lowest score in a data set. The range is the least useful measure of spread of all three, because it is only based on the two most extreme scores. The range therefore does not consider the spread within the data set, but only the spread between the highest and the lowest score.

  2. The variance (s²): chapter 2 already showed how the variance is calculated. The advantage of the variance is that all scores are taken into account in its calculation. The variance is based on the sum of the squared differences between each individual score and the mean: it represents the average squared deviation of the scores from the mean.

  3. The standard deviation (s): this statistic is the square root of the variance. Often, the variance is a large number and hard to interpret, because it is a squared number. It is therefore more convenient to calculate the standard deviation.
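
A minimal sketch of the three measures on a hypothetical data set; note that statistics.variance uses the sample formula (denominator n - 1), while pvariance would give the population version:

```python
import statistics

scores = [4, 8, 6, 5, 7]  # hypothetical

data_range = max(scores) - min(scores)  # 8 - 4 = 4
variance = statistics.variance(scores)  # sample variance s²: 2.5
sd = statistics.stdev(scores)           # s = square root of the variance
print(data_range, variance, round(sd, 2))  # 4 2.5 1.58
```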

Standard deviation and normal distribution

Most of the variables studied by scientists approximately follow a normal distribution. This applies to, for example, intelligence.

  • A normal distribution peaks in the middle and spreads out in two lower tails on the left and right. This means that most of the scores fall in the middle (around the mean), while just a few very low and very high scores are achieved (these are in the tails). In the case of intelligence, most people have an average IQ. Relatively few people have an extremely low or extremely high IQ.

  • However, sometimes the data are not normally distributed but follow a skewed distribution. With a positively skewed distribution there are more low scores than high scores: the peak is shifted towards the left and the tail stretches to the right. With a negatively skewed distribution there are more high scores than low scores: the peak is shifted towards the right and the tail stretches to the left.

  • In a normal distribution, approximately 68% of the scores fall within one standard deviation from the mean.
    For IQ, for example, the mean is 100 and the standard deviation is 15. Approximately 68% of people score within one standard deviation from the mean, i.e. between (100 - 15 =) 85 and (100 + 15 =) 115. Moreover, 95% of the scores in a normal distribution fall within 2 standard deviations from the mean. So for IQ, 95% of people have an IQ between (100 - 30 =) 70 and (100 + 30 =) 130.
    Knowing the standard deviation and the mean is very useful. If we know these two variables, we do not only know how much the data varies, but we also know how the scores are distributed across the different ranges of scores.

The z score

On their own, individual scores are not very meaningful to us. To give meaning to the scores, we must be able to compare them with each other and especially with the mean. This can be done by means of the z-score (also known as the standard score). The z-score shows how many standard deviations a score deviates from the mean. If someone has a z-score of -1.00, then we know that his or her score is one standard deviation below the mean. A z-score of +2.96 shows that a person's score is almost three standard deviations above the mean.

  • If the mean and standard deviation of a sample are known, a person's z-score can be calculated with the following formula: z = (yi - x̄) / s. In this formula, yi stands for the score of the participant, x̄ for the mean, and s for the standard deviation of the sample.

  • Sometimes, scientists standardize their data set by turning all raw scores into standard scores (z-scores). This is a handy method for determining outliers. An outlier becomes more apparent when the score is standardized. Outliers have a very high or very low z-score, for example -3.00 or +3.00.

  • Remember that when a data set is standardized, the mean is always 0 and the standard deviation is always 1.
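
A minimal sketch of standardizing a small data set (the IQ scores are hypothetical):

```python
import statistics

scores = [85, 100, 115, 70, 130, 100, 145]  # hypothetical IQ scores
mean = statistics.mean(scores)
sd = statistics.stdev(scores)

# z = (yi - mean) / s for every score
z_scores = [round((s - mean) / sd, 2) for s in scores]
print(z_scores)  # standardized scores: mean ~0, standard deviation ~1
# Scores with a z of roughly -3.00 or +3.00 would be flagged as outliers.
```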

Correlational Research - Chapter 7

Scientists are often interested in whether two variables are related. These variables can be, for example, embarrassment and self-confidence, temperature and aggression, or fear of failure and academic achievement. We determine whether one variable is related to another by looking at whether the variables covary, i.e. the extent to which they vary together. If self-confidence were associated with shyness, then high scores on self-confidence should go together with low scores on shyness and vice versa. However, if it appears that there is no such consistent relationship between shyness and self-confidence, then we can conclude that these variables are not related. This would be the case, for example, if high scores on self-confidence were accompanied by high scores on shyness just as often as by low scores. Two variables covary when they increase and decrease together. When researchers want to know whether variables are related, they use correlational research. Correlational research is used to determine whether two or more variables are related to each other, and if so, to what extent. The relationship between two variables is expressed in a correlation coefficient. For example, a scientist may be interested in the extent to which a trait such as extraversion is inherited or acquired. To determine this, he could calculate the correlation between the extraversion of children and that of their biological parents, and also the correlation between the extraversion of children and that of their adoptive parents. If a higher correlation is found between the extraversion of children and that of their biological parents, the researcher could conclude that extraversion is an innate rather than an acquired trait.

What is the correlation coefficient?

A correlation coefficient is a statistic that shows the extent to which two variables are linearly related. A correlation can be calculated between any combination of two variables. The only condition is that we must have the scores of participants on both variables. The Pearson correlation coefficient (r) is most commonly used as a correlation measure. A calculated r always falls within the range of -1 to +1. Two aspects of a correlation coefficient are important: the sign of the correlation (- or +) and the magnitude of the correlation.

  • The sign of the correlation shows the direction of the relationship. Variables can be positively (+), but also negatively (-) correlated. With a positive correlation, high scores on one variable go together with high scores on the other variable. Logically, low scores on one variable also go together with low scores on the other variable. For example, high scores on extraversion could go hand in hand with high scores on adventure. A negative correlation occurs when low scores on one variable coincide with high scores on the other variable or vice versa. For example, a high score for fear of failure can go hand in hand with a low score for an exam.

  • The magnitude of a correlation is its numerical value, independent of the sign. When a correlation is zero (r = .00), the variables involved are not linearly related to each other. As the absolute value of the correlation increases, the relationship between the variables becomes stronger. A correlation of +.78 is therefore stronger than a correlation of +.30. The sign of a correlation (+ or -) says nothing about its size; it only indicates the direction. A correlation of +.78 is therefore just as strong as a correlation of -.78. The magnitude (numerical value) shows how strong a correlation is.
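
As a minimal illustration of sign and magnitude (all scores hypothetical; NumPy's corrcoef computes Pearson's r):

```python
import numpy as np

extraversion    = [2, 4, 5, 7, 8, 9]  # hypothetical scores
adventurousness = [3, 4, 6, 6, 8, 9]  # tends to rise with extraversion
exam_anxiety    = [9, 8, 6, 5, 4, 2]  # tends to fall as extraversion rises

print(round(np.corrcoef(extraversion, adventurousness)[0, 1], 2))  # ~ +0.96
print(round(np.corrcoef(extraversion, exam_anxiety)[0, 1], 2))     # ~ -0.98
```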

How are correlations displayed in a coordinate system?

The relationship between two variables can be presented in a coordinate system with an x-axis and a y-axis. A point can be drawn for each participant where his or her scores on the x and y variables meet. In this way a scatter plot is created in the coordinate system.

  1. Positive correlations have a specific shape: the point cloud starts at the bottom left and rises towards the top right. High scores on variable x go together with high scores on variable y.

  2. Negative correlations start at the top left and descend to the bottom right.

  3. The stronger a correlation is, the narrower the point cloud. With a perfect correlation (-1.00 or +1.00) all points lie on a straight line. With a correlation of .00, however, there is no coherence between the variables: all points are randomly scattered in the plot and there is no regularity in the data.

  4. Sometimes relationships are curved (curvilinear). We cannot calculate a correlation for such relationships; that is only possible for linear relationships.

Calculating with the correlation coefficient

If we know that the correlation between two variables is +.25, what exactly does this tell us? To fully interpret a correlation coefficient, we need to square it. This is because a correlation coefficient is not on a ratio scale. For this reason, we cannot perform arithmetic calculations with the coefficient itself (we cannot multiply and divide with it), and we cannot directly compare correlation coefficients that have not been squared. The fact that the coefficient is not on a ratio scale means that a correlation of +.80 is not twice as large as a correlation of +.40.

Squaring a correlation coefficient yields the coefficient of determination, which is on a ratio scale. The total variance in a data set can be divided into systematic variance and error variance. Systematic variance is the part of the total variance that is related to the variables being investigated; error variance is the part that is not. The greater the proportion of systematic variance, the stronger the relationship between the variables. The coefficient of determination shows how much of the variance in one variable is explained by the other variable. If the correlation between the extraversion of a child and the extraversion of a parent is +.25, then the coefficient of determination is .25 × .25 = .0625. This means that 6.25% of the total variance in the extraversion scores of the children is related to the extraversion scores of their parents. It also means that 93.75% of the variance in the children's scores is not explained by the extraversion scores of their parents. Thus, there are many more factors that explain the variance in the extraversion scores of the children, but we do not know these variables or have not included them in the study.

If two variables are unrelated (r = .00), then one variable does not explain any of the variance in the other variable at all; in other words, there is no systematic variance between the data. The computational (raw-score) formula for the Pearson correlation requires the scores on both the x and y variables: r = [Σxy - (Σx)(Σy) / n] / √[(Σx² - (Σx)² / n)(Σy² - (Σy)² / n)].
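
A direct implementation of this computational formula, on hypothetical paired scores (child and parent extraversion, echoing the example above):

```python
import math

def pearson_r(x, y):
    """Pearson correlation via the raw-score computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = sxy - (sx * sy) / n
    den = math.sqrt((sxx - sx**2 / n) * (syy - sy**2 / n))
    return num / den

child  = [3, 5, 4, 6, 7, 2]  # hypothetical extraversion scores
parent = [4, 5, 5, 7, 6, 3]
r = pearson_r(child, parent)
print(round(r, 2), round(r**2, 4))  # r and the coefficient of determination
```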

When do we speak of statistical significance?

Scientists often want to know whether the r they have calculated is statistically significant. A correlation coefficient is statistically significant when the correlation found in the sample is very unlikely to have arisen if the correlation in the population were zero. Suppose the relationship between two variables is .00 in the population; the question is whether a scientist will actually find a value near zero on the basis of a small sample drawn from that population. Due to sampling and measurement error, it may happen that a researcher does find a correlation in the sample, while the correlation in the population is zero. Fortunately, we can estimate the probability of finding a sample correlation of a given size when the correlation in the population is actually zero.

If the probability is very low (often less than .05) that the sample correlation would be obtained when the correlation in the population is zero, then the correlation is statistically significant. The statistical significance of a correlation coefficient is influenced by three factors: (1) the size of the sample, (2) the magnitude of the correlation and (3) the degree to which we want to be sure of the conclusion that we draw from the sample (the degree of caution of the researcher).

The size of the sample matters because, with a larger sample, the researcher can be more confident that a correlation he or she has found is indeed not zero in the population. The magnitude of the correlation is related to statistical significance because the larger the correlation found, the less likely it is to be zero in the population. A scientist who finds a correlation of .70 between two variables can therefore be more confident that these variables are also related in the population than a scientist who finds a correlation of only .15.

Finally, it is important how cautious a scientist wishes to be when drawing his conclusions about whether the correlation between two variables could actually be zero in the population. Scientists often argue that a sample's correlation is statistically significant when there is a probability of less than 5% that it is zero in the population.
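
In practice this probability is rarely worked out by hand. The following sketch uses SciPy's pearsonr, which returns both r and the probability of finding such an r when the population correlation is zero; the data are simulated purely for illustration.

```python
# A sketch of testing a correlation for statistical significance with SciPy.
# The data are simulated: y contains a built-in weak relationship with x.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 0.3 * x + rng.normal(size=30)

r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p = {p:.3f}")
if p < .05:
    print("Statistically significant at the conventional .05 level.")
else:
    print("Not significant: a population correlation of zero cannot be ruled out.")
```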

Strong and weak correlations

When researchers use large samples, even very small correlations quickly become statistically significant. A significant correlation only shows us that the probability is very small that the correlation is actually zero in the population. The presence of statistical significance therefore does not tell us whether the relationship between the variables is strong. The strength of a correlation depends on the size of the correlation, not on its statistical significance. The rule of thumb is that a correlation of .10 is considered weak, a correlation of .30 is considered moderate and a correlation of .50 or higher is considered strong.

What is the difference between directional and non-directional hypotheses?

When a researcher sets up a correlational study, he or she can choose from two types of hypotheses. A directional hypothesis predicts what direction the correlation will take: This means that this hypothesis predicts whether the correlation will be positive or negative. A non-directional hypothesis only predicts that two variables will be correlated, but not whether this correlation is going to be positive or negative. Directional hypotheses are used more often than non-directional hypotheses.

What factors can influence the correlation negatively?

All kinds of factors can cause researchers to underestimate or overestimate the correlation coefficient. It is therefore important to pay attention to three factors that can illegitimately increase or decrease the strength of a correlation: (1) restricted range, (2) outliers and (3) the reliability of the measuring instruments.

  1. Restricted range: this occurs when the obtained scores cover only a limited part of the possible range of a variable. For example, when possible scores range from 200 to 1600 but the sample only contains scores between 1000 and 1500, the resulting scatter plot shows a different point cloud: its shape changes. This is especially a problem if the scores follow the shape of a curve: it is possible that (due to the restricted range) only half of this curve is displayed. When only one half of the curve is shown, this can lead to the false conclusion that there is a linear relationship between the two variables instead of a curvilinear one.

  2. Outliers: these are scores that deviate so clearly from the other data that a researcher could wonder whether they belong in the data at all. Most scientists assume that a score is an outlier when it deviates more than three standard deviations from the mean (see the sketch after this list). A distinction is made between two types of outliers: (1) 'on-line outliers': the score falls in the pattern of the line in the point cloud but is nevertheless extremely high or low, and erroneously inflates the value of r, and (2) 'off-line outliers': the score does not fall in the pattern of the line in the point cloud and erroneously decreases the value of r.

  3. The reliability of measuring instruments: unreliable measuring instruments cause the correlation coefficient to be too low. The less reliable the measuring instrument, the lower the correlation.
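
A minimal sketch of the three-standard-deviations rule of thumb from point 2 above; the scores are hypothetical, with one extreme value planted among them.

```python
# Flag scores that lie more than three standard deviations from the mean.
# The data are hypothetical: 29 ordinary scores plus one extreme score.
import numpy as np

rng = np.random.default_rng(1)
scores = np.append(rng.normal(loc=5, scale=1, size=29), 30.0)

z = (scores - scores.mean()) / scores.std()
print("Flagged as outliers:", scores[np.abs(z) > 3])
```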

What is the difference between correlation and causality?

It is important to remember that correlation does not imply causality. Even if there is a perfect correlation, it cannot be said that one variable causes the other. For example, if there is a correlation between extraversion and self-confidence, we cannot say that extraversion is a cause of self-confidence or that self-confidence is a cause of extraversion.

Before we can conclude that there is causality, three conditions must be met: (1) covariance, (2) direction and (3) exclusion of the influence of other variables.

  • First of all, the variables must vary together (covary): a high score on the x variable must be systematically associated with either a high or a low score on the y variable. Researchers determine whether this condition is met by calculating the correlation.

  • The second condition is direction. This means that the cause must precede the effect. However, researchers cannot determine whether this condition is met just by calculating a correlation. Because the x and y variables are often measured at the same time, it is difficult to make statements about the direction: does x lead to y or does y lead to x?

  • Researchers also cannot determine whether the third condition is met simply by calculating the correlation. It may well be that a third variable (z) influences both variable x and variable y. For example, research shows that loneliness and depression are correlated. However, this does not mean that loneliness causes depression, because it could also be that both loneliness and depression are caused by another variable. This variable could be, for example, the size of a person's social network.

Partial correlation

Although we can never establish causality on the basis of a correlation alone, we can use a number of techniques to estimate which variable is likely to cause the other. These techniques can of course never result in 100% conclusive claims, but they can support a causal explanation. In addition, they can show with certainty when two variables are not correlated.

If it appears that x and y are correlated, then three relations between the two are possible: (1) x causes y, (2) y causes x and (3) another variable (called z) causes both x and y. A spurious correlation is a correlation between two variables (x and y) that is not due to a direct relationship between these variables, but to their shared relationship with a third variable (z). The effect of this third variable can be tested by means of a statistical procedure called partial correlation: a partial correlation is the correlation between two variables after the influence of one or more other variables has been statistically removed. For example, if we want to know whether the relationship between x and y is caused by variable z, we can remove the variability in x and y associated with z and see whether x and y are still correlated. If x and y remain correlated, then it is unlikely that z causes the relationship between x and y. If x and y are no longer correlated, then z is probably the cause of the relationship between x and y.
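
The following sketch illustrates a first-order partial correlation using the standard formula based on the three pairwise correlations; the loneliness/depression/social-network setup is simulated so that z drives both x and y, producing a spurious x-y correlation.

```python
# A sketch of a first-order partial correlation:
# r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
# The data are simulated so that z causes both x and y.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
z = rng.normal(size=100)            # e.g., size of social network
x = z + rng.normal(size=100)        # e.g., loneliness, driven partly by z
y = z + rng.normal(size=100)        # e.g., depression, driven partly by z

r_xy = stats.pearsonr(x, y)[0]
r_xz = stats.pearsonr(x, z)[0]
r_yz = stats.pearsonr(y, z)[0]

partial = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
print(f"r_xy = {r_xy:.2f}, partial r_xy.z = {partial:.2f}")  # partial near zero
```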

Other correlational measures

The Pearson r correlation can be used when both variable x and variable y are on an interval or ratio scale. For other scales of measurement there are alternative measures, listed below (the sketch after this list shows library routines for each):

  • When one or both variables are on an ordinal scale, the 'Spearman rank-order correlation' is used as a correlational measure.

  • Sometimes the x and y variables are dichotomous. This means that there are only two possible answers per variable: An example is that variable x stands for gender and variable y for whether or not someone has obtained his or her driver's license. If there are two dichotomous variables, then the phi coefficient is used as a correlational measure.

  • If one of the variables is dichotomous and the other is on the interval or ratio scale, the point biserial correlation can be used as a correlational measure. For example, when we look at the relationship between gender and virginity, then a phi coefficient should be used, but if we look at the relationship between gender and height, then the point biserial correlation is used.
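
A short sketch of these alternative measures using SciPy routines; all data are hypothetical. Note that the phi coefficient equals a Pearson r computed on two 0/1-coded variables.

```python
# A sketch of choosing a correlational measure by scale of measurement.
# All data are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Ordinal data -> Spearman rank-order correlation
ranks_a = rng.integers(1, 10, 50)
ranks_b = rng.integers(1, 10, 50)
print("Spearman:", stats.spearmanr(ranks_a, ranks_b)[0])

# Two dichotomous variables -> phi coefficient
# (equivalent to Pearson r computed on 0/1-coded variables)
gender = rng.integers(0, 2, 50)
has_license = rng.integers(0, 2, 50)
print("Phi:", stats.pearsonr(gender, has_license)[0])

# One dichotomous, one interval/ratio variable -> point-biserial correlation
height = rng.normal(175, 10, 50)
print("Point-biserial:", stats.pointbiserialr(gender, height)[0])
```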

It is important to realize that we do not always use correlation coefficients when analyzing our data. Often other types of statistics are used. These will be dealt with later.

Correlational Techniques - Chapter 8

What are the purposes of correlational techniques?

To scientists it is not only essential to investigate whether variables are correlated; it is also important to study how and why certain variables are correlated. They can determine this using four different techniques, which we will discuss in this chapter. These techniques ensure (1) that we can design formulas that describe how variables are related and that allow us to make predictions, (2) that we can find out the direction of causality between two or more correlated variables, (3) that we can measure the relationship between variables at different levels of analysis, and lastly (4) that we can determine dimensions that underlie a number of correlations. These dimensions can be determined by means of factor analysis.

Predicting behavior

Regression analysis is a method that is applied only after it appears that certain variables are correlated. By means of regression analysis we can design regression equations with which we can predict scores. If it is known that someone has a certain x-score, then we can enter this score in the formula and predict the score on the y-variable. For example, we can design a regression formula that predicts the academic achievement of students based on their average grade in high school. It is important to remember that, like correlation, regression can only capture linear relationships: linear relationships are characterized by straight lines when you put them into a graph. A linear relationship can for example be recognized by a point cloud that seems to be heading in a certain direction. Scientists try to design a regression equation for that line. When they have computed this regression equation, they can predict scores fairly accurately.

  • A regression formula looks like this: y = β0 + β1x. In this formula, β0 is the intercept and β1 is the slope. We also call β0 the constant ('regression constant' / 'beta zero'). β1 is also referred to as the regression coefficient (slope). A short sketch after this list shows how these coefficients can be estimated.

  • We call the variable that we want to predict the dependent variable, criterion variable or the outcome variable. This is the y variable.

  • The x-variable is also called the predictor variable.
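
A minimal sketch of estimating β0 and β1 with NumPy; the grades and achievement scores are hypothetical.

```python
# A minimal sketch of fitting the regression equation y = b0 + b1*x.
# The data are hypothetical.
import numpy as np

high_school_grade = np.array([6.0, 7.5, 8.0, 5.5, 9.0, 7.0])     # predictor (x)
academic_achievement = np.array([6.2, 7.0, 8.5, 5.0, 8.8, 7.2])  # criterion (y)

# np.polyfit returns the coefficients from highest degree down: [slope, intercept]
b1, b0 = np.polyfit(high_school_grade, academic_achievement, deg=1)
print(f"y = {b0:.2f} + {b1:.2f}x")

# Predict the criterion score for a new student with a grade of 8.2:
print("Predicted:", b0 + b1 * 8.2)
```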

What does multiple regression analysis look like?

Sometimes scientists use multiple x-variables to predict the y-variable. Academic achievement can for example be predicted on the basis of the student's motivation as well as their average grade in high school. In general, using multiple x-variables increases the accuracy of the prediction: the more x-variables that are used in a study, the more accurate the prediction will be.

If multiple x-variables are used to predict the y-variable, then researchers use another method: multiple regression analysis. There are three forms of multiple regression analysis: (1) standard multiple regression, (2) step-by-step (stepwise) multiple regression and (3) hierarchical multiple regression. These three forms differ in how the predictor variables (x-variables) are entered into the regression equation.

1. Standard multiple regression

With standard multiple regression (also referred to as 'simultaneous multiple regression'), all x-variables are entered into the regression analysis simultaneously. For example, we can predict academic achievement on the basis of three predictors: motivation, inquisitiveness, and the average grade in high school. The formula could for example be: y (academic achievement) = -2.80 + .17 (motivation) + 1.30 (inquisitiveness) + .90 (grade average).
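
A sketch of a standard (simultaneous) multiple regression via least squares; all predictor names and scores are hypothetical.

```python
# A sketch of standard multiple regression: all predictors entered at once.
# Predictor names and data are hypothetical.
import numpy as np

motivation = np.array([5.0, 7.0, 6.0, 8.0, 4.0, 9.0])
inquisitiveness = np.array([6.0, 5.0, 7.0, 8.0, 5.0, 9.0])
grade_average = np.array([6.5, 7.0, 7.5, 8.0, 6.0, 9.0])
achievement = np.array([6.0, 6.8, 7.4, 8.3, 5.5, 9.1])

# Design matrix with a leading column of ones for the intercept (b0).
X = np.column_stack([np.ones_like(motivation), motivation,
                     inquisitiveness, grade_average])
coefs, *_ = np.linalg.lstsq(X, achievement, rcond=None)
b0, b1, b2, b3 = coefs
print(f"y = {b0:.2f} + {b1:.2f}(motivation) + {b2:.2f}(inquisitiveness) "
      f"+ {b3:.2f}(grade average)")
```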

2. Step-by-step multiple regression

With step-by-step multiple regression ('stepwise multiple regression'), not all predictors are entered into the regression analysis at once; instead, the predictors are entered one by one. In the first step, the predictor that is probably the strongest predictor of the y-variable is entered. This is the x-variable that correlates most strongly with the y-variable. During the second step, the x-variable is added that adds the second most to the prediction of the y-variable: the variable entered during the second step explains the most variance in y after the variance explained by the first predictor. The predictor that is entered during the second step does not necessarily have to be the variable that correlates second best with the y-variable.

A step-by-step regression analysis adds x-variables to the equation based on their ability to predict unique variance in the y-variable. When predicting academic performance with a step-by-step multiple regression analysis, the grade average can be entered as a predictor variable first, then motivation and, finally, inquisitiveness. Motivation would be introduced as the second predictor if it predicts the most unique variance in academic achievement after the first variable. This is not the case, for example, if the variance predicted by motivation is completely 'swallowed up' by the grade average. Then motivation adds little unique variance, so it would be better to introduce inquisitiveness during the second step. The step-by-step introduction of predictors comes to an end when (1) all predictor variables that appear to make a unique contribution to predicting the outcome variable have been used and/or (2) the remaining predictor variables are no longer able to predict unique variance.

3. Hierarchical multiple regression

With hierarchical multiple regression, the order in which the predictors are entered is determined by the researcher. The researcher bases his or her choices on predefined hypotheses.

So, variables are added to the regression analysis in a predetermined order, and the unique contribution of each variable can be determined. This 'unique contribution' is calculated by statistically removing the effect of all variables that were added previously.

With hierarchical multiple regression, (1) confounding variables can be removed and (2) hypotheses about mediating variables ('mediation') can be tested. Confounding and mediating variables are among the reasons why we cannot conclude that there is causation, even when there is a strong correlation: these variables can influence the outcome variable without the researcher being aware of it. Confounding variables often occur together with the independent variable, which makes their effects difficult to distinguish; with hierarchical regression, however, we can test their effects. We can also test the effect of mediators by means of hierarchical multiple regression analysis. A mediator variable is a variable through which x influences y: x affects the mediator, and the mediator in turn affects y. It may therefore appear that x and y are directly and strongly correlated, while the relationship actually runs through a third, mediating variable (z).

What is a multiple correlation?

Scientists do not only want to design a regression formula in order to predict people's y-scores; they also want to know how well the predictor variables (x-variables) predict the outcome variable (y). Therefore, researchers aim to describe the usefulness (or accuracy) of their regression formula. When there are multiple predictors, researchers calculate the multiple correlation coefficient (R).

R describes the relationship between y and a set of predictors. The multiple correlation coefficient is always a value between .00 and 1.00. The larger R is, the better the regression formula is at predicting the outcome variable. The multiple correlation coefficient can be squared; this works in the same way as squaring the Pearson correlation (r). When R is squared, we know the proportion of variance in the outcome variable (y) that can be explained by this particular set of predictors. For example, curiosity, motivation and the average grade in high school may together explain 40% of the variance in academic achievement. This means that 60% of the variance in academic performance must be explained by variables that are not included in the study.
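
A sketch of computing R² (and R) for a set of predictors via least squares; the data are simulated for illustration.

```python
# A sketch of the squared multiple correlation (R^2): the proportion of
# variance in the outcome explained by a set of predictors together.
import numpy as np

def r_squared(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])      # add intercept column
    coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares fit
    residual = y - X1 @ coefs
    return 1 - (residual @ residual) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(8)
predictors = rng.normal(size=(50, 3))   # e.g., curiosity, motivation, grades
outcome = predictors @ [0.5, 0.3, 0.8] + rng.normal(size=50)

R2 = r_squared(predictors, outcome)
print(f"R^2 = {R2:.2f}  (R = {np.sqrt(R2):.2f})")
```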

How can the direction of the correlation be determined?

As stated earlier, we cannot make statements about causality based only on the presence of a correlation. However, the direction of the relationship can be estimated, although this never results in complete certainty. By direction we mean whether y is likely to be explained by x, or x by y. There are two techniques with which we can say a lot about the direction of the relationship between two variables: (1) the 'cross-lagged panel design' and (2) 'structural equation modeling'.

1. Cross-lagged panel design

With the cross-lagged panel design, the correlation between x and y is calculated twice, at two different points in time. Correlations can then be calculated for different combinations of measurements: the correlation between x at the first measurement and y at the second measurement, and the correlation between y at the first measurement and x at the second measurement. If variable x causes variable y, then the correlation between x at the first measurement and y at the second measurement should be greater than the correlation between y at the first measurement and x at the second measurement. This is because the relationship between a cause (in this case x) and its effect (y) should be stronger when the causal variable is measured before the effect takes place rather than afterwards.
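
A sketch of the cross-lagged logic: the data are simulated so that x measured at time 1 drives y at time 2, and the two cross-lagged correlations are then compared.

```python
# A sketch of the cross-lagged panel logic: compare r(x1, y2) with r(y1, x2).
# The data are simulated so that earlier x causes later y.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
y1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.5, size=200)        # x is stable over time
y2 = 0.6 * x1 + rng.normal(scale=0.8, size=200)  # later y depends on earlier x

print("r(x1, y2) =", round(stats.pearsonr(x1, y2)[0], 2))  # should be larger
print("r(y1, x2) =", round(stats.pearsonr(y1, x2)[0], 2))  # should be near zero
```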

2. Structural equations modeling

A better but more complicated way to test causal hypotheses on the basis of correlational studies is structural equation modeling. Some causal explanations for the relationships between variables are more logical than others. Suppose we try to understand the relationships between variables x, y and z. For example, we can predict that variable x causes variable y, and variable y causes variable z. Research must then show that x and y are correlated, but also that the relationship between x and y is stronger than the relationship between x and z. If this is not the case, then we can conclude that the hypothesis is incorrect. To perform structural equation modeling, the researcher must first make precise predictions (also known as models). As the example shows, such a prediction concerns the question of how three or more variables are causally related to each other. Each prediction gives an idea of how the variables may be related. Take the following two hypotheses, for example: hypothesis A states that x causes y and y causes z; hypothesis B states that x causes z, and z causes y. Because they differ in causal order, we expect these hypotheses to produce different patterns of correlations.

Structural equation modeling allows us to compare the correlation matrix that is expected on the basis of a hypothesis (model) with the correlation matrix that is actually found. If the expected and the actual correlations are similar, we have found evidence for the hypothesis. A structural equation analysis always provides a fit index, which shows to what extent a hypothesized model fits the data. By comparing the fit indices of all predicted models, we can determine whether some models fit the data better than others. Structural equation modeling can be very complex, especially in research that includes many different variables. When only one measurement is used for each construct in the analysis, the structural equation analysis is also called 'path analysis'. If multiple measurements are used for each construct, it is referred to as 'latent variable modeling'. This term is used because all measurements are thought to measure an underlying (latent) variable. Through structural equation modeling we can get a picture of the plausibility of causal hypotheses.

What is meant by 'nested data'?

Much of the research data in the behavioral sciences are nested. What the term 'nested' means can best be explained with an example: when a researcher is interested in which variables affect primary school performance, he or she can select a number of schools, then a number of classes within these schools, and finally a number of children within these classes. The classes are then nested within the schools, and the children are nested within the classes. A nested data structure has advantages and disadvantages. A major disadvantage is that with nested designs the responses of participants are not independent of each other. For example, children who are in the same class have the same teacher, use the same teaching materials and also influence each other directly. For statistical analyses, however, it is important that the scores of participants are independent of each other. Fortunately, 'multilevel modeling' is a good way to deal with the fact that responses are not independent. With multilevel modeling, different levels are distinguished first; next, we look at which factors influence each of these levels. Multilevel modeling makes it possible to investigate the relationships between variables across the entire research design, so all 'levels' in the design can be compared. In the example of the study of school performance at primary schools, variables at the school level can be compared with variables at the class level. Multilevel modeling is also called 'multilevel random coefficient analysis' or 'hierarchical linear modeling'.
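
A sketch of a two-level multilevel model using statsmodels (assuming that library is available); pupils are nested in classes, and the variable names and data are hypothetical.

```python
# A sketch of multilevel modeling: pupils nested in classes, with a random
# intercept per class. Variable names and data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_classes, pupils_per_class = 10, 20
class_id = np.repeat(np.arange(n_classes), pupils_per_class)
class_effect = rng.normal(scale=2.0, size=n_classes)[class_id]  # shared teacher etc.
motivation = rng.normal(size=class_id.size)
performance = 50 + 3 * motivation + class_effect + rng.normal(size=class_id.size)

df = pd.DataFrame({"performance": performance, "motivation": motivation,
                   "class_id": class_id})
model = smf.mixedlm("performance ~ motivation", df, groups=df["class_id"]).fit()
print(model.summary())
```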

How can underlying dimensions be discovered?

Factor analysis is a collective name for a number of statistical techniques that are used to analyze the relationships between a large number of variables. The purpose of factor analysis is to identify underlying dimensions or factors: these explain the relationships found between the variables.

If we analyze the correlations between many variables, we see that some variables correlate mainly with each other, while other variables correlate mainly with different variables. This is probably because highly correlated variables measure the same general construct, while weakly correlated variables measure different constructs. Consequently, an underlying dimension (factor) is more likely to be identified when the correlations between certain variables are high. Factor analysis is therefore used to determine underlying factors (latent variables) that can explain the correlations found between multiple variables.

What is factor analysis?

When there are few variables, factors can be determined relatively easily. However, if there are many variables, it is better to use complex mathematical calculations. These are not discussed here, but it is useful to know the idea behind the calculation. Factor analysis determines the minimum number of factors needed to explain the relationships between the variables. If all variables are perfectly correlated (r = 1.00), then only one factor will result from the factor analysis. If the variables are not correlated at all (r = .00), then as many factors follow from the factor analysis as there are variables. A factor matrix can be used to see to what extent the different variables load on (correlate with) a factor. The extent to which a variable loads on a factor is indicated by its factor loading. Variables that load at least .30 on a factor are put together in a table. This makes it easier to see what these variables have in common, so that the factors can be interpreted and labeled.
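
A sketch of exploratory factor analysis with scikit-learn (assuming that library is available); six hypothetical variables are generated from two underlying factors, so the loadings should recover that structure. No rotation is applied, so the exact values will differ from a rotated solution.

```python
# A sketch of factor analysis: six observed variables generated from two
# underlying (latent) factors. Data are simulated.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(5)
f1, f2 = rng.normal(size=(2, 300))          # the two latent factors
data = np.column_stack([
    f1 + rng.normal(scale=0.4, size=300),   # variables 1-3 load on factor 1
    f1 + rng.normal(scale=0.4, size=300),
    f1 + rng.normal(scale=0.4, size=300),
    f2 + rng.normal(scale=0.4, size=300),   # variables 4-6 load on factor 2
    f2 + rng.normal(scale=0.4, size=300),
    f2 + rng.normal(scale=0.4, size=300),
])

fa = FactorAnalysis(n_components=2).fit(data)
print(np.round(fa.components_.T, 2))        # loading matrix: variables x factors
```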

The use of factor analysis

Factor analysis has three basic purposes:

  1. Determining the underlying structure of psychological constructs. This provides a framework for understanding the behavior.

  2. Reducing a large number of individual data to a smaller, clearer set of data. Making the data clearer makes it easier to perform statistical procedures on it. In addition, it removes unnecessary elements. Finally, analysis that is based on factors generally has greater power and is more reliable compared to measurements that are only based on individual items.

  3. Designing attitude and personality tests. When answers to questions in a questionnaire have to be combined and interpreted as a single score, we first have to know for sure that these items actually measure the same construct. In this case, factor analysis is used to find out which items belong to one factor. If such an analysis shows that the items load on more than one factor, then it is not wise to use the answers to these items to make statements about how someone scores on a single construct (for example depression). After all, in that case there are several underlying variables.

Experimental Research - Chapter 9

What are the conditions for experimental research?

Descriptive and correlational research are important, but they have a shortcoming: they do not provide information about the causes of behavior, thoughts and emotions. When researchers are interested in uncovering such cause-effect relationships, that is, in causality, they conduct experiments. A well-designed experiment meets three conditions: (1) the researcher must manipulate at least one independent variable to see what effect this has on the behavior of the participants; (2) the researcher must assign participants to the different experimental conditions in an unbiased way; and (3) the researcher must control external variables that could influence the behavior of the participants, preferably by keeping them constant.

How can the independent variable be manipulated?

In an experiment, a researcher manipulates one or more independent variables (the x-variables) to see how this affects the dependent variable (the y-variable). An independent variable has multiple levels: the different values the independent variable can take on. Levels are also called conditions. Sometimes conditions differ quantitatively, meaning that the independent variable varies per condition in the amount or extent to which the subject is influenced. In other cases the conditions differ qualitatively, meaning that participants in different conditions receive, for example, different instructions.

Multiple types of manipulation of the independent variable are possible:

  1. Environmental manipulations: in this case the physical or social environment of the participant is manipulated. In social, developmental and personality psychology, confederates of the researcher are sometimes used: they take on the role of a test subject while in fact they are not. In such an experiment, the influence of the presence of these confederates on the other test subjects is measured. This is a form of environmental manipulation.

  2. Instructional manipulations: in this case the instructions and/or information that the test subjects receive are manipulated. Instructional manipulations are designed to see how likely certain information or comments are to change people's thoughts, emotions or behavior.

  3. Invasive manipulations: in this case changes are made to the body of the participant. This can be done, for example, by having people take certain drugs to see how those influence their emotions or behavior.

Experimental groups and control groups

In some studies, an extra condition is added to the experiment in which the independent variable is not manipulated at all. Participants who are exposed to a certain amount of the independent variable are in an experimental group. Participants who are not exposed to the independent variable at all are in the control group. Researchers must decide for themselves whether they want to use a control group in their research design. In most cases, a control group is chosen to determine the base level ('baseline') of a behavior: the extent to which a particular behavior is present when the independent variable is not manipulated at all.

It often happens that the research hypotheses of a researcher are correct, but the expected results are not found because the independent variable has not been manipulated well enough. If the manipulation of the independent variable is not strong enough to produce the predicted effects, then the research is doomed to fail in advance. Scientists therefore often try out their experiment on a small number of test subjects first to see whether the independent variable is manipulated well enough. This is called a pilot test. In other words, a pilot test can be used to see whether the different conditions of the independent variable differ enough from each other to have a noticeable influence on the behavior of the test subjects. In addition, scientists use manipulation checks during their research. A manipulation check is a question (or set of questions) designed to determine whether the independent variable has been successfully manipulated. Manipulation checks are not always necessary and are sometimes not even possible, yet it is important that researchers always consider whether it is useful to perform one.

Sometimes scientists cannot manipulate certain variables, because these variables cannot be changed. Examples are gender, age and intelligence level. Such unchangeable variables are also referred to as subject variables or test subject variables. Dependent variables are the variables that are influenced by the independent variables in a study. In most cases, the person's score on the dependent variables is observed, measured physically or obtained through self-report.

In which way are participants assigned to conditions?

It is important that participants are assigned to conditions in a fair manner, so that participants who are very similar to each other do not systematically end up together in the same condition.

There are three different ways to assign participants to conditions:

  • Simple random assignment: in this case, every participant has the same chance of being assigned to each of the conditions. This can be done, for example, by tossing a coin. Simple random assignment ensures that, on average, the groups do not differ from each other, although there is a small chance that it does not produce exactly equal groups. This approach is also called a 'randomized groups design'.

  • Matched random assignment: with this method, the researcher first tests the participants on a variable that is relevant to the research. Subsequently, the scores of the participants are ranked and participants who resemble each other are matched, which creates a number of clusters. Then, the participants within each cluster are randomly assigned to the different conditions. Both simple and matched random assignment produce a 'between-subjects design' (see the sketch after this list for both procedures).

  • Repeated measurements ('repeated measures design'): with this method, the same test subjects are tested in all experimental conditions. Because each test subject participates in every condition, no assignment to separate groups is needed, although researchers often vary the order of the conditions (see the order effects discussed below). This design is also called a 'within-subjects design'.
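
A sketch of both assignment procedures mentioned above; the number of participants, the conditions and the pre-test scores are all hypothetical.

```python
# A sketch of simple random assignment and matched random assignment.
# Condition labels and data are hypothetical.
import numpy as np

rng = np.random.default_rng(6)
n_participants, n_conditions = 12, 2

# Simple random assignment: every participant has an equal chance per condition.
simple = rng.integers(0, n_conditions, size=n_participants)

# Matched random assignment: rank participants on a relevant variable, form
# clusters of size n_conditions, then randomize conditions within each cluster.
pretest = rng.normal(100, 15, size=n_participants)
order = np.argsort(pretest)
matched = np.empty(n_participants, dtype=int)
for cluster in order.reshape(-1, n_conditions):
    matched[cluster] = rng.permutation(n_conditions)

print("Simple: ", simple)
print("Matched:", matched)
```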

What are the pros and cons of repeated measurements?

The biggest advantage of the repeated measures design (within-subjects design) is that its power is greater than that of between-subjects designs. Power refers to the extent to which a study can accurately detect the effects of an independent variable. The power is greater because the same people participate in all conditions: differences between test subjects can therefore not influence the results of the study. A second advantage is that fewer participants are needed, because each participant takes part in all experimental conditions.
However, the design also has a number of disadvantages. It suffers from order effects: the behavior of the participants is influenced by the order in which they are exposed to the conditions. A distinction is made between four types of order effects.

  1. Practice ('practice effects'): participants perform better and better as the experiment progresses, because they encounter the same dependent measure multiple times. They can therefore 'practice' on the dependent variable.

  2. Fatigue ('fatigue effects'): participants often lose motivation or get tired as the research progresses. As a result, treatments can become less and less effective as the experiment progresses.

  3. Sensitization: at some point, participants may guess the research hypotheses because they have already participated in several other conditions. Because they know what the researcher is looking for, they may behave unnaturally.

  4. Transfer ('carryover effects'): the effect of a condition can 'linger': a variable can still have an effect when the subject is already in another condition. This is problematic because the researcher is no longer able to distinguish between the effects of the different conditions. To counteract order effects, researchers can use a method called counterbalancing: all participants are exposed to the different conditions, but in different orders. A 'Latin square design' is an example of counterbalancing: if a design has four conditions, participants can be exposed to the conditions in four different orders, as shown in the sketch below.
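
A minimal sketch of a cyclic Latin square for four conditions: each condition appears exactly once in each ordinal position across the four orders.

```python
# A minimal sketch of counterbalancing with a cyclic Latin square.
conditions = ["A", "B", "C", "D"]
n = len(conditions)

# Each row is one order; each condition occupies each position exactly once.
latin_square = [[conditions[(row + col) % n] for col in range(n)]
                for row in range(n)]
for order in latin_square:
    print(order)
```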

What is meant by experimental control?

Experimental control refers to the strategy of eliminating, or keeping constant, external factors that could influence the outcome of an experiment. This is important because without control over external factors it would not be clear whether the dependent variable is influenced by the independent variable(s) or by other factors.

What is systematic variance and when does it occur?

Systematic variance (also known as 'between-groups variance') is the part of the total variance that reflects real differences between the experimental groups. The big question in a study is whether the spread in the scores of participants is systematic, that is, caused by the independent variable. We should then find systematic differences between the scores obtained in the different conditions. There is systematic variance when the scores differ systematically between the conditions. Systematic variance can come from two sources: (1) the independent variable ('treatment variance' / 'primary variance') and (2) external variables ('confound variance' / 'secondary variance'). If nothing but the independent variable has influenced the scores of the participants, then there is only treatment variance; however, this is rarely the case. Besides treatment variance, there are thus two other sources of variance: confound variance and error variance.

Subjects can differ from each other in several respects. When they do not differ solely through the influence of the independent variable, external variables are also involved. External variables influence the research results, while that is not the intention of the researcher, and make the results difficult to interpret. The proportion of variance explained by external variables is called confound variance. Unfortunately, it is impossible for researchers to statistically distinguish treatment variance from confound variance. It is therefore necessary to eliminate confound variance as much as possible beforehand.

What is error variance and when does it occur?

Error variance is also called 'within-groups variance'. It is the result of unsystematic differences between test subjects. These can be differences in personality, mood and capacities, but error variance can also be caused by differences in interaction with the researcher: the researcher can (often unintentionally) treat test subjects differently.

The presence of error variance is less problematic than confound variance, because we can make a statistical distinction between systematic variance and error variance; this makes it possible to account for error variance. Error variance therefore has less influence on the experiment than confound variance. Still, the more error variance there is, the harder it is to detect the true effects of the independent variables: error variance reduces the power of the experiment. That is why we want to keep the error variance in a study as low as possible.

Variance summarized

In summary, you can subdivide the total variance into systematic variance and error variance: total variance = systematic variance + error variance.
The systematic variance can then be subdivided into treatment variance and confound variance: systematic variance = treatment variance + confound variance.

In sum, it is important to eliminate confound variance, to keep the treatment variance as high as possible, and to keep the error variance as low as possible.

What is internal validity and what role does it play?

Internal validity is about the extent to which a researcher draws the right conclusions about the effects of the independent variable. An experiment has good internal validity when all sources of confound variance are eliminated. Internal validity is often the result of good experimental control. Experimental control ensures that the independent variable is the only variable that differs between the different conditions. If participants in different groups systematically differ from each other on more than just the independent variable, we speak of 'confounding' variables. It is very important to prevent and eliminate confounding variables.

Dangers of internal validity

The internal validity of an experiment can be threatened by several factors:

  • Incorrect allocation ('biased assignment'): this is the case when participants are not distributed over the conditions in a random way. There are then systematic differences between the groups, and the scores that are found may be influenced by these differences rather than by the independent variable. The groups then differ in more respects than just the independent variable.

  • Loss ('differential attrition'): loss ('attrition') is the loss of participants during the study. This occurs when participants no longer want, or are no longer able, to participate. When loss occurs randomly and affects all experimental conditions to the same extent, it is not very dangerous for the internal validity of the study. However, when more participants drop out in one condition than in other conditions ('differential attrition'), the internal validity of the research is at risk.

  • Pre-test sensitization: in some experiments, participants are tested in advance. This gives the researcher an idea of the behavior before the manipulation of the independent variable has taken place. It can also help with the fair distribution of test subjects over conditions. A disadvantage of testing the test subjects in advance is that they might react differently to the independent variable: they become more sensitive to it. When test subjects react differently to the manipulation of the independent variable as a result of a pre-test, we speak of pre-test sensitization.

  • History: the results of some investigations are influenced by external events that occur outside the investigation setting. In this case, the results found are not the result of the independent variable, but rather an interaction between the independent variable and history effects.

  • Development ('maturation'): if an experiment takes a long time, the development ('maturation') of the participants, rather than the independent variable, may be the cause of changes in the dependent variable.

  • Design errors ('miscellaneous design confounds'): these are errors in the research design itself. For example, a researcher may interact with participants in different conditions in a different way. It is very important that a researcher always avoids design errors. Because design errors are caused by the researcher, they can in principle be eliminated; it is therefore important that the researcher pays close attention to this.

Expectations

The internal validity of an investigation is also determined by the expectations of the test subjects and researcher with regard to what should happen in the experiment. A distinction is made between three problems:

  1. Expectations of the researcher ('experimenter expectancy effects'): researchers often already have an idea of how the test subjects will probably behave in the experiment. The expectations of a researcher are based on the research hypotheses. These expectations can cause researchers to misinterpret the research results. Moreover, researchers can unconsciously and unintentionally influence the behavior of the test subjects.

  2. Expectations of test subjects ('demand characteristics'): test subjects can suspect what the researcher expects of them. Consequently, they sometimes start to behave according to these expectations, because they want to be good test subjects. As a result, they no longer behave in a natural way, and a bias is created in the research.

By using a double-blind procedure, both the influence of the expectations of the researcher and the influence of the expectations of the test subjects can be eliminated. In a double-blind procedure, neither the researchers nor the test subjects know which test subjects are assigned to which condition. As a result, neither can act on expectations.

  3. Placebo effects: a placebo effect is a physiological or psychological change that occurs due to the suggestion or belief that change will occur. This change takes place because the subject expects there to be a change. For example, a medicine can heal without containing an active substance, because patients think the drug helps and expect to be healed. When a placebo effect is possible, researchers often use a 'placebo control group': participants in this group receive a treatment that actually does not work. The size of the placebo effect can be determined by also adding a true control group: the control group receives no pill at all, while the placebo control group receives a pill without an active substance. If participants in the placebo control group perform better (for example, heal faster) than participants in the real control group, then there is apparently a placebo effect.

What are the causes of error variance?

Error variance can have five causes:

  1. Individual differences: often there are individual differences between test subjects that already exist before they participate in the study. These can cause participants within the same condition to respond in different ways to the independent variable. Nothing can be done to eliminate these differences, but the fewer individual differences there are, the less error variance there will be. Researchers should therefore aim to use groups of test subjects that are as homogeneous as possible.

  2. Temporary states ('transient states'): this refers to factors that vary from moment to moment, for example the attitude and mood of the participant. The only thing researchers can do is leave as little room for these factors as possible; for example, researchers should always try to be equally friendly to all participants.

  3. Environmental factors: an example of an environmental factor is the presence of distracting sounds in the research setting. It is important to perform all experiments in the same setting: researchers try to keep the environment in which the experiment is conducted as constant as possible.

  4. Different ways of treatment ('differential treatment'): this is the case when a researcher deals with test subjects in different ways. For example, a researcher may be very nice towards happy and spontaneous participants, while he or she is ruder towards other participants. A solution to this problem is to automate the investigation as much as possible. The subject then has less direct contact with the researcher, so there are fewer moments at which the researcher could influence the test results.

  5. Measurement errors: Each test contains measurement errors that contribute to the error variance. To reduce measurement errors, it is important to use reliable measuring instruments.

The dilemma of the researcher

External validity is about the extent to which the results of a study can be generalized to other samples. Internal validity, on the other hand, is about the certainty with which a researcher can state that the dependent variable is influenced by the independent variable and not by other variables. Often, a high degree of internal validity goes together with a low degree of external validity, and vice versa. This discrepancy is also referred to as the experimenter's dilemma: the more control a researcher exercises over the experiment, the more internal validity the experiment will have, but the increased control also makes the research less natural and often less generalizable, so the external validity decreases. An increase in internal validity therefore comes at the expense of external validity. However, researchers attach more value to internal validity. After all, it is more important to be sure of the results of the research (internal validity) than to be able to generalize them (external validity): if you are not sure about the results of your research, there is no use in generalizing them.

In addition, experiments are rarely designed to be generalizable. The purpose of experimental research is not to generalize to the 'real world', but to test generalizations from the real world (by means of hypotheses). Remember that the results of a single experiment can never simply be generalized, no matter how good the research is: the results of each study are too closely tied to the context in which the study was conducted.

What are the advantages and disadvantages of using the internet?

Advantages of the Internet

Many scientists use the Internet to conduct research, which is why attempts are being made to increase the validity of web-based research. Conducting research via the Internet has both advantages and disadvantages. The benefits of this type of research are described below:

  • By using the Internet, researchers can use much larger samples.

  • Research via the Internet costs researchers less time and money.

  • The samples that are compiled on the Internet are often more diverse than samples that are compiled in a different way.

  • Researchers who search for participants on the Internet often find it easy to find participants with specific characteristics.

  • Because participants on the Internet are often anonymous, their reactions are less influenced by social desirability. There are also fewer experimenter expectancy effects.

Disadvantages of the Internet

The disadvantages of using Internet surveys are described below:

  • Researchers often find it difficult to control a sample that has been compiled via the Internet.

  • The environments of the test subjects can differ greatly. For example, they can all fill in the questions at a different location, resulting in the problem that one test subject may be more easily distracted than another. This influences the research results.

  • Participants on the Internet often do not complete a study.

  • Internet studies can only be used when test subjects have to fill out a survey or respond to written stimuli. It is often not possible, for example, to test the effects of certain drugs via the Internet, and experiments with multiple sessions are also difficult to conduct. Finally, it is not possible for a researcher to have face-to-face contact with the test subjects.

Experimental Designs - Chapter 10

What are experimental designs and what levels do they have?

Experimental designs in which only one independent variable is manipulated are called one-way designs. The simplest form of a one-way design is the two-group experimental design, in which there are only two levels of the independent variable. In other words, there are only two conditions. An experiment needs at least two conditions, because otherwise we cannot compare the responses of participants across conditions, and only then can we determine to what extent differences in the level of the independent variable have led to differences in the behavior of test subjects. Scientists usually use more than two levels.

One-way designs

There are three types of one-way designs: (1) randomized group design, (2) matched subjects design and (3) repeated measurement design.

  1. Randomized group design ('randomized groups design'): in this case participants are randomly assigned to two or more conditions.

  2. Matched subjects design ('matched subjects design'): participants are first matched in so-called clusters. The clusters are formed based on a variable that the researcher believes is relevant. Then all participants within each cluster are randomly assigned to one of the experimental conditions or the control condition.

  3. Repeated measurement design ('repeated measures design' / 'within-subjects design'): this design assigns each participant to all experimental conditions.

What is the difference between pre-test and post-test measurements?

All the aforementioned one-way designs fall into the 'post-test-only design' category: the dependent variable is measured only after the experimental manipulation. However, it sometimes happens that the dependent variable is measured twice: once before and once after the manipulation of the independent variable. In that case we speak of a 'pretest-posttest design'. Each of the three designs discussed above can be converted into a pretest-posttest design by adding a pre-measurement.

A pretest-post-test design has three advantages over a post-test-only design:

  • The pre-test allows the researcher to determine whether the test subjects already differed on the dependent variable before the experiment began. He or she can see whether the test subjects are divided over the conditions in a fair way.

  • In addition, the researcher can determine for each individual participant how much change the independent variable has caused, because their scores from before and after the experimental manipulation can be compared. Therefore, a baseline can be established by the pre-measurement. However, in the case of post-test-only designs, a baseline can also be established, namely through the use of a control condition.

  • Finally, pretest-posttest designs have more power: they are better at detecting the effects of the independent variable on the dependent variable than post-test-only designs. This is because the variability associated with pre-existing differences between the scores can be removed using the pre-measurement, which reduces the error variance in the study. Remember: the smaller the error variance, the greater the power.

A disadvantage of pretest-posttest designs is that the use of a pre-test can lead to pre-test sensitization: the pre-test can cause people to react differently to the independent variable, because they become more receptive or sensitive to it. Even when there is no sensitization, the pre-test can influence the responses of test subjects: it gives them a 'hint' about the purpose of the research or the expectations of the researcher. Demand characteristics can then play a role, with subjects starting to behave according to the expectations of the researcher.

Pretest-posttest designs are useful, but not necessary. A post-test-only design also provides sufficient information about the changes in responses after the manipulation of the independent variables.

What do factorial designs look like?

People's expectations can influence their reactions: consider, for example, the placebo effect.

Another example is that people expect expensive products to be of better quality. If we want to investigate this, we have to deal with two factors: (1) the price of the product and (2) whether people have an expectation with regard to the quality of the product. We cannot use a one-way design to test this hypothesis, because we have two independent variables, namely (1) price and (2) expectation. A design in which two or more independent variables are manipulated is called a factorial design .

The independent variables in a factorial design are called factors. In this study, for example, the factor price can have two levels (discount and full price) and the factor expectation can have two levels (low and high expectation). In total there are therefore four conditions (2 x 2).

A one-way design always includes only one independent variable, a two-way (factorial) design has two independent variables, a three-way (factorial) design has three independent variables, and so on. A number of examples are discussed below; the sketch after the list computes the number of conditions for each:

  • If a 2x2 factorial design is used, this means that there are two factors, each with two levels. So in total there are four conditions.

  • A 2x3 factorial design also has two independent variables; one has two levels and the other has three. So in total there are six conditions.

  • A 2x2x4 factorial design has three independent variables. Two have two levels, and the third has four. There are a total of 2x2x4 = 16 conditions.

  • A 2x2x3x3 factorial design has four factors, two have two levels and two have three levels. There are a total of 2x2x3x3 = 36 conditions.
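
A small sketch showing that the number of conditions is simply the product of the levels of the factors.

```python
# The number of conditions in a factorial design is the product of the
# numbers of levels of its factors.
from math import prod

designs = {"2x2": [2, 2], "2x3": [2, 3], "2x2x4": [2, 2, 4],
           "2x2x3x3": [2, 2, 3, 3]}
for name, levels in designs.items():
    print(f"{name}: {len(levels)} factors, {prod(levels)} conditions")
```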

In which way are test subjects assigned to conditions?

Just as in the case of one-way designs, factorial designs can also take on a number of different forms:

  • 'Randomized groups factorial design' (also known as 'completely randomized factorial design'): participants are randomly distributed over the various conditions.

  • 'Matched factorial design': participants are first divided over different clusters. Each cluster contains as many participants as there are conditions; in a 3x2 factorial design there are six experimental conditions, so the clusters will consist of six test subjects. Just as with one-way designs, a cluster consists of test subjects who are similar on a variable that, according to the researcher, influences the dependent variable. Once the clusters are formed, the test subjects within each cluster are randomly distributed over the conditions. The aim of the matched factorial design is to distribute the test subjects over the conditions as fairly and precisely as possible.

  • 'Repeated measures factorial design': in this case, each participant is assigned to all conditions. Small factorial designs (such as a 2x2 design) lend themselves to repeated measures, since there are not that many conditions. With larger factorial designs, however, a repeated measures design is not convenient: in a 2x2x2x4 design, for example, each test subject would have to participate in 32 conditions. In such a case, order effects could become a major problem.

  • 'Mixed factorial design': because factorial designs consist of more than one factor, it is possible to combine characteristics of a randomized groups design and repeated measures. A design in which one or more between-subjects variables are combined with one or more within-subjects variables is called a 'mixed factorial design', a 'between-within design' or a 'split-plot factorial design'.

How do researchers determine main effects and interactions?

The major advantage of factorial designs is that they do not only provide us with information about the individual effects of each independent variable, but also tell us something about the combined effects of the independent variables. The latter is not possible in the case of a one-way design. In the case of a one-way design, we can divide the total variance in the responses of participants into systematic variance and error variance. With a factorial design, however, we can also investigate whether the spread (variance) in the scores is due to (1) the individual effects of each independent variable, (2) the combined effects of the independent variables or (3) error variance. Factorial designs therefore give us a more complete picture of the multiple variables that together influence the behavior of the test subjects.

Main Effects

The effect of an individual factor is called a main effect in a factorial design. A main effect shows the influence of one factor on the dependent variable; the effects of the other factors are not taken into account. A factorial design has as many main effects as there are factors. After all, each factor has one main effect, so in a 3x2x2x2 design there are four main effects.

Interaction

In addition to the main effects of both independent variables, it is also possible to calculate the interaction effect between the variables. An interaction is present when the effect of one variable differs across the levels of the other independent variable(s).

A factor can have a different effect at one level of an independent variable than at another level of that variable. If this is the case, we say that the variables interact. Suppose we have a factorial design with two factors (A and B). If the effect of A differs at a certain level of B, we speak of an interaction. If A has the same effect on the scores of participants, regardless of the level of factor B, then there is no interaction between the two factors.
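A minimal numeric sketch of this idea: in a 2x2 design, the effect of A is computed at each level of B, and an interaction shows up as a difference between those two effects. The cell means below are hypothetical.

```python
# A toy sketch of spotting an interaction in a 2x2 design: the effect of
# factor A is compared across the two levels of factor B. The cell means
# are invented numbers, not data from the book.
cell_means = {
    ("a1", "b1"): 4.0, ("a2", "b1"): 6.0,  # effect of A at b1: +2.0
    ("a1", "b2"): 4.0, ("a2", "b2"): 9.0,  # effect of A at b2: +5.0
}

effect_A_at_b1 = cell_means[("a2", "b1")] - cell_means[("a1", "b1")]
effect_A_at_b2 = cell_means[("a2", "b2")] - cell_means[("a1", "b2")]

# If the effect of A differs across levels of B, A and B interact.
print(effect_A_at_b1, effect_A_at_b2)  # 2.0 vs 5.0 -> interaction present
```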

What are 'higher order designs'?

Until now, we have talked mainly about factorial designs with only two factors. However, an experiment may also contain more than two factors. The more factors there are, the more complex the research becomes and the harder it is to interpret the results. A two-way factorial design gives three forms of information: (1) the main effect of A (without B), (2) the main effect of B (without A) and (3) the interaction between A and B.

If we opt for a three-way factorial design with three factors (A, B and C), then we can calculate the main effects of A, B and C. In addition, we can calculate three two-way interactions: the interaction between A and B (without C), between A and C (without B) and between B and C (without A). Finally, a three-way factorial design provides information about the combined effect of all three factors together: the three-way interaction of A, B and C.

In practice, researchers rarely use more than three factors. When more than three factors are used, so many different effects can be tested that the results become very unclear and difficult to interpret. It is therefore hard to draw useful conclusions from a study with too many factors.

How can variables be combined?

It has long been known that behavior is the result of both situational factors and personality traits; to understand behavior, both must be considered. Personal characteristics are also called subject variables; examples are age, gender, intelligence, personality and attitudes. Subject variables have a specific effect on behavior in combination with situational factors. After all, not everyone responds to a situation in the same way. Sometimes researchers design experiments to investigate the combined effects of situational factors and subject variables. In such a case, the situational factors are manipulated by the researcher, while the subject variables are measured rather than manipulated. There is not really a specific name for such research, but the author of the book calls it an 'expericorr factorial design' or 'mixed design'. The name 'expericorr' is a combination of experiment and correlation. According to the author, such a design has the characteristics of an experimental study, but also of a correlational study: the experimental side is that the situational factors are manipulated; the correlational side is that the subject variables are measured and related to the responses.

An expericorr factorial design is used for three reasons:

  • First, researchers who use such a design want to analyze the generality of the effect of the independent variable. Subjects with different personalities can respond differently to the same situation; the manipulation of certain independent variables may therefore only affect test subjects with certain characteristics. With an expericorr factorial design, researchers can find out whether the effects of an independent variable occur among all participants or only among participants with certain personality traits.

  • In addition, researchers use an expericorr factorial design when they want to know how certain personality traits relate to behavior in different situations. Analyzing how people with a specific characteristic react to the conditions does not necessarily say something about the manipulation of the independent variable, but about the personal characteristics (subject variables) of the test subjects; it provides information about the participant rather than about the independent variable.

  • By splitting participants into groups based on a subject variable, researchers can ensure that participants within the conditions are more similar. The conditions then become more homogeneous, which causes the error variance to decrease. As discussed earlier, the power increases when the error variance decreases; that is also the case here. The research becomes more sensitive to detecting the effects of the manipulation of the independent variable.

Create groups

When researchers use an expericorr factorial design, they often form groups to which test subjects with the same subject variable (for example, gender) are assigned. The test subjects within the groups are then randomly distributed over the levels of the independent variable. Sometimes scientists are interested in subject variables that can play a moderating role.

In that case they can use the 'median split procedure' or the 'extreme groups procedure'; both are sketched in code after the list below.

  • 'Median split procedure': in this case the researcher determines the median of the scores in a data set. As explained earlier, the median is the middle score of the data set: the score that falls exactly on the 50th percentile. Here, the distribution concerns the scores on the subject variable in which the researcher is interested. The researcher labels participants' scores below the median as low and scores above it as high. A variant of the median split procedure is to split the data set into three or more groups instead of two.

  • 'Extreme groups procedure': here the participants are not divided into two groups ('high' or 'low') on the basis of the median. Instead, the researcher does a pre-measurement with as many participants as possible and then selects the participants who score extremely high or extremely low on the subject variable of interest.
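As a rough illustration of both procedures, the sketch below applies a median split and an extreme-groups selection to an invented set of subject-variable scores; the 20th/80th percentile cutoffs are an assumption for the example, not a rule from the book.

```python
# A sketch of the median split and extreme groups procedures on a
# hypothetical subject-variable score; numpy only.
import numpy as np

scores = np.array([12, 18, 25, 31, 7, 22, 28, 15, 35, 20])

# Median split: label scores below the median 'low', the rest 'high'.
median = np.median(scores)
labels = np.where(scores < median, "low", "high")

# Extreme groups: keep only, say, the bottom and top 20% of scorers.
low_cut, high_cut = np.percentile(scores, [20, 80])
extreme_low = scores[scores <= low_cut]
extreme_high = scores[scores >= high_cut]
print(median, labels, extreme_low, extreme_high)
```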

Criticism

There is a lot of criticism of both the extreme groups procedure and the median split procedure, which is why both procedures are rarely used. The most important point of criticism is that in both procedures the subjects are reduced to two groups, so the spread in the scores is not taken into account at all: very valuable information is thrown away.

Research also shows that splitting people into two groups can lead to incorrect research results (bias).

It is important that researchers draw their conclusions from an expericorr factorial design carefully. The factors can be manipulated, which is why we can say that the factors caused certain outcomes on the dependent variable; in that case it is normal to talk about causality. Because we cannot manipulate subject variables, however, we cannot say that subject variables have caused certain outcomes.

If there is an interaction between a factor and a subject variable, then we can say that both groups reacted differently to the factor. However, we cannot say that being a member of one group (for example, being male or female) has caused participants to respond differently to the levels of the factor. We can only say that the subject variable is a moderator variable: a moderator variable influences how people respond to the independent variable.

Analysis of Experimental Data - Chapter 11

The role of systematic variance in the process of analysis

Scientists must perform statistical analyses to answer their research questions. After performing the experiment, the investigator must determine whether the manipulated independent variable has had the predicted effect on the dependent variable. In general, we determine this by considering whether the total variance in the data contains systematic variance as a result of the manipulation. If systematic variance is present, the means of the different experimental groups differ, so we can look for systematic variance by comparing the means of the experimental groups. Differences in means arise when the manipulation of the independent variable has caused the groups to score differently; when the groups score differently, there is systematic variance. However, if there is no difference in the group means, then there is no systematic variance and the independent variable has had no effect.

The role of error variance in the process of analysis

However, the mean scores of the conditions may differ from each other even when the independent variable in fact has no effect on the dependent variable. Therefore, just looking at the mean scores of the groups is not a sufficient method to determine whether there is systematic variance. Differences in mean scores can arise, for example, because the two variables (x and y) vary together with another variable (z), a 'confounding variable'. As a result, it may appear that the two means are different, and thus that there is systematic variance, while in reality this is not the case. A confounding variable may also make it seem as if the two means do not differ, leading us to suspect that there is no systematic variance, while in fact there is.

Even if there is no confounding variable, differences between the mean scores of the conditions are naturally often still present. It is almost impossible for people in different conditions to have exactly the same mean score; small differences can arise by coincidence. So differences in scores are not always caused by the manipulation of the independent variable. Small differences in group means are the result of error variance. Error variance is the result of extra variables that influence the behavior of participants, such as mood and personality traits. In short, the group means always differ to a certain extent. How can we find out whether the difference in group means is the result of the independent variable or of error variance? We can figure this out using inferential statistics.

What is the function of inferential statistics?

There is a way to find out whether the difference in group means is the result of error variance or of systematic variance: inferential statistics. With this method, we conclude that the independent variable has had an effect if the difference between the mean scores of the conditions is greater than we would expect on the basis of error variance alone. We therefore compare the group means that we found with the group means that we would expect to find if there were only error variance. Unfortunately, this method does not guarantee complete certainty when drawing conclusions; we can only determine the probability that the differences in group means are the result of error variance.

How does testing hypotheses work?

Scientists test their research hypotheses by analyzing the group means. First they formulate a null hypothesis, which states that the independent variable has had no effect on the dependent variable. The experimental hypothesis is formulated in the exact opposite way: it states that the independent variable does have an effect on the dependent variable. The experimental hypothesis can indicate a direction ('directional') or not ('nondirectional'). A directional experimental hypothesis is tested one-tailed, because the researcher already indicates whether he expects the independent variable to cause an increase or a decrease in the dependent variable. When a researcher has no expectations about the direction of an effect, he or she performs a two-tailed test; in that case, the formulation of the hypothesis does not indicate a direction and is therefore nondirectional. Based on statistical analyses, the null hypothesis can be rejected ('rejecting the null hypothesis') or retained ('failing to reject the null hypothesis').

Rejecting the null hypothesis means stating that the null hypothesis is not true. Since the null hypothesis states that there is no difference between the means, rejecting it indicates that there is a difference between the means: the independent variable has had an effect, and there is systematic variance. When we reject the null hypothesis, the difference in group means is greater than what we would expect on the basis of error variance alone. If the null hypothesis is retained, this means that the independent variable has had no effect on the dependent variable. In that case, differences in group means are not the result of the independent variable but of error variance; the group means do not differ more than we would expect based on error variance alone.

Type I and type II errors

When the research data is statistically analyzed, four options are possible :

  1. Correct decision: the null hypothesis is incorrect, and the researcher rejects it .

  2. Correct decision: the null hypothesis is correct, and the researcher retains it .

  3. Type I error: the null hypothesis is correct, but the researcher rejects it. The researcher therefore incorrectly believes that the independent variable has had an effect. The chance of making a type I error is called the alpha level. In most cases, researchers use an alpha level of 5%: they reject the null hypothesis only when there is at most a 5% chance that the differences found between the group means are the result of error variance, so there is only a 5% chance that they are wrong. Sometimes scientists use a stricter alpha level of 1%; they then have only a 1% chance of making a type I error.
    The difference between the group averages is considered statistically significant when we reject the null hypothesis with a low chance of a type I error. A statistically significant result is a result of which we know that there is only a small chance (often less than or equal to 5%) that it is the result of error variance.

  4. Type II error: the null hypothesis is incorrect, but the researcher retains it. The researcher therefore concludes that the independent variable had no effect, whereas in reality it did. The probability of a type II error is called beta. Unreliable measurements of the dependent variable increase beta: effects that exist in reality go unnoticed when we measure unreliably, which leads to a greater chance of a type II error. Errors in the collection and coding of responses, extremely heterogeneous samples and poor experimental control can also increase beta. To reduce the chance of a type II error, scientists try to design experiments that have a lot of power. Power is the chance that a study will reject the null hypothesis when the null hypothesis is in fact incorrect; in other words, power refers to a study's ability to detect the effects of the independent variable. The more participants there are, the greater the chance that true effects will actually be noticed. To determine how many people are needed to detect true effects, a power analysis is performed (a code sketch follows this list). Researchers often want a minimum power of .80, which means that the study has an 80% chance of noticing an effect when it exists in reality. Researchers generally do not aim for a power of .99, because this requires an extremely large number of test subjects.
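As an illustration, a power analysis for a two-group experiment might look like the sketch below, using the statsmodels library; the effect size of 0.5 is an assumed value chosen only for the example.

```python
# A sketch of a power analysis for a two-group experiment with statsmodels.
# The effect size (0.5, a medium effect by convention) is hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))  # roughly 64 participants per group
```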

Effect size

If a researcher concludes that the independent variable has had an effect, then he or she often also wants to know how great that effect was. This is determined by calculating an effect size. The effect size is the proportion of the variability in the dependent variable that is caused by the manipulation of the independent variable. With a factorial design (that is, an experiment with more than one independent variable) an effect size can be calculated for each independent variable that is tested. Effect sizes always lie between .00 and 1.00. An effect size of 1.00 means that 100% of the variance in the dependent variable is explained by the independent variable; an effect size of .00 means that none of the variance in the dependent variable is explained by the independent variable. Two common measures of effect size are eta-squared and omega-squared.
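As a small worked example, eta-squared can be computed as the ratio of the between-groups (systematic) sum of squares to the total sum of squares; the SS values below are hypothetical.

```python
# A minimal sketch of eta-squared as the proportion of total variability
# attributable to the manipulation; both SS values are invented.
ss_between = 30.0  # systematic (between-groups) sum of squares
ss_total = 120.0   # total sum of squares

eta_squared = ss_between / ss_total
print(eta_squared)  # 0.25 -> 25% of the variance explained
```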

The t test

To test whether differences in group means are statistically significant, t and F tests can be used. For both tests, the group means that the researcher found are compared with the group means that would be expected if the results were caused by error variance alone. In this part we will discuss the t-test. A t-test is used to analyze a two-group randomized groups experiment.

Performing a t-test consists of five steps; a code sketch implementing them follows the list:

  • Step 1: Calculate the means of the two groups. We call these means x̄1 and x̄2.

  • Step 2: Calculate the standard error of the difference between the two means. This number tells us what difference between the group means we should expect if the data were influenced by error variance alone. To find the standard error, we must first calculate the variances of the two experimental groups.
    The variance of each condition is calculated as follows: s² = [Σxi² − (Σxi)²/n] / (n − 1). Calculate this variance for each condition; in this case twice, for condition 1 and condition 2. Then the pooled variance is calculated: s²p = [(n1 − 1)s²1 + (n2 − 1)s²2] / (n1 + n2 − 2). In this formula, n1 and n2 are the sample sizes of the conditions and s²1 and s²2 are the variances of the two samples. Finally, the square root of the pooled variance is taken, which gives the pooled standard deviation, sp. After all, as discussed earlier, the square root of the variance (s²) is equal to the standard deviation (s).

  • Step 3: Determine the value of t. The t statistic can be calculated from the group means, the pooled standard deviation and the sample sizes: t = (x̄1 − x̄2) / (sp · √(1/n1 + 1/n2)). In words: first calculate x̄1 − x̄2, multiply the pooled standard deviation (sp) by the square root of 1/n1 + 1/n2, and then divide x̄1 − x̄2 by the result.

  • Step 4: Determine the critical value of t. To do this, two steps must be followed. First (1), the degrees of freedom for the t-test must be calculated; the degrees of freedom (df) are equal to the total number of participants minus two (n1 + n2 − 2). Then (2) the alpha level for the test must be chosen. An alpha of .05 is often chosen; an alpha of .01 is also common.

  • Step 5: Determine whether the null hypothesis should be rejected. Do this by comparing the calculated t with the critical value of t, i.e. compare the answer from step 3 with the answer from step 4. The critical value can be looked up in a t-table.
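The sketch below walks through the five steps for two small hypothetical groups, using scipy only to look up the critical value of t instead of a printed t-table.

```python
# The five steps above, sketched in Python for two invented groups.
import math
from scipy import stats

group1 = [5, 7, 6, 8, 9]
group2 = [3, 4, 6, 5, 4]
n1, n2 = len(group1), len(group2)

# Step 1: group means.
m1 = sum(group1) / n1
m2 = sum(group2) / n2

# Step 2: variances, pooled variance, pooled standard deviation.
v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
sp = math.sqrt(sp2)

# Step 3: the t statistic.
t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Step 4: critical value at alpha = .05 (two-tailed), df = n1 + n2 - 2.
df = n1 + n2 - 2
t_crit = stats.t.ppf(1 - 0.05 / 2, df)

# Step 5: reject the null hypothesis if |t| exceeds the critical value.
print(t, t_crit, abs(t) > t_crit)
```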

Paired t test

A paired t-test is used when there is a matched design or repeated measurements. The paired t-test assumes that the participants in the two conditions are similar.

With a matched design, the researcher forms pairs of participants who resemble each other; with repeated measurements, the same participants participate in all conditions. Therefore, matched scores in the two conditions should be positively correlated. This also applies to the repeated measures design: participants who score high in one condition should also score high in the other condition. The paired t-test uses these positive correlations in calculating t. Error variance is automatically reduced in the paired t-test, because of the use of matched pairs of test subjects: the subjects resemble each other in the matched design, or are exactly the same in the repeated measures design. Less error variance makes the pooled standard deviation (sp) smaller, and a smaller pooled standard deviation leads to a larger t value. This yields a test with more power: if the independent variable has a true effect, it is more likely to show up in the test.
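For comparison, scipy offers a ready-made paired t-test; the before/after scores below are invented repeated-measures data.

```python
# A sketch of a paired t-test on hypothetical repeated-measures data.
from scipy import stats

before = [10, 12, 9, 14, 11]
after = [13, 15, 11, 16, 12]  # same participants, second condition

t_stat, p_value = stats.ttest_rel(before, after)
print(t_stat, p_value)  # reject the null hypothesis if p < .05
```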

What are the pros and cons of computer analysis?

In the past, statistical analyses were all performed by hand. Nowadays, computer programs have been developed that can test research hypotheses quickly and accurately. We must keep in mind, however, that data can be entered incorrectly, and even an accurate statistical analysis cannot compensate for that. Researchers who enter data must therefore be very careful and accurate, which can make preparing the statistical analysis a time-consuming process. It is also true that nowadays almost everyone can perform all kinds of statistical analyses with the computer. However, not everyone understands these analyses and the results that the computer program provides, which can lead to incorrect interpretations and conclusions. Researchers should therefore only conduct analyses that they themselves truly understand, so that they have sufficient insight into the conclusions that computerized statistical analysis yields.

Analysis of Complex Designs - Chapter 12

What is a Type I Error?

If we want to study more than two groups, we would have to perform many separate t-tests. The problem is that the chance of a type I error increases when we perform many t-tests. A type I error is the chance that we reject the null hypothesis while in reality it is valid. If we perform one t-test, the chance of a type I error is 5%, but the more tests we perform, the higher this percentage becomes. Researchers try to minimize the chance of a type I error. One strategy is to apply a stricter alpha level, for example by using the Bonferroni adjustment: the alpha (often 5%) is divided by the number of t-tests that must be performed. For example, if we want to perform ten t-tests, we must use an alpha of (0.05 / 10 =) .005 instead of .05 (see the sketch below). The problem, however, is that the Bonferroni adjustment increases the chance of a type II error. As explained in earlier chapters, a type II error is the probability that the null hypothesis is retained while in reality it is not correct. Because the alpha level is so low, the chance increases that the t-tests miss effects that are actually present.
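The adjustment itself is simple arithmetic, as the snippet below shows.

```python
# A one-line sketch of the Bonferroni adjustment for ten planned t-tests.
alpha = 0.05
n_tests = 10
adjusted_alpha = alpha / n_tests
print(adjusted_alpha)  # .005 -> each test is evaluated at this stricter level
```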

For this reason, ANOVA ('analysis of variance') is preferred over the Bonferroni adjustment. ANOVA is used for designs that contain more than two conditions. This type of analysis is not time-consuming, because all group means are analyzed at the same time. ANOVA determines whether there is a statistically significant difference anywhere within a set of group means. The risk of a type I error does not increase by performing an ANOVA.

When and how do we use ANOVA?

When an independent variable has no effect, differences in group means are solely the result of error variance. If the independent variable does have an effect, the differences in group means will be larger than would be expected on the basis of error variance alone. We should then find that the spread between the conditions (the between-groups variance) is greater than the spread within the conditions (the within-groups variance). ANOVA is based on the F test, which compares the variance between conditions ('between-groups variance') with the variance within conditions ('within-groups variance' or 'error variance'). By testing this F ratio, we can estimate the probability that the differences between the condition means are the result of error variance.

How ANOVA works

As described earlier, the total variance of a data set can be divided into systematic variance and error variance; in formula form: total variance = systematic variance + error variance. In a one-way design, ANOVA is used to break the total variance down into these components.

  • The 'sum of squares' (SS) represents the total amount of spread in a data set. The 'total sum of squares' is calculated by (1) subtracting the mean from each score, (2) squaring these differences and (3) adding up the squared differences. The formula is: SStotal = Σ(xi − x̄)².

  • SStotal stands for the total spread in a data set. This spread is divided into (1) the 'sum of squares between-groups' (SSbg) and (2) the 'sum of squares within-groups' (SSwg).

  • To calculate the 'sum of squares within-groups' (SSwg), the spread within each individual condition must first be calculated, and these amounts are then added up. In short: SSwg = Σ(x1 − x̄1)² + Σ(x2 − x̄2)² + … + Σ(xk − x̄k)². The SSwg is a measure of the error variance, for the following reason: within each condition, all test subjects encounter the independent variable to the same extent. The spread found within a specific condition can therefore not be due to the independent variable, so it cannot be systematic variance; it must be error variance. In order to get an idea of the mean spread within the experimental conditions, we divide SSwg by n − k, where n refers to the total number of participants and k to the number of conditions. We also refer to n − k as the 'within-groups degrees of freedom', abbreviated dfwg. We first calculate the degrees of freedom and then divide SSwg by them. The result is called the 'mean square within-groups': MSwg = SSwg / dfwg. This gives us an estimate of the error variance; the within-groups variance is the same as the error variance.

  • The 'sum of squares between-groups' (also called 'sum of squares for treatment') represents the variance that arises as a result of the manipulation of the independent variable. The calculation of SSbg is based on the idea that all group means should be approximately the same if the independent variable had no effect: each group mean should then be almost equal to the mean of all groups together (the 'grand mean'). However, if the independent variable has an effect, the group means will differ from the grand mean. To calculate SSbg, we use the following formula:
    SSbg = n1(x̄1 − GM)² + n2(x̄2 − GM)² + … + nk(x̄k − GM)², where GM stands for the grand mean. SSbg also has its own degrees of freedom ('between-groups degrees of freedom'): dfbg = k − 1, where k again stands for the number of conditions. To find the 'mean square between-groups' (MSbg), we use the formula MSbg = SSbg / dfbg.

The F test

We always expect to find a difference between the group means. However, what we do not know is whether the differences between the group means are greater than we would expect based on the error variance alone. Researchers would like to find this out. In order to do this, they perform an F test.

We find the F value as follows: F = MSbg / MSwg. If the independent variable has had no effect, the numerator and the denominator are not very different, so the F value will be around 1.00. However, if the independent variable does have an effect, MSbg should be relatively large. The question is how large F must be to speak of a significant difference. To find out, we look up the critical value of F in the F table. To find this value, we must (1) choose an alpha level (often .05), (2) calculate the between-groups degrees of freedom (dfbg) and (3) calculate the within-groups degrees of freedom (dfwg). If the F value found is greater than the critical F value in the table, we can reject the null hypothesis. Rejecting the null hypothesis means that at least one of the group means differs from another group mean, or from several other group means. The sketch below walks through the whole calculation.
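This is a sketch of the entire one-way ANOVA pipeline described above, for three hypothetical conditions: the SS decomposition, the mean squares, the F value and the critical value from the F distribution (looked up with scipy instead of a printed table).

```python
# One-way ANOVA from scratch on invented data for three conditions.
from scipy import stats

groups = [[4, 5, 6, 5], [7, 8, 6, 7], [9, 10, 8, 9]]
n_total = sum(len(g) for g in groups)
k = len(groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# SSwg: spread within each condition, summed over conditions.
ss_wg = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
# SSbg: weighted squared deviations of group means from the grand mean.
ss_bg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

df_bg, df_wg = k - 1, n_total - k
ms_bg, ms_wg = ss_bg / df_bg, ss_wg / df_wg

f_value = ms_bg / ms_wg
f_crit = stats.f.ppf(1 - 0.05, df_bg, df_wg)  # alpha = .05
print(f_value, f_crit, f_value > f_crit)  # reject H0 if F > critical F
```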

Expansion of ANOVA to factorial designs

In a one-way ANOVA, the total variance can be divided into systematic variance and error variance, also referred to as between-groups and within-groups variance: between-groups variance is the same as systematic variance, and within-groups variance is the same as error variance. In formula form: SStotal = SSbg + SSwg.

In an ANOVA with more than one independent variable, the systematic variance can be further subdivided, for example into the parts belonging to variables A and B. For each part we can calculate the sum of squares (SS) and the mean square (MS).

Taken together, the total variance now consists of four components, namely:

  1. The error variance

  2. The main effect of variable A (SSA and MSA)

  3. The main effect of variable B (SSB and MSB)

  4. The interaction between A and B (SSAxB and MSAxB)

In schematic form this looks as follows: SStotal = SSA + SSB + SSAxB + SSwg.

This means that there can be no other influence on the total besides the effects of variable A, B, AxB and the error.

To calculate the systematic variance caused by independent variable A (SSA), we ignore variable B and only calculate the between-groups variance for variable A. If independent variable A has no effect, we expect the means of the different conditions of A to be approximately equal to the mean of all groups together, also known as the grand mean (GM). However, if A does have an effect, the means of the conditions of A will differ from the grand mean.

Schematically, we can calculate SSA in the following way:

SSA = na1(x̄a1 − GM)² + na2(x̄a2 − GM)² + … + naj(x̄aj − GM)²

In this formula, the grand mean (GM) is subtracted from the mean of each condition of A (x̄aj), the result is squared, and the squared answers are added up. By simply replacing A by B in the formula, we calculate SSB: SSB = nb1(x̄b1 − GM)² + … and so on.

If we divide the result of these formulas by the number of degrees of freedom ('degrees of freedom', or df), we get the mean square (MS) for the variable in question. The number of degrees of freedom is found as follows: the number of conditions of the relevant variable minus one.

For example: MSA = [na1(x̄a1 − GM)² + na2(x̄a2 − GM)² + … + naj(x̄aj − GM)²] / dfA

dfA = the number of conditions of A minus 1

If the variance in the outcomes of the participants can be attributed entirely to the main effects of A and B and there is no interaction, the formula looks like this: SStotal = SSA + SSB + SSwg.

However, it may also happen that the sum of SSA, SSB and SSwg turns out to be less than the total (SStotal). In that case there is an interaction between the variables (AxB). The interaction can therefore be calculated by subtracting SSA, SSB and SSwg from the total. In formula form: SSAxB (interaction) = SStotal − (SSA + SSB + SSwg).
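A tiny worked example of this subtraction, with hypothetical SS values:

```python
# Recovering the interaction sum of squares by subtraction;
# all SS values here are invented numbers.
ss_total = 120.0
ss_A, ss_B, ss_wg = 30.0, 20.0, 50.0

ss_AxB = ss_total - (ss_A + ss_B + ss_wg)
print(ss_AxB)  # 20.0 -> variance attributable to the A x B interaction
```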

The F test

The F tests in a factorial design (for example, a 2x2 design) are calculated as follows:

FA = MSA / MSwg

FB = MSB / MSwg

FAxB = MSAxB / MSwg

An F test is statistically significant when the calculated F value is greater than the critical value (F > F*). If this is the case, then we know that at least one of the group means differs from the other means. To be able to interpret the results properly, the means for the significant effects are calculated first. For example, if the main effect of A is significant, the means of the conditions of A must be calculated; variable B is not taken into account in this calculation.

No further testing is required if a significant independent variable has only two conditions: the significant F test then indicates that the two means differ significantly. However, an independent variable may also have three or more conditions. A significant result then indicates that there is a significant difference between at least two condition means, but not exactly which ones. In that case, a follow-up test is used. Follow-up tests are also referred to as 'post hoc tests' or 'multiple comparisons'; the most commonly used are the LSD (Least Significant Difference) test, Tukey's test and the Newman-Keuls test. They determine which mean differs from the others, after an F test has established that there are means that differ. A follow-up test may only be used if the F test calculated beforehand is significant: after all, if an F test is not significant, no differences between the means were found, and the null hypothesis is retained. A sketch of this two-step procedure follows below.
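A sketch of this two-step logic on invented data: an omnibus F test first, and only if it is significant, a Tukey post-hoc test (available in SciPy 1.8 and newer as stats.tukey_hsd).

```python
# Omnibus F test followed by a Tukey post-hoc comparison; data are made up.
from scipy import stats

g1, g2, g3 = [4, 5, 6, 5], [7, 8, 6, 7], [9, 10, 8, 9]

f_value, p_value = stats.f_oneway(g1, g2, g3)
if p_value < 0.05:
    # Only after a significant F do we ask *which* means differ.
    result = stats.tukey_hsd(g1, g2, g3)
    print(result)
```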

Interactions

When an interaction is found (AxB), we examine the simple main effects. A simple main effect is the effect of one independent variable at a particular level of another independent variable; in other words, it is a main effect examined at a specific level of the other variable. The four simple main effects in a 2x2 design are:

  • The simple main effect of A at B1

  • The simple main effect of A at B2

  • The simple main effect of B at A1

  • The simple main effect of B at A2

Testing the simple main effects shows us which condition means differ from each other. Simple main effects are therefore only calculated once it has been established that there is an interaction.

How does MANOVA differ from ANOVA?

Sometimes researchers want to test the differences between conditions on several dependent variables simultaneously. T-tests and ANOVAs can only be performed when there is one dependent variable. MANOVA ('multivariate analysis of variance') is used to test the effects of two or more conditions on two or more dependent variables. The question, of course, is why we would not simply perform a separate ANOVA for each dependent variable. There are two reasons:

  • Sometimes the measured dependent variables are interrelated. For example, they can be part of a general construct. In such a case, a researcher may believe that it is better to analyze the variables as a set, rather than separately.

  • Remember that the more tests are performed, the greater the chance of a type I error. For this reason we prefer one ANOVA over multiple t-tests. The same logic applies here: the risk of a type I error increases when we perform t-tests or ANOVAs on multiple dependent variables. The more dependent variables we study, the greater the chance that we will find significant differences that result from a type I error instead of from the independent variable. Because MANOVA tests the differences between group means over several dependent variables simultaneously, the alpha remains 5% and the chance of a type I error does not increase.

The procedure of MANOVA

MANOVA uses a new variable, called a 'canonical variable'. The canonical variable is a kind of weighted sum of all the dependent variables in the study; it gives us one composite variable in which we are interested. During MANOVA, a multivariate version of the F test is performed. This allows us to determine whether participants' scores on this composite variable differ between the experimental conditions.

If the multivariate F test is significant, we can conclude that the experimental manipulation has affected the set of dependent variables as a whole. MANOVA is thus used to keep the chance of a type I error low. A significant multivariate F test justifies the researcher performing separate ANOVAs for each variable: MANOVA shows that the groups differ on something (on at least one of the dependent variables), but additional tests must be performed to see exactly where these differences come from. If the MANOVA is not significant, it is risky to perform individual ANOVAs per dependent variable.
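As an illustration, statsmodels provides a MANOVA via its formula interface; the data frame below contains invented scores on two dependent variables, and all column names are hypothetical.

```python
# A sketch of a MANOVA on two hypothetical dependent variables.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

data = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4,
    "dv1": [4, 5, 6, 5, 8, 9, 7, 8],
    "dv2": [2, 3, 2, 3, 5, 6, 5, 6],
})

# Both dependent variables are tested against the grouping factor at once.
manova = MANOVA.from_formula("dv1 + dv2 ~ group", data=data)
print(manova.mv_test())
```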

Quasi-experimental Designs - Chapter 13

What is the function of manipulation?

Often, we cannot manipulate variables in such a way that we can make a statement about causality. For example, we cannot manipulate gender. Also, for ethical reasons, we are sometimes unable to conduct research in which certain variables are manipulated. When it is not possible or desirable to manipulate independent variables, a quasi-experimental design is chosen . The independent variable that is not manipulated is called the quasi-independent variable .

The internal validity of experimental designs is high: experimental designs can clearly demonstrate when independent variables cause changes in the dependent variable. Because the researcher has control over external variables, experimental designs can demonstrate that the effect on the dependent variable is not due to anything other than the independent variable. Quasi-experimental designs have a lower internal validity than experimental designs, because participants cannot be randomly assigned to the different conditions. Yet quasi-experimental designs can provide considerable (indirect) evidence for cause-effect relationships. How good a quasi-experimental design is depends on the extent to which the design has control over external factors. A distinction is made between several types of quasi-experimental designs, described below.

One-group pretest-posttest design

As the name suggests, a pre-test and a post-test are used in a one-group pretest-posttest design. Schematically this looks as follows: O1 -> X -> O2, where O1 is the pre-measurement, X the manipulation of the independent variable and O2 the post-measurement. This design is not very good, because its internal validity is low: factors that threaten internal validity are not eliminated. For example, an important event may have occurred between the pre- and post-measurement ('history effect'). It is also possible that the responses of the test subjects are influenced by the fact that they participated in the pre-measurement ('testing effect').

The internal validity of a one-group pretest-posttest design can also be threatened by regression to the mean: the tendency of extreme scores to move toward the mean of a distribution upon repeated measurement. Measurement errors arise that mask the test subjects' true scores; the more measurements take place, the greater the chance of measurement errors that cause an (incorrect) rise or fall in the scores. In fact, the one-group pretest-posttest design is called a 'pre-experimental design' rather than a quasi-experimental design, because it offers no experimental control and no internal validity. In other words: the design does not meet the basic conditions of an experimental (or quasi-experimental) study, and is therefore given this different name.

Non-equivalent control group design

To build a stronger study than the one-group pretest-posttest design, the researcher can add control groups. It is not possible to use a real control group, however, because test subjects cannot be randomly assigned to the groups. Instead, a non-equivalent control group can be used. With a 'non-equivalent control group design', the researcher looks for one or more groups of participants that resemble the group on which the quasi-independent variable is being tested. Non-equivalent control group designs come in two forms:

  • 'Non-equivalent groups post-test-only design': this involves only a post-measurement. The quasi-experimental group first receives the manipulation (X) and then a post-measurement (O). The non-equivalent control group only receives a post-measurement (O). Schematically, the non-equivalent groups post-test-only design looks like this:
    Quasi-experimental group: X -> O
    Non-equivalent control group: - -> O

    If it appears that the groups differ from each other on the post-measurement, we do not know whether this is due to the manipulation (X). It is also possible that the groups already differed before the experiment; when that is the case, we speak of a selection bias. This design is very weak, because it is not clear what factors cause the differences in the post-measurement, and the internal validity is very low. It is therefore not recommended to use the non-equivalent groups post-test-only design.

  • 'Non-equivalent groups pretest-posttest design': this design is almost the same as the non-equivalent groups post-test-only design, but a pre-measurement is added for both the quasi-experimental group and the control group. In schematic form this design looks like this:
    Quasi-experimental group: O1 -> X -> O2
    Non-equivalent control group: O1 -> - -> O2

    With this design we can see whether the groups already differed from each other before the research started. Nevertheless, there are also threats to internal validity in this design. For example, it is possible that one group is exposed to a situation to which the other group is not exposed (a 'local history effect' or 'selection-by-history interaction').

Time series designs

With a 'time series design', the dependent variable is measured at several moments before and after the introduction of the quasi-experimental variable, so there are multiple pre-measurements and multiple post-measurements. There are five different forms of this design:

  • 'Simple interrupted time series design': with this design there are several pre- and post-measurements. The simple interrupted time series design has the following form:
    O1 -> O2 -> O3 -> O4 -> X -> O5 -> O6 -> O7 -> O8.
    The measurements are, so to speak, interrupted by the introduction of the quasi-experimental variable (X). If X is really effective, there should be a difference between O4 and O5. With this design, however, the internal validity is also threatened. This threat is called 'contemporary history': we cannot rule out that the observed effects were caused by another event that occurred at the same time as the introduction of the quasi-experimental variable.

  • 'Interrupted time series with a reversal': with this design there are pre-measurements and post-measurements around the introduction of the quasi-experimental variable. However, the quasi-experimental variable is removed again after the measurement, and after this removal additional post-measurements are made. This design has the following shape: O1 -> O2 -> O3 -> O4 -> X -> O5 -> O6 -> O7 -> O8 -> -X -> O9 -> O10 -> O11 -> O12. If after the removal of the quasi-experimental variable the scores return to the level of the pre-measurements, we can state that the quasi-experimental variable probably caused the effect.

  • 'Interrupted time series design with multiple replications': this design is similar to the interrupted time series with a reversal. With this design, however, the quasi-experimental variable is introduced, then removed, then introduced again and finally removed again. Schematically, the interrupted time series design with multiple replications looks like this: O1 -> O2 -> O3 -> X -> O4 -> O5 -> O6 -> -X -> O7 -> O8 -> O9 -> X -> O10 -> O11 -> O12 -> -X -> O13 -> O14 -> O15.
    This design has three drawbacks: (1) sometimes it is not possible to remove the quasi-experimental variable, (2) the effects of the quasi-experimental variable may persist even though the variable has been removed and (3) the removal of the quasi-experimental variable can cause changes that are not necessarily due to the effects of this variable.

  • 'Control group interrupted time series design': in this case a number of pre-measurements are carried out with the quasi-experimental group, then the quasi-experimental variable is introduced and finally more measurements are made. The same is done with a non-equivalent control group that resembles the quasi-experimental group as much as possible. The following schema applies to the quasi-experimental group: O1 -> O2 -> O3 -> O4 -> X -> O5 -> O6 -> O7 -> O8. The following schema applies to the control group: O1 -> O2 -> O3 -> O4 -> - -> O5 -> O6 -> O7 -> O8.
    This design excludes the influence of history effects, since both groups experience the same situational factors. If one group scores differently on the post-measurements, this must be due to the quasi-experimental variable; the differences cannot be due to history effects.

  • 'Comparative time series design': this design is also called 'comparative trend analysis'. Two or more variables are studied over time; the researcher mainly tries to understand how changes in one variable are related to changes in the other variable.

If it turns out that changes in one variable are accompanied by changes in the other variable, this provides indirect evidence of causality. As is always the case with quasi-experimental designs, the comparative time series design provides no certainty when drawing conclusions about causality, but it can provide insight into it.

What are longitudinal studies?

In longitudinal studies, the quasi-experimental variable is time itself; consequently, there is no intervention or manipulation of an independent variable. The procedure works as follows: measurements are taken at different points in time to see whether the scores change. Longitudinal studies are often used by developmental psychologists to investigate age-related developmental changes, with the main focus on changes in how people think, feel and behave. Longitudinal research is very informative, but has three disadvantages: (1) it is difficult to find people who are willing to participate again and again, (2) many participants eventually drop out, for example because they pass away or move, and (3) it costs a lot of time, money and effort to test people over a period of several years.

Another option is to perform a cross-sectional design, which involves testing groups of people of different ages at the same time. The disadvantage of a cross-sectional design, however, is that age-related changes cannot be distinguished from generational effects: people of different ages differ not only in their age, but also in their living conditions and environment. An additional advantage of longitudinal over cross-sectional studies is that small changes over time can be discovered with a longitudinal design, which is not possible with a cross-sectional design. It is important, however, to exclude alternative explanations when certain changes have been identified through a longitudinal study.

Program evaluation

Quasi-experimental designs are often used in program evaluation. A program evaluation is a type of behavioral research that analyzes interventions (also called manipulations or programs) intended to influence behavior. Because such interventions often cannot be manipulated by the researcher, a quasi-experimental design is frequently used. The primary purpose of a program evaluation is to provide information to the people involved in the programs. In some cases, however, it is possible to conduct a real experiment, by assigning some people to one program and other people to another program. Still, people who have the task of evaluating programs often have no control over which programs they analyze.

Evaluate quasi-experimental designs

Scientists used to consider quasi-experimental designs unreliable, because they give less certain results than experimental designs do. In experiments, it is possible to randomly assign people to groups that represent different manipulations of the independent variable. This increases the internal validity, so that we can say with confidence that the independent variable caused a difference in the dependent variable. Nevertheless, it is still important to conduct quasi-experimental studies: if we did not, we could never investigate variables that we cannot manipulate. Note, however, that quasi-experimental studies do not guarantee internal validity nearly as well as experimental studies, so a statement about causality cannot be made with certainty. To be able to speak of causality, three conditions must be met: (1) the presumed cause must precede the effect, (2) covariance must occur and (3) all alternative explanations must be excluded, by randomly assigning people to conditions or by experimental control. Quasi-experimental studies meet the first two conditions, but not the last: they have little control over external variables that could be the cause of changes in the dependent variable.

Improve quasi-experimental designs

Because quasi-experimental designs can never provide perfect evidence for the existence of a cause-effect relationship, it is important to engage in 'critical multiplism'. This means that we must try to find evidence for a hypothesis in as many different ways as possible. In doing so, more and more research results emerge that support the existence of a specific cause-effect relationship; the sum of all the evidence provides more conviction and more certainty about a causal relationship between the two variables in question.

Single-case Experimental Designs - Chapter 14

What approaches are there when analyzing behavior?

It is important to remember that the effect of an independent variable discovered in a study describes only the average person. That is to say, the effect of the independent variable does not necessarily apply to every person in the population; there are also people on whom the independent variable has no effect.

Scientists have long debated whether human behavior should be analyzed according to the nomothetic approach or the idiographic approach, which are described below.

  • The nomothetic approach states that we must look for general principles and generalizations that we can apply to all individuals. But as described above, it rarely happens that a research result applies to all individuals of the entire population.

  • The idiographic approach states that the behavior of individual participants must be described, analyzed and compared. There are several designs that researchers can use to manipulate independent variables while maintaining experimental control over external variables. The behavior of individual participants is then analyzed (and not the average behavior of the entire group).

  • Other scientists state that case studies are important. Case studies describe one group or one individual and do not use experimental control. Single-case experiments, on the other hand, do involve experimental control.

What do single-case experimental designs look like?

In the designs discussed so far, researchers study the effects of independent variables by comparing the average responses of two or more groups; these are also called group designs. In a single-case experimental design, the experimental group is not analyzed as a whole; each individual is analyzed separately. Typically, three to eight participants are analyzed. With single-case experimental designs, the mean of the test results is almost never calculated: because the researcher looks at the individual, the group mean does not matter. Because no mean is calculated, it is also not possible to use inferential statistics such as the t-test or the F-test.

What criticism is there on group designs?

Scientists who are in favor of single-case experimental designs criticize group designs. According to these critics, the use of a group design does not guarantee (1) the elimination of error variance, (2) generalizability or (3) reliability.

1. Error variance

There is always a certain amount of error variance in a data set. Group designs try to deal with error variance in two ways. First, they calculate the mean of the responses of the participants in a group; with this method they try to estimate the effect of the independent variable. Secondly, the error variance can be taken into account by using groups: we can then analyze whether the differences between groups are greater than we would expect on the basis of error variance alone.

Opponents of group designs state that the error variance in a group is not simply the result of differences in behavior between participants; according to them, being tested in a group itself causes error variance. They also argue that scientists who use group designs deal with error variance too easily. In their view, group designs focus too much on differences between people ('interparticipant variance'), while the focus should be on differences within individuals themselves ('intraparticipant variance'). Error variance is partly the result of differences within individuals, and those differences deserve attention; according to proponents of single-case experimental designs, group designs do not pay enough attention to this. They state that we must first find out what the causes of error variance are; only then can we eliminate it.

2. Generalizability

In group designs, the group mean found for a condition often does not match the individual scores of the test subjects; it is almost never the case that the mean exactly matches an individual score. Opponents of group designs therefore argue that group means that do not match individual responses should not be used for scientific analysis.

3. Reliability

A third point of criticism of group designs is that the experiment is often not repeated: it is checked only once whether an independent variable has had an effect, and the conclusions are not supported by further research. This is at the expense of the reliability of the research results. In single-case experimental designs, the effects of an independent variable are replicated in two ways: (1) intraparticipant replication, replicating the effects of the independent variable within an individual, and (2) interparticipant replication, replicating the effects of the independent variable in more than one subject. Interparticipant replication allows the researcher to determine whether the results of one test subject can be generalized to other test subjects. With a single-case experimental design, each individual is analyzed individually; subjects are never analyzed as a group, even when several subjects are studied at the same time.

What types of single-case experimental designs are there?

A distinction is made between three types of single-case experimental designs.

1. ABA design

An ABA design demonstrates the influence of the independent variable in two steps: first by showing that introducing the variable changes the behavior, and then by showing that removing the variable makes the behavioral change disappear again.

The ABA design is also called a 'reversal design'. The test subjects are measured repeatedly. First, the participant is observed without the influence of the independent variable; this is called the baseline level. The independent variable is then introduced, after which the behavior of the participant is measured again. If the independent variable has influenced the behavior of the participant, the behavior should have changed. However, it is possible that this behavioral change was caused by some factor other than the independent variable. That is why the independent variable is subsequently removed, which allows us to see whether or not the behavior returns to the baseline level. The researcher can perform this procedure on multiple test subjects. If removing the independent variable causes the behavior of all subjects to return to the baseline level, this provides evidence for the effectiveness of the independent variable. In a simple ABA design, we first establish a baseline level (A), then introduce the independent variable (B), and then measure the baseline level again (A). Sometimes a researcher chooses to re-introduce the independent variable after the second baseline level; this gives an ABAB design. It is also possible to do this (much) more often, for example in an ABABABABA design.
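As an illustration, the following minimal sketch simulates one participant in an ABA design (the response levels and phase lengths are made up): the pattern that counts as evidence is a clear shift during B and a return to the original level in the second A phase.

```python
# Simulate one participant in an ABA (reversal) design and summarize phases.
import numpy as np

rng = np.random.default_rng(1)

phase_a1 = rng.normal(10, 1, size=8)  # baseline (A)
phase_b = rng.normal(18, 1, size=8)   # independent variable introduced (B)
phase_a2 = rng.normal(10, 1, size=8)  # independent variable removed (A)

for label, phase in [("A1", phase_a1), ("B", phase_b), ("A2", phase_a2)]:
    print(f"phase {label}: mean response {phase.mean():.1f}")
# Evidence for an effect: B clearly differs from A1, while A2 returns to the
# A1 level, making an extraneous cause for the change less plausible.
```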

2. Multiple-I design

An ABA design does not use different levels of the independent variable. A single-case experimental design that does use different levels of the independent variable is called a 'multiple-I design', also referred to as an ABC design. The researcher first establishes a baseline level (A). Then, he or she introduces a certain level of the independent variable (B) for a certain period of time. Finally, this level is replaced by another level of the independent variable (C).

Another form of the multiple-I design is the ABACA design. In that design, the baseline level (A) is measured first, then a certain level of the independent variable is introduced (B), after which this level is removed and the baseline level is determined again (A). Then another level of the independent variable is introduced (C). Finally, C is also removed and the baseline level (A) is established once more.

In this way, many different types of multiple-I designs can be constructed, all based on the same principle.

3. Multiple baseline design

Sometimes it is difficult to remove an independent variable. In such a case, a multiple baseline design is used: two or more behaviors are studied simultaneously. First, a baseline level is established for all behaviors. An independent variable is then introduced that is thought to affect only one of the behaviors. In this way, the researcher can demonstrate that the independent variable has affected only that one behavior, while the other behaviors have remained unchanged.
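A minimal sketch of this logic, again with hypothetical numbers: two behaviors are tracked over twenty observations, the independent variable is introduced halfway through, and only the targeted behavior is expected to shift.

```python
# Multiple baseline sketch: track two behaviors; the independent variable
# (introduced at observation 10) should affect only behavior 1.
import numpy as np

rng = np.random.default_rng(2)
n_obs, onset = 20, 10

behavior1 = rng.normal(5, 0.5, size=n_obs)
behavior1[onset:] += 6                      # only this behavior responds
behavior2 = rng.normal(7, 0.5, size=n_obs)  # stays at its baseline level

for name, series in [("behavior 1", behavior1), ("behavior 2", behavior2)]:
    print(f"{name}: before={series[:onset].mean():.1f}, "
          f"after={series[onset:].mean():.1f}")
# Behavior 1 shifts after the intervention while behavior 2 does not,
# which supports a causal role for the independent variable.
```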

Single-case experimental designs and data

For single-case experiments, group means are not considered, so the variance and the standard deviation cannot be calculated either; after all, you need means for that. Nor can researchers use t- and F-tests for single-case experimental designs, so we cannot test whether the results are statistically significant. We can, however, graphically display the results of a single-case experimental design per participant ('graphic analysis' or 'visual inspection'). The events (A, B, C) are displayed on the horizontal axis (the x-axis) and the responses on the vertical axis (the y-axis). The different points are connected by a line. This allows us to see whether an event (a point on the x-axis) has caused an increase or decrease in a certain response.
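Such a graph is easy to produce. The sketch below plots the responses of one hypothetical participant in an ABA design with matplotlib, marking the phase transitions on the x-axis (the response values and phase lengths are invented for illustration).

```python
# Visual inspection ('graphic analysis') of a single-case ABA design.
import matplotlib.pyplot as plt

# Hypothetical responses: 4 baseline (A), 4 treatment (B), 4 baseline (A).
responses = [10, 9, 11, 10, 17, 18, 19, 18, 11, 10, 9, 10]

plt.plot(range(1, len(responses) + 1), responses, marker="o")
# Mark the events (phase transitions) on the x-axis.
for boundary, label in [(4.5, "introduce IV (B)"), (8.5, "remove IV (A)")]:
    plt.axvline(boundary, linestyle="--", color="gray")
    plt.text(boundary, max(responses), label, rotation=90, va="top")
plt.xlabel("observation (phases A, B, A)")
plt.ylabel("response")
plt.title("Visual inspection of a single-case ABA design")
plt.show()
```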

A disadvantage of this method of graphical representation is that we do not know whether the effect on behavior is large enough to speak of a real effect; we cannot test this, because no t-test or F-test can be performed. An advantage of the graphical display is that such a graph looks simple and clear. Proponents of single-case experimental designs state that visual inspection is actually better than inferential statistics: an independent variable is only said to have had an effect if the effect appears to be really large, whereas with inferential statistics even very small effects can come out as statistically significant.

When are single-case experimental designs used?

In the past, single-case experimental designs were used very often; Pavlov, Thorndike, Ebbinghaus, and Weber, for example, were proponents of this type of design. Today, single-case designs are mainly used to study operant conditioning, or to investigate whether behavioral change techniques are effective. With this design it can also simply be demonstrated that a behavioral effect exists, as long as the correct independent variable is used.

What criticism is there of single-case experimental designs?

Results obtained with single-case experimental designs are often easier to generalize than results from group designs. However, single-case designs also have their disadvantages.

The external validity of single-case experimental designs is not by definition better. External validity refers to the extent to which research results can be generalized to the population, and generalizability depends on how the participants are selected. Animal studies often use a single-case experimental design, because their results are often easier to generalize: the animals being examined usually grew up in the same environments and are genetically very similar. If an independent variable has had an effect, this immediately applies to all animals of the same species. A limitation of single-case experimental designs is that they cannot be used properly to study interactions between variables. Finally, single-case experimental designs are sometimes associated with ethical problems.

What are case studies?

A 'case study' is a detailed description of an individual, group, or event. For case studies, researchers can make use of observations, interviews, questionnaires, newspaper articles, and archive material. Based on all this information, a description ('narrative description') of the person, group, or event is put together.

The use of case studies

A case study can be chosen for four reasons:

  1. A case study can serve as a source of insight and ideas. It allows a researcher to learn more about the phenomenon that he or she is investigating, to get ideas, and to think about how the research can best be performed.

  2. A case study can be used to describe rare events. When a rare disorder is studied, a case study is often chosen: because few people suffer from the disorder, it is important to study an individual with this rare disorder in detail to learn more about it.

  3. A case study can be used to write a psychobiography about someone. In psychobiography, concepts and theories from psychology are applied to analyze the lives of famous people. A psychobiography is always a retrospective description, so it is not possible to test whether such a description is accurate.

  4. Finally, a case study can be used to present real anecdotes instead of just reporting empirical results. Anecdotes make people quicker to believe what the researcher has found.

Limitations of case studies

Case studies can never be used to test hypotheses, because a case study offers no control over extraneous variables. For example, if a researcher detects a phenomenon through a case study, he or she cannot say for sure why this phenomenon occurs, since he or she has no control over the variables that may affect the person being examined.

Another limitation is that often only one researcher is involved in a specific case study; that researcher is frequently the participant's own psychotherapist. We therefore cannot determine how reliable and valid the researcher's findings are.

Ethical Issues in Behavioral Research - Chapter 15

In behavioral research, it can sometimes happen that certain (most often randomly selected) test subjects are disadvantaged. For example, they may experience a loss of self-esteem or a decrease in self-confidence after their participation in the experiment. This raises ethical questions. Is it acceptable to mislead test subjects or to harm them? And how much pressure (psychological or physiological) may researchers put on their test subjects? Behavioral researchers have been struggling with these questions for years.

How do we make ethical decisions?

Behavioral researchers have two types of obligations, which sometimes conflict with each other; ethical issues may arise during such a conflict. On the one hand, behavioral researchers must provide information that leads to an understanding of behavioral processes, and this understanding must then lead to an improvement in the well-being of people or animals. As a consequence of this first obligation, scientists may only conduct research if the aim, from an ethical perspective, is to increase our knowledge or solve problems. On the other hand, behavioral researchers are obliged to guarantee the rights and well-being of the human and non-human subjects they investigate. Ethical issues arise when the researcher's obligations to science and society (the first obligation) clash with the obligation to protect the test subjects (the second obligation).

In addition, there is much disagreement about the ethics of the procedures of certain experiments; even on the fundamental principles of ethics, opinions differ. This often results in an impasse (stagnation) in ethical conflicts: at such a moment it is unclear which ethical decisions should be made, and whether these decisions can be made at all.

There are three general approaches that researchers apply when trying to resolve ethical issues. These approaches differ from each other in the criteria used to judge what is right and what is wrong. The three approaches are: deontology, ethical skepticism, and utilitarianism.

  • The first approach is deontology. According to deontology, ethics must be assessed by means of a universal moral code. From this perspective, some actions are inherently unethical and may not be carried out under any circumstances.

  • Ethical skepticism, on the other hand, states that concrete, established moral codes do not exist, which goes against the principles of deontology. Skeptics do not deny that ethical principles are important; rather, they argue that ethical rules are arbitrary. Whether a certain action is labeled right or wrong depends on the culture and the point in time.

  • The third approach is utilitarianism. This approach states that the ethical assessment of a particular action depends on the consequences of that action: the potential benefits must be weighed against the potential costs. If the potential benefits exceed the costs, then the action concerned is ethically permitted. Most official research guidelines, such as those of the American Psychological Association (APA), mainly follow this utilitarian approach.

What fundamental ethical guidelines are there?

It is clear that researchers disagree on ethical issues. However, all researchers are always bound by two sets of ethical guidelines. The first consists of principles formulated by professional organizations, such as the APA (American Psychological Association). The APA has, for example, drawn up the so-called Ethical Principles of Psychologists and Code of Conduct, which sets out the ethical standards that psychologists must follow in all areas of their professional lives: in therapy, evaluation, teaching, and research. In addition, all behavioral researchers are bound by regulations that are enacted by the government or laid down in local laws.

Both sets of ethical guidelines are primarily utilitarian or pragmatic: they state that the researcher must weigh the potential benefits against the potential costs. What they do not do is establish specific ethical do's and don'ts for research.

Weighing up potential benefits against potential costs is also referred to as cost-benefit analysis.

Potential benefits

Behavioral research has five potential benefits, which must be taken into account when preparing a cost-benefit analysis.

  • Basic knowledge: The biggest benefit of research is that it increases our knowledge. The more knowledge the research provides, the greater the benefits, and thus the more costs and risks can be justified.

  • Improving research techniques: Some research is done to improve research procedures. This does not immediately increase knowledge, but it does improve research methods. As a result, knowledge may be increased in the future.

  • Practical outcomes: There are also studies that provide practical outcomes. They do this by directly improving the well-being of people or other living beings.

  • Benefits for researchers: When you set up a study, you usually want to benefit from it yourself. Scientists often have to conduct research to build a career or keep their job. Research also has an educational function, not only for experienced researchers but also for students.

  • Benefits for test subjects: The people participating in the study can also benefit from their participation. This is most evident in clinical research: subjects receive experimental therapies that may cure them. Subjects can also learn a lot from their participation, for example about how scientific research works. Finally, it can simply be fun to participate in a study.

Potential costs

The benefits mentioned must be weighed against the potential risks and costs of the research. Some costs are relatively small and thus negligible; consider, for example, the time and effort that the test subjects put into the study. Higher costs are the possible damage to the mental or physical well-being of the test subjects: participating in a study can, for example, lead to a reduction in self-confidence, stress, boredom, anxiety, or pain. Subjects may also fear that others will find out about their responses. The largest costs are incurred in studies that endanger the health or life of the test subjects.

Other costs lie in the area of money: conducting research can be very expensive.

What is the role of the Institutional Review Board (IRB)?

When all potential costs and benefits have been determined, they must be weighed against each other. The researchers themselves are often not the most objective assessors in such a case. That is why guidelines have been drawn up stipulating that every study in which people participate must be approved by an Institutional Review Board (IRB). Every institution that receives money from the government must have its own IRB. To protect the test subjects as well as possible, each IRB must include both people from scientific disciplines and people from non-scientific disciplines.

Researchers who use human test subjects must submit a written request to the IRB. This request states: (1) the purpose of the study, (2) the procedures that will be used, and (3) the potential risks that the test subjects run.

Six issues are important in research involving people: informed consent, invasion of privacy, forced participation, physical and mental stress, deception, and confidentiality.

What is the principle of informed consent?

One of the most important ways to protect the rights of test subjects is to ask them to sign an informed consent form prior to the experiment. An informed consent form informs the test subjects about the purpose of the study and requires their explicit permission to participate. Using informed consent ensures that researchers do not abuse the test subjects and that they respect the subjects' privacy. Finally, the test subjects receive sufficient information about the research, which allows them to consciously decide whether or not they want to participate in the experiment.

However, the informed consent form cannot provide the participants with complete information about the study; researchers may, for example, keep information about the hypotheses secret. If the subjects will suffer pain during the study, however, the researchers must of course not withhold this from them. Everything that could influence the test subjects' decision about their participation must be mentioned in the informed consent form.

Problems with informed consent

Sometimes researchers do not want to provide all information in their informed consent form. This raises a number of problems with informed consent:

  • Problems with validity: Informing test subjects about the research can lead to a reduction in validity, because people often behave differently when they know they are being observed. In addition, making the purpose of the study known can make test subjects sensitive to certain aspects of their behavior. Normally they would not pay attention to these aspects, but because they are made aware of them, they become more conscious of their behavior. This can influence the research results.

  • Some test subjects cannot give informed consent. Think of children or the mentally disabled. When they are examined, informed consent must be obtained from the parents or legal representatives of the test subjects.

  • Experiments that do not require informed consent. Sometimes it is not useful to use informed consent, for example in purely observational studies in public places.

Invasion of privacy

The researcher him- or herself is responsible for deciding whether or not the privacy of participants is violated in an experiment. If the investigator or the IRB concludes that there would be a violation of privacy, the experiment may not be conducted. Most researchers believe that observing people in public places does not infringe on their privacy, while observing people in private settings is often considered an invasion of privacy.

Forced participation

Forced participation occurs when test subjects are put under pressure to participate in the study. In that case, the subject believes that refusing to participate will have negative consequences for him or her. Researchers must respect the freedom of the individual, and test subjects may terminate their participation at any time. In addition, researchers may not give their test subjects an extremely high amount of money for their participation, because that would indirectly force the subject to participate.

Physical and mental stress

Sometimes scientists do research into the effects of stress, failure, anxiety, or pain. To examine such experiences, the test subjects must be exposed to stress. But how much stress can a researcher ethically let his or her test subjects experience? There are two extreme positions on this question: some people think that test subjects may undergo as much stress as is necessary for the study, while others think that there should be no more than minimal risk, meaning that test subjects should not experience more stress than they would have experienced had they not taken part in the study. There are many viewpoints in between these two extremes, but usually the final decision is made by the researchers and the IRB. This decision is often based on a cost-benefit analysis: studies in which test subjects experience stress are only permitted if the costs (stress) are lower than the benefits. Subjects must be fully informed in advance about possible negative experiences in the informed consent form.

Deception

Behavioral researchers use deception for various reasons. The most common reason is to prevent test subjects from finding out about the purpose of the study; after all, test subjects often behave differently when they know they are being observed. In the case of deception, for example, an incorrect or incomplete purpose of the study is stated in the informed consent form. As a result, test subjects do not find out about the real purpose of the research, and the behavior being investigated is much less influenced. Deception may only be used if the researchers believe it is necessary for the research.

Opponents of deception claim that lying is always wrong, even when done with good intentions (such as for the purpose of scientific research); this claim is based on deontology. Other opponents reason more pragmatically: even if deception is allowed because it leads to positive outcomes, it can still have negative consequences. Subjects may come to distrust scientific researchers because they were not told the truth about the purpose of the study they participated in. However, most test subjects seem to understand the purpose of deception: they understand that it is necessary and often have no problems with it.

Confidentiality

The information obtained about the test subjects is confidential. This means that the data from the test subjects may only be used for research purposes; others may not have access to this information. Although in most cases a breach would not matter much, in some cases breaching the confidentiality of data can do real harm, since the information about the test subjects can be sensitive. If the information becomes public, it can have negative consequences for the test subject, so breaches should be avoided as much as possible. The easiest way not to damage the participants' trust is to ensure that the responses remain anonymous. Data are anonymous if they do not contain information that can be used to identify the test subject.

Confidentiality sometimes becomes an issue when researchers publish their results, especially when the results concern an individual. When only one person is described, it is especially important that privacy and trust are not compromised.

What happens during a debriefing?

Debriefing is the follow-up with the test subject after his or her participation in the study. The researcher must perform the following tasks in the debriefing:

  1. First, he or she must explain the purpose of the study to the test subject. It is possible that information was withheld from the participant in the informed consent form; this information must still be given after the experiment. When deception has been used, the researcher must explain this and apologize.

  2. The second purpose of debriefing is to remove the stress or other negative consequences that may be caused by the experiment.

  3. Thirdly, the researcher must learn about the test subject's experiences by looking at his or her responses to the study.

  4. Finally, it is the task of the researcher to ensure that the test subjects are happy with their participation. The test subjects must feel that their participation was important and meaningful.

Behavior of the researcher

A researcher must not only comply with formal rules; he or she must also treat the test subjects well and courteously.

Vulnerable populations

When examining vulnerable populations, a number of additional matters must be taken into account. IRBs must pay extra attention to the protection of the well-being of, for example, children, prisoners, people with a mental disability, people who are suicidal, pregnant women, fetuses, and newborns.

Children and adolescents under the age of 18 need permission from their parents or legal representatives to participate in a study, because they may not sufficiently understand the risks of participation themselves. However, children older than 12 years of age may refuse to participate, even if their parents or representatives have already signed the informed consent form; they cannot be forced to participate. For children under the age of 12, the decision of the parents or representatives is decisive.

Prisoners belong to the vulnerable populations for three reasons:

  1. First, their lives are completely determined by guards. That is why they need extra protection against forced participation.

  2. Secondly, the daily lives of prisoners are so monotonous that they will do anything to escape the routine. It may therefore be that they participate in a certain study just to get out of this routine, even if it is a study in which they would never have wanted to participate under other circumstances. Special attention must be paid to this.

  3. Finally, it must be clear that prisoners who participate in the study will not be treated any better than prisoners who do not participate. In other words: participation will not have a positive effect on treatment in prison.

People with a mental disability also receive special attention. The researcher must be sure that these people understand the research well enough to indicate whether or not they want to participate.

One of the most difficult populations consists of people who are suicidal. Might the investigation increase the chance that these people commit suicide? And is it wise to use a control group that does not receive treatment when they often need it?

Finally, pregnant women, fetuses and newborns need extra protection. This is especially important when administering medication.

What ethical principles play a role in animal research?

The APA not only provides ethical guidelines for research with humans, but also specifies how animals should be treated if they participate in research. The guidelines for animal research are much less detailed. They state that all animal research must be supervised by someone who is specialized in the care and use of laboratory animals. A veterinarian must also be present to treat the animals afterwards if necessary. In addition, all personnel present must know the guidelines and be well trained in the treatment and use of laboratory animals.

The experimental circumstances in animal research are controlled by the National Institutes of Health and by national and local laws.

Defenders of animal rights are mostly concerned about the experiments that are performed on laboratory animals. The APA tries to minimize the discomfort for the laboratory animals by drawing up guidelines. Procedures that cause more than minimal pain or discomfort must first be approved by a board. The APA has also established guidelines for the use of surgical procedures, the examination of animals through field research, the use of animals for educational purposes, and the care of laboratory animals after the end of the study.

Scientific deception (scientific misconduct)

Scientists are thus bound by ethical rules for both research with humans and research with animals. In addition, it is important that they do their job honestly. In this way, science is protected against scientific deception (scientific misconduct). The National Academy of Sciences has established three basic categories of scientific misconduct:

  1. The first category contains the worst forms of scientific deception. These include fabrication (making up data), falsification (deliberately manipulating or omitting data or results), and plagiarism (copying someone else's work without mentioning the original source).
    Adjusting or removing data is in principle wrong. Nevertheless, it is permitted in a number of cases: if a test subject has not followed the instructions, was under the influence of some substance, or shows extreme reactions, the data of this test subject may be removed from the research. This is allowed because it removes distortions caused by extraneous variables, which increases the validity of the results.

  2. A second category of scientific misconduct is that of questionable scientific practices. These are not as bad as the aforementioned forms of deception, but they are still problematic. For example, researchers should only put their name on work that truly belongs to them.
    Another problem may occur when researchers do not publish data that do not match their hypotheses. It is also possible that researchers refuse to allow experts to reproduce their work, for example by not making their data public. This may be a sign that something is wrong with the research.

  3. The third category contains unethical behavior towards colleagues or test subjects. Think of sexual assault, abuse of power, discrimination, or the evasion of laws and/or rules.

Suppression of scientific research results

It has often happened that scientific discoveries were ridiculed, suppressed, or even punished, and this is still a common phenomenon today. Scientists and teachers are still put under pressure, and therefore often avoid research into sensitive subjects.

It may be that a certain group of people feels attacked or discriminated against by scientific results. This raises ethical questions. Some people believe that science needs to be regulated because it can have negative outcomes for individuals. Most researchers, however, think they should be free to spread the knowledge they have gained in their research; from their point of view, it should not matter whether that knowledge has positive or negative outcomes for certain individuals. In their view, the only people who are in a position to assess research results are other researchers, not politicians, citizens, or others. The suppression of knowledge is very unethical in the eyes of many researchers.
