Psi refers to the anomalous retroactive influence of future events on an individual’s current behaviour. There are three important deficiencies in modal research practice: an overemphasis on conceptual replication (1), insufficient attention to verifying the integrity of measurement instruments and experimental procedures (2) and problems with the implementation of null hypothesis testing (3).

The interpretation bias refers to a bias towards interpretations of data that favour a researcher’s theory. A potential consequence of this is an increased risk of reported false positives and a disregard of true negatives.

The knowledge system of psychology consists of theory-relevant beliefs (1), which concern the mechanisms that produce behaviour, and method-relevant beliefs (2), which concern the procedures through which data are obtained. Deficiencies in modal research practice systematically bias the interpretation of confirmatory data as theory relevant (1) and the interpretation of disconfirmatory data as method relevant (2).

Central beliefs are beliefs on which many other beliefs depend. Conservatism refers to choosing the theoretical explanation consistent with the data that requires the least amount of restructuring of the existing knowledge system. If method-relevant beliefs are central in a knowledge system, it becomes more difficult to blame methodological errors for disconfirmatory results. If theory-relevant beliefs become central, the hypothesis risks being treated as logically necessary rather than empirically testable. A hypothesis under test should be described in a way that is falsifiable and not logically necessary.

An overemphasis on conceptual replication at the expense of direct replication weakens method-relevant beliefs in the knowledge system. A statistically significant result is often followed by a conceptual replication. A failure of the conceptual replication leads to the question whether the negative result was due to the falsity of the underlying theory...




      Scientific & Statistical Reasoning – Summary interim exam 3 (UNIVERSITY OF AMSTERDAM)

      Discovering statistics using IBM SPSS statistics by Andy Field, fifth edition – Summary chapter 6


      Bias can be detrimental for the parameter estimates (1), standard errors and confidence intervals (2) and the test statistics and p-values (3). Outliers and violations of assumptions are forms of bias.

An outlier is a score very different from the rest of the data. Outliers bias parameter estimates and affect the error associated with those estimates: because they have a strong effect on the sum of squared errors, they bias the standard deviation.

      There are several assumptions of the linear model:

      1. Additivity and linearity
        The scores on the outcome variable are linearly related to any predictors. If there are multiple predictors, their combined effect is best described by adding them together.
      2. Normality
  Violations of normality influence the parameter estimates, and the residuals of the model should be normally distributed. It is normality at each level of the predictor variable that is relevant. Normality is also important for confidence intervals and for null hypothesis significance testing.
      3. Homoscedasticity / homogeneity of variance
        This impacts the parameters and the null hypothesis significance testing. It means that the variance of the outcome variable should not change between levels of the predictor variable. Violation of this assumption leads to bias in the standard error.
      4. Independence
        This assumption means that the errors in the model are not related to each other. The data has to be independent.

      The assumption of normality is mainly relevant in small samples. Outliers can be spotted using graphs (e.g. histograms or boxplots). Z-scores can also be used to find outliers.
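Spotting outliers with z-scores can be sketched in a few lines; the function name, cut-off argument and data are invented for illustration (3.29 is the common cut-off, since fewer than 0.1% of normally distributed scores lie beyond it):

```python
import numpy as np

def zscore_outliers(scores, threshold=3.29):
    """Return the scores whose absolute z-score exceeds the threshold."""
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.mean()) / scores.std(ddof=1)  # sample SD
    return scores[np.abs(z) > threshold]

data = [22, 24, 23, 25, 21, 23, 24, 22, 25, 110]  # 110 is an obvious outlier
print(zscore_outliers(data, threshold=2.5))  # -> [110.]
```

Note that a single extreme score inflates the standard deviation it is judged against, which is one reason graphical checks such as boxplots remain useful.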

      The P-P plot can be used to look for normality of a distribution. It is the expected z-score of a score against the actual z-score. If the expected z-scores overlap with the actual z-scores, the data will be normally distributed. The Q-Q plot is like the P-P plot but it plots the quantiles of the data instead of every individual score.

      Kurtosis and skewness are two measures of the shape of the distribution. Positive values of skewness indicate a lot of scores on the left side of the distribution. Negative values of skewness indicate a lot of scores on the right side of the distribution. The further the value is from zero, the more likely it is that the data is not normally distributed.

      Normality can be checked by looking at the z-scores of the skewness and kurtosis: each statistic is divided by its standard error.
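A minimal sketch of this check, dividing each shape statistic by its standard error; the large-sample SE approximations below are an assumption (SPSS reports exact standard errors instead), and values beyond roughly ±1.96 suggest significant non-normality at p < .05:

```python
import math
from scipy.stats import skew, kurtosis

def shape_z_scores(scores):
    """z-scores for skewness and excess kurtosis: statistic / SE."""
    n = len(scores)
    se_skew = math.sqrt(6.0 / n)    # large-sample approximation
    se_kurt = math.sqrt(24.0 / n)   # large-sample approximation
    z_skew = skew(scores) / se_skew
    z_kurt = kurtosis(scores) / se_kurt  # Fisher definition: normal -> 0
    return z_skew, z_kurt
```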

      Levene’s test is a one-way ANOVA on the deviation scores. The homogeneity of variance can be tested using Levene’s test or by evaluating a plot of the standardized predicted values against the standardized residuals.
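As a sketch of what Levene's test does in practice, `scipy.stats.levene` with `center='mean'` runs this one-way ANOVA on the absolute deviation scores; the two simulated groups below are invented for illustration:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(42)
group_a = rng.normal(loc=5, scale=1.0, size=100)  # SD = 1
group_b = rng.normal(loc=5, scale=3.0, size=100)  # SD = 3: variances clearly differ

# one-way ANOVA on the deviations from each group's mean
stat, p = levene(group_a, group_b, center='mean')
print(f"W = {stat:.2f}, p = {p:.4g}")  # a small p rejects homogeneity of variance
```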

      REDUCING BIAS
      There are four ways of correcting problems with the data:

      1. Trim the data
        Delete a
      Discovering statistics using IBM SPSS statistics by Andy Field, fifth edition – Summary chapter 8


      Variance of a single variable represents the average amount that the data vary from the mean. The cross-product deviation multiplies the deviation for one variable by the corresponding deviation for the second variable. The average value of the cross-product deviation is the covariance. This is an averaged sum of combined deviation. It uses the following formula:

      A positive covariance indicates that if one variable deviates from the mean, the other variable deviates in the same direction. A negative covariance indicates that if one variable deviates from the mean, the other variable deviates in the opposite direction.

      Covariance is not standardized and depends on the scale of measurement. The standardized covariance is the correlation coefficient and is calculated using the following formula:
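Neither formula survived in this text; a minimal sketch of both, assuming the usual definitions (cross-product deviations summed and averaged over N − 1, then divided by the product of the standard deviations):

```python
import numpy as np

def covariance(x, y):
    """Averaged sum of cross-product deviations: sum((x - xbar)(y - ybar)) / (N - 1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

def correlation(x, y):
    """Standardized covariance: cov(x, y) / (s_x * s_y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return covariance(x, y) / (x.std(ddof=1) * y.std(ddof=1))

x = [2, 4, 6, 8, 10]
y = [1, 3, 5, 9, 12]
print(covariance(x, y), correlation(x, y))
# sanity check against NumPy's built-in correlation
assert np.isclose(correlation(x, y), np.corrcoef(x, y)[0, 1])
```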

      A correlation coefficient of ±0.1 represents a small effect, values of ±0.3 represent a medium effect and values of ±0.5 represent a large effect.

       In order to test the null hypothesis of the correlation, namely that the correlation is zero, z-scores can be used. In order to use the z-scores, the distribution must be normal, but the r-sampling distribution is not normal. The following formula adjusts r in order to make the sampling distribution normal:

      The standard error uses the following formula:

      This leads to the following formula for z:

      The null hypothesis of correlations can also be tested using the t-score with degrees of freedom N-2:

      The confidence interval for the correlation uses the same formula as other confidence intervals. Its bounds have to be converted back to a correlation coefficient using the following formula:
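The formulas referred to above can be collected into one hedged sketch (function names invented), assuming the standard expressions: Fisher's z_r = 0.5·ln((1+r)/(1−r)), SE = 1/√(N−3), z = z_r/SE, t = r·√(N−2)/√(1−r²), and the inverse transformation for the CI bounds:

```python
import math

def fisher_z(r):
    """Fisher's transformation makes the sampling distribution of r normal."""
    return 0.5 * math.log((1 + r) / (1 - r))

def correlation_inference(r, n):
    """z-test, t-test (df = n - 2) and 95% CI for a correlation coefficient."""
    se = 1.0 / math.sqrt(n - 3)                        # standard error of z_r
    z = fisher_z(r) / se                               # test of H0: rho = 0
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # equivalent t-test
    lo, hi = fisher_z(r) - 1.96 * se, fisher_z(r) + 1.96 * se
    # convert the CI bounds back to correlations (inverse Fisher transform)
    back = lambda z_r: (math.exp(2 * z_r) - 1) / (math.exp(2 * z_r) + 1)
    return z, t, (back(lo), back(hi))

z, t, ci = correlation_inference(r=0.5, n=30)
print(z, t, ci)
```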

      CORRELATION
      Normality in correlation is only important if the sample size is small (1), there is significance testing (2) or there is a confidence interval (3). The assumptions of correlation are normality (1) and linearity (2).

      The correlation coefficient squared (R2) is a measure of the amount of variability in one variable that is shared by the other. Spearman’s correlation coefficient (rs) is a non-parametric statistic that is used to minimize the effects of extreme scores or the effects of violations of the assumptions. Spearman’s correlation coefficient works by ranking the data. Kendall’s tau, denoted by τ, is a non-parametric statistic that is used when the data set is small with a large set of tied ranks.
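Both coefficients are available in SciPy; a minimal sketch with an invented data set containing one extreme score, showing that the rank-based statistics are unaffected by it:

```python
from scipy.stats import spearmanr, kendalltau

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # one extreme score in x
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

rho, p_rho = spearmanr(x, y)   # Pearson's r computed on the ranks
tau, p_tau = kendalltau(x, y)  # preferred for small N with many tied ranks
print(rho, tau)
```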

      A biserial or point-biserial correlation is used when a relationship between two variables is investigated when one of the two variables is dichotomous (e.g. yes

      Discovering statistics using IBM SPSS statistics by Andy Field, fifth edition – Summary chapter 9


      Any straight line can be defined by the slope (1) and the point at which the line crosses the vertical axis of the graph (intercept) (2). The general formula for the linear model is the following:

      Regression analysis refers to fitting a linear model to data and using it to predict values of an outcome variable (dependent variable) from one or more predictor variables (independent variables). The residuals are the differences between what the model predicts and the actual outcome. The residual sum of squares is used to assess the ‘goodness-of-fit’ of the model on the data. The smaller the residual sum of squares, the better the fit.

      Ordinary least squares regression refers to defining the regression models for which the sum of squared errors is the minimum it can be given the data. The sum of squared differences is the total sum of squares and represents how good the mean is as a model of the observed outcome scores. The model sum of squares represents how well the model can predict the data. The larger the model sum of squares, the better the model can predict the data. The residual sum of squares uses the differences between the observed data and the model and shows how much of the data the model cannot predict.

      The proportion of improvement due to the model compared to using the mean as a predictor can be calculated using the following formula:

      This value represents the amount of variance in the outcome explained by the model relative to how much variation there was to explain. The F-statistic can be calculated using the following formulas:

      ‘k’ denotes the number of predictors, which equals the model degrees of freedom.
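The missing formulas above can be reconstructed as a short sketch, assuming the usual definitions: R² = SS_M / SS_T and F = (SS_M / k) / (SS_R / (N − k − 1)); the function name and example data are invented:

```python
import numpy as np

def model_fit(y, y_hat, k):
    """R^2 and F for a linear model with k predictors fitted to N cases."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    ss_t = np.sum((y - y.mean()) ** 2)     # total sum of squares (mean as model)
    ss_r = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_m = ss_t - ss_r                     # model (improvement) sum of squares
    r2 = ss_m / ss_t
    f = (ss_m / k) / (ss_r / (n - k - 1))  # F = MS_model / MS_residual
    return r2, f

# tiny worked example: fit y = b0 + b1*x by least squares, then assess the fit
x = np.array([1, 2, 3, 4, 5, 6], float)
y = np.array([2, 4, 5, 4, 6, 7], float)
b1, b0 = np.polyfit(x, y, 1)
r2, f = model_fit(y, b0 + b1 * x, k=1)
print(r2, f)
```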

      The F-statistic can also be used to test the significance of R2, with the null hypothesis being that R2 is zero. It uses the following formula:

      Individual predictors can be tested using the t-statistic.

      BIAS IN LINEAR MODELS
      An outlier is a case that differs substantially from the main trend in the data. Standardized residuals can be used to check which residuals are unusually large and can be viewed as an outlier. Standardized residuals are residuals converted to z-scores. Standardized residuals greater than 3.29 are considered an outlier (1), if more than 1% of the sample cases have a standardized residual of greater than 2.58, the level of error in the model may be unacceptable (2) and if more than 5% of the cases have standardized residuals with an absolute value greater than 1.96, the model may be a poor representation of the data (3).
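Those three rules of thumb can be applied mechanically; a minimal sketch (the function name and dictionary keys are invented for illustration):

```python
import numpy as np

def residual_diagnostics(residuals):
    """Apply the three rules of thumb for standardized residuals."""
    z = np.asarray(residuals, float)
    z = (z - z.mean()) / z.std(ddof=1)  # convert residuals to z-scores
    return {
        "outliers (|z| > 3.29)": int(np.sum(np.abs(z) > 3.29)),
        "pct |z| > 2.58": 100 * np.mean(np.abs(z) > 2.58),  # should stay under 1%
        "pct |z| > 1.96": 100 * np.mean(np.abs(z) > 1.96),  # should stay under 5%
    }

rng = np.random.default_rng(0)
checks = residual_diagnostics(rng.normal(size=1000))  # well-behaved residuals
print(checks)
```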

      The studentized residual is the unstandardized residual divided

      Discovering statistics using IBM SPSS statistics by Andy Field, fifth edition – Summary chapter 11


      Moderation refers to the combined effect of two or more predictor variables on an outcome. This is also known as an interaction effect. A moderator variable is one that affects the relationship between two others. It affects the strength or direction of the relationship between the variables.

      The interaction effect indicates whether moderation has occurred. The predictor and the moderator must be included for the interaction term to be valid. If, in the linear model, the interaction effect is included, then the individual predictors represent the regression of the outcome on that predictor when the other predictor is zero.

      The predictors are often transformed using grand mean centring. Centring refers to transforming a variable into deviations around a fixed point. This fixed point is typically the grand mean. Centring is important when the model contains an interaction effect, as it makes the bs for lower-order effects interpretable. It makes interpreting the main effects easier (lower-order effects) if the interaction effect is not significant.

      The bs of individual predictors can be interpreted as the effect of that predictor at the mean value of the sample (1) and the average effect of the predictor across the range of scores for the other predictors (2) when the variables are centred.

      In order to interpret a (significant) moderation effect, a simple slopes analysis needs to be conducted. It is comparing the relationship between the predictor and outcome at low and high levels of the moderator. SPSS gives a zone of significance. Between two values of the moderator the predictor does not significantly predict the outcome and below and above the values it does.

      The steps for moderation are the following if there is a significant interaction effect: centre the predictor and moderator (1), create the interaction term (2), run a forced entry regression with the centred variables and the interaction of the two centred variables (3).
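The three steps above can be sketched with simulated data; all numbers and variable names are invented, and plain least squares stands in for the SPSS forced-entry regression:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
predictor = rng.normal(10, 2, n)
moderator = rng.normal(50, 10, n)
# simulated outcome with a genuine interaction effect of 0.1
outcome = (2 * predictor + 0.5 * moderator
           + 0.1 * predictor * moderator + rng.normal(0, 1, n))

# step 1: grand-mean centre the predictor and the moderator
p_c = predictor - predictor.mean()
m_c = moderator - moderator.mean()
# step 2: create the interaction term from the centred variables
interaction = p_c * m_c
# step 3: forced-entry regression with the centred variables and their interaction
X = np.column_stack([np.ones(n), p_c, m_c, interaction])
b = np.linalg.lstsq(X, outcome, rcond=None)[0]
print(b)  # b[3] estimates the interaction (moderation) effect
```

Because of the centring, b[1] recovers the effect of the predictor at the mean moderator value (here about 2 + 0.1 × 50 = 7), which is what makes the lower-order bs interpretable.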

      The simple slopes analysis gives three models. One model for a predictor when the moderator value is low (1), one model for a predictor when the moderator value is at the mean (2) and one model for a predictor when the moderator value is high (3).

      If the interaction effect is significant, then the moderation effect is also significant.

      MEDIATION
      Mediation refers to a situation when the relationship between the predictor variable and an outcome variable can be explained by their relationship to a third variable, the mediator. Mediation can be tested through three linear models:

      1. A linear model predicting the outcome from the predictor variable (c).
      2. A linear model predicting the mediator from the predictor variable (a).
      3. A linear model predicting the outcome from both the predictor variable and the mediator (predictor = c’ and mediator = b).

      There are four conditions for mediation: the predictor variable must significantly predict the outcome variable (in model 1)(1), the predictor variable must significantly predict the mediator

      Foster (2010). Causal inference and developmental psychology. – Article summary


      The problem of causality is difficult in developmental psychology, as many questions of that field regard factors that a person cannot be randomly assigned to (e.g. single parent family). Causal inference refers to the study and measurement of cause-and-effect relationships outside of random assignment.

      In the current situation in developmental psychology, it is unclear among researchers whether causality can be implied and why. Causal inferences are necessary for the goals of developmental psychology because causal inferences can improve the lives of people (1), can help distinguish between associations and causal claims for laypeople (2) and causal thinking is unavoidable (3).

      The directed acyclic graph (DAG) is a tool which is useful in moving from associations to causal relationships. It is particularly useful in identifying covariates and understanding the anticipated consequences of incorporating these variables.

      The DAG is a symbolic representation of dependencies among variables. The causal Markov assumption states that the absence of a path (in the DAG) implies the absence of a relationship. In the DAG, models that represent data with fewer links are preferred to the more complex (parsimony). If two variables are simultaneously determined, the DAG could incorporate this possibility by treating the two as reflecting a common cause.

      Variables (in the DAG) can be related in three ways:

      1. Z is a common cause of X and Y
        In this case, Z needs to be controlled for.
      2. Z is a common effect of X and Y
        This is a collider. Conditioning on a collider creates a spurious relationship between X and Y. This relationship can suppress or inflate a true causal effect.
      3. Z mediates the effect of X on Y
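The danger of conditioning on a collider (case 2) can be shown with a small simulation; the variables are invented for illustration. X and Y are generated independently, yet restricting attention to high values of their common effect Z induces a clear negative association between them:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
x = rng.normal(size=n)
y = rng.normal(size=n)           # X and Y are truly independent
z = x + y + rng.normal(size=n)   # Z is a common effect (collider) of X and Y

print(np.corrcoef(x, y)[0, 1])   # near 0: no relationship in the full sample

high_z = z > 1                   # "conditioning" on the collider
print(np.corrcoef(x[high_z], y[high_z])[0, 1])  # clearly negative: spurious
```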

         

      “Pearl (2018). Confounding and deconfounding: Or, slaying the lurking variable.” - Article summary


      Confounding bias occurs when a variable influences both who is selected for the treatment and the outcome of the experiment. If a possible confounding variable is known, it is possible to control for the possible confounding variable. Researchers tend to control for all possible variables, which leaves the possibility of controlling for the thing you are trying to measure (e.g. controlling for mediators).

      Confounding needs a causal solution, not a statistical one and causal diagrams provide a complete and systematic way of finding that solution. If all the confounders are controlled for, a causal claim can be made. However, it is not always sure whether all confounders are controlled for.

      Randomization has two clear benefits. It eliminates confounder bias and it enables the researcher to quantify his uncertainty. Randomization eliminates confounders without introducing new confounders. In a non-randomized study, confounders must be eliminated by controlling for them, although it is not always possible to know all the possible confounders.

      It is not always possible to conduct a randomized controlled experiment because of ethical, practical or other constraints. Causal estimates of observational studies can provide provisional causality. This is causality contingent upon the set of assumptions that the causal diagram advertises.

      Confounding stands for the discrepancy between what we want to assess (the causal effect) and what we actually do assess using statistical methods. A mediator is the variable that explains the causal effect of X on Y (X>Z>Y). If you control for a mediator, you will conclude that there is no causal link, when there is.

      There are several rules for controlling for possible confounders:

      1. In a chain junction (A -> B -> C), controlling for B prevents information from A getting to C and vice versa.
      2. In a fork or confounding junction (A <- B -> C), controlling for B prevents information from A getting to C and vice versa.
      3. In a collider (A -> B <- C), controlling for B will allow information from A getting to C and vice versa.
      4. Controlling for a mediator partially closes the stream of information. Controlling for a descendant of a collider partially opens the stream of information.

      A variable that is associated with both X and Y is not necessarily a confounder.

      “Shadish (2008). Critical thinking in quasi-experimentation.” - Article summary


      A common element in all experiments is the deliberate manipulation of an assumed cause followed by an observation of the effects that follow. A quasi-experiment is an experiment that does not use random assignment of participants to conditions.

      An inus condition is an insufficient but non-redundant part of an unnecessary but sufficient condition. It is insufficient because in itself it cannot produce the effect, but it is non-redundant because it adds something unique to the cause.

      Most causal relationships are non-deterministic. They do not guarantee that an effect occurs, as most causes are inus conditions, but they increase the probability that an effect will occur. To different degrees, all causal relationships are contextually dependent.

      A counterfactual is something that is contrary to fact. An effect is the difference between what did happen and what would have happened. The counterfactual cannot be observed. Researchers try to approximate the counterfactual, but it is impossible to truly observe it.

      Two central tasks of experimental design are creating a high-quality but imperfect source of counterfactual and understanding how this source differs from the experimental condition.

      Creating a good source of counterfactual is problematic in quasi-experiments. There are two tools to attempt this:

      1. Observe the same unit over time
      2. Make the non-random control groups as similar as possible to the treatment group

      A causal relationship exists if the cause preceded the effect (1), the cause was related to the effect (2) and there is no plausible alternative explanation for the effect other than the cause (3). Although quasi-experiments are flawed compared to experimental studies, they improve on correlational studies in two ways:

      1. Quasi-experiments make sure the cause precedes the effect by first manipulating the presumed cause and then observing an outcome afterwards.
      2. Quasi-experiments allow the researcher to control for some third-variable explanations.

      Campbell’s threats to valid causal inference contains a list of common group differences in a general system of threats to valid causal inference:

      1. History
        Events occurring concurrently with the treatment could cause the observed effect.
      2. Maturation
        Naturally occurring changes over time, not to be confused with treatment effects.
      3. Selection
        Systematic differences over conditions in respondent characteristics.
      4. Attrition
        A loss of participants can produce artificial effects if that loss is systematically correlated with conditions.
      5. Instrumentation
        The instruments of measurement might differ or change over time.
      6. Testing
        Exposure to a test can affect subsequent scores on a test.
      7. Regression to the mean
        An extreme observation will be less extreme on the second observation.

      Two flaws of falsification are that it requires a causal claim to be clear, complete and agreed upon in all its details and it requires observational procedures to perfectly reflect the theory that is being tested.

      “Kievit et al. (2013). Simpson’s paradox in psychological science: A practical guide.” - Article summary


      Simpson’s paradox states that the direction of an association at the population level may be reversed within subgroups of that population. Inadequate attention to Simpson’s paradox may lead to faulty inferences. The paradox can arise because of differences in proportions at subgroup levels compared to population levels. It also means that a pattern (association) need not hold within a subgroup.

      The paradox is related to a lot of things, including causal inference. A generalized conclusion (e.g. extraversion causes party-going) might hold for the general population, but does not mean that this inference can be drawn at the individual level. A correlation across the population does not need to hold in an individual over time.
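The reversal can be reproduced in a few lines. The counts below are illustrative, in the style of the classic kidney-stone treatment example (they are not from the source): treatment A has the higher recovery rate in both severity subgroups, yet treatment B has the higher rate overall, because B was given mostly to mild cases:

```python
# (recovered, total) for two treatments, split by severity -- invented numbers
groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

for severity, arms in groups.items():
    rate_a = arms["A"][0] / arms["A"][1]
    rate_b = arms["B"][0] / arms["B"][1]
    print(f"{severity}: A {rate_a:.0%} vs B {rate_b:.0%}")  # A wins in both subgroups

# aggregating over subgroups reverses the direction of the association
rec_a = sum(a["A"][0] for a in groups.values()) / sum(a["A"][1] for a in groups.values())
rec_b = sum(a["B"][0] for a in groups.values()) / sum(a["B"][1] for a in groups.values())
print(f"overall: A {rec_a:.0%} vs B {rec_b:.0%}")  # B wins overall
```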

      In order to deal with Simpson’s paradox, the situations in which the paradox occurs frequently have to be assessed. There are several steps in preventing Simpson’s paradox:

      1. Consider when it occurs.
      2. Explicitly propose a mechanism, determining at which level it is presumed to operate.
      3. Assess whether the explanatory level of data collection aligns with the explanatory level of the proposed mechanism.
      4. Conduct an experiment to assess the association between variables.

      In the absence of strong top-down knowledge, people are more likely to make false inferences based on Simpson’s paradox.

      Dienes (2008). Understanding psychology as a science. – Article summary


      A falsifier of a theory is any potential observation statement that would contradict the theory. There are different degrees of falsifiability, as some theories require fewer data points to be falsified than others. In other words, simple theories should be preferred as these theories require fewer data points to be falsified. The greater the universality of a theory, the more falsifiable it is.

      A computational model is a computer simulation of a subject. It has free parameters, numbers that have to be set (e.g. number of neurons used in a computational model of neurons). When using computational models, more than one model will be able to fit the actual data. However, the most falsifiable model that has not been falsified by the data (fits the data) should be used.

      A theory should only be revised or changed to make it more falsifiable. Making it less falsifiable is ad hoc. Any revision or amendment to the theory should also be falsifiable.

      Standard statistics are useful for determining objective probabilities: long-run relative frequencies. They do not, however, give the probability of a hypothesis being correct.

      Subjective probability refers to the subjective degree of conviction in a hypothesis. The subjective probability is based on a person’s state of mind. Subjective probabilities need to follow the axioms of probability.

      Bayes’ theorem is a method of getting from one conditional probability (e.g. P(A|B)) to the inverse. The subjective probability of a hypothesis is called the prior. The posterior is how probable the hypothesis is to you after data collection. The probability of obtaining the data given the hypothesis is called the likelihood (e.g. P(D|H). The posterior is proportional to the likelihood times the prior. Bayesian statistics is updating the personal conviction in light of new data.
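Bayesian updating can be illustrated with a minimal sketch: a discrete set of hypotheses about a coin's bias, updated after four observed flips (the hypotheses, prior and data are invented for illustration):

```python
# posterior is proportional to likelihood * prior
hypotheses = [0.25, 0.5, 0.75]   # candidate values for P(heads)
prior = [1 / 3, 1 / 3, 1 / 3]    # equal prior conviction in each hypothesis

data = ["H", "H", "T", "H"]      # observed flips
likelihood = [                   # P(D|H) for each hypothesis
    p ** data.count("H") * (1 - p) ** data.count("T") for p in hypotheses
]
unnorm = [l * pr for l, pr in zip(likelihood, prior)]
posterior = [u / sum(unnorm) for u in unnorm]  # normalize so it sums to 1
print(posterior)  # conviction shifts toward the hypotheses favoured by the data
```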

      The likelihood principle states that all the information relevant to inference contained in data is provided by the likelihood. A hypothesis having the highest likelihood does not mean that it has the highest probability. A hypothesis having the highest likelihood means that the data support the hypothesis the most. The posterior probability is not determined by the likelihood alone; it also depends on the prior.

      The probability distribution of a continuous variable is called a probability density distribution. It has this name because a continuous variable has infinitely many possible values, so the probability of any exact value is zero; the distribution gives the probability of any interval (the area under the curve).

      A likelihood could be a probability or a probability density and it can also be proportional to a probability or a probability density. Likelihoods provide a continuous graded measure of support for different hypotheses.

      In Bayesian statistics (likelihood analysis), the data are fixed but the hypothesis can vary. In significance testing, the hypothesis is fixed (the null hypothesis) but the data can vary. The height of the curve of the distribution under each hypothesis is what matters in calculating the likelihood. In significance testing, it is the tail area of the distribution under the null hypothesis (the p-value) that matters.
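      The two logics can be contrasted numerically. The normal model, the hypothesis means and the observed value below are all hypothetical:

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    # height of the normal curve at x: the basis of a likelihood analysis
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_tail(x, mu, sigma=1.0):
    # upper tail area P(X >= x) under the hypothesized distribution
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))

observed = 1.8  # hypothetical observed effect, in standard-error units

# Likelihood analysis: the data are fixed, hypotheses vary -> compare heights.
height_h0 = normal_pdf(observed, mu=0.0)  # curve height under H0 (mu = 0)
height_h1 = normal_pdf(observed, mu=2.0)  # curve height under a rival H1 (mu = 2)

# Significance testing: the null is fixed, data vary -> tail area (p-value).
p_value = normal_tail(observed, mu=0.0)

print(height_h1 / height_h0, p_value)
```

      The same observation yields a likelihood ratio comparing two hypotheses (heights of the curves at the data) and a single tail area under the null (the p-value); the two summaries answer different questions.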

      “Marewski & Olsson (2009). Formal modelling of psychological processes.” - Article summary

      One way of avoiding the null hypothesis testing ritual in science is to increase the precision of theories by casting them as formal models. Rituals can be characterized by a repetition of the same action (1), fixations on special features (2), anxieties about punishment for rule violation (3) and wishful thinking (4). The null hypothesis testing ritual is mainly maintained because many psychological theories are too weak to make precise predictions besides the direction of the effect.

      A model is a simplified representation of the world that aims to explain observed data. It specifies a theory’s predictions. Modelling is especially suited for basic and applied research about the cognitive system. There are four advantages of formally specifying the theories as models:

      1. Designing strong tests of theories
        Casting theories as models makes it possible to derive quantitative predictions, which yields comparable, competing predictions between theories and thus allows for stronger comparison and testing of theories.
      2. Sharpening research questions
        Null hypothesis testing allows for vague descriptions of theories, and vague descriptions make theories difficult to test. Specifying theories as models requires more precise research questions, and sharper research questions make theories easier to test.
      3. Going beyond linear theories
        Null hypothesis testing is especially applicable to simple hypotheses. The available statistical tools tend to shape the theories that are created, mostly linear theories; by specifying the theory as a model, this restriction no longer applies.
      4. Using more externally valid designs to study real-world questions
        Modelling can lead to more externally valid designs, as confounds are not eliminated in the analysis, but built into the model.

      Goodness-of-fit measures cannot distinguish between variation in the data that results from noise and variation that results from the psychological process of interest. A model can end up overfitting the data: capturing both the variance of the psychological process of interest and variance resulting from random error. The ability of a model to predict new data is its generalizability. The complexity of a model refers to its inherent flexibility, which enables it to fit diverse patterns of data; complexity is related to the degree to which a model is susceptible to overfitting. The number of free parameters (1) and how parameters are combined in the model (2) contribute to the model’s complexity.

      Increased complexity makes a model more likely to overfit, so its generalizability to new data decreases. Increased complexity can also improve generalizability, but only up to a point: a model must be complex enough to capture the process of interest, yet not so complex that it fits the noise. A good fit to current data does not guarantee a good fit to other data.
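      A minimal sketch of this trade-off, assuming a made-up linear data-generating process with fixed ‘noise’ values: a two-parameter line is compared with a polynomial flexible enough to pass through every training point.

```python
# Fixed 'noise' values stand in for random error so the example is reproducible.
train_noise = [0.3, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.2]
test_noise = [-0.2, 0.3, -0.3, 0.2, -0.4, 0.1, -0.1, 0.4]

xs = [i / 8 for i in range(8)]
train_y = [2 * x + e for x, e in zip(xs, train_noise)]  # true process: y = 2x + error
test_y = [2 * x + e for x, e in zip(xs, test_noise)]    # same process, new sample

# Simple model: least-squares line (2 free parameters).
n = len(xs)
mx, my = sum(xs) / n, sum(train_y) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, train_y)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

def line(x):
    return intercept + slope * x

# Complex model: a polynomial through every training point (8 free parameters),
# built by Lagrange interpolation -> a perfect fit to the training data.
def interp(x):
    total = 0.0
    for i, xi in enumerate(xs):
        w = train_y[i]
        for j, xj in enumerate(xs):
            if j != i:
                w *= (x - xj) / (xi - xj)
        total += w
    return total

def mse(model, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / n

# The complex model fits the training data perfectly but generalizes worse.
print(mse(line, train_y), mse(interp, train_y))
print(mse(line, test_y), mse(interp, test_y))
```

      The interpolating model reproduces the training data exactly (its training error is zero) but has absorbed the noise, so its error on the fresh sample is several times larger than that of the simple line.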

      The irrelevant specification problem refers to the difficulty of bridging the gap between verbal descriptions of theories and formal implementations. This can lead to unintended discrepancies between theories and their formal counterparts. The Bonini paradox refers to the situation in which models become more complex and thereby as difficult to understand as the phenomena they are meant to explain.

      “Dennis & Kintsch (2008). Evaluating theories.” - Article summary

      A theory is a concise statement about how we believe the world to be. There are several things to look at when evaluating theories:

      1. Descriptive adequacy

      Does the theory accord with the available data?

      2. Precision and interpretability

      Is the theory described in a sufficiently precise fashion that it is easy to interpret?

      3. Coherence and consistency

      Are there logical flaws in the theory? Is it consistent with theories of other domains?

      4. Prediction and falsifiability

      Can the theory be falsified?

      5. Postdiction and explanation

      Does the theory provide a genuine explanation of existing results?

      6. Parsimony

      Is the theory as simple as possible?

      7. Originality

      Is the theory new or a restatement of an old theory?

      8. Breadth

      Does the theory apply to a broad range of phenomena?

      9. Usability

      Does the theory have applied implications?

      10. Rationality

      Are the claims of the theory reasonable?

      Postdiction refers to accounting for results that are already known, rather than predicting new results in advance.

      “Furr & Bacharach (2014). Estimating and evaluating convergent and discriminant validity evidence.” - Article summary

      There are four procedures to present the implications of a correlation in terms of our ability to use correlations to make successful predictions:

      1. Binomial effect size display (dichotomous)
        This illustrates the practical consequences of using correlations to make decisions. It can show how many successful and unsuccessful predictions can be made on the basis of a correlation, translating a validity correlation into an intuitive framework (success rates of .50 + r/2 and .50 − r/2 in the two groups). However, it frames the situation in terms of an ‘equal proportions’ situation.
      2. Taylor-Russell tables (dichotomous)
        These tables inform selection decisions and provide a probability that a prediction will result in a successful performance on a criterion. The size of the validity coefficient (1), the selection proportion (2) and the base rate (3) are required for the tables.
      3. Utility analysis
        This frames validity in terms of a cost-benefit analysis of test use.
      4. Analysis of test sensitivity and test specificity
        A test is evaluated in terms of its ability to produce correct identifications of a categorical difference. This is useful for tests that are designed to detect a categorical difference.
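      The binomial effect size display itself is a one-line computation; the validity correlation of .30 below is hypothetical:

```python
# Binomial effect size display: a correlation r re-expressed as success
# rates in two equal-sized groups (.50 + r/2 versus .50 - r/2).
def besd(r):
    return 0.5 + r / 2, 0.5 - r / 2

high, low = besd(0.30)  # hypothetical validity correlation of .30
print(high, low)        # roughly .65 versus .35
```

      A correlation of .30, which can look modest as a proportion of variance explained, corresponds here to a 30-percentage-point difference in success rates between the two groups.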

      Validity correlations can be evaluated in the context of a particular area of research or application.

      A nomological network refers to the interconnections between a construct and other related constructs. There are several methods to evaluate the degree to which measures show convergent and discriminant associations:

      1. Focused associations
        This method focusses on a few highly relevant criterion variables. This can make use of validity generalization.
      2. Sets of correlations
        This method focusses on a broad range of criterion variables and computes the correlations between the test and many criterion variables. The degree to which the pattern of correlations ‘makes sense’ given the conceptual meaning of the construct is evaluated.
      3. Multitrait-multimethod matrices
        This method obtains measures of several traits, each measured through several methods. The purpose is to set clear guidelines for evaluating convergent and discriminant validity evidence. This is done by evaluating trait variance and method variance. Evidence of convergent validity is represented by monotrait-heteromethod correlations.
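      A toy sketch of the multitrait-multimethod logic; the traits, methods and correlation values below are all invented:

```python
# Two invented traits (anxiety, mood) each measured by two invented methods
# (self-report, clinician rating); the correlations are made up.
corr = {
    ("anxiety_self", "anxiety_clin"): 0.62,  # monotrait-heteromethod (convergent)
    ("mood_self", "mood_clin"): 0.58,        # monotrait-heteromethod (convergent)
    ("anxiety_self", "mood_clin"): 0.21,     # heterotrait-heteromethod (discriminant)
    ("mood_self", "anxiety_clin"): 0.18,     # heterotrait-heteromethod (discriminant)
}

convergent = [corr["anxiety_self", "anxiety_clin"], corr["mood_self", "mood_clin"]]
discriminant = [corr["anxiety_self", "mood_clin"], corr["mood_self", "anxiety_clin"]]

# Informal criterion: every convergent (monotrait-heteromethod) correlation
# should exceed every discriminant (heterotrait) correlation.
passes = min(convergent) > max(discriminant)
print(passes)
```

      Here the same trait measured by different methods correlates more strongly than different traits do, which is the pattern that counts as evidence of convergent and discriminant validity.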

      The correlations between measures are called validity coefficients. Validity generalization is a process of evaluating a test’s validity coefficients across a large set of studies. Validity generalization studies are intended to evaluate the predictive utility of a test’s scores across a range of settings, times and situations. These studies can reveal the general level of predictive validity (1), the degree of variability among the smaller individual studies (2) and the source of that variability (3).

       


      “Furr & Bacharach (2014). Estimating practical effects: Binomial effect size display, Taylor-Russell tables, utility analysis and sensitivity / specificity.” – Article summary

      Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses (e.g. to what degree does it measure what it is supposed to measure). Items of a test itself cannot be valid or invalid, only the interpretations can be valid or invalid.

      Validity is a property of the interpretation (1), it is a matter of degree (2) and the validity of a test’s interpretation is based on evidence and theory (3). Validity influences the accuracy of our understanding of the world, as research conclusions are based on the validity of a measure.

      Construct validity refers to the degree to which test scores can be interpreted as reflecting a particular psychological construct. Face validity refers to the degree to which a measure appears to be related to a specific construct, in the judgement of nonexperts, test takers and representatives of the legal system. Convergent validity refers to the degree to which test scores are correlated with tests of related constructs.

      Validity is important for the accuracy of our understanding of the world (1), decisions on a societal level (2) (e.g. laws based on ‘invalid’ research) and decisions on an individual level (3) (e.g. college admissions).

      The validity of test score interpretation depends on five types of evidence: test content (1), consequences of use (2), association with other variables (3), response processes (4) and internal structure (5).

      Test content can be seen as content validity. There are two threats to content validity:

      1. A test including construct-irrelevant content
        The inclusion of content that is not relevant to the construct of interest reduces validity.
      2. Construct underrepresentation
        A test should include the full range of content that is relevant to the construct.

      Construct underrepresentation can be constrained by practical issues (e.g. time of a test). The internal structure of a test refers to the way the parts of a test are related to each other. There should be a proper match between the actual internal structure of a test and the internal structure a test should have. The internal structure can be examined through the correlations among items in the test and among the subscales in the test. This can be done using factor analysis.

      Factor analysis helps to clarify the number of factors within a set of items (1), reveals the associations among the factors within a multidimensional test (2) and identifies which items are linked to which factors (3). Factors are dimensions of the test.

      Response processes refers to the match between the psychological processes that respondents actually use when completing a measure and the processes that they should use.

      In order to assess validity, the association with other variables (e.g. happiness and self-esteem) should be assessed. If a positive relationship is expected between two variables, then, for the interpretation of a measure to be valid, this relationship needs to exist. Association with other variables concerns the match between a measure’s actual associations with other measures and the associations it should have.

      “Furr & Bacharach (2014). Scaling.” - Article summary

      Scaling refers to assigning numerical values to psychological attributes. Individuals in a group should be similar to each other with regard to the psychological feature they share. There are rules to follow in order to put people in categories:

      1. People in a category must be identical with respect to the feature that categorizes the group (e.g. hair colour).
      2. The groups must be mutually exclusive.
      3. The groups must be exhaustive (i.e. everyone in the population falls into a category).

      Each person should fall into one category and not more than one. If numerals are used to indicate order, then the numerals serve as labels indicating rank. If numerals have the property of quantity, then they convey information about the exact amounts of an attribute. Units of measurement are standardized quantities. The three levels of measurement properties are identity (1), order (2) and quantity (3).

      There are two possible meanings of the number zero. It can be the absolute zero (1) (e.g. a reaction time of 0ms) or it can be an arbitrary quantity of an attribute (2). This is called the arbitrary zero. The arbitrary zero does not represent the absence of anything, rather, it is a point on a scale to measure that feature. A lot of psychological attributes use the arbitrary zero (e.g. social skill, self-esteem, intelligence).

      A unit of measurement might be arbitrary because unit size may be arbitrary (1), some units of measurement are not tied to any one type of object (2) (e.g. centimetres can measure anything with a spatial property) and some units of measurement can be used to measure different features of the same object (3) (e.g. weight and length).

      One assumption of counting is additivity. This requires that unit size does not change, meaning that an increase of one point is equal at every point of the scale. This is not always the case: an IQ test asks increasingly difficult questions for each additional point of IQ, so the unit size changes.

      Counting only qualifies as measurement if it reflects the amount of some feature or attribute of an object. There are four scales of measurement:

      1. Nominal scale
        This is used to identify groups of people who share a common attribute that is not shared by people in other groups (e.g. ‘0’ for male and ‘1’ for female). It satisfies the principle of identity.
      2. Ordinal scale
        This is used to rank people according to some attribute. It is used to make rankings within groups and cannot be used to make comparisons between groups, as this would require quantity. It satisfies the principles of identity and order.
      3. Interval scale
        This is a scale that is used to represent quantitative differences between people. It satisfies the principles of identity, order and quantity.
      4. Ratio scale
        This is a scale that has an absolute zero point. It satisfies the principles of identity, order and quantity, and has an absolute zero.

      Psychological attributes might not be able to be put

      “Mitchell & Tetlock (2017). Popularity as a poor proxy for utility.” - Article summary

      Before the IAT existed, indirect measures of prejudice were developed in order to overcome response bias, and psychologists began to examine automatic processes that may contribute to contemporary forms of prejudice. After the IAT was introduced, implicit prejudice became equated with widespread unconscious prejudices that are more difficult to spot and regularly infect intergroup interactions.

      The IAT has been used throughout different areas of society and is a very popular means of describing implicit prejudice. Prejudice extends beyond negative or positive associations with an attitude object to include motivational and affective reactions to in-group and out-group members. The IAT does not have strong predictive validity; the IAT score is a poor predictor of discriminatory behaviour.

      There are no guidelines for how to interpret scores on the IAT. This is referred to as the score interpretation problem. The test scores depend on arbitrary thresholds and it is not possible to link them to behavioural outcomes.

      The focus of the IAT on implicit gender stereotypes (not implicit sexism) is problematic because implicit measures of gender stereotypes are not a good predictor of discriminatory behaviour (1), only a very limited set of implicit gender stereotypes has been examined (2) and no explanation is provided about how conflicts between automatic evaluative associations and automatic semantic associations are resolved (3).

      Individuating information (obtaining personal information about individual members of a certain group) exerts effects that counter explicit biases, and it does the same with regard to implicit biases.

      Subjective evaluation criteria are not associated with discrimination. Therefore, the solution that only objective measures must be used in decision making to counter (implicit) bias is unnecessary. This is referred to as the subjective judgement problem.

      “LeBel & Peters (2011). Fearing the future of empirical psychology: Bem’s (2011) evidence of psi as a case study of deficiencies in modal research practice.” - Article summary

      Psi refers to the anomalous retroactive influence of future events on an individual’s current behaviour. There are three important deficiencies in modal research practice: an overemphasis on conceptual replication (1), insufficient attention to verifying the integrity of measurement instruments and experimental procedures (2) and problems with the implementation of null hypothesis testing (3).

      The interpretation bias refers to a bias towards interpretations of data that favour a researcher’s theory. A potential consequence of this is an increased risk of reported false positives and a disregard of true negatives. The knowledge system of psychology consists of theory-relevant beliefs (1), which concern the mechanisms that produce behaviour, and method-relevant beliefs (2), which concern the procedures through which data are obtained.

      Deficiencies in modal research practice systematically bias the interpretation of confirmatory data as theory relevant (1) and the interpretation of disconfirmatory data as method relevant (2).

      Central beliefs are beliefs on which many other beliefs depend. Conservatism refers to choosing the theoretical explanation consistent with the data that requires the least amount of restructuring of the existing knowledge system.

      If method-relevant beliefs are central in a knowledge system, it becomes more difficult to blame methodological errors for disconfirmatory results. If theory-relevant beliefs become central, the hypothesis risks being treated as a logical assumption. A hypothesis under test should be described in a way that is falsifiable and not logically necessary.

      An overemphasis on conceptual replication at the expense of direct replication weakens method-relevant beliefs in the knowledge system. A statistically significant result is often followed by a conceptual replication. A failure of the conceptual replication raises the question whether the negative result was due to the falsity of the underlying theory or to methodological flaws introduced by the changes made in the conceptual replication.

      The failure to verify the integrity of measurement instruments and experimental procedures weakens method-relevant beliefs and leads to ambiguity in the interpretation of results. The null hypothesis can be viewed as a straw man, as two populations are almost never exactly identical. Basing theory choices on null hypothesis significance tests detaches theories from the broader knowledge system.

      In order to overcome the flaws of the modal research practice, method-relevant beliefs must be strengthened. There are three ways in order to do this:

      1. Stronger emphasis on direct replication
        A direct replication leads to greater confidence in the results. They are necessary to ensure that an effect is real.
      2. Verify integrity of methodological procedures
        Method-relevant beliefs are more difficult to reject if the integrity of methodological procedures is verified, which leads to a less ambiguous interpretation of results. This includes routinely checking the internal consistency of the scores of any measurement instrument that is used, and using objective markers of instruction comprehension.
      3. Use stronger forms of NHST
        The null hypothesis should be a theoretically derived point value of the focal variable, instead of the nil hypothesis of no effect at all.

      Scientific & Statistical Reasoning – Article summary (UNIVERSITY OF AMSTERDAM)

      Borsboom & Cramer (2013). Network analysis: An integrative approach to the structure of psychopathology.

      The disease model states that problems are symptoms of a small set of underlying disorders. This explains observable clinical symptoms by a small set of latent variables (e.g. depression). A network is a set of elements (nodes) connected through a set of relations. In network models, disorders are conceptualized as systems of causally connected symptoms rather than effects of a latent disorder.

      Mental disorders cannot be identified independently of their symptoms. In medicine, the medical condition can be separated from the symptoms; in psychology, this is not possible. Separating the two would require that a person could have the disorder without its symptoms, but depression without feeling down, for example, is not possible. In mental disorders, it is likely that there is symptom-symptom causation: one symptom causes another symptom, and this leads to a mental disorder.

      With network systems, it might be unclear where one disorder starts and another stops. The boundaries between disorders become unclear. Network models might change treatment, as the treatment is then no longer aimed at the disorder but rather at the symptoms and the causal relationship between the symptoms.

      Networks in psychopathology can be created by using data on symptom endorsement frequencies (e.g. looking at correlations between symptoms) (1), by assessing the relationships between symptoms as rated by clinicians and patients (2) and by using the information in the diagnostic systems (3).

      In networks, any node can reach another node in only a few steps. This is called the small world property. The DSM attempts to be theoretically neutral but makes claims about causal relationships between the disorders.

      Asking experts how nodes are related (e.g. asking clinicians about the symptoms of a disorder) is called perceived causal relations scaling.

      Extended psychopathology systems refer to network systems in which the network is not isolated within a single individual but spans multiple individuals. This would mean that one symptom in one person could cause a symptom in another person. These networks can be used to examine how symptoms of different people interact in different social situations.

      Association networks show what the strength of the correlations between symptoms is. This gives an indication of different disorders, as the symptoms in disorder A are more correlated with the other symptoms in disorder A than with the symptoms of disorder B.

      A partial correlation network, also called a concentration network, shows the partial correlations between symptoms. This can be used to be a bit more certain about the causal relationship of two nodes as it rules out some third-variable explanations. Concentration graphs can be used to assess which pathways between symptoms appear common in a disorder.
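      For a single edge, a partial correlation can be computed directly from ordinary (zero-order) correlations; the symptom names and correlation values below are hypothetical:

```python
import math

# Partial correlation of symptoms A and B controlling for a third symptom C,
# computed from ordinary (zero-order) correlations.
def partial_corr(r_ab, r_ac, r_bc):
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical numbers: insomnia and fatigue correlate .50, but much of that
# association runs through concentration problems (C).
edge = partial_corr(0.50, 0.60, 0.60)
print(edge)  # the direct edge is much weaker than the raw correlation
```

      The edge shrinks from .50 to roughly .22 once C is controlled for, which is how a concentration network rules out some third-variable explanations for a symptom-symptom link.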

      Association and concentration graphs provide information about the causal relationships between nodes, but not about the causal direction within the network. Directed networks give information about the direction of the causal relationships between nodes. This is usually represented in a DAG (directed acyclic graph). In order to generate statements

      Borsboom et al. (2016). Kinds versus continua: a review of psychometric approaches to uncover the structure of psychiatric constructs.

      The danger of using a dichotomous system when it comes to mental disorders is not treating people who require treatment or treating people who do not require treatment. It is unclear where the boundary between disorder and no disorder is and this is not progressive for science and research as a whole.

      Equivalence classes refer to sets of individuals who are exchangeable with respect to the attribute of interest. Measurement starts with categorization. The continuity hypothesis states that in between any two positions lies a third that can be empirically confirmed (1) and that there are no gaps in the continuum (2).

      In a continuous interpretation, the distinction between people who have a disorder and people who do not depends on the imposition of a cut-off score that does not reflect a gap in the attribute itself (e.g. the difference between average height and being tall). However, there is no direct way of measuring how depressed someone is (i.e. there is no natural scale).

      Local independence states that, given a specific level of a latent variable, the observed variables are uncorrelated (e.g. guilt and suicide ideation are uncorrelated in healthy individuals).

      The form of the latent structure can be assessed by inspecting particular consequences of the model for specific statistical properties of items (1) and on the basis of global fit measures that allow one to compare whether a model with a categorical latent structure fits the observed data better than a model with a continuous latent structure (2).

      Taxometrics refers to inspecting particular consequences of the model for specific statistical properties of items. The analysis proceeds by choosing a variable and denoting it as the index variable. If the underlying construct is continuous, then the covariance between two observed variables A and B should be the same at different levels of index variable C. If the construct is categorical, then the covariance between A and B should differ across levels of the index variable and be 0 at the ‘no disorder’ level of the index variable.
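      The logic can be illustrated with a rough simulation under an assumed taxonic (categorical) structure; the class proportion, mean shift and slice boundaries below are made up:

```python
import random

random.seed(7)

# Assumed taxonic structure: 30% of simulated cases belong to a 'disorder'
# class that shifts three symptom indicators up by two units.
def case():
    taxon = random.random() < 0.3
    shift = 2.0 if taxon else 0.0
    a = shift + random.gauss(0, 1)  # symptom A
    b = shift + random.gauss(0, 1)  # symptom B
    c = shift + random.gauss(0, 1)  # index variable C
    return a, b, c

data = [case() for _ in range(20000)]

def cov_ab(rows):
    n = len(rows)
    mean_a = sum(a for a, _, _ in rows) / n
    mean_b = sum(b for _, b, _ in rows) / n
    return sum((a - mean_a) * (b - mean_b) for a, b, _ in rows) / n

low_slice = [r for r in data if r[2] < -0.5]           # almost purely 'no disorder'
mixed_slice = [r for r in data if 0.5 <= r[2] <= 1.5]  # both classes present here

# Under a categorical structure, cov(A, B) is near zero in the pure slice and
# clearly positive where the classes mix; under a continuous structure it
# would stay roughly constant across slices.
print(cov_ab(low_slice), cov_ab(mixed_slice))
```

      The jump in covariance between the pure slice and the mixed slice is the taxometric signature of a categorical latent structure; a flat covariance profile across slices would instead point to a continuum.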

      ALTERNATIVE LATENT VARIABLE MODELS
      Factor mixture models subdivide the population into different categories, but there is a continuous scale within categories. Each category is characterized by its own common factor model: it is a multi-group common factor model in which group membership is unknown. The class variable takes the place of an observed grouping variable.

      Grade of membership (GoM) models can integrate continuous features. This continuous variation concerns group membership. This model allows individuals to be members of multiple classes at the same time but to different degrees. This model is useful if there is no clear distinction between classes.

      In a network model, nodes are causally related to each other and this network

      “Cohen on item response theory” – Article summary

      Item response theory (latent trait theory) provides a way to model the probability that a person with X ability will be able to perform at level Y. It models the probability that a person with X amount of a personality trait will exhibit Y amount of that trait on a test that is supposed to measure it. This theory focusses on the relationship between a testtaker’s response to an individual test item and that testtaker’s standing on the construct being measured.

      Discrimination signifies the degree to which an item differentiates among people with higher or lower levels of the trait. Items can be given different weight in the item response theory. In classical test theory, there are no assumptions about the frequency distribution of test scores.

      There are several assumptions of the item response theory:

      1. Unidimensionality
        This assumption states that the set of items measures a single continuous latent construct. This assumption does not neglect minor dimensions, but assumes one dominant dimension underlying the structure.
      2. Local independence
        This assumption states that any systematic relationship between the test items is due to a person’s level on the construct of interest. If this assumption is met, then differences in responses to items reflect differences in the underlying trait or ability.
      3. Monotonicity
        This assumption states that the probability of endorsing or selecting an item response indicative of higher levels of the construct should increase as the level of the underlying construct increases.

      Local dependence refers to the fact that items can be dependent on another factor than what the test as a whole is measuring. Locally dependent items have higher inter-item correlations and it may be controlled for by combining the responses to a set of locally dependent items into a separate subscale within the test. The theta level refers to the level of the underlying construct.

      The probabilistic relationship between a testtaker’s response to a test item and that testtaker’s level on the latent construct being measured can be expressed in the item characteristic curve (ICC).

      IRT enables test users to better understand the range of the underlying construct for which an item is most useful in discriminating among groups of testtakers. This can be done using the information function.

      Information refers to the precision of measurement.
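      A sketch of a two-parameter logistic (2PL) item characteristic curve together with its information function; the 2PL form and its information formula are standard in IRT, while the parameter values below are made up:

```python
import math

# Two-parameter logistic (2PL) model: a = discrimination, b = difficulty.
# The 2PL form is standard in IRT; the parameter values here are made up.
def icc(theta, a=1.5, b=0.0):
    # item characteristic curve: P(keyed response | trait level theta)
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a=1.5, b=0.0):
    # 2PL item information: a^2 * P * (1 - P), the precision at theta
    p = icc(theta, a, b)
    return a ** 2 * p * (1 - p)

# Monotonicity: endorsement probability rises with the underlying construct.
probs = [icc(t) for t in (-2, -1, 0, 1, 2)]
print(probs)

# Information peaks near theta = b, where the item discriminates best.
print(information(0.0), information(2.0))
```

      The curve makes the monotonicity assumption concrete, and the information function shows why an item is most useful for measuring testtakers whose theta level lies near the item’s difficulty.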

      Items with low information prompt the test developer to consider the possibility that the content of the item does not match the construct measured by the other items (1), the item is poorly worded (2), the item is too complex (3), the placement of the item in the test is out of context (4) or cultural factors may be operating to weaken the item’s ability to discriminate between groups (5).

      “Cohen on the science of psychological measurement” - Article summary

      A utility analysis refers to a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment. It is done in order to see whether the benefits of using a test outweigh the costs of that test. The objective of a utility analysis determines the required information (1) and the specific methods that have to be used (2).

      One method of utility analysis is expectancy data: converting the test data to an expectancy table, which can provide the likelihood that a test taker will score within some interval of scores on a criterion measure. Taylor-Russell tables provide an estimate of the extent to which inclusion of a particular test in the selection system will improve selection. They give the increase in the base rate of successful performance that is associated with a particular level of criterion-related validity.

      The selection ratio is a numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired. The base rate refers to the percentage of people hired under the existing system who are considered successful.
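      A minimal numeric sketch of these two quantities; the applicant and opening counts are made up:

```python
# Hypothetical hiring situation: 10 openings, 200 applicants.
n_openings, n_applicants = 10, 200
selection_ratio = n_openings / n_applicants   # 0.05

# Under the current (test-free) system, 60% of hires turn out successful.
base_rate = 0.60

# A Taylor-Russell table is entered with three values: the validity
# coefficient, the selection ratio, and the base rate; it returns the
# improved proportion of successful hires when the test is added.
print(selection_ratio)  # 0.05
print(base_rate)        # 0.6
```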

      Top-down selection is a process of awarding available positions to applicants whereby the highest scorer is awarded the first position. A downside of top-down selection is that this may lead to unintended discriminatory effects.

      Hit: a correct classification.

      Miss: an incorrect classification.

      Hit rate: the proportion of people that an assessment tool accurately identifies as possessing or exhibiting a particular trait, ability, behaviour or attribute.

      Miss rate: the proportion of people that an assessment tool inaccurately describes as possessing or exhibiting a particular trait, ability, behaviour or attribute.

      False positive: a specific type of miss whereby an assessment tool falsely indicates that the test taker possesses a trait.

      False negative: a specific type of miss whereby an assessment tool fails to indicate a trait that the test taker does possess.
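      These classification terms can be computed from a 2×2 table of test decisions against actual status; the counts below are hypothetical:

```python
# Hypothetical screening-test outcomes for 100 people.
true_pos, false_pos = 40, 10    # test says "has trait"
false_neg, true_neg = 5, 45     # test says "no trait"
total = true_pos + false_pos + false_neg + true_neg

hit_rate = (true_pos + true_neg) / total     # proportion of correct classifications
miss_rate = (false_pos + false_neg) / total  # proportion of incorrect classifications

print(hit_rate)   # 0.85
print(miss_rate)  # 0.15
```

      By construction the hit rate and miss rate sum to one, since every classification is either correct or incorrect.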

      “Coyle (2015). Introduction to qualitative psychological research.” – Article summary

      Qualitative research refers to the collection and analysis of non-numerical data through a psychological lens in order to provide rich descriptions and possibly explanations of people’s meaning-making, how they make sense of the world and how they experience particular events.

      Epistemology refers to the theory of knowledge regarding what we can know and how we can know. Ontology refers to the assumptions made about the nature of being, existence or reality. Different research approaches are associated with different epistemologies.

      Positivism holds that there is a direct correspondence between the state of the world and our perceptions through our senses, provided that our perception is not skewed by factors that could damage that correspondence (e.g. interest in a topic). Empiricism states that our knowledge must arise from the collection and categorization of our sense perceptions of the world. Hypothetico-deductivism states that theories should be exposed to attempts at falsification, rather than attempts at verification.

      The classical scientific method assumed that reality exists independently of the observer and that reality can be observed through research. It assumes that any existing psychological dimension could be measured with precision.

      ‘Small q’ qualitative research is a structured form of content analysis, which categorizes and quantifies qualitative data systematically. ‘Big Q’ qualitative research refers to the use of qualitative techniques within a qualitative paradigm which rejects notions of objective reality or universal truth.

      Nomothetic research seeks generalizable findings that uncover laws to explain objective phenomena and idiographic research seeks to examine individual cases in detail to understand an outcome. Phenomenological methods focus on obtaining detailed descriptions of experience as understood by those who have that experience in order to discern its essence.

      Critical realism states that reality exists independent of the observer, although we cannot know that reality with certainty. Social constructionism has a critical stance towards assumptions about the world. It states that the way we understand the world and ourselves are built up through social processes. This is not fixed. Relativism states that reality is dependent on the ways we come to know it.  

      Reflexivity refers to the acknowledgement by the researcher of the role played by their interpretative framework in creating their analytic account.

      There are several evaluative criteria for qualitative research. Sensitivity to context refers to whether the context of the theory is made clear. Commitment refers to prolonged engagement with the research topic. Rigour refers to the completeness of the data collection and analysis. Coherence refers to the quality of the research narrative and the fit between the research question and the adopted philosophical perspective. Impact and importance refers to the theoretical, practical and socio-cultural impact of the study.

      “Dienes (2008). Understanding psychology as a science.” – Article summary

      A falsifier of a theory is any potential observation statement that would contradict the theory. There are different degrees of falsifiability, as some theories require fewer data points to be falsified than others. In other words, simple theories should be preferred, as they require fewer data points to be falsified. The greater the universality of a theory, the more falsifiable it is.

      A computational model is a computer simulation of a subject. It has free parameters: numbers that have to be set (e.g. the number of neurons used in a computational model of neurons). When using computational models, more than one model will be able to fit the actual data. However, the most falsifiable model that has not been falsified by the data (i.e. that fits the data) should be used.

      A theory should only be revised or changed in ways that make it more falsifiable; a revision that makes it less falsifiable is ad hoc. Any revision or amendment to the theory should itself be falsifiable.

      Standard statistics are useful in determining probabilities based on objective probabilities, i.e. long-run relative frequencies. This does not, however, give the probability of a hypothesis being correct.

      Subjective probability refers to the subjective degree of conviction in a hypothesis. The subjective probability is based on a person’s state of mind. Subjective probabilities need to follow the axioms of probability.

      Bayes’ theorem is a method of getting from one conditional probability (e.g. P(A|B)) to its inverse. The subjective probability of a hypothesis is called the prior. The posterior is how probable the hypothesis is to you after data collection. The probability of obtaining the data given the hypothesis is called the likelihood (e.g. P(D|H)). The posterior is proportional to the likelihood times the prior. Bayesian statistics is updating the personal conviction in light of new data.
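      A minimal sketch of this updating rule over two discrete hypotheses; the coin-bias scenario and all numbers are illustrative:

```python
from math import comb

# Two rival hypotheses about a coin: fair (p = 0.5) vs biased (p = 0.8).
# Priors are the subjective starting convictions in each hypothesis.
priors = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.5, "biased": 0.8}

# Data: 8 heads in 10 flips.
k, n = 8, 10

# Likelihood P(D|H): binomial probability of the data under each hypothesis.
likelihood = {h: comb(n, k) * p**k * (1 - p)**(n - k) for h, p in p_heads.items()}

# Posterior ∝ likelihood × prior, then normalise so the posteriors sum to 1.
unnorm = {h: likelihood[h] * priors[h] for h in priors}
total = sum(unnorm.values())
posterior = {h: v / total for h, v in unnorm.items()}

print(posterior["biased"] > posterior["fair"])  # True: the data favour the biased coin
```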

      The likelihood principle states that all the information relevant to inference contained in data is provided by the likelihood. A hypothesis having the highest likelihood does not mean that it has the highest probability. A hypothesis having the highest likelihood means that the data support that hypothesis the most. The posterior probability depends not on the likelihood alone, but on the likelihood combined with the prior.

      The probability distribution of a continuous variable is called a probability density distribution. It has this name because a continuous variable has infinitely many possible values; the area under the curve over an interval gives the probability of that interval.

      A likelihood could be a probability or a probability density and it can also be proportional to a probability or a probability density. Likelihoods provide a continuous graded measure of support for different hypotheses.

      In Bayesian statistics (likelihood analysis), the data are fixed but the hypothesis can vary. In significance testing, the hypothesis is fixed (the null hypothesis) but the data can vary. The height of the curve of the distribution for each hypothesis is relevant in calculating the likelihood. In significance testing, the tail area of the null distribution (the probability of data at least as extreme as those observed) is what determines the p-value.

      “Dienes (2011). Bayesian versus orthodox statistics: Which side are you on?” – Article summary

      Probabilities are long-run relative frequencies for the collective, rather than an individual. Probabilities do not apply to theories, as individual theories are not collectives. Therefore, the null hypothesis cannot be assigned a probability. A p-value does not indicate the probability of the null hypothesis being true.

      Power or a p-value is not necessary in Bayesian statistics, as a degree of plausibility can be assigned to theories and the data tell us how to adjust these plausibilities. All that is needed is a factor by which we should change the probability of different theories given the data.

      The probability of a hypothesis being true is the prior probability (P(H)). The probability of a hypothesis given the data is the posterior probability (P(H|D)). The probability of obtaining the exact data given the hypothesis is the likelihood (P(D|H)). The posterior probability is proportional to the likelihood times the prior probability.

      The likelihood principle states that all information relevant to inference contained in data is provided by the likelihood. In a distribution, the p-value is the tail area under the curve beyond a certain point, whereas the likelihood is the height of the distribution at that point.
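      The contrast between height and area can be made concrete for a standard normal null distribution; the observed statistic z = 1.5 is illustrative:

```python
from math import erf, exp, pi, sqrt

z = 1.5  # observed standardised test statistic

# Likelihood under the null: the HEIGHT of the standard normal density at z.
density_height = exp(-z**2 / 2) / sqrt(2 * pi)

# Two-tailed p-value: the AREA under the curve beyond |z|,
# using the normal CDF written in terms of the error function.
tail_area = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

print(round(density_height, 4))  # 0.1295
print(round(tail_area, 4))       # 0.1336
```

      The two numbers answer different questions: the height measures support for the null at exactly these data, while the area aggregates over more extreme data that were never observed.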

      The p-value is influenced by the stopping rule (1), whether or not the test is post-hoc (2) and how many other tests have been conducted (3). These things do not influence the likelihood.

      The Bayes factor is the ratio of the likelihoods. The Bayes factor is driven to 0 if the null hypothesis is true, whereas the p-values fluctuate randomly if the null hypothesis is true and data-collection continues. The Bayes factor is slowly driven towards the ‘truth’. Therefore, the Bayes factor gives a notion of sensitivity. It distinguishes evidence that there is no relevant effect from no evidence of a relevant effect. It can be used to determine the practical significance of an effect.
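      For two simple hypotheses about a binomial proportion, the Bayes factor is just the ratio of the two likelihoods; the proportions below (an alternative of p = 0.7 against a null of p = 0.5) are illustrative:

```python
from math import comb

def bayes_factor(k, n, p1=0.7, p0=0.5):
    """Ratio of likelihoods for simple hypotheses H1: p = p1 vs H0: p = p0."""
    like = lambda p: comb(n, k) * p**k * (1 - p)**(n - k)
    return like(p1) / like(p0)

# 35 successes in 50 trials: substantial evidence for H1 over H0.
print(bayes_factor(35, 50) > 3)   # True
# 25 successes in 50 trials (exactly chance level): evidence FOR the null.
print(bayes_factor(25, 50) < 1)   # True
```

      This illustrates the sensitivity point in the text: a Bayes factor well below 1 is evidence that there is no effect, which is different from merely having no evidence of an effect.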

      Adjusting conclusions according to when the hypothesis was thought of would introduce irrelevancies in inference and therefore, the timing of the hypothesis is irrelevant in Bayesian statistics. In assessing evidence for or against a theory, all relevant evidence should be taken into account and the evidence should not be cherry picked.

      Rationality refers to having sufficient justification for one’s beliefs. Critical rationalism is a matter of having one’s beliefs subjected to critical scrutiny. Irrational beliefs are beliefs not subjected to sufficient criticism.

      A theory’s predictions can take the shape of a uniform (1), normal (2) or half-normal (3) distribution. In a uniform distribution, all values in a range are equally likely. In a normal distribution, one value is most likely given the theory. A half-normal distribution is a normal distribution centred on zero with only one tail: it predicts an effect in one direction, with smaller effects more likely than larger effects.

      There are several weaknesses of the Bayesian approach:

      1. Bayesian analyses force people to specify predictions in detail
      2. Bayesian analyses do not
      “Eaton et al. (2014). Toward a model-based approach to the clinical assessment of personality psychopathology.” – Article summary

      Types refer to categories and traits refer to dimensions. In order to determine where an individual falls on a trait, the measure needs to cover the full range of the trait dimension. There are several models:

      1. Latent trait model
        This model assumes that there is one or more underlying continuous distributions. There are no locations across the continuum that are unoccupied. The dimensional scores of this model can be changed to percentiles in order to facilitate interpretation.
      2. Latent class model
        This model assumes a latent group (class) structure for the distribution. There are a finite number of latent classes; they are mutually exclusive and nominal. The model assumes conditional independence.
      3. Hybrid model (factor mixture model)
        This model combines the continuous aspect of the latent trait model with the discrete aspects of the latent class model. This model assumes that there are classes, but there are individual differences in the classes. The distribution within a class is continuous.

      Discrimination is a measure of how strongly an item taps into the latent trait. Conditional independence states that inter-item correlations solely reflect class membership.

      “Foster (2010). Causal inference and developmental psychology.” – Article summary

      The problem of causality is difficult in developmental psychology, as many questions of that field regard factors that a person cannot be randomly assigned to (e.g. single parent family). Causal inference refers to the study and measurement of cause-and-effect relationships outside of random assignment.

      In the current situation in developmental psychology, it is unclear among researchers whether causality can be inferred and why. Causal inferences are necessary for the goals of developmental psychology because causal inferences can improve the lives of people (1), can help distinguish between associations and causal claims for laypeople (2), and causal thinking is unavoidable (3).

      The directed acyclic graph (DAG) is a tool which is useful in moving from associations to causal relationships. It is particularly useful in identifying covariates and understanding the anticipated consequences of incorporating these variables.

      The DAG is a symbolic representation of dependencies among variables. The causal Markov assumption states that the absence of a path in the DAG implies the absence of a relationship. In a DAG, models that represent the data with fewer links are preferred to more complex ones (parsimony). If two variables are simultaneously determined, the DAG can incorporate this possibility by treating the two as reflecting a common cause.

      Variables (in the DAG) can be related in three ways:

      1. Z is a common cause of X and Y
        In this case, Z needs to be controlled for.
      2. Z is a common effect of X and Y
        This is a collider. Conditioning on a collider creates a spurious relationship between X and Y. This relationship can suppress or inflate a true causal effect.
      3. Z mediates the effect of X on Y
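      The collider case (2) can be demonstrated by simulation; the data-generating model below (Z = X + Y with X and Y independent) is a minimal illustrative choice, and conditioning on Z is implemented by selecting cases with high Z:

```python
import random

random.seed(0)

# X and Y are independent causes; Z = X + Y is their common effect (a collider).
n = 20000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]
zs = [x + y for x, y in zip(xs, ys)]

def corr(a, b):
    """Pearson correlation, computed from scratch."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Unconditionally, X and Y are (nearly) uncorrelated.
r_all = corr(xs, ys)

# Conditioning on the collider (selecting cases with high Z)
# induces a spurious negative X-Y association.
sel = [(x, y) for x, y, z in zip(xs, ys, zs) if z > 1]
r_sel = corr([x for x, _ in sel], [y for _, y in sel])

print(abs(r_all) < 0.05)  # True: no real X-Y relationship
print(r_sel < -0.2)       # True: spurious association from conditioning on Z
```

      Intuitively: among cases where the sum X + Y is high, a low X forces a high Y and vice versa, which manufactures a negative correlation that does not exist in the population.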

         

      “Furr & Bacharach (2014). Estimating practical effects: Binomial effect size display, Taylor-Russell tables, utility analysis and sensitivity / specificity.” – Article summary

      Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses (e.g. to what degree does it measure what it is supposed to measure). Items of a test itself cannot be valid or invalid, only the interpretations can be valid or invalid.

      Validity is a property of the interpretation (1), it is a matter of degree (2) and the validity of a test’s interpretation is based on evidence and theory (3). Validity influences the accuracy of our understanding of the world, as research conclusions are based on the validity of a measure.

      Construct validity refers to the degree to which test scores can be interpreted as reflecting a particular psychological construct. Face validity refers to the degree to which a measure appears to be related to a specific construct, in the judgement of nonexperts, test takers and representatives of the legal system. Convergent validity refers to the degree to which test scores are correlated with tests of related constructs.

      Validity is important for the accuracy of our understanding of the world (1), decisions on a societal level (2) (e.g. laws based on ‘invalid’ research) and decisions on an individual level (3) (e.g. college admissions).

      The validity of test score interpretation depends on five types of evidence: test content (1), consequences of use (2), association with other variables (3), response processes (4) and internal structure (5).

      Test content can be seen as content validity. There are two threats to content validity:

      1. A test including construct-irrelevant content
        The inclusion of content that is not relevant to the construct of interest reduces validity.
      2. Construct underrepresentation
        A test should include the full range of content that is relevant to the construct.

      Avoiding construct underrepresentation can be constrained by practical issues (e.g. the available testing time). The internal structure of a test refers to the way the parts of a test are related to each other. There should be a proper match between the actual internal structure of a test and the internal structure the test should have. The internal structure can be examined through the correlations among items in the test and among the subscales in the test. This can be done using factor analysis.

      Factor analysis helps to clarify the number of factors within a set of items (1), reveals the associations among the factors within a multidimensional test (2) and identifies which items are linked to which factors (3). Factors are dimensions of the test.

      Response processes refers to the match between the psychological processes that respondents actually use when completing a measure and the processes that they should use.

      In order to assess validity, the association with other variables (e.g. happiness and self-esteem) should be assessed. If a positive relationship is expected between two variables, then, for the interpretation of a measure to be valid, this relationship needs to exist. Evaluating associations with other variables involves the match between a measure’s actual associations with other measures and the associations it should have with those measures.

      “Furr & Bacharach (2014). Estimating and evaluating convergent and discriminant validity evidence.” - Article summary

      There are four procedures to present the implications of a correlation in terms of our ability to use the correlations to make successful predictions:

      1. Binomial effect size display (dichotomous)
        This illustrates the practical consequences of using correlations to make decisions. It can show how many successful and unsuccessful predictions can be made on the basis of a correlation, translating a validity correlation into an intuitive framework. However, it frames the situation in terms of an ‘equal proportions’ situation.
      2. Taylor-Russell tables (dichotomous)
        These tables inform selection decisions and provide a probability that a prediction will result in successful performance on a criterion. The size of the validity coefficient (1), the selection proportion (2) and the base rate (3) are required for the tables.
      3. Utility analysis
        This frames validity in terms of a cost-benefit analysis of test use.
      4. Analysis of test sensitivity and test specificity
        A test is evaluated in terms of its ability to produce correct identifications of a categorical difference. This is useful for tests that are designed to detect a categorical difference.
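      The binomial effect size display re-expresses a correlation r as two ‘success rates’, 0.50 + r/2 and 0.50 − r/2; a small sketch with an illustrative validity coefficient:

```python
def besd(r):
    """Binomial effect size display: split a correlation r into two success rates."""
    return 0.50 + r / 2, 0.50 - r / 2

# A modest validity coefficient of .30 corresponds to 65% vs 35% success.
high, low = besd(0.30)
print(high, low)  # 0.65 0.35
```

      This makes the practical meaning of a correlation visible: even r = .30, small by conventional standards, shifts the predicted success rate from 35% to 65% between groups.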

      Validity correlations can be evaluated in the context of a particular area of research or application.

      A nomological network refers to the interconnections between a construct and other related constructs. There are several methods to evaluate the degree to which measures show convergent and discriminant associations:

      1. Focussed associations
        This method focusses on a few highly relevant criterion variables. This can make use of validity generalization.
      2. Sets of correlations
        This method focusses on a broad range of criterion variables and computes the correlations between the test and many criterion variables. The degree to which the pattern of correlations ‘makes sense’ given the conceptual meaning of the construct is evaluated.
      3. Multitrait-multimethod matrices
        This method obtains measures of several traits, each measured through several methods. The purpose is to set clear guidelines for evaluating convergent and discriminant validity evidence. This is done by evaluating trait variance and method variance. Evidence of convergent validity is represented by monotrait-heteromethod correlations.

      The correlations between measures are called validity coefficients. Validity generalization is a process of evaluating a test’s validity coefficients across a large set of studies. Validity generalization studies are intended to evaluate the predictive utility of a test’s scores across a range of settings, times and situations. These studies can reveal the general level of predictive validity (1), the degree of variability among the smaller individual studies (2) and the source of the variability among studies (3).

       


      “Furr & Bacharach (2014). Scaling.” - Article summary

      Scaling refers to assigning numerical values to psychological attributes. Individuals in a group should be similar to each other in that they share a psychological feature. There are rules to follow in order to put people in categories:

      1. People in a category must be identical with respect to the feature that defines the group (e.g. hair colour).
      2. The groups must be mutually exclusive.
      3. The groups must be exhaustive (i.e. everyone in the population falls into some category).

      Each person should fall into one and only one category. If numerals are used to indicate order, they serve as labels indicating rank. If numerals have the property of quantity, they convey information about the exact amounts of an attribute. Units of measurement are standardized quantities. The three properties of numerals are identity (1), order (2) and quantity (3).

      There are two possible meanings of the number zero. It can be the absolute zero (1) (e.g. a reaction time of 0ms) or it can be an arbitrary quantity of an attribute (2). This is called the arbitrary zero. The arbitrary zero does not represent the absence of anything, rather, it is a point on a scale to measure that feature. A lot of psychological attributes use the arbitrary zero (e.g. social skill, self-esteem, intelligence).

      A unit of measurement might be arbitrary because the unit size may be arbitrary (1), because some units of measurement are not tied to any one type of object (2) (e.g. centimetres can measure anything with a spatial property), and because some units of measurement can be used to measure different features of the same object (3) (e.g. weight and length).

      One assumption of counting is additivity. This requires that the unit size does not change, meaning that an increase of one point is equal at every point of the scale. This is not always the case: an IQ test, for example, requires increasingly difficult questions for each additional IQ point, so the unit size changes.

      Counting only qualifies as measurement if it reflects the amount of some feature or attribute of an object. There are four scales of measurement:

      1. Nominal scale
        This is used to identify groups of people who share a common attribute that is not shared by people in other groups (e.g. ‘0’ for male and ‘1’ for female). It satisfies the principle of identity.
      2. Ordinal scale
        This is used to rank people according to some attribute. It can be used to make rankings within groups but not comparisons between groups, as that would require quantity. It satisfies the principles of identity and order.
      3. Interval scale
        This scale represents quantitative differences between people. It satisfies the principles of identity, order and quantity.
      4. Ratio scale
        This scale has an absolute zero point. It satisfies the principles of identity, order and quantity, and has an absolute zero.

      Psychological attributes might not be able to be put neatly onto one of these scales.

      “Gigerenzer & Marewski (2015). Surrogate science: The idol of a universal method for scientific inference.” - Article summary

      Good science requires statistical tools and informed judgement about what model to construct, what hypotheses to test and what tools to use.

      There is no universal method of scientific inference, but rather a toolbox of useful scientific methods. A danger of Bayesian statistics is that it will become a new universal method of statistics. Lastly, statistical methods are not simply applied to a discipline; they change the discipline itself.

      In the natural sciences, the probabilistic revolution shaped theorizing. In the social sciences, it led to the mechanization of scientists’ inferences. The inference revolution refers to the idea that inference from sample to population is the most important part of research. This revolution led to a dismissive attitude towards replication.

      There are three meanings of significance:

      1. Mere convention
        This means that it is convenient for researchers to use 5% as a standard level of significance.
      2. Alpha level
        This means that significance refers to the long-term relative frequency of making a type-I error.
      3. Exact level of significance
        This is the exact level of significance computed from the data; it is used in null hypothesis testing in which the null hypothesis is a nil hypothesis of zero difference.

      There are three interpretations of probability:

      1. A relative frequency
        This is a long-term relative frequency.
      2. Propensity
        This is a probability grounded in the physical design of an object (e.g. a die).
      3. Reasonable degree of subjective belief
        This is the degree to which an individual believes in something.

      Bayesian statistics should not be used in an automatic way, like frequentism. Objections to the use of Bayes rule are that frequency-based prior probabilities do not exist (1), that the set of hypotheses needed for the prior probability distribution is not known (2) and that researchers’ introspection does not confirm the calculation of probabilities (3).

      Fishing expeditions refer to treating hypothesis finding as if it were hypothesis testing, typically recognisable by the large number of p-values reported in a research article.

      “Halpern (2014). Thinking, an introduction.” - Article summary

      The twin pillars of critical thinking are knowing how to learn and knowing how to think. Critical thinking is a necessary skill for the future. It is a broad term describing reasoning in an open-ended manner, with an unlimited number of possible solutions; it involves constructing a situation and supporting the reasoning that went into a conclusion.

      Nondirected or automatic thinking is not part of critical thinking. Cognitive process instruction refers to utilizing the knowledge we have accumulated about human thinking processes and mechanisms in ways that can help people improve how they think. Critical thinking is best cultivated in a school environment that encourages students to ask questions.

      There are four steps to critical thinking instruction:

      1. Explicitly learn the skill of critical thinking
      2. Develop the disposition for effortful thinking and learning
      3. Direct learning activities in ways that increase the probability of trans-contextual transfer
      4. Make metacognitive monitoring explicit and overt

      A good critical thinker:

      1. Has the habit of planning
      2. Has motivation for critical thinking
      3. Is flexible and open to new ideas
      4. Has persistence
      5. Acknowledges errors
      6. Is willing to change stances
      7. Does not use self-justification
      8. Is mindful
      9. Is consensus-seeking
      10. Uses metacognition
      11. Recognizes when critical thinking is necessary

      Critical thinking requires motivation to exert the conscious effort needed to work in a planful manner, to check accuracy, gather information and to persist when the solution is not obvious.

      Self-justification refers to making excuses for a mistaken belief in order to protect the self-image. Mindfulness refers to the simple act of drawing novel distinctions. Consensus-seeking refers to an openness in thinking that allows members of a group to agree on some aspects of a solution and disagree on others. Recognizing when critical thinking is necessary is also important. Metacognition refers to our knowledge of what we know and the use of this knowledge to direct further learning activities.
       

      “Kievit et al. (2013). Simpson’s paradox in psychological science: A practical guide.” - Article summary

      Simpson’s paradox states that the direction of an association at the population level may be reversed within subgroups of that population. Inadequate attention to Simpson’s paradox may lead to faulty inferences. The paradox can arise because of differences in proportions at subgroup levels compared to the population level. It also means that a pattern (association) need not hold within a subgroup.

      The paradox is related to a lot of things, including causal inference. A generalized conclusion (e.g. extraversion causes party-going) might hold for the general population, but does not mean that this inference can be drawn at the individual level. A correlation across the population does not need to hold in an individual over time.
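      The reversal can be shown with a classic numerical example (the counts follow the often-cited kidney-stone treatment data): treatment A wins within each subgroup, yet loses in the pooled data.

```python
# (recovered, total) for two treatments within two severity subgroups.
groups = {
    "small_stones": {"A": (81, 87),   "B": (234, 270)},
    "large_stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(rec_tot):
    rec, tot = rec_tot
    return rec / tot

# A beats B within EACH subgroup.
for arms in groups.values():
    assert rate(arms["A"]) > rate(arms["B"])

# Pooled over subgroups, the direction reverses: B beats A overall,
# because A was disproportionately given the harder (severe) cases.
pooled = {arm: (sum(groups[g][arm][0] for g in groups),
                sum(groups[g][arm][1] for g in groups)) for arm in ("A", "B")}
print(rate(pooled["A"]) < rate(pooled["B"]))  # True: the association reverses
```

      The reversal is driven by the unequal subgroup proportions in each treatment arm, exactly the mechanism described above.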

      To deal with Simpson’s paradox, the situations in which the paradox frequently occurs have to be identified. There are several steps for preventing Simpson’s paradox:

      1. Consider when it occurs.
      2. Explicitly propose a mechanism, determining at which level it is presumed to operate.
      3. Assess whether the explanatory level of data collection aligns with the explanatory level of the proposed mechanism.
      4. Conduct an experiment to assess the association between variables.

      In the absence of strong top-down knowledge, people are more likely to make false inferences based on Simpson’s paradox.
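      As a minimal numeric sketch of the reversal described above, the classic kidney-stone treatment counts can be used to compare subgroup-level and population-level success rates directly:

```python
# Classic kidney-stone counts: within each subgroup treatment A has the
# higher success rate, yet pooled over subgroups treatment B looks better.
groups = {
    "small_stones": {"A": (81, 87), "B": (234, 270)},   # (successes, total)
    "large_stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

for name, g in groups.items():
    print(name, round(rate(*g["A"]), 2), round(rate(*g["B"]), 2))  # A wins in both

# Pooled over subgroups, the ordering reverses (Simpson's paradox):
a_succ = sum(g["A"][0] for g in groups.values())
a_tot = sum(g["A"][1] for g in groups.values())
b_succ = sum(g["B"][0] for g in groups.values())
b_tot = sum(g["B"][1] for g in groups.values())
print("pooled", round(rate(a_succ, a_tot), 2), round(rate(b_succ, b_tot), 2))
```

      The reversal arises because treatment A was given mostly to the harder (large-stone) cases, so the subgroup proportions differ between the treatments.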

      “LeBel & Peters (2011). Fearing the future of empirical psychology: Bem’s (2011) evidence of psi as a case study of deficiencies in modal research practice.” - Article summary


      Psi refers to the anomalous retroactive influence of future events on an individual’s current behaviour. There are three important deficiencies in modal research practice: an overemphasis on conceptual replication (1), insufficient attention to verifying the integrity of measurement instruments and experimental procedures (2), and problems with the implementation of null hypothesis significance testing (3).

      The interpretation bias refers to a bias towards interpretations of data that favour a researcher’s theory. A potential consequence is an increased risk of reported false positives and a disregard of true negatives. The knowledge system of psychology consists of theory-relevant beliefs (1), which concern the mechanisms that produce behaviour, and method-relevant beliefs (2), which concern the procedures through which data are obtained.

      Deficiencies in modal research practice systematically bias the interpretation of confirmatory data as theory-relevant (1) and the interpretation of disconfirmatory data as method-relevant (2).

      Central beliefs are beliefs on which many other beliefs depend. Conservatism refers to choosing the theoretical explanation consistent with the data that requires the least amount of restructuring of the existing knowledge system.

      If method-relevant beliefs are central in a knowledge system, it becomes more difficult to blame methodological errors for disconfirmatory results. If theory-relevant beliefs become central, the hypothesis risks becoming a logically necessary assumption rather than an empirical claim. A hypothesis under test should be described in a way that is falsifiable and not logically necessary.

      An overemphasis on conceptual replication at the expense of direct replication weakens method-relevant beliefs in the knowledge system. A statistically significant result is often followed by a conceptual replication. A failure of the conceptual replication raises the question whether the negative result was due to the falsity of the underlying theory or to methodological flaws introduced by the changes made in the conceptual replication.

      The failure to verify the integrity of measurement instruments and experimental procedures weakens method-relevant beliefs and leads to ambiguity in the interpretation of results. The null hypothesis can be viewed as a straw man, as two exactly identical populations are virtually impossible. Basing theory choices on null hypothesis significance tests detaches theories from the broader knowledge system.

      In order to overcome the flaws of modal research practice, method-relevant beliefs must be strengthened. There are three ways to do this:

      1. Stronger emphasis on direct replication
        A direct replication leads to greater confidence in the results. They are necessary to ensure that an effect is real.
      2. Verify integrity of methodological procedures
        Method-relevant beliefs are more difficult to reject if the integrity of methodological procedures is verified, which leads to a less ambiguous interpretation of results. This includes routinely checking the internal consistency of the scores of any measurement instrument that is used, and using objective markers of instruction comprehension.
      3. Use stronger forms of NHST
        The null hypothesis should be a theoretically derived point value of the focal variable, instead
      “Marewski & Olsson (2009). Formal modelling of psychological processes.” - Article summary


      One way of avoiding the null hypothesis testing ritual in science is to increase the precision of theories by casting them as formal models. Rituals can be characterized by a repetition of the same action (1), fixations on special features (2), anxieties about punishment for rule violation (3) and wishful thinking (4). The null hypothesis testing ritual is mainly maintained because many psychological theories are too weak to make precise predictions besides the direction of the effect.

      A model is a simplified representation of the world that aims to explain observed data. It specifies a theory’s predictions. Modelling is especially suited for basic and applied research about the cognitive system. There are four advantages of formally specifying the theories as models:

      1. Designing strong tests of theories
        Casting theories as models makes it possible to derive quantitative predictions, which yields comparable, competing predictions and thereby allows theories to be tested against one another.
      2. Sharpening research questions
        Null hypothesis testing permits vague descriptions of theories, which makes them difficult to test; specifying theories as models forces more precise research questions, which makes the theories easier to test.
      3. Going beyond linear theories
        Null hypothesis testing is especially applicable to simple hypotheses: the available statistical tools shape the theories that are created, which are therefore mostly linear. Specifying the theory as a model removes this constraint.
      4. Using more externally valid designs to study real-world questions
        Modelling can lead to more externally valid designs, as confounds are not eliminated in the analysis, but built into the model.

      Goodness-of-fit measures cannot distinguish between variation in the data due to noise and variation due to the psychological process of interest. A model can end up overfitting the data, capturing not only the variance of the psychological process of interest but also variance due to random error. The ability of a model to predict new data is its generalizability. The complexity of a model refers to its inherent flexibility, which enables it to fit diverse patterns of data. Complexity is related to the degree to which a model is susceptible to overfitting. The number of free parameters (1) and how the parameters are combined in the model (2) contribute to a model’s complexity.

      Increased complexity makes a model more likely to overfit, while its generalizability to new data decreases. Up to a point, increased complexity can also improve generalizability, but only as long as the model is complex enough to capture the process without fitting the noise. A good fit to current data does not guarantee a good fit to other data.
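      The trade-off between fit and generalizability can be sketched with a toy simulation (the linear data-generating process, sample size and polynomial degrees are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Data from a simple linear process plus noise, and a fresh sample
# from the same process to stand in for "new data".
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(0, 0.2, size=x.size)
y_new = 2 * x + rng.normal(0, 0.2, size=x.size)

def fit_and_generalize(degree):
    """Mean squared error on the fitted sample and on the fresh sample."""
    coefs = np.polyfit(x, y, degree)
    pred = np.polyval(coefs, x)
    return np.mean((pred - y) ** 2), np.mean((pred - y_new) ** 2)

for degree in (1, 9):  # a simple model vs. a much more flexible one
    fit_mse, new_mse = fit_and_generalize(degree)
    print(f"degree {degree}: fit MSE={fit_mse:.4f}, new-data MSE={new_mse:.4f}")
```

      The flexible model always fits its own sample at least as well as the simple one; on the fresh sample its error is typically larger, because part of what it fitted was noise.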

      The irrelevant specification problem refers to the difficulty of bridging the gap between verbal descriptions of theories and their formal implementations. This can lead to unintended discrepancies between theories and their formal counterparts. The Bonini paradox refers to when models become more complex and

      “Meltzoff & Cooper (2018). Critical thinking about research: Psychology and related fields.” - Article summary


      People can get more polarized opinions after they have read the same article if they started off with opposing beliefs. This can be due to misinterpreting the evidence (1), different motivation (2) or other cognitive malfunctions (3).

      Psychologists need to be critical thinkers, because they need to know the field and keep up with what is going on in it.

      Critical thinking requires checking for scientific soundness rather than whether a paper matches pre-existing beliefs, assumptions or other forms of bias. Scepticism of scientists towards their own work is valuable, as it guards against the confirmation bias. There are several expectations a critical reader should have:

      1. The research question flows naturally from the previous literature and the context.
      2. The literature review and the statement of the problem relate to the hypotheses.
      3. The hypotheses set up research design expectancies and suggest what variables should be manipulated, measured or controlled for.
      4. The hypotheses, appropriate design and type of data dictate the method of data analysis.
      5. The analysis influences the kind of conclusions, inferences and generalizations that can be made.
      “Mitchell & Tetlock (2017). Popularity as a poor proxy for utility.” - Article summary


      Before the IAT existed, indirect measures of prejudice were developed in order to overcome response bias, and psychologists began to examine automatic processes that may contribute to contemporary forms of prejudice. After the IAT appeared, implicit prejudice became equated with widespread unconscious prejudices that are more difficult to spot and regularly infect intergroup interactions.

      The IAT has been used throughout different areas of society and is a very popular means of describing implicit prejudice. Prejudice extends beyond negative or positive associations with an attitude object to include motivational and affective reactions to in-group and out-group members. The IAT does not have strong predictive validity: the IAT score is a poor predictor of discriminatory behaviour.

      There are no guidelines for how to interpret the scores on the IAT. This is referred to as the score interpretation problem. The test scores are dependent on arbitrary thresholds and it is not possible to link them to behaviour outcomes.

      The focus of the IAT on implicit gender stereotypes (not implicit sexism) is problematic because implicit measures of gender stereotypes are not a good predictor of discriminatory behaviour (1), only a very limited set of implicit gender stereotypes has been examined (2), and no explanation is provided about how conflicts between automatic evaluative associations and automatic semantic associations are resolved (3).

      Individuating information, i.e. obtaining personal information about a member of a certain group, counters explicit biases. It does the same with regard to implicit biases.

      Subjective evaluation criteria are not associated with discrimination. Therefore, the solution that only objective measures must be used in decision making to counter (implicit) bias is unnecessary. This is referred to as the subjective judgement problem.

      “Nosek, Spies, & Motyl (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability.” - Article summary


      There is a publication bias in research. Mostly positive – that is, significant – results are published, and non-significant results often are not. This leads researchers to chase significant results instead of genuine effects or non-effects.

      Researchers are more likely to find significant results because of flexible analysis options (1), the confirmation bias (2) and the drive to find significant results (3). It is possible to find significant results everywhere if you search hard enough. Direct replications do not occur often in research and are almost never published.
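      The claim that significant results can be found everywhere if one searches hard enough follows directly from running many tests. A quick simulation (the numbers of "labs" and tests are arbitrary) shows the chance of at least one significant result when every null hypothesis is true:

```python
import numpy as np

rng = np.random.default_rng(4)
n_labs, n_tests, alpha = 10_000, 20, 0.05

# Under a true null hypothesis the p-value is uniformly distributed,
# so each of the 20 tests has a 5% chance of a (false) positive.
p_values = rng.uniform(size=(n_labs, n_tests))
any_hit = (p_values < alpha).any(axis=1)

print(round(any_hit.mean(), 2))  # ~0.64, i.e. about 1 - 0.95**20
```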

      There are several practices that can increase publishability but can decrease the validity of the results:

      1. Leveraging chance by running many low-powered studies instead of a few high-powered ones.
      2. Uncritically dismissing “failed” studies as pilot studies but uncritically accepting “successful” studies.
      3. Selectively reporting studies with positive results and not studies with negative results (cherry picking).
      4. Stopping data collection as soon as a reliable effect is obtained.
      5. Continuing data collection until a reliable effect is obtained.
      6. Including multiple independent and/or dependent variables and reporting the subset that “worked”.
      7. Maintaining flexibility in design and analytical models (e.g. attempts to exclude data).
      8. Reporting a discovery as if it had been the result of a confirmatory test.
      9. Not doing a direct replication once a reliable effect is obtained.

      Conceptual replication involves deliberately changing the operationalization of the key elements of the design such as the independent variable, dependent variable or both. Demonstrating the same effects with multiple operationalizations provides confidence in its conceptual interpretation. Conceptual replication is not an effective replacement for direct replication.

      The peer review process offers a way to detect false results. Not publishing articles without replications would also lessen the publication bias and the number of false results in science. However, this could also be ineffective, because it could reduce the amount of innovative research, as scientists might ‘play it safe’.

      Paradigm-driven research can be used to both confirm and disconfirm prior results. It accumulates knowledge by systematically altering a procedure to investigate a theory or a research question. This includes both replication and extension of the research. One pitfall of paradigm-driven research is that the research could become about the methodology, instead of the theory. Conceptual replication prevents this.

      Checklists could prove effective when conducting research, as they prevent information from being left out. A metric that determines what is worth replicating could also prove effective. Resource constraints on replicating some findings could be overcome by using crowdfunding. Peer reviewers judge whether a finding is important enough to be published, but they are not always capable of making that judgement.

      Another solution to the publication bias and the number of false results in science is to shift the attention from publishing – make publishing trivial – to the evaluation of the research.

      The solution to

      “Pearl (2018). Confounding and deconfounding: Or, slaying the lurking variable.” - Article summary


      Confounding bias occurs when a variable influences both who is selected for the treatment and the outcome of the experiment. If a possible confounding variable is known, it is possible to control for it. Researchers tend to control for all possible variables, which risks controlling for the very thing one is trying to measure (e.g. controlling for mediators).

      Confounding needs a causal solution, not a statistical one, and causal diagrams provide a complete and systematic way of finding that solution. If all the confounders are controlled for, a causal claim can be made. However, it is not always certain whether all confounders have been controlled for.

      Randomization has two clear benefits. It eliminates confounder bias and it enables the researcher to quantify his uncertainty. Randomization eliminates confounders without introducing new confounders. In a non-randomized study, confounders must be eliminated by controlling for them, although it is not always possible to know all the possible confounders.

      It is not always possible to conduct a randomized controlled experiment because of ethical, practical or other constraints. Causal estimates from observational studies can provide provisional causality: causality contingent upon the set of assumptions that the causal diagram advertises.

      Confounding stands for the discrepancy between what we want to assess (the causal effect) and what we actually do assess using statistical methods. A mediator is the variable that explains the causal effect of X on Y (X -> Z -> Y). If you control for a mediator, you will conclude that there is no causal link when in fact there is.

      There are several rules for controlling for possible confounders:

      1. In a chain junction (A -> B -> C), controlling for B prevents information from A getting to C and vice versa.
      2. In a fork or confounding junction (A <- B -> C), controlling for B prevents information from A getting to C and vice versa.
      3. In a collider (A -> B <- C), controlling for B will allow information from A getting to C and vice versa.
      4. Controlling for a mediator partially closes the stream of information. Controlling for a descendant of a collider partially opens the stream of information.

      A variable that is associated with both X and Y is not necessarily a confounder.
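      The junction rules above can be checked with a small simulation (the linear models with unit-variance noise are an arbitrary choice, and "controlling" for a variable is done crudely by regressing it out):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

def control_for(y, z):
    """Residual of y after regressing out z - a crude way to 'control for' z."""
    slope = np.cov(y, z, ddof=0)[0, 1] / np.var(z)
    return y - slope * z

# Fork (confounding junction): A <- B -> C. A and C correlate only
# through B; controlling for B removes the association.
B = rng.normal(size=n)
A = B + rng.normal(size=n)
C = B + rng.normal(size=n)
print(corr(A, C))                                  # clearly positive
print(corr(control_for(A, B), control_for(C, B)))  # near zero

# Collider: A -> B <- C, with A and C independent. Controlling for the
# collider B *creates* a spurious association between A and C.
A2, C2 = rng.normal(size=n), rng.normal(size=n)
B2 = A2 + C2 + rng.normal(size=n)
print(corr(A2, C2))                                    # near zero
print(corr(control_for(A2, B2), control_for(C2, B2)))  # clearly negative
```

      The collider case also illustrates the point above: B2 is associated with both A2 and C2, yet it is not a confounder, and adjusting for it makes the inference worse.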

      “Schmittmann et al. (2013). Deconstructing the construct: A network perspective on psychological phenomena.” - Article summary


      In the reflective model, the attribute is seen as the common cause of observed scores (e.g. depression causes people feeling sad). In the formative model, observed scores define or determine the attribute (e.g. depression occurs when people feel sad a lot).

      Reflective models are presented as measurement models. A latent variable is introduced to account for the covariance between the observed variables. In the reflective model, variables are regarded as exchangeable save for measurement parameters (e.g. reliability), and the correlations between the variables are spurious: they exist only because the variables share the latent common cause.

      Formative models differ from reflective models because the variables are not exchangeable. This is because variables are hypothesised to capture different aspects of the same construct. There is also no assumption about whether the variables should correlate.

      There are three problems with the conceptualization of reflective and formative models:

      1. Time
        In reflective and formative models, time is not explicitly represented. The precedence criterion for causal relationships is not taken into account.
      2. Inability to articulate processes
        The processes of causal mechanisms cannot be described and tested using these models.
      3. Relations between observables
        Causal relationships between observable variables are neglected in these models as the models do not account for these relationships, although it is likely that there is a causal relationship between at least some observable variables.

      The network model states that observable variables of latent variables should be seen as autonomous causal entities in a network of dynamical systems.

      “Simmons, Nelson, & Simonsohn (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant.” - Article summary


      A false positive refers to an incorrect rejection of the null hypothesis. The decisions a researcher can make during the research process are called the researcher degrees of freedom. Four common degrees of freedom are choosing the sample size (1), using covariates (2), choosing among dependent variables (3) and reporting subsets of experimental conditions (4). The researcher degrees of freedom can significantly increase the false positive rate.
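      One of these degrees of freedom, deciding when to stop collecting data after peeking at the results, can be simulated directly (a one-sample z test with known variance is used here purely to keep the sketch short):

```python
import numpy as np

rng = np.random.default_rng(2)
n_studies, looks = 10_000, [10, 20, 30, 40, 50]

# Simulate studies in which the null hypothesis is true (mean really is 0).
data = rng.normal(size=(n_studies, looks[-1]))

def peeks_and_rejects(sample):
    """Test after every 10 observations; stop at the first |z| > 1.96."""
    return any(abs(sample[:n].mean() * np.sqrt(n)) > 1.96 for n in looks)

fp_fixed = np.mean(np.abs(data.mean(axis=1) * np.sqrt(looks[-1])) > 1.96)
fp_peeking = np.mean([peeks_and_rejects(s) for s in data])

print(round(fp_fixed, 3))    # ~0.05: the nominal rate with a fixed n
print(round(fp_peeking, 3))  # well above 0.05: peeking inflates false positives
```

      This is why guideline 1 below requires deciding the data-collection stopping rule in advance.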

      There are six guidelines for authors to prevent the increased rate of false positives:

      1. Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article.
      2. Authors must collect at least 20 observations per cell or else provide a compelling cost-of-data collection justification.
      3. Authors must list all variables collected in a study.
      4. Authors must report all experimental conditions, including failed manipulations.
      5. If observations are eliminated, authors must also note what the statistical results are if those observations are included.
      6. If an analysis includes a covariate, authors must report the statistical results of analysis without the covariate.

      There are four guidelines for reviewers to prevent the increased rate of false positives:

      1. Reviewers should ensure that authors follow the requirements.
      2. Reviewers should be more tolerant of imperfection in results.
      3. Reviewers should require authors to demonstrate that their results do not hinge on arbitrary analytic decisions.
      4. If justifications of data collection or analysis are not compelling, reviewers should require the authors to conduct an exact replication.

       

       

      “Shadish (2008). Critical thinking in quasi-experimentation.” - Article summary


      A common element in all experiments is the deliberate manipulation of an assumed cause, followed by an observation of the effects. A quasi-experiment is an experiment that does not use random assignment of participants to conditions.

      An inus condition is an insufficient but non-redundant part of an unnecessary but sufficient condition. It is insufficient because in itself it cannot produce the effect, but it is non-redundant because it adds something unique to the cause.

      Most causal relationships are non-deterministic. They do not guarantee that an effect occurs; as most causes are inus conditions, they merely increase the probability that an effect will occur. To different degrees, all causal relationships are contextually dependent.
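      The inus structure can be made concrete with a toy boolean model (the fire scenario is a hypothetical illustration, not from the article):

```python
# A house fire occurs if (short_circuit AND flammable_material)
# OR (lightning AND dry_roof). The short circuit is then an inus condition:
# Insufficient on its own, a Non-redundant part of a conjunction that is
# itself Unnecessary (lightning also works) but Sufficient.
def fire(short_circuit, flammable_material, lightning, dry_roof):
    return (short_circuit and flammable_material) or (lightning and dry_roof)

print(fire(True, False, False, False))   # False: insufficient by itself
print(fire(True, True, False, False))    # True: sufficient with its partners
print(fire(False, False, True, True))    # True: the whole conjunct is unnecessary
```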

      A counterfactual is something that is contrary to fact. An effect is the difference between what did happen and what would have happened. The counterfactual cannot be observed. Researchers try to approximate the counterfactual, but it is impossible to truly observe it.

      Two central tasks of experimental design are creating a high-quality but imperfect source of counterfactual and understanding how this source differs from the experimental condition.

      Creating a good source of counterfactual is problematic in quasi-experiments. There are two tools to attempt this:

      1. Observe the same unit over time
      2. Make the non-random control groups as similar as possible to the treatment group

      A causal relationship exists if the cause preceded the effect (1), the cause was related to the effect (2) and there is no plausible alternative explanation for the effect other than the cause (3). Although quasi-experiments are flawed compared to experimental studies, they improve on correlational studies in two ways:

      1. Quasi-experiments make sure the cause precedes the effect by first manipulating the presumed cause and then observing an outcome afterwards.
      2. Quasi-experiments allow the researcher to control for some third-variable explanations.

      Campbell’s threats to valid causal inference contains a list of common group differences in a general system of threats to valid causal inference:

      1. History
        Events occurring concurrently with the treatment could produce the observed effect.
      2. Maturation
        Naturally occurring changes over time, not to be confused with treatment effects.
      3. Selection
        Systematic differences over conditions in respondent characteristics.
      4. Attrition
        A loss of participants can produce artificial effects if that loss is systematically correlated with conditions.
      5. Instrumentation
        The instruments of measurement might differ or change over time.
      6. Testing
        Exposure to a test can affect subsequent scores on a test.
      7. Regression to the mean
        An extreme observation tends to be less extreme on a second observation.
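      Regression to the mean follows from nothing more than noisy measurement, as a short simulation shows (true scores and measurement noise are standard normal by assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
true_score = rng.normal(size=n)
test1 = true_score + rng.normal(size=n)  # observed score = true score + noise
test2 = true_score + rng.normal(size=n)  # retest of the same people, fresh noise

extreme = test1 > 2.0                    # select the extreme scorers on test 1
print(round(test1[extreme].mean(), 2))   # well above 2
print(round(test2[extreme].mean(), 2))   # clearly lower: regression to the mean
```

      The selected group was extreme partly through luck (favourable noise), which does not repeat on the retest.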

      Two flaws of falsification are that it requires a causal claim to be clear, complete and agreed upon in all its details and it requires observational procedures to perfectly reflect the theory that is being tested.

      “Dennis & Kintsch (2008). Evaluating theories.” - Article summary


      A theory is a concise statement about how we believe the world to be. There are several things to look at when evaluating theories:

      1. Descriptive adequacy

      Does the theory accord with the available data?

      2. Precision and interpretability

      Is the theory described in a sufficiently precise fashion that it is easy to interpret?

      3. Coherence and consistency

      Are there logical flaws in the theory? Is it consistent with theories of other domains?

      4. Prediction and falsifiability

      Can the theory be falsified?

      5. Postdiction and explanation

      Does the theory provide a genuine explanation of existing results?

      6. Parsimony

      Is the theory as simple as possible?

      7. Originality

      Is the theory new or a restatement of an old theory?

      8. Breadth

      Does the theory apply to a broad range of phenomena?

      9. Usability

      Does the theory have applied implications?

      10. Rationality

      Are the claims of the theory reasonable?

      Postdiction refers to explaining results that have already been obtained, as opposed to predicting new ones.

      “Van der Maas, Kan, & Borsboom (2014). Intelligence is what the intelligence test measures. Seriously.” – Article summary


      Intelligence research depends on the positive manifold. The positive manifold refers to the positive correlation between intelligence measures. The latent variable in the formative model is not tied to any subtests. In the mutualism model, the observed variables are exchangeable. In the formative model, they are not.

      “Willingham (2007). Decision making and deductive reasoning.” – Article summary


      People do not always reason logically, and decision making underlies much of human behaviour. Choices can be rational (internally consistent). People expect choices to show transitivity: if a relationship holds between item one and item two and between item two and item three, then the relationship should also hold between item one and item three. Utility is the personal value we attach to outcomes, rather than their absolute monetary value.
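      Transitivity can be stated as a small executable check (the drinks and preferences are hypothetical):

```python
from itertools import permutations

def is_transitive(prefs):
    """True if (a, b) and (b, c) in prefs always implies (a, c) in prefs."""
    items = {x for pair in prefs for x in pair}
    return all(
        (a, c) in prefs
        for a, b, c in permutations(items, 3)
        if (a, b) in prefs and (b, c) in prefs
    )

# (a, b) means "a is preferred to b".
print(is_transitive({("tea", "coffee"), ("coffee", "cola"), ("tea", "cola")}))  # True
print(is_transitive({("tea", "coffee"), ("coffee", "cola"), ("cola", "tea")}))  # False: a cycle
```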

      Normative theories of decision making imply that some choices are better than other choices. The optimal choice in normative theories depends on the theory.

      The expected value theory states that the optimal choice is the one that offers the largest financial payoff. People also tend to go for choices with the maximum utility. People are not always consistent with expected value and utility in their choices, as being so requires a lot of time and motivation. There are two principles in rational decision making:

      1. Description invariance
        The description of the choice should not make any difference as long as basic structure of the choices is the same.
      2. Procedure invariance
        The procedure of decision making should not make any difference in the decision that people make.
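      The contrast between expected value and utility described above can be sketched with two hypothetical gambles and a concave utility function (square root is a common textbook choice, not the article's):

```python
import math

gambles = {
    "sure_thing": [(1.0, 40)],            # (probability, payoff in euros)
    "risky":      [(0.5, 100), (0.5, 0)],
}

def expected_value(gamble):
    return sum(p * x for p, x in gamble)

def expected_utility(gamble, u=math.sqrt):
    # A concave u models diminishing returns: each extra euro is worth less.
    return sum(p * u(x) for p, x in gamble)

for name, g in gambles.items():
    print(name, expected_value(g), round(expected_utility(g), 2))

# Expected value favours the risky gamble (50 > 40); expected utility
# favours the sure thing (sqrt(40) ~ 6.32 > 0.5 * sqrt(100) = 5).
```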

      People are inconsistent with this. Psychic budgets refer to how we mentally categorize money that we have spent or are contemplating spending. A sunk cost is an investment that is irretrievably spent and should not influence present decision making, but it still does, as people want to get as much out of their investment as they can. People also make decisions based on loss aversion: the unpleasantness of a loss is bigger than the pleasure of a similar gain, so people make decisions that avert the unpleasantness of a loss. Satisficing refers to selecting the first choice that satisfies a certain demand (e.g. the cost of a phone); people satisfice to avoid having to compare everything with everything else.

      People tend to use heuristics to make decisions. There are several heuristics:

      1. Representativeness heuristic
        An event is judged to be probable if it has properties that are representative of its category (e.g. we believe a person wearing a metal t-shirt is more likely to be in a metal band than someone in a suit).
      2. Availability heuristic
        An event is judged more probable if one can recall many examples of it (e.g. the perceived deadliness of plane crashes versus cardiovascular diseases).
      3. Anchoring and adjustment heuristic
        An initial value (anchor) is adjusted upwards or downwards on the basis of other information (e.g. if people first hear that someone wants an offer between 10 and 50 euros, they will make a lower offer than when they hear that someone wants an offer between 50 and 150 euros).

      The probability of a conjunction of two events can never be higher than the probability of either single event. The
