Surrogate Science: The Idol of a Universal Method for Scientific Inference - summary of an article by Gigerenzer & Marewski

Critical thinking
Article: Gigerenzer, G. & Marewski, J. N. (2015)
Surrogate Science: The Idol of a Universal Method for Scientific Inference
doi: 10.1177/0149206314547522

Introduction

Scientific inference should not be made mechanically.
Good science requires both statistical tools and informed judgment about what model to construct, what hypotheses to test, and what tools to use.

This article is about the idol of a universal method of statistical inference.

In this article, we make three points:

There is no universal method of scientific inference but, rather, a toolbox of useful statistical methods. In the absence of a universal method, its followers worship surrogate idols, such as significant p values. The inevitable gap between the ideal and its surrogate is bridged with delusions. These mistaken beliefs do much harm, among other things by promoting irreproducible results.
If the proclaimed ‘Bayesian revolution’ were to take place, the danger is that the idol of a universal method might survive in a new guise, proclaiming that all uncertainty can be reduced to subjective probabilities.
Statistical methods are not simply applied to a discipline. They change the discipline itself, and vice versa.

Dreaming up a universal method of inference
Bayesianism and the new quest for a universal method
How statistics change research, surrogate science
Conclusion: Leibniz’s dream or Bayes’ nightmare?

Dreaming up a universal method of inference

The null ritual

The most prominent creation of a seemingly universal inference method is the null ritual:

Set up a null hypothesis of ‘no mean difference’ or ‘zero correlation’. Do not specify the predictions of your own research hypothesis.
Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis. Report the result as p<.05, p<.01, or p<.001, whichever comes next to the obtained p value.
Always perform this procedure.

Level of significance has three different meanings:

A mere convention
The alpha level
The exact level of significance

Three meanings of significance

The alpha level: the long-term relative frequency of mistakenly rejecting hypothesis H₀ if it is true, also known as the Type I error rate.
The beta level: the long-term relative frequency of mistakenly rejecting H₁ if it is true, also known as the Type II error rate.

Two statistical hypotheses need to be specified in order to be able to determine both alpha and beta.
Neyman and Pearson rejected a mere convention in favour of an alpha level that required a rational scheme.

Set up two statistical hypotheses, H₁ and H₂, and decide on alpha, beta, and the sample size before the experiment, based on subjective cost-benefit considerations.
If the data fall into the rejection region of H₁, accept H₂; otherwise accept H₁.
The usefulness of this procedure is limited to, among others, situations in which there is a disjunction of hypotheses, in which there is repeated sampling, and in which you can make meaningful cost-benefit trade-offs for choosing alpha and beta.
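The planning step in the Neyman-Pearson procedure can be illustrated with a small sketch. It assumes a one-sided z test on a mean with a known standard deviation, which is a simplification chosen for illustration and not something the article prescribes. The function names and the numbers (means of 100 and 105, standard deviation 15) are hypothetical.

```python
import math
from statistics import NormalDist


def plan_neyman_pearson(mu1, mu2, sigma, alpha, beta):
    """Plan a one-sided z test of H1: mean = mu1 against H2: mean = mu2.

    Assumes mu2 > mu1 and a known standard deviation sigma.
    Returns the required sample size and the critical sample mean.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha)  # cut-off for the Type I error rate
    z_beta = standard_normal.inv_cdf(1 - beta)    # cut-off for the Type II error rate
    n = math.ceil(((z_alpha + z_beta) * sigma / (mu2 - mu1)) ** 2)
    critical_mean = mu1 + z_alpha * sigma / math.sqrt(n)
    return n, critical_mean


def decide(sample_mean, critical_mean):
    """Apply the decision rule that was fixed before the data were seen."""
    return "accept H2" if sample_mean > critical_mean else "accept H1"


# alpha, beta and the two hypotheses are chosen before the experiment
n, c = plan_neyman_pearson(mu1=100, mu2=105, sigma=15, alpha=0.05, beta=0.20)
print(n, round(c, 2))
print(decide(104.0, c))
```

The point of the sketch is the order of operations: alpha, beta, and the sample size are fixed first, and the data only enter in the final decision.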

The third definition of level of significance is the exact level of significance.

Set up a statistical null hypothesis. The null need not be a nil hypothesis (e.g., zero difference).
Report the exact level of significance. Do not use a conventional 5% level all the time.
Use this procedure only if you know little about the problem at hand.
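As a rough illustration of reporting an exact level of significance, the sketch below computes a two-sided p value for a null hypothesis that is not a nil hypothesis (a mean of 100 rather than 0). It assumes a z test with a known standard deviation. The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def exact_p_value(sample_mean, null_mean, sigma, n):
    """Two-sided exact p value for a z test with known sigma."""
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# The null here is "mean = 100", not "mean = 0"
z, p = exact_p_value(sample_mean=103.0, null_mean=100.0, sigma=15.0, n=50)

# Report the exact value instead of only "p < .05" or "n.s."
print(f"z = {z:.2f}, p = {p:.3f}")
```

Nothing in the code compares p with a fixed 5% threshold. The exact value is reported and its interpretation is left to judgment.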

This differs fundamentally from the null ritual.

One should not automatically use the same level of significance.
One should not use this procedure for all problems.

According to Neyman-Pearson, alpha needs to be determined before the data are obtained.

Bestselling textbooks sell a single method of inference

The null ritual is an invention of statistical textbook writers in the social sciences.
The idol of a universal method (in the form of the null ritual) left no space for Bayesian statistics. Nor did publishers.

Bayesianism and the new quest for a universal method

The potential danger in Bayesian statistics lies in the subjective interpretation of probability, which sanctions its universal application to all situations of uncertainty.

Three interpretations of probability

A relative frequency in the long run.
A propensity, a physical design of an object.
A reasonable degree of subjective belief.

Universal Bayes

If probability is thought of as a relative frequency in the long run, it immediately becomes clear that Bayes’ rule has a limited range of applications.
The same holds for propensity.
The subjective interpretation has no limits.

Subjective probability can be applied to situations of uncertainty and to singular events.
- Whether that makes sense is another question.

Universal Bayes ignores the study of genuine tools for uncertainty.

Automatic Bayes

As with the null ritual, the universal claim for Bayes’ rule tends to go together with its automatic use.

An automatic use of Bayes’ rule is a dangerously beautiful idol.
But even for a devoted Bayesian, it is not a reality. Bayesianism does not exist in the singular.

Toward a statistical toolbox

The alternative to universal and automatic Bayes is to think of Bayesian statistics as forming part of a larger toolbox.
Bayes’ rule has its value but does not work for all problems.

In the social sciences, objections to the use of Bayes’ rule are that:

Frequency-based prior probabilities do not exist.
The set of hypotheses needed for the prior probability distribution is not known.
Researchers’ introspection does not confirm the calculation of probabilities.
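To make concrete what these objections refer to, here is a minimal sketch of Bayes' rule over a discrete set of hypotheses. It needs exactly the two ingredients the objections call into question: an exhaustive set of hypotheses and a prior probability for each. The screening-style numbers are hypothetical and chosen only to show the arithmetic.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule to a discrete, exhaustive set of hypotheses.

    priors:      dict mapping hypothesis -> P(hypothesis)
    likelihoods: dict mapping hypothesis -> P(data | hypothesis)
    """
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        # If the hypothesis set is not exhaustive, the rule cannot be applied
        raise ValueError("priors must sum to 1")
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(data), summed over all hypotheses
    return {h: joint[h] / evidence for h in joint}


# Hypothetical example with a frequency-based prior (a base rate of 1%)
priors = {"condition": 0.01, "no condition": 0.99}
likelihoods = {"condition": 0.90, "no condition": 0.09}  # P(positive test | H)
post = posterior(priors, likelihoods)
print({h: round(value, 3) for h, value in post.items()})
```

When a base rate like this can be estimated from frequencies, the calculation is straightforward. The objections above concern cases in which no such prior or hypothesis set is available.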

Summary

Bayes’ rule is useful as part of a statistical toolbox, for instance when priors can be reliably estimated.
Neyman and Pearson’s decision theory is appropriate for repeated random drawing situations in quality control.
Fisher’s null hypothesis testing is another tool, relevant for situations in which one does not understand what is happening.

This statistical toolbox contains not only techniques of inference but, of equal importance, descriptive statistics, exploratory data analysis, and formal modeling techniques.
The only items that do not belong in the toolbox are false idols.

How statistics change research, surrogate science

Surrogate science: the attempt to infer the quality of research using a single number or benchmark.
The introduction of surrogates shifts researchers’ goal away from doing innovative science and redirects their effort toward meeting the surrogate goal.

Statistical inference as surrogate for replication

The replication fallacy is the belief that a significant p value specifies the probability that the same result can be reproduced in another study. It does not.

Inferential statistics have become surrogates for real replication.
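A small simulation can illustrate why a significant p value says little about replication. The sketch assumes a one-sample z test with unit variance, a fixed true effect, and an exact replication with the same sample size. All of these are simplifying assumptions made here, and the function name and parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def replication_rate(true_effect, n, alpha=0.05, studies=20000, seed=1):
    """Share of significant original studies whose exact replication is also significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = true_effect * n ** 0.5  # expected z statistic given the true effect
    significant = 0
    replicated = 0
    for _ in range(studies):
        z_original = rng.gauss(shift, 1)
        if abs(z_original) > z_crit:
            significant += 1
            # Run the same study again on fresh data
            z_replication = rng.gauss(shift, 1)
            if abs(z_replication) > z_crit:
                replicated += 1
    return replicated / significant


rate = replication_rate(true_effect=0.3, n=30)
print(f"Replication rate after a significant result: {rate:.2f}")
```

Under these assumptions the replication rate is governed by the power of the study, which is far below 95% here, even though every original result had p < .05.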

Hypothesis finding is presented as hypothesis testing

Fishing expeditions: disguising hypothesis finding as hypothesis testing.
Many researchers first look at the data for patterns, check for significance, and then present the result as if it were a hypothesis test.

A hypothesis should not be tested with the same data from which it was derived.
Finding new patterns is important, but p values or confidence intervals should not be provided for these.
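The cost of testing a hypothesis on the data that suggested it can be sketched with a simulation on pure noise. The setup is an assumption made for illustration: 20 independent variables with no true effect, each summarized by a z statistic, and a researcher who reports the most striking one. The function name and parameters are hypothetical.

```python
import random
from statistics import NormalDist


def fishing_simulation(variables=20, alpha=0.05, datasets=5000, seed=2):
    """Compare 'finding' on the same data with a test on fresh data, when no effect exists."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    found = 0
    confirmed = 0
    for _ in range(datasets):
        # Exploration: look at every variable and keep the most extreme z statistic
        z_scores = [rng.gauss(0, 1) for _ in range(variables)]
        if max(abs(z) for z in z_scores) > z_crit:
            found += 1
        # Confirmation: test only the selected variable on new, independent data
        if abs(rng.gauss(0, 1)) > z_crit:
            confirmed += 1
    return found / datasets, confirmed / datasets


found, confirmed = fishing_simulation()
print(f"'Significant' pattern found in the same data: {found:.2f}")
print(f"Significant when tested on fresh data:       {confirmed:.2f}")
print(f"Analytic rate for 20 independent tests:      {1 - 0.95 ** 20:.2f}")
```

With 20 independent looks, some "significant" pattern turns up in most noise-only datasets, whereas a test on fresh data keeps the error rate near the nominal 5%.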

Routine statistical inference has become a surrogate for both hypothesis finding and replication. The surrogate goal is to obtain a significant p value or other test statistic, even when it is out of place, as in the case of hypothesis finding.

Quantity as surrogate for quality

Surrogate science does not end with statistical tests.
Research assessment exercises tend to create surrogates as well, such as citation counts.
The evident danger is that hiring committees and advisory boards study these surrogate numbers rather than the papers written by job candidates and faculty members.
With citation as surrogate for quality, some truly original work may go unheeded.
Surrogates transform science by warping researchers’ goals.

Conclusion: Leibniz’s dream or Bayes’ nightmare?

Surrogate science, from the mindless calculation of p values or Bayes factors to citation counts, is not entirely worthless.
It fuels a steady stream of work of average quality and keeps researchers busy producing more of the same.

But it makes it harder for scientists to be innovative, risk taking, and imaginative.
By transforming researchers’ goals, surrogates encourage cheating and incomplete or dishonest reporting.
