Surrogate Science: The Idol of a Universal Method for Scientific Inference - summary of an article by Gigerenzer & Marewski

Critical thinking
Article: Gigerenzer, G. & Marewski, J. N. (2015)
Surrogate Science: The Idol of a Universal Method for Scientific Inference
doi: 10.1177/0149206314547522

Introduction

Scientific inference should not be made mechanically.
Good science requires both statistical tools and informed judgment about what model to construct, what hypotheses to test, and what tools to use.

This article is about the idol of a universal method of statistical inference.

In this article, we make three points:

  • There is no universal method of scientific inference but rather a toolbox of useful statistical methods. In the absence of a universal method, its followers worship surrogate idols, such as significant p values.
    The inevitable gap between the ideal and its surrogate is bridged with delusions.
    These mistaken beliefs do much harm, among other things by promoting irreproducible results.
  • If the proclaimed ‘Bayesian revolution’ were to take place, the danger is that the idol of a universal method might survive in a new guise, proclaiming that all uncertainty can be reduced to subjective probabilities.
  • Statistical methods are not simply applied to a discipline. They change the discipline itself, and vice versa.

Dreaming up a universal method of inference

The null ritual

The most prominent creation of a seemingly universal inference method is the null ritual:

  • Set up a null hypothesis of ‘no mean difference’ or ‘zero correlation’. Do not specify the predictions of your own research hypothesis.
  • Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis. Report the result as p<.05, p<.01, p<.001, whichever comes next to the obtained p value.
  • Always perform this procedure.

Level of significance has three different meanings:

  • A mere convention
  • The alpha level
  • The exact level of significance

Three meanings of significance

The alpha level: the long-term relative frequency of mistakenly rejecting hypothesis H0 if it is true, also known as Type I error rate.
The beta level: the long-term relative frequency of mistakenly rejecting H1 if it is true, also known as the Type II error rate.

Two statistical hypotheses need to be specified in order to be able to determine both alpha and beta.
Neyman and Pearson rejected a mere convention in favour of an alpha level that required a rational scheme.

  • Set up two statistical hypotheses, H1, H2, and decide on alpha, beta and the sample size before the experiment, based on subjective cost-benefit considerations.
  • If the data fall into the rejection region of H1, accept H2; otherwise accept H1.
  • The usefulness of this procedure is limited among others to situations where there is a disjunction of hypotheses, where there is repeated sampling, and where you can make meaningful cost-benefit trade-offs for choosing alpha and beta
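The Neyman-Pearson procedure above can be sketched in code. This is a minimal illustration, not a method from the article: it assumes two simple hypotheses about the mean of a normal distribution with known standard deviation, and all numbers (means, sample size, critical value) are hypothetical. Following the labels in the list, alpha is the probability of wrongly rejecting H1 and beta the probability of wrongly rejecting H2.

```python
from statistics import NormalDist


def error_rates(mu1, mu2, sigma, n, critical_value):
    """Return (alpha, beta) for H1: mean = mu1 versus H2: mean = mu2 (mu2 > mu1).

    Decision rule: accept H2 if the sample mean exceeds critical_value,
    otherwise accept H1.
    """
    se = sigma / n ** 0.5  # standard error of the sample mean
    # alpha: probability of rejecting H1 although H1 is true
    alpha = 1 - NormalDist(mu1, se).cdf(critical_value)
    # beta: probability of rejecting H2 although H2 is true
    beta = NormalDist(mu2, se).cdf(critical_value)
    return alpha, beta


# Hypothetical design, fixed before any data are collected.
alpha, beta = error_rates(mu1=0.0, mu2=0.5, sigma=1.0, n=25, critical_value=0.329)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")

# Moving the critical value trades one error rate against the other.
alpha_mid, beta_mid = error_rates(mu1=0.0, mu2=0.5, sigma=1.0, n=25, critical_value=0.25)
print(f"alpha = {alpha_mid:.3f}, beta = {beta_mid:.3f}")
```

With the first critical value, alpha is close to the conventional 5% while beta is considerably larger. With the midpoint as critical value, both error rates are equal. Which trade-off is appropriate depends on the costs of each error, which is why the choice cannot be made mechanically.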

The third definition of level of significance is the exact level of significance.

  • Set up a statistical null hypothesis. The null need not be a nil hypothesis (e.g., zero difference)
  • Report the exact level of significance. Do not use a conventional 5% level all the time
  • Use this procedure only if you know little about the problem at hand
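As an illustration of reporting an exact level of significance, here is a minimal sketch using a one-sided exact binomial test. The choice of test, the function name, and the data are assumptions made for the example. The null hypothesis is deliberately not a nil hypothesis: it states a success probability of 0.6.

```python
from math import comb


def exact_p_value(successes, n, p_null):
    """One-sided exact level of significance: P(X >= successes) under the null."""
    return sum(
        comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical data: 38 successes in 50 trials, null success probability 0.6.
p = exact_p_value(successes=38, n=50, p_null=0.6)
print(f"exact p = {p:.4f}")  # report this value itself, not just "p < .05"
```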

This differs fundamentally from the null ritual.

  • One should not automatically use the same level of significance
  • One should not use this procedure for all problems.

According to Neyman-Pearson, alpha needs to be determined before the data are obtained.

Bestselling textbooks sell a single method of inference

The null ritual is an invention of statistical textbook writers in the social sciences.
The idol of a universal method (in the form of the null ritual) left no space for Bayesian statistics. Nor did publishers.

Bayesianism and the new quest for a universal method

The potential danger in Bayesian statistics lies in the subjective interpretation of probability, which sanctions its universal application to all situations of uncertainty.

Three interpretations of probability

  • A relative frequency in the long run.
  • A propensity, a physical design of an object.
  • A reasonable degree of subjective belief.

Universal Bayes

If probability is thought of as a relative frequency in the long run, it immediately becomes clear that Bayes’ rule has a limited range of applications.
The same holds for propensity.
The subjective interpretation has no limits.

  • Subjective probability can be applied to situations of uncertainty and to singular events.

    • Whether that makes sense is another question

Universal Bayes ignores the study of genuine tools for uncertainty.

Automatic Bayes

As with the null ritual, the universal claim for Bayes’ rule tends to go together with its automatic use.

An automatic use of Bayes’ rule is a dangerously beautiful idol.
But even for a devoted Bayesian, it is not a reality. Bayesianism does not exist in the singular.

Toward a statistical toolbox

The alternative to universal and automatic Bayes is to think of Bayesian statistics as forming part of a larger toolbox.
Bayes’ rule has its value but does not work for all problems.

In the social sciences, objections to the use of Bayes’ rule are that:

  • Frequency-based prior probabilities do not exist.
  • The set of hypotheses needed for the prior probability distribution is not known
  • Researchers’ introspection does not confirm the calculation of probabilities.
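To make concrete what Bayes’ rule needs as input, here is a minimal sketch for the favourable case in which a frequency-based prior is available, for instance a known base rate. The function name and all numbers are hypothetical and only serve the illustration.

```python
def posterior(prior, likelihood_h, likelihood_not_h):
    """Bayes' rule for one hypothesis H against its complement.

    prior            -- P(H), assumed here to be a known base rate
    likelihood_h     -- P(data | H)
    likelihood_not_h -- P(data | not H)
    """
    evidence = prior * likelihood_h + (1 - prior) * likelihood_not_h
    return prior * likelihood_h / evidence


# Hypothetical numbers: base rate 1%, hit rate 90%, false-positive rate 9%.
p_h_given_data = posterior(prior=0.01, likelihood_h=0.90, likelihood_not_h=0.09)
print(f"P(H | data) = {p_h_given_data:.3f}")
```

The calculation is only as good as its inputs. When no base rate exists or the full set of hypotheses is unknown, as in the objections above, the prior has to be supplied by subjective judgment.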

Summary

Bayes’ rule is useful as part of a statistical toolbox, for instance when priors can be reliably estimated.
Neyman and Pearson’s decision theory is appropriate for repeated random drawing situations in quality control.
Fisher’s null hypothesis testing is another tool, relevant for situations in which one does not understand what is happening.

This statistical toolbox contains not only techniques of inference but, of equal importance, descriptive statistics, exploratory data analysis, and formal modeling techniques.
The only items that do not belong in the toolbox are false idols.

How statistics change research, surrogate science

Surrogate science: the attempt to infer the quality of research using a single number or benchmark.
The introduction of surrogates shifts researchers’ goal away from doing innovative science and redirects their effort toward meeting the surrogate goal.

Statistical inference as surrogate for replication

The replication fallacy is the belief that a significant p value specifies the probability that the same result can be reproduced in another study. It does not.

Inferential statistics have become surrogates for real replication.
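A small calculation shows why a p value cannot stand in for replication. The sketch below is an illustration under stated assumptions, not an analysis from the article: it assumes a two-sided z test, an identical replication with the same sample size, and a true effect that is exactly as large as the effect observed in a first study with p = .05. The function name is hypothetical.

```python
from statistics import NormalDist


def replication_probability(true_z, alpha=0.05):
    """Probability that an identical replication reaches significance.

    true_z is the expected z statistic implied by the true effect and the
    sample size; the test is assumed to be a two-sided z test.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Significant if the replication's z falls beyond either critical value.
    upper = 1 - std_normal.cdf(z_crit - true_z)
    lower = std_normal.cdf(-z_crit - true_z)
    return upper + lower


# Assumption: the true effect equals the one observed at exactly p = .05.
z_observed = NormalDist().inv_cdf(1 - 0.05 / 2)
print(f"{replication_probability(z_observed):.2f}")
```

Under these assumptions the chance of a significant replication is about one half, far from the 95% that the fallacy suggests. The value depends on the true effect and the sample size, neither of which the p value reveals.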

Hypothesis finding is presented as hypothesis testing

Fishing expeditions: disguising hypothesis finding as hypothesis testing.
Many researchers first look at the data for patterns, check for significance, and then present the result as if it were a hypothesis test.

A hypothesis should not be tested with the same data from which it was derived.
Finding new patterns is important, but p values or confidence intervals should not be provided for these.

Routine statistical inference has become a surrogate for both hypothesis finding and replication. The surrogate goal is to obtain a significant p value or other test statistic, even when it is out of place, as in the case of hypothesis finding.
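Why significance found while fishing says little can be seen from a short calculation. The sketch assumes that all tests are independent and that every null hypothesis is true. The function name and the numbers of tests are hypothetical.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability of at least one significant result among independent
    tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 5, 20):
    print(num_tests, f"{chance_of_false_positive(num_tests):.3f}")
```

With twenty patterns checked, the chance of at least one significant result is close to two in three, even though there is nothing to find. Presenting that one result as a hypothesis test hides how many looks at the data preceded it.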

Quantity as surrogate for quality

Surrogate science does not end with statistical tests.
Research assessment exercises tend to create surrogates as well, such as citation counts.
The evident danger is that hiring committees and advisory boards study these surrogate numbers rather than the papers written by job candidates and faculty members.
With citation as surrogate for quality, some truly original work may go unheeded.
Surrogates transform science by warping researchers’ goals.

Conclusion: Leibniz’s dream or Bayes’ nightmare?

Surrogate science, from the mindless calculation of p values or Bayes factors to citation counts, is not entirely worthless.
It fuels a steady stream of work of average quality and keeps researchers busy producing more of the same.

But it makes it harder for scientists to be innovative, risk taking, and imaginative.
By transforming researchers’ goals, surrogates encourage cheating and incomplete or dishonest reporting.

Summary author: SanneA