Surrogate Science: The Idol of a Universal Method for Scientific Inference - Gigerenzer - 2015 - Article

Mindless statistical inference
The Idol of Universal Method of Inference
How statistics changed theories: the probabilistic revolution
The Null Ritual
The three meanings of significance
The problem of conflicting methods
Bayesianism and the new quest for a universal method
Risk versus uncertainty
The automatic Bayes
The statistical toolbox
How to change statistics?

The application of statistics to science is not a neutral act. Textbook writers in the social sciences have transformed rival statistical systems into an apparently monolithic method that could be used mechanically. As Fisher (1956) put it: 'No scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas.'

If statisticians agree on one thing, it is that scientific inference should not be made mechanically. Good science requires both statistical tools and informed judgment about what model to construct, what hypotheses to test, and what tools to use. Yet many social scientists vote with their feet against an informed use of inferential statistics. A majority still compute p values or confidence intervals, and a few calculate Bayes factors. Determining significance has become a surrogate for good research. This article is about the idol of a universal method of statistical inference.

Mindless statistical inference

In one internet study, participants were asked whether they felt there was a difference between heroism and altruism. The vast majority did, and the authors computed a chi-squared test to find out whether the two numbers differed significantly. This illustrates the automatic use of statistical procedures, even when the procedure does not fit the question. The idol of an automatic, universal method of inference, however, is not unique to p values or confidence intervals. It can also invade Bayesian statistics.

The Idol of Universal Method of Inference

The article makes three points:

There is no universal method of scientific inference but, rather, a toolbox of useful statistical methods. In the absence of a universal method, its followers worship surrogate idols, such as significant p values. The gap between the ideal and its surrogate rests on mistaken ideas about statistical inference, for instance that a p value of 1% indicates a 99% chance of replication.
If the proclaimed 'Bayesian revolution' were to take place, the danger is that the idol of a universal method might survive in a new guise, proclaiming that all uncertainty can be reduced to subjective probabilities.
Statistical methods are not simply applied to a discipline; they change the discipline itself and vice versa.
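
The replication misconception mentioned in the first point can be checked with a small calculation. The sketch below is not from the article. It assumes a two-sided z-test, that the true effect is exactly the effect observed in a first study with p = .01, and that the replication uses the same sample size and a 5% criterion; the function name is hypothetical.

```python
from statistics import NormalDist


def replication_probability(p_observed, alpha=0.05):
    """Chance that an identical replication reaches two-sided significance,
    ASSUMING the true effect equals the originally observed effect."""
    std_normal = NormalDist()
    z_observed = std_normal.inv_cdf(1 - p_observed / 2)  # z score behind the first p value
    z_critical = std_normal.inv_cdf(1 - alpha / 2)       # cutoff used by the replication
    # Under the assumption, the replication's z score is Normal(z_observed, 1).
    upper_tail = 1 - std_normal.cdf(z_critical - z_observed)
    lower_tail = std_normal.cdf(-z_critical - z_observed)
    return upper_tail + lower_tail


prob = replication_probability(0.01)
print(f"Probability of a significant replication: {prob:.2f}")
```

Even under these favourable assumptions the result is roughly 0.73, not 0.99. If the true effect is smaller than the observed one, the chance is lower still.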

In science and everyday life, statistical methods have changed whatever they touched. The most dramatic change brought about by statistics was the 'probabilistic revolution'. In the natural sciences, the term statistical began to refer to the nature of theories, not to the evaluation of data.

How statistics changed theories: the probabilistic revolution

The probabilistic revolution upset the ideal of determinism shared by most European thinkers. It differs from other revolutions because it did not replace any system in its own field, mathematics, but it did upset theories in other fields. The social sciences inspired the probabilistic revolution in physics, yet the social and medical sciences themselves were reluctant to abandon the ideal of simple, deterministic causes. Social theorists hesitated to think of probability as more than an error term in the equation observation = true value + error.

The term inference revolution refers to a change in scientific method that was institutionalized in psychology and in other social sciences. The qualifier inference indicates that inference from a sample to a population came to be considered the most crucial part of research.

To understand how deeply the inference revolution changed the social sciences, it is helpful to realize that routine statistical tests, such as calculations of p values or other inferential statistics, are not common in the natural sciences.

The first known test of a null hypothesis was carried out by Arbuthnott and is strikingly similar to the 'null ritual' that was later institutionalized in the social sciences. He observed that 'the external accidents to which males are subject do make a great havock of them, and that this loss exceeds far that of the other Sex'. This first null hypothesis test impressed no one, which is not to say that statistical methods played no role in the social sciences. To summarize, statistical inference played little role, and Bayesian inference virtually none, in research before roughly 1940. Automatic inference was unknown before the inference revolution, with the exception of the critical ratio (the ratio of the obtained difference to its standard deviation).

The Null Ritual

The most prominent creation of a seemingly universal inference method is the null ritual:

Set up a null hypothesis of 'no mean differences' or 'zero correlation'. Do not specify the predictions of your own research hypothesis.
Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis. Report the result as p<.05, p<.01, p<.001, whichever comes next to the obtained p value.
Always perform this procedure.
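
Written down as a procedure, the three steps show their mechanical character. The sketch below is an illustration, not code from the article. The function names are hypothetical, and a large-sample z approximation stands in for whatever test a researcher would actually run.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sided_p(group_a, group_b):
    """p value for the nil hypothesis 'no mean difference'
    (large-sample z approximation, an assumption of this sketch)."""
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def null_ritual_report(p_value):
    """Steps 2 and 3: report the nearest conventional threshold above p,
    the same way for every data set, regardless of the research question."""
    for threshold in (0.001, 0.01, 0.05):
        if p_value < threshold:
            return f"p<{threshold:g}, research hypothesis accepted"
    return "n.s."


print(null_ritual_report(0.03))
print(null_ritual_report(0.20))
```

Nothing in the function asks what the research hypothesis predicts or whether a test suits the problem, which is the article's complaint about the ritual.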

In psychology, this ritual became institutionalized in curricula, editorials and professional associations. But the null ritual does not exist in statistics proper. It is also often confused with Fisher's theory of null hypothesis testing. For example, it has become common to use the term NHST (null hypothesis significance testing) without distinguishing between the two. Contrary to what that misleading term suggests, level of significance has three meanings: (a) a mere convention, (b) the alpha level, or (c) the exact level of significance.

The three meanings of significance

In Neyman and Pearson's decision theory, the alpha level is the long-run relative frequency of mistakenly rejecting hypothesis H1 if it is true, also known as the Type I error rate. The beta level is the long-run relative frequency of mistakenly rejecting hypothesis H2 if it is true, also known as the Type II error rate, or 1 - power.

Set up two statistical hypotheses, H1 and H2 and decide on the alpha, beta and sample size before the experiment.
If the data fall into the rejection region of H1, accept H2; otherwise accept H1.
The usefulness of this procedure is limited, among others, to situations where there is a disjunction of hypotheses (either H1 or H2 is true) and where there is repeated sampling.
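
The planning step can be made concrete. The sketch below is an illustration rather than the article's own example. It assumes two simple hypotheses about a mean, a known standard deviation and a one-sided z-test; all numbers and function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def plan_test(mu1, mu2, sigma, alpha, beta):
    """Fix sample size and rejection region BEFORE the experiment.
    H1: mean = mu1, H2: mean = mu2 (mu2 > mu1), sigma assumed known."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    n = ceil(((z_alpha + z_beta) * sigma / (mu2 - mu1)) ** 2)
    critical_mean = mu1 + z_alpha * sigma / sqrt(n)  # boundary of H1's rejection region
    return n, critical_mean


def decide(sample_mean, critical_mean):
    """Step 2: a decision between the two hypotheses, not a degree of belief."""
    return "accept H2" if sample_mean > critical_mean else "accept H1"


n, critical_mean = plan_test(mu1=0.0, mu2=0.5, sigma=1.0, alpha=0.05, beta=0.20)
print(n, round(critical_mean, 3), decide(0.4, critical_mean))
```

Unlike the null ritual, alpha and beta here are chosen for the problem at hand, and both hypotheses are specified in advance.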

Fisher eventually refined his earlier position. The result was a third definition of level of significance, alongside convention and alpha level: the exact level of significance.

Set up a statistical null hypothesis. The null need not be a nil hypothesis.
Report the exact level of significance; do not use a conventional 5% level all the time.
Use this procedure only if you know little about the problem at hand.
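
A minimal sketch of the first two steps, assuming a binomial setting that is not taken from the article: the null hypothesis is a success proportion of 0.30 (so it is not a nil hypothesis), and the exact one-sided level of significance is reported instead of a threshold. The numbers and the function name are hypothetical.

```python
from math import comb


def exact_upper_tail_p(successes, trials, null_proportion):
    """Exact probability of observing at least `successes` successes
    in `trials` trials if the null proportion is true."""
    return sum(
        comb(trials, k)
        * null_proportion ** k
        * (1 - null_proportion) ** (trials - k)
        for k in range(successes, trials + 1)
    )


p_value = exact_upper_tail_p(successes=10, trials=20, null_proportion=0.30)
print(f"exact level of significance: p = {p_value:.3f}")
```

The output is a number to be weighed together with everything else known about the problem, not a verdict.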

Fisher's procedure differs fundamentally from the null ritual. First, one should not automatically use the same level of significance, and second, one should not use this procedure for all problems. In addition, step one of the ritual contains the misinterpretation that null means 'nil', such as a zero difference.

The problem of conflicting methods

When textbook writers learned about Neyman-Pearson theory, they had a problem: how should they deal with conflicting methods? The solution would have been to present a toolbox of different approaches, but Guilford and Nunnally mixed the concepts and presented the muddle as a single, universal method. The idol of this universal method also left no place for Bayesian statistics.

Bayesianism and the new quest for a universal method

Fisher, Neyman and Pearson have been victims of social scientists' desire for a single tool, a desire that produced a surrogate number for inferring what is good research. Bayesian statistics could suffer the same fate. Its potential danger lies in the subjective interpretation of probability, which sanctions its universal application to all situations of uncertainty.

The 'Bayesian revolution' had a slow start. Bayes' paper was eventually published, but it was largely ignored by scientists. Just as the null ritual replaced the three interpretations of level of significance with one, the currently dominant version of Bayesianism replaces Bayesian pluralism with a universal subjective interpretation. Originally, probability had three interpretations:

a relative frequency in the long run, such as in mortality tables used for calculating insurance premiums
a propensity, that is, the physical design of an object, such as that of a die or a billiard table
a reasonable degree of subjective belief, such as in the attempts of courts to quantify the reliability of witness testimony.

In Bayes' essay, the notion of probability is ambiguous and can be read in all three ways. This ambiguity, however, is typical of his time, in which the classical theory of probability reigned.

If probability is thought of as a relative frequency in the long run, it immediately becomes clear that Bayes' rule has a limited range of applications. The economist Knight used the term risk for the first two situations (i.e., probabilities that can be reliably measured in terms of frequency or propensity), as opposed to uncertainty. Subjective probability, by contrast, can be applied to situations of uncertainty and to singular events, such as the probability that Michael Jackson is still alive. There is now a new generation of Bayesians who believe that Bayesianism is the only game in town. The authors use the term Universal Bayes for the view that all uncertainties can or should be represented by subjective probabilities, a view that explicitly rejects Knight's distinction between risk and uncertainty.

Risk versus uncertainty

What the universal Bayesians do not seem to realize is that, in theory, Bayesianism can be optimal in a world of risk, but it is of uncertain value when not all information is known or can be known, or when probabilities have to be estimated from small, unreliable samples. Plain common sense also suggests that complex optimization algorithms are unreliable in an uncertain world.
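
The contrast can be illustrated with Bayes' rule itself. In the sketch below, which is not from the article, the base rate, hit rate and false-alarm rate are hypothetical numbers. Under risk they would be reliably measured frequencies; under uncertainty the base rate is a guess, and the posterior swings with it.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """Bayes' rule for a binary hypothesis given one positive observation."""
    evidence = prior * hit_rate + (1 - prior) * false_alarm_rate
    return prior * hit_rate / evidence


# World of risk: the base rate is assumed to be known from reliable frequencies.
known = posterior(prior=0.01, hit_rate=0.90, false_alarm_rate=0.09)
print(f"known base rate 0.01 -> posterior {known:.3f}")

# World of uncertainty: the base rate is only guessed.
for guessed_prior in (0.001, 0.01, 0.1):
    result = posterior(guessed_prior, hit_rate=0.90, false_alarm_rate=0.09)
    print(f"guessed base rate {guessed_prior} -> posterior {result:.3f}")
```

The formula is the same in both cases. What differs is how much the inputs can be trusted, which is the point of the distinction between risk and uncertainty.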

The automatic Bayes

As with the null ritual, the universal claim for Bayes' rule tends to go together with its automatic use. One version of Automatic Bayes concerns the interpretation of Bayes factors using Jeffreys' scale. A second version can be found in the heuristics-and-biases research program, which is widely taught in business education courses. In short, the automatic use of Bayes' rule is a dangerously beautiful idol. It is not a reality, however: Bayesianism does not exist in the singular.
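
The first version of Automatic Bayes amounts to a lookup table. The sketch below assumes the commonly cited cut-offs of Jeffreys' scale at half-powers of 10; the labels are paraphrased and the function name is hypothetical, so treat it as an illustration rather than a reference.

```python
def jeffreys_label(bayes_factor):
    """Map a Bayes factor to a verbal label, with cut-offs assumed
    at 10**0.5, 10, 10**1.5 and 100."""
    if bayes_factor < 1:
        return "favours the other hypothesis"
    if bayes_factor < 10 ** 0.5:
        return "barely worth mentioning"
    if bayes_factor < 10:
        return "substantial"
    if bayes_factor < 10 ** 1.5:
        return "strong"
    if bayes_factor < 100:
        return "very strong"
    return "decisive"


print(jeffreys_label(5))
print(jeffreys_label(150))
```

Used this way, the label plays the same role as 'p<.05' in the null ritual: a number is converted into a verdict without any judgment about the problem.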

The statistical toolbox

The article's alternative to Universal and Automatic Bayes is to think of Bayesian methods as part of a larger statistical toolbox. In this toolbox, Bayes' rule has its value but, like any other tool, does not work for all problems.

How to change statistics?

Leibniz had a dream: to discover the calculus that could map all ideas into symbols. Such a universal calculus would also put an end to all scholarly bickering. This dream is still alive in the social sciences today. Surrogate science, from the mindless calculation of p values or Bayes factors to citation counts, is not entirely worthless. It fuels a steady stream of work of average quality and keeps researchers busy producing more of the same. But it also makes it harder for scientists to be innovative, risk-taking and imaginative, and it encourages cheating and incomplete or dishonest reporting. Would a Bayesian revolution lead to a better world? The answer depends on what the revolution might be. The real challenge is to prevent the surrogates from taking over once again, for example by replacing routine significance tests with routine interpretations of Bayes factors. Otherwise, Leibniz's beautiful dream of a universal calculus could easily turn into Bayes' nightmare.
