The philosophy behind hypothesis testing

2021-04-21 4 min read

I read a few interesting articles this week on the Fisher-Neyman debate on the foundation of hypothesis testing:

The first paper is written by Erich Lehmann and argues that the practical aspects of the Fisher and Neyman-Pearson approaches to testing statistical hypotheses are “complementary rather than contradictory”. I agree with this verdict, but I think what is more interesting and useful for modern statisticians is the basic philosophical difference between the two. Lehmann summarized this as “inductive inference versus inductive behaviour”:

Both Neyman and Fisher considered the distinction between “inductive behavior” and “inductive inference” to lie at the center of their disagreement. In fact, in writing retrospectively about the dispute, Neyman (1961, p. 142) said that “the subject of the dispute may be symbolized by the opposing terms “inductive reasoning” and “inductive behavior.” How strongly Fisher felt about this distinction is indicated by his statement in Fisher (1973, p. 7) that “there is something horrifying in the ideological movement represented by the doctrine that reasoning, properly speaking, cannot be applied to empirical data to lead to inferences valid in the real world.”

Unfortunately, Lehmann did not offer his own thoughts on this matter. Fisher’s reference to “the ideological movement” is also quite puzzling. The second paper offers some very interesting insights into this remark:

Fisher was among a number of prominent Anglo-American geneticists who had been attacked by the Soviets in the early 1930s for their elaboration of ‘bourgeois genetics’. Eighteen years later, rumours that Lysenko had a hand in the death of a scientific critic, Nicolai Vavilov, drew him to the Lysenko debate. In 1948, Fisher, along with JBS Haldane and CD Darlington, did a BBC broadcast about the controversy. Fisher argued that Lysenko was using his political influence with Stalin to intimidate his scientific opponents. Where intimidation was insufficient, Fisher charged, Lysenko helped to see that ‘many Russian geneticists’ were ‘put to death either with or without pre-treatment in a concentration camp.’ Fisher devoted the bulk of this inflammatory speech to explaining Lysenko’s actions as the product of a system where scientific judgements were made by political leaders.

The paper then describes Fisher’s vicious attempt to associate Neyman’s statistical ideas with Soviet ideology (Neyman was originally from Poland): Neyman was, Fisher argued, “importing from Eastern Europe his misconceptions as to the nature of scientific research”.

Leaving the political and personal issues aside, I do think Fisher made some valuable points in his general argument. In particular, I generally agree with the verdict in this article:

RA Fisher saw the development of statistical inference as the 20th century’s great contribution to the classical problem of induction: how do we gain reliable empirical knowledge of the world? For Fisher, valid statistical inferences must not only meet certain mathematical conditions; they should also remind us of the provisional character of our knowledge—rigorous uncertainty.

The third article is written by a philosopher and offers a lot of additional insights. It argues that Fisher and Neyman differed fundamentally in their perception of mathematical modelling. Fisher put the scientific problem first and aimed to establish a logical basis for scientific induction. Neyman, on the other hand, was more interested in mathematizing behaviour and decision. I find Fisher’s points to be generally more interesting, perhaps because most of my training in mathematical statistics follows Neyman’s thinking. I find the following quote from Fisher’s cornerstone 1922 paper particularly sobering:

In short: the framing by means of a model is located at the beginning of the statistical treatment of a problem of application: ‘The postulate of randomness thus resolves itself into the question, ‘‘Of what population is this a random sample?’’ which must frequently be asked by every practical statistician’.

I think statisticians’ long-running struggle to teach the scientific community how to use the p-value correctly is reminiscent of the Fisher-Neyman debate. The simplistic accept-or-reject language that plagues scientific research traces all the way back to the Neyman-Pearson lemma. Lehmann’s article has an interesting discussion of this and points out that Pearson himself admitted that the terms “acceptance” and “rejection” were perhaps unfortunately chosen. To be fair, Fisher also used these words in his writing (but not in a formal way). And it is part of human nature to want a simple yes-or-no conclusion, so perhaps we would have ended up in exactly the same position even if Neyman and Pearson had used different terminology.
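To make the contrast concrete, here is a minimal sketch of my own (it is not taken from any of the three articles). It assumes a one-sided z-test with known standard deviation, and the sample means, sample size, and the 0.05 threshold are all made-up illustrative values. The function names are hypothetical. The point is only that two nearly identical p-values can land on opposite sides of a fixed cutoff.

```python
import math


def z_statistic(sample_mean, mu0, sigma, n):
    """z-statistic for H0: mean == mu0, assuming sigma is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def upper_tail_p_value(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def fixed_level_decision(p_value, alpha=0.05):
    """Accept-or-reject rule at a level alpha fixed before seeing the data."""
    return "reject H0" if p_value <= alpha else "accept H0"


# Two hypothetical experiments whose sample means differ only slightly.
results = []
for sample_mean in (0.32, 0.34):
    z = z_statistic(sample_mean, mu0=0.0, sigma=1.0, n=25)
    p = upper_tail_p_value(z)
    results.append((sample_mean, p, fixed_level_decision(p)))

for sample_mean, p, decision in results:
    # Reporting p keeps the graded evidence; the decision collapses it to yes/no.
    print(f"mean={sample_mean:.2f}  p={p:.4f}  decision: {decision}")
```

The two p-values are roughly 0.055 and 0.045, so the evidence is almost the same, yet the fixed-level rule returns opposite answers. Reporting the p-value alongside the decision is one way to keep the provisional character of the conclusion visible.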

Fisher imagined a utopia where science is done objectively by “free intellects” (using his Cold War rhetoric) and scientists can rationally interpret all the statistical analyses, but the truth is that almost no one is a genius like him. I am curious what Fisher would have said about Thomas Kuhn’s groundbreaking The Structure of Scientific Revolutions, had he not died in the same year that Kuhn’s book came out.
