Significance versus hypothesis testing

Datamethods Discussion Forum [Unofficial] February 15, 2026

First, a technical point:

In Ronald Fisher’s framework, the p-value measures strength of divergence from the hypothesized null…

Not quite. As Amrhein, Trafimow, and Greenland noted, the p-value also depends on model specification, which has many, many failure modes.

Yes, a small P-value may arise because the null hypothesis is false. But it can also mean that some mathematical aspect of the model was not correctly specified, that sampling was not a hundred percent random, that we accidentally switched the names of some factor levels, that we unintentionally, or intentionally, selected analyses that led to a small P-value (downward “P-hacking”), that we did not measure what we think we measured, or that a cable in our measuring device was loose…. And a large P-value may arise from mistakes and procedural errors, such as selecting analyses that led to a large P-value (upward P-hacking), or using a measurement so noisy that the relation of the measured construct to anything else is hopelessly obscured.
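As a minimal sketch of one such failure mode (an illustration added here, not taken from the quoted authors), the simulation below generates two groups with no true difference and analyzes them with a z-test that assumes independent observations. When observations actually come in clusters that share a random shift, small P-values turn up far more often than the nominal 5%, even though the null hypothesis is true. The cluster counts, cluster sizes, and function names are all hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean, variance


def naive_p_value(x, y):
    """Two-sided z-test p-value that treats every observation as independent."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_group(rng, n_clusters, cluster_size, cluster_sd):
    """Draw one group; observations in the same cluster share a random shift."""
    data = []
    for _ in range(n_clusters):
        shift = rng.gauss(0, cluster_sd)
        data.extend(shift + rng.gauss(0, 1) for _ in range(cluster_size))
    return data


def rejection_rate(cluster_sd, n_sims=1000, alpha=0.05, seed=1):
    """Share of simulated null datasets where the naive test gives p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Assumed design: 5 clusters of 20 per group, no true group difference.
        x = simulate_group(rng, 5, 20, cluster_sd)
        y = simulate_group(rng, 5, 20, cluster_sd)
        hits += naive_p_value(x, y) < alpha
    return hits / n_sims


if __name__ == "__main__":
    print("independence holds:   ", rejection_rate(cluster_sd=0.0))
    print("independence violated:", rejection_rate(cluster_sd=1.0))
```

With `cluster_sd=0.0` the independence assumption is correct and the rejection rate should sit near 0.05. With `cluster_sd=1.0` the same test rejects much more often, purely because the model is wrong.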

Also, how do you propose we plan a study without ideas influenced by Neyman-Pearson (NP), such as power analysis or precision analysis? As a consulting statistician, I find that “how many subjects” is one of the most common reasons a scientist asks me for help.
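For readers who have not seen these calculations, here is a rough sketch of what a "how many subjects" answer can look like for a two-group comparison of means. It uses the textbook normal approximation with a known, common standard deviation. The function names and the example inputs are assumptions for illustration only, and a real consultation would involve far more than plugging numbers into a formula.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_power(effect, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / effect) ** 2)


def n_per_group_for_precision(half_width, sd, conf=0.95):
    """Per-group n so the CI for the mean difference has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(2 * (z * sd / half_width) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: detect a difference of 0.5 SD with 80% power.
    print(n_per_group_for_power(effect=0.5, sd=1.0))
    # Hypothetical inputs: estimate the difference to within +/- 0.25 SD.
    print(n_per_group_for_precision(half_width=0.25, sd=1.0))
```

Under these assumptions the power calculation gives 63 subjects per group and the precision calculation gives 123 per group. Both depend on a guessed standard deviation, and the first also depends on a chosen target effect, which is exactly where the planning conversation with the scientist tends to happen.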
