
Change the range not the language on confidence intervals

Datamethods Discussion Forum [Unofficial] February 26, 2026

Agreed; however, we need to stop identifying P-values with “tests”. P-values are just measures of fit; their habitual identification with “statistical significance” and “hypothesis tests”, and that of intervals constructed from them with “confidence intervals”, are what I see as the chief culprits in statistical overinterpretation and misinterpretation.

A P-value is but one measure of fit or compatibility (Karl Pearson), consonance (Oscar Kempthorne), or consistency (D. R. Cox), albeit it may be the oldest such measure: it predates the notion of “significance test” by a good century or so, and even the term “value of P” (Pearson 1900) predates Fisher’s testing interpretation by a few decades. By looking at P-values across a parameter range, we can construct a compatibility interval showing all target-parameter values that have P > 0.05 when all background assumptions are held fixed.
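To make the construction concrete, here is a minimal sketch of scanning P-values across a parameter range and keeping the values with P > 0.05. It assumes a normal approximation for the estimator; the estimate (1.2), the standard error (0.5), and the function names `p_value` and `compatibility_interval` are all hypothetical and chosen only for illustration.

```python
import math


def p_value(estimate, se, theta):
    """Two-sided P-value for the parameter value theta.

    Assumes (for illustration only) that the estimate is normally
    distributed around theta with known standard error se.
    """
    z = (estimate - theta) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def compatibility_interval(estimate, se, cutoff=0.05,
                           half_width=6.0, steps=12001):
    """Scan a grid of parameter values and keep those with P > cutoff.

    The grid spans estimate +/- half_width * se; the endpoints returned
    are accurate only up to the grid spacing.
    """
    lo = estimate - half_width * se
    step = 2 * half_width * se / (steps - 1)
    grid = [lo + i * step for i in range(steps)]
    compatible = [t for t in grid if p_value(estimate, se, t) > cutoff]
    return min(compatible), max(compatible)


# Hypothetical numbers, not taken from any real study.
estimate, se = 1.2, 0.5
lower, upper = compatibility_interval(estimate, se)
print(f"Values with P > 0.05 run from about {lower:.3f} to {upper:.3f}")
```

Under this particular model the scan should land close to the familiar estimate ± 1.96 × SE limits, but nothing in the code invokes repeated sampling, error rates, or “confidence”: it only records which parameter values fit the data at P > 0.05 with the background assumptions held fixed.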

Any interpretation beyond that (e.g., “Type-I error”, “power”, “confidence”) requires much added baggage that is in no way inherent in the P-value concept, baggage such as the demanding requirements of Neyman’s repeated-sampling set-up. In sum, that P-values can be used to construct statistical tests does not mean P-values should be viewed as tests. And certainly any in-depth analysis demands more measures of fit than just P-values, such as those described in your book!
