{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibkpkut7cicu3nonwa5yijm4vzqvzi7h2k3lyvxsfmkglz5nrtg3u",
"uri": "at://did:plc:wwyqal4cnqhuwyacdj7rqq3n/app.bsky.feed.post/3mewlmttvlel2"
},
"path": "/t/significance-versus-hypothesis-testing/28638#post_6",
"publishedAt": "2026-02-15T03:07:44.000Z",
"site": "https://discourse.datamethods.org",
"tags": [
"Amrhein, Trafimow, and Greenland"
],
"textContent": "First a technical point:\n\n> In Ronald Fisher’s framework, the p-value measures strength of divergence from the hypothesized null…\n\nNot quite. As Amrhein, Trafimow, and Greenland noted, the p-value also depends on model specification, which has many, many failure modes.\n\n> Yes, a small _P_ -value may arise because the null hypothesis is false. But it can also mean that some mathematical aspect of the model was not correctly specified, that sampling was not a hundred percent random, that we accidentally switched the names of some factor levels, that we unintentionally, or intentionally, selected analyses that led to a small _P_ -value (downward “P-hacking”), that we did not measure what we think we measured, or that a cable in our measuring device was loose…. And a large _P_ -value may arise from mistakes and procedural errors, such as selecting analyses that led to a large _P_ -value (upward P-hacking), or using a measurement so noisy that the relation of the measured construct to anything else is hopelessly obscured.\n\nAlso how do you propose we plan a study without NP-influenced ideas such as power analysis or precision analysis? As a consulting statistician, “how many subjects” is one of the most common reasons a scientist asks me for help.",
"title": "Significance versus hypothesis testing"
}