Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibkpkut7cicu3nonwa5yijm4vzqvzi7h2k3lyvxsfmkglz5nrtg3u",
    "uri": "at://did:plc:wwyqal4cnqhuwyacdj7rqq3n/app.bsky.feed.post/3mexawshsa2m2"
  },
  "path": "/t/significance-versus-hypothesis-testing/28638#post_6",
  "publishedAt": "2026-02-15T03:07:44.000Z",
  "site": "https://discourse.datamethods.org",
  "tags": [
    "Amrhein, Trafimow, and Greenland"
  ],
  "textContent": "First, a technical point:\n\n> In Ronald Fisher’s framework, the p-value measures strength of divergence from the hypothesized null…\n\nNot quite. As Amrhein, Trafimow, and Greenland noted, the p-value also depends on model specification, which has many, many failure modes.\n\n> Yes, a small _P_-value may arise because the null hypothesis is false. But it can also mean that some mathematical aspect of the model was not correctly specified, that sampling was not a hundred percent random, that we accidentally switched the names of some factor levels, that we unintentionally, or intentionally, selected analyses that led to a small _P_-value (downward “P-hacking”), that we did not measure what we think we measured, or that a cable in our measuring device was loose…. And a large _P_-value may arise from mistakes and procedural errors, such as selecting analyses that led to a large _P_-value (upward P-hacking), or using a measurement so noisy that the relation of the measured construct to anything else is hopelessly obscured.\n\nAlso, how do you propose we plan a study without NP-influenced ideas such as power analysis or precision analysis? As a consulting statistician, “how many subjects” is one of the most common reasons a scientist asks me for help.",
  "title": "Significance versus hypothesis testing"
}