Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreic77ysomz2e22tvpuvqfcw7upesl2xojoap6pe2o6sdg5wmgom62q",
    "uri": "at://did:plc:wwyqal4cnqhuwyacdj7rqq3n/app.bsky.feed.post/3mfucsq2ujh72"
  },
  "path": "/t/change-the-range-not-the-language-on-confidence-intervals/27740?page=2#post_32",
  "publishedAt": "2026-02-27T10:08:43.000Z",
  "site": "https://discourse.datamethods.org",
  "tags": [
    "@Sander"
  ],
  "textContent": "Thank you @Sander for clarifying your position. I agree with your central caution: P-values and interval estimates are purely model-conditional quantities. They quantify the relationship between observed data and model predictions, not the truth of a model, nor the total uncertainty inherent in a scientific question. When uncontrolled biases, misspecification, or nonrandom treatment assignment are present – as in most observational research – any inferential interpretation must be grounded in the design and data-generation context rather than attributed to the statistical procedure alone.\n\nWhere I differ is not on that caution, but on what the realized interval represents _within_ its model-conditional framework.\n\nYou argue that describing intervals as “uncertainty measures” is misleading because they do not account for all sources of uncertainty, especially those arising from design limitations. I agree that they do not capture structural, causal, or model uncertainty. However, conditional on the specified model and assumptions, a 95% interval is precisely the set of parameter values that the corresponding tests do not reject at the 0.05 level, that is, those with P > 0.05 (or whatever threshold we choose). In that sense, the interval is the inversion of a family of significance (or divergence) tests and defines a range of hypotheses that are less discordant with the observed data, under the assumed sampling model, than those outside it. That is a statement about uncertainty – specifically, sampling uncertainty – albeit conditional and limited.\n\nThe distinction, then, may lie in the referent of “uncertainty.” I am not claiming that the interval measures total epistemic uncertainty about the world. Rather, it quantifies uncertainty about which parameter values remain viable given the data and the assumed model. That seems different from claiming “support” in a strong evidential sense. 
Indeed, I share your concern that language such as “supported by the data” can easily drift into evidential overstatement.\n\nRegarding “compatibility,” I appreciate your point that it is weaker than “certainty” or “support,” and therefore less prone to over-interpretation. But I question whether it avoids misinterpretation in practice. In ordinary language, compatibility suggests coherence or agreement in a broad sense. Statistically, however, compatibility is defined narrowly: the divergence measure underlying the test does not exceed the chosen threshold. The everyday meaning may therefore also risk inflation beyond its formal definition. In that respect, I am not convinced that “compatibility” is inherently safer than “uncertainty.”\n\nYour saturated-model example is important. A model with p=1 demonstrates that a P-value reflects only concordance between data and model predictions under a chosen discrepancy measure. It does not imply plausibility of the model in any contextual or scientific sense. I agree entirely. But this reinforces, rather than undermines, the view that intervals quantify uncertainty _about parameters within a model_ , not uncertainty about whether the model is correct. A saturated model eliminates sampling variability in residuals; it does not eliminate uncertainty about whether the model is meaningful.\n\nPerhaps the core issue is this: should terms like “uncertainty,” “confidence,” or “credibility” be reserved only for settings with defensible design features such as randomization? I agree that design justifies inferential interpretation. But even in well-designed randomized trials, interval estimates remain model-based constructions derived from hypothetical repetition. They quantify sampling variability under those assumptions. In observational settings, the same is true, though the interpretation must be more guarded. 
The limitation arises from the assumptions, not from the word “uncertainty” itself.\n\nIf anything, I worry that abandoning the language of uncertainty may obscure what intervals actually do: they delineate the imprecision inherent in estimation under a specified model. That imprecision exists whether or not the model is fully adequate. The remedy for overselling is not necessarily terminological substitution, but explicit articulation of assumptions and design features alongside the statistical summaries.\n\nSo perhaps a reconciliation is possible:\n\na) “Compatibility” accurately describes the narrow statistical relationship between data and model predictions.\n\nb) “Uncertainty” accurately describes the variability of parameter estimates under the assumed sampling process.\n\nc) Neither term should be interpreted as measuring total scientific uncertainty or evidential support.\n\nd) Both require explicit acknowledgment of design quality and model assumptions.\n\nTerminology does matter, but so does precision about what level of uncertainty we are discussing: sampling, model, causal, or epistemic. My concern is not to inflate what intervals provide, but to preserve clarity that their primary function is to express the limits of precision in estimation under stated assumptions.",
  "title": "Change the range not the language on confidence intervals"
}