Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiggpdhcznb6luwblswh3afbbjivrh5v265fbxosnwhod3efky7wxm",
    "uri": "at://did:plc:wwyqal4cnqhuwyacdj7rqq3n/app.bsky.feed.post/3mid6fvwsds62"
  },
  "path": "/t/trying-to-understand-statistical-methods-through-the-lens-of-replication/28665#post_5",
  "publishedAt": "2026-03-30T16:50:17.000Z",
  "site": "https://discourse.datamethods.org",
  "tags": [
    "https://doi.org/10.1002/sim.4780110705",
    "https://doi.org/10.1111/j.1467-9280.2005.01650.x",
    "Replication and <i>p</i> Intervals: <i>p</i> Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better",
    "https://doi.org/10.1198/tas.2011.10129",
    "P-values in genomics: Apparent precision masks high uncertainty | Molecular Psychiatry",
    "The fickle P value generates irreproducible results | Nature Methods",
    "Bayesian prediction intervals for assessing P-value variability in prospective replication studies | Translational Psychiatry",
    "https://doi.org/10.1080/00031305.2019.1678521",
    "https://doi.org/10.1002/sim.9406",
    "https://doi.org/10.1056/EVIDoa2300003"
  ],
  "textContent": "f2harrell:\n\n> I … only care about the probability that the treatment works given whatever we currently know.\n\nThis is of course my main objective too: estimating the probability that the treatment works if we were to repeat the study impeccably with infinitely large sample sizes. Under these conditions, the probability of ‘replication’, in the sense of the treatment again being found to work barely or clearly (i.e. to be barely or clearly better than placebo), is arithmetically equal to 1 - P. (Erik’s expression and mine give the same result in this regard.) The probable impeccability of the study is estimated by considering a list of potential flaws and hopefully finding evidence that each such flaw is improbable (i.e. by using a probabilistic version of the disjunctive syllogism). Mayo calls this severe testing. If all the rival possibilities (including non-replication) are of low probability, then, in the absence of something not considered, we could assume that the treatment probably works. However, if this severe testing fails because of evidence of biases etc., then one can turn to Bayesian modelling to assess what would happen without the flaws. I outline this in the discussion of the paper.\n\nBy the way, in case I am accused of misquoting David Spiegelhalter, I should point out that the view about P values and hypotheses was not his but mine!\n\nHere is a more comprehensive list of potential references to the literature on replication, as you suggest. Is there anything important missing?\n\n_Goodman, S. N. (1992). A comment on replication, p-values and evidence. Statistics in Medicine, 11(7), 875–879. https://doi.org/10.1002/sim.4780110705_\n\n_Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345–353._\n\n_Cumming, G. (2005). Understanding the average probability of replication: Comment on Killeen (2005). Psychological Science, 16(12), 1002–1004. 
https://doi.org/10.1111/j.1467-9280.2005.01650.x_\n\n_Cumming, G. (2008). Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Perspectives on Psychological Science, 3(4), 286–300._\n\n_Boos, D. D., & Stefanski, L. A. (2011/2012). P-Value Precision and Reproducibility. The American Statistician, 65(4), 213–221. https://doi.org/10.1198/tas.2011.10129. Published in the 2011 issue, with the final edited form appearing in 2012._\n\n_Lazzeroni, L. C., Lu, Y., & Belitskaya-Lévy, I. (2014). P-values in genomics: apparent precision masks high uncertainty. Molecular Psychiatry, 19(12), 1336–1340._\n\n_Halsey, L. G., Curran-Everett, D., Vowler, S. L., & Drummond, G. B. (2015). The fickle P value generates irreproducible results. Nature Methods, 12(3), 179–185._\n\n_Vsevolozhskaya, O. A., Ruiz, G., & Zaykin, D. V. (2017). Bayesian prediction intervals for assessing P-value variability in prospective replication studies. Translational Psychiatry, 7, Article 1271._\n\n_Segal, B. D. (2021). Toward Replicability With Confidence Intervals for the Exceedance Probability. The American Statistician, 75(2), 128–138. https://doi.org/10.1080/00031305.2019.1678521_\n\n_van Zwet, E. W., & Goodman, S. N. (2022). How large should the next study be? Predictive power and sample size requirements for replication studies. Statistics in Medicine, 41(16), 3090–3101. https://doi.org/10.1002/sim.9406_\n\n_van Zwet, E., Gelman, A., Greenland, S., Imbens, G., Schwab, S., & Goodman, S. N. (2024). 
A New Look at P Values for Randomized Clinical Trials. NEJM Evidence, 3(1), EVIDoa2300003. https://doi.org/10.1056/EVIDoa2300003, Epub December 22, 2023._\n\n_Berrar, D. (2024). Estimating the Replication Probability of Significant Classification Benchmark Experiments. Journal of Machine Learning Research, 25(311), 1–42._",
  "title": "Trying to understand statistical methods through the lens of replication"
}