Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreid7gxicqtagc2ltdamjahl73uwrujismolm67pl7x23snl5hzg744",
    "uri": "at://did:plc:wwyqal4cnqhuwyacdj7rqq3n/app.bsky.feed.post/3mjc2iytilkt2"
  },
  "path": "/t/sample-size-in-prognostic-factor-research/28591#post_9",
  "publishedAt": "2026-04-12T09:05:26.000Z",
  "site": "https://discourse.datamethods.org",
  "tags": [
    "Richard Riley et al in the BMJ",
    "Riley et al, arxiv 2025",
    "CHA2DS2-VASc score"
  ],
  "textContent": "I see that Frank Harrell has already suggested the papers that directly address your question, Richard Riley et al in the BMJ and their updated thoughts on the matter after years of deliberation, Riley et al, arxiv 2025.\n\nkoray_durak:\n\n> As these are usually explanatory models and we adjust based on subject matter knowledge, what if I am still worried of overfitting and want to use less degrees of freedom (all sample size formulas I know are for prediction models).\n\nI should emphasize that the only defensible way to test a new prognostic factor is to add it into a prediction model that includes other known, easily obtainable predictors and see if it adds value. So sample size formulas for prediction models are in fact what you need.\n\nHaving developed several prediction models that are in daily clinical use, I would add a point that I didn’t feel was emphasized as much as I would have liked in the prediction modeling literature I read before developing my own: consider carefully what information the **users** of your prediction model will have access to at the time they decide to use it, and consider the **cost** of obtaining the necessary predictors.\n\nI would suggest you do a formal sample size calculation, as described in the papers by Riley et al above, and use this to obtain the maximum number of candidate predictor variables you can formally consider. 
I would then fill your allotted _budget_ by choosing among the predictors that clinical expertise and prior literature have suggested are important, considering availability of the predictor in contemporary practice (will the user have to obtain predictors they wouldn’t normally measure just to use your model), cost of obtaining the predictor (would obtaining the predictor be costly in time, complexity, pain or discomfort, or money), and acceptability of the predictor to users (will the user understand the relevance of the predictor to the decision that needs to be made). Obviously you can’t even consider predictors that cannot be available to the user at the time of using your model (such as response to chemotherapy), but I don’t think I need to emphasize this.\n\nFinally, consider deeply what your aim is. I am both a clinician and a developer of clinical prediction models, and the most common _error_ I see in prediction model development is not considering **how** or **why** a clinician would use the model. Your stated aim is severely testing the added value of a new biomarker in the prediction of one year mortality. You have already done what the majority of researchers refuse to do – decided to test your biomarker extensively in the correct framework. However, consider how or why a clinician would then use this biomarker. What decision does one year mortality inform?\n\nEvery prediction model and every biomarker in clinical use is couched in some kind of clinical decision (whether this is justified or not). The results of a model should inform a decision. If NT-proBNP is elevated in the setting of dyspnea, this pushes me in the direction of diuretics. If the CHA2DS2-VASc score is elevated, predicting a higher risk of stroke in atrial fibrillation, this pushes me to prescribe anticoagulants.\n\nWhat does your prediction model push clinicians to do? 
What will elevated or normal levels of the biomarker tell clinicians? If you have trouble answering these questions, then the likelihood of the model or biomarker being used clinically is small.",
  "title": "Sample size in prognostic factor research"
}