Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidq2gc65srvqemkd72ync3rcoprt2om2be5s4ofyz5apag7rcuule",
    "uri": "at://did:plc:wwyqal4cnqhuwyacdj7rqq3n/app.bsky.feed.post/3meccs7uwbpg2"
  },
  "path": "/t/dichotomization/26337?page=4#post_75",
  "publishedAt": "2026-02-07T20:53:39.000Z",
  "site": "https://discourse.datamethods.org",
  "tags": [
    "https://pmc.ncbi.nlm.nih.gov/articles/PMC1112685/#B13"
  ],
  "textContent": "Congratulations on the article. The app you developed to show the effects of dichotomization on sample size should help many researchers.\n\nArguably, the only way to eradicate entrenched bad practices is to trace them back to their origins and then pull them out by the root. The “Context” section of your paper describes how and why responder analysis became a mainstream practice. I looked up some of the papers you cited (e.g., Kieser). It seems like the people who first proposed responder analysis were trying to satisfy regulators’ demands for an analysis that could help them to gauge the _clinical relevance_ of the effects shown by new drugs in RCTs.\n\nKieser notes (boldface is mine):\n\n> _“A number of regulatory guidelines propose that clinical relevance should be assessed by considering the **rate of responders**, that is, the proportion of patients who are observed to achieve an apparently meaningful benefit…”_\n\nThis paper, found through my own research on the history of responder analysis, represented early advocacy for the technique (which was closely linked to promotion of the concept of “number needed to treat”):\n\nhttps://pmc.ncbi.nlm.nih.gov/articles/PMC1112685/#B13\n\nDiscussion of a fictional scenario might serve to highlight a key issue.\n\nHypothetical scenario:\n\nAn RCT shows a mean between-arm difference of 6 points, where the outcome is a continuous variable measured on a 100-point scale (the higher the score, the better the patient’s clinical state). The mean baseline-adjusted final score for those in the new drug arm was 6 points better than the final score in the other arm. 
The trial was considered positive because the sponsor and regulator had agreed, prior to conducting the trial, that a 5-point mean between-arm difference would be considered “clinically meaningful.”\n\nThe researchers then performed an additional analysis, during which they “drilled down” into each arm of the trial, plotting how the score of each patient had changed from the beginning of the trial to the end. They found that the scores of 50% of patients in the new drug arm had changed by only 1 point, 2 points, 3 points, or 4 points over the course of the trial, while the scores of 10% of patients changed by 10 points or more following exposure to the new drug.\n\n**Key conceptual question that lies at the heart of the “responder analysis” controversy:** Does this observation mean that we can infer that the drug “worked exceptionally well” in 10% of patients and “barely at all” for 50% of patients in the trial? Why or why not?\n\nI have my own opinion on the answer to this question and on the paper linked above. I’d be interested to hear the opinions of others on this site (with rationale presented).",
  "title": "Dichotomization"
}