{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreicbpawwr532tzlraiaj6hh7uvxp6uncchx3wxkhapz7yrgpizucge",
"uri": "at://did:plc:wwyqal4cnqhuwyacdj7rqq3n/app.bsky.feed.post/3ml6l22hiast2"
},
"path": "/t/censored-binomial-models/28732#post_1",
"publishedAt": "2026-05-05T22:39:27.000Z",
"site": "https://discourse.datamethods.org",
"tags": [
"here"
],
"textContent": "Hi all,\n\nSuppose I have the following problem:\n\n 1. I have 100 surgeons, each of whom perform N{i} procedures in a given year to treat a particular condition.\n 2. For our particular condition, they have the option of performing 2 procedures: A vs B\n 3. I am interested in modelling the probability that a given physician would choose to perform A rather than B based on a given set of predictor variables.\n 4. I can not simply use a binomial model/classic logistic regression because the database censors all records of a surgeon doing 10 or fewer procedures per year. For example, if surgeon performs operation A 16 times and operation B 7 times, I would be able to see that they did operation A 16 times but I would only know that they did operation B 10 times or less. Only surgeons with N of 11 or more are included, so you always know at least one of A or B. This sort of censoring is often done when there are concerns re: confidentiality on public databases.\n\n\n\nOne sensible approach I’ve found is outlined here. The approach basically consists of using the standard binomial likelihood for all exact observations (in this case, anything ≥11) and using the cumulative distribution function for all censored observation (<11).\n\nAnother intuition that I had (which I suspect is wrong) is to just use an ordinal logistic model to directly model the count of procedure A done (where the outcome variable <10, 11, 12, etc.) while controlling for the total number (A + B = N) of procedures performed by a given surgeon. I suspect this is not quite right because adjusting for N it doesn’t factor in the fact that A is always ≤ N.\n\nI’m wondering if anyone has tackled a similar issue before or whether they have any alternative suggestions on how to tackle the problem.",
"title": "Censored binomial models"
}