Censored binomial models

Datamethods Discussion Forum [Unofficial] May 5, 2026

Hi all,

Suppose I have the following problem:

1. I have 100 surgeons, each of whom performs N_i procedures in a given year to treat a particular condition.
2. For this condition, they can choose between two procedures: A or B.
3. I am interested in modelling the probability that a given surgeon would choose to perform A rather than B, based on a given set of predictor variables.
4. I cannot simply use a binomial model/classic logistic regression because the database censors any procedure count of 10 or fewer per surgeon per year. For example, if a surgeon performs operation A 16 times and operation B 7 times, I would be able to see that they did operation A 16 times, but I would only know that they did operation B 10 times or fewer. Only surgeons with N of 11 or more are included, so you always know at least one of A or B.

This sort of censoring is often done when there are concerns re: confidentiality on public databases.

One sensible approach I’ve found is outlined here. The approach basically consists of using the standard binomial likelihood for all exact observations (in this case, anything ≥11) and using the cumulative distribution function for all censored observations (≤10).

Another intuition that I had (which I suspect is wrong) is to just use an ordinal logistic model to directly model the count of procedure A done (where the outcome variable takes the values ≤10, 11, 12, etc.) while controlling for the total number of procedures (A + B = N) performed by a given surgeon. I suspect this is not quite right because adjusting for N doesn’t factor in the fact that A is always ≤ N.

I’m wondering if anyone has tackled a similar issue before, or whether they have any alternative suggestions on how to tackle the problem.
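To make the censored-likelihood idea concrete, here is a minimal Python sketch of how I understand it. It is not the linked author's code. It assumes N is known for every surgeon and that only the count of A is either seen exactly or known to be 10 or fewer. The names `neg_log_lik`, `rows`, and `CENSOR_LIMIT`, and the toy data, are made up for illustration.

```python
import math

CENSOR_LIMIT = 10  # counts at or below this are hidden (taken from the post)


def log_binom_pmf(k, n, p):
    """Log of P(A = k | n, p) for a binomial count."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def log_binom_cdf(k, n, p):
    """Log of P(A <= k | n, p), summed term by term (cheap for small k)."""
    log_terms = [log_binom_pmf(j, n, p) for j in range(min(k, n) + 1)]
    m = max(log_terms)
    # log-sum-exp keeps the sum stable when individual terms are tiny
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def inv_logit(eta):
    """Logistic link, clamped away from 0 and 1 so the logs stay finite."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return min(max(p, 1e-12), 1.0 - 1e-12)


def neg_log_lik(beta, rows):
    """Negative log-likelihood of a logistic model with left-censored counts.

    rows: iterable of (x, n, a), where x is the predictor vector (include a
    leading 1.0 for the intercept), n is the surgeon's total N, and a is the
    exact count of A, or None when all we know is a <= CENSOR_LIMIT.
    """
    total = 0.0
    for x, n, a in rows:
        p = inv_logit(sum(b * xi for b, xi in zip(beta, x)))
        if a is None:
            total -= log_binom_cdf(CENSOR_LIMIT, n, p)  # censored: use the CDF
        else:
            total -= log_binom_pmf(a, n, p)  # exact: usual binomial term
    return total


# Hypothetical toy data: (predictors, N, count of A or None if censored)
rows = [
    ((1.0, 0.2), 23, 16),
    ((1.0, -1.1), 30, None),
    ((1.0, 0.7), 18, 14),
]
nll_at_zero = neg_log_lik((0.0, 0.0), rows)
```

In practice `neg_log_lik` would be handed to a general-purpose optimizer (for example `scipy.optimize.minimize`) to estimate `beta`. If B is the censored count instead, the same CDF term applies to B with probability 1 − p. If N itself is unknown for censored surgeons, the likelihood would need to change.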
