Advice when models perform similarly but would treat different patients?
Datamethods Discussion Forum [Unofficial]
May 17, 2026
Tim, this is not where we want to be in 2026, but I'll leave it at that.
Being brutally honest about the performance of methods that need far more information than the data can provide, which is what you are trying to do, is the second-best thing.
See this, where I use the bootstrap to get confidence intervals for the difference in predicted probabilities from two methods, with each method's models refit from scratch in every bootstrap repetition.
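A minimal sketch of that bootstrap comparison, assuming simulated data and two hypothetical methods (`method_a`: logistic regression on three predictors entered linearly; `method_b`: logistic regression on `x1` dichotomized at its median). The key point is that every data-driven step, including the median cutpoint, is redone inside each bootstrap repetition, so the interval reflects the whole modeling process and not just coefficient uncertainty. The percentile interval is one simple choice among several.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Plain Newton-Raphson logistic regression; returns intercept-first coefficients."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
        w = p * (1.0 - p)
        beta += np.linalg.solve(X1.T @ (X1 * w[:, None]), X1.T @ (y - p))
    return beta

def predict(beta, X):
    X1 = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-(X1 @ beta)))

def method_a(X, y, x_new):
    # Hypothetical method A: all three predictors entered linearly.
    return predict(fit_logistic(X, y), x_new)

def method_b(X, y, x_new):
    # Hypothetical method B: x1 only, dichotomized at the median of the data
    # it is given, so the cutpoint is re-estimated on every resample.
    cut = np.median(X[:, 0])
    b = fit_logistic((X[:, :1] > cut).astype(float), y)
    return predict(b, (x_new[:, :1] > cut).astype(float))

def bootstrap_diff(X, y, x_new, B=500, seed=1):
    """Point estimate and 95% percentile interval for P_A - P_B at each row of x_new."""
    rng = np.random.default_rng(seed)
    n = len(y)
    point = method_a(X, y, x_new) - method_b(X, y, x_new)
    reps = np.empty((B, len(x_new)))
    for b in range(B):
        idx = rng.integers(0, n, n)  # resample patients with replacement
        Xb, yb = X[idx], y[idx]
        # Both methods are refit from scratch on the bootstrap sample.
        reps[b] = method_a(Xb, yb, x_new) - method_b(Xb, yb, x_new)
    lo, hi = np.percentile(reps, [2.5, 97.5], axis=0)
    return point, lo, hi

# Simulated stand-in data (assumption; replace with the real cohort).
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + X @ np.array([1.0, 0.5, -0.5])))))
x_new = np.array([[1.5, -1.0, 0.5], [0.0, 0.0, 0.0]])  # two hypothetical patients
point, lo, hi = bootstrap_diff(X, y, x_new, B=200)
for d, l, h in zip(point, lo, hi):
    print(f"difference {d:+.3f}  95% CI [{l:+.3f}, {h:+.3f}]")
```

A wide interval for a given patient is direct evidence that the two methods would treat that patient differently for reasons the data cannot settle.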
See the end of this, where a single graph shows hugely varying predictions depending on how prostate cancer risk factors are modeled. Michael Kattan and others did a lot of work in prostate cancer prognostication that demonstrates such phenomena.
Since your specialty area wants to persist in using terrible statistical methods, showing your audience that the proposed stepwise method is incapable of finding the "right" variables is of the utmost importance. The bootstrap can help with this. Re-simulation is also valuable: take a fitted model and pretend it's the truth, simulate new datasets of the same size as your real dataset, and analyze each one independently, comparing each result to the known truth to show the extreme volatility of the process. The re-simulation uses all of the original X but simulates new Y under the model.
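A minimal sketch of the re-simulation idea, under loudly stated assumptions: the "real" data are simulated here, the outcome is binary with a logistic model, and `backward_stepwise` (Wald p-value elimination at alpha = 0.05) is a hypothetical stand-in for whatever stepwise rule is actually being proposed. The fitted full model is treated as the truth, so every predictor with a nonzero coefficient counts as "right"; each re-simulation keeps the original X and draws only a new Y.

```python
import math
import numpy as np

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression; returns coefficients and Wald SEs (intercept first)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
        H = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(H, X1.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
    H = X1.T @ (X1 * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(H)))

def backward_stepwise(X, y, alpha=0.05):
    """Hypothetical stepwise rule: drop the least significant predictor
    until every remaining Wald p-value is below alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        beta, se = fit_logistic(X[:, keep], y)
        z = beta[1:] / se[1:]
        pvals = [math.erfc(abs(v) / math.sqrt(2)) for v in z]  # two-sided normal p-values
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:
            break
        keep.pop(worst)
    return frozenset(keep)

def resimulate(X, beta_true, n_sims=200, seed=2):
    """Same X, new Y drawn from the pretend-true model; rerun the selection each time."""
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([np.ones(len(X)), X])
    p_true = 1.0 / (1.0 + np.exp(-(X1 @ beta_true)))
    return [backward_stepwise(X, rng.binomial(1, p_true)) for _ in range(n_sims)]

# Simulated stand-in for the real dataset (assumption).
rng = np.random.default_rng(0)
n, k = 200, 8
X = rng.normal(size=(n, k))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] + 0.25 * X[:, 2]))))

beta_hat, _ = fit_logistic(X, y)   # fitted full model, now treated as the truth
sets = resimulate(X, beta_hat)
truth = frozenset(range(k))        # all k predictors are nonzero in the pretend truth
print("distinct selected sets:", len(set(sets)))
print("runs recovering the true set:", sum(s == truth for s in sets))
print("selection frequency per predictor:",
      np.mean([[j in s for j in range(k)] for s in sets], axis=0))
```

The number of distinct selected sets and the per-predictor selection frequencies are the volatility summary; the same loop with bootstrap resamples of (X, Y) in place of new Y gives the bootstrap version.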
I'd like to hear more about the Bayesian tree. Is it actually producing a simple tree? If so, what is undoubtedly happening is a "truth in advertising" problem, in which the predictions have to be conservative (lower predictive discrimination) for the calibration of predictions to be perfect.
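To make the calibration/discrimination trade-off concrete, here is a toy sketch on simulated data. It uses a one-split frequentist "tree" (split at the median, predict each leaf's observed event rate), not the Bayesian tree in question. Within each leaf the mean prediction equals the observed rate by construction, yet the coarse predictions discriminate worse than the true continuous risk. `c_index` is a small helper written here, counting ties in predictions as 1/2.

```python
import numpy as np

def c_index(pred, y):
    """Concordance probability for a binary outcome; ties in prediction count 1/2."""
    pc, pn = pred[y == 1], pred[y == 0]
    diff = pc[:, None] - pn[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Simulated data with a smooth true risk (assumption).
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(-0.5 + 1.5 * x)))
y = rng.binomial(1, p_true)

# One-split "tree": predict each leaf's observed event rate.
leaf = x > np.median(x)
tree_pred = np.where(leaf, y[leaf].mean(), y[~leaf].mean())

print("c-index, true continuous risk:", round(c_index(p_true, y), 3))
print("c-index, one-split tree:      ", round(c_index(tree_pred, y), 3))
for side in (False, True):
    m = leaf == side
    print(f"leaf {side}: mean predicted {tree_pred[m].mean():.3f}, observed {y[m].mean():.3f}")
```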