Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiduntw2ojeie2ev5vp5tovkwgeizlfqb23s3tevksadbut52kg344",
    "uri": "at://did:plc:wwyqal4cnqhuwyacdj7rqq3n/app.bsky.feed.post/3mf2m5pln5kg2"
  },
  "path": "/t/rms-semiparametric-ordinal-longitudinal-model/4819?page=6#post_110",
  "publishedAt": "2026-02-16T11:32:17.000Z",
  "site": "https://discourse.datamethods.org",
  "tags": [
    "Ye et al. 2023",
    "Bluesky",
    "Score Based Approach to Wild Bootstrap Inference"
  ],
  "textContent": "I’ve been toiling away on an R package for easier modeling of longitudinal ordinal outcomes with VGAM. If one wants to target the average treatment effect (ATE), interval estimation can be very time-intensive with nonparametric bootstrap refitting: for a trial with about 1,000 patients, it can take around 30 minutes without parallelization, even with fairly optimized/vectorized code.\n\nI therefore explored alternatives, including drawing coefficient vectors from a (cluster-robust) multivariate normal (MVN) distribution. That is much faster (often by at least an order of magnitude).\n\nHowever, after reading Ye et al. 2023 and the discussion on Bluesky, my understanding is that MVN simulation mainly propagates coefficient uncertainty while conditioning on the observed covariate distribution X (i.e., possibly closer to a sample average treatment effect). In preliminary simulations under a superpopulation setup, I see slight CI undercoverage with this approach, which could worry some stakeholders.\n\nIf I understand correctly, the nonparametric bootstrap should capture variability in X, because X is resampled, but it just takes too long when simulating many scenarios, even on 80+ cores.\n\nSo I’ve been experimenting with a one-step score-based approach related to Score Based Approach to Wild Bootstrap Inference, and I’d appreciate feedback on whether this is theoretically reasonable.\n\nMy approach is as follows. Uncertainty is approximated by simulation:\n\n  * Draw cluster (patient) weights w_g \\sim \\mathrm{Exp}(1) (exponential multiplier weights, analogous to Bayesian bootstrap weighting).\n\n  * Compute centered multipliers u_g = w_g - 1, and form a perturbed score at \\hat\\beta: \\sum_g u_g S_g(\\hat\\beta), where S_g is the patient-aggregated score contribution.\n\n  * Use one Newton step with the model’s variance-covariance information to get approximate perturbed coefficients instead of fully refitting.\n\n  * Repeat this many times to generate draws \\beta^{*}, then run the state-occupancy calculation pipeline on each draw.\n\n  * For marginalization, reuse the same draw’s cluster weights (mapped to baseline patients and normalized), so variability in X is propagated along with coefficient uncertainty.\n\nI know that the Bayesian bootstrap doesn’t actually target frequentist coverage, but in some initial simulations things do look better than when simulating from an MVN. However, I usually assume that there are theoretical arguments against most things one could be doing. So, is there a clear theoretical reason this approach should be avoided for frequentist inference on superpopulation-targeted marginal effects? Thanks!",
  "title": "RMS Semiparametric Ordinal Longitudinal Model"
}