Collider in RCT Subgroup Analysis

Datamethods Discussion Forum [Unofficial] April 2, 2026

Johannes_Schwenke:

I’ll try to simulate some data to wrap my head around this.

Yeah, this is why I chose to model the confounding influence of U on the baseline variables that we know at time 0. While simpler than modeling the interaction, focusing on this has a very high payoff in practice.

Johannes_Schwenke:

In the example you give, if EGFR mutation was the only cause of oncogenic EGFR, would we still get a biased estimate in a subgroup of patients with oncogenic EGFR signaling? I would think not as in this instance this would be equivalent to an RCT in patients with EGFR mutation, would it not?

Correct. This is also related to @f2harrell’s comment:

f2harrell:

It seems to me that if you could have done a meaningful and easily interpretable clinical trial on the subgroup in question, you should be able to figure out interaction effects involving that factor in a larger trial.

Indeed, we discuss and formalize exactly this point in Section 3.4 here using the example of HER2. Notice that contextual knowledge from correlative and functional lab research is needed to choose the subgroup and develop the therapy for it. Hence the focus of that paper on transporting such knowledge across domains.

Johannes_Schwenke:

I’m also somewhat confused by the treatment indicator causing EGFR signaling. Temporally, I would assume EGFR signaling to already be present / absent before randomization (?)

Notice the qualifier oncogenic EGFR signaling (not just all EGFR signaling which exists in normal cells). The oncogenic mutations on the tyrosine kinase domain of EGFR induce oncogenic EGFR signaling that can then be targeted (causally modified) by EGFR tyrosine kinase inhibitors.
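
For readers who, like Johannes, want to simulate this, here is a minimal sketch of the two cases discussed above. It is a toy model and not the DAG or the analysis from the paper: the variable names, effect sizes, and functional forms are all assumptions made for illustration. The treatment `t` is randomized, `m` is a baseline mutation, `u` is an unmeasured factor that affects the outcome, and the true treatment effect is fixed at 1.0. In one case the subgroup is defined by a signaling variable that is influenced by `m`, `u`, and the treatment itself (a collider); in the other the mutation is the only cause of signaling, so the subgroup is simply `m == 1`.

```python
import numpy as np


def simulate(n=200_000, true_effect=1.0, seed=0):
    """Toy simulation; all mechanisms and coefficients are illustrative assumptions."""
    rng = np.random.default_rng(seed)

    t = rng.integers(0, 2, size=n)   # randomized treatment indicator
    m = rng.integers(0, 2, size=n)   # baseline mutation, known at time 0
    u = rng.normal(size=n)           # unmeasured factor affecting the outcome

    # Outcome depends on treatment and u, with the same effect for everyone.
    y = true_effect * t + u + rng.normal(size=n)

    # Case A: signaling is raised by m and u and lowered by treatment,
    # so it is a post-randomization collider of t and u.
    s_post = (m + u - t + rng.normal(size=n)) > 0

    # Case B: the mutation is the only cause of signaling,
    # so the subgroup is defined entirely at baseline.
    s_base = m == 1

    def arm_difference(mask):
        # Difference in mean outcome between arms within the selected patients.
        return y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()

    return {
        "overall": arm_difference(np.ones(n, dtype=bool)),
        "post_randomization_subgroup": arm_difference(s_post),
        "baseline_subgroup": arm_difference(s_base),
    }


results = simulate()
for name, estimate in results.items():
    print(f"{name}: {estimate:.3f}")
```

Under these assumptions the overall and baseline-subgroup estimates should sit close to the true effect of 1.0, while the estimate within the post-randomization signaling subgroup should drift away from it, because selecting on that variable makes `u` differ between the arms even though treatment was randomized.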

I had forgotten that the Impervious to Randomness paper focused on teasing out oncogenic EGFR signaling. The DAG was drawn in my head during a day hike with my then soon-to-be wife around Santorini on 7/18/2021. I drew it on the piece of paper below and then wrote that manuscript as a way of not forgetting this concept. But as shown here (from 1:34:00 onwards), that mental dissection of the EGFR pathway subsequently allowed us (in May 2022) to come up with the most powerful therapy developed to date for renal medullary carcinoma, the deadliest kidney cancer in adolescents and adults. There are patients alive today (some even cancer free) who would otherwise no longer be with us if not for this.

Once we started thinking in a structured way about randomizing a patient’s covariates to remove these confounders, this led us to sampling theory. We then spent a lot of time thinking about the implications of random sampling versus random treatment assignment and wrote this very long paper to summarize these points.

Depressingly, this line of thinking then allowed me to recognize the oxymoronic nature of randomized non-comparative trials (RNCTs). To this day, I struggle to convince some biostatisticians why RNCTs are such a bad idea. These DAGs are one method of communicating these concepts but they still need attention and may not work for everyone. Different tools may be a better fit for at least some people.
