---
title: RCT Limitations
type: concept
tags: [philosophy-of-science, rct, evidence-based-medicine, external-validity, ecological-fallacy]
sources: ["[[sources/2023-anjum-rocca-phi403-causation-in-science]]", "[[sources/2023-anjum-rocca-phi403-lecture-19-what-rcts-do-not-show]]", "[[sources/2023-anjum-rocca-phi403-lecture-11-is-more-data-better]]"]
created: 2026-04-20
updated: 2026-04-20
---

# RCT Limitations

Randomised Controlled Trials sit at the top of the **[[concepts/evidence-hierarchy|EBM evidence hierarchy]]** and are treated as the gold standard for establishing causation in medicine and many social sciences. The PHI403 course argues ([[sources/2023-anjum-rocca-phi403-lecture-19-what-rcts-do-not-show]]) that RCTs systematically *exclude* categories of causally relevant information, so results cannot be applied universally without caveats.

## What RCTs exclude

- **Severe effects** — ethical constraints prevent testing (air pollution, seatbelts). Cf. the satirical *BMJ* paper "Parachute use to prevent death… systematic review of RCTs".
- **Risk groups** — individuals known to be vulnerable (allergic patients, children, pregnant women, the very sick) are excluded from the trial, so adverse effects in these populations are not evidenced.
- **Variations and marginals** — RCTs test one causal factor and one type of effect; individual variation and heterogeneity must be ignored so that intra-group homogeneity can be assumed.
- **Individual-level causation** — RCTs generate statistical frequencies at the **group level**. Inferring individual propensity from a population average is the **[[concepts/ecological-fallacy|ecological fallacy]]**.
- **Negative results** — trials finding no effect are rarely published. Any systematic review of RCTs is biased toward showing intervention effectiveness and understating risks.

## External validity

External validity is the most serious single issue.
Cartwright & Hardie (*Evidence-Based Policy*): *"it worked there, but will it work here?"* ([[sources/2023-anjum-rocca-phi403-lecture-11-is-more-data-better]]). Example: a nutrition-education-for-mothers program succeeded in India and failed in Bangladesh — because in Bangladesh the mother-in-law cooks. Without mechanistic/contextual knowledge, transfer from trial to field is unjustified.

## Policy is not derivable from RCT results

Meta-analyses can rank known interventions by effect size, but **the choice of policy does not follow from facts alone** — a further normative step is required. Assuming otherwise conflates *is* with *ought*.

## Relevance to ML and BPM evaluation

The same critique applies to evaluating predictive/prescriptive process monitoring models on public event-log benchmarks ([[sources/2020-rama-maneiro-deep-learning-ppm-review]] uses 12 public logs as a community standard):

- **External validity** — a model benchmarked on BPI Challenge logs may not generalise to a specific organisation's process.
- **Ecological fallacy** — an aggregate AUC on the test set says nothing about the model's reliability for an individual running case.
- **Under-reporting of negative / null results** — the same publication bias operates in ML as in clinical research.
- **Benchmark contamination and selection bias** — flagged as an open concern in the [[sources/2026-calvanese-agentic-bpm-manifesto|APM Manifesto]] (challenge C3).

## Related

[[concepts/causation]] · [[concepts/evidence-hierarchy]] · [[concepts/methodological-pluralism]] · [[concepts/probabilistic-causation]] · [[concepts/dispositionalism]] · [[concepts/philosophical-bias]]
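
## Illustration: aggregate AUC versus subgroup AUC

The ecological-fallacy point in the ML section can be made concrete with a small sketch. Everything here is hypothetical: the `cases` data, the process variants `A` and `B`, and the scores are invented for illustration and come from no benchmark. AUC is undefined for a single case, so a rare process variant stands in for "the individual"; the pairwise (Mann-Whitney) formulation of AUC is used so that the example needs no external library.

```python
from collections import defaultdict


def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC: share of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A correctly ordered pair counts 1, a tie counts 0.5, a reversed pair counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cases: (process variant, predicted score, true label).
# Variant A is common and well modelled; variant B is rare and ranked backwards.
cases = [
    ("A", 0.90, 1), ("A", 0.80, 1), ("A", 0.85, 1), ("A", 0.70, 1),
    ("A", 0.10, 0), ("A", 0.20, 0), ("A", 0.15, 0), ("A", 0.30, 0),
    ("B", 0.40, 1), ("B", 0.35, 1),
    ("B", 0.60, 0), ("B", 0.55, 0),
]

# Aggregate metric over the whole (hypothetical) test set.
overall = auc([s for _, s, _ in cases], [y for _, _, y in cases])

# The same metric computed separately for each variant.
by_variant = defaultdict(list)
for variant, score, label in cases:
    by_variant[variant].append((score, label))

per_variant = {
    v: auc([s for s, _ in rows], [y for _, y in rows])
    for v, rows in by_variant.items()
}

print(f"overall AUC: {overall:.3f}")
for v, value in sorted(per_variant.items()):
    print(f"variant {v} AUC: {value:.3f}")
```

In this toy data the aggregate AUC is 32/36 (about 0.89), while every positive/negative pair inside variant B is ranked the wrong way round. A healthy aggregate therefore licenses no conclusion about a subgroup, and still less about one running case.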