---
title: RCT Limitations
type: concept
tags: [philosophy-of-science, rct, evidence-based-medicine, external-validity, ecological-fallacy]
sources: ["[[sources/2023-anjum-rocca-phi403-causation-in-science]]", "[[sources/2023-anjum-rocca-phi403-lecture-19-what-rcts-do-not-show]]", "[[sources/2023-anjum-rocca-phi403-lecture-11-is-more-data-better]]"]
created: 2026-04-20
updated: 2026-04-20
---

# RCT Limitations

Randomised controlled trials (RCTs) sit at the top of the **[[concepts/evidence-hierarchy|EBM evidence hierarchy]]** and are treated as the gold standard for establishing causation in medicine and many social sciences. The PHI403 course argues ([[sources/2023-anjum-rocca-phi403-lecture-19-what-rcts-do-not-show]]) that RCTs systematically *exclude* categories of causally relevant information, so their results cannot be applied universally without caveats.

## What RCTs exclude

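- **Severe effects** — it would be unethical to randomise people to exposures expected to cause severe harm (e.g. air pollution, driving without seatbelts), so such effects cannot be tested in a trial. Cf. the satirical *BMJ* paper "Parachute use to prevent death… systematic review of RCTs".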
- **Risk groups** — individuals known to be vulnerable (allergic patients, children, pregnant women, the very sick) are excluded from the trial, so adverse effects in these populations are not evidenced.
- **Variations and marginals** — RCTs test one causal factor and one type of effect; individual variation and heterogeneity must be ignored so that intra-group homogeneity can be assumed.
- **Individual-level causation** — RCTs generate statistical frequencies at the **group level**. Inferring individual propensity from a population average is the **[[concepts/ecological-fallacy|ecological fallacy]]**.
- **Negative results** — trials finding no effect are rarely published, so any systematic review of published RCTs is biased toward showing intervention effectiveness and understating risks.
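
The group-level versus individual-level point can be made concrete with a toy calculation. The sketch below is illustrative only: the individual effects are invented numbers (in a real trial they are never observed directly), and the names `individual_effects`, `average_effect` and `share_harmed` are hypothetical.

```python
from statistics import mean

# Hypothetical individual treatment effects: outcome if treated minus
# outcome if untreated, in arbitrary units. These values are invented
# for illustration; a real RCT never observes them per person.
individual_effects = [2.0] * 80 + [-3.0] * 20  # 80 people helped, 20 harmed

# What an idealised RCT estimates: the group-level average effect.
average_effect = mean(individual_effects)

# What the average hides: the fraction of individuals who are harmed.
share_harmed = sum(e < 0 for e in individual_effects) / len(individual_effects)

print(f"average effect: {average_effect:+.2f}")  # +1.00
print(f"share harmed:   {share_harmed:.0%}")     # 20%
```

A positive average is compatible with a fifth of the group being made worse off, which is why reading the average as each individual's propensity is an instance of the ecological fallacy.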

## External validity

External validity is the most serious single issue. Cartwright & Hardie (*Evidence-Based Policy*) put the question as: *"it worked there, but will it work here?"* ([[sources/2023-anjum-rocca-phi403-lecture-11-is-more-data-better]]). Example: a nutrition-education programme for mothers succeeded in India but failed in Bangladesh, because in Bangladesh the mother-in-law cooks. Without mechanistic or contextual knowledge, transfer from trial to field is unjustified.
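
One way to read the example is that the intervention needs a contextual support factor that the trial result never mentions. The following is a minimal sketch of that reading, not a model taken from the sources: the `Context` class, the field `educated_person_controls_food`, and the effect size of `1.0` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Context:
    name: str
    # Hypothetical support factor: does the person who receives the
    # education also decide what the child eats?
    educated_person_controls_food: bool


def expected_effect(ctx: Context, effect_if_supported: float = 1.0) -> float:
    """Toy causal model: the programme works only when its support factor holds."""
    return effect_if_supported if ctx.educated_person_controls_food else 0.0


trial_site = Context("India (trial site)", educated_person_controls_food=True)
target_site = Context("Bangladesh (target site)", educated_person_controls_food=False)

for ctx in (trial_site, target_site):
    print(f"{ctx.name}: expected effect {expected_effect(ctx):.1f}")
```

The trial alone reports only the first number. Predicting the second requires knowing the mechanism and whether the support factor is present in the target setting.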

## Policy is not derivable from RCT results

Meta-analyses can rank known interventions by effect size, but **the choice of policy does not follow from facts alone** — a further normative step is required. Assuming otherwise conflates *is* with *ought*.

## Relevance to ML and BPM evaluation

The same critique applies to evaluating predictive/prescriptive process monitoring models on public event-log benchmarks ([[sources/2020-rama-maneiro-deep-learning-ppm-review]] uses 12 public logs as the community standard):

- **External validity** — a model benchmarked on BPI Challenge logs may not generalise to a specific organisation's process.
- **Ecological fallacy** — an aggregate AUC on the test set says nothing about the model's reliability for an individual running case.
- **Under-reporting of negative / null results** — the same publication bias operates in ML as in clinical research.
- **Benchmark contamination and selection bias** — flagged as an open concern in the [[sources/2026-calvanese-agentic-bpm-manifesto|APM Manifesto]] (challenge C3).
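
The aggregate-metric point can be sketched by breaking a test-set score down by case variant. The records below are invented, the variant names `standard` and `rare` are hypothetical, and plain accuracy is used in place of AUC to keep the example free of dependencies.

```python
from collections import defaultdict

# Hypothetical test-set records: (case variant, true label, predicted label).
# The counts are invented to illustrate the point, not taken from any benchmark.
records = (
    [("standard", 1, 1)] * 45
    + [("standard", 0, 0)] * 45
    + [("rare", 1, 0)] * 6   # rare-variant positives the model misses
    + [("rare", 0, 0)] * 4
)


def accuracy(rows):
    """Fraction of rows where the predicted label equals the true label."""
    return sum(y == y_hat for _, y, y_hat in rows) / len(rows)


# Group the records by case variant.
by_variant = defaultdict(list)
for row in records:
    by_variant[row[0]].append(row)

overall = accuracy(records)
per_variant = {variant: accuracy(rows) for variant, rows in by_variant.items()}

print(f"overall: {overall:.2f}")  # 0.94
for variant, acc in per_variant.items():
    print(f"{variant}: {acc:.2f}")  # standard: 1.00, rare: 0.40
```

An aggregate of 0.94 coexists with 0.40 on the rare variant, so the headline figure is a poor guide to how far a prediction for any particular running case can be trusted.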

## Related
[[concepts/causation]] · [[concepts/evidence-hierarchy]] · [[concepts/methodological-pluralism]] · [[concepts/probabilistic-causation]] · [[concepts/dispositionalism]] · [[concepts/philosophical-bias]]