---
title: "PHI403 Lecture 11 — Is More Data Better?"
type: source
tags: [philosophy-of-science, induction, singularism, universalism, cartwright, external-validity]
authors: ["Anjum, Rani Lill", "Rocca, Elena"]
year: 2023
venue: "PHI403 Causation in Science, NMBU"
kind: handout
raw_path: "raw/Philosophy of Science/PHI302 11 Is More Data Better.pdf"
created: 2026-04-20
updated: 2026-04-20
key_claims:
  - "Humean causation is universal (covering-law) and therefore favours quantitative methods and big data."
  - "The problem of induction: any finite sample underdetermines any universal causal claim (the \"marble balls in a bag\" illustration)."
  - "Cartwright & Hardie (Evidence Based Policy): positive results in India did not transfer to Bangladesh because mothers-in-law, not mothers, do the cooking, a failure of external validity."
  - "Dispositionalist singularism favours qualitative methods, case studies, mechanism, and N=1, since each causal set-up is unique."
---

# PHI403 Lecture 11 — Is More Data Better?

The lecture reframes the data-quantity question as an ontological one. Two views of [[concepts/causation|causation]] give opposite answers:

- **Humean universalism**: *C causes E* iff all instances of *C* are followed by *E*. Causation is general and derived from perfect correlations. Hence more data is better, representative sampling is essential, external validity is achievable, and quantitative methods win.
- **Dispositionalist singularism**: each causal set-up has its own set of dispositions and manifestation partners, and causation happens in the single, unique instance. Hence understanding context, mechanism, and individual tendencies matters more than accumulating sample size.

The problem of induction is illustrated with a bag of 1000 marbles: only after examining every marble do we know the bag's colour distribution with certainty. Any conclusion drawn before the whole bag has been examined is inductively fallible.
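
The marble-bag argument can be made concrete with a minimal Python sketch. It assumes that the marbles are inspected in a fixed order and, as a worst case, that a single black marble sits in the last position. The bag contents and the helper names `bag_with_black_at`, `indistinguishable` and `first_distinguishing_sample` are hypothetical illustrations, not taken from the handout. The sketch compares a bag in which "all marbles are white" is true with one in which it is false. It then finds the smallest sample that tells the two bags apart.

```python
from typing import List, Optional

BAG_SIZE = 1000  # the handout's bag of 1000 marbles


def bag_with_black_at(position: Optional[int]) -> List[str]:
    """Hypothetical bag: all white, except one black marble at `position` (None = no black marble)."""
    bag = ["white"] * BAG_SIZE
    if position is not None:
        bag[position] = "black"
    return bag


def indistinguishable(bag_a: List[str], bag_b: List[str], k: int) -> bool:
    """True if inspecting the first k marbles of each bag yields identical observations."""
    return bag_a[:k] == bag_b[:k]


def first_distinguishing_sample(bag_a: List[str], bag_b: List[str]) -> int:
    """Smallest sample size k at which the two bags can be told apart (assumes the bags differ)."""
    return next(k for k in range(BAG_SIZE + 1) if not indistinguishable(bag_a, bag_b, k))


all_white = bag_with_black_at(None)          # 'all marbles are white' is true
one_black = bag_with_black_at(BAG_SIZE - 1)  # the claim is false, but only the last marble shows it

print(first_distinguishing_sample(all_white, one_black))  # 1000
```

If the black marble is placed at position p instead, the bags become distinguishable at sample size p + 1. The inspector cannot know in advance where a counterexample sits, or whether there is one at all. So in this toy model only the full inventory of 1000 marbles settles the universal claim.
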

**Cartwright & Hardie** (*Evidence Based Policy: A Practical Guide to Doing it Better*) supply the key failure case. A nutrition-education programme aimed at mothers worked in India but not in Bangladesh, because **in Bangladesh the mother-in-law cooks the food**. Without local causal and contextual knowledge, even a well-run RCT's results do not transfer. This is the canonical illustration of the **[[concepts/rct-limitations|external-validity problem]]**.

The singularist's answer is that the problem of induction is not about lacking a perfect data set. It is about the ever-present possibility of interference: in any new set-up, some further factor may prevent the expected manifestation.

## Connections

Back-link: [[sources/2023-anjum-rocca-phi403-causation-in-science]].

Concepts: [[concepts/dispositionalism]] · [[concepts/regularity-theory-of-causation]] · [[concepts/rct-limitations]] · [[concepts/causation]].