---
title: "Process Mining: Data Science in Action (2nd ed.)"
type: source
tags: [bpm, process-mining, textbook, canonical, foundational, data-science]
authors: [van der Aalst, Wil]
year: 2016
first_edition: 2011
venue: "Springer, Berlin Heidelberg"
kind: book
raw_path: "raw/Process Frameworks & BPM/Van der Aalst - 2011 - Process Mining.pdf"
raw_path_duplicate: "raw/mixed/Van der Aalst - 2011 - Process Mining.pdf"
isbn: "978-3-662-49850-7"
doi: "10.1007/978-3-662-49851-4"
created: 2026-04-20
updated: 2026-04-20
key_claims:
  - Process mining is the missing link between data science and process science; it sits between data mining/ML and BPM/process modelling.
  - There are three fundamental types of process mining - discovery (log → model), conformance (log + model → deviations), and enhancement (log + model → repaired/extended model).
  - Four orthogonal perspectives enrich every mining task - control-flow, organisational, case, and time.
  - Event logs are the primary substrate; the XES standard defines their structure, and data-quality issues (trust, completeness, granularity) govern achievable mining quality.
  - The α-algorithm is the foundational discovery algorithm but is brittle; mature alternatives include heuristic miner, genetic miner, region-based miners (state-based and language-based), and inductive miner.
  - Conformance checking has matured beyond token replay to alignments (cost-optimal mappings between log and model) and footprint comparison, with quality dimensions fitness, precision, generalisation, simplicity.
  - Operational support extends process mining from offline analysis to online detect / predict / recommend, closing the BPM life-cycle.
  - Processes are not in steady state - concept drift, seasonality, and contextual factors must be handled.
  - Real-world logs fall into two archetypes - Lasagna processes (structured, well-behaved) and Spaghetti processes (unstructured, high variability).
  - Process mining scales to big event data via decomposition (case-based, activity-based), process cubes, and streaming mining.
---

# Process Mining: Data Science in Action (2nd edition, 2016)

The second foundational BPM textbook in this wiki (alongside [[sources/2018-dumas-fundamentals-of-bpm]]). Van der Aalst's single-author treatise, the definitive reference for the **process mining** tradition that he founded. 16 chapters across 6 parts, ~460 pages. First edition appeared in 2011 under the title *Process Mining: Discovery, Conformance and Enhancement of Business Processes*; the 2016 second edition expands it into a data-science framing, introduces process trees and alignments, and adds a full chapter on process mining at scale.

The **raw file in `raw/`** is the 2nd edition (2016): the filename references 2011, but the PDF content is the 2016 text. Both editions are conventionally referenced as "Van der Aalst's Process Mining book".

## Why it matters here

This book is the **data-driven counterpart** to [[sources/2018-dumas-fundamentals-of-bpm|Dumas et al. (2018)]]: where Dumas organises the BPM life-cycle model-first, Van der Aalst organises it log-first. Together they form the operational-frame baseline of this wiki.

It is also the textbook expansion of the [[sources/2012-vanderaalst-process-mining-manifesto|Process Mining Manifesto (2012)]]: same lead author and community, same vocabulary, but expanded from a short position paper to full pedagogy.

It anchors the entire `raw/Predictive process monitoring/` corpus: predictive process monitoring (PPM) extends this book's chapter on operational support (Ch. 10) from offline diagnostics into prospective prediction of running cases.

## Ingest strategy

Meta-ingest per the textbook two-step pattern (see `CLAUDE.md`). This page summarises the whole book and its novel abstractions; chapter-level deep-dives are filed on demand as separate source pages with slug `sources/2011-vanderaalst-process-mining-ch-.md` that back-link here.

## Author

[[entities/wil-van-der-aalst]]: single author; founding figure of process mining.

## Chapter index (coarse resolution)

| Part | Ch | Title | Pages | Key abstractions |
|---|---|---|---|---|
| I Introduction | 1 | Data Science in Action | 3–23 | Internet of Events; data scientist; process science vs data science; Moore's law |
| I | 2 | Process Mining: The Missing Link | 25–52 | Three mining types (discovery/conformance/enhancement); BPM life-cycle; four perspectives (control-flow/org/case/time); Petri-net running example; positioning vs BPM, data mining, Lean Six Sigma, BPR, CEP, BI, GRC, ABPD/BPI/WM, Big Data |
| II Preliminaries | 3 | Process Modeling and Analysis | 55–88 | Transition systems; Petri nets; workflow nets; YAWL; BPMN; EPCs; **causal nets**; **process trees**; model-based verification and performance analysis |
| II | 4 | Data Mining | 89–118 | Instances/variables; supervised (decision trees, regression); unsupervised (k-means, association rules); sequence & episode mining; quality (cross-validation, Occam) |
| III From Logs to Models | 5 | Getting the Data | 125–160 | Data sources; event logs; **XES standard**; **data quality** (trust, completeness, granularity); flattening reality into logs |
| III | 6 | Process Discovery: An Introduction | 163–192 | Problem statement; **α-algorithm**; rediscoverability; representational bias; noise/incompleteness; four quality criteria (fitness, precision, generalisation, simplicity); 2-D slice of 3-D reality |
| III | 7 | Advanced Process Discovery Techniques | 195–239 | **Heuristic miner**; **genetic process mining**; **region-based miners** (state-based, language-based); **inductive miner**; historical perspective |
| IV Beyond Discovery | 8 | Conformance Checking | 243–273 | Business alignment; **token replay**; **alignments**; **footprint comparison**; applications (repair, evaluation, connecting log to model) |
| IV | 9 | Mining Additional Perspectives | 275–298 | Org perspective (social network analysis, role discovery); time/probability; decision mining; bringing it all together |
| IV | 10 | Operational Support | 301–322 | **Refined process mining framework** (cartography, auditing, navigation); online mining; **detect**, **predict**, **recommend**; non-stationarity & **concept drift**; full process mining spectrum |
| V Putting PM to Work | 11 | Process Mining Software | 325–352 | Tool taxonomy; **ProM** (open-source); commercial (Celonis, Disco, myInvenio, …); outlook |
| V | 12 | Process Mining in the Large | 353–385 | Big event data; N=all; case-based & activity-based decomposition; process cubes; streaming mining |
| V | 13 | Analyzing "Lasagna Processes" | 387–409 | Characterisation of structured/well-behaved processes; 5-stage methodology; applications by sector |
| V | 14 | Analyzing "Spaghetti Processes" | 411–429 | Characterisation of unstructured/high-variability processes; fuzzy mining; examples |
| VI Reflection | 15 | Cartography and Navigation | 431–445 | Business process maps; map quality; aggregation/abstraction; seamless zoom; process mining as "TomTom for business processes" |
| VI | 16 | Epilogue | 447–451 | Mining as bridge between data mining and BPM; challenges; actionable next steps |

## Novel abstractions & why they matter

Relative to [[methods/process-mining-basics]] (already seeded from Dumas 2018 and the Manifesto), this book introduces or deepens:

- **Alignments** (§8.3): cost-optimal mapping between log traces and model paths; has largely supplanted token replay as the state-of-the-art conformance technique.
- **Footprint comparison** (§8.4): log-vs-model comparison at the direct-succession-matrix level.
- **Inductive miner** (§7.5): the dominant modern discovery algorithm; guarantees soundness and handles incomplete logs via log splitting.
- **Heuristic miner** (§7.2): robust to noise via dependency thresholds.
- **Genetic process mining** (§7.3) and **region-based miners** (§7.4): alternative discovery families.
- **Process trees** (§3.2.8): hierarchical representation with composition operators; the target language for inductive mining.
- **Causal nets (C-nets)** (§3.2.7): representation combining direct succession with AND/XOR bindings.
- **Four quality criteria** (§6.4.3): fitness, precision, generalisation, simplicity; Van der Aalst's crisp formalisation of the over/under-fitting trade-off.
- **Refined process mining framework** (§10.1): three activities (cartography, auditing, navigation) × two regimes (offline/online) = the **Process Mining Spectrum**.
- **Operational support: detect / predict / recommend** (§10.3–10.5): the runtime side of mining; the conceptual foundation of [[concepts/predictive-process-monitoring|PPM]].
- **Concept drift** (§10.6): non-stationary processes; sub-chapter that grounds later PPM work on drift handling.
- **Lasagna vs Spaghetti processes** (Chs. 13–14): a widely-used shorthand characterising process regularity.
- **Data-quality taxonomy** (§5.4): classification of event-log issues; drives log-preparation practice.
- **Process cubes & streaming mining** (Ch. 12): scaling mining to big event data.

New shared pages created from this meta-ingest:

- [[concepts/conformance-checking]]: covering alignments, token replay, footprints, four quality dimensions.
- [[concepts/operational-support]]: detect/predict/recommend; concept drift.
- [[concepts/lasagna-spaghetti-processes]]: characterising process regularity.
- [[concepts/process-mining-spectrum]]: refined framework (cartography/auditing/navigation × offline/online).

Deferred for chapter deep-dives: inductive miner internals (Ch. 7), alignments math (Ch. 8), full operational-support formalism (Ch. 10), process cubes (Ch. 12).

## Relation to the Process Mining Manifesto (2012)

The book is the pedagogical counterpart to [[sources/2012-vanderaalst-process-mining-manifesto|van der Aalst et al. 2012]]. Same lead author, same IEEE Task Force community, same three-type taxonomy (discovery / conformance / enhancement). The Manifesto is a ~25-page rallying call; this book is the ~460-page textbook grounding. They cross-reference each other as position paper ↔ definitive reference and can be treated as a canonical pair.

## Connections

**Concepts:** [[concepts/business-process]] · [[concepts/bpm-lifecycle]] · [[concepts/process-discovery]] · [[concepts/process-model-quality]] · [[concepts/conformance-checking]] · [[concepts/operational-support]] · [[concepts/lasagna-spaghetti-processes]] · [[concepts/process-mining-spectrum]] · [[concepts/predictive-process-monitoring]] · [[concepts/declarative-process-modelling]]

**Frameworks:** [[frameworks/bpmn]] · [[frameworks/declare]]

**Methods:** [[methods/process-mining-basics]] · [[methods/process-discovery-methods]]

**Entities:** [[entities/wil-van-der-aalst]]

**Related sources:** [[sources/2018-dumas-fundamentals-of-bpm]] (model-first sibling) · [[sources/2012-vanderaalst-process-mining-manifesto]] (manifesto precursor) · [[sources/2021-dumas-process-mining-2-from-insights-to-action]] (next-generation evolution) · the entire `raw/Predictive process monitoring/` corpus builds on Ch. 10.

## Open questions / future deep-dive candidates

- Chapter 7 (advanced discovery): warrants a dedicated ingest for the **inductive miner** (dominant algorithm today).
- Chapter 8 (conformance): warrants a deep-dive on alignment cost functions and their applications.
- Chapter 10 (operational support): the conceptual parent of the whole PPM literature; deserves a deep-dive to ground the PPM synthesis.
- Chapter 12 (process mining in the large): connects to OCEL 2.0 work ([[sources/2023-berti-ocel-2-specification]]).
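
As a starting point for the Chapter 8 deep-dive, the footprint comparison listed under novel abstractions (§8.4) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the book: the function names `footprint` and `footprint_conformance` are hypothetical, the model is represented simply by the set of traces it allows (a real tool would derive the footprint from the model itself), and the score is the share of matrix cells on which the two footprints agree. Treat it as an illustration of the idea, not as the book's exact formulation.

```python
from itertools import product


def footprint(traces, activities):
    """Build a footprint matrix as a dict {(a, b): relation}.

    Relations per ordered pair (a, b):
      "->"  a is directly followed by b, but never b by a
      "<-"  b is directly followed by a, but never a by b
      "||"  both directions occur
      "#"   neither direction occurs
    """
    # All direct-succession pairs observed in the traces.
    follows = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    matrix = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and ba:
            matrix[(a, b)] = "||"
        elif ab:
            matrix[(a, b)] = "->"
        elif ba:
            matrix[(a, b)] = "<-"
        else:
            matrix[(a, b)] = "#"
    return matrix


def footprint_conformance(log_traces, model_traces):
    """Share of footprint cells on which log and model agree (1.0 = identical)."""
    # Use one shared activity set so both matrices have the same cells.
    activities = sorted({a for t in list(log_traces) + list(model_traces) for a in t})
    log_fp = footprint(log_traces, activities)
    model_fp = footprint(model_traces, activities)
    if not log_fp:  # no activities at all: nothing can disagree
        return 1.0
    differing = sum(1 for cell in log_fp if log_fp[cell] != model_fp[cell])
    return 1 - differing / len(log_fp)


# Hypothetical toy data: the model allows b and c in either order between a and d;
# the log additionally contains a trace that skips b and c.
model_traces = [("a", "b", "c", "d"), ("a", "c", "b", "d")]
log_traces = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "d")]

print(footprint_conformance(log_traces, model_traces))  # 0.875
```

In this toy data only the cells (a, d) and (d, a) differ, so 2 of 16 cells disagree. The comparison is cheap because it looks only at direct succession; it says nothing about frequencies or longer-range dependencies, which is part of why alignments are the candidate for the deeper ingest.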