---
title: "Process Mining: Data Science in Action (2nd ed.)"
type: source
tags: [bpm, process-mining, textbook, canonical, foundational, data-science]
authors: [van der Aalst, Wil]
year: 2016
first_edition: 2011
venue: "Springer, Berlin Heidelberg"
kind: book
raw_path: "raw/Process Frameworks & BPM/Van der Aalst - 2011 - Process Mining.pdf"
raw_path_duplicate: "raw/mixed/Van der Aalst - 2011 - Process Mining.pdf"
isbn: "978-3-662-49850-7"
doi: "10.1007/978-3-662-49851-4"
created: 2026-04-20
updated: 2026-04-20
key_claims:
  - Process mining is the missing link between data science and process science; it sits between data mining/ML and BPM/process modelling.
  - There are three fundamental types of process mining - discovery (log → model), conformance (log + model → deviations), and enhancement (log + model → repaired/extended model).
  - Four complementary perspectives enrich every mining task - control-flow, organisational, case, and time.
  - Event logs are the primary substrate; the XES standard defines their structure, and data-quality issues (trust, completeness, granularity) govern achievable mining quality.
  - The α-algorithm is the foundational discovery algorithm but is brittle; mature alternatives include heuristic miner, genetic miner, region-based miners (state-based and language-based), and inductive miner.
  - Conformance checking has matured beyond token replay to alignments (cost-optimal mappings between log and model) and footprint comparison, with quality dimensions fitness, precision, generalisation, simplicity.
  - Operational support extends process mining from offline analysis to online detect / predict / recommend, closing the BPM life-cycle.
  - Processes are not in steady state - concept drift, seasonality, and contextual factors must be handled.
  - Real-world logs fall into two archetypes - Lasagna processes (structured, well-behaved) and Spaghetti processes (unstructured, high variability).
  - Process mining scales to big event data via decomposition (case-based, activity-based), process cubes, and streaming mining.
---

# Process Mining: Data Science in Action (2nd edition, 2016)

The second foundational BPM textbook in this wiki (alongside [[sources/2018-dumas-fundamentals-of-bpm]]). Van der Aalst's single-author treatise — the definitive reference for the **process mining** tradition that he founded. 16 chapters across 6 parts, ~460 pages. First edition appeared in 2011 under the title *Process Mining: Discovery, Conformance and Enhancement of Business Processes*; the 2016 second edition expands it into a data-science framing, introduces process trees and alignments, and adds a full chapter on process mining at scale.

The copy held in `raw/` is the **2nd edition (2016)**, even though its filename references 2011. Both editions are conventionally referenced as "Van der Aalst's Process Mining book".

## Why it matters here

This book is the **data-driven counterpart** to [[sources/2018-dumas-fundamentals-of-bpm|Dumas et al. (2018)]]: where Dumas organises the BPM life-cycle model-first, Van der Aalst organises it log-first. Together they form the operational-frame baseline of this wiki. It is also the textbook expansion of the [[sources/2012-vanderaalst-process-mining-manifesto|Process Mining Manifesto (2012)]]: same lead author and community, same vocabulary, but expanded from a short position paper into full pedagogy.

It anchors the entire `raw/Predictive process monitoring/` corpus: predictive process monitoring (PPM) builds on this book's chapter on operational support (Ch. 10), which moves process mining from offline diagnostics to predictions about running cases.

## Ingest strategy

Meta-ingest per the textbook two-step pattern (see `CLAUDE.md`). This page summarises the whole book and its novel abstractions; chapter-level deep-dives are filed on demand as separate source pages with slug `sources/2011-vanderaalst-process-mining-ch<N>-<topic>.md` back-linking here.

## Author

[[entities/wil-van-der-aalst]] — single author; founding figure of process mining.

## Chapter index (coarse resolution)

| Part | Ch | Title | Pages | Key abstractions |
|---|---|---|---|---|
| I Introduction | 1 | Data Science in Action | 3–23 | Internet of Events; data scientist; process science vs data science; Moore's law |
| I | 2 | Process Mining: The Missing Link | 25–52 | Three mining types (discovery/conformance/enhancement); BPM life-cycle; four perspectives (control-flow/org/case/time); Petri-net running example; positioning vs BPM, data mining, Lean Six Sigma, BPR, CEP, BI, GRC, ABPD/BPI/WM, Big Data |
| II Preliminaries | 3 | Process Modeling and Analysis | 55–88 | Transition systems; Petri nets; workflow nets; YAWL; BPMN; EPCs; **causal nets**; **process trees**; model-based verification and performance analysis |
| II | 4 | Data Mining | 89–118 | Instances/variables; supervised (decision trees, regression); unsupervised (k-means, association rules); sequence & episode mining; quality (cross-validation, Occam) |
| III From Logs to Models | 5 | Getting the Data | 125–160 | Data sources; event logs; **XES standard**; **data quality** (trust, completeness, granularity); flattening reality into logs |
| III | 6 | Process Discovery: An Introduction | 163–192 | Problem statement; **α-algorithm**; rediscoverability; representational bias; noise/incompleteness; four quality criteria (fitness, precision, generalisation, simplicity); 2-D slice of 3-D reality |
| III | 7 | Advanced Process Discovery Techniques | 195–239 | **Heuristic miner**; **genetic process mining**; **region-based miners** (state-based, language-based); **inductive miner**; historical perspective |
| IV Beyond Discovery | 8 | Conformance Checking | 243–273 | Business alignment; **token replay**; **alignments**; **footprint comparison**; applications (repair, evaluation, connecting log to model) |
| IV | 9 | Mining Additional Perspectives | 275–298 | Org perspective (social network analysis, role discovery); time/probability; decision mining; bringing it all together |
| IV | 10 | Operational Support | 301–322 | **Refined process mining framework** (cartography, auditing, navigation); online mining; **detect**, **predict**, **recommend**; non-stationarity & **concept drift**; full process mining spectrum |
| V Putting PM to Work | 11 | Process Mining Software | 325–352 | Tool taxonomy; **ProM** (open-source); commercial (Celonis, Disco, myInvenio, …); outlook |
| V | 12 | Process Mining in the Large | 353–385 | Big event data; N=all; case-based & activity-based decomposition; process cubes; streaming mining |
| V | 13 | Analyzing "Lasagna Processes" | 387–409 | Characterisation of structured/well-behaved processes; 5-stage methodology; applications by sector |
| V | 14 | Analyzing "Spaghetti Processes" | 411–429 | Characterisation of unstructured/high-variability processes; fuzzy mining; examples |
| VI Reflection | 15 | Cartography and Navigation | 431–445 | Business process maps; map quality; aggregation/abstraction; seamless zoom; process mining as "TomTom for business processes" |
| VI | 16 | Epilogue | 447–451 | Mining as bridge between data mining and BPM; challenges; actionable next steps |
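
The log-to-model chapters (5–6) start from the directly-follows relation of an event log, from which the α-algorithm derives a footprint of ordering relations. The following is a minimal sketch of that idea, not the book's implementation: the function name `footprint`, the string symbols used for the relations, and the toy log are illustrative assumptions, and a log is simplified to a list of traces (lists of activity names) with frequencies ignored.

```python
from itertools import product


def footprint(log):
    """Derive ordering relations between activities from a list of traces."""
    activities = sorted({a for trace in log for a in trace})
    # Directly-follows relation: (a, b) is present if b immediately
    # follows a in at least one trace.
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

    matrix = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            matrix[a, b] = "->"  # causality: a is followed by b, never the reverse
        elif ba and not ab:
            matrix[a, b] = "<-"  # reverse causality
        elif ab and ba:
            matrix[a, b] = "||"  # both orders observed: treated as parallel
        else:
            matrix[a, b] = "#"   # never directly follow each other
    return matrix


# Hypothetical toy log with three distinct traces.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
fp = footprint(log)
print(fp["a", "b"], fp["b", "c"], fp["b", "e"])
```

Applying the same function to the traces a model can produce, and comparing the two matrices cell by cell, gives the basic idea behind footprint comparison in Ch. 8.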

## Novel abstractions & why they matter

Relative to [[methods/process-mining-basics]] (already seeded from Dumas 2018 and the Manifesto), this book introduces or deepens:

- **Alignments** (§8.3) — cost-optimal mapping between log traces and model paths; has largely supplanted token replay as the state-of-the-art conformance technique.
- **Footprint comparison** (§8.4) — log-vs-model comparison at the direct-succession-matrix level.
- **Inductive miner** (§7.5) — the dominant modern discovery algorithm; guarantees soundness and handles incomplete logs via log splitting.
- **Heuristic miner** (§7.2) — robust to noise via dependency thresholds.
- **Genetic process mining** (§7.3) and **region-based miners** (§7.4) — alternative discovery families.
- **Process trees** (§3.2.8) — hierarchical representation with composition operators; the target language for inductive mining.
- **Causal nets (C-nets)** (§3.2.7) — representation combining direct succession with AND/XOR bindings.
- **Four quality criteria** (§6.4.3) — fitness, precision, generalisation, simplicity — Van der Aalst's crisp formalisation of the over/under-fitting trade-off.
- **Refined process mining framework** (§10.1) — three categories of activities (cartography, auditing, navigation) × two regimes (offline/online) = the **Process Mining Spectrum**.
- **Operational support: detect / predict / recommend** (§10.3–10.5) — the runtime side of mining; the conceptual foundation of [[concepts/predictive-process-monitoring|PPM]].
- **Concept drift** (§10.6) — non-stationary processes; sub-chapter that grounds later PPM work on drift handling.
- **Lasagna vs Spaghetti processes** (Chs. 13–14) — a widely used shorthand characterising process regularity.
- **Data-quality taxonomy** (§5.4) — classification of event-log issues; drives log-preparation practice.
- **Process cubes & streaming mining** (Ch. 12) — scaling mining to big event data.
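
The heuristic miner's noise robustness, noted in the list above, rests on a dependency measure computed from directly-follows frequencies. The sketch below shows that measure under simplifying assumptions: the helper names, the toy log, and the threshold value `0.6` are hypothetical, and the full algorithm applies further thresholds and constructs a causal net, which is not attempted here.

```python
from collections import Counter


def directly_follows_counts(log):
    """Count how often activity b immediately follows activity a."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[a, b] += 1
    return counts


def dependency(counts, a, b):
    """Dependency measure in (-1, 1); values near 1 suggest a causes b."""
    ab, ba = counts[a, b], counts[b, a]
    if a == b:
        return ab / (ab + 1)           # self-loop case
    return (ab - ba) / (ab + ba + 1)   # asymmetry between the two directions


# Hypothetical log: nine regular traces and one noisy trace.
noisy_log = [["a", "b", "c"]] * 9 + [["a", "c", "b"]]
counts = directly_follows_counts(noisy_log)

THRESHOLD = 0.6  # assumed value, chosen for illustration only
kept = {pair for pair in counts if dependency(counts, *pair) >= THRESHOLD}
print(sorted(kept))
```

With this toy log the infrequent `a` to `c` arc scores 0.5 and falls below the assumed threshold, while the frequent arcs are kept. This is the sense in which dependency thresholds filter noise.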

New shared pages created from this meta-ingest:
- [[concepts/conformance-checking]] — covering alignments, token replay, footprints, four quality dimensions.
- [[concepts/operational-support]] — detect/predict/recommend; concept drift.
- [[concepts/lasagna-spaghetti-processes]] — characterising process regularity.
- [[concepts/process-mining-spectrum]] — refined framework (cartography/auditing/navigation × offline/online).
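
To make the alignment notion behind [[concepts/conformance-checking]] concrete, the sketch below computes the cost of a cheapest alignment between one observed trace and a model. It is a deliberate simplification and an assumption of this page: the model is stood in for by a finite list of allowed traces (`model_language`, hypothetical), every log move or model move costs 1, and synchronous moves cost 0. Practical alignment techniques instead search the state space of the model itself with configurable move costs.

```python
def alignment_cost(trace, model_trace):
    """Cheapest number of non-synchronous moves aligning two sequences."""
    n, m = len(trace), len(model_trace)
    # cost[i][j] = cheapest alignment of trace[:i] with model_trace[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # only log moves remain
    for j in range(1, m + 1):
        cost[0][j] = j  # only model moves remain
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Log move (skip an observed event) or model move (skip a model step).
            best = min(cost[i - 1][j], cost[i][j - 1]) + 1
            if trace[i - 1] == model_trace[j - 1]:
                # Synchronous move: log and model agree, no cost.
                best = min(best, cost[i - 1][j - 1])
            cost[i][j] = best
    return cost[n][m]


def optimal_alignment_cost(trace, model_language):
    """Minimum alignment cost over all traces the model allows."""
    return min(alignment_cost(trace, m) for m in model_language)


# Hypothetical model behaviour, enumerated as a finite set of traces.
model_language = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
print(optimal_alignment_cost(["a", "b", "d"], model_language))
```

A cost of 0 means the trace fits the model perfectly; higher costs count the deviations that a diagnostic view would report as skipped or inserted activities.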

Deferred for chapter deep-dives: inductive miner internals (Ch. 7), alignments math (Ch. 8), full operational-support formalism (Ch. 10), process cubes (Ch. 12).

## Relation to the Process Mining Manifesto (2012)

The book is the pedagogical counterpart to [[sources/2012-vanderaalst-process-mining-manifesto|van der Aalst et al. 2012]]. Same lead author, same IEEE Task Force community, same three-type taxonomy (discovery / conformance / enhancement). The Manifesto is a ~25-page rallying call; this book is the ~460-page textbook grounding. They cross-reference each other as position paper ↔ definitive reference and can be treated as a canonical pair.

## Connections

**Concepts:** [[concepts/business-process]] · [[concepts/bpm-lifecycle]] · [[concepts/process-discovery]] · [[concepts/process-model-quality]] · [[concepts/conformance-checking]] · [[concepts/operational-support]] · [[concepts/lasagna-spaghetti-processes]] · [[concepts/process-mining-spectrum]] · [[concepts/predictive-process-monitoring]] · [[concepts/declarative-process-modelling]]

**Frameworks:** [[frameworks/bpmn]] · [[frameworks/declare]]

**Methods:** [[methods/process-mining-basics]] · [[methods/process-discovery-methods]]

**Entities:** [[entities/wil-van-der-aalst]]

**Related sources:** [[sources/2018-dumas-fundamentals-of-bpm]] (model-first sibling) · [[sources/2012-vanderaalst-process-mining-manifesto]] (manifesto precursor) · [[sources/2021-dumas-process-mining-2-from-insights-to-action]] (next-generation evolution) · the entire `raw/Predictive process monitoring/` corpus builds on Ch. 10.

## Open questions / future deep-dive candidates
- Chapter 7 (advanced discovery) — warrants a dedicated ingest for the **inductive miner** (dominant algorithm today).
- Chapter 8 (conformance) — warrants a deep-dive on alignment cost functions and their applications.
- Chapter 10 (operational support) — the conceptual parent of the whole PPM literature; deserves a deep-dive to ground the PPM synthesis.
- Chapter 12 (process mining in the large) — connects to OCEL 2.0 work ([[sources/2023-berti-ocel-2-specification]]).