---
title: "Clustering-Based Predictive Process Monitoring"
type: source
tags: [ppm, outcome-prediction, clustering, framework]
authors: [Di Francescomarino Chiara; Dumas Marlon; Maggi Fabrizio Maria; Teinemaa Irene]
year: 2016
venue: "IEEE Transactions on Services Computing (approximate; exact venue uncertain from this preprint)"
kind: paper
raw_path: "raw/Predictive process monitoring/clustering based predictive process monitoring.pdf"
created: 2026-04-13
updated: 2026-04-13
key_claims:
  - Training a separate classifier per prefix length at runtime incurs significant overhead that precludes high-throughput use.
  - A two-phase offline framework (cluster prefixes by control flow, then train one classifier per cluster) reduces runtime overhead and retains accuracy.
  - Validated on a log from cancer patient treatment in a large hospital.
---

# Di Francescomarino et al.: Clustering-Based PPM

Framework for **outcome prediction** that addresses the runtime-efficiency problem of prefix-specific classifiers: cluster prefixes by control-flow similarity offline, train one classifier per cluster, then map running cases to clusters at inference time.

## Contribution

- **Offline phase:** cluster prefixes of completed traces (k-means, DBSCAN, model-based variants); train one classifier per cluster.
- **Online phase:** match the running case to its nearest cluster; apply that cluster's classifier.
- **Generalised predicates:** the predicted outcome can be a temporal logic (LTL) constraint, a time constraint, or any boolean function over completed traces.

## Significance

An early framework paper contrasting **universal vs clustered** PPM. Its offline/online decomposition pattern reappears in the Apromore Nirdizati PPM tool ([[sources/2018-verenich-apromore-ppm]]).

## Connections

**Concepts:** [[concepts/outcome-prediction]] · [[concepts/trace-encoding]]

**Authors:** [[entities/chiara-di-francescomarino]] · [[entities/marlon-dumas]] · [[entities/fabrizio-maggi]] · [[entities/irene-teinemaa]]
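
The offline/online decomposition summarised under Contribution can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simple activity-frequency encoding of prefixes, a tiny deterministic k-means, and a majority-label vote standing in for a real per-cluster classifier. All function names, activity names, and the toy log are hypothetical.

```python
from collections import Counter
from math import dist


def encode(prefix, alphabet):
    """Frequency encoding (an assumption): count of each activity in the prefix."""
    counts = Counter(prefix)
    return [counts[a] for a in alphabet]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, k, iters=20):
    """Tiny k-means; the first k distinct points seed the centroids."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def train_offline(traces, labels, alphabet, k, prefix_len):
    """Offline phase: cluster encoded prefixes, then fit one 'classifier' per cluster."""
    points = [encode(t[:prefix_len], alphabet) for t in traces]
    centroids = kmeans(points, k)
    votes = [Counter() for _ in centroids]
    for p, y in zip(points, labels):
        votes[nearest(p, centroids)][y] += 1
    # Majority label per cluster: a stand-in for a real trained classifier.
    classifiers = [v.most_common(1)[0][0] if v else None for v in votes]
    return centroids, classifiers


def predict_online(running_prefix, alphabet, centroids, classifiers):
    """Online phase: map the running case to its nearest cluster, apply that classifier."""
    return classifiers[nearest(encode(running_prefix, alphabet), centroids)]


# Hypothetical toy log; each label says whether the outcome predicate held for the trace.
alphabet = ["register", "scan", "surgery", "chemo", "discharge"]
traces = [
    ["register", "scan", "surgery", "discharge"],
    ["register", "scan", "surgery", "scan"],
    ["register", "chemo", "chemo", "scan"],
    ["register", "chemo", "chemo", "chemo"],
]
labels = [True, True, False, False]

centroids, classifiers = train_offline(traces, labels, alphabet, k=2, prefix_len=3)
prediction = predict_online(["register", "scan", "surgery"], alphabet, centroids, classifiers)
```

All clustering and training happens in `train_offline`, so `predict_online` only encodes the running prefix and performs one nearest-centroid lookup. This is the runtime saving the note attributes to the framework.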