---
title: "Clustering-Based Predictive Process Monitoring"
type: source
tags: [ppm, outcome-prediction, clustering, framework]
authors: [Di Francescomarino Chiara; Dumas Marlon; Maggi Fabrizio Maria; Teinemaa Irene]
year: 2016
venue: "IEEE Transactions on Services Computing (approximate; exact venue uncertain from this preprint)"
kind: paper
raw_path: "raw/Predictive process monitoring/clustering based predictive process monitoring.pdf"
created: 2026-04-13
updated: 2026-04-13
key_claims:
  - Training a separate classifier per prefix length at runtime incurs significant overhead that precludes high-throughput use.
  - A two-phase offline framework — cluster prefixes by control flow, then train one classifier per cluster — reduces runtime overhead and retains accuracy.
  - Validated on a log from cancer patient treatment in a large hospital.
---

# Di Francescomarino et al. — Clustering-Based PPM

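Framework for **outcome prediction**, i.e. estimating the probability that a predicate will be fulfilled once a running case completes. It addresses the runtime-efficiency problem of classifiers trained at runtime for each running prefix: cluster historical prefixes by control-flow similarity offline, train one classifier per cluster on event data attributes, then map each running case to a cluster at prediction time.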
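
The prefix-based setup behind this can be sketched as follows. Every prefix of a completed trace becomes a training example labelled with whether the *full* trace fulfils the predicate, and the prefix's control flow is encoded as a vector that can be clustered. This is a minimal sketch that assumes traces are plain lists of activity names. `response_predicate` (an LTL-style "every A is eventually followed by B" check), `labelled_prefixes` and the activity-count `frequency_encoding` are hypothetical helpers, not the paper's implementation, which also carries event data attributes and may encode control flow differently.

```python
from typing import Callable, List, Sequence, Tuple

Trace = Sequence[str]  # assumption: a trace is just its sequence of activity names


def response_predicate(a: str, b: str) -> Callable[[Trace], bool]:
    """Hypothetical LTL-style predicate G(a -> F b): every `a` is eventually followed by a `b` (assumes a != b)."""
    def holds(trace: Trace) -> bool:
        pending = False  # True while some `a` has not yet been followed by a `b`
        for activity in trace:
            if activity == a:
                pending = True
            elif activity == b:
                pending = False
        return not pending
    return holds


def labelled_prefixes(
    traces: List[Trace], predicate: Callable[[Trace], bool]
) -> List[Tuple[Tuple[str, ...], bool]]:
    """Turn every prefix (including the full trace) into (prefix, outcome of the full trace)."""
    examples = []
    for trace in traces:
        outcome = predicate(trace)  # the label is only known once the trace has completed
        for k in range(1, len(trace) + 1):
            examples.append((tuple(trace[:k]), outcome))
    return examples


def frequency_encoding(prefix: Sequence[str], alphabet: Sequence[str]) -> List[int]:
    """Control-flow vector: how often each activity of the alphabet occurs in the prefix."""
    return [list(prefix).count(act) for act in alphabet]


traces = [["A", "B", "C"], ["A", "C"]]
examples = labelled_prefixes(traces, response_predicate("A", "B"))
print(examples[0], examples[3])  # (('A',), True) (('A',), False)
print(frequency_encoding(("A", "B", "A"), ["A", "B", "C"]))  # [2, 1, 0]
```

The identical prefix `('A',)` appears with both labels, which is why the per-cluster classifiers output a probability of fulfilment rather than a certain answer.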

## Contribution
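- **Offline phase**: cluster the prefixes of completed historical traces by control-flow similarity (e.g. k-means, DBSCAN, model-based clustering); then, for each cluster, train a classifier on the prefixes' event data attributes to discriminate traces that fulfil the predicate from those that violate it.
- **Online phase**: map the running case's prefix to a cluster (e.g. the nearest one) and apply that cluster's classifier to estimate the probability that the predicate will be fulfilled.
- **Predicates**: generalised to any boolean function that can be evaluated over a completed trace, e.g. a temporal logic (LTL) constraint or a time constraint.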
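
The offline/online split can be sketched end to end as below. This is a hedged illustration, not the authors' code: it assumes control flow is already encoded as numeric vectors (e.g. activity counts) and event data as numeric feature vectors, uses plain k-means with a deterministic first-k initialisation for the offline clustering, and swaps in a small k-nearest-neighbour vote as the per-cluster classifier. `ClusteredPredictor`, `kmeans`, `nearest` and `sq_dist` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(centroids: List[Vector], p: Sequence[float]) -> int:
    """Index of the centroid closest to p (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], p))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain Lloyd's k-means, initialised from the first k points (assumption, for determinism)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: Dict[int, List[Vector]] = {i: [] for i in range(k)}
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, members in groups.items():
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ClusteredPredictor:
    """Offline: cluster control-flow vectors and store each cluster's training data.
    Online: map a running prefix to its nearest centroid and query that cluster's classifier
    (here a k-NN vote over event-data vectors, standing in for a trained model such as a decision tree)."""

    def __init__(self, n_clusters: int, n_neighbours: int = 3) -> None:
        self.n_clusters = n_clusters
        self.n_neighbours = n_neighbours
        self.centroids: List[Vector] = []
        self.buckets: Dict[int, List[Tuple[Vector, bool]]] = {}

    def fit(self, flow: List[Vector], data: List[Vector], labels: List[bool]) -> "ClusteredPredictor":
        # Offline step 1: cluster prefixes using control flow only.
        self.centroids = kmeans(flow, self.n_clusters)
        # Offline step 2: route each prefix's data vector and label to its cluster's training set.
        self.buckets = {i: [] for i in range(self.n_clusters)}
        for f, d, y in zip(flow, data, labels):
            self.buckets[nearest(self.centroids, f)].append((d, y))
        return self

    def predict_proba(self, flow: Vector, data: Vector) -> float:
        """Online phase: estimated probability that the predicate will be fulfilled."""
        bucket = self.buckets[nearest(self.centroids, flow)]
        if not bucket:
            return 0.5  # no training prefixes in this cluster: uninformative estimate (assumption)
        ranked = sorted(bucket, key=lambda item: sq_dist(item[0], data))
        top = ranked[: self.n_neighbours]
        return sum(1 for _, y in top if y) / len(top)


flow = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]]  # activity counts (A, B)
data = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]        # one numeric event attribute
labels = [True, False, True, False, False, True]
model = ClusteredPredictor(n_clusters=2, n_neighbours=1).fit(flow, data, labels)
print(model.predict_proba([1, 0], [0.1]))  # 1.0
```

Because clustering and per-cluster training happen once in `fit`, a prediction needs only a nearest-centroid lookup plus a query against a single cluster's data, rather than training a new classifier per running case. Any classifier with a fit/predict interface could replace the k-NN vote.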

## Significance
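An early framework paper contrasting **clustered** PPM (classifiers trained offline, one per control-flow cluster) with approaches that train a classifier at runtime for each running case. Its offline/online decomposition pattern reappears in the Apromore Nirdizati PPM tool ([[sources/2018-verenich-apromore-ppm]]).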

## Connections
**Concepts:** [[concepts/outcome-prediction]] · [[concepts/trace-encoding]]
**Authors:** [[entities/chiara-di-francescomarino]] · [[entities/marlon-dumas]] · [[entities/fabrizio-maggi]] · [[entities/irene-teinemaa]]