---
title: "Predictive Business Process Monitoring with LSTM Neural Networks"
type: source
tags: [ppm, lstm, deep-learning, next-activity, remaining-time, foundational]
authors: [Tax Niek; Verenich Ilya; La Rosa Marcello; Dumas Marlon]
year: 2017
venue: "CAiSE 2017 / arXiv:1612.02130v2"
kind: paper
raw_path: "raw/Predictive process monitoring/tax2017.pdf"
duplicate_of_raw: "raw/Predictive process monitoring/Predictive Business Process Monitoring with LSTM Neural Networks.pdf"
created: 2026-04-13
updated: 2026-04-13
key_claims:
  - A single LSTM architecture can solve next-activity, suffix, and remaining-time prediction, not just one task at a time.
  - LSTMs consistently outperform tailor-made baselines across four real-life event logs.
  - Full-case continuation can be predicted by recursive application of a next-activity LSTM.
  - Remaining cycle time can be framed as a derived quantity of the continuation prediction.
---

# Tax et al. 2017: Predictive Business Process Monitoring with LSTM Neural Networks

Seminal LSTM-PPM paper; published at CAiSE 2017. Often cited as the reference point for neural [[concepts/predictive-process-monitoring|PPM]].

## Contribution

- A **single LSTM architecture** handles multiple prediction tasks via different output heads:
    1. [[concepts/next-activity-prediction|Next activity]] + its timestamp.
    2. **Case continuation (suffix)**: recursive next-activity prediction.
    3. [[concepts/remaining-time-prediction|Remaining cycle time]]: derived from continuation.
- Empirical comparison against tailor-made baselines (transition systems, non-parametric regression, stochastic Petri nets) on four real-life logs. LSTM wins consistently.
- Addresses the "tailor-made and non-generalisable" critique of prior PPM techniques: accuracy of classical methods is highly dataset- and prefix-length-sensitive.

## Method

- Input: one-hot activity + relative-time features (time-since-previous-event, time-in-case) per step.
- Single or stacked LSTM layer.
- Output heads: softmax (next activity) + linear (timestamp).
- Greedy recursive decoding for suffix generation.

## Results (from abstract & intro)

- Outperforms Evermann et al. 2017 ([[sources/2017-evermann-deep-learning-runtime]]) for next-activity prediction.
- Outperforms the annotated transition system of van der Aalst et al. for remaining-time prediction.
- Outperforms Polato et al. on the same remaining-time task.

## Lineage

- **Predecessor:** Evermann et al. 2017, the first application of LSTM to PPM (single task only).
- **Successors:** Camargo et al. 2019 (data-aware LSTM), Navarin et al. 2017 (data-aware remaining time), ProcessTransformer 2021 ([[sources/2021-bukhsh-processtransformer]]).

## Cited by

- [[sources/2023-riess-temporal-loss-remaining-cycle-time]]: Riess uses the Tax et al. LSTM-prefix-log approach as the baseline architecture for his temporal-loss experiments.
- [[sources/2024-riess-synbps-simulation-framework]]: Tax et al.'s qualitative finding about repeated-activity sequences degrading LSTM performance is the motivating example for SynBPS's controlled-variable simulation framework.

## Connections

**Concepts:** [[concepts/predictive-process-monitoring]] · [[concepts/next-activity-prediction]] · [[concepts/remaining-time-prediction]] · [[concepts/lstm-ppm]]

**Authors:** [[entities/niek-tax]] · [[entities/ilya-verenich]] · [[entities/marcello-la-rosa]] · [[entities/marlon-dumas]]

**Related sources (in this wiki):** [[sources/2017-evermann-deep-learning-runtime]], [[sources/2017-navarin-lstm-data-aware-remaining-time]], [[sources/2017-senderovich-intra-inter-case]], [[sources/2021-bukhsh-processtransformer]]
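
## Illustrative sketch

The input encoding and greedy recursive decoding listed under Method can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the activity vocabulary, the `<END>` marker, the helper names `encode_prefix` and `predict_suffix`, and above all `stub_model` (a hand-written stand-in for a trained two-headed LSTM) are assumptions made for the example. Time units are arbitrary.

```python
import numpy as np

# Hypothetical activity vocabulary; "<END>" is an assumed end-of-case marker.
ACTIVITIES = ["register", "review", "approve", "<END>"]
END = ACTIVITIES.index("<END>")


def encode_prefix(activities, timestamps):
    """One row per event: one-hot activity, then
    [time since previous event, time since case start]."""
    x = np.zeros((len(activities), len(ACTIVITIES) + 2))
    for i, (act, t) in enumerate(zip(activities, timestamps)):
        x[i, ACTIVITIES.index(act)] = 1.0
        x[i, -2] = t - timestamps[i - 1] if i > 0 else 0.0
        x[i, -1] = t - timestamps[0]
    return x


def stub_model(x):
    """Stand-in for a trained network (NOT a real LSTM).
    Mimics two output heads: a probability vector over the next
    activity and a scalar time delta to the next event."""
    last = int(np.argmax(x[-1, : len(ACTIVITIES)]))
    nxt = min(last + 1, END)  # toy rule: always move to the next activity
    probs = np.full(len(ACTIVITIES), 0.1 / (len(ACTIVITIES) - 1))
    probs[nxt] = 0.9
    return probs, 2.0  # toy rule: every step takes 2 time units


def predict_suffix(model, activities, timestamps, max_steps=20):
    """Greedy recursive decoding: feed each prediction back as input.
    Remaining time is derived as the sum of predicted time deltas."""
    acts, times = list(activities), list(timestamps)
    suffix = []
    for _ in range(max_steps):  # cap guards against never reaching <END>
        probs, delta = model(encode_prefix(acts, times))
        nxt = int(np.argmax(probs))  # greedy choice of next activity
        if nxt == END:
            break
        acts.append(ACTIVITIES[nxt])
        times.append(times[-1] + max(delta, 0.0))  # clamp negative deltas
        suffix.append(ACTIVITIES[nxt])
    remaining_time = times[-1] - timestamps[-1]
    return suffix, remaining_time


suffix, remaining = predict_suffix(stub_model, ["register"], [0.0])
print(suffix, remaining)  # ['review', 'approve'] 4.0
```

Swapping `stub_model` for a trained network with the same two outputs leaves the decoding loop unchanged. The `max_steps` cap is a design choice of this sketch: greedy decoding can keep repeating an activity without ever emitting the end marker, so some bound on suffix length is needed.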