---
title: "Predictive Business Process Monitoring with LSTM Neural Networks"
type: source
tags: [ppm, lstm, deep-learning, next-activity, remaining-time, foundational]
authors: [Niek Tax, Ilya Verenich, Marcello La Rosa, Marlon Dumas]
year: 2017
venue: "CAiSE 2017 / arXiv:1612.02130v2"
kind: paper
raw_path: "raw/Predictive process monitoring/tax2017.pdf"
duplicate_of_raw: "raw/Predictive process monitoring/Predictive Business Process Monitoring with LSTM Neural Networks.pdf"
created: 2026-04-13
updated: 2026-04-13
key_claims:
  - A single LSTM architecture can solve next-activity, suffix, and remaining-time prediction — not just one task at a time.
  - LSTMs consistently outperform tailor-made baselines across four real-life event logs.
  - Full-case continuation can be predicted by recursive application of a next-activity LSTM.
  - Remaining cycle time can be framed as a derived quantity of the continuation prediction.
---

# Tax et al. 2017 — Predictive Business Process Monitoring with LSTM Neural Networks

Seminal LSTM-PPM paper; published at CAiSE 2017. Often cited as the reference point for neural [[concepts/predictive-process-monitoring|PPM]].

## Contribution
- A **single LSTM architecture** handles multiple prediction tasks via different output heads:
  1. [[concepts/next-activity-prediction|Next activity]] + its timestamp.
  2. **Case continuation (suffix)** — recursive next-activity prediction.
  3. [[concepts/remaining-time-prediction|Remaining cycle time]] — derived from continuation.
- Empirical comparison against tailor-made baselines (transition systems, non-parametric regression, stochastic Petri nets) on four real-life logs. The LSTM outperforms them consistently.
- Addresses the "tailor-made and non-generalisable" critique of prior PPM techniques — accuracy of classical methods is highly dataset- and prefix-length-sensitive.
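
To make the "derived from continuation" idea in item 3 concrete, here is a minimal sketch, assuming the predicted suffix is available as `(activity, delta_seconds)` pairs. The `END` marker, the function name, and the clamping of negative deltas are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

END = "<END>"  # hypothetical end-of-case marker (assumption, not the paper's symbol)


def remaining_time(predicted_suffix: List[Tuple[str, float]]) -> float:
    """Sum the predicted inter-event times of a suffix, stopping at END."""
    total = 0.0
    for activity, delta_seconds in predicted_suffix:
        if activity == END:
            break  # the end marker itself contributes no time
        # Assumption: a regression head can emit negative values, so clamp at zero.
        total += max(delta_seconds, 0.0)
    return total


suffix = [("Check", 3600.0), ("Approve", 7200.0), (END, 0.0), ("ignored", 999.0)]
print(remaining_time(suffix))  # 10800.0
```

Framed this way, remaining-time error inherits any error in the predicted suffix, both in its length and in each step's time delta.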

## Method
- Input: one-hot activity + relative-time features (time-since-previous-event, time-in-case) per step.
- One or more stacked LSTM layers.
- Output heads: softmax (next activity) + linear (timestamp).
- Greedy recursive decoding for suffix generation.
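
The input encoding and greedy decoding steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `encode_prefix`, `greedy_suffix`, `toy_predictor`, and the `END` marker are hypothetical names, the trained LSTM is replaced by a stub callable, and feature normalisation is omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

END = "<END>"  # hypothetical end-of-case marker
Event = Tuple[str, float]  # (activity, timestamp in seconds)
Predictor = Callable[[List[List[float]]], Tuple[Dict[str, float], float]]


def encode_prefix(events: Sequence[Event], vocab: List[str]) -> List[List[float]]:
    """One row per event: one-hot activity + time since previous event + time in case."""
    start = prev = events[0][1]
    rows = []
    for activity, ts in events:
        one_hot = [1.0 if a == activity else 0.0 for a in vocab]
        rows.append(one_hot + [ts - prev, ts - start])
        prev = ts
    return rows


def greedy_suffix(events: Sequence[Event], vocab: List[str],
                  predict_next: Predictor, max_steps: int = 50) -> List[Event]:
    """Repeatedly predict the next event and feed it back in as input."""
    events = list(events)
    suffix: List[Event] = []
    for _ in range(max_steps):  # cap guards against loops of repeated activities
        probs, delta = predict_next(encode_prefix(events, vocab))
        activity = max(probs, key=probs.get)  # greedy: take the most probable activity
        if activity == END:
            break
        ts = events[-1][1] + max(delta, 0.0)  # assumption: clamp negative deltas
        events.append((activity, ts))
        suffix.append((activity, ts))
    return suffix


def toy_predictor(features: List[List[float]]) -> Tuple[Dict[str, float], float]:
    """Stand-in for a trained model: always continues A -> B -> C -> END, 60 s apart."""
    order = ["A", "B", "C", END]
    nxt = order[min(len(features), len(order) - 1)]
    probs = {a: 0.0 for a in order}
    probs[nxt] = 1.0
    return probs, 60.0


vocab = ["A", "B", "C"]
print(encode_prefix([("A", 0.0), ("B", 30.0)], vocab))
print(greedy_suffix([("A", 0.0)], vocab, toy_predictor))
```

Because each predicted event is appended to the prefix before the next call, a single wrong step propagates through the rest of the suffix; the `max_steps` cap is a pragmatic guard added here, not a detail from the paper.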

## Results (from abstract & intro)
- Outperforms Evermann et al. 2017 ([[sources/2017-evermann-deep-learning-runtime]]) for next-activity.
- Outperforms the annotated transition system of van der Aalst et al. for remaining time.
- Outperforms Polato et al., also for remaining time.

## Lineage
- **Predecessor:** Evermann et al. 2017 — first application of LSTM to PPM (single task only).
- **Successors:** Camargo et al. 2019 (data-aware LSTM), Navarin et al. 2017 (data-aware remaining time), ProcessTransformer 2021 ([[sources/2021-bukhsh-processtransformer]]).

## Cited by
- [[sources/2023-riess-temporal-loss-remaining-cycle-time]] — Riess uses the Tax et al. LSTM-prefix-log approach as the baseline architecture for his temporal-loss experiments.
- [[sources/2024-riess-synbps-simulation-framework]] — Tax et al.'s qualitative finding about repeated-activity sequences degrading LSTM performance is the motivating example for SynBPS's controlled-variable simulation framework.

## Connections
**Concepts:** [[concepts/predictive-process-monitoring]] · [[concepts/next-activity-prediction]] · [[concepts/remaining-time-prediction]] · [[concepts/lstm-ppm]]
**Authors:** [[entities/niek-tax]] · [[entities/ilya-verenich]] · [[entities/marcello-la-rosa]] · [[entities/marlon-dumas]]
**Related sources (in this wiki):** [[sources/2017-evermann-deep-learning-runtime]], [[sources/2017-navarin-lstm-data-aware-remaining-time]], [[sources/2017-senderovich-intra-inter-case]], [[sources/2021-bukhsh-processtransformer]]