---
title: LSTM / RNN for PPM
type: concept
tags: [ppm, deep-learning, lstm, rnn]
sources: []
created: 2026-04-13
updated: 2026-04-13
---

# LSTM / RNN for PPM

Recurrent neural network approaches, particularly **LSTM** (Long Short-Term Memory) and **GRU** (Gated Recurrent Unit) networks, became the dominant paradigm in [[concepts/predictive-process-monitoring|PPM]] after Tax et al. (2017).

## Why RNNs for PPM
- Naturally handle **variable-length sequences** (no padding-to-max needed at inference).
- Learn **activity embeddings** jointly with the prediction task.
- Capture **long-range dependencies** better than N-gram Markov models.
- Easily extended to **multi-task learning**, e.g. jointly predicting the next activity and its timestamp.
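The variable-length property follows from the recurrence itself: one cell is applied once per event, carrying a hidden state `h` and cell state `c` forward, so a prefix of any length yields a fixed-size summary. The sketch below is a minimal, untrained LSTM cell in plain Python to show that recurrence; `TinyLSTMCell`, its random initialisation, the omitted gate biases and the forget-gate bias of 1.0 are illustrative assumptions, not any particular paper's model or a library API.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: list[list[float]], v: list[float]) -> list[float]:
    # Dense matrix-vector product on plain Python lists.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


class TinyLSTMCell:
    """Hypothetical, untrained LSTM cell used only to illustrate the recurrence."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        width = input_size + hidden_size  # each gate sees [x_t ; h_{t-1}]

        def init() -> list[list[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                    for _ in range(hidden_size)]

        self.hidden_size = hidden_size
        # Input, forget, output and candidate weights. Gate biases are omitted
        # for brevity, except a forget-gate bias (a common initialisation choice).
        self.w_i, self.w_f, self.w_o, self.w_g = init(), init(), init(), init()
        self.forget_bias = 1.0

    def step(self, x, h, c):
        z = x + h  # list concatenation: [x_t ; h_{t-1}]
        i = [sigmoid(a) for a in matvec(self.w_i, z)]                     # input gate
        f = [sigmoid(a + self.forget_bias) for a in matvec(self.w_f, z)]  # forget gate
        o = [sigmoid(a) for a in matvec(self.w_o, z)]                     # output gate
        g = [math.tanh(a) for a in matvec(self.w_g, z)]                   # candidate
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def encode(self, xs):
        """Run over a prefix of any length; return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in xs:
            h, c = self.step(x, h, c)
        return h


cell = TinyLSTMCell(input_size=3, hidden_size=4)
short_prefix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
long_prefix = short_prefix + [[0.0, 0.0, 1.0]] * 3
print(len(cell.encode(short_prefix)), len(cell.encode(long_prefix)))  # 4 4
```

Both prefixes map to a 4-dimensional state without any padding; batched training in real frameworks usually still pads or packs sequences for efficiency.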

## Canonical architectures

### Single-task LSTM (Tax et al. 2017 — [[sources/2017-tax-lstm-process-prediction]])
- Input: one-hot activity + relative time features per step.
- A single LSTM layer feeds either a softmax output (next activity) or a regression head (remaining time).
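A sketch of this per-step input encoding, assuming a prefix is a list of `(activity, timestamp)` pairs. Tax et al. use time features of roughly this kind (time since the previous event, since case start, since midnight, day of week); the helper name `encode_prefix` and the fixed scaling of durations to days are simplifying assumptions, not the paper's exact normalisation.

```python
from datetime import datetime


def encode_prefix(prefix: list[tuple[str, datetime]], alphabet: list[str],
                  time_scale: float = 86400.0) -> list[list[float]]:
    """Hypothetical encoder: one-hot activity + relative time features per event."""
    index = {a: k for k, a in enumerate(alphabet)}
    start = prefix[0][1]
    prev = start
    rows = []
    for activity, ts in prefix:
        one_hot = [0.0] * len(alphabet)
        one_hot[index[activity]] = 1.0
        since_prev = (ts - prev).total_seconds() / time_scale    # days since last event
        since_start = (ts - start).total_seconds() / time_scale  # days since case start
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        since_midnight = (ts - midnight).total_seconds() / 86400.0  # fraction of the day
        weekday = ts.weekday() / 6.0  # Monday -> 0.0, Sunday -> 1.0
        rows.append(one_hot + [since_prev, since_start, since_midnight, weekday])
        prev = ts
    return rows


rows = encode_prefix(
    [("A", datetime(2024, 1, 1, 9, 0)), ("C", datetime(2024, 1, 1, 21, 0))],
    alphabet=["A", "B", "C"],
)
print(rows[1])  # [0.0, 0.0, 1.0, 0.5, 0.5, 0.875, 0.0]
```

Each row is one LSTM input vector; the prefix becomes a sequence of such rows.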

### Multi-task LSTM
- Shared LSTM trunk with separate heads for next activity and remaining time.
- The auxiliary task acts as a regulariser on the shared representation.
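The shared-trunk idea reduces to two task-specific heads reading the same hidden state, trained on a summed loss. The sketch below assumes linear heads, cross-entropy for next activity and absolute error for remaining time; `multitask_heads`, `joint_loss` and the `time_weight` balancing factor are hypothetical names and choices for illustration.

```python
import math


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def multitask_heads(h, w_act, b_act, w_time, b_time):
    """Two task-specific linear heads on one shared hidden state h."""
    logits = [dot(row, h) + b for row, b in zip(w_act, b_act)]  # next-activity head
    remaining_time = dot(w_time, h) + b_time                     # regression head
    return logits, remaining_time


def log_softmax(logits: list[float], k: int) -> float:
    # Numerically stable log-probability of class k.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[k] - log_z


def joint_loss(logits, true_activity, pred_time, true_time, time_weight=1.0):
    ce = -log_softmax(logits, true_activity)  # next-activity cross-entropy
    mae = abs(pred_time - true_time)          # remaining-time absolute error
    return ce + time_weight * mae


h = [1.0, 2.0]  # stand-in for the shared LSTM output
logits, t = multitask_heads(h, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [0.5, 0.5], 0.0)
loss = joint_loss(logits, true_activity=1, pred_time=t, true_time=2.5)
```

Gradients from both terms flow into the shared trunk, which is where the regularising effect of the auxiliary task comes from.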

### Data-aware LSTM (Navarin et al., Camargo et al., Pasquadibisceglie et al.)
- Event attributes (categorical → embeddings; numerical → normalised) are concatenated with the activity features before being fed to the LSTM.
- Reported to outperform control-flow-only models on remaining-time regression.
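A sketch of the per-event feature construction, assuming one categorical attribute (a hypothetical `resource`) and one numeric attribute (a hypothetical `amount`). In a real model the embedding table is learned jointly with the LSTM; here it is randomly initialised only to show the shapes, and z-score scaling stands in for whatever normalisation a given paper uses.

```python
import random
import statistics


def fit_zscore(values: list[float]) -> tuple[float, float]:
    # Fit mean and standard deviation on training data only.
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mu, sd


class Embedding:
    """Hypothetical lookup table; learned jointly with the model in practice."""

    def __init__(self, vocab: list[str], dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.table = {v: [rng.gauss(0.0, 0.1) for _ in range(dim)] for v in vocab}
        self.unknown = [0.0] * dim  # fallback for values unseen in training

    def __call__(self, value: str) -> list[float]:
        return self.table.get(value, self.unknown)


def event_features(activity_one_hot, resource, amount, emb, amount_stats):
    mu, sd = amount_stats
    # [activity one-hot ; resource embedding ; normalised amount]
    return activity_one_hot + emb(resource) + [(amount - mu) / sd]


emb = Embedding(["r1", "r2"], dim=3)
stats = fit_zscore([10.0, 20.0, 30.0])
x = event_features([1.0, 0.0], "r1", 20.0, emb, stats)  # length 2 + 3 + 1 = 6
```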

### Stacked / bidirectional variants
These offer marginal gains that rarely justify the extra training cost on typical BPM logs.

## Limitations
- Sequential computation over time steps prevents parallelisation within a trace, so training is slower than for Transformers.
- Gradients can still vanish on very long traces; gating mitigates the problem but does not eliminate it.
- Hyperparameter-sensitive: hyperparameter optimisation (e.g. genetic algorithms such as NSGA-II) can substantially improve results (Di Francescomarino et al., Senderovich et al.).
- **Loss-function sensitivity**: unweighted MAE is the standard loss but does not optimise for earliness (accurate predictions on short prefixes); temporally weighted L1 variants (exponential / power / moderate decay) can improve earliness at the cost of temporal consistency ([[sources/2023-riess-temporal-loss-remaining-cycle-time]]).
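A sketch of a temporally weighted L1 loss, where each sample's absolute error is weighted by a decreasing function of its prefix length so that errors on early prefixes cost more. The decay functions below (`exponential_weight`, `power_weight`) and their default parameters are illustrative assumptions, not the exact formulations from Riess (2023); the "moderate" variant is not reproduced.

```python
import math


def exponential_weight(k: int, alpha: float = 0.1) -> float:
    # Hypothetical decay: prefix length k=1 gets weight 1, longer prefixes less.
    return math.exp(-alpha * (k - 1))


def power_weight(k: int, beta: float = 0.5) -> float:
    # Hypothetical decay: w(k) = k^(-beta).
    return k ** (-beta)


def weighted_l1(preds, targets, prefix_lengths, weight_fn):
    """Temporally weighted MAE; weights are normalised to sum to 1."""
    weights = [weight_fn(k) for k in prefix_lengths]
    total = sum(weights)
    return sum(w * abs(p - y) for w, p, y in zip(weights, preds, targets)) / total


# Same absolute errors, but the error on the length-1 prefix dominates.
loss = weighted_l1([1.0, 0.0], [0.0, 0.0], [1, 2],
                   lambda k: exponential_weight(k, alpha=math.log(2)))
```

With weights 1 and 0.5 the loss is 1/1.5 ≈ 0.667, versus 0.5 for plain MAE, so the optimiser is pushed harder towards accuracy on short prefixes.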

## Superseded by
**[[concepts/transformer-ppm]]** (2021+) for long-range dependency modelling and training parallelism, though LSTM remains a competitive baseline.

## Related
[[concepts/predictive-process-monitoring]] · [[concepts/transformer-ppm]] · [[concepts/trace-encoding]]