---
title: LSTM / RNN for PPM
type: concept
tags: [ppm, deep-learning, lstm, rnn]
sources: []
created: 2026-04-13
updated: 2026-04-13
---

# LSTM / RNN for PPM

Recurrent neural network approaches, particularly **LSTM** (Long Short-Term Memory) and **GRU**, became the dominant paradigm in [[concepts/predictive-process-monitoring|PPM]] after Tax et al. 2017.

## Why RNNs for PPM

- Naturally handle **variable-length sequences** (no padding-to-max needed at inference).
- Learn **activity embeddings** jointly with the prediction task.
- Capture **long-range dependencies** better than N-gram Markov models.
- Easily extended to **multi-task learning**: joint next-activity + timestamp prediction.

## Canonical architectures

### Single-task LSTM (Tax et al. 2017, [[sources/2017-tax-lstm-process-prediction]])

- Input: one-hot activity + relative time features per step.
- Single LSTM layer → softmax output for next-activity OR regression head for remaining time.

### Multi-task LSTM

- Shared LSTM trunk, separate heads for next-activity + remaining-time.
- Regularises via auxiliary task.

### Data-aware LSTM (Navarin et al., Camargo, Pasquadibisceglie)

- Event attributes (categorical → embeddings; numerical → normalized) concatenated to activity features before LSTM.
- Outperforms control-flow-only models on remaining-time regression.

### Stacked / bidirectional variants

Marginal gains; rarely justify the training cost on typical BPM logs.

## Limitations

- Sequential training: slow compared to Transformers.
- Vanishing gradients on very long traces.
- Hyperparameter-sensitive; hyperparameter optimisation (NSGA-II, genetic algorithms) non-trivially improves results (Di Francescomarino et al., Senderovich et al.).
- **Loss-function sensitivity**: unweighted MAE is standard but does not optimise for earliness; temporally weighted L1 variants (exponential / power / moderate decay) can improve earliness at the cost of temporal consistency ([[sources/2023-riess-temporal-loss-remaining-cycle-time]]).

## Superseded by

**[[concepts/transformer-ppm]]** (2021+) for long-range dependency modelling and training parallelism, though LSTM remains a competitive baseline.

## Related

[[concepts/predictive-process-monitoring]] · [[concepts/transformer-ppm]] · [[concepts/trace-encoding]]
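
## Sketch: prefix encoding

The per-step input described under the single-task LSTM (one-hot activity plus relative time features) can be sketched in plain Python. This is a minimal illustration, not the implementation from Tax et al.: the function names `encode_prefix` and `training_samples`, the choice of two time features (hours since the previous event and hours since case start), and the toy trace are all assumptions made for the example. It also shows how each prefix yields both a next-activity label and a remaining-time target, which is what the multi-task variant trains on jointly.

```python
from datetime import datetime


def encode_prefix(events, vocab):
    """Encode a trace prefix as one feature vector per event.

    events: list of (activity, timestamp) pairs, ordered by time.
    vocab:  dict mapping activity label -> index (hypothetical helper structure).
    Each vector is: one-hot activity + [hours since previous event,
    hours since case start]. The two time features are an assumption.
    """
    start = events[0][1]
    prev = start
    rows = []
    for activity, ts in events:
        one_hot = [0.0] * len(vocab)
        one_hot[vocab[activity]] = 1.0
        since_prev = (ts - prev).total_seconds() / 3600.0
        since_start = (ts - start).total_seconds() / 3600.0
        rows.append(one_hot + [since_prev, since_start])
        prev = ts
    return rows


def training_samples(trace, vocab):
    """Build (encoded prefix, next activity, remaining hours) for every proper prefix."""
    end = trace[-1][1]
    samples = []
    for k in range(1, len(trace)):
        prefix = trace[:k]
        next_activity = trace[k][0]
        remaining = (end - prefix[-1][1]).total_seconds() / 3600.0
        samples.append((encode_prefix(prefix, vocab), next_activity, remaining))
    return samples


# Toy trace with made-up activities and timestamps.
trace = [
    ("register", datetime(2024, 1, 1, 9, 0)),
    ("review", datetime(2024, 1, 1, 11, 0)),
    ("approve", datetime(2024, 1, 2, 9, 0)),
]
vocab = {"register": 0, "review": 1, "approve": 2}
samples = training_samples(trace, vocab)
```

Prefixes here have different lengths (one and two events), which is the variable-length property noted above; batching them for training would still need padding or packing, a step this sketch leaves out.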