---
title: Transformer / Attention for PPM
type: concept
tags: [ppm, deep-learning, transformer, attention]
sources: []
created: 2026-04-13
updated: 2026-04-13
---

# Transformer / Attention for PPM

Attention-based sequence models adapted from NLP to [[concepts/predictive-process-monitoring|PPM]]. They rose to prominence in 2020–2021 with **ProcessTransformer** (Bukhsh et al. 2021) and attention-augmented LSTMs (2020).

## Why Transformers for PPM

- **Parallel training**: an LSTM consumes a prefix one event at a time, whereas a Transformer processes all positions simultaneously.
- **Long-range dependencies**: self-attention links every pair of events in a single step, so gradients between distant events do not have to flow through a long recurrent chain. The price is `O(n²)` time and memory in the prefix length `n`.
- **Explainability via attention weights**: the weights indicate which past events the model attends to when making the current prediction (with caveats; see Limitations).

## Canonical architectures

### ProcessTransformer (2021)

A Transformer encoder stack over event sequences, with separate heads for next-activity and remaining-time prediction. Competitive with, or better than, LSTM baselines on many benchmarks.

### MTLFormer (2024+)

**Multi-Task Learning guided Transformer**: joint training over next-activity, remaining-time, and outcome prediction, with mechanisms for weighting the tasks.

### Attention-augmented LSTM (2020)

Hybrid: an LSTM backbone with attention over prefix positions. Sometimes competitive with pure Transformers, and cheaper to train on small logs.

## Data-awareness

Data features (attributes, resources, timestamps) are either concatenated to each event token or incorporated via learned embeddings. The architecture is the same as for control-flow-only models; only the input dimensionality changes.

## Limitations / open problems

- **Small-log regime**: BPM event logs are typically small (thousands of cases) compared to NLP corpora, so Transformers may overfit.
- **Positional encoding**: process traces have non-uniform time gaps between events, but classical sinusoidal encoding depends only on an event's index in the trace and cannot reflect those gaps, which makes it suboptimal.
- **Interpretability**: attention weights are not reliable explanations (Jain & Wallace 2019), yet they are still used as such.
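
The positional-encoding limitation can be made concrete with the classical sinusoidal encoding from the original Transformer (Vaswani et al. 2017). The minimal sketch below encodes two hypothetical prefixes with the same activities but very different timestamps. Because only the event index enters the encoding, both prefixes receive identical positional vectors. The traces, `d_model=8`, and the helper names are assumptions for illustration. `elapsed_time_encodings` is a hypothetical variant that feeds elapsed hours into the same sinusoid to show how the gaps could surface; it is not a method described in this note.

```python
import math
from datetime import datetime


def sinusoidal_encoding(position: float, d_model: int) -> list[float]:
    """Classical sinusoidal positional encoding.

    PE[2i]   = sin(position / 10000^(2i / d_model))
    PE[2i+1] = cos(position / 10000^(2i / d_model))
    """
    enc: list[float] = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        enc.append(math.sin(angle))
        if i + 1 < d_model:
            enc.append(math.cos(angle))
    return enc


# Hypothetical prefixes: same activities, very different inter-event gaps.
fast = [("Register", datetime(2024, 1, 1, 9, 0)),
        ("Check", datetime(2024, 1, 1, 9, 5)),
        ("Approve", datetime(2024, 1, 1, 9, 10))]
slow = [("Register", datetime(2024, 1, 1, 9, 0)),
        ("Check", datetime(2024, 1, 8, 9, 0)),
        ("Approve", datetime(2024, 2, 1, 9, 0))]


def index_encodings(trace, d_model: int = 8) -> list[list[float]]:
    # Only the event's index is encoded; the timestamps are ignored.
    return [sinusoidal_encoding(pos, d_model) for pos, _ in enumerate(trace)]


def elapsed_time_encodings(trace, d_model: int = 8) -> list[list[float]]:
    # Hypothetical variant: encode hours elapsed since the first event instead.
    start = trace[0][1]
    return [sinusoidal_encoding((ts - start).total_seconds() / 3600, d_model)
            for _, ts in trace]
```

Here `index_encodings(fast) == index_encodings(slow)` holds even though the second case took a month. The hypothetical elapsed-time variant reuses frequencies designed for integer positions. Elapsed times can span several orders of magnitude, so a real model would likely need scaling or learned frequencies.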
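
The attention-weight reading of "which past events matter" and its interpretability caveat can be illustrated with a minimal single-head scaled dot-product self-attention over a prefix, written in plain Python. This is a sketch under stated assumptions. The one-hot activity embeddings, the identity projection matrices, and the helper names `softmax`, `matvec`, `self_attention`, and `one_hot` are hypothetical. A trained model such as ProcessTransformer learns its projections and typically stacks several heads and layers.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    # Project vector x with weight matrix w (one row per output dimension).
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a prefix.

    x: n event vectors of equal length. Returns (outputs, weights), where
    weights[i][j] is how strongly position i attends to position j.
    All n * n scores are computed, hence the O(n^2) cost.
    """
    q = [matvec(wq, xi) for xi in x]
    k = [matvec(wk, xi) for xi in x]
    v = [matvec(wv, xi) for xi in x]
    scale = math.sqrt(len(k[0]))
    weights, outputs = [], []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        w = softmax(scores)
        weights.append(w)
        # Output i is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * vj[c] for wj, vj in zip(w, v))
                        for c in range(len(v[0]))])
    return outputs, weights


# Hypothetical toy setup: one-hot activity embeddings, identity projections.
ACTIVITIES = ["Register", "Check", "Approve", "Reject"]


def one_hot(activity: str) -> list[float]:
    return [1.0 if activity == a else 0.0 for a in ACTIVITIES]


prefix = ["Register", "Check", "Approve", "Check"]
prefix_vectors = [one_hot(a) for a in prefix]
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
outputs, weights = self_attention(prefix_vectors, identity, identity, identity)
last_row = weights[-1]  # what the latest event attends to
```

In this toy prefix, the last `Check` assigns its largest weights, equal to each other, to both `Check` positions. This happens only because their embeddings are identical under the chosen projections, not because of any causal role. It is a small instance of why attention weights are a weak basis for explanation.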

## Related

[[concepts/predictive-process-monitoring]] · [[concepts/lstm-ppm]] · [[concepts/trace-encoding]]