---
title: Transformer / Attention for PPM
type: concept
tags: [ppm, deep-learning, transformer, attention]
sources: []
created: 2026-04-13
updated: 2026-04-13
---

# Transformer / Attention for PPM

Attention-based sequence models adapted from NLP to [[concepts/predictive-process-monitoring|PPM]]. Rose to prominence in 2020–2021 with **ProcessTransformer** (Bukhsh et al. 2021) and attention-augmented LSTMs (2020).

## Why Transformers for PPM
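- **Parallel training**: unlike an LSTM, which steps through a prefix one event at a time, a Transformer processes all positions of a prefix simultaneously during training.
- **Long-range dependencies**: self-attention connects any two positions in a single step, so gradients do not have to flow back through every intermediate event as in a recurrent model. The price is `O(n²)` time and memory in the prefix length `n`.
- **Explainability via attention weights**: the weights are often read as showing which past events most influence the current prediction (but see the caveat under Limitations).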
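A minimal sketch of single-head scaled dot-product self-attention over one prefix, to make the `O(n²)` weight matrix and the "which past events matter" reading concrete. For brevity it assumes identity projections (`Q = K = V = x`); real encoders, including ProcessTransformer, use learned query/key/value projections and several heads. The event vectors are made-up toy values.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    x: list of n event vectors, each of dimension d.
    Returns (outputs, weights). weights is the n x n attention matrix,
    which is where the O(n^2) time and memory cost comes from.
    """
    n, d = len(x), len(x[0])
    scale = math.sqrt(d)
    weights = []
    for q in x:  # one query per prefix position
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all event vectors.
    outputs = [
        [sum(w[j] * x[j][c] for j in range(n)) for c in range(d)]
        for w in weights
    ]
    return outputs, weights

# Toy prefix of three events embedded in 2 dimensions (illustrative values).
prefix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(prefix)
```

The last row of `attn` is the distribution that attention-based explanations typically inspect for a prediction made at the end of the prefix.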

## Canonical architectures

### ProcessTransformer (2021)
Transformer encoder stack over event sequences, with separate heads for next-activity and remaining-time prediction. Reported to match or exceed LSTM baselines on many benchmark logs.
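Training data for the two prediction heads can be derived from completed traces roughly as follows: every prefix of a trace becomes one example, labelled with the next activity and the remaining time. This is a generic sketch, not ProcessTransformer's own preprocessing; the `<END>` token and measuring remaining time from the last observed event are assumptions.

```python
END = "<END>"  # hypothetical end-of-case token for the next-activity head

def make_examples(trace, timestamps):
    """Turn one completed trace into (prefix, next_activity, remaining_time) examples.

    trace: activity labels in execution order.
    timestamps: event times in seconds, same length as trace.
    """
    end_time = timestamps[-1]
    examples = []
    for k in range(1, len(trace) + 1):
        prefix = trace[:k]
        # Target for the next-activity head; the full trace predicts END.
        next_activity = trace[k] if k < len(trace) else END
        # Target for the remaining-time head, measured from the last observed event.
        remaining_time = end_time - timestamps[k - 1]
        examples.append((prefix, next_activity, remaining_time))
    return examples

examples = make_examples(["register", "check", "approve"], [0, 3600, 7200])
```

Before entering the encoder, prefixes are typically mapped to integer activity ids and padded to a fixed length.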

### MTLFormer (2024+)
**Multi-Task Learning guided Transformer** — joint training over next-activity, remaining-time, and outcome prediction with task-weighting mechanisms.
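To illustrate what a task-weighting mechanism can look like, the sketch below uses learned homoscedastic-uncertainty weighting (Kendall et al. 2018), one common multi-task scheme. It is a swapped-in example, not necessarily the weighting MTLFormer uses; the task names and loss values are illustrative.

```python
import math

def weighted_multitask_loss(task_losses, log_vars):
    """Combine per-task losses with learned uncertainty weights.

    task_losses: dict task -> scalar loss for the current batch.
    log_vars: dict task -> learnable log-variance s_t, trained jointly with the model.
    Each task contributes exp(-s_t) * L_t + s_t: a larger s_t down-weights a noisy
    task, while the + s_t term keeps s_t from growing without bound.
    """
    total = 0.0
    for task, loss in task_losses.items():
        s = log_vars[task]
        total += math.exp(-s) * loss + s
    return total

losses = {"next_activity": 1.2, "remaining_time": 0.8, "outcome": 0.5}
log_vars = {"next_activity": 0.0, "remaining_time": 0.0, "outcome": 0.0}
# With every s_t = 0 the result reduces to the plain sum of the task losses.
total = weighted_multitask_loss(losses, log_vars)
```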

### Attention-augmented LSTM (2020)
Hybrid: LSTM backbone with attention over prefix positions. Sometimes competitive with pure Transformers and cheaper to train on small logs.
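The attention part of such a hybrid can be sketched as attention pooling over the LSTM's per-position hidden states. The LSTM itself is omitted, and dot-product scoring against a learned vector `v` is an assumed, simple choice; published variants differ in the scoring function.

```python
import math

def attention_pool(hidden_states, v):
    """Attention pooling over LSTM outputs.

    hidden_states: list of n hidden vectors h_t, one per prefix position.
    v: learned scoring vector with the same dimension as h_t (hypothetical parameter).
    Returns the context vector sum_t alpha_t * h_t and the weights alpha.
    """
    scores = [sum(vi * hi for vi, hi in zip(v, h)) for h in hidden_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # softmax over prefix positions
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(hidden_states[0])
    context = [sum(a * h[c] for a, h in zip(alphas, hidden_states)) for c in range(dim)]
    return context, alphas

context, alphas = attention_pool([[0.0, 0.0], [1.0, 0.0]], v=[0.0, 0.0])
```

The context vector, often concatenated with the last hidden state, then feeds the prediction head.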

## Data-awareness
Data features (attributes, resources, timestamps) are either concatenated to each event token or incorporated via learned embeddings. The architecture is the same as in the control-flow-only setting; only the input dimensionality changes.
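Token-level concatenation can be sketched as below. The event keys `activity`, `resource`, `gap_s`, the one-hot resource encoding and the clipped time-gap scaling are illustrative assumptions; the embedding values are made up.

```python
def encode_event(event, activity_emb, resources, max_gap_s):
    """Build one input token by concatenating control-flow and data features.

    event: dict with hypothetical keys 'activity', 'resource', 'gap_s'
           (seconds since the previous event in the same case).
    activity_emb: dict activity -> learned embedding (list of floats).
    resources: ordered list of known resources, one-hot encoded.
    max_gap_s: scaling constant for the time gap, assumed fixed from training data.
    """
    emb = activity_emb[event["activity"]]
    one_hot = [1.0 if r == event["resource"] else 0.0 for r in resources]
    gap = [min(event["gap_s"] / max_gap_s, 1.0)]  # scaled and clipped to [0, 1]
    return emb + one_hot + gap

activity_emb = {"register": [0.1, 0.2], "check": [0.3, -0.1]}
token = encode_event(
    {"activity": "check", "resource": "bob", "gap_s": 1800},
    activity_emb, resources=["alice", "bob"], max_gap_s=3600,
)
```

Only the token width changes (here from 2 to 5); the encoder stack after the input projection stays the same.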

## Limitations / open problems
- **Small-log regime**: BPM event logs are typically small (thousands of cases) compared to NLP corpora, so Transformers may overfit.
- **Positional encoding**: process traces have non-uniform inter-event time gaps, but classical sinusoidal encoding depends only on an event's index in the trace and ignores how much time has elapsed between events, which makes it suboptimal.
- **Interpretability**: attention weights are not reliable explanations (Jain & Wallace 2019), yet they are still widely used as such.
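One commonly explored direction, sketched here as an assumption rather than a method from the works cited above: keep the sinusoidal form but evaluate it at elapsed time since case start instead of at the event index. The unit (hours) and `d_model` are illustrative choices.

```python
import math

def sinusoidal_encoding(positions, d_model):
    """Sinusoidal encoding evaluated at arbitrary real-valued positions.

    positions = [0, 1, 2, ...] gives the classical index-based encoding;
    passing elapsed times instead makes events separated by long gaps
    receive clearly different encodings.
    """
    enc = []
    for p in positions:
        row = []
        for i in range(0, d_model, 2):  # i plays the role of 2k in PE(pos, 2k)
            freq = 1.0 / (10000 ** (i / d_model))
            row.append(math.sin(p * freq))
            if i + 1 < d_model:
                row.append(math.cos(p * freq))
        enc.append(row)
    return enc

index_based = sinusoidal_encoding([0, 1, 2], d_model=4)
# Same three events, placed by hours since case start (illustrative timestamps).
time_based = sinusoidal_encoding([0.0, 0.5, 48.0], d_model=4)
```

With index-based positions, a 30-minute gap and a two-day gap are encoded identically; the time-based variant distinguishes them.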

## Related
[[concepts/predictive-process-monitoring]] · [[concepts/lstm-ppm]] · [[concepts/trace-encoding]]