---
title: "LLM-based Predictive Process Monitoring (LLM-PPM)"
type: concept
tags: [ppm, llm, in-context-learning, small-scale-data, llm-bpm, emerging-method-family]
sources:
  - "[[sources/2026-padella-llm-features-ppm]]"
  - "[[sources/2026-theodorakopoulos-bi-bpm-genai-review]]"
created: 2026-05-11
updated: 2026-05-11
---

# LLM-based Predictive Process Monitoring (LLM-PPM)

A method family within [[concepts/predictive-process-monitoring|PPM]] that uses **large language models** to forecast process KPIs (remaining time, outcome, next activity, activity occurrence) directly from natural-language-encoded event-log traces — typically via prompt engineering, in-context examples, and chain-of-thought reasoning rather than supervised model training on event-log features.

## Why it matters

Classical PPM ([[concepts/lstm-ppm|LSTM-PPM]], [[concepts/transformer-ppm|Transformer-PPM]], [[sources/2021-bukhsh-processtransformer|ProcessTransformer]]) requires substantial training data, typically thousands of completed traces, and falters in **data-scarce settings**. The premise of LLM-PPM is that pre-trained foundation models can draw on two qualitatively different forms of knowledge that classical methods cannot exploit:

1. **Embodied prior knowledge** — semantics of activity names, attribute names, and domain context drawn from the LLM's pre-training corpus.
2. **In-context reasoning** — chain-of-thought aggregation of patterns across a small number of training traces, performed at inference time.

[[sources/2026-padella-llm-features-ppm|Padella, de Leoni & Dumas 2026]] report that an LLM (Gemini 2.5 Flash Thinking) given only 100 training traces as in-context examples (≤1.45 % of the available data) matches or surpasses CatBoost and PGTNet trained on the full event log, across three datasets and two KPIs.

## Architectural ingredients

- **Trace-to-string encoding** — e.g. the `ρ_seq` encoding introduced by Padella et al.: global attributes ⊕ (activity, duration) sequence ⊕ target. Deliberately omits local attributes to respect LLM context-length limits and avoid long-context degradation.
- **Modular prompt template** — Padella et al.'s 7-part scheme: instruction header · attribute description · output spec · running-trace format · domain background · examples · prediction request.
- **In-context examples** — the few completed traces serving as both training data and reasoning anchors.
- **Reasoning-aware output format** — predicted value + structured reasoning trace, enabling downstream [[concepts/explainability-apm|explainability]] analysis.
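The encoding, the seven-part template and the reasoning-aware output can be sketched together in Python. This is a minimal illustration under stated assumptions, not Padella et al.'s implementation: the `Event`/`Trace` types, the `[attrs] (activity, duration) -> ... => target` layout, the `?` placeholder for a running trace's unknown target, the prompt wording, and the `PREDICTION:`/`REASONING:` answer fields are all hypothetical choices made for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Event:
    activity: str
    duration_h: float  # hypothetical: elapsed time of this step, in hours


@dataclass
class Trace:
    global_attrs: dict[str, str]  # case-level attributes; local (event-level) ones are omitted
    events: list[Event]
    target: float | None = None  # KPI value, e.g. remaining time; None for a running trace


def encode_trace(trace: Trace) -> str:
    """rho_seq-style string: global attributes + (activity, duration) sequence + target."""
    attrs = ", ".join(f"{k}={v}" for k, v in trace.global_attrs.items())
    seq = " -> ".join(f"({e.activity}, {e.duration_h:g}h)" for e in trace.events)
    target = "?" if trace.target is None else f"{trace.target:g}"
    return f"[{attrs}] {seq} => {target}"


def build_prompt(examples: list[Trace], running: Trace, kpi: str,
                 attr_desc: dict[str, str], domain: str) -> str:
    """Assemble the seven prompt parts in order (wording is illustrative only)."""
    parts = [
        f"You predict the {kpi} of business process cases.",           # 1 instruction header
        "Attributes: " + "; ".join(f"{k}: {v}" for k, v in attr_desc.items()),  # 2 attribute description
        "Answer as 'PREDICTION: <number>' then 'REASONING: <text>'.",   # 3 output spec
        "Traces read: [global attributes] (activity, duration) -> ... => target.",  # 4 trace format
        f"Domain background: {domain}",                                  # 5 domain background
        "Examples:\n" + "\n".join(encode_trace(t) for t in examples),    # 6 in-context examples
        "Predict the target of this running trace:\n" + encode_trace(running),  # 7 prediction request
    ]
    return "\n\n".join(parts)


def parse_answer(text: str) -> tuple[float, str]:
    """Split a reasoning-aware reply into (predicted value, reasoning text)."""
    pred = re.search(r"PREDICTION:\s*(-?\d+(?:\.\d+)?)", text)
    if pred is None:
        raise ValueError("no PREDICTION field in model output")
    reason = re.search(r"REASONING:\s*(.*)", text, flags=re.DOTALL)
    return float(pred.group(1)), (reason.group(1).strip() if reason else "")
```

The string returned by `build_prompt` would be sent to whichever LLM client is in use; `parse_answer` then separates the number used for error metrics such as MAE from the reasoning text retained for explainability analysis.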

## Evidence on what LLM-PPM does internally

The headline findings from [[sources/2026-padella-llm-features-ppm]]:

- **Reliance on embodied knowledge** confirmed by the [[concepts/semantic-hashing-probe|semantic-hashing probe]]: when activity and attribute names are deterministically hashed, MAE degrades by +42 % (BPI12), +71 % (Bac) and +1702 % (Hospital) (Nemenyi post-hoc test, p < 0.01).
- **Higher-order reasoning** beyond pattern replication confirmed by [[concepts/beta-learner-distillation|β-learner distillation]]: the LLM outperforms every individually re-implemented reasoning pattern by 6–80 %.
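The semantic-hashing probe can be sketched as follows. The sketch assumes SHA-256 truncated to eight hex characters with a fixed salt, and hashes activity names and attribute names only (this note does not say whether attribute values were also hashed); `hash_label`, `hash_trace` and the salt are hypothetical. Because the mapping is deterministic, a repeated activity still maps to the same token, so control-flow structure survives while the meaning of the names is removed. The last helper shows the arithmetic behind the reported figures: a +1702 % change means the MAE with hashed names is about 18 times the original MAE.

```python
from __future__ import annotations

import hashlib


def hash_label(label: str, salt: str = "probe", n_hex: int = 8) -> str:
    """Map a name to an opaque but deterministic token (same input -> same token)."""
    digest = hashlib.sha256((salt + label).encode("utf-8")).hexdigest()
    return "x" + digest[:n_hex]


def hash_trace(global_attrs: dict[str, str],
               activities: list[str]) -> tuple[dict[str, str], list[str]]:
    """Hash attribute names and activity names; attribute values are left unchanged here.

    Apply the same function to the in-context examples and the running trace so
    that tokens stay consistent across the whole prompt.
    """
    hashed_attrs = {hash_label(name): value for name, value in global_attrs.items()}
    hashed_acts = [hash_label(a) for a in activities]
    return hashed_attrs, hashed_acts


def relative_mae_change(mae_original: float, mae_hashed: float) -> float:
    """Percentage change in MAE when names are hashed (positive means worse)."""
    return 100.0 * (mae_hashed - mae_original) / mae_original
```

Running the same prompts once with original names and once with hashed names, then comparing the two MAEs with `relative_mae_change`, isolates how much of the accuracy depends on the semantics of the labels rather than on trace structure.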

## Position within the broader literature

[[sources/2026-theodorakopoulos-bi-bpm-genai-review|Theodorakopoulos & Theodoropoulou 2026]] place LLM-PPM within their **Augmentation layer**, arguing that LLM-based prediction *complements* conventional ML rather than replacing it: conventional ML remains preferable for tightly specified prediction tasks, while LLMs provide the interaction and knowledge-mediation surface. Critically, they argue that LLMs are *derivative*, only as strong as the evidentiary base they augment.

The [[sources/2026-calvanese-agentic-bpm-manifesto|APM Manifesto]] mentions PPM as a capability invoked by APM agents in the **Recommend** role — LLM-PPM is a candidate technical realisation of that capability.

## Open problems

- **Cross-LLM sensitivity** — Padella et al. tested only Gemini 2.5 Flash Thinking; performance variance across the Claude, GPT and Llama families is undocumented.
- **Inference cost vs. accuracy** — LLM inference is ~10²–10⁴× more expensive than CatBoost/PGTNet inference; no end-to-end deployment-economics analysis exists.
- **Concept drift** — how do pre-trained LLMs handle process drift? Do their priors decay slowly, or are they brittle? Empirical work is missing.
- **Benchmark contamination** — public event logs (e.g. BPI12) may be in LLM pre-training data; the [[concepts/semantic-hashing-probe|semantic-hashing probe]] is one way to detect this, but systematic protocols are absent. The APM Manifesto flags this as open challenge **C3**.
- **Prompt-template generalisation** — the Padella 7-part scheme appears to transfer, but no formal cross-domain study exists.
- **Prescriptive extension** — Padella et al. flag prescriptive process analytics as future work, joining the broader [[concepts/prescriptive-process-monitoring|PrPM]] thread.

## Related

[[concepts/predictive-process-monitoring]] · [[concepts/trace-encoding]] · [[concepts/explainability-apm]] · [[concepts/agentic-bpm]] · [[concepts/prescriptive-process-monitoring]] · [[concepts/beta-learner-distillation]] · [[concepts/semantic-hashing-probe]] · [[syntheses/llm-bpm-reading-list]] · [[syntheses/ppm-landscape]]