---
title: "LLM-based Predictive Process Monitoring (LLM-PPM)"
type: concept
tags: [ppm, llm, in-context-learning, small-scale-data, llm-bpm, emerging-method-family]
sources:
  - "[[sources/2026-padella-llm-features-ppm]]"
  - "[[sources/2026-theodorakopoulos-bi-bpm-genai-review]]"
created: 2026-05-11
updated: 2026-05-11
---

# LLM-based Predictive Process Monitoring (LLM-PPM)

A method family within [[concepts/predictive-process-monitoring|PPM]] that uses **large language models** to forecast process KPIs (remaining time, outcome, next activity, activity occurrence) directly from event-log traces encoded as natural language. It typically relies on prompt engineering, in-context examples, and chain-of-thought reasoning rather than supervised model training on event-log features.

## Why it matters

Classical PPM ([[concepts/lstm-ppm|LSTM-PPM]], [[concepts/transformer-ppm|Transformer-PPM]], [[sources/2021-bukhsh-processtransformer|ProcessTransformer]]) requires substantial training data (typically thousands of completed traces) and falters in **data-scarce settings**. LLM-PPM proposes that pre-trained foundation models can leverage two qualitatively different forms of knowledge that classical methods cannot:

1. **Embodied prior knowledge**: semantics of activity names, attribute names, and domain context drawn from the LLM's pre-training corpus.
2. **In-context reasoning**: chain-of-thought aggregation of patterns across a small number of training traces, performed at inference time.

[[sources/2026-padella-llm-features-ppm|Padella, de Leoni & Dumas 2026]] document empirically that an LLM (Gemini 2.5 Flash Thinking) supplied with just 100 traces as in-context examples (≤1.45 % of available data) matches or surpasses CatBoost and PGTNet trained on the full event log, across three datasets and two KPIs.

## Architectural ingredients

- **Trace-to-string encoding**: e.g. the `ρ_seq` encoding introduced by Padella et al.: global attributes ⊕ (activity, duration) sequence ⊕ target.
  The encoding deliberately omits local attributes to respect LLM context-length limits and avoid long-context degradation.
- **Modular prompt template**: Padella et al.'s 7-part scheme: instruction header · attribute description · output spec · running-trace format · domain background · examples · prediction request.
- **In-context examples**: the few completed traces serving as both training data and reasoning anchors.
- **Reasoning-aware output format**: predicted value plus a structured reasoning trace, enabling downstream [[concepts/explainability-apm|explainability]] analysis.

## Evidence on what LLM-PPM does internally

The headline findings from [[sources/2026-padella-llm-features-ppm]]:

- **Reliance on embodied knowledge** is confirmed by the [[concepts/semantic-hashing-probe|semantic-hashing probe]]: when activity and attribute names are deterministically hashed, MAE worsens by 42 % (BPI12), 71 % (Bac), and 1702 % (Hospital), with Nemenyi post-hoc p < 0.01.
- **Higher-order reasoning** beyond pattern replication is confirmed by [[concepts/beta-learner-distillation|β-learner distillation]]: the LLM outperforms every individually re-implemented reasoning pattern by 6–80 %.

## Position within the broader literature

[[sources/2026-theodorakopoulos-bi-bpm-genai-review|Theodorakopoulos & Theodoropoulou 2026]] place LLM-PPM within their **Augmentation layer**, arguing that LLM-based prediction is *complementary* to conventional ML, not a replacement: conventional ML remains preferred for tightly-specified prediction, while LLMs offer the interaction and knowledge-mediation surface. Critically, they argue that LLMs are *derivative*: only as strong as the evidentiary base they augment.

The [[sources/2026-calvanese-agentic-bpm-manifesto|APM Manifesto]] mentions PPM as a capability invoked by APM agents in the **Recommend** role; LLM-PPM is a candidate technical realisation of that capability.

## Open problems

- **Cross-LLM sensitivity**: Padella et al.
  tested only Gemini 2.5 Flash Thinking; performance variance across the Claude / GPT / Llama families is undocumented.
- **Inference cost vs. accuracy**: LLM inference is ~10²–10⁴× more expensive than CatBoost/PGTNet inference; no end-to-end deployment-economics analysis exists.
- **Concept drift**: how do pre-trained LLMs handle process drift? Do their priors decay slowly, or are they brittle? Empirical work is missing.
- **Benchmark contamination**: public event logs (BPI12 et al.) may be in LLM pre-training data; the [[concepts/semantic-hashing-probe|semantic-hashing probe]] is one way to detect this, but systematic protocols are absent. Flagged as open challenge **C3** in the APM Manifesto.
- **Prompt-template generalisation**: the Padella 7-part scheme appears to transfer, but no formal cross-domain study exists.
- **Prescriptive extension**: Padella et al. flag prescriptive process analytics as future work, joining the broader [[concepts/prescriptive-process-monitoring|PrPM]] thread.

## Related

[[concepts/predictive-process-monitoring]] · [[concepts/trace-encoding]] · [[concepts/explainability-apm]] · [[concepts/agentic-bpm]] · [[concepts/prescriptive-process-monitoring]] · [[concepts/beta-learner-distillation]] · [[concepts/semantic-hashing-probe]] · [[syntheses/llm-bpm-reading-list]] · [[syntheses/ppm-landscape]]
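
## Illustrative sketch: encoding and prompt assembly

The trace-to-string encoding, the seven-part prompt template, and the name hashing used by the semantic-hashing probe (see Architectural ingredients and Evidence above) can be sketched roughly as follows. This is a minimal illustration, not Padella et al.'s implementation: the `Trace` container, the function names, the concrete string syntax (`[...]`, `->`, `=>`), the prompt wording, and the SHA-256-based token are all assumptions made here. The note itself fixes only the ordering (global attributes, then (activity, duration) pairs, then target), the seven template parts, and the fact that names are hashed deterministically.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trace:
    """Hypothetical container for a completed or running trace."""
    global_attrs: dict[str, str]        # case-level attributes only (no local attributes)
    events: list[tuple[str, float]]     # (activity, duration in hours)
    target: Optional[float] = None      # KPI value; None for a running trace


def hash_name(name: str) -> str:
    """Map a name to an opaque but deterministic token (assumed scheme)."""
    return "x" + hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]


def encode_trace(trace: Trace, hashed: bool = False) -> str:
    """Serialise: global attributes, then (activity, duration) sequence, then target."""
    rename = hash_name if hashed else (lambda s: s)
    attrs = ", ".join(f"{rename(k)}={v}" for k, v in trace.global_attrs.items())
    steps = " -> ".join(f"({rename(a)}, {d:g}h)" for a, d in trace.events)
    target = "?" if trace.target is None else f"{trace.target:g}"
    return f"[{attrs}] {steps} => {target}"


def build_prompt(examples: list[Trace], running: Trace,
                 attr_description: str, background: str,
                 hashed: bool = False) -> str:
    """Assemble the seven parts in the order listed in the note (wording is invented)."""
    parts = [
        "Predict the remaining time of a running process case.",             # 1 instruction header
        f"Attributes: {attr_description}",                                   # 2 attribute description
        "Answer with a number of hours, followed by your reasoning.",        # 3 output spec
        "Format: [global attributes] (activity, duration) -> ... => target",  # 4 running-trace format
        f"Background: {background}",                                         # 5 domain background
        "Examples:\n" + "\n".join(encode_trace(t, hashed) for t in examples),  # 6 examples
        "Predict the target for:\n" + encode_trace(running, hashed),         # 7 prediction request
    ]
    return "\n\n".join(parts)


# Toy data, invented for illustration only.
example = Trace({"loan_type": "car"}, [("Submit", 0.5), ("Review", 2.0)], 30.0)
running = Trace({"loan_type": "home"}, [("Submit", 1.0)])
prompt = build_prompt([example], running,
                      attr_description="loan_type is the kind of loan requested",
                      background="Loan applications at a bank.")
```

Calling `build_prompt(..., hashed=True)` produces the same prompt with activity and attribute names replaced by opaque tokens. Comparing prediction error between the two variants is the idea behind the hashing probe, under the assumption that attribute values are left untouched.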