---
title: "Reading list: LLMs in Business Process Management"
type: synthesis
tags: [reading-list, llm, bpm, apm, abpms, agentic, reference]
sources:
  - "[[sources/2026-calvanese-agentic-bpm-manifesto]]"
  - "[[sources/2023-dumas-ai-augmented-bpms]]"
  - "[[sources/2025-calvanese-autonomy-business-process-execution]]"
  - "[[sources/2025-elyasaf-self-modifying-abps]]"
  - "[[sources/2025-fournier-agentic-ai-process-observability]]"
  - "[[sources/2024-xu-the-agent-company-benchmark]]"
  - "[[sources/2024-kampik-large-process-models]]"
  - "[[sources/2023-chapela-campa-augmented-process-execution]]"
  - "[[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm]]"
  - "[[sources/2024-riess-synbps-simulation-framework]]"
  - "[[sources/2023-qureshi-chatgpt-sr-automation]]"
  - "[[sources/2024-agarwal-litllms-are-we-there-yet]]"
  - "[[sources/2024-dennstaedt-llm-title-abstract-screening]]"
  - "[[sources/2025-scherbakov-llms-as-tools-literature-reviews]]"
  - "[[sources/2025-handa-which-economic-tasks-ai]]"
  - "[[sources/2025-tomlinson-working-with-ai]]"
  - "[[sources/2025-korst-wharton-gen-ai-enterprise-adoption]]"
  - "[[sources/2025-becker-metr-ai-developer-productivity]]"
  - "[[sources/2026-shen-ai-skill-formation]]"
  - "[[sources/2026-shen-anthropic-coding-skills-post]]"
  - "[[sources/2026-licardo-bpmn-assistant]]"
  - "[[sources/2025-varsani-neuro-symbolic-ai-sap-erp]]"
  - "[[sources/2024-kampik-large-process-models-correction]]"
  - "[[sources/2026-padella-llm-features-ppm]]"
  - "[[sources/2026-theodorakopoulos-bi-bpm-genai-review]]"
created: 2026-04-21
updated: 2026-05-11
---

# Reading list: LLMs in Business Process Management

Curated reading list assembled 2026-04-21 from the wiki's ingested and referenced corpus. Scope: **LLMs applied to Business Process Management** — agentic BPM, LLM-agent execution in processes, LLM-assisted discovery/redesign/monitoring, LLM observability and benchmarking in process-heavy domains. Excludes pure-PPM work without an LLM angle (classical LSTM-PPM, DECLARE, rule-based mining) and pure-LLM work without a BPM angle.

Organised into five layers:
- **A. Core LLM+BPM** — ingested sources on the agentic-BPM paradigm itself.
- **B. Referenced-not-ingested** — LLM+BPM citations living inside our ingested pages but not yet processed.
- **C. Adjacent: LLM-assisted research methodology** — LLM+SLR cluster (flagged separately since it is *about reviewing* BPM literature, not *applying LLMs to* processes).
- **D. Context: AI adoption & workforce evidence** — empirical baselines that frame the APM opportunity but are not themselves BPM.
- **E. Entry points & gaps** — wiki hubs to start from, and gaps the corpus acknowledges.

---

## A. Core LLM+BPM (ingested)

### A.1 Agentic BPM paradigm (APM / ABPMS)

- [[sources/2026-calvanese-agentic-bpm-manifesto]] — APM Manifesto (18 authors, arXiv 2026). Defines agentic BPM; four ordered capabilities (framed autonomy → explainability → conversational actionability → self-modification); agents as first-class.
- [[sources/2023-dumas-ai-augmented-bpms]] — ABPMS Manifesto (ACM TMIS 2023). Direct predecessor; four system-level characteristics (adaptable, proactive, explainable, context-sensitive). Read alongside APM for paradigm evolution — see [[syntheses/abpms-to-apm-evolution]].
- [[sources/2025-calvanese-autonomy-business-process-execution]] — Position paper promoting goals + normative frames to first-class BPM abstractions for governing autonomous agents. Operationalises framed autonomy.
- [[sources/2025-elyasaf-self-modifying-abps]] — Self-modifying ABPS via MAPE-K; 5-level SAE-inspired autonomy roadmap; adaptation vs. evolution taxonomy.

### A.2 Agent observability & evaluation

- [[sources/2025-fournier-agentic-ai-process-observability]] — IBM Research. Treats the execution trajectories of LLM agents (CrewAI, LangGraph, AutoGen) as event logs; applies process mining and causal discovery to detect behavioral variability.
- [[sources/2024-xu-the-agent-company-benchmark]] — CMU. 175 professional tasks in a simulated company (SDE, PM, HR, Finance, Admin). Best LLM-agent: 30.3% full success / 39.3% with partial credit. Closest public analogue to BPM-style task execution.
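
As a rough illustration of the "trajectories as event logs" idea in the Fournier entry above, the sketch below groups agent steps by run (the case ID) and counts how many runs share each action sequence. The field layout (`run_id`, step index, action label) and the sample data are assumptions made up for this note, not the paper's schema or any framework's log format.

```python
from collections import Counter, defaultdict

# Hypothetical agent steps: (run_id, step index, action label).
# Names and values are illustrative, not taken from any agent framework.
STEPS = [
    ("run-1", 0, "plan"), ("run-1", 1, "search"), ("run-1", 2, "answer"),
    ("run-2", 0, "plan"), ("run-2", 1, "search"), ("run-2", 2, "answer"),
    ("run-3", 0, "plan"), ("run-3", 1, "search"), ("run-3", 2, "search"),
    ("run-3", 3, "answer"),
]

def to_traces(steps):
    """Group steps by run (the case ID) and order them by step index."""
    by_run = defaultdict(list)
    for run_id, idx, action in steps:
        by_run[run_id].append((idx, action))
    return {run: tuple(a for _, a in sorted(evts)) for run, evts in by_run.items()}

def variant_counts(traces):
    """Count how many runs follow each distinct action sequence (variant)."""
    return Counter(traces.values())

traces = to_traces(STEPS)
variants = variant_counts(traces)
for variant, count in variants.most_common():
    print(count, " -> ".join(variant))
```

Runs that deviate from the dominant variant (here the repeated `search` in `run-3`) are the simplest signal of behavioral variability; the causal-discovery step described in the source is not covered by this sketch.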

### A.3 LLM in process-relevant domains

- [[sources/2026-padella-llm-features-ppm]] — **Flagship LLM-PPM source** (Padella, de Leoni & Dumas 2026, arXiv 2601.11468). Gemini 2.5 Flash Thinking trained on 100 traces matches/surpasses CatBoost + PGTNet trained on full event logs across BPI12 / Bac / Hospital × Total Time + Activity Occurrence. Introduces: ρ_seq trace-to-string encoding · [[concepts/semantic-hashing-probe|semantic-hashing probe]] for embodied-knowledge isolation (Hospital MAE +1702 % under hashing) · [[concepts/beta-learner-distillation|β-learner distillation]] for reasoning interpretability (LLM beats every β-learner by 6–80 %). Future work: prescriptive extension. Anchor for [[concepts/llm-based-ppm]].
- [[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm]] — Telenor / NoDaLiDa. Zero-shot LLM classification over 300 Norwegian customer-service transcripts; instruction-tuned Gemma2 reaches ~60-62% accuracy. Motivation: LLM classification as drift-robust alternative to retraining classifiers — direct bridge to [[concepts/concept-drift]] and PrPM.
- [[sources/2024-riess-synbps-simulation-framework]] — Parametric event-log simulator (NOT an LLM paper itself) — included here because it underpins the [[syntheses/study-sketch-synbps-apm]] programme for controlled agentic-BPM evaluation.
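
To make the semantic-hashing probe in the Padella entry above more concrete, here is a minimal sketch: a trace is serialised to a string, once with readable activity labels and once with the labels replaced by opaque tokens. The `activity@timestamp` format, the `act_` prefix, and the use of SHA-256 are assumptions for illustration; the paper's actual ρ_seq encoding and hashing scheme are not reproduced here.

```python
import hashlib

def encode_trace(events):
    """Serialise a trace as 'activity@timestamp' items joined by ' | '.
    This format is an assumption, not the paper's rho_seq definition."""
    return " | ".join(f"{act}@{ts}" for act, ts in events)

def hash_label(label, length=8):
    """Map an activity label to a short, deterministic opaque token."""
    digest = hashlib.sha256(label.encode("utf-8")).hexdigest()
    return "act_" + digest[:length]

def hash_trace(events):
    """Replace activity names with opaque tokens; keep order and timestamps."""
    return [(hash_label(act), ts) for act, ts in events]

# Hypothetical hospital-style trace: (activity, minutes since case start).
trace = [("Register", 0), ("Triage", 15), ("Discharge", 240)]
plain = encode_trace(trace)
hashed = encode_trace(hash_trace(trace))
print(plain)
print(hashed)
```

Both strings carry the same control-flow and timing information, so a gap in prediction error between prompts built from `plain` and from `hashed` suggests how much the model leans on the meaning of the labels rather than on the log itself.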

### A.4 LLM-driven process modelling

- [[sources/2026-licardo-bpmn-assistant]] — *BPMN Assistant* (Applied Sciences 2026). LLM + hierarchical JSON intermediate representation + 5 atomic editing functions + schema-validation retry loop. Beats direct XML manipulation across all evaluated models (GPT-5.1, Claude 4.5 Sonnet, DeepSeek V3, …); 43% latency reduction, >75% output-token reduction. **First ingested source on direct LLM→BPMN modelling.** Partially fills gap §E.2.1.
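
The schema-validation retry loop mentioned in the BPMN Assistant entry above can be sketched roughly as follows. The JSON shape (a `process` list of elements with `type` and `id`), the function names, and the stand-in `fake_model` are all hypothetical; the paper's actual intermediate representation and editing functions are not reproduced here.

```python
import json

def validate(process):
    """Return a list of problems against a minimal, hypothetical schema."""
    elements = process.get("process")
    if not isinstance(elements, list):
        return ["'process' must be a list of elements"]
    errors = []
    for i, el in enumerate(elements):
        if not isinstance(el, dict):
            errors.append(f"element {i}: must be an object")
            continue
        for key in ("type", "id"):
            if key not in el:
                errors.append(f"element {i}: missing '{key}'")
    return errors

def generate_with_retry(generate, request, max_attempts=3):
    """Call `generate` until its JSON output validates, feeding errors back."""
    feedback = []
    for attempt in range(1, max_attempts + 1):
        raw = generate(request, feedback)
        try:
            process = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(process, dict):
            feedback = ["top level must be an object"]
            continue
        feedback = validate(process)
        if not feedback:
            return process, attempt
    raise ValueError(f"no valid output after {max_attempts} attempts: {feedback}")

def fake_model(request, feedback):
    """Stand-in for an LLM call: fails first, succeeds once it sees feedback."""
    if not feedback:
        return '{"process": [{"type": "task"}]}'
    return '{"process": [{"type": "task", "id": "t1", "label": "Check order"}]}'

result, attempts = generate_with_retry(fake_model, "add a task")
print(attempts, result)
```

Validating a compact JSON structure before any BPMN XML is produced is one plausible reading of why the approach needs fewer output tokens than direct XML manipulation; the sketch only shows the retry mechanics, not the conversion to XML.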

### A.5 Neuro-symbolic BPM vision and instantiations

- [[sources/2024-kampik-large-process-models]] — *Large Process Models: A Vision for BPM in the Age of Generative AI* (Künstliche Intelligenz 2024). LPM = neuro-symbolic stack (process-fine-tuned LLM + knowledge graphs + process atoms + classical BPM tooling). APM Manifesto cites this as the LLM-centric BPM vision against which APM argues that symbolic frames remain essential. **Now fully ingested with §5/§6/§7 detail** (was previously a structural stub).
- [[sources/2024-kampik-large-process-models-correction]] — Springer erratum to Fig. 1 (December 2024); restores missing labels in the LPM architecture diagram.
- [[sources/2025-varsani-neuro-symbolic-ai-sap-erp]] — *Neuro-Symbolic AI in SAP ERP*. Concrete enterprise instantiation of LPM-style architecture: LLM (GPT-4 + LLaMA 2) + middleware (RAG + schema embeddings + RLHF) + SAP HANA + ABAP rule engine. Reports 89.6% query-translation accuracy vs 74.8% LLM-only. Year/venue *unverified*.

### A.6 Referenced-not-ingested

- [[sources/2023-chapela-campa-augmented-process-execution]] — *From Process Mining to Augmented Process Execution*. **Stub**; four-level analytics pyramid; complementary analytics-side lens on what APM now frames as agent-side.

### A.7 Integrative reviews

- [[sources/2026-theodorakopoulos-bi-bpm-genai-review]] — *Business Intelligence and Business Process Management in the Era of Generative AI* (MDPI Applied Sciences 2026, 55 pp). Narrative + conceptual review (NOT PRISMA). Proposes a **5-layer integrative framework** (Data & Computation → Organisational Insight → Process Intelligence → Augmentation → Decision) with cross-layer feedback loops. Positions GenAI explicitly as the Augmentation layer — *not* the new core. Tempers techno-optimism: explainability ≠ effective oversight; trust must be calibrated. 5 limitation categories for GenAI in process-critical environments. Complementary to APM's agent-centric pyramid. Wiki-maintainer caveat: MDPI venue, management-science authors — treat as integrative/teaching reference, not theoretical contribution.

---

## B. Referenced-not-ingested — priority LLM+BPM citations

Sources cited *inside* our ingested pages (APM Manifesto, autonomy paper, Fournier observability, ABPMS Manifesto) that would extend the corpus. Ordered by estimated relevance.

| # | Citation | Cited from | Relevance |
|---|---|---|---|
| 1 | ~~Kampik et al. 2024 — Large Process Models~~ | APM Manifesto | **Now ingested** — see §A.5. |
| 2 | **Acitelli, Alman, Maggi, Marrella 2025** — automated planning for framed-autonomy synthesis | [[sources/2025-calvanese-autonomy-business-process-execution]] | Operationalises normative frames via automated planning. |
| 3 | **Kampik & Okulmus 2024 — SIGNAL** | [[sources/2025-calvanese-autonomy-business-process-execution]] | Agent communication / signalling protocols; relevant to conversational actionability. |
| 4 | **Dong et al. 2024 — AgentOps / trajectory analysis** | [[sources/2025-fournier-agentic-ai-process-observability]] | Foundational to agent-process observability. |
| 5 | **Chopra et al. 2018 — Handbook of Normative Multi-Agent Systems** | [[sources/2025-calvanese-autonomy-business-process-execution]] | Normative-MAS theory underpinning frame design. |
| 6 | **Tamkin et al. 2024 — Clio** | [[sources/2025-handa-which-economic-tasks-ai]] | Privacy-preserving task classification; methodological primitive for agent-trajectory analysis at scale. |
| 7 | **[[sources/2012-vanderaalst-process-mining-manifesto]]** (stub) | APM Manifesto | Foundational manifesto APM positions itself as successor to; read for paradigm continuity. |
| 8 | **[[sources/2021-dumas-process-mining-2-from-insights-to-action]]** | APM Manifesto, PrPM corpus | Process Mining 2.0 keynote; foreshadows APM's Recommend role. |

**Recurring author programmes to sweep:** Kampik, Rebmann, Warmuth, Polyvyanyy, Rinderle-Ma, Lesperance, Marrella. These names appear repeatedly in APM citations without individual works ingested.

### LLM-driven BPMN modelling — referenced via [[sources/2026-licardo-bpmn-assistant]]

The BPMN Assistant paper provides a comprehensive review of concurrent LLM-BPMN tools. Priorities for ingest:

| # | Citation | Relevance |
|---|---|---|
| L1 | **Kourani et al. 2024 — ProMoAI: Process Modeling with Generative AI** (IJCAI) | Constrained Python POWL code as intermediate representation; Claude 3.5 Sonnet achieves 0.93 conformance. |
| L2 | **Kourani et al. 2024 — Evaluating LLMs on Business Process Modeling: Framework, Benchmark, and Self-Improvement** (arXiv:2412.00023) | Benchmarking framework that powers ProMoAI. |
| L3 | **Hörner, Möller, Reichert 2026 — BPMNGen** (BISE) | Comprehensive human-centred evaluation: cognitive load, acceptability, comprehension. Complements BPMN Assistant's structural focus. |
| L4 | **Köpke & Safan 2024 — BPMN-Chatbot** (Business Process Management) | Voice/text interface, 95% correctness, 94% token reduction. |
| L5 | **Klievtsova, Benzin, Kampik, Mangler, Rinderle-Ma 2023 — Conversational Process Modeling** (BPM Forum) | State-of-the-art review of conversational modelling. |
| L6 | **Grohs, Abb, Elsayed, Rehse 2024 — LLMs Can Accomplish Business Process Management Tasks** (BPM Workshops) | Survey of LLM capabilities across BPM tasks. |
| L7 | **Bellan, Dragoni, Ghidini 2023 — Process Extraction from Text: Benchmarking** (arXiv:2110.03754) | The BPMN-extraction benchmark study Licardo et al. cite as evidence the field needs standardised evaluation. |
| L8 | **Rebmann, Schmidt, Glavaš, van der Aa 2024 — Evaluating LLMs to Solve Semantics-Aware Process Mining Tasks** (ICPM) | LLM evaluation on process-mining-specific tasks; complements modelling-focused work. |

---

## C. Adjacent: LLM-assisted research methodology

**Flag:** these are about using LLMs to *review* (BPM) literature — not about applying LLMs *within* business processes. Useful operationally if you want to run an LLM-assisted systematic review of the LLM+BPM frontier itself.

- [[sources/2023-qureshi-chatgpt-sr-automation]] — Early critical commentary. "Uncanny valley": fluent output, unreliable citations.
- [[sources/2024-agarwal-litllms-are-we-there-yet]] — LitLLMs: retrieval (keyword + dual search + re-rank) + plan-then-generate. Reports 18-26% fewer hallucinated citations.
- [[sources/2024-dennstaedt-llm-title-abstract-screening]] — Four-model screening benchmark; Mixtral best-balanced (81.9% sens / 75.2% spec).
- [[sources/2025-scherbakov-llms-as-tools-literature-reviews]] — Meta-SLR of 172 LLM-SR studies; self-demonstrating Covidence + GPT-4o pipeline (83% prec / 86% rec on extraction).
- Hub page: [[concepts/llm-assisted-literature-review]] — stage-by-stage capability map.

---

## D. Context: AI adoption & workforce

Not BPM-direct, but establish the empirical backdrop for any APM claims about organisational deployment, skill impact, and task distribution.

- [[sources/2025-handa-which-economic-tasks-ai]] — 4M+ Claude conversations mapped to O\*NET; 57% augmentation / 43% automation.
- [[sources/2025-tomlinson-working-with-ai]] — Bing Copilot across occupations; user-goal vs. AI-action split.
- [[sources/2025-korst-wharton-gen-ai-enterprise-adoption]] — Wharton-GBK Year-3 survey; 82% weekly use, 72% measure ROI, 43% warn of skill atrophy.
- [[sources/2025-becker-metr-ai-developer-productivity]] — RCT: AI slowed experienced OSS devs 19% despite perceived speed-up. Perception-reality gap.
- [[sources/2026-shen-ai-skill-formation]] — RCT: AI-assisted learners score ~17% lower on post-task competency.
- [[sources/2026-shen-anthropic-coding-skills-post]] — Anthropic companion post.

---

## E. Entry points & gaps

### E.1 Wiki pages to read first (synthesis hubs)

- [[concepts/agentic-bpm]] — paradigm hub.
- [[syntheses/abpms-to-apm-evolution]] — what shifted 2023 → 2026.
- [[syntheses/apm-business-themes]] — APM research agenda mapped to 12 business themes.
- [[concepts/framed-autonomy]], [[concepts/agent-process-observability]], [[concepts/behavioral-variability]], [[concepts/self-modification]], [[concepts/conversational-actionability]], [[concepts/explainability-apm]].
- [[concepts/ai-agent-benchmarks]], [[concepts/ai-adoption]], [[concepts/ai-skill-formation]].
- Study sketches as research-programme exemplars: [[syntheses/study-sketch-synbps-apm]], [[syntheses/study-sketch-temporal-consistency-agents]], [[syntheses/study-sketch-agent-trajectory-drift]].

### E.2 Gaps the corpus acknowledges

The following are flagged in ingested sources as open — candidates for targeted literature search beyond this wiki:

1. **LLM-based process discovery & redesign** — flagged by APM Manifesto (agent-oriented process mining, C1). **Partially filled** by [[sources/2026-licardo-bpmn-assistant]] for the editing/refinement side; LLM-assisted extraction *from event logs* still uncovered.
2. **LLM-based conformance checking & prescriptive monitoring** — **Partially filled 2026-05-11** by [[sources/2026-padella-llm-features-ppm|Padella et al. 2026]] for *predictive* monitoring with LLMs in data-scarce settings. The paper explicitly flags prescriptive extension as future work. LLM-based conformance checking remains uncovered.
3. **Multi-agent coordination / cross-organisational choreography** — APM M3 open question.
4. **Frame elicitation & specification languages** — autonomy paper flags hybrid symbolic/sub-symbolic frame elicitation as a bottleneck; no ingested work operationalises it.
5. **Prompt injection & LLM-agent security in BPM** — APM C2 raises zero-trust architecture for agents as critical; Microsoft NLWeb incident cited but not ingested.
6. **Benchmark contamination risk (C3)** — **Partially addressed 2026-05-11**: [[sources/2026-padella-llm-features-ppm|Padella et al. 2026]]'s [[concepts/semantic-hashing-probe|semantic-hashing probe]] offers a concrete protocol to detect embodied-knowledge leakage from public event logs (BPI12 and others) into LLM predictions. Systematic application across BPM benchmarks remains future work.
7. **APM "killer applications"** — APM Outlook §5 admits these are underexplored; no real-world agentic-BPM deployment cases are ingested.
8. **Process-worker skill formation** — analogue of [[sources/2026-shen-ai-skill-formation]] for business analysts, RPA ops, case managers is missing.
9. **LLM hallucination-mitigation for process state** — no ingested work on preventing agents from fabricating case history or compliance status.
10. **Domain-specific fine-tuning for BPM** — [[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm]] finds domain SFT can hurt; no ingested follow-up.

---

## Reading order (suggested)

1. **Paradigm orientation** — [[syntheses/abpms-to-apm-evolution]], then [[sources/2026-calvanese-agentic-bpm-manifesto]], then [[sources/2023-dumas-ai-augmented-bpms]].
2. **Neuro-symbolic alternative paradigm** — [[sources/2024-kampik-large-process-models]] (LPM vision) + [[concepts/neuro-symbolic-bpm]] hub; read [[sources/2025-varsani-neuro-symbolic-ai-sap-erp]] as concrete instantiation.
3. **Capability deep-dives** — [[sources/2025-calvanese-autonomy-business-process-execution]] (frames), [[sources/2025-fournier-agentic-ai-process-observability]] (observability), [[sources/2025-elyasaf-self-modifying-abps]] (self-modification).
4. **LLM→BPMN modelling** — [[sources/2026-licardo-bpmn-assistant]] (BPMN Assistant) + [[concepts/llm-assisted-process-modelling]] hub.
5. **Evaluation reality check** — [[sources/2024-xu-the-agent-company-benchmark]] (what LLM-agents can actually do) + [[sources/2025-becker-metr-ai-developer-productivity]] (perception-reality gap).
6. **Process-relevant LLM work** — [[sources/2026-padella-llm-features-ppm]] (flagship LLM-PPM source, see §A.3), then [[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm]].
7. **Context layer (optional)** — AI-adoption cluster; read last or skim.
8. **Gap-driven search** — pick from §E.2 based on research focus; or sweep the §B LLM-driven BPMN modelling list (L1-L8).