---
title: "Reading list: LLMs in Business Process Management"
type: synthesis
tags: [reading-list, llm, bpm, apm, abpms, agentic, reference]
sources:
  - "[[sources/2026-calvanese-agentic-bpm-manifesto]]"
  - "[[sources/2023-dumas-ai-augmented-bpms]]"
  - "[[sources/2025-calvanese-autonomy-business-process-execution]]"
  - "[[sources/2025-elyasaf-self-modifying-abps]]"
  - "[[sources/2025-fournier-agentic-ai-process-observability]]"
  - "[[sources/2024-xu-the-agent-company-benchmark]]"
  - "[[sources/2024-kampik-large-process-models]]"
  - "[[sources/2023-chapela-campa-augmented-process-execution]]"
  - "[[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm]]"
  - "[[sources/2024-riess-synbps-simulation-framework]]"
  - "[[sources/2023-qureshi-chatgpt-sr-automation]]"
  - "[[sources/2024-agarwal-litllms-are-we-there-yet]]"
  - "[[sources/2024-dennstaedt-llm-title-abstract-screening]]"
  - "[[sources/2025-scherbakov-llms-as-tools-literature-reviews]]"
  - "[[sources/2025-handa-which-economic-tasks-ai]]"
  - "[[sources/2025-tomlinson-working-with-ai]]"
  - "[[sources/2025-korst-wharton-gen-ai-enterprise-adoption]]"
  - "[[sources/2025-becker-metr-ai-developer-productivity]]"
  - "[[sources/2026-shen-ai-skill-formation]]"
  - "[[sources/2026-shen-anthropic-coding-skills-post]]"
  - "[[sources/2026-licardo-bpmn-assistant]]"
  - "[[sources/2025-varsani-neuro-symbolic-ai-sap-erp]]"
  - "[[sources/2024-kampik-large-process-models-correction]]"
  - "[[sources/2026-padella-llm-features-ppm]]"
  - "[[sources/2026-theodorakopoulos-bi-bpm-genai-review]]"
created: 2026-04-21
updated: 2026-05-11
---

# Reading list: LLMs in Business Process Management

Curated reading list assembled 2026-04-21 from the wiki's ingested and referenced corpus. Scope: **LLMs applied to Business Process Management** — agentic BPM, LLM-agent execution in processes, LLM-assisted discovery/redesign/monitoring, LLM observability and benchmarking in process-heavy domains.
Excludes pure-PPM work without an LLM angle (classical LSTM-PPM, DECLARE, rule-based mining) and pure-LLM work without a BPM angle.

Organised into five layers:

- **A. Core LLM+BPM** — ingested sources on the agentic-BPM paradigm itself.
- **B. Referenced-not-ingested** — LLM+BPM citations living inside our ingested pages but not yet processed.
- **C. Adjacent: LLM-assisted research methodology** — LLM+SLR cluster (flagged separately since it is *about reviewing* BPM literature, not *applying LLMs to* processes).
- **D. Context: AI adoption & workforce evidence** — empirical baselines that frame the APM opportunity but are not themselves BPM.
- **E. Entry points & gaps** — wiki hubs to start from, and gaps the corpus acknowledges.

---

## A. Core LLM+BPM (ingested)

### A.1 Agentic BPM paradigm (APM / ABPMS)

- [[sources/2026-calvanese-agentic-bpm-manifesto]] — APM Manifesto (18 authors, arXiv 2026). Defines agentic BPM; four ordered capabilities (framed autonomy → explainability → conversational actionability → self-modification); agents as first-class.
- [[sources/2023-dumas-ai-augmented-bpms]] — ABPMS Manifesto (ACM TMIS 2023). Direct predecessor; four system-level characteristics (adaptable, proactive, explainable, context-sensitive). Read alongside APM for paradigm evolution — see [[syntheses/abpms-to-apm-evolution]].
- [[sources/2025-calvanese-autonomy-business-process-execution]] — Position paper promoting goals + normative frames to first-class BPM abstractions for governing autonomous agents. Operationalises framed autonomy.
- [[sources/2025-elyasaf-self-modifying-abps]] — Self-modifying ABPS via MAPE-K; 5-level SAE-inspired autonomy roadmap; adaptation vs. evolution taxonomy.

### A.2 Agent observability & evaluation

- [[sources/2025-fournier-agentic-ai-process-observability]] — IBM Research. Treats the execution trajectories of LLM agents (CrewAI, LangGraph, AutoGen) as event logs; process-mining + causal-discovery for behavioral-variability detection.
- [[sources/2024-xu-the-agent-company-benchmark]] — CMU. 175 professional tasks in a simulated company (SDE, PM, HR, Finance, Admin). Best LLM-agent: 30.3% full success / 39.3% with partial credit. Closest public analogue to BPM-style task execution.

### A.3 LLM in process-relevant domains

- [[sources/2026-padella-llm-features-ppm]] — **Flagship LLM-PPM source** (Padella, de Leoni & Dumas 2026, arXiv 2601.11468). Gemini 2.5 Flash Thinking trained on 100 traces matches/surpasses CatBoost + PGTNet trained on full event logs across BPI12 / Bac / Hospital × Total Time + Activity Occurrence. Introduces: ρ_seq trace-to-string encoding · [[concepts/semantic-hashing-probe|semantic-hashing probe]] for embodied-knowledge isolation (Hospital MAE +1702% under hashing) · [[concepts/beta-learner-distillation|β-learner distillation]] for reasoning interpretability (LLM beats every β-learner by 6–80%). Future work: prescriptive extension. Anchor for [[concepts/llm-based-ppm]].
- [[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm]] — Telenor / NoDaLiDa. Zero-shot LLM classification over 300 Norwegian customer-service transcripts; instruction-tuned Gemma2 reaches ~60-62% accuracy. Motivation: LLM classification as drift-robust alternative to retraining classifiers — direct bridge to [[concepts/concept-drift]] and PrPM.
- [[sources/2024-riess-synbps-simulation-framework]] — Parametric event-log simulator (NOT an LLM paper itself) — included here because it underpins the [[syntheses/study-sketch-synbps-apm]] programme for controlled agentic-BPM evaluation.

### A.4 LLM-driven process modelling

- [[sources/2026-licardo-bpmn-assistant]] — *BPMN Assistant* (Applied Sciences 2026). LLM + hierarchical JSON intermediate representation + 5 atomic editing functions + schema-validation retry loop. Beats direct XML manipulation across all evaluated models (GPT-5.1, Claude 4.5 Sonnet, DeepSeek V3, …); 43% latency reduction, >75% output-token reduction. **First ingested source on direct LLM→BPMN modelling.** Partially fills gap §E.2.1.

### A.5 Neuro-symbolic BPM vision and instantiations

- [[sources/2024-kampik-large-process-models]] — *Large Process Models: A Vision for BPM in the Age of Generative AI* (Künstliche Intelligenz 2024). LPM = neuro-symbolic stack (process-fine-tuned LLM + knowledge graphs + process atoms + classical BPM tooling). APM Manifesto cites this as the LLM-centric BPM vision against which APM argues that symbolic frames remain essential. **Now fully ingested with §5/§6/§7 detail** (was previously a structural stub).
- [[sources/2024-kampik-large-process-models-correction]] — Springer erratum to Fig. 1 (December 2024); restores missing labels in the LPM architecture diagram.
- [[sources/2025-varsani-neuro-symbolic-ai-sap-erp]] — *Neuro-Symbolic AI in SAP ERP*. Concrete enterprise instantiation of LPM-style architecture: LLM (GPT-4 + LLaMA 2) + middleware (RAG + schema embeddings + RLHF) + SAP HANA + ABAP rule engine. Reports 89.6% query-translation accuracy vs 74.8% LLM-only. Year/venue *unverified*.

### A.6 Referenced-not-ingested

- [[sources/2023-chapela-campa-augmented-process-execution]] — *From Process Mining to Augmented Process Execution*. **Stub**; four-level analytics pyramid; complementary analytics-side lens on what APM now frames as agent-side.

### A.7 Integrative reviews

- [[sources/2026-theodorakopoulos-bi-bpm-genai-review]] — *Business Intelligence and Business Process Management in the Era of Generative AI* (MDPI Applied Sciences 2026, 55 pp). Narrative + conceptual review (NOT PRISMA). Proposes a **5-layer integrative framework** (Data & Computation → Organisational Insight → Process Intelligence → Augmentation → Decision) with cross-layer feedback loops. Positions GenAI explicitly as the Augmentation layer — *not* the new core. Tempers techno-optimism: explainability ≠ effective oversight; trust must be calibrated.
5 limitation categories for GenAI in process-critical environments. Complementary to APM's agent-centric pyramid. Wiki-maintainer caveat: MDPI venue, management-science authors — treat as integrative/teaching reference, not theoretical contribution.

---

## B. Referenced-not-ingested — priority LLM+BPM citations

Sources cited *inside* our ingested pages (APM Manifesto, autonomy paper, Fournier observability, ABPMS Manifesto) that would extend the corpus. Ordered by estimated relevance.

| # | Citation | Cited from | Relevance |
|---|---|---|---|
| 1 | ~~Kampik et al. 2024 — Large Process Models~~ | APM Manifesto | **Now ingested** — see §A.5. |
| 2 | **Acitelli, Alman, Maggi, Marrella 2025** — automated planning for framed-autonomy synthesis | [[sources/2025-calvanese-autonomy-business-process-execution]] | Operationalises normative frames via automated planning. |
| 3 | **Kampik & Okulmus 2024 — SIGNAL** | [[sources/2025-calvanese-autonomy-business-process-execution]] | Agent communication / signalling protocols; relevant to conversational actionability. |
| 4 | **Dong et al. 2024 — AgentOps / trajectory analysis** | [[sources/2025-fournier-agentic-ai-process-observability]] | Foundational to agent-process observability. |
| 5 | **Chopra et al. 2018 — Handbook of Normative Multi-Agent Systems** | [[sources/2025-calvanese-autonomy-business-process-execution]] | Normative-MAS theory underpinning frame design. |
| 6 | **Tamkin et al. 2024 — Clio** | [[sources/2025-handa-which-economic-tasks-ai]] | Privacy-preserving task classification; methodological primitive for agent-trajectory analysis at scale. |
| 7 | **[[sources/2012-vanderaalst-process-mining-manifesto]]** (stub) | APM Manifesto | Foundational manifesto APM positions itself as successor to; read for paradigm continuity. |
| 8 | **[[sources/2021-dumas-process-mining-2-from-insights-to-action]]** | APM Manifesto, PrPM corpus | Process Mining 2.0 keynote; foreshadows APM's Recommend role. |

**Recurring author programmes to sweep:** Kampik, Rebmann, Warmuth, Polyvyanyy, Rinderle-Ma, Lesperance, Marrella. These names appear repeatedly in APM citations without individual works ingested.

### LLM-driven BPMN modelling — referenced via [[sources/2026-licardo-bpmn-assistant]]

The BPMN Assistant paper provides a comprehensive review of concurrent LLM-BPMN tools. Priorities for ingest:

| # | Citation | Relevance |
|---|---|---|
| L1 | **Kourani et al. 2024 — ProMoAI: Process Modeling with Generative AI** (IJCAI) | Constrained Python POWL code as intermediate representation; Claude 3.5 Sonnet achieves 0.93 conformance. |
| L2 | **Kourani et al. 2024 — Evaluating LLMs on Business Process Modeling: Framework, Benchmark, and Self-Improvement** (arXiv:2412.00023) | Benchmarking framework that powers ProMoAI. |
| L3 | **Hörner, Möller, Reichert 2026 — BPMNGen** (BISE) | Comprehensive human-centred evaluation: cognitive load, acceptability, comprehension. Complements BPMN Assistant's structural focus. |
| L4 | **Köpke & Safan 2024 — BPMN-Chatbot** (Business Process Management) | Voice/text interface, 95% correctness, 94% token reduction. |
| L5 | **Klievtsova, Benzin, Kampik, Mangler, Rinderle-Ma 2023 — Conversational Process Modeling** (BPM Forum) | State-of-the-art review of conversational modelling. |
| L6 | **Grohs, Abb, Elsayed, Rehse 2024 — LLMs Can Accomplish Business Process Management Tasks** (BPM Workshops) | Survey of LLM capabilities across BPM tasks. |
| L7 | **Bellan, Dragoni, Ghidini 2023 — Process Extraction from Text: Benchmarking** (arXiv:2110.03754) | The BPMN-extraction benchmark study Licardo et al. cite as evidence the field needs standardised evaluation. |
| L8 | **Rebmann, Schmidt, Glavaš, van der Aa 2024 — Evaluating LLMs to Solve Semantics-Aware Process Mining Tasks** (ICPM) | LLM evaluation on process-mining-specific tasks; complements modelling-focused work. |

---

## C. Adjacent: LLM-assisted research methodology

**Flag:** these are about using LLMs to *review* (BPM) literature — not about applying LLMs *within* business processes. Useful operationally if you want to run an LLM-assisted systematic review of the LLM+BPM frontier itself.

- [[sources/2023-qureshi-chatgpt-sr-automation]] — Early critical commentary. "Uncanny valley": fluent output, unreliable citations.
- [[sources/2024-agarwal-litllms-are-we-there-yet]] — LitLLMs: retrieval (keyword + dual search + re-rank) + plan-then-generate. 18-26% fewer hallucinated citations.
- [[sources/2024-dennstaedt-llm-title-abstract-screening]] — Four-model screening benchmark; Mixtral best-balanced (81.9% sens / 75.2% spec).
- [[sources/2025-scherbakov-llms-as-tools-literature-reviews]] — Meta-SLR of 172 LLM-SR studies; self-demonstrating Covidence + GPT-4o pipeline (83% prec / 86% rec on extraction).
- Hub page: [[concepts/llm-assisted-literature-review]] — stage-by-stage capability map.

---

## D. Context: AI adoption & workforce

Not BPM-direct, but these sources establish the empirical backdrop for any APM claims about organisational deployment, skill impact, and task distribution.

- [[sources/2025-handa-which-economic-tasks-ai]] — 4M+ Claude conversations mapped to O\*NET; 57% augmentation / 43% automation.
- [[sources/2025-tomlinson-working-with-ai]] — Bing Copilot across occupations; user-goal vs. AI-action split.
- [[sources/2025-korst-wharton-gen-ai-enterprise-adoption]] — Wharton-GBK Year-3 survey; 82% weekly use, 72% measure ROI, 43% warn of skill atrophy.
- [[sources/2025-becker-metr-ai-developer-productivity]] — RCT: AI slowed experienced OSS devs 19% despite perceived speed-up. Perception-reality gap.
- [[sources/2026-shen-ai-skill-formation]] — RCT: AI-assisted learners score ~17% lower on post-task competency.
- [[sources/2026-shen-anthropic-coding-skills-post]] — Anthropic companion post.

---

## E. Entry points & gaps

### E.1 Wiki pages to read first (synthesis hubs)

- [[concepts/agentic-bpm]] — paradigm hub.
- [[syntheses/abpms-to-apm-evolution]] — what shifted 2023 → 2026.
- [[syntheses/apm-business-themes]] — APM research agenda mapped to 12 business themes.
- [[concepts/framed-autonomy]], [[concepts/agent-process-observability]], [[concepts/behavioral-variability]], [[concepts/self-modification]], [[concepts/conversational-actionability]], [[concepts/explainability-apm]].
- [[concepts/ai-agent-benchmarks]], [[concepts/ai-adoption]], [[concepts/ai-skill-formation]].
- Study sketches as research-programme exemplars: [[syntheses/study-sketch-synbps-apm]], [[syntheses/study-sketch-temporal-consistency-agents]], [[syntheses/study-sketch-agent-trajectory-drift]].

### E.2 Gaps the corpus acknowledges

The following are flagged in ingested sources as open — candidates for targeted literature search beyond this wiki:

1. **LLM-based process discovery & redesign** — flagged by APM Manifesto (agent-oriented process mining, C1). **Partially filled** by [[sources/2026-licardo-bpmn-assistant]] for the editing/refinement side; LLM-assisted extraction *from event logs* still uncovered.
2. **LLM-based conformance checking & prescriptive monitoring** — **PARTIALLY FILLED 2026-05-11** by [[sources/2026-padella-llm-features-ppm|Padella et al. 2026]] for *predictive* monitoring with LLMs in data-scarce settings. The paper explicitly flags prescriptive extension as future work. LLM-based conformance checking remains uncovered.
3. **Multi-agent coordination / cross-organisational choreography** — APM M3 open question.
4. **Frame elicitation & specification languages** — autonomy paper flags hybrid symbolic/sub-symbolic frame elicitation as a bottleneck; no ingested work operationalises it.
5. **Prompt injection & LLM-agent security in BPM** — APM C2 raises zero-trust architecture for agents as critical; Microsoft NLWeb incident cited but not ingested.
6. **Benchmark contamination risk (C3)** — **PARTIALLY ADDRESSED 2026-05-11**: [[sources/2026-padella-llm-features-ppm|Padella et al. 2026]]'s [[concepts/semantic-hashing-probe|semantic-hashing probe]] offers a concrete protocol to detect embodied-knowledge leakage from public event logs (BPI12 and others) into LLM predictions. Systematic application across BPM benchmarks remains future work.
7. **APM "killer applications"** — APM Outlook §5 admits these are underexplored; no real-world agentic-BPM deployment cases are ingested.
8. **Process-worker skill formation** — analogue of [[sources/2026-shen-ai-skill-formation]] for business analysts, RPA ops, case managers is missing.
9. **LLM hallucination-mitigation for process state** — no ingested work on preventing agents from fabricating case history or compliance status.
10. **Domain-specific fine-tuning for BPM** — [[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm]] finds domain SFT can hurt; no ingested follow-up.

---

## Reading order (suggested)

1. **Paradigm orientation** — [[syntheses/abpms-to-apm-evolution]], then [[sources/2026-calvanese-agentic-bpm-manifesto]], then [[sources/2023-dumas-ai-augmented-bpms]].
2. **Neuro-symbolic alternative paradigm** — [[sources/2024-kampik-large-process-models]] (LPM vision) + [[concepts/neuro-symbolic-bpm]] hub; read [[sources/2025-varsani-neuro-symbolic-ai-sap-erp]] as concrete instantiation.
3. **Capability deep-dives** — [[sources/2025-calvanese-autonomy-business-process-execution]] (frames), [[sources/2025-fournier-agentic-ai-process-observability]] (observability), [[sources/2025-elyasaf-self-modifying-abps]] (self-modification).
4. **LLM→BPMN modelling** — [[sources/2026-licardo-bpmn-assistant]] (BPMN Assistant) + [[concepts/llm-assisted-process-modelling]] hub.
5. **Evaluation reality check** — [[sources/2024-xu-the-agent-company-benchmark]] (what LLM-agents can actually do) + [[sources/2025-becker-metr-ai-developer-productivity]] (perception-reality gap).
6. **Process-relevant LLM work** — [[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm]].
7. **Context layer (optional)** — AI-adoption cluster; read last or skim.
8. **Gap-driven search** — pick from §E.2 based on research focus; or sweep the §B LLM-driven BPMN modelling list (L1-L8).