---
title: "BPMN Assistant: An LLM-Based Approach to Business Process Modeling"
type: source
tags: [llm, bpm, bpmn, process-modelling, generative-ai, json-intermediate-representation, function-calling, evaluation]
authors: [Licardo Josip Tomo; Tanković Nikola; Etinger Darko]
year: 2026
venue: "Applied Sciences (MDPI), 16, 2213"
kind: paper
raw_path: "raw/ABPS/2026-licardo-bpmn-assistant-llm-process-modelling.pdf"
key_claims:
  - "BPMN editing should be reframed as incremental transformation of mutable structures via a constrained set of atomic operations, not as one-shot regeneration of full BPMN 2.0 XML."
  - "A hierarchical JSON intermediate representation with type/id/label/branches and explicit has_join + next fields lets LLMs manipulate process logic without grappling with BPMN 2.0 XML verbosity."
  - "Five atomic editing functions (delete_element, redirect_branch, add_element, move_element, update_element) suffice to express complex BPMN modifications via composition."
  - "JSON beats raw-XML manipulation in editing tasks across all evaluated LLMs (GPT-5.1, GPT-5 mini, GPT-4o, Claude 3.5/4.5 Sonnet, Gemini 2.0 Flash, Llama 3.3 70B, Qwen 2.5 72B, DeepSeek V3), most dramatically lifting DeepSeek V3 from 8% (XML) to 50% (JSON) success."
  - "Conformance checking via token-replay on simulated logs shows JSON achieves average F1 0.72 vs XML's 0.69; frontier models (GPT-5.1, Claude 4.5 Sonnet) maintain high precision in either format."
  - "JSON reduces editing latency by ~43% and output token count by >75% versus direct XML generation, despite higher input context cost."
  - "A strict JSON-schema validator with self-correction loop (LLM retries on validation failure) is the structural-correctness guardrail that XML baselines lack."
  - "Limitations: no support for pools/lanes (collaboration diagrams), data objects, or multi-pool layout; evaluation on synthetic descriptions rather than real enterprise narratives; no usability study with non-technical users."
created: 2026-05-06
updated: 2026-05-06
sources: []
---

# BPMN Assistant (Licardo, Tanković & Etinger 2026)

Open-source LLM-based system for natural-language **creation and editing** of executable BPMN 2.0 diagrams, from Juraj Dobrila University of Pula. The source code and the evaluation dataset are published online. Published in *Applied Sciences* (MDPI), February 2026.

This is the wiki's **first ingested source on direct LLM→BPMN modelling**, partially closing gap §E.2.1 in [[syntheses/llm-bpm-reading-list]] (LLM-based process discovery & redesign).

## Core idea

BPMN Assistant rejects the dominant "one-shot generation" framing in LLM-driven process modelling and reframes BPMN editing as a **controlled transformation problem** over mutable structures. Instead of asking the LLM to emit BPMN 2.0 XML, the system asks it to manipulate a **hierarchical JSON intermediate representation** through a small fixed set of **atomic editing functions** invoked via function calling. The JSON is then deterministically translated to BPMN 2.0 XML by a separate transformer, with a layout server (Node.js + `bpmn-auto-layout`) appending Diagram Interchange (DI) coordinates without modifying semantic structure.

Conceptually this separates **process logic** from **BPMN syntax**, on the grounds that LLMs reliably reason about logic but produce verbose, error-prone XML when asked to handle both simultaneously.

## System architecture

Three components:

1. **Backend** (Python 3.12 + FastAPI): intent recognition, prompt construction, LLM orchestration, JSON-schema validation with a self-correction retry loop, JSON↔XML translation. Pseudocode (Listing 1, p. 7) shows: parse → conversational vs operational intent → up to K LLM retries against schema → layout pass → return enriched XML.
2. **Layout server** (Node.js + Express + `bpmn-auto-layout`): appends DI coordinates to the BPMN XML produced by the backend. Limitation: cannot reliably lay out collaboration / multi-pool diagrams.
3. **Frontend** (Vue.js + bpmn-js): dual-panel UI with chat on the left and BPMN canvas on the right; users select among supported LLM backends.

### JSON representation

The intermediate format encodes processes as a `process` array of typed elements. Tasks (`task`, `userTask`, `serviceTask`, `sendTask`, `receiveTask`, `businessRuleTask`, `manualTask`, `scriptTask`) and events (`startEvent`, `endEvent`, `intermediateCatchEvent`, `intermediateThrowEvent` with optional `eventDefinition` for timer/message) are flat objects with `type`/`id`/`label`. Gateways are nested:

- **`exclusiveGateway` / `inclusiveGateway`**: `branches: [{condition, path: [...]}]`. The boolean `has_join` flag declares whether branches reconverge (triggers a converging gateway in XML); the optional `next` field lets a branch jump to any element ID, enabling cyclic flows that break strict hierarchy.
- **`parallelGateway`**: `branches: [[...], [...]]` where each sub-array is a concurrent path; a converging parallel gateway is generated implicitly when the parent flow continues.

This nested structure makes the LLM's editing task a tree-rewrite problem with a small number of escape hatches (`next`, `has_join`, `is_default`).

### Five atomic editing functions

| Function | Parameters | Effect |
|---|---|---|
| `delete_element` | `element_id` | Remove element; surrounding flow is auto-reconnected |
| `redirect_branch` | `branch_condition`, `next_id` | Redirect a gateway branch to a different target |
| `add_element` | `element`, `before_id` *or* `after_id` | Insert at specified position |
| `move_element` | `element_id`, `before_id` *or* `after_id` | Relocate without altering content |
| `update_element` | `new_element` | Mutate properties (label, event-definition, etc.) |

Complex transformations are decomposed into compositions of these atoms: for example, converting a sequential flow to a gateway structure becomes "delete + add". This granularity keeps each LLM decision small and individually checkable.
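The representation and two of the editing functions can be illustrated with a minimal sketch. This is not the authors' implementation: the field names (`type`, `id`, `label`, `branches`, `condition`, `path`, `has_join`) and the function parameters follow the description above, while the concrete process, its IDs and labels, and the helpers `_paths` and `_locate` are hypothetical. The sketch also assumes that "auto-reconnected" can be modelled by list adjacency, since removing an item from a path makes its neighbours adjacent.

```python
import copy

# Hypothetical process in the intermediate representation; IDs and labels are invented.
process = [
    {"type": "startEvent", "id": "start1", "label": "Order received"},
    {"type": "task", "id": "task1", "label": "Check stock"},
    {
        "type": "exclusiveGateway",
        "id": "gw1",
        "label": "In stock?",
        "has_join": True,
        "branches": [
            {"condition": "yes", "path": [
                {"type": "task", "id": "task2", "label": "Ship order"}]},
            {"condition": "no", "path": [
                {"type": "task", "id": "task3", "label": "Reorder item"}]},
        ],
    },
    {"type": "endEvent", "id": "end1", "label": "Done"},
]


def _paths(elements):
    """Yield every element list in the tree: the list itself, then nested branch paths."""
    yield elements
    for el in elements:
        for branch in el.get("branches", []):
            # Exclusive/inclusive branches are dicts holding a "path";
            # parallel branches are plain lists of elements.
            path = branch["path"] if isinstance(branch, dict) else branch
            yield from _paths(path)


def _locate(elements, element_id):
    """Return (containing_list, index) for the element with the given id."""
    for path in _paths(elements):
        for i, el in enumerate(path):
            if el["id"] == element_id:
                return path, i
    raise KeyError(f"no element with id {element_id!r}")


def delete_element(elements, element_id):
    """Return a copy of the process without the element; its neighbours become adjacent."""
    result = copy.deepcopy(elements)
    path, i = _locate(result, element_id)
    del path[i]
    return result


def add_element(elements, element, before_id=None, after_id=None):
    """Return a copy with `element` inserted before or after an existing element."""
    if (before_id is None) == (after_id is None):
        raise ValueError("give exactly one of before_id / after_id")
    result = copy.deepcopy(elements)
    anchor = before_id if before_id is not None else after_id
    path, i = _locate(result, anchor)
    path.insert(i if before_id is not None else i + 1, copy.deepcopy(element))
    return result


# Composition of two atoms: swap "Reorder item" for a notification task.
new_task = {"type": "sendTask", "id": "task4", "label": "Notify customer"}
edited = add_element(process, new_task, after_id="task3")
edited = delete_element(edited, "task3")
# The "no" branch of gw1 now holds only task4; `process` itself is unchanged.
```

Returning a modified copy rather than mutating in place is a design choice of this sketch: it lets a validation step reject an edit and fall back to the previous state without any undo logic.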
### Validation and self-correction

A Python validation layer enforces BPMN structural constraints (unique IDs, correct connectivity, valid gateway-branch hierarchy, exactly one start event) on the generated JSON *before* XML conversion. On violation, the validator's error message is fed back to the LLM as a retry prompt. This is a programmatic guardrail the XML baseline does not have, and it contributes to the JSON approach's edge.

## Evaluation

Synthetic dataset of process descriptions; gold-standard BPMN models constructed manually (the authors note that Friedrich et al.'s 2011 corpus and the PET dataset lack validated gold standards, so a curated synthetic set was necessary).

**Metrics:**

- **Generation reliability**: fraction of LLM outputs that pass schema validation and produce renderable BPMN.
- **Edit success rate**: fraction of editing tasks completed correctly across multi-step refinements.
- **GED / RGED**: Graph Edit Distance and Relative GED versus the gold model.
- **Conformance F1**: token-based replay on simulated event logs (harmonic mean of fitness and precision).

**Headline results:**

- **Editing reliability:** JSON ≥ XML across all models. The starkest case is **DeepSeek V3**, which achieves 50% success in JSON vs 8% in XML. JSON thus enables cost-effective open-weight models to perform editing tasks previously reserved for frontier models.
- **Frontier-model parity:** GPT-5.1 and Claude 4.5 Sonnet reach parity in both formats; Claude 4.5 Sonnet achieves slightly *higher* precision with direct XML in some cases.
- **Conformance:** JSON F1 = 0.72, XML F1 = 0.69; both formats produce models with high behavioural fidelity to ground-truth processes.
- **Efficiency:** JSON cuts editing latency by ~43% and output tokens by >75% versus XML, despite higher input context (the JSON system prompt + edit-function descriptions are larger). Cost analysis (Appendix C) shows that the JSON approach has lower total cost in most cases.

## Research questions answered

1. **RQ1: Structured intermediate vs direct XML?** JSON is significantly more reliable for editing; equivalent or slightly worse for one-shot generation depending on model.
2. **RQ2: Function-based editing democratisation?** Yes. Atomic-function editing lets open-weight models (DeepSeek V3, Llama 3.3 70B) perform tasks that previously needed frontier models.
3. **RQ3: Latency / efficiency trade-off?** JSON's higher input context is more than offset by reduced output complexity, yielding ~43% latency improvement and >75% output-token reduction.

## Positioning vs related work

The paper compares BPMN Assistant explicitly against three concurrent LLM-BPMN tools (Table 1, p. 6):

| Tool | Goal | Output | Evaluation focus |
|---|---|---|---|
| **ProMoAI** (Kourani et al. 2024) | NL → model generation + refinement | BPMN + PNML | Conformance-based model quality |
| **BPMN-Chatbot** (Köpke & Safan 2024) | Efficient NL → BPMN | BPMN | Correctness + token efficiency + acceptance |
| **BPMNGen** (Hörner et al. 2026) | NL → BPMN + human-centred quality | BPMN | Semantic alignment + cognitive load + acceptability + comprehension |
| **BPMN Assistant** (this work) | Reliable NL → BPMN + edit robustness | BPMN | GED/RGED structural fidelity + failure rate |

BPMN-Chatbot reports 95% correctness with up to 94% token reduction vs ProMoAI but limits its evaluation to one-shot creation; BPMN Assistant explicitly targets the iterative-edit gap. BPMNGen evaluates human-centred quality dimensions (cognitive load, acceptability) that are complementary to BPMN Assistant's structural-correctness focus.

## Limitations (§6 of the paper)

- **Architectural scope:** No collaboration diagrams (pools/lanes), data objects, or complex artefacts. Workaround: decompose into single-participant processes.
- **Methodological asymmetry:** The JSON pipeline has schema-validation retries; the XML baseline has only single-pass generation. This may overstate JSON's structural advantage.
- **Synthetic descriptions:** Real enterprise narratives would be more rigorous but were unavailable; existing public corpora (Friedrich 2011, PET) lack validated gold-standard BPMN.
- **Semantic validation:** Token-replay conformance on simulated traces; no deadlock-freedom or [[concepts/soundness]] verification, no production-trace conformance.
- **Usability:** No HCI study with non-technical users; the "non-technical business analyst" target audience is not yet validated empirically.

## Connections

**Concepts:** [[concepts/agentic-bpm]] · [[concepts/business-process]] · [[concepts/conformance-checking]] · [[concepts/declarative-process-modelling]] · [[concepts/process-discovery]] · [[concepts/soundness]] · [[concepts/token-semantics]]

**Frameworks:** [[frameworks/bpmn]] · [[frameworks/dmn]]

**Methods:** [[methods/process-mining-basics]]

**Related sources:**

- [[sources/2024-kampik-large-process-models]]: LPM vision; BPMN Assistant is a concrete instantiation of the "Augmenting modeling and analysis with contextualized knowledge" Step-1 capability Kampik et al. articulate.
- [[sources/2018-dumas-fundamentals-of-bpm-ch3-essential-process-modeling]]: the BPMN element catalogue, of which BPMN Assistant supports a subset.
- [[sources/2010-mendling-reijers-vanderaalst-7pmg]]: guideline-driven model-quality target the validation layer partially enforces.
- [[sources/2026-calvanese-agentic-bpm-manifesto]]: APM positions LLM-driven modelling as one of the LLM-augmented BPM capabilities; BPMN Assistant operationalises it.
- [[syntheses/bpmn-modelling-practical-guide]]: constructive builder's guide whose pattern library the JSON IR conceptually mirrors.
- [[syntheses/llm-bpm-reading-list]]: entry under §A (Core LLM+BPM) and partial filler for §E.2.1 gap.