---
title: "BPMN Assistant: An LLM-Based Approach to Business Process Modeling"
type: source
tags: [llm, bpm, bpmn, process-modelling, generative-ai, json-intermediate-representation, function-calling, evaluation]
authors: [Licardo Josip Tomo; Tanković Nikola; Etinger Darko]
year: 2026
venue: "Applied Sciences (MDPI), 16, 2213"
kind: paper
raw_path: "raw/ABPS/2026-licardo-bpmn-assistant-llm-process-modelling.pdf"
key_claims:
  - "BPMN editing should be reframed as incremental transformation of mutable structures via a constrained set of atomic operations, not as one-shot regeneration of full BPMN 2.0 XML."
  - "A hierarchical JSON intermediate representation with type/id/label/branches and explicit has_join + next fields lets LLMs manipulate process logic without grappling with BPMN 2.0 XML verbosity."
  - "Five atomic editing functions (delete_element, redirect_branch, add_element, move_element, update_element) suffice to express complex BPMN modifications via composition."
  - "JSON beats raw-XML manipulation in editing tasks across all evaluated LLMs (GPT-5.1, GPT-5 mini, GPT-4o, Claude 3.5/4.5 Sonnet, Gemini 2.0 Flash, Llama 3.3 70B, Qwen 2.5 72B, DeepSeek V3) — most dramatically lifting DeepSeek V3 from 8% (XML) to 50% (JSON) success."
  - "Conformance checking via token-replay on simulated logs shows JSON achieves average F1 0.72 vs XML's 0.69; frontier models (GPT-5.1, Claude 4.5 Sonnet) maintain high precision in either format."
  - "JSON reduces editing latency by ~43% and output token count by >75% versus direct XML generation, despite higher input context cost."
  - "A strict JSON-schema validator with self-correction loop (LLM retries on validation failure) is the structural-correctness guardrail that XML baselines lack."
  - "Limitations: no support for pools/lanes (collaboration diagrams), data objects, or multi-pool layout; evaluation on synthetic descriptions rather than real enterprise narratives; no usability study with non-technical users."
created: 2026-05-06
updated: 2026-05-06
sources: []
---

# BPMN Assistant — Licardo, Tanković & Etinger 2026

Open-source LLM-based system for natural-language **creation and editing** of executable BPMN 2.0 diagrams, from Juraj Dobrila University of Pula. Source code at <https://github.com/jtlicardo/bpmn-assistant>; evaluation dataset at <https://huggingface.co/datasets/jtlicardo/bpmn-assistant-eval>. Published in *Applied Sciences* (MDPI), February 2026.

This is the wiki's **first ingested source on direct LLM→BPMN modelling**, partially closing gap §E.2.1 in [[syntheses/llm-bpm-reading-list]] (LLM-based process discovery & redesign).

## Core idea

BPMN Assistant rejects the dominant "one-shot generation" framing in LLM-driven process modelling and reframes BPMN editing as a **controlled transformation problem** over mutable structures. Instead of asking the LLM to emit BPMN 2.0 XML, the system asks it to manipulate a **hierarchical JSON intermediate representation** through a small fixed set of **atomic editing functions** invoked via function calling. The JSON is then deterministically translated to BPMN 2.0 XML by a separate transformer, with a layout server (Node.js + `bpmn-auto-layout`) appending Diagram Interchange (DI) coordinates without modifying semantic structure.

Conceptually this separates **process logic** from **BPMN syntax**, on the grounds that LLMs reliably reason about logic but produce verbose, error-prone XML when asked to handle both simultaneously.
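
As a rough illustration of the deterministic JSON-to-XML step, the sketch below translates a gateway-free element list into BPMN 2.0 XML using only the Python standard library. It is not the project's transformer: the function name `flat_process_to_bpmn`, the generated flow IDs, and the `targetNamespace` value are assumptions made here, and gateways, `next` jumps, and DI layout are left out.

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 model schema (OMG).
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def flat_process_to_bpmn(process: list[dict], process_id: str = "Process_1") -> str:
    """Translate a gateway-free element list into BPMN 2.0 XML (no DI layout).

    Hypothetical helper: it assumes every element has "type" and "id",
    and that list order is the only source of sequence flow.
    """
    ET.register_namespace("bpmn", BPMN_NS)
    definitions = ET.Element(
        f"{{{BPMN_NS}}}definitions",
        {"id": "Definitions_1", "targetNamespace": "http://example.com/bpmn"},
    )
    proc = ET.SubElement(
        definitions,
        f"{{{BPMN_NS}}}process",
        {"id": process_id, "isExecutable": "true"},
    )

    # One flow node per element; the IR "type" doubles as the BPMN tag name.
    for element in process:
        attrs = {"id": element["id"]}
        if "label" in element:
            attrs["name"] = element["label"]
        ET.SubElement(proc, f"{{{BPMN_NS}}}{element['type']}", attrs)

    # Consecutive elements are connected by a sequence flow.
    for source, target in zip(process, process[1:]):
        ET.SubElement(
            proc,
            f"{{{BPMN_NS}}}sequenceFlow",
            {
                "id": f"Flow_{source['id']}_{target['id']}",
                "sourceRef": source["id"],
                "targetRef": target["id"],
            },
        )
    return ET.tostring(definitions, encoding="unicode")
```

Because the translation is a pure function of the element list, the same JSON always yields the same XML, which is the property that lets the LLM ignore XML syntax entirely.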

## System architecture

Three components:

1. **Backend** (Python 3.12 + FastAPI) — intent recognition, prompt construction, LLM orchestration, JSON-schema validation with self-correction retry loop, JSON↔XML translation. Pseudocode (Listing 1, p. 7) shows: parse → conversational vs operational intent → up to K LLM retries against schema → layout pass → return enriched XML.
2. **Layout server** (Node.js + Express + `bpmn-auto-layout`) — appends DI coordinates to the BPMN XML produced by the backend. Limitation: cannot reliably lay out collaboration / multi-pool diagrams.
3. **Frontend** (Vue.js + bpmn-js) — dual-panel UI with chat on the left and BPMN canvas on the right; users select among supported LLM backends.
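
The backend control flow summarised in item 1 can be sketched as a single function. Everything named here (`handle_message`, the injected callables, the default of three retries) is hypothetical; the note only records that the paper's pseudocode routes on intent and allows up to K retries before the layout pass.

```python
from typing import Callable, Optional


def handle_message(
    message: str,
    current_process: list[dict],
    classify_intent: Callable[[str], str],
    propose_process: Callable[[str, list[dict], Optional[str]], list[dict]],
    validate: Callable[[list[dict]], Optional[str]],
    to_xml: Callable[[list[dict]], str],
    add_layout: Callable[[str], str],
    max_retries: int = 3,  # stands in for the paper's K; the value is an assumption
) -> dict:
    """Route a chat message and, for modelling requests, retry until valid."""
    # Conversational messages never touch the model.
    if classify_intent(message) == "conversational":
        return {"kind": "chat", "xml": None}

    error: Optional[str] = None
    for attempt in range(1, max_retries + 1):
        # The previous validation error (if any) is passed back to the LLM call.
        candidate = propose_process(message, current_process, error)
        error = validate(candidate)
        if error is None:
            # Only a validated process reaches translation and layout.
            return {
                "kind": "model",
                "xml": add_layout(to_xml(candidate)),
                "attempts": attempt,
            }
    raise RuntimeError(f"no valid model after {max_retries} attempts: {error}")
```

Passing the collaborators in as callables keeps the sketch runnable without an LLM or a layout server; in a real deployment they would wrap the model client and the HTTP call to the Node.js service.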

### JSON representation

The intermediate format encodes processes as a `process` array of typed elements. Tasks (`task`, `userTask`, `serviceTask`, `sendTask`, `receiveTask`, `businessRuleTask`, `manualTask`, `scriptTask`) and events (`startEvent`, `endEvent`, `intermediateCatchEvent`, `intermediateThrowEvent` with optional `eventDefinition` for timer/message) are flat objects with `type`/`id`/`label`. Gateways are nested:

- **`exclusiveGateway` / `inclusiveGateway`** — `branches: [{condition, path: [...]}]`. The boolean `has_join` flag declares whether branches reconverge (triggers a converging gateway in XML); the optional `next` field lets a branch jump to any element ID, enabling cyclic flows that break strict hierarchy.
- **`parallelGateway`** — `branches: [[...], [...]]` where each sub-array is a concurrent path; converging parallel gateway is generated implicitly when the parent flow continues.

This nested structure makes the LLM's editing task a tree-rewrite problem with a small number of escape hatches (`next`, `has_join`, `is_default`).
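
A small instance of the representation may help. It is written as a Python literal so it can be run directly; the field names come from the description above, while the concrete IDs, labels, the placement of `next` on the branch object, and the helper `iter_elements` are assumptions for illustration rather than the project's exact schema.

```python
# Hypothetical order-handling process in the nested intermediate form.
process = [
    {"type": "startEvent", "id": "start"},
    {"type": "task", "id": "review", "label": "Review order"},
    {
        "type": "exclusiveGateway",
        "id": "gw_ok",
        "label": "Order valid?",
        "has_join": False,  # branches do not reconverge
        "branches": [
            {
                "condition": "yes",
                "path": [
                    {"type": "serviceTask", "id": "ship", "label": "Ship order"},
                    {"type": "endEvent", "id": "end_ok"},
                ],
            },
            {
                "condition": "no",
                "path": [
                    {"type": "userTask", "id": "fix", "label": "Correct order"},
                ],
                "next": "review",  # loop back, breaking the strict hierarchy
            },
        ],
    },
]


def iter_elements(elements):
    """Yield every element depth-first, descending into gateway branches."""
    for element in elements:
        yield element
        for branch in element.get("branches", []):
            # Exclusive/inclusive branches are objects with a "path";
            # parallel branches are plain lists of elements.
            path = branch["path"] if isinstance(branch, dict) else branch
            yield from iter_elements(path)


ids = [element["id"] for element in iter_elements(process)]
print(ids)  # ['start', 'review', 'gw_ok', 'ship', 'end_ok', 'fix']
```

The traversal shows why the edit task resembles tree rewriting: every element is reachable by a simple recursive walk, and only the `next` reference points outside the tree.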

### Five atomic editing functions

| Function | Parameters | Effect |
|---|---|---|
| `delete_element` | `element_id` | Remove element; surrounding flow is auto-reconnected |
| `redirect_branch` | `branch_condition`, `next_id` | Redirect a gateway branch to a different target |
| `add_element` | `element`, `before_id` *or* `after_id` | Insert at specified position |
| `move_element` | `element_id`, `before_id` *or* `after_id` | Relocate without altering content |
| `update_element` | `new_element` | Mutate properties (label, event-definition, etc.) |

Complex transformations are decomposed into compositions of these atoms: converting a sequential flow to a gateway structure, for example, becomes "delete + add". The fine granularity keeps each decision the LLM makes small and individually checkable.
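
To make the composition idea concrete, here is a minimal sketch of three of the five functions over a flat, gateway-free element list. List order implies sequence flow, so deleting an element reconnects its neighbours automatically. The signatures follow the table; the bodies, the `_index_of` helper, and the copy-on-write behaviour are assumptions, not the project's code.

```python
import copy


def _index_of(process, element_id):
    """Return the position of element_id in a flat process list."""
    for index, element in enumerate(process):
        if element["id"] == element_id:
            return index
    raise KeyError(f"unknown element id: {element_id}")


def delete_element(process, element_id):
    """Remove an element; neighbours become adjacent, so flow is reconnected."""
    result = copy.deepcopy(process)
    del result[_index_of(result, element_id)]
    return result


def add_element(process, element, before_id=None, after_id=None):
    """Insert an element before or after an existing one (exactly one anchor)."""
    if (before_id is None) == (after_id is None):
        raise ValueError("give exactly one of before_id or after_id")
    result = copy.deepcopy(process)
    if before_id is not None:
        position = _index_of(result, before_id)
    else:
        position = _index_of(result, after_id) + 1
    result.insert(position, copy.deepcopy(element))
    return result


def move_element(process, element_id, before_id=None, after_id=None):
    """Relocate an element unchanged, expressed as delete followed by add."""
    element = process[_index_of(process, element_id)]
    remaining = delete_element(process, element_id)
    return add_element(remaining, element, before_id, after_id)
```

Each function returns a new list and leaves its input untouched, which would let a caller validate the result before committing it. Handling elements nested inside gateway branches would need a recursive lookup that this sketch omits.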

### Validation and self-correction

A Python validation layer enforces BPMN structural constraints (unique IDs, correct connectivity, valid gateway-branch hierarchy, exactly one start event) on the generated JSON *before* XML conversion. On violation, the validator's error message is fed back to the LLM as a retry prompt — a programmatic guardrail the XML baseline does not have, contributing to the JSON approach's edge.
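
A minimal sketch of such a validator, assuming the nested element layout described under *JSON representation*. It covers only three checks (unique IDs, exactly one start event, `next` targets that exist); the function names and message wording are illustrative and not taken from the project.

```python
from collections import Counter


def _walk(elements):
    """Yield every element depth-first, descending into gateway branches."""
    for element in elements:
        yield element
        for branch in element.get("branches", []):
            # Branch objects carry a "path"; parallel branches are plain lists.
            yield from _walk(branch["path"] if isinstance(branch, dict) else branch)


def validate_process(process):
    """Return a list of human-readable violations; an empty list means valid."""
    elements = list(_walk(process))
    errors = []

    # Check 1: every element has an id and no id is reused.
    counts = Counter(element.get("id") for element in elements)
    for element_id, count in counts.items():
        if element_id is None:
            errors.append(f"{count} element(s) have no id")
        elif count > 1:
            errors.append(f"duplicate id '{element_id}' used {count} times")

    # Check 2: exactly one start event anywhere in the process.
    starts = sum(element.get("type") == "startEvent" for element in elements)
    if starts != 1:
        errors.append(f"expected exactly one startEvent, found {starts}")

    # Check 3: every branch-level "next" points at an existing element.
    for element in elements:
        for branch in element.get("branches", []):
            if isinstance(branch, dict) and "next" in branch:
                if branch["next"] not in counts:
                    errors.append(
                        f"gateway '{element.get('id')}' jumps to unknown id '{branch['next']}'"
                    )
    return errors
```

Returning messages rather than raising on the first failure means a single retry prompt can report every violation at once, for example `"\n".join(validate_process(candidate))`.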

## Evaluation

Synthetic dataset of process descriptions; gold-standard BPMN models constructed manually (the authors note Friedrich et al.'s 2011 corpus and PET dataset lack validated gold standards, so a curated synthetic set was necessary).

**Metrics:**
- **Generation reliability** — fraction of LLM outputs that pass schema validation and produce renderable BPMN.
- **Edit success rate** — fraction of editing tasks completed correctly across multi-step refinements.
- **GED / RGED** — Graph Edit Distance and Relative GED versus gold model.
- **Conformance F1** — token-based replay on simulated event logs (harmonic mean of fitness and precision).
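
For reference, the conformance score in the last bullet is the standard harmonic mean. Writing fitness as $f$ and precision as $p$ (symbols chosen here, not taken from the paper):

$$
F_1 = \frac{2\,f\,p}{f + p}
$$

With purely hypothetical values $f = 0.9$ and $p = 0.5$, this gives $F_1 = 0.9 / 1.4 \approx 0.64$, showing how the harmonic mean penalises an imbalance between the two components more than an arithmetic mean (0.70) would.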

**Headline results:**

- **Editing reliability:** JSON ≥ XML across all models. The starkest case is **DeepSeek V3**, which achieves 50% success in JSON vs 8% in XML — JSON enables open-weight cost-effective models to perform editing tasks previously reserved for frontier models.
- **Frontier-model parity:** GPT-5.1 and Claude 4.5 Sonnet perform at parity across the two formats; Claude 4.5 Sonnet achieves slightly *higher* precision with direct XML in some cases.
- **Conformance:** average F1 is 0.72 for JSON versus 0.69 for XML; both formats produce models with high behavioural fidelity to ground-truth processes.
- **Efficiency:** JSON cuts editing latency by ~43% and output tokens by >75% versus XML, despite higher input context (the JSON system prompt + edit-function descriptions are larger). The cost analysis (Appendix C) shows that the JSON approach has the lower total cost in most cases.

## Research questions answered

1. **RQ1 — Structured intermediate vs direct XML?** JSON is significantly more reliable for editing; equivalent or slightly worse for one-shot generation depending on model.
2. **RQ2 — Function-based editing democratisation?** Yes — atomic-function editing lets open-weight models (DeepSeek V3, Llama 3.3 70B) perform tasks that previously needed frontier models.
3. **RQ3 — Latency / efficiency trade-off?** JSON's higher input context is more than offset by reduced output complexity, yielding ~43% latency improvement and >75% output-token reduction.

## Positioning vs related work

The paper compares BPMN Assistant explicitly against three concurrent LLM-BPMN tools (Table 1, p. 6):

| Tool | Goal | Output | Evaluation focus |
|---|---|---|---|
| **ProMoAI** (Kourani et al. 2024) | NL → model generation + refinement | BPMN + PNML | Conformance-based model quality |
| **BPMN-Chatbot** (Köpke & Safan 2024) | Efficient NL → BPMN | BPMN | Correctness + token efficiency + acceptance |
| **BPMNGen** (Hörner et al. 2026) | NL → BPMN + human-centred quality | BPMN | Semantic alignment + cognitive load + acceptability + comprehension |
| **BPMN Assistant** (this work) | Reliable NL → BPMN + edit robustness | BPMN | GED/RGED structural fidelity + failure rate |

BPMN-Chatbot reports 95% correctness with up to 94% token reduction vs ProMoAI but limits its evaluation to one-shot creation; BPMN Assistant explicitly targets the iterative-edit gap. BPMNGen evaluates human-centred quality dimensions (cognitive load, acceptability) that are complementary to BPMN Assistant's structural-correctness focus.

## Limitations (§6 of the paper)

- **Architectural scope:** No collaboration diagrams (pools/lanes), data objects, or complex artefacts. Workaround: decompose into single-participant processes.
- **Methodological asymmetry:** JSON pipeline has schema-validation retries; XML baseline has only single-pass generation. May overstate JSON's structural advantage.
- **Synthetic descriptions:** Real enterprise narratives would make for a more rigorous evaluation but were unavailable; existing public corpora (Friedrich 2011, PET) lack validated gold-standard BPMN.
- **Semantic validation:** Token-replay conformance on simulated traces; no deadlock-freedom or [[concepts/soundness]] verification, no production-trace conformance.
- **Usability:** No HCI study with non-technical users; the "non-technical business analyst" target audience is not yet validated empirically.

## Connections

**Concepts:** [[concepts/agentic-bpm]] · [[concepts/business-process]] · [[concepts/conformance-checking]] · [[concepts/declarative-process-modelling]] · [[concepts/process-discovery]] · [[concepts/soundness]] · [[concepts/token-semantics]]

**Frameworks:** [[frameworks/bpmn]] · [[frameworks/dmn]]

**Methods:** [[methods/process-mining-basics]]

**Related sources:**
- [[sources/2024-kampik-large-process-models]] — LPM vision; BPMN Assistant is a concrete instantiation of the "Augmenting modeling and analysis with contextualized knowledge" Step-1 capability Kampik et al. articulate.
- [[sources/2018-dumas-fundamentals-of-bpm-ch3-essential-process-modeling]] — BPMN element catalogue BPMN Assistant subsets.
- [[sources/2010-mendling-reijers-vanderaalst-7pmg]] — guideline-driven model-quality target the validation layer partially enforces.
- [[sources/2026-calvanese-agentic-bpm-manifesto]] — APM positions LLM-driven modelling as one of the LLM-augmented BPM capabilities; BPMN Assistant operationalises it.
- [[syntheses/bpmn-modelling-practical-guide]] — constructive builder's guide whose pattern library the JSON IR conceptually mirrors.
- [[syntheses/llm-bpm-reading-list]] — entry under §A (Core LLM+BPM) and partial filler for §E.2.1 gap.