---
title: "Process Model Quality & Soundness — Evaluation Guide for BPMN Models from Qualitative Interviews"
type: synthesis
tags: [bpm, modelling, quality, soundness, 7pmg, sequal, bpmn, evaluation, llm-assisted-review, rubric, methodology]
sources:
  - "[[sources/2018-dumas-fundamentals-of-bpm]]"
  - "[[sources/2018-dumas-fundamentals-of-bpm-ch5-discovery]]"
  - "[[sources/1998-vanderaalst-verification-of-workflow-nets]]"
  - "[[sources/2010-mendling-reijers-vanderaalst-7pmg]]"
  - "[[sources/2006-krogstie-sindre-jorgensen-revised-sequal-framework]]"
  - "[[sources/2012-ottensooser-graphical-vs-textual]]"
  - "[[sources/2011-vanderaalst-process-mining-book]]"
  - "[[sources/2008-pesic-declare-manual]]"
created: 2026-05-04
updated: 2026-05-04
---

# Process Model Quality & Soundness — Evaluation Guide for BPMN Models from Qualitative Interviews

A consolidated, exhaustive guide for evaluating a BPMN model produced by a process analyst from qualitative interview data. Synthesised from five primary-source layers: SEQUAL ([[sources/2006-krogstie-sindre-jorgensen-revised-sequal-framework]]), the syntactic / semantic / pragmatic tripartite of Dumas et al. ([[sources/2018-dumas-fundamentals-of-bpm]] §5.4), 7PMG ([[sources/2010-mendling-reijers-vanderaalst-7pmg]]), workflow soundness ([[sources/1998-vanderaalst-verification-of-workflow-nets]]), and conformance dimensions ([[sources/2011-vanderaalst-process-mining-book]] Ch. 8 + Dumas §11.4.4). The pragmatic-quality layer is also empirically anchored in Ottensooser et al. ([[sources/2012-ottensooser-graphical-vs-textual]]).

The guide is **operational**: it concludes in §10 with an LLM-assisted review rubric — a structured prompt scaffold the analyst (or an LLM acting as reviewer) applies to a candidate BPMN model.

For the *upstream* interview-conduct guidance (how to elicit the content the model is built from), see the companion [[syntheses/interview-structuring-for-process-models]].

---

## 1. Scope and audience

**You** are a process analyst who has just completed (or is finalising) interviews with domain experts about an existing process. The deliverable is a **BPMN as-is process model** — a structured artefact intended to:

1. Be **validated** by the same domain experts.
2. Be **understandable** by stakeholders who did not participate in modelling.
3. Be **mechanically verifiable** so that obvious correctness flaws (deadlocks, dead activities, unreachable states) can be caught before the model is signed off.
4. Serve as a baseline for redesign, simulation, monitoring, or executable workflow specification.

The guide covers: what quality dimensions exist (§2), how each is verified for a BPMN model from interview data (§§3–8), how they trade off (§9), and how to operationalise the evaluation as an LLM-assisted review (§10). §11 covers anti-patterns; §12 the workflow; §13 acknowledged gaps.

---

## 2. The quality framework — five layers, one stack

A BPMN model from interviews is judged across **five layered concerns**, derived by integrating the SEQUAL semiotic framework with the BPM-textbook tripartite and the formal verification layer.

```
┌─────────────────────────────────────────────────────┐
│  5. Goal-fulfilment / organisational quality        │  ← does the model serve the modelling goal?
├─────────────────────────────────────────────────────┤
│  4. Pragmatic quality                               │  ← is the model understandable / actionable?
├─────────────────────────────────────────────────────┤
│  3. Semantic quality                                │  ← does the model match reality?
├─────────────────────────────────────────────────────┤
│  2. Syntactic quality                               │  ← is the model well-formed?
│     2a. Notation-syntactic (BPMN rules)             │
│     2b. Behavioural-syntactic (soundness)           │
├─────────────────────────────────────────────────────┤
│  1. Physical / empirical quality                    │  ← is the model accessible & readable?
└─────────────────────────────────────────────────────┘
```

The five layers correspond to SEQUAL quality types ([[concepts/sequal-framework]]):

| This guide's layer | SEQUAL types it covers | Verification mode |
|---|---|---|
| 1. Physical / empirical | physical + empirical | Tooling check |
| 2a. Notation-syntactic | syntactic (M ⊆ L) | Mechanical (BPMN tool) |
| 2b. Behavioural-syntactic | (extension — formal correctness inside the syntactic bucket per Dumas §5.4) | Mechanical ([[concepts/soundness|soundness]] checker) |
| 3. Semantic | semantic + perceived semantic | Domain-expert validation |
| 4. Pragmatic | pragmatic | Reader test + 7PMG/metrics |
| 5. Organisational | social + organisational | Process owner sign-off |

**Conformance-checking dimensions** (fitness, precision, generalisation, simplicity — [[concepts/conformance-checking]]) belong to a *different axis*: they evaluate a model against an event log, not against reality or audience. They become relevant *after* discovery (when monitoring/mining begins). Section 8 covers them briefly for completeness; for an as-is model with no log, they do not yet apply.

---

## 3. Layer 1 — Physical and empirical quality

Often invisible because most modern BPMN tooling solves it by default; still worth a single check.

### 3.1 Physical quality
- The model exists as a stored, accessible artefact (BPMN 2.0 XML or equivalent).
- Single source of truth: avoid PowerPoint sketches, hand-redrawn versions, screenshot-only models.
- Versioned in a system the audience can access.

### 3.2 Empirical quality
- **Layout**: left-to-right or top-to-bottom flow; no crossing arcs unless unavoidable.
- **Spacing**: connectors and tasks aligned on a grid.
- **Colour** (if used): purposeful and consistent (e.g. lane colours for organisations).
- **Font**: legible at the size the model will be read.

Tools like Camunda Modeler, bpmn.io and Signavio provide alignment and layout aids that cover much of this. Bad layout *will* sink pragmatic quality regardless of correctness; the expert box in Dumas §5.1.2 notes that "neat diagrams engage stakeholders".

---

## 4. Layer 2a — Notation-syntactic quality (BPMN rules)

Mechanical, tool-enforceable, non-negotiable.

### 4.1 Element-level rules
- **Sequence flows** connect flow elements only (events, activities, gateways) — not data objects, not lanes.
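- **Tasks** have exactly one incoming and one outgoing sequence flow. BPMN itself tolerates several flows on a task (uncontrolled flow: implicit merge or implicit split), but by convention multiple flows go through explicit gateways.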
- **Events**:
  - **Start events**: no incoming sequence flow.
  - **End events**: no outgoing sequence flow.
  - **Intermediate events**: catching or throwing variants typed correctly (message, timer, error, signal, escalation, compensation).
  - **Boundary events** attached to activities, not to other events or gateways.
- **Gateways**:
  - Logical type set (XOR / AND / OR / event-based / complex).
  - Exclusive gateway (XOR) typically has explicit *default* flow if conditions might all be false.
  - Parallel gateway (AND) joins all incoming branches before continuing.
  - Inclusive gateway (OR) — flag for review per [[concepts/7pmg]] G5.
- **Pools / lanes**: each task assigned to exactly one lane; a pool encapsulates one organisational entity.
- **Message flows**: cross pools only; never within a pool.
- **Data objects / data stores**: connected via association (dotted line), not sequence flow.

### 4.2 Common violations
- Sequence flow crossing pool boundaries (should be message flow).
- Task with two incoming sequence flows (implicit XOR-merge, so each arriving token starts the task again; discouraged, use an explicit gateway).
- End event followed by sequence flow ("End" must terminate).
- Gateway with single in + single out (vestigial — remove).
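bpmnlint is the maintained tool for this; what follows is only a standard-library sketch of how a few of the §4.1/§4.2 rules can be checked against BPMN 2.0 XML. `lint_bpmn` and the sample model are hypothetical. The sketch only inspects flow elements placed directly inside `<process>` (sub-processes, pools and message flows are ignored) and assumes the standard BPMN 2.0 model namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
TASKS = {"task", "userTask", "serviceTask", "manualTask", "scriptTask",
         "sendTask", "receiveTask", "businessRuleTask"}
GATEWAYS = {"exclusiveGateway", "parallelGateway", "inclusiveGateway",
            "eventBasedGateway", "complexGateway"}

def lint_bpmn(xml_text: str) -> list[str]:
    """Flag a few element-level violations at the top level of each <process>."""
    root = ET.fromstring(xml_text)
    findings = []
    for process in root.iter(f"{{{BPMN_NS}}}process"):
        kinds, n_in, n_out = {}, Counter(), Counter()
        for child in process:
            tag = child.tag.rpartition("}")[2]      # strip the namespace
            if tag == "sequenceFlow":
                n_out[child.get("sourceRef")] += 1
                n_in[child.get("targetRef")] += 1
            elif child.get("id"):
                kinds[child.get("id")] = tag
        for el, tag in kinds.items():
            if tag == "startEvent" and n_in[el]:
                findings.append(f"{el}: start event has incoming flow")
            elif tag == "endEvent" and n_out[el]:
                findings.append(f"{el}: end event has outgoing flow")
            elif tag in TASKS and (n_in[el] > 1 or n_out[el] > 1):
                findings.append(f"{el}: task has {n_in[el]} in / {n_out[el]} out; add a gateway")
            elif tag in GATEWAYS and n_in[el] == 1 and n_out[el] == 1:
                findings.append(f"{el}: vestigial gateway (1 in / 1 out)")
    return findings

# Hypothetical model containing three of the common violations listed above.
SAMPLE = f"""<definitions xmlns="{BPMN_NS}" id="defs">
  <process id="p1">
    <startEvent id="start"/>
    <task id="check" name="Check order"/>
    <exclusiveGateway id="gw"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="check"/>
    <sequenceFlow id="f2" sourceRef="check" targetRef="gw"/>
    <sequenceFlow id="f3" sourceRef="gw" targetRef="end"/>
    <sequenceFlow id="f4" sourceRef="end" targetRef="check"/>
  </process>
</definitions>"""

for finding in lint_bpmn(SAMPLE):
    print(finding)
```

In a real pipeline the same checks belong in bpmnlint rules run in CI; the sketch only shows that the rules are simple degree counts over sequence flows.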

### 4.3 Tooling
- **Camunda Modeler** — real-time syntactic validation; refuses some illegal constructs.
- **bpmn.io** — bpmn-js, the open-source modelling library Camunda Modeler is built on.
- **Signavio / SAP Signavio** — server-side validation + suggestion engine.
- **bpmnlint** (open-source) — linting rules library; integrate into CI.

---

## 5. Layer 2b — Behavioural-syntactic quality (soundness)

The structural rules of §4 do not catch *behavioural* errors: deadlocks, livelocks, dead activities, orphan tokens. Those need [[concepts/soundness|soundness]] verification per [[sources/1998-vanderaalst-verification-of-workflow-nets]].

### 5.1 The soundness criterion

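A BPMN model restricted to control flow (events, activities, AND/XOR gateways) translates, under standard semantics, to a [[concepts/workflow-net|workflow net]]; OR-joins and similar constructs need extended semantics. The model is **sound** iff the WF-net is sound, i.e. iff: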

1. **Option to complete**: from every reachable state, the end event is reachable. *No deadlocks or livelocks.*
2. **Proper completion**: when the end event fires, no orphan tokens are left in the process. *No tokens stuck on parallel branches.*
3. **No dead transitions / activities**: every modelled activity is executable from some reachable state. *No unreachable work.*

These are exactly the three clauses Dumas §5.4 paraphrases as "option to complete · proper completion · no dead activities".
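For a small net, the three clauses can be checked directly by enumerating the reachability graph. This is an illustration under stated assumptions, not WOFLAN's algorithm: the WF-net encoding (a dict mapping each transition to its preset and postset), the place names `i`/`o`, the function `check_soundness` and its `bound`/`max_states` limits are all hypothetical. Exceeding `bound` tokens on a place is treated as a failure, a crude stand-in for a proper boundedness check, and the explicit state space can grow exponentially, so this only suits toy models.

```python
from collections import Counter, deque

# Hypothetical encoding: transition -> (preset, postset), both {place: tokens}.
Net = dict[str, tuple[dict[str, int], dict[str, int]]]

def _key(marking: Counter) -> tuple:
    return tuple(sorted(marking.items()))

def check_soundness(net: Net, source="i", sink="o", bound=5, max_states=10_000) -> dict:
    start = Counter({source: 1})
    final = _key(Counter({sink: 1}))
    states = {_key(start): start}
    edges = []                                   # (from_state, transition, to_state)
    queue = deque([start])
    while queue:                                 # breadth-first reachability graph
        m = queue.popleft()
        for name, (pre, post) in net.items():
            if all(m[p] >= n for p, n in pre.items()):      # transition enabled?
                m2 = Counter(m)
                m2.subtract(pre)
                m2.update(post)
                m2 = +m2                                    # drop zero counts
                if max(m2.values(), default=0) > bound:
                    return {"sound": False, "reason": f"more than {bound} tokens after {name}"}
                k2 = _key(m2)
                edges.append((_key(m), name, k2))
                if k2 not in states:
                    if len(states) >= max_states:
                        raise RuntimeError("state space too large for this sketch")
                    states[k2] = m2
                    queue.append(m2)
    # (1) Option to complete: search backwards from the final marking.
    preds: dict[tuple, set] = {}
    for a, _, b in edges:
        preds.setdefault(b, set()).add(a)
    can_finish = {final} if final in states else set()
    stack = list(can_finish)
    while stack:
        for a in preds.get(stack.pop(), ()):
            if a not in can_finish:
                can_finish.add(a)
                stack.append(a)
    stuck = [k for k in states if k not in can_finish]
    # (2) Proper completion: a token in the sink must be the only token left.
    improper = [k for k, m in states.items() if m[sink] >= 1 and k != final]
    # (3) No dead transitions: every transition labels some reachable edge.
    dead = sorted(set(net) - {t for _, t, _ in edges})
    return {"sound": not (stuck or improper or dead),
            "stuck": stuck, "improper": improper, "dead": dead}

# XOR-split closed by an AND-join: the classic deadlock.
xor_and = {"A": ({"i": 1}, {"pa": 1}), "B": ({"i": 1}, {"pb": 1}),
           "join": ({"pa": 1, "pb": 1}, {"o": 1})}
# XOR-split closed by an XOR-join (a shared place): sound.
xor_xor = {"A": ({"i": 1}, {"p": 1}), "B": ({"i": 1}, {"p": 1}),
           "C": ({"p": 1}, {"o": 1})}
r = check_soundness(xor_and)
print(r["sound"], r["dead"])                 # False ['join']
print(check_soundness(xor_xor)["sound"])     # True
```

The mismatched net fails option to complete (no reachable state leads to the final marking) and leaves `join` dead, while the matched one passes all three clauses.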

### 5.2 Why soundness is tractable in practice

Theorem 11 of [[sources/1998-vanderaalst-verification-of-workflow-nets]] gives the reduction: a WF-net PN is sound ⇔ the extended net (PN plus a transition o→i) is live and bounded, so standard Petri-net analysis tools suffice. For **free-choice** WF-nets, the class most naturally produced by BPMN with AND/XOR gateways, liveness and boundedness (and hence soundness) are verifiable in **polynomial time** (Theorem 12).

In the industrial WFMS deployments van der Aalst surveyed (Dutch Customs, Justice, banks, insurance), almost all WF-nets are free-choice or *almost* free-choice (free-choice + inquiry transitions, Theorem 16, also polynomial-time decidable).
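Before relying on the polynomial-time results, it is cheap to test whether a translated net is free-choice at all. This sketch checks the equal-presets condition (any two transitions sharing an input place must have identical presets), which Desel and Esparza call *extended* free-choice; the classic definition is slightly stricter. The net encoding (transition → (preset, postset)) and `is_free_choice` are hypothetical.

```python
from itertools import combinations

def is_free_choice(net):
    """net: transition -> (preset, postset); presets may be sets or {place: count} dicts."""
    presets = {t: frozenset(pre) for t, (pre, _post) in net.items()}
    for t1, t2 in combinations(presets, 2):
        # Transitions competing for a shared place must have identical presets.
        if presets[t1] & presets[t2] and presets[t1] != presets[t2]:
            return False, (t1, t2)
    return True, None

xor_and = {"A": ({"i"}, {"pa"}), "B": ({"i"}, {"pb"}), "join": ({"pa", "pb"}, {"o"})}
confused = {"t1": ({"p", "q"}, {"o"}), "t2": ({"p"}, {"o"})}
print(is_free_choice(xor_and))    # (True, None)
print(is_free_choice(confused))   # (False, ('t1', 't2'))
```

The XOR-split → AND-join net is free-choice yet unsound: free-choice makes soundness cheap to decide, it does not imply it.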

### 5.3 The four behavioural anomalies (Dumas §5.4.1)

Soundness violations manifest as one of four anomalies — each with its own concept page:

| Anomaly | Symptom | Typical cause | Violates |
|---|---|---|---|
| **[[concepts/deadlock]]** | Token stuck; instance never completes | XOR-split → AND-join (mismatch); branch injected into AND-block | (1) option to complete |
| **[[concepts/livelock]]** | Token cycles in loop forever | Loop with no exit gateway, or an exit condition that can never be satisfied (loop-back branch always taken) | (1) option to complete |
| **[[concepts/lack-of-synchronization]]** | Multiple tokens on same flow; activity executes more times than intended | AND-split → XOR-join (mismatch); OR-split → XOR-join | (2) proper completion |
| **[[concepts/dead-activity]]** | Activity never executable in any instance | Provably-false branch condition; downstream of a deadlock; disconnected | (3) no dead activities |

A sound model contains *none* of these. Verifying soundness = checking the absence of all four. See [[concepts/token-semantics]] for the underlying token-flow reasoning.

**Common appearance patterns** in models from interviews:

| Pattern | Anomaly | Why it appears |
|---|---|---|
| Mismatched AND-split / XOR-join | Lack of synchronisation | Branch added late without updating join |
| Mismatched XOR-split / AND-join | Deadlock | Analyst inverts the gateway type |
| Activity reachable only via a gateway condition that's never true | Dead activity | "What-if" branches added speculatively |
| Implicit join (multiple sequence flows into a task) | Lack of synchronisation | Quick-and-dirty modelling shortcut |
| Loop without exit condition | Livelock | Iterate-without-checking pattern from interview |
| Loop-back condition always true (exit never taken) | Livelock | Inverted loop polarity |

#### Block structure as anomaly prevention

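[[concepts/block-structure]] (single-entry-single-exit fragments with matching split/join type) is **sound by construction** at the control-flow level: no deadlock, lack of synchronisation or structurally dead activity can arise within a block. Data conditions (a never-true branch, a never-satisfied loop exit) can still produce dead activities or livelocks, so review them separately. Most discovery-phase models can be built block-structured; deviations should be the conscious exception, not the default.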

### 5.4 Verification tools
- **Camunda Modeler** — basic structural validation; not full soundness.
- **Signavio** — soundness checks built in (commercial).
- **ProM** — academic gold-standard; multiple soundness plug-ins.
- **WOFLAN** — original [[sources/1998-vanderaalst-verification-of-workflow-nets]] tool.
- **bpmn-js-token-simulation** — interactive token-flow simulation; useful for visualising deadlocks.

### 5.5 Soundness variants

Beyond classical soundness, the literature defines (referenced-not-yet-ingested):
- **Weak soundness** — drops "no dead transitions".
- **Relaxed soundness** — every transition participates in *some* successful run.
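- **Lazy soundness** and **k-soundness** — the former tolerates tokens left behind once the end event has fired; the latter requires soundness with k tokens initially in the source place (k concurrent cases).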
- **Data-aware soundness** — extends to data-flow correctness.

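For an as-is model from interviews, **classical soundness is the right gate**. Weak soundness is the useful variant when the model deliberately keeps a dead branch as documentation ("we know this is a dead end, but it is there in policy"); relaxed soundness would still reject such a branch, because it requires every transition to occur in some successful run.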

---

## 6. Layer 3 — Semantic quality (does the model match reality?)

This is where the bulk of analyst skill is invested. SEQUAL distinguishes three flavours ([[concepts/sequal-framework]]):

| Flavour | Set correspondence | When it applies |
|---|---|---|
| **Descriptive semantic quality** | M vs D (current domain) | as-is model |
| **Prescriptive semantic quality** | M vs D^O (optimal domain) | to-be model |
| **Perceived semantic quality** | I vs K (stakeholders' interpretation of M vs their domain knowledge) | model fits stakeholder mental models |

For an as-is BPMN model from interviews, **descriptive** quality dominates. Two sub-criteria, both from Dumas §5.4.2 / Krogstie 2006:

### 6.1 Validity (M ⊆ D — no false statements)

Every statement the model makes is consistent with reality.

**How to verify**:
- **Translate the model to natural language** before showing it to domain experts. In an experiment with n=196 ([[sources/2012-ottensooser-graphical-vs-textual]]), untrained readers gained *no statistically significant* understanding from BPMN alone (p=0.15), while written use cases helped both trained and untrained readers (p<0.01).
- A useful artefact is a **structured written use case** (trigger, primary actor, main success scenario as numbered steps, extensions for exceptions) — present it *before* the BPMN; that order maximises comprehension across audiences (Ottensooser H5, p<0.01).
- Walk the expert through the natural-language version and let them point out falsifications.

**Common validity failures**:
- An activity that "sometimes happens" modelled as always happening (no XOR gateway).
- Activity granularity mismatch — model shows a step the expert doesn't recognise as a coherent task.
- Wrong actor — task assigned to the wrong lane.
- Hidden upstream dependency — an activity is shown as triggered by event X when it actually requires X *and* Y.

### 6.2 Completeness (D ⊆ M — no missing essential paths)

No essential alternative path is missing — but per SEQUAL, *absolute* completeness is infeasible (D too large). Practical target: **feasible completeness** — all paths the modelling goal requires are present.

**How to verify**:
- Active probing: "what other outcome is possible here?"; "who else could perform this?"; "is there a scenario where this branch is skipped?"
- Sunny-day vs rainy-day balance: capturing only the sunny-day path is a well-known interview pitfall ([[sources/2018-dumas-fundamentals-of-bpm-ch5-discovery]]). Use rainy-day questions derived from the exception taxonomy:
  - Internal business exception (out-of-stock, ineligible application).
  - External business exception (customer cancels, supplier defaults).
  - Internal technology exception (system unresponsive).
  - Activity timeout (deadline missed).
- Cross-handoff verification — when activity A is followed by activity B in the model, separately ask the performer of B what they receive and from whom; mismatches reveal incomplete or wrong handoffs.
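The probing questions above can be generated mechanically per activity, so that no activity leaves the validation session without a rainy-day check. The question wording, `EXCEPTION_PROBES` and `completeness_probes` are hypothetical; the four keys follow the exception taxonomy listed above, and handoff questions are addressed to the receiving performer.

```python
# Hypothetical question templates, one per exception class in the taxonomy.
EXCEPTION_PROBES = {
    "internal business exception":
        "What happens if '{a}' cannot go ahead for a reason on our side (e.g. out of stock, ineligible)?",
    "external business exception":
        "What happens if the customer or supplier cancels or defaults during '{a}'?",
    "internal technology exception":
        "What happens if the system used for '{a}' is unavailable?",
    "activity timeout":
        "Is there a deadline for '{a}', and what happens when it is missed?",
}

def completeness_probes(activities, handoffs):
    """activities: activity labels; handoffs: (sender, receiver) label pairs."""
    questions = [(a, kind, template.format(a=a))
                 for a in activities
                 for kind, template in EXCEPTION_PROBES.items()]
    # Cross-handoff verification: ask the receiver, not the sender.
    questions += [(b, "handoff",
                   f"Performer of '{b}': what do you receive before starting, and from whom? "
                   f"(model says: from '{a}')")
                  for a, b in handoffs]
    return questions

qs = completeness_probes(["Check invoice", "Approve invoice"],
                         [("Check invoice", "Approve invoice")])
for activity, kind, question in qs:
    print(f"[{kind}] {question}")
```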

**Common completeness failures**:
- No exception path from any activity (sunny-day model only).
- No timeout handling for activities with implicit deadlines.
- Backward-flow (rework) loops missing.
- Authorisation/approval branches absent.

### 6.3 Iteration discipline

Expect ≥2 validation iterations per domain expert ([[sources/2018-dumas-fundamentals-of-bpm-ch5-discovery]] §5.2.2). Final approval from the **process owner** closes semantic validation (§5.4.2). Mark uncertainty *on the model* (e.g. a coloured sticky note or annotation) so the next interview has targeted questions.

### 6.4 Why the analyst cannot validate semantically alone

Dumas §5.1.2 challenge 1, **fragmented knowledge**: tasks are split across specialists, and each expert has deep local knowledge but only partial, often conflicting assumptions about upstream and downstream steps. The analyst can only catch this by interviewing both ends of every handoff and confronting the disagreement.

---

## 7. Layer 4 — Pragmatic quality (is the model understandable and actionable?)

SEQUAL's revised pragmatic quality is *more demanding* than mere comprehension: the model must enable **learning** (knowledge gain) and **action** (domain change toward goal). For an as-is model, action-enablement reduces to "the model can be read by stakeholders well enough to ground the next-phase decision" (analysis, redesign, sign-off).

### 7.1 7PMG — the operational toolkit

[[concepts/7pmg]] is the empirically-grounded distillation of pragmatic-quality engineering for control-flow models:

| # | Guideline | Mechanical check |
|---|---|---|
| **G1** | Few elements | Count nodes; flag if growing without reason |
| **G2** | Min routing paths per element | Max connector degree (in+out) ≤ 4; avg ≤ 3 |
| **G3** | One start, one end | Count start/end events |
| **G4** | Structured (split-join matched) | Mismatch metric → 0 |
| **G5** | Avoid OR | OR-connector count → 0 |
| **G6** | Verb-object labels | Label-style audit |
| **G7** | Decompose if >50 | Total elements ≤ 50 per view |

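Where possible, apply the guidelines as transformations that preserve behaviour modulo branching bisimulation; the model then becomes more readable without behavioural change. Not every unstructured fragment (G4) or OR-join (G5) has a behaviour-equivalent replacement, so re-check semantics after any G4/G5 rewrite.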

### 7.2 Empirical evidence

[[sources/2010-mendling-reijers-vanderaalst-7pmg]] cites the underlying studies:
- **Process model understanding** (n=73 across TU/e + Madeira + Vienna): OR-joins and average connector degree negatively correlate with comprehension.
- **Error probability** on SAP Reference Model (600 EPCs) and an industry corpus (2000 EPCs): size and complexity drive errors; >50 elements → >50% error probability.
- **Label ambiguity** (n=29 postgrad experiment): verb-object significantly less ambiguous and more useful than action-noun.

[[sources/2012-ottensooser-graphical-vs-textual]] additionally shows (n=196, p<0.01) that **a structured written use case presented BEFORE the BPMN diagram yields the highest comprehension across all reader groups**, trained and untrained alike. Implication: pragmatic quality for mixed audiences requires pairing the BPMN with a written companion document.

### 7.3 Bottom-up complexity metrics

Where 7PMG is the *qualitative* checklist, [[concepts/process-model-complexity-metrics]] provides the underlying quantitative measures:

| Metric | What it captures | 7PMG link |
|---|---|---|
| **\|N\|, \|A\|** | Size (number of nodes, number of arcs) | G1, G7 |
| **Density = \|A\| / (\|N\|·(\|N\|-1))** | Interconnectedness | G1 |
| **avg / max connector degree** | Routing complexity per node | G2 |
| **CFC (Cardoso Control-Flow Complexity)** | Sum over split gateways of possible post-split states (XOR: fan-out n; OR: 2ⁿ−1; AND: 1) | G2, G5 |
| **Mismatch** | Splits unmatched by joins of the same type | G4 |
| **OR-connector count** | OR usage | G5 |
| **Cross-connectivity** | Causal coherence | (cross-cutting) |
| **Depth** | Nesting level | (G7) |

Compute these statically with ProM or programmatic BPMN parsers; report as a quality dashboard alongside the model.
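A minimal sketch of such a dashboard, assuming the model has already been reduced to a typed node dict and an arc list (this encoding, the type names and both functions are hypothetical). CFC uses Cardoso's per-split weights (XOR: fan-out; OR: 2^fan-out − 1; AND: 1) and the flags use the thresholds from the §7.1 table. `mismatch_count` is a simplified count-based proxy (per gateway type, the difference between the number of splits and joins), which may differ from the degree-based definition in the 7PMG literature.

```python
from collections import Counter

GATEWAYS = {"xor", "and", "or"}

def model_metrics(nodes: dict[str, str], arcs: list[tuple[str, str]]) -> dict:
    """nodes: id -> type ("start", "end", "task", "xor", "and", "or"); arcs: (src, dst)."""
    n, a = len(nodes), len(arcs)
    indeg = Counter(dst for _, dst in arcs)
    outdeg = Counter(src for src, _ in arcs)
    gateways = [v for v, t in nodes.items() if t in GATEWAYS]
    degrees = [indeg[g] + outdeg[g] for g in gateways]
    splits, joins, cfc = Counter(), Counter(), 0
    for g in gateways:
        kind, fan_out = nodes[g], outdeg[g]
        if fan_out > 1:
            splits[kind] += 1
            cfc += {"xor": fan_out, "or": 2 ** fan_out - 1, "and": 1}[kind]
        if indeg[g] > 1:
            joins[kind] += 1
    kinds = Counter(nodes.values())
    return {
        "N": n, "A": a,
        "density": a / (n * (n - 1)) if n > 1 else 0.0,
        "max_degree": max(degrees, default=0),
        "avg_degree": sum(degrees) / len(degrees) if degrees else 0.0,
        "CFC": cfc,
        "cyclomatic": a - n + 2,
        # Simplified proxy: per gateway type, |#splits - #joins|.
        "mismatch_count": sum(abs(splits[k] - joins[k]) for k in GATEWAYS),
        "or_count": kinds["or"], "starts": kinds["start"], "ends": kinds["end"],
    }

def pmg_flags(m: dict) -> list[str]:
    """Map metrics onto the 7PMG thresholds used in this guide."""
    flags = []
    if m["max_degree"] > 4 or m["avg_degree"] > 3:
        flags.append("G2")
    if m["starts"] > 1 or m["ends"] > 1:
        flags.append("G3")
    if m["mismatch_count"] > 0:
        flags.append("G4")
    if m["or_count"] > 0:
        flags.append("G5")
    if m["N"] > 50:
        flags.append("G7")
    return flags

# AND-split closed by an XOR-join (the lack-of-synchronisation pattern of §5.3).
nodes = {"s": "start", "split": "and", "A": "task", "B": "task", "merge": "xor", "e": "end"}
arcs = [("s", "split"), ("split", "A"), ("split", "B"),
        ("A", "merge"), ("B", "merge"), ("merge", "e")]
m = model_metrics(nodes, arcs)
print(m["density"], m["CFC"], m["mismatch_count"], pmg_flags(m))   # 0.2 1 2 ['G4']
```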

### 7.4 Pragmatic-quality red flags from interview-derived models

| Red flag | Likely cause | Fix |
|---|---|---|
| 70+ activities in one diagram | Insufficient sub-process abstraction | Apply G7; identify single-entry-single-exit blocks |
| Multiple start events (each interview's "trigger" became an event) | Discovery-phase artefact | Apply G3; use one start with subsequent XOR |
| OR-join in the middle of the model | Analyst hedging on synchronisation | Apply G5; replace with XOR or AND, whichever matches the actual synchronisation, with explicit conditions |
| Long noun-phrase labels ("Complaint analysis and processing") | Dictation-style transcription from interviews | Apply G6; rewrite as verb-object |
| Connector with degree 6+ | Analyst aggregating multiple choice points | Apply G2; split into nested gateways |
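The G6 label audit can be approximated with a lexical heuristic: a label counts as verb-object when it starts with a known verb and has an object, as action-noun when a word carries a typical nominalisation suffix, and as other otherwise. The verb lexicon, the suffix list and `classify_label` are hypothetical; a part-of-speech tagger would be more robust, and borderline labels still need a human reviewer.

```python
import re

# Hypothetical, deliberately small verb lexicon; extend it for the domain.
VERBS = {"approve", "archive", "assess", "check", "create", "enter", "file",
         "notify", "pay", "prepare", "process", "receive", "record", "register",
         "reject", "review", "send", "ship", "sign", "submit", "update",
         "validate", "verify", "write"}
# Suffixes that usually mark an action noun ("analysis", "approval", ...).
ACTION_NOUN_SUFFIXES = ("tion", "sion", "ment", "ance", "ence", "ing", "ysis", "val")

def classify_label(label: str) -> str:
    words = re.findall(r"[a-z]+", label.lower())
    if len(words) >= 2 and words[0] in VERBS:
        return "verb-object"
    if any(w.endswith(ACTION_NOUN_SUFFIXES) for w in words):
        return "action-noun"
    return "other"

for label in ["Check invoice", "Complaint analysis and processing", "Invoice"]:
    print(f"{label!r}: {classify_label(label)}")
```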

---

## 8. Layer 5 — Organisational quality, social quality, conformance dimensions

### 8.1 Organisational quality (M vs G)

Does the model fulfil the modelling goal? An as-is model produced for *redesign* has different goal-fitness criteria than one produced for *documentation* or for *executable workflow specification*.

For each goal:

| Goal | Quality emphasis |
|---|---|
| Documentation | Pragmatic (readable for newcomers) + semantic (validity) |
| Compliance evidence | Semantic (completeness on rainy-day paths) + soundness |
| Redesign baseline | Semantic (validity, completeness) + complexity metrics for diagnostics |
| Simulation | Semantic + soundness + quantitative annotations (durations, probabilities) |
| Executable workflow | Soundness + executable BPMN constraints (data types, mappings, [[frameworks/dmn]] for rules) |

Closing organisational quality means **process owner sign-off**, not analyst self-assessment ([[sources/2018-dumas-fundamentals-of-bpm-ch5-discovery]] §5.4.2).

### 8.2 Social quality (agreement among stakeholders)

When multiple stakeholders reviewed the model and disagree, what does "the model" represent?

- Workshop discovery surfaces disagreement faster than serial interviews ([[methods/process-discovery-methods]] §5.2.3).
- Where disagreement persists, **annotate the model** with the disagreement as documentation rather than silently picking one side.
- The model can be sound and pragmatic and still **socially un-agreed** — that is a separate quality axis SEQUAL flags.

### 8.3 Conformance dimensions (when an event log exists)

[[concepts/conformance-checking]] provides four log-relative quality dimensions that become applicable *after* discovery, when monitoring or process mining begins:

- **Fitness** — can the model replay the traces in the log?
- **Precision** — does the model only allow behaviour seen in the log, or over-generalise?
- **Generalisation** — will the model handle unseen-but-plausible future behaviour?
- **Simplicity** — Occam: smallest adequate explanation.

These trade off against each other, loosely analogous to the [[concepts/devils-quadrangle|Devil's Quadrangle]] in redesign. They are **not relevant for a freshly-built as-is model with no log**, but should be revisited once an event log is available (the model can then be cross-validated against actual executions).
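For a loop-free model whose set of complete traces can be enumerated, trace-level proxies give a first feel for fitness and precision. These are deliberately simplified stand-ins (exact trace membership, and the share of model traces actually observed), not the token-replay or alignment-based measures of [[concepts/conformance-checking]]; generalisation and simplicity are not computed. The trace-set encoding and both function names are hypothetical.

```python
def trace_fitness(model_traces: set[tuple[str, ...]], log: list[list[str]]) -> float:
    """Share of log traces (counting repeats) that the model accepts exactly."""
    if not log:
        raise ValueError("empty log")
    return sum(tuple(t) in model_traces for t in log) / len(log)

def naive_precision(model_traces: set[tuple[str, ...]], log: list[list[str]]) -> float:
    """Share of the model's traces that were actually observed in the log."""
    observed = {tuple(t) for t in log}
    return len(model_traces & observed) / len(model_traces)

model = {("register", "check", "approve"), ("register", "check", "reject")}
log = [["register", "check", "approve"]] * 3 + [["register", "approve"]]
print(trace_fitness(model, log))     # 0.75
print(naive_precision(model, log))   # 0.5
```

A non-fitting trace (here, approval without a check) is exactly the kind of evidence that should send the analyst back to the semantic layer.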

---

## 9. How the layers trade off

The layers are not all independent. Some interact:

| Interaction | Description |
|---|---|
| **Soundness ↔ pragmatic structuredness** | 7PMG G4 "structured" simplifies soundness analysis (free-choice) — they reinforce. |
| **Semantic completeness ↔ pragmatic simplicity** | Adding rainy-day paths (semantic completeness) increases size (pragmatic G1, G7 violations). Resolution: decompose to sub-processes (G7). |
| **Pragmatic G1 (few elements) ↔ G2 (low routing degree)** | Reducing degree often requires adding intermediate connectors. 7PMG explicitly notes this; the paper's pragmatic priority ordering is workshop-derived. |
| **Pragmatic G6 (verb-object) ↔ semantic specificity** | "Write down complaint" is shorter than "Write down complaint with form AZ2" but loses information. Trade depends on modelling goal — executable models need specificity; documentation can drop it. |
| **Soundness ↔ semantic validity** | A model can be sound (no deadlocks) yet semantically invalid (e.g., wrong activity order). Soundness is a gate, not a target. |
| **Pragmatic ↔ social** | A perfectly-structured model may suppress disagreement. Annotated disagreement is more socially valid but pragmatically messier. |

There is no global optimum. **Resolve by modelling goal** (organisational quality): compliance-driven models prioritise completeness + soundness; redesign-driven models prioritise complexity-metrics + structure for diagnosis; documentation-driven models prioritise pragmatic G6 + semantic validity.

---

## 10. LLM-assisted review rubric

This section is the operational deliverable: a structured prompt the analyst (or an LLM running review automation) applies to a candidate BPMN model.

### 10.1 Required inputs to the LLM

1. **The BPMN model** — preferred: BPMN 2.0 XML (machine-parsable). Acceptable: rendered diagram (image) + activity/gateway list. Acceptable for behavioural review: textual representation (activity → next-activity table + gateway type per fork).
2. **Modelling goal** — one of: documentation · compliance · redesign baseline · simulation · executable workflow. Influences prioritisation across layers.
3. **Audience description** — who will read the model? Domain experts in this process · domain experts in adjacent processes · trained business analysts · executives · modelling novices.
4. **Available companion artefacts** — written use case · interview transcripts · prior model version · event log (if any).

### 10.2 The structured review prompt

```text
You are a BPMN model reviewer applying the Process Model Quality &
Soundness Evaluation Guide. Output a structured report with five
layered findings (L1–L5) and a final verdict.

For each layer, output:
- VERDICT: pass / pass-with-issues / fail / not-evaluable
- FINDINGS: bullet list of concrete issues with element-level
  references (use BPMN element IDs or labels).
- FIXES: for each finding, propose the smallest fix that resolves
  it while preserving the intended behaviour, citing the relevant
  guideline (e.g. "Apply 7PMG G3" or "Restructure to satisfy
  proper completion"). Soundness fixes necessarily change the
  modelled behaviour; say so explicitly.
- SEVERITY: critical (must fix) / high / medium / low.

LAYER 1 — PHYSICAL & EMPIRICAL
- Is the model rendered legibly?
- Layout: predominant flow direction, crossing arcs, alignment.
- Spacing / font / colour — purposeful?

LAYER 2A — NOTATION-SYNTACTIC (BPMN rules)
Check:
- Sequence flows connect only flow elements.
- Tasks single-in / single-out (or via gateway).
- Start events have no incoming flow; end events no outgoing.
- Boundary events attached to activities only.
- Gateway types set; XOR has default if conditions are partial.
- Pool/lane assignment correct; message flows cross pools only.
- Data objects connected via association.

LAYER 2B — BEHAVIOURAL-SYNTACTIC (soundness)
Reason about the WF-net translation. Verify the three classical
soundness conditions:
1. Option to complete: from every reachable state, the end event is
   reachable. Identify any deadlocks.
2. Proper completion: when the end event fires, no orphan tokens
   remain on parallel branches. Identify any "lost" tokens.
3. No dead transitions: every activity is executable from some
   reachable state. Identify any unreachable activities.
Mention applicable verification: free-choice WF-net implies
polynomial-time decidability (Aalst 1997, Theorem 12).

LAYER 3 — SEMANTIC (descriptive — model vs reality)
This layer requires the candidate model alongside the available
companion artefacts (use case, transcripts).
- VALIDITY: are all modelled elements consistent with the
  available evidence? Flag any element not supported by the
  source material.
- COMPLETENESS: identify likely-missing elements by applying the
  exception taxonomy:
    a) internal business exception
    b) external business exception
    c) internal technology exception
    d) activity timeout
  For each activity, ask: is at least one rainy-day path modelled?
  If transcripts are absent, mark this layer "not-evaluable" and
  suggest the analyst conduct a completeness validation
  conversation with the domain expert.
- HANDOFFS: for each cross-lane handoff, flag the receiving
  activity and ask whether the receiver's expectations are
  documented and consistent.

LAYER 4 — PRAGMATIC (understandability & action-enablement)
Apply 7PMG (Mendling, Reijers, Aalst 2010):
- G1 size: count nodes; flag if > 30 in single view.
- G2 routing: max and average connector degree; flag if max > 4
  or average > 3.
- G3 single start/end: count; flag if > 1 of either.
- G4 structuredness: identify split connectors without matching
  joins of the same type (mismatch metric); flag each.
- G5 OR usage: count OR connectors; flag any.
- G6 labels: classify each activity label as verb-object,
  action-noun, or other; flag non-verb-object.
- G7 decomposition: if total elements > 50, propose subprocess
  candidates (single-entry-single-exit blocks).

Also compute and report:
- Density: |A| / (|N|·(|N|-1)). Flag if > 0.10.
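- Cyclomatic complexity: |A| - |N| + 2. Flag if > 10.
- CFC (Cardoso): sum over split connectors of XOR fan-out,
  2^fan-out - 1 per OR-split, and 1 per AND-split.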

For the audience type, comment on whether the BPMN should be
paired with a written use case (Ottensooser et al. 2012:
non-experts gain no statistically significant understanding from
BPMN alone; written use cases help all readers; UC-then-BPMN
ordering maximises comprehension).

LAYER 5 — ORGANISATIONAL & SOCIAL
- Does the model fit the modelling goal? (state goal upfront)
- Are there annotated disagreements / parking-lot items?
- Is process-owner sign-off recorded?

FINAL VERDICT
- If any layer is "fail" or has "critical" severity findings: the
  model is NOT ready for sign-off; list the gating issues.
- If "pass-with-issues": list the highest-priority fix order
  using this default priority (override by modelling goal):
  1. Soundness (L2b critical) — must fix.
  2. Notation-syntactic (L2a critical) — must fix.
  3. Semantic completeness (L3) — must fix for compliance/redesign
     goals.
  4. 7PMG G3 single start/end (enables soundness).
  5. 7PMG G5 (eliminate OR).
  6. 7PMG G4 (structuredness).
  7. 7PMG G2 (degree reduction).
  8. 7PMG G1 / G7 (size / decomposition).
  9. 7PMG G6 (label rewrite).
- If "pass": confirm sign-off readiness.

Output JSON:
{
  "L1_physical_empirical": {...},
  "L2a_notation_syntactic": {...},
  "L2b_behavioural_syntactic_soundness": {...},
  "L3_semantic": {...},
  "L4_pragmatic": {
    "G1_size": {...},
    "G2_routing": {...},
    "G3_start_end": {...},
    "G4_structuredness": {...},
    "G5_or_usage": {...},
    "G6_labels": {...},
    "G7_decomposition": {...},
    "metrics": {"|N|": ..., "|A|": ..., "density": ..., "max_degree": ..., "avg_degree": ..., "CFC": ..., "cyclomatic": ..., "mismatch_count": ...}
  },
  "L5_organisational_social": {...},
  "verdict": "pass" | "pass-with-issues" | "fail" | "not-evaluable",
  "priority_fix_list": [...]
}
```
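Because the prompt asks for JSON, the reply can be checked mechanically before anyone reads it. The prompt leaves each layer's contents as `{...}`, so this sketch assumes every layer object carries a `verdict` and a `findings` list whose items have a `severity`; `check_report` is hypothetical. It enforces the final-verdict rule, reading "NOT ready for sign-off" as incompatible with a final `pass` or `pass-with-issues`.

```python
import json

LAYERS = ["L1_physical_empirical", "L2a_notation_syntactic",
          "L2b_behavioural_syntactic_soundness", "L3_semantic",
          "L4_pragmatic", "L5_organisational_social"]
VERDICTS = {"pass", "pass-with-issues", "fail", "not-evaluable"}

def check_report(raw: str) -> list[str]:
    """Return schema/consistency problems in a rubric report (empty list = OK)."""
    report = json.loads(raw)
    problems, gating = [], False
    for key in LAYERS:
        layer = report.get(key)
        if not isinstance(layer, dict):
            problems.append(f"missing layer {key}")
            continue
        verdict = layer.get("verdict")
        if verdict not in VERDICTS:
            problems.append(f"{key}: invalid verdict {verdict!r}")
        critical = any(f.get("severity") == "critical" for f in layer.get("findings", []))
        gating = gating or verdict == "fail" or critical
    if report.get("verdict") not in VERDICTS:
        problems.append("invalid final verdict")
    elif gating and report["verdict"] in {"pass", "pass-with-issues"}:
        # Rubric rule: a failing layer or critical finding blocks sign-off.
        problems.append("final verdict ignores a failing layer or critical finding")
    return problems

layers = {k: {"verdict": "pass", "findings": []} for k in LAYERS}
layers["L2b_behavioural_syntactic_soundness"] = {
    "verdict": "fail",
    "findings": [{"severity": "critical", "text": "XOR-split closed by AND-join"}],
}
print(check_report(json.dumps({**layers, "verdict": "pass"})))
print(check_report(json.dumps({**layers, "verdict": "fail"})))
```

Rejecting an internally inconsistent report and re-prompting is cheaper than letting a "pass" slip past a critical soundness finding.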

### 10.3 What the LLM cannot do

- **Layer 3 (semantic) requires evidence**: without transcripts, use cases, or expert access, semantic validity is unknowable, and the LLM should mark "not-evaluable" rather than hallucinate. This is a hard limit: Dumas's challenge 1, fragmented knowledge ([[sources/2018-dumas-fundamentals-of-bpm-ch5-discovery]] §5.1.2), applies to the LLM as much as to the analyst. The model is not the territory.
- **Layer 5 organisational sign-off** is a human commitment, not an LLM artefact. The LLM can flag missing sign-off but not provide it.
- **Soundness on non-trivial BPMN** may require a real verification tool (WOFLAN, ProM); Camunda Modeler's structural validation is not a soundness check. The LLM can flag candidate violations but should not claim definitive soundness for complex models without tool corroboration.

### 10.4 Calibration notes

Run the rubric on a known-good model first to baseline the LLM's calibration. Common drift patterns:
- LLMs over-flag G1 size violations on intentionally large but well-decomposed models — anchor with G7 decomposition reasoning.
- LLMs sometimes mistake an XOR gateway with a default flow for an OR construct; verify with explicit gateway-type metadata.
- LLMs may confidently call a model "sound" without doing the formal reduction — require the LLM to *construct the WF-net translation* explicitly before claiming soundness.

---

## 11. Anti-patterns (top 12)

Consolidating Dumas §5.4 + 7PMG paper §3.4 + interview-synthesis observations:

1. **Sunny-day-only model** — no rainy-day paths. Most common discovery failure.
2. **Multiple start events** — one per interview "trigger". Violates G3.
3. **Implicit join** — multiple sequence flows into a task without a gateway.
4. **OR-join with vague semantics** — analyst hedging. Replace with XOR or AND.
5. **Mismatched split-join types** — AND-split → XOR-join (proper completion violation).
6. **Activity at micro-step level** ("put document on fax machine") — violates Dumas activity-level discipline.
7. **Single-source authority for cross-lane handoff** — receiving end never validated.
8. **Long noun-phrase labels** — violates G6.
9. **Connector with degree 6+** — should split.
10. **Loop without exit condition** — livelock; soundness violation.
11. **Showing raw BPMN to domain expert for validation** — Ottensooser counter-evidence.
12. **Treating every complaint as a structural variant** — frequency-blind modelling. Use frequency questions ([[sources/2018-dumas-fundamentals-of-bpm-ch5-discovery]] §5.2.2 and §6.3.1) to separate sporadic anecdote from structural variant.
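Three of these anti-patterns (2, 3 and 9) are purely structural, so they can be flagged mechanically before anyone reads the model. The sketch below is minimal. It works on a hypothetical export in which `kinds` maps node ids to labels and `arcs` lists sequence flows; this is not bpmnlint's rule API. The degree threshold of 6 comes from item 9. Counting only tasks with more than one incoming flow as implicit joins is an assumption about how the export represents nodes.

```python
from collections import defaultdict

GATEWAYS = {"xor", "and", "or"}

def structural_antipatterns(kinds: dict[str, str], arcs: list[tuple[str, str]],
                            min_flagged_degree: int = 6) -> list[str]:
    """Flag anti-patterns 2 (multiple starts), 3 (implicit join) and 9 (high-degree connector)."""
    incoming: dict[str, int] = defaultdict(int)
    degree: dict[str, int] = defaultdict(int)
    for src, tgt in arcs:
        incoming[tgt] += 1
        degree[src] += 1
        degree[tgt] += 1
    findings = []
    starts = sorted(n for n, k in kinds.items() if k == "start")
    if len(starts) > 1:
        findings.append(f"#2 multiple start events: {starts}")
    for node, kind in sorted(kinds.items()):
        if kind == "task" and incoming[node] > 1:
            findings.append(f"#3 implicit join into task {node!r}")
        if kind in GATEWAYS and degree[node] >= min_flagged_degree:
            findings.append(f"#9 connector {node!r} has degree {degree[node]}")
    return findings
```

The remaining items (sunny-day-only models, micro-step activities, frequency-blind variants) depend on what the interviews actually said, so they stay with the human reviewer.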

---

## 12. Evaluation workflow (recommended order)

For a candidate model, evaluate in this order — early failures invalidate later layers:

```
1. L1 (physical/empirical) — quick, near-zero-cost
       │
       ▼
2. L2a (notation-syntactic) — mechanical, fast (Camunda / bpmnlint)
       │  ─ if fails: fix, return to step 2
       ▼
3. L2b (soundness) — mechanical, polynomial for free-choice
       │  ─ if fails: must fix before semantic validation;
       │            soundness violations make semantic walk-through pointless
       ▼
4. L4 partial (7PMG G3 only) — single start/end enables proper soundness
       │  (loop back to step 3 if changes affect behaviour)
       ▼
5. L3 (semantic) — translate to natural language; expert walkthrough.
       │   This is the expensive, irreducibly-human step.
       ▼
6. L4 full (7PMG G1, G2, G4, G5, G6, G7 + metrics) — refinement
       │  (these transformations preserve behaviour; do not loop to L3)
       ▼
7. L5 (organisational) — process owner sign-off
       │
       ▼
8. (when log exists) Conformance dimensions — fitness, precision,
   generalisation, simplicity (§8.3) — model maintenance phase
```
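The ordering above amounts to a gated pipeline: each layer is evaluated only if every earlier gate passed, because a later verdict would be invalidated by the fix. Below is a minimal sketch of that control flow. The gate labels are hypothetical shorthands for steps 1 to 7. The check functions are placeholders the caller supplies, for example a bpmnlint run for L2a, a soundness checker for L2b, or a recorded expert or owner sign-off for L3 and L5. The loop-back arrows are not modelled; in practice they mean "fix the model and run the pipeline again".

```python
from typing import Callable, Optional

# Steps 1-7 of the workflow; step 8 (conformance) needs an event log and runs separately.
GATES = ["L1", "L2a", "L2b", "L4-G3", "L3", "L4-full", "L5"]

def run_gates(model: object,
              checks: dict[str, Callable[[object], bool]]) -> tuple[Optional[str], list[str]]:
    """Return (first failing gate, or None if all pass; gates that passed, in order)."""
    passed: list[str] = []
    for gate in GATES:
        if not checks[gate](model):
            return gate, passed  # later gates are skipped, not reported as failures
        passed.append(gate)
    return None, passed
```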

Rule of thumb: **fix soundness before validating semantically**. If the model contains a deadlock, the expert will either notice it, spending scarce walkthrough time on a defect a checker could have caught, or, worse, miss it while it invalidates other parts of the walkthrough. Resolve correctness before correspondence.

---

## 13. Acknowledged gaps (next ingest candidates)

This guide rests on the primary sources listed in the frontmatter, but several adjacent sources would tighten its rigour:

- **Lindland, Sindre, Sølvberg 1994 (IEEE Software)** — the *original* SEQUAL paper (this guide uses the Krogstie et al. 2006 revision).
- **Mendling 2008** *Metrics for Process Models* — the bottom-up complexity-metrics monograph; would deepen [[concepts/process-model-complexity-metrics]].
- **Verbeek, Basten, van der Aalst 2001** *Diagnosing Workflow Processes Using Woflan* — would give concrete diagnostics for soundness violations.
- **Trcka, van der Aalst, Sidorova 2009** *Data-Flow Anti-Patterns* — adds a data-aware soundness layer.
- **Reijers & Mendling 2008** *Modularity in Process Models* — empirical foundation for the G7 decomposition threshold.
- **Rosemann 22 Modelling Pitfalls** — the exhaustive anti-pattern catalogue (referenced from [[syntheses/interview-structuring-for-process-models]]).
- **Moody 2009** *The 'Physics' of Notations* — pragmatic-quality theory for visual notations.
- **Cardoso 2005** *Control-Flow Complexity for Web Processes* — primary source for the CFC metric.

When these are ingested, this synthesis can be upgraded delta by delta, particularly §5 (richer soundness variants and diagnostics), §7 (deeper metric calibration), and §11 (the full Rosemann 22-pattern catalogue).

---

## 14. Quick-reference card

```
LAYER        CRITERION                    HOW VERIFIED                        OWNER
──────────────────────────────────────────────────────────────────────────────────────────────
L1 phys/emp  Accessible, legible          BPMN tool                           Analyst
L2a notation BPMN rules                   bpmnlint / Camunda                  Analyst
L2b sound    Option-to-complete           Soundness checker (free-choice      Analyst
             Proper-completion             ⇒ polynomial)
             No dead transitions
L3 semantic  Validity (M ⊆ D)             Expert NL walkthrough               Domain expert
             Completeness (D ⊆ M)         + rainy-day taxonomy probing        Analyst + expert
L4 pragmatic G1–G7 + metrics              Static analyser + reader test       Analyst (+ LLM)
L5 organis   Goal fulfilment              Owner sign-off                      Process owner
──────────────────────────────────────────────────────────────────────────────────────────────
(post-go-live, with log)
             Fitness, Precision,          ProM / Disco / Celonis              Analyst
             Generalisation, Simplicity
```

---

## Sources & related

**Primary**: [[sources/1998-vanderaalst-verification-of-workflow-nets]] · [[sources/2010-mendling-reijers-vanderaalst-7pmg]] · [[sources/2006-krogstie-sindre-jorgensen-revised-sequal-framework]] · [[sources/2018-dumas-fundamentals-of-bpm]] (§5.4 quality, §11.4 conformance) · [[sources/2018-dumas-fundamentals-of-bpm-ch5-discovery]] · [[sources/2012-ottensooser-graphical-vs-textual]]

**Concepts**: [[concepts/process-model-quality]] · [[concepts/soundness]] · [[concepts/workflow-net]] · [[concepts/7pmg]] · [[concepts/sequal-framework]] · [[concepts/process-model-complexity-metrics]] · [[concepts/conformance-checking]]

**Companion synthesis (upstream)**: [[syntheses/interview-structuring-for-process-models]] — the elicitation guide that produces the model this synthesis evaluates.

**Frameworks**: [[frameworks/bpmn]] · [[frameworks/declare]] (declarative analogue) · [[frameworks/dmn]] (decision-logic adjunct)