---
title: "Fundamentals of BPM (2018) — Ch. 5: Process Discovery"
type: source
tags: [bpm, discovery, elicitation, interview, workshop, modeling-method]
authors: ["Dumas, Marlon", "La Rosa, Marcello", "Mendling, Jan", "Reijers, Hajo A."]
year: 2018
venue: "Springer (2nd ed.), Chapter 5"
kind: chapter
raw_path: "raw/Process Frameworks & BPM/Fundamentals_of_BPM 2018.pdf"
sources: ["[[sources/2018-dumas-fundamentals-of-bpm]]"]
key_claims:
  - "Process discovery has four tasks: define setting, gather information, model, assure quality (§5.1)."
  - "Two roles: process analyst (modelling skill) and domain expert (process knowledge); process owner secures commitment (§5.1.1)."
  - "Three discovery challenges: fragmented knowledge, thinking in cases, unfamiliarity with modelling languages (§5.1.2)."
  - "Three method classes: evidence-based (documents, observation, automated), interview-based, workshop-based (§5.2)."
  - "Interview method is an iterative Interview → Modeling → Validation cycle (Fig 5.4), usually ≥2 iterations."
  - "Balance structured questioning (validate hypotheses, ~45 min) with free-form (~15 min) to avoid sunny-day bias."
  - "Rainy-day questions are essential; derive from internal/external exceptions and activity timeouts."
  - "Five-step modelling method: boundaries → activities/events → resources/handoffs → control flow → additional elements (§5.3)."
created: 2026-04-15
updated: 2026-04-15
---

# Fundamentals of BPM (2018) — Ch. 5: Process Discovery

Chapter deep-dive into [[sources/2018-dumas-fundamentals-of-bpm]] §5, focused on the **human-driven** elicitation material used to construct as-is process models. Back-links to the meta source page.

## 5.1 Setting

Process discovery = "gathering information about an existing process and organising it as an as-is process model." Four tasks:
1. Define the setting (team, scope).
2. Gather information (method choice — see §5.2).
3. Conduct modelling (§5.3 five-step method).
4. Assure model quality (§5.4 — syntactic / semantic / pragmatic).

### Roles (§5.1.1)
- **Process analyst** — modelling-language skill, gathering/organising info.
- **Domain expert** — intimate process knowledge; typically process participant, operational manager, or external role (customer, supplier).
- **Process owner** — secures commitment.

### Three challenges (§5.1.2)
1. **Fragmented knowledge** — tasks split across specialists; abstract view vs. deep local view; conflicting upstream/downstream assumptions.
2. **Thinking in cases** — experts describe specific instances, not general flow. Analyst must *reverse-engineer* the routing rules by asking about conditions, outcomes, deadlines.
3. **Unfamiliarity with modelling languages** — don't show BPMN to a domain expert; translate back to natural language for validation.

### Expert-analyst heuristics (sidebar)
- Get the right people on board — line-manager buy-in before approaching participant.
- Keep a **short, precise set of working hypotheses** at varying detail levels; prepare an extensive set of questions in advance.
- Identify control-flow patterns from phrasing: "alternative / exclusive / conditional" → XOR; "independent / either order" → AND.
- Pay attention to **model quality and layout** — neat diagrams engage stakeholders.
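
The phrasing heuristic in the sidebar can be sketched as a simple keyword lookup. This is only an illustration: the cue lists hold just the words named above, and the names `GATEWAY_CUES` and `suggest_gateway` are hypothetical, not taken from the chapter.

```python
from typing import Optional

# Cue words copied from the sidebar; real interviews need far richer cues.
GATEWAY_CUES = {
    "XOR": ("alternative", "exclusive", "conditional"),
    "AND": ("independent", "either order"),
}


def suggest_gateway(statement: str) -> Optional[str]:
    """Return the gateway type hinted at by an expert's phrasing, if any."""
    text = statement.lower()
    for gateway, cues in GATEWAY_CUES.items():
        # Substring match, so "alternatively" also counts as "alternative".
        if any(cue in text for cue in cues):
            return gateway
    return None
```

For instance, `suggest_gateway("Invoicing and shipping are independent")` suggests an AND gateway, while a statement without any cue word returns `None` and needs a follow-up question.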

## 5.2 Discovery Methods

### 5.2.1 Evidence-based
- **Document analysis** — policies, org charts, forms, handbooks, work instructions, user forms. Issues: outdated, wrong granularity, normative rather than actual.
- **Observation** — active (analyst plays customer) or passive (shadowing). Passive observation risks Hawthorne effect; active only sees customer-visible parts.
- **Automated process discovery** — from event logs; see [[methods/process-mining-basics]].

### 5.2.2 Interview-based — the interview guide
Three-phase iterative cycle (Fig 5.4): **Interview → Modeling → Validation**. ≥2 iterations typical; complex processes need more.

**Two traversal strategies:**
- **Backward** — start from process outcomes (e.g., "order fulfilled"), work back to triggers.
- **Forward** — start from triggers (e.g., "PO received"), follow the flow.
Use both across interviews.

**Per-interviewee checklist** — clarify:
- What **input** is expected from upstream activities.
- What **decisions** are taken.
- What **output** is produced.
- To **what resource** it is then forwarded.

**Structured vs free-form balance** (for a 1-h interview):
- ~45 min **structured** — validate pre-formed hypotheses via predefined questions.
- ~15 min **free-form** — let interviewee raise what they think relevant.
- Pure structured risks a checklist feel that suppresses information; pure free-form wanders.

**Sunny-day pitfall** — asked how a process works, interviewees default to the normal path. Exceptions vanish.

**Rainy-day questions** — explicitly probe the exceptional:
- "How did you handle your most difficult customer?"
- "What was the most difficult case you worked on?"
- "What happens if the customer does not reply on time?"

Derive rainy-day questions systematically from the exception taxonomy:
- **Internal business exception** (e.g., out-of-stock).
- **External business exception** (e.g., customer cancels).
- **Internal technology exception** (e.g., stock-check system unresponsive).
- **Activity timeouts**.
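
One way to apply the taxonomy systematically is to cross every known activity with every exception type. The sketch below assumes that approach; the question wording and the names `EXCEPTION_TEMPLATES` and `rainy_day_questions` are illustrative, not quoted from the chapter.

```python
# Illustrative question templates, one per exception type listed above.
EXCEPTION_TEMPLATES = {
    "internal business exception":
        "What do you do when '{a}' fails for an internal business reason, "
        "such as an item being out of stock?",
    "external business exception":
        "What do you do when an external party disrupts '{a}', "
        "for example the customer cancels?",
    "internal technology exception":
        "What do you do when a system needed for '{a}' is unresponsive?",
    "activity timeout":
        "What happens when '{a}' is not completed on time?",
}


def rainy_day_questions(activities):
    """Return (activity, exception type, question) for every combination."""
    return [
        (activity, kind, template.format(a=activity))
        for activity in activities
        for kind, template in EXCEPTION_TEMPLATES.items()
    ]
```

Calling `rainy_day_questions(["Check stock", "Confirm order"])` yields eight candidate questions; the analyst would then discard the combinations that make no sense for the process at hand.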

**Validation phase** — translate the constructed model to natural language when showing it back; domain experts won't read BPMN confidently. Expect 1–2 validation iterations per expert.

### 5.2.3 Workshop-based
Additional roles beyond analyst: **facilitator** (moderates the discussion), **process modeler** (draws the model live), **scribe** (parks unresolved threads).

Effort: **3–5 sessions**, ≤10–12 participants each; consolidate model between sessions.

**Session 1** — lightweight participatory modelling: sticky notes on a wall, no gateways yet, tasks left-to-right in temporal order; facilitator keeps granularity consistent (no micro-steps like "put document on fax machine").

**Session 2** — brief BPMN primer (start/end, activities, XOR, AND), then validate the model assembled from Session 1 on whiteboard or projector.

**Culture note** — hierarchical/regulated organisations suppress openness; hand-pick participants and facilitator accordingly.

### 5.2.4 Strengths & weaknesses (Tables 5.1, 5.2)

| Aspect | Evidence | Interviews | Workshops |
|---|---|---|---|
| Objectivity | High | Med-high | Med-high |
| Richness | Medium | High | High |
| Time consumption | Low-med | Medium | Medium |
| Immediacy of feedback | Low | High | High |

Start with **document analysis** (readily available), then layer interviews/workshops. Workshops resolve conflicting perceptions fastest but require simultaneous availability. Automated discovery needs a supporting IT system and clean logs.

## 5.3 Five-step Modelling Method
1. **Process boundaries** — start/end events; input/output business objects; use terminate events for negative outcomes.
2. **Activities and events** — enumerate main activities and intermediate events; defer fine detail.
3. **Resources and handoffs** — assign activities to pools/lanes; handoff points mark assumption boundaries and candidate sub-processes.
4. **Control flow** — add XOR/AND, loops, event-based splits; refine handoffs into dependencies.
5. **Additional elements** — data objects/stores, exception handlers, compensation; annotations (cost/risk) per modelling purpose.

Steps map naturally onto workshop sessions (1–2 in one session, then 3, 4, 5 each in a follow-up).

## 5.4 Process Model Quality Assurance

Three quality aspects (Fig 5.9), each paired with its assurance activity:

| Quality aspect | Assurance activity | How it is checked |
|---|---|---|
| **Syntactic** | **Verification** | Mechanical (graph + behavioural rules) |
| **Semantic** | **Validation** | Domain-expert + tools |
| **Pragmatic** | **Certification** | User testing + design-in via guidelines |

In addition, **modelling guidelines and conventions** (e.g., 7PMG — [[concepts/7pmg]], [[sources/2010-mendling-reijers-vanderaalst-7pmg]]) can be used to achieve high quality from the start rather than retrofitting it.

### 5.4.1 Syntactic Quality and Verification

**Syntactic quality** = conformance of a process model to the syntactic rules of the modelling language. Two rule classes:

- **Structural rules** — how elements connect to each other.
- **Behavioural rules** — how the model can be instantiated; whether executions complete properly.

#### Structural rules (BPMN element-level)

| Element class | Rule |
|---|---|
| **Activity** | Must have ≥1 incoming AND ≥1 outgoing sequence flow |
| **Start event** | Must NOT have any incoming sequence flow |
| **End event** | Must NOT have any outgoing sequence flow |
| **Intermediate event** | Must have ≥1 incoming AND ≥1 outgoing sequence flow |
| **Boundary event** | Only **intermediate catching** events may attach to an activity's border |
| **Split gateway** | Exactly 1 incoming, ≥2 outgoing |
| **Join gateway** | ≥2 incoming, exactly 1 outgoing |
| **(X)OR-split outgoing arcs** | Must bear conditions |
| **Sequence flow** | Connects two flow nodes (activity, event, gateway) of the **same pool**; cannot cross pool boundaries |
| **Message flow** | Connects an activity or throwing message event in one pool to an activity or catching message event in a different pool |
| **Directed data association** | Connects data object ↔ activity or throwing message event; data store ↔ activity (in either direction) |
| **Undirected data association** | Connects data object ↔ sequence flow OR text annotation ↔ any element |
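
The degree constraints in the first rows of the table lend themselves to a mechanical check. The sketch below covers only the incoming/outgoing sequence-flow counts (not pools, message flows, or data associations) and assumes a hypothetical representation: `nodes` maps a node name to its kind, `flows` is a list of `(source, target)` pairs.

```python
from collections import Counter

# (min_in, max_in, min_out, max_out); None means "no upper bound".
DEGREE_RULES = {
    "activity":     (1, None, 1, None),
    "intermediate": (1, None, 1, None),
    "start":        (0, 0, 0, None),
    "end":          (0, None, 0, 0),
    "split":        (1, 1, 2, None),
    "join":         (2, None, 1, 1),
}


def degree_violations(nodes, flows):
    """List nodes whose sequence-flow counts break the table's rules."""
    indeg = Counter(target for _, target in flows)
    outdeg = Counter(source for source, _ in flows)
    problems = []
    for name, kind in nodes.items():
        min_in, max_in, min_out, max_out = DEGREE_RULES[kind]
        checks = (
            ("incoming", indeg[name], min_in, max_in),
            ("outgoing", outdeg[name], min_out, max_out),
        )
        for label, count, low, high in checks:
            if count < low or (high is not None and count > high):
                problems.append(f"{name} ({kind}): {count} {label} sequence flow(s)")
    return problems
```

A split gateway with a single outgoing flow, for example, is reported as `g (split): 1 outgoing sequence flow(s)`.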

#### Structural rules (model-level)

> **All flow nodes must be on a path from a start event to an end event.**

This is the **connectedness** requirement — equivalent to the WF-net definition's "no dangling tasks/conditions" ([[sources/1998-vanderaalst-verification-of-workflow-nets]] Definition 6).

A model satisfying all structural rules is **structurally correct**. Verifiable by inspecting the graph; tool-enforceable.
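
The model-level rule can be checked with two graph traversals: a node lies on a path from a start event to an end event exactly when it is reachable from some start event and some end event is reachable from it. A minimal sketch, assuming the same hypothetical `nodes` (name to kind) and `flows` (list of `(source, target)` pairs) representation:

```python
def off_path_nodes(nodes, flows):
    """Return flow nodes not on any path from a start event to an end event."""

    def reach(seeds, edges):
        # Plain depth-first traversal over the given directed edges.
        seen, stack = set(seeds), list(seeds)
        while stack:
            node = stack.pop()
            for source, target in edges:
                if source == node and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

    starts = [n for n, kind in nodes.items() if kind == "start"]
    ends = [n for n, kind in nodes.items() if kind == "end"]
    forward = reach(starts, flows)
    # Traverse reversed flows to find every node that can reach an end event.
    backward = reach(ends, [(t, s) for s, t in flows])
    return sorted(set(nodes) - (forward & backward))
```

An activity that is reachable from the start event but has no route to any end event shows up in the result, as does a fragment that is disconnected from the start event.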

#### Behavioural anomalies (Fig 5.11)

Behavioural rules avoid four classes of **anomaly**:

1. **Deadlock** ([[concepts/deadlock]]) — a running instance reaches a state from which it cannot progress further. Token gets stuck.
2. **Livelock** ([[concepts/livelock]]) — a token is trapped within a loop structure: it can move, but only inside the loop, and never reaches an end event. Often arises when a loop's exit condition can never evaluate to true, so the loop-back branch is taken on every iteration.
3. **Lack of synchronisation** ([[concepts/lack-of-synchronization]]) — two or more tokens end up on the same sequence flow because they were not synchronised at a join gateway. Signals improper completion (multiple tokens reach the same end-flow position simultaneously).
4. **Dead activity** ([[concepts/dead-activity]]) — an activity that can never be executed in any instance.

Both deadlocks and livelocks prevent tokens from reaching an end event ⇒ violate **option to complete**. Lack of synchronisation puts multiple tokens on a flow that should carry one ⇒ violates **proper completion**. Dead activities violate **no dead activities** directly.

#### Block structure (Fig 5.11)

A **block structure** is a single-entry, single-exit (SESE) process-model fragment whose entry and exit are gateways: **one split, one matching join, both of the same type**, with each path from the split leading to the join.

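If a block's split and join match in type (AND-AND, XOR-XOR, OR-OR), no behavioural anomaly arises *within* the block. Mismatched AND/XOR pairs cause anomalies; an OR-join adapts to the number of active branches and is benign in an acyclic block:

| Split | Join | Anomaly |
|---|---|---|
| AND | XOR | Lack of synchronisation (the XOR-join fires once per arriving token, so every branch's token passes through and multiple tokens continue downstream) |
| AND | OR | None under standard OR-join semantics (the OR-join waits for every active branch; after an AND-split all branches are active, so it behaves like an AND-join) |
| XOR | AND | Deadlock at AND-join (XOR routes one token; AND waits for two) |
| XOR | OR | None in an acyclic block (the OR-join fires once the single active branch arrives); OR-joins inside cyclic structures need separate analysis |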

See [[concepts/block-structure]].
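
The AND/XOR cases can be reproduced by counting tokens through a single block. The sketch below is a deliberately small model, not a BPMN engine: it handles only AND and XOR gateways, and the function name `simulate_block` is hypothetical.

```python
def simulate_block(split, join, branches=2):
    """Classify the behaviour of one split/join block with AND or XOR gateways."""
    if split not in ("AND", "XOR") or join not in ("AND", "XOR"):
        raise ValueError("only AND and XOR gateways are modelled")

    # Tokens the split places on its branches.
    active = branches if split == "AND" else 1

    if join == "AND":
        # An AND-join fires once, and only if every branch delivers a token.
        emitted = 1 if active == branches else 0
        waiting = 0 if emitted else active
    else:
        # An XOR-join forwards each arriving token separately.
        emitted, waiting = active, 0

    if emitted == 0 and waiting:
        return "deadlock"
    if emitted > 1:
        return "lack of synchronisation"
    return "none"
```

With the default of two branches, `simulate_block("XOR", "AND")` reports a deadlock and `simulate_block("AND", "XOR")` reports lack of synchronisation, while both matching pairs report no anomaly.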

Anomalies *outside* block structures are also possible — and harder to spot. Example (Fig 5.12a): a branch injected into what would otherwise be a perfect AND-block creates a deadlock at the AND-join.

#### Soundness (Dumas's formulation)

A process model is **behaviourally correct (sound)** iff:

1. **Option to complete**: any running process instance must eventually complete.
2. **Proper completion**: at the moment of completion, each token of the process instance must be in a *different* end event.
3. **No dead activities**: any activity can be executed in at least one process instance.

This is Dumas's formulation. Compare with [[sources/1998-vanderaalst-verification-of-workflow-nets]] Definition 7 (the formal Petri-net version):

| Aspect | Aalst 1997 (formal) | Dumas 2018 (textbook) |
|---|---|---|
| (1) | From every reachable state, reach final state | Any running instance must eventually complete |
| (2) | Final marking has 1 token in *o*, all other places empty | Each token must be in a *different* end event (multi-end-event variant) |
| (3) | No dead transitions | No dead activities |

The substantive difference is in clause (2): Aalst's formulation uses a single sink place; Dumas's allows multiple end events but requires no two tokens reach the same end. They are equivalent for single-end models; Dumas's accommodates BPMN's allowance for multiple end events (per [[sources/2018-dumas-fundamentals-of-bpm-ch3-essential-process-modeling|Ch. 3]] §3.2.2 — implicit termination semantics).

#### Scope of soundness

This definition covers **control flow only**. It assumes:
- All input data objects + incoming messages are available when an activity is to be executed.
- All output data objects + outgoing messages are produced upon completion.

Data-flow correctness, message-availability correctness, and resource-availability correctness are **out of scope** for this soundness definition. Those are addressed in later work (data-aware soundness, etc.).
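
Within that control-flow-only scope, the three conditions can be checked on small models by exhaustively playing the token game. The following is a sketch under stated assumptions rather than the book's procedure: it supports only the node kinds `start`, `end`, `task`, `xor_split`, `xor_join`, `and_split` and `and_join`; it assumes a structurally correct model; it treats a second token landing on an occupied flow as lack of synchronisation and stops there; and the function name `find_anomalies` is hypothetical.

```python
from collections import deque


def find_anomalies(nodes, flows):
    """nodes: {name: kind}; flows: list of (source, target) sequence flows."""
    incoming = {n: [f for f in flows if f[1] == n] for n in nodes}
    outgoing = {n: [f for f in flows if f[0] == n] for n in nodes}
    end_flows = {f for f in flows if nodes[f[1]] == "end"}

    def steps(marking):
        """Yield (node, remaining tokens, produced tokens) per enabled step."""
        for n, kind in nodes.items():
            if kind in ("start", "end"):
                continue
            ins, outs = incoming[n], outgoing[n]
            if kind == "and_join":  # needs a token on every incoming flow
                consume = [ins] if all(f in marking for f in ins) else []
            else:                   # any single token enables the node
                consume = [[f] for f in ins if f in marking]
            # An XOR-split picks one outgoing flow; other nodes use all of them.
            produce = [[f] for f in outs] if kind == "xor_split" else [outs]
            for taken in consume:
                for put in produce:
                    yield n, marking - frozenset(taken), frozenset(put)

    initial = frozenset(f for n, k in nodes.items() if k == "start" for f in outgoing[n])
    successors, fired = {}, set()
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        if marking in successors:
            continue
        successors[marking] = set()
        for node, rest, put in steps(marking):
            fired.add(node)
            if rest & put:  # a second token would land on an occupied flow
                return ["lack of synchronisation"]
            successors[marking].add(rest | put)
            queue.append(rest | put)

    anomalies = []
    completed = {m for m in successors if m <= end_flows}
    # Proper completion: no end event may hold more than one token.
    if any(len({target for _, target in m}) < len(m) for m in completed):
        anomalies.append("improper completion")
    # Option to complete: propagate "can reach completion" backwards.
    can_complete = set(completed)
    changed = True
    while changed:
        changed = False
        for m, nxt in successors.items():
            if m not in can_complete and nxt & can_complete:
                can_complete.add(m)
                changed = True
    stuck = [m for m in successors if m not in can_complete]
    if any(not successors[m] for m in stuck):
        anomalies.append("deadlock")
    elif stuck:
        anomalies.append("livelock")
    # No dead activities: every task must fire in at least one run.
    dead = sorted(n for n, k in nodes.items() if k == "task" and n not in fired)
    return anomalies + [f"dead activity: {n}" for n in dead]
```

An empty result means no violation was found for the supported constructs. Because markings are restricted to at most one token per flow, the state space is finite, but it can still grow quickly with the number of parallel branches.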

#### Achieving correctness — three modes

1. **Verification after the fact**: build the model, then run a verifier (tool-enforced soundness check).
2. **Construction by design**: use a tool that only allows soundness-preserving edit operations.
3. **Block-structuring**: restrict edits to nested blocks of matching split-join pairs (so-called *structured* process models). Sound by construction, but with **limited expressiveness** — some legal cyclic behaviours cannot be expressed (cf. §4.1 on cycles). See [[concepts/block-structure]] and [[concepts/7pmg]] G4.

Parts of a model that cause unsoundness should be **reworked** — typically these trigger questions about specific behaviour that the analyst must clarify with domain experts.

### 5.4.2 Semantic Quality and Validation

**Semantic quality** = adherence of the model to the real-world process. Activity: **validation** = checking semantic quality by comparing model with real-world process.

Two essential aspects:

1. **Validity** — every statement made by the model is correct and relevant to the real-world process. Assessed by explaining the model to domain experts and asking them to point out divergences.

2. **Completeness** — the model contains all relevant statements about the corresponding business process. **Harder to assess than validity** — analyst must actively ask about alternative processing options at different stages.

#### Worked distinction (Dumas example)

Consider a loan-assessment process. The model states "any financial officer may carry out the task of checking credit history".
- If reality requires *specific authorisation* for this task → **invalid** statement (validity violation).
- If the credit-history check is *not in the model at all* but happens in reality → **incomplete** model (completeness violation).

Completeness must be judged against the **modelling objective**: a documentation model and a redesign baseline have different completeness thresholds (see [[concepts/sequal-framework]] feasible-completeness notion).
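
If both the model and the experts' account of reality are reduced to comparable statements, the two aspects become set differences: statements only in the model are invalid, statements only in reality are missing. The sketch below assumes such a reduction is possible, which in practice is the hard part; `semantic_gaps` and the sample statements are hypothetical.

```python
def semantic_gaps(model_statements, real_statements):
    """Compare model statements with statements confirmed by domain experts."""
    model, reality = set(model_statements), set(real_statements)
    return {
        "invalid": sorted(model - reality),   # validity violations
        "missing": sorted(reality - model),   # completeness violations
    }


# Hypothetical statements for a loan-assessment process.
model = {"Assess loan risk", "Any officer may check credit history"}
reality = {
    "Assess loan risk",
    "Only authorised officers may check credit history",
}
gaps = semantic_gaps(model, reality)
```

Here the unrestricted "any officer" statement is flagged as invalid, and the authorisation rule that experts reported is flagged as missing. Which missing statements matter still depends on the modelling objective.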

#### Validation methods

- **Interviews** — translate model to natural language; expert points out falsifications. See [[syntheses/interview-structuring-for-process-models]].
- **Workshops** — collective walk-through; resolves inconsistent perceptions in real time.
- **Tools** — automated process discovery from event logs provides "truthfulness by design" (see Ch. 11).
- **Process owner approval** — special validation step; endorsement of validity + completeness; establishes the model's **normative character** so it can be published, used, archived.

### 5.4.3 Pragmatic Quality and Certification

**Pragmatic quality** = usability of the model. Activity: **certification** = checking pragmatic quality by investigating actual use.

Three usability aspects:

- **Understandability** — how easy to read and comprehend.
- **Maintainability** — how easy to apply changes.
- **Learning** — how well the model reveals how the corresponding business process works in reality.

Influencing characteristics: **size**, **structural complexity**, **graphical layout** — measurable via [[concepts/process-model-complexity-metrics]].

#### Two essential checks for understandability

**Check 1 — Consistency between visual structure and logical structure.**
- Predominant flow direction: top-left → bottom-right (or left → right, or top → bottom).
- **No crossing arcs** (eliminate via re-routing or activity duplication).
- **Block-structuring** where possible — structured models are typically easier to understand than unstructured (Fig 5.15).

Re-layout (Fig 5.16) often gives substantial pragmatic-quality lift without changing semantics: same model, repositioned with consistent direction and no crossings.

**Check 2 — Meaningful labels.**

Activities, events, and gateways must use labels following naming conventions (cf. [[sources/2018-dumas-fundamentals-of-bpm-ch3-essential-process-modeling|Ch. 3]] §3.1):

| Element | Convention | Good | Bad |
|---|---|---|---|
| Activity | **verb-object** (imperative + business object) | *Approve order* | *Order approval* (action-noun) · *Cost planning* (noun ambiguous) |
| Event | **noun + past-participle** (state) | *Invoice emitted* · *Expenses approved* | *Approved* (no business object) |
| (X)OR-split gateway | Avoid labelling gateway; use **explicit conditions** on outgoing arcs | branches: *plan acceptable* / *plan unacceptable* | branches: *yes* / *no* |

Common label-quality problems (Fig 5.17 example):
- Mixing verb-object and action-noun within one model (creates terminological inconsistency).
- Gateway label disguising a missing decision activity ("Acceptable?" hides a "Check plan acceptability" activity).
- End-event label without business object ("Approved" → should be "Expenses approved").
- Generic verbs: *to make*, *to do*, *to perform*, *to conduct* — replace with specific verbs.
- Words like *process* / *order* used as both verb and noun within one model — pick one part of speech and stick to it.
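
Some of these problems can be flagged with crude string heuristics before a human review. The sketch below is intentionally naive: the suffix list used to guess action-noun labels is an assumption that will produce false positives (for example on "Check credit rating"), and the name `label_issues` is hypothetical.

```python
GENERIC_VERBS = {"make", "do", "perform", "conduct"}
# Assumed suffixes that often mark an action noun ("approval", "planning").
ACTION_NOUN_SUFFIXES = ("al", "ing", "tion", "ment")


def label_issues(kind, label):
    """Return heuristic warnings for an activity or event label."""
    words = label.lower().split()
    issues = []
    if kind == "activity" and words:
        if words[0] in GENERIC_VERBS:
            issues.append("generic verb")
        if words[-1].endswith(ACTION_NOUN_SUFFIXES):
            issues.append("possible action-noun label")
    elif kind == "event" and len(words) < 2:
        # A single word such as "Approved" names no business object.
        issues.append("no business object")
    return issues
```

`label_issues("activity", "Approve order")` returns no warnings, whereas "Order approval" is flagged as a possible action-noun label and the event label "Approved" as lacking a business object.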

#### Pragmatic quality by design

Pragmatic quality can be designed-in rather than retrofitted via:
- **Block-structuring** ([[concepts/block-structure]]) — also gives soundness-by-construction.
- **7PMG** ([[concepts/7pmg]]) — seven empirically-grounded guidelines.
- **Decomposition** — sub-processes for portions exceeding 30–50 elements (G7).
- **Layout discipline** — top-left to bottom-right, no crossings, alignment.

## Connections
- Expands [[methods/process-discovery-methods]] with concrete interview-guide material.
- [[concepts/process-discovery]] · [[concepts/process-model-quality]]
- Contrast with automated discovery: [[methods/process-mining-basics]]
- Meta source: [[sources/2018-dumas-fundamentals-of-bpm]]