---
title: "Fundamentals of BPM (2018) — Ch. 5: Process Discovery"
type: source
tags: [bpm, discovery, elicitation, interview, workshop, modeling-method]
authors: ["Dumas, Marlon", "La Rosa, Marcello", "Mendling, Jan", "Reijers, Hajo A."]
year: 2018
venue: "Springer (2nd ed.), Chapter 5"
kind: chapter
raw_path: "raw/Process Frameworks & BPM/Fundamentals_of_BPM 2018.pdf"
sources: ["[[sources/2018-dumas-fundamentals-of-bpm]]"]
key_claims:
  - "Process discovery has four tasks: define setting, gather information, model, assure quality (§5.1)."
  - "Two roles: process analyst (modelling skill) and domain expert (process knowledge); process owner secures commitment (§5.1.1)."
  - "Three discovery challenges: fragmented knowledge, thinking in cases, unfamiliarity with modelling languages (§5.1.2)."
  - "Three method classes: evidence-based (documents, observation, automated), interview-based, workshop-based (§5.2)."
  - "Interview method is an iterative Interview → Modeling → Validation cycle (Fig 5.4), usually ≥2 iterations."
  - "Balance structured questioning (validate hypotheses, ~45 min) with free-form (~15 min) to avoid sunny-day bias."
  - "Rainy-day questions are essential; derive from internal/external exceptions and activity timeouts."
  - "Five-step modelling method: boundaries → activities/events → resources/handoffs → control flow → additional elements (§5.3)."
created: 2026-04-15
updated: 2026-04-15
---

# Fundamentals of BPM (2018) — Ch. 5: Process Discovery

Chapter deep-dive into [[sources/2018-dumas-fundamentals-of-bpm]] §5, focused on the **human-driven** elicitation material used to construct as-is process models. Back-links to the meta source page.

## 5.1 Setting

Process discovery = "gathering information about an existing process and organising it as an as-is process model."

Four tasks:

1. Define the setting (team, scope).
2. Gather information (method choice — see §5.2).
3. Conduct modelling (§5.3 five-step method).
4. Assure model quality (§5.4 — syntactic / semantic / pragmatic).

### Roles (§5.1.1)

- **Process analyst** — modelling-language skill, gathering/organising info.
- **Domain expert** — intimate process knowledge; typically process participant, operational manager, or external role (customer, supplier).
- **Process owner** — secures commitment.

### Three challenges (§5.1.2)

1. **Fragmented knowledge** — tasks split across specialists; abstract view vs. deep local view; conflicting upstream/downstream assumptions.
2. **Thinking in cases** — experts describe specific instances, not general flow. Analyst must *reverse-engineer* the routing rules by asking about conditions, outcomes, deadlines.
3. **Unfamiliarity with modelling languages** — don't show BPMN to a domain expert; translate back to natural language for validation.

### Expert-analyst heuristics (sidebar)

- Get the right people on board — line-manager buy-in before approaching participants.
- Keep a **short, precise set of working hypotheses** at varying detail levels; prepare an extensive set of questions in advance.
- Identify control-flow patterns from phrasing: "alternative / exclusive / conditional" → XOR; "independent / either order" → AND.
- Pay attention to **model quality and layout** — neat diagrams engage stakeholders.

## 5.2 Discovery Methods

### 5.2.1 Evidence-based

- **Document analysis** — policies, org charts, forms, handbooks, work instructions, user forms. Issues: outdated, wrong granularity, normative rather than actual.
- **Observation** — active (analyst plays customer) or passive (shadowing). Passive observation risks Hawthorne effect; active only sees customer-visible parts.
- **Automated process discovery** — from event logs; see [[methods/process-mining-basics]].

### 5.2.2 Interview-based — the interview guide

Three-phase iterative cycle (Fig 5.4): **Interview → Modeling → Validation**. ≥2 iterations typical; complex processes need more.

**Two traversal strategies:**

- **Backward** — start from process outcomes (e.g., "order fulfilled"), work back to triggers.
- **Forward** — start from triggers (e.g., "PO received"), follow the flow.

Use both across interviews.

**Per-interviewee checklist** — clarify:

- What **input** is expected from prior upstream activities.
- What **decisions** are taken.
- What **output** is produced.
- To **what resource** it is then forwarded.

**Structured vs free-form balance** (for a 1-h interview):

- ~45 min **structured** — validate pre-formed hypotheses via predefined questions.
- ~15 min **free-form** — let interviewee raise what they think relevant.
- Pure structured risks a checklist feel that suppresses information; pure free-form wanders.

**Sunny-day pitfall** — asked how a process works, interviewees default to the normal path. Exceptions vanish.

**Rainy-day questions** — explicitly probe the exceptional:

- "How did you handle your most difficult customer?"
- "What was the most difficult case you worked on?"
- "What happens if the customer does not reply on time?"

Derive rainy-day questions systematically from the exception taxonomy:

- **Internal business exception** (e.g., out-of-stock).
- **External business exception** (e.g., customer cancels).
- **Internal technology exception** (e.g., stock-check system unresponsive).
- **Activity timeouts**.

**Validation phase** — translate the constructed model to natural language when showing it back; domain experts won't read BPMN confidently. Expect 1–2 validation iterations per expert.

### 5.2.3 Workshop-based

Additional roles beyond analyst: **facilitator** (moderates the discussion and who speaks when), **process modeler** (draws the model live), **scribe** (parks unresolved threads).

Effort: **3–5 sessions**, ≤10–12 participants each; consolidate model between sessions.

**Session 1** — lightweight participatory modelling: sticky notes on a wall, no gateways yet, tasks left-to-right in temporal order; facilitator keeps granularity consistent (no micro-steps like "put document on fax machine").

**Session 2** — brief BPMN primer (start/end, activities, XOR, AND), then validate the model assembled from Session 1 on whiteboard or projector.

**Culture note** — hierarchical/regulated organisations suppress openness; hand-pick participants and facilitator accordingly.

### 5.2.4 Strengths & weaknesses (Tables 5.1, 5.2)

| Aspect | Evidence | Interviews | Workshops |
|---|---|---|---|
| Objectivity | High | Med-high | Med-high |
| Richness | Medium | High | High |
| Time consumption | Low-med | Medium | Medium |
| Immediacy of feedback | Low | High | High |

Start with **document analysis** (readily available), then layer interviews/workshops. Workshops resolve conflicting perceptions fastest but require simultaneous availability. Automated discovery needs a supporting IT system and clean logs.

## 5.3 Five-step Modelling Method

1. **Process boundaries** — start/end events; input/output business objects; use terminate events for negative outcomes.
2. **Activities and events** — enumerate main activities and intermediate events; defer fine detail.
3. **Resources and handoffs** — assign activities to pools/lanes; handoff points mark assumption boundaries and candidate sub-processes.
4. **Control flow** — add XOR/AND, loops, event-based splits; refine handoffs into dependencies.
5. **Additional elements** — data objects/stores, exception handlers, compensation; annotations (cost/risk) per modelling purpose.

Steps map naturally onto workshop sessions (1–2 in one session, then 3, 4, 5 each in a follow-up).

## 5.4 Process Model Quality Assurance

Three quality aspects (Fig 5.9), each paired with its assurance activity:

| Quality aspect | Assurance activity | Verification mode |
|---|---|---|
| **Syntactic** | **Verification** | Mechanical (graph + behavioural rules) |
| **Semantic** | **Validation** | Domain-expert + tools |
| **Pragmatic** | **Certification** | User testing + design-in via guidelines |

In addition, **modelling guidelines and conventions** (e.g., 7PMG — [[concepts/7pmg]], [[sources/2010-mendling-reijers-vanderaalst-7pmg]]) can be used to achieve high quality from the start rather than retrofitting it.

### 5.4.1 Syntactic Quality and Verification

**Syntactic quality** = conformance of a process model to the syntactic rules of the modelling language. Two rule classes:

- **Structural rules** — how elements connect to each other.
- **Behavioural rules** — how the model can be instantiated; whether executions complete properly.

#### Structural rules (BPMN element-level)

| Element class | Rule |
|---|---|
| **Activity** | Must have ≥1 incoming AND ≥1 outgoing sequence flow |
| **Start event** | Must NOT have any incoming sequence flow |
| **End event** | Must NOT have any outgoing sequence flow |
| **Intermediate event** | Must have ≥1 incoming AND ≥1 outgoing sequence flow |
| **Boundary event** | Only **intermediate catching** events may attach to an activity's border |
| **Split gateway** | Exactly 1 incoming, ≥2 outgoing |
| **Join gateway** | ≥2 incoming, exactly 1 outgoing |
| **(X)OR-split outgoing arcs** | Must bear conditions |
| **Sequence flow** | Connects two flow nodes (activity, event, gateway) of the **same pool**; cannot cross pool boundaries |
| **Message flow** | Connects an activity or throwing message event in one pool to an activity or catching message event in a different pool |
| **Directed data association** | Connects data object ↔ activity or throwing message event; data store ↔ activity (in either direction) |
| **Undirected data association** | Connects data object ↔ sequence flow OR text annotation ↔ any element |

#### Structural rules (model-level)

> **All flow nodes must be on a path from a start event to an end event.**

This is the **connectedness** requirement — equivalent to the WF-net definition's "no dangling tasks/conditions" ([[sources/1998-vanderaalst-verification-of-workflow-nets]] Definition 6).

A model satisfying all structural rules is **structurally correct**. Verifiable by inspecting the graph; tool-enforceable.

#### Behavioural anomalies (Fig 5.11)

Behavioural rules avoid four classes of **anomaly**:

1. **Deadlock** ([[concepts/deadlock]]) — a running instance reaches a state from which it cannot progress further. Token gets stuck.
2. **Livelock** ([[concepts/livelock]]) — a token is trapped within a loop structure: it can move, but only inside the loop, and never reaches an end event. Often arises when the loop's repeat condition always evaluates to true, so the exit branch is never taken.
3. **Lack of synchronisation** ([[concepts/lack-of-synchronization]]) — two or more tokens end up on the same sequence flow because they were not synchronised at a join gateway. Signals improper completion (multiple tokens reach the same end-flow position simultaneously).
4. **Dead activity** ([[concepts/dead-activity]]) — an activity that can never be executed in any instance.

Both deadlocks and livelocks prevent tokens from reaching an end event ⇒ violate **option to complete**. Lack of synchronisation puts multiple tokens on a flow that should carry one ⇒ violates **proper completion**. Dead activities violate **no dead activities** directly.

#### Block structure (Fig 5.11)

A **block structure** is a single-entry, single-exit (SESE) process-model fragment whose entry and exit are gateways: **one split, one matching join, both of the same type**, with each path from the split leading to the join.
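
The anomaly definitions above can be made concrete with a small token-game sketch. This is an illustrative simulation, not a verifier taken from the book: the graph encoding, the `block` and `analyse` helpers, and the simplification that tasks and events act as pass-through nodes are all assumptions made for this example. It explores every reachable marking of a two-branch block and reports a deadlock (tokens remain but nothing can fire) or a lack of synchronisation (two tokens on one sequence flow). It does not detect livelocks or dead activities.

```python
from collections import Counter


def successors(marking, nodes, edges):
    """Yield every marking reachable by firing one enabled node once."""
    for name, kind in nodes.items():
        ins = [e for e in edges if e[1] == name]
        outs = [e for e in edges if e[0] == name]
        if kind == "and":
            # AND gateway: needs a token on every incoming flow,
            # then puts a token on every outgoing flow.
            if ins and all(marking[e] > 0 for e in ins):
                nxt = marking.copy()
                nxt.subtract(ins)
                nxt.update(outs)
                yield +nxt  # unary plus drops zero counts
        else:
            # XOR gateway, task, or end event (assumed pass-through):
            # consume one token, emit on exactly one outgoing flow, if any.
            for e in ins:
                if marking[e] > 0:
                    for choice in (outs or [None]):
                        nxt = marking.copy()
                        nxt[e] -= 1
                        if choice is not None:
                            nxt[choice] += 1
                        yield +nxt


def analyse(nodes, edges):
    """Return a sorted list of anomalies found; an empty list means none."""
    initial = Counter(e for e in edges if nodes[e[0]] == "start")
    seen, frontier, findings = set(), [initial], set()
    while frontier:
        marking = frontier.pop()
        key = frozenset(marking.items())
        if key in seen:
            continue
        seen.add(key)
        if any(count > 1 for count in marking.values()):
            findings.add("lack of synchronisation")
            continue  # stop here so the explored state space stays finite
        nxt = list(successors(marking, nodes, edges))
        if marking and not nxt:
            findings.add("deadlock")
        frontier.extend(nxt)
    return sorted(findings)


def block(split, join):
    """Hypothetical two-branch block: start -> split -> (A | B) -> join -> end."""
    nodes = {"start": "start", "split": split, "A": "task",
             "B": "task", "join": join, "end": "end"}
    edges = [("start", "split"), ("split", "A"), ("split", "B"),
             ("A", "join"), ("B", "join"), ("join", "end")]
    return nodes, edges


for split, join in [("and", "and"), ("xor", "xor"), ("and", "xor"), ("xor", "and")]:
    print(split, join, analyse(*block(split, join)))
```

On this two-branch block the sketch reports no anomaly for AND-AND and XOR-XOR, `['lack of synchronisation']` for an AND-split closed by an XOR-join, and `['deadlock']` for an XOR-split closed by an AND-join.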

If a block's split and join match in type (AND-AND, XOR-XOR, OR-OR), no behavioural anomaly arises *within* the block. Mismatched types cause anomalies:

| Split | Join | Anomaly |
|---|---|---|
| AND | XOR | Lack of synchronisation (XOR fires for first arrival; second token left on incoming flow) |
| AND | OR | Potential lack of synchronisation (depends on which branches are active) |
| XOR | AND | Deadlock at AND-join (XOR routes one token; AND waits for two) |
| XOR | OR | Livelock + lack-of-synchronisation possible in cyclic structures |

See [[concepts/block-structure]].

Anomalies *outside* block structures are also possible — and harder to spot. Example (Fig 5.12a): a branch injected into what would otherwise be a perfect AND-block creates a deadlock at the AND-join.

#### Soundness (Dumas's formulation)

A process model is **behaviourally correct (sound)** iff:

1. **Option to complete**: any running process instance must eventually complete.
2. **Proper completion**: at the moment of completion, each token of the process instance must be in a *different* end event.
3. **No dead activities**: any activity can be executed in at least one process instance.

This is Dumas's formulation. Compare with [[sources/1998-vanderaalst-verification-of-workflow-nets]] Definition 7 (the formal Petri-net version):

| Aspect | Aalst 1997 (formal) | Dumas 2018 (textbook) |
|---|---|---|
| (1) | From every reachable state, reach final state | Any running instance must eventually complete |
| (2) | Final marking has 1 token in *o*, all other places empty | Each token must be in a *different* end event (multi-end-event variant) |
| (3) | No dead transitions | No dead activities |

The substantive difference is in clause (2): Aalst's formulation uses a single sink place; Dumas's allows multiple end events but requires that no two tokens reach the same end event. They are equivalent for single-end models; Dumas's accommodates BPMN's allowance for multiple end events (per [[sources/2018-dumas-fundamentals-of-bpm-ch3-essential-process-modeling|Ch. 3]] §3.2.2 — implicit termination semantics).

#### Scope of soundness

This definition covers **control flow only**. It assumes:

- All input data objects + incoming messages are available when an activity is to be executed.
- All output data objects + outgoing messages are produced upon completion.

Data-flow correctness, message-availability correctness, and resource-availability correctness are **out of scope** for this soundness definition. Those are addressed in later work (data-aware soundness, etc.).

#### Achieving correctness — three modes

1. **Verification after the fact**: build the model, then run a verifier (tool-enforced soundness check).
2. **Construction by design**: use a tool that only allows soundness-preserving edit operations.
3. **Block-structuring**: restrict edits to nested blocks of matching split-join pairs (so-called *structured* process models). Sound by construction, but with **limited expressiveness** — some legal cyclic behaviours cannot be expressed (cf. §4.1 on cycles). See [[concepts/block-structure]] and [[concepts/7pmg]] G4.

Parts of a model that cause unsoundness should be **reworked** — typically these trigger questions about specific behaviour that the analyst must clarify with domain experts.

### 5.4.2 Semantic Quality and Validation

**Semantic quality** = adherence of the model to the real-world process.

Activity: **validation** = checking semantic quality by comparing model with real-world process.

Two essential aspects:

1. **Validity** — every statement made by the model is correct and relevant to the real-world process. Assessed by explaining the model to domain experts and asking them to point out divergences.
2. **Completeness** — the model contains all relevant statements about the corresponding business process. **Harder to assess than validity** — analyst must actively ask about alternative processing options at different stages.

#### Worked distinction (Dumas example)

Consider a loan-assessment process. The model states "any financial officer may carry out the task of checking credit history".

- If reality requires *specific authorisation* for this task → **invalid** statement (validity violation).
- If the credit-history check is *not in the model at all* but happens in reality → **incomplete** model (completeness violation).

Completeness must be judged against the **modelling objective**: a documentation model and a redesign baseline have different completeness thresholds (see [[concepts/sequal-framework]] feasible-completeness notion).

#### Validation methods

- **Interviews** — translate model to natural language; expert points out incorrect or missing statements. See [[syntheses/interview-structuring-for-process-models]].
- **Workshops** — collective walk-through; resolves inconsistent perceptions in real time.
- **Tools** — automated process discovery from event logs provides "truthfulness by design" (see Ch. 11).
- **Process owner approval** — special validation step; endorsement of validity + completeness; establishes the model's **normative character** so it can be published, used, archived.

### 5.4.3 Pragmatic Quality and Certification

**Pragmatic quality** = usability of the model.

Activity: **certification** = checking pragmatic quality by investigating actual use.

Three usability aspects:

- **Understandability** — how easy to read and comprehend.
- **Maintainability** — how easy to apply changes.
- **Learning** — how well the model reveals how the corresponding business process works in reality.

Influencing characteristics: **size**, **structural complexity**, **graphical layout** — measurable via [[concepts/process-model-complexity-metrics]].

#### Two essential checks for understandability

**Check 1 — Consistency between visual structure and logical structure.**

- Predominant flow direction: top-left → bottom-right (or left → right, or top → bottom).
- **No crossing arcs** (eliminate via re-routing or activity duplication).
- **Block-structuring** where possible — structured models are typically easier to understand than unstructured (Fig 5.15).

Re-layout (Fig 5.16) often gives substantial pragmatic-quality lift without changing semantics: same model, repositioned with consistent direction and no crossings.

**Check 2 — Meaningful labels.** Activities, events, and gateways must use labels following naming conventions (cf. [[sources/2018-dumas-fundamentals-of-bpm-ch3-essential-process-modeling|Ch. 3]] §3.1):

| Element | Convention | Good | Bad |
|---|---|---|---|
| Activity | **verb-object** (imperative + business object) | *Approve order* | *Order approval* (action-noun) · *Cost planning* (noun ambiguous) |
| Event | **noun + past-participle** (state) | *Invoice emitted* · *Expenses approved* | *Approved* (no business object) |
| (X)OR-split gateway | Avoid labelling gateway; use **explicit conditions** on outgoing arcs | branches: *plan acceptable* / *plan unacceptable* | branches: *yes* / *no* |

Common label-quality problems (Fig 5.17 example):

- Mixing verb-object and action-noun within one model (creates terminological inconsistency).
- Gateway label disguising a missing decision activity ("Acceptable?" hides a "Check plan acceptability" activity).
- End-event label without business object ("Approved" → should be "Expenses approved").
- Generic verbs: *to make*, *to do*, *to perform*, *to conduct* — replace with specific verbs.
- Words like *process* / *order* used as both verb and noun within one model — pick one part of speech and stick to it.
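
Some of the label conventions above lend themselves to a simple automated check. The sketch below is a heuristic illustration only: `lint_label` and its element kinds (`"activity"`, `"event"`, `"condition"`) are hypothetical names, and treating a one-word label as "no business object" is an approximation. It covers three of the listed problems (generic verbs, labels without a business object, bare yes/no branch conditions). Telling verb-object from action-noun style needs real part-of-speech analysis and is deliberately left out.

```python
# Generic verbs named in the list above.
GENERIC_VERBS = {"make", "do", "perform", "conduct"}


def lint_label(kind, label):
    """Return a list of label-quality issues for one model element.

    kind is one of "activity", "event", "condition" (assumed categories).
    """
    words = label.strip().lower().split()
    issues = []
    if kind == "activity":
        if words and words[0] in GENERIC_VERBS:
            issues.append("generic verb")
        if len(words) < 2:
            issues.append("no business object")
    elif kind == "event":
        if len(words) < 2:
            issues.append("no business object")
    elif kind == "condition":
        if " ".join(words) in {"yes", "no"}:
            issues.append("bare yes/no condition")
    return issues


for kind, label in [("activity", "Approve order"),
                    ("activity", "Perform check"),
                    ("event", "Approved"),
                    ("condition", "yes")]:
    print(kind, repr(label), lint_label(kind, label))
```

A check like this only flags candidates for review; whether a label is acceptable still depends on the naming convention the modelling team has agreed on.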

#### Pragmatic quality by design

Pragmatic quality can be designed-in rather than retrofitted via:

- **Block-structuring** ([[concepts/block-structure]]) — also gives soundness-by-construction.
- **7PMG** ([[concepts/7pmg]]) — seven empirically-grounded guidelines.
- **Decomposition** — sub-processes for portions exceeding 30–50 elements (G7).
- **Layout discipline** — top-left to bottom-right, no crossings, alignment.

## Connections

- Expands [[methods/process-discovery-methods]] with concrete interview-guide material.
- [[concepts/process-discovery]] · [[concepts/process-model-quality]]
- Contrast with automated discovery: [[methods/process-mining-basics]]
- Meta source: [[sources/2018-dumas-fundamentals-of-bpm]]