---
title: "SynBPS: A Parametric Simulation Framework for the Generation of Event-Log Data"
type: source
tags: [simulation, synthetic-data, event-log, ppm, evaluation, markov-chain, python, framework]
authors: [Riess, Mike]
year: 2024
venue: "SIMULATION: Transactions of the Society for Modeling and Simulation International, SAGE. DOI: 10.1177/00375497241233326"
kind: paper
raw_path: "raw/Riess/Riess 2024.pdf"
sources: ["[[sources/2023-riess-phd-thesis-ppm]]"]
key_claims:
- "Existing open-source Business Process Simulation (BPS) frameworks optimise for ecological validity via calibration to real event logs; this makes them ill-suited for controlled-variable hypothesis testing in predictive process monitoring, where the researcher needs to manipulate data-generating-process factors (complexity, memory, stability) independently."
- "SynBPS is a purely parametric Python framework based on Markov chains (of configurable order k) for control flow, Poisson-process arrivals, and exponential/hypoexponential activity-duration distributions."
- "User-controlled simulation levers: process memory (Markov order), state-space size, transition entropy (minimum/medium/maximum), activity duration distributions, case arrival rate, process stability (concept-drift injection)."
- "Framework is designed for academic audiences needing synthetic benchmarks to decompose confounds in PPM evaluation, not for integration with existing processes or information systems."
- "Limitation: Markov-chain control flow cannot directly represent concurrent/parallel activities without exponentially expanding state-space; prescriptive-process-monitoring use cases needing explicit resource modelling are better served by agent-based calibrated tools (López-Pintado & Dumas 2022)."
- "Open-source, pip-installable as `SynBPS` on PyPI; code on GitHub (mikeriess/SynBPS; demonstration at mikeriess/SBPS_results)."
- "Methodological motivation cited from machine-learning tradition (Lasso, LARS, elastic net) of evaluating methods on controlled synthetic data."
created: 2026-04-20
updated: 2026-04-20
---

# Riess 2024 - SynBPS: Parametric Simulation Framework for Event-Log Data

Single-authored article in SIMULATION (SCS). Corresponds to Paper II of [[sources/2023-riess-phd-thesis-ppm|Riess's PhD thesis]]. Authored during his transition from [[entities/nmbu|NMBU]] to [[entities/telenor|Telenor Research]].

## Summary

The paper opens with a methodological critique: virtually all published PPM models are evaluated on a small set of public event logs (BPIC, Sepsis, Helpdesk, etc.). Each event log represents **one process** from **one organisation**, so a common PPM evaluation benchmark is effectively a sample of N=3–9 process variants, and the research question is almost always whether a new method beats a baseline on these specific logs. When external validity is the goal, this is insufficient. Worse, when the researcher wants to understand *which* data-generating-process characteristics drive performance differences (e.g., [[sources/2017-tax-lstm-process-prediction|Tax et al. 2017]] attributing a bad result on one log to repeating activities), only qualitative post-hoc coding is possible; control variables and interaction testing are not.

Two research questions:

- **RQ1**: to what extent do current BPS frameworks support model-robustness assessment in PPM?
- **RQ2**: how can their limitations be overcome by a new framework?

RQ1 is answered via a targeted review of open-source BPS tools (Camargo et al. Simod; López-Pintado & Dumas; Pourbafrani et al. SimPT/PMSD; Fracca et al. bpsimpy; Peeperkorn et al.; Grüger et al. SAMPLe; Arena; CPN Tools). Common denominator: these tools optimise for **ecological validity** (realistic event logs for what-if analysis of *specific* processes), so their control flows are learned or hand-specified from observed behaviour.
They are not designed for the opposite problem: generating synthetic event logs from **theoretical** data-generating processes with independent manipulation of complexity, memory, and stability.

RQ2 introduces **SynBPS**. Design choices:

- **Control flow** = discrete-time Markov chain, order *k* configurable (k=1 memoryless, higher-order with D × D^k transition tensor). Absorbing state terminates the trace.
- **Entropy control** over the transition matrix: minimum (deterministic), maximum (uniform/random), medium (n transitions across state space of size d).
- **Arrivals** = Poisson process.
- **Activity durations** = exponential / hypoexponential distributions (chosen for analytic tractability of trace durations).
- **Stability**: process parameters can be drifted deterministically or stochastically to generate [[concepts/concept-drift|concept-drift]]-exposed logs.
- Output = standard event-log tables (case ID, activity, timestamp, resource, attributes).

The paper demonstrates generation of 2304 unique synthetic event logs across manipulated parameters; three representative control flows (minimum, medium, maximum entropy) are inspected via inductive miner + [[methods/process-mining-basics|PM4Py]].

Acknowledged limitations: Markov chains do not natively model parallel/concurrent activities (doing so requires state-space explosion); resource behaviour and interruptions cannot be modelled explicitly (they are subsumed into duration distributions), which makes SynBPS less suitable for prescriptive-process-monitoring research; duration distributions are restricted to exponential/hypoexponential in the initial release.

Future directions: conditional transition matrices (attribute-dependent routing), time-dependent rate parameters for duration distributions, mixture distributions.

## Connections

- Paper II of [[sources/2023-riess-phd-thesis-ppm]].
- Establishes [[concepts/business-process-simulation]] as a PPM-evaluation methodology, complementing, not replacing, calibrated BPS tools.
- Explicitly contrasts with BPS tools reviewed: Camargo et al. Simod (see also [[sources/2021-dumas-process-mining-2-from-insights-to-action]]), López-Pintado & Dumas, Pourbafrani et al., [[sources/2023-chapela-campa-augmented-process-execution|Chapela-Campa & Dumas]] family.
- Cites [[sources/2017-tax-lstm-process-prediction]], [[sources/2017-navarin-lstm-data-aware-remaining-time]], [[sources/2016-teinemaa-structured-unstructured-ppm]], [[sources/2019-verenich-survey-ppm]], [[sources/2011-vanderaalst-process-mining-book]].
- Concepts: [[concepts/business-process-simulation]], [[concepts/predictive-process-monitoring]], [[concepts/behavioral-variability]].
- Methods: [[methods/process-simulation]], [[methods/process-mining-basics]].
- Author: [[entities/mike-riess]]; affiliation [[entities/telenor]] (primary) + [[entities/nmbu]] (secondary).
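
## Illustrative sketch

The design choices listed in the Summary (Markov-chain control flow with an absorbing end state, entropy-controlled transitions, Poisson arrivals, exponential durations) can be illustrated with a short standard-library Python sketch. It does not use or reproduce the SynBPS API: every name below (`transition_row`, `simulate_log`, `n_medium`, `max_len`) is hypothetical, the chain is first-order only, and reading "medium" entropy as a uniform choice among a fixed number of targets is an assumption. The `max_len` cap is also an addition, so that low-entropy chains that cycle without reaching the end state still terminate.

```python
import math
import random


def transition_row(n_states, mode, rng, n_medium=2):
    """One row of transition probabilities over n_states target states.

    "min"    -> all mass on one random target (zero entropy)
    "max"    -> uniform over every target (maximum entropy)
    "medium" -> uniform over n_medium random targets (assumed reading)
    """
    if mode == "max":
        return [1.0 / n_states] * n_states
    k = 1 if mode == "min" else n_medium
    targets = rng.sample(range(n_states), k)
    return [1.0 / k if j in targets else 0.0 for j in range(n_states)]


def entropy(row):
    """Shannon entropy (bits) of one transition row."""
    return -sum(p * math.log2(p) for p in row if p > 0)


def simulate_log(n_cases, n_activities, mode, arrival_rate, duration_rate,
                 seed=0, max_len=50):
    """Return (P, log) where log is a list of (case_id, activity, timestamp)."""
    rng = random.Random(seed)
    end = n_activities  # index of the absorbing end state
    # One row per activity; columns cover all activities plus the end state.
    P = [transition_row(n_activities + 1, mode, rng) for _ in range(n_activities)]
    log = []
    arrival = 0.0
    for case_id in range(n_cases):
        # Poisson arrivals: exponential gaps between consecutive case starts.
        arrival += rng.expovariate(arrival_rate)
        t = arrival
        state = rng.randrange(n_activities)
        steps = 0
        while state != end and steps < max_len:
            # Exponential activity duration; the event is stamped at completion.
            t += rng.expovariate(duration_rate)
            log.append((case_id, f"a{state}", t))
            state = rng.choices(range(n_activities + 1), weights=P[state])[0]
            steps += 1
    return P, log


if __name__ == "__main__":
    P, log = simulate_log(n_cases=3, n_activities=4, mode="medium",
                          arrival_rate=2.0, duration_rate=1.0, seed=1)
    for row in P:
        print([round(p, 2) for p in row], "entropy:", round(entropy(row), 2))
    for event in log[:10]:
        print(event)
```

Sweeping `mode`, `n_activities`, and the two rates in nested loops would give a small factorial design in the spirit of the paper's manipulated-parameter grid. Higher-order memory, hypoexponential durations, and drift are left out of this sketch.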