---
title: "Study sketch — SynBPS-APM: A controlled testbed for agentic BPM"
type: synthesis
tags: [study-sketch, apm, simulation, evaluation, synbps, riess]
sources:
  - "[[sources/2024-riess-synbps-simulation-framework]]"
  - "[[sources/2026-calvanese-agentic-bpm-manifesto]]"
  - "[[sources/2025-calvanese-autonomy-business-process-execution]]"
  - "[[sources/2025-elyasaf-self-modifying-abps]]"
  - "[[sources/2025-fournier-agentic-ai-process-observability]]"
  - "[[sources/2024-xu-the-agent-company-benchmark]]"
  - "[[sources/2022-kubrak-prescriptive-ppm-slr]]"
  - "[[sources/2023-anjum-rocca-phi403-lecture-11-is-more-data-better]]"
  - "[[sources/2023-anjum-rocca-phi403-lecture-19-what-rcts-do-not-show]]"
created: 2026-04-20
updated: 2026-04-20
---

# Study sketch — SynBPS-APM: A controlled testbed for agentic BPM

## Motivation and gap

The [[sources/2026-calvanese-agentic-bpm-manifesto|APM Manifesto]] defines four required capabilities ([[concepts/framed-autonomy]], [[concepts/explainability-apm]], [[concepts/conversational-actionability]], [[concepts/self-modification]]) but supplies *no evaluation methodology*. Existing agent benchmarks are generic and non-BPM ([[sources/2024-xu-the-agent-company-benchmark|TheAgentCompany]]), while BPM evaluation remains fixated on a small stable of public event logs (Sepsis, Helpdesk, BPIC) whose external validity is directly challenged by the Cartwright-Hardie argument in [[sources/2023-anjum-rocca-phi403-lecture-11-is-more-data-better|PHI403 L11]] and the RCT-limitations catalogue in [[sources/2023-anjum-rocca-phi403-lecture-19-what-rcts-do-not-show|L19]].

[[sources/2024-riess-synbps-simulation-framework|SynBPS (Riess 2024)]] already solved this for single-model PPM research by introducing a parametric Markov-chain-based simulator where process characteristics are control variables. An extension to agent-driven processes is a natural next step and would make the framework the reference testbed for APM research.

## Research questions

- **RQ1 (feasibility).** Can SynBPS be extended with LLM-agent decision hooks at branching activities and a frame specification DSL, while preserving its parametric-factorial design properties?
- **RQ2 (capability mapping).** How does LLM-agent performance on the four APM capabilities vary under controlled manipulation of process variability ([[concepts/lasagna-spaghetti-processes|memory order, transition entropy]]) and autonomy level ([[concepts/abps-autonomy-levels|Elyasaf's 5-level roadmap]])?
- **RQ3 (failure-mode discovery).** Which process characteristics reveal APM agent failure modes that single-log benchmarks obscure, particularly under drift conditions drawn from [[sources/2022-riess-metaheuristics-concept-drift-survey|Riess 2022]]'s taxonomy?

## Hypotheses

- **H1.** Agent-driven process completion rate decreases with increasing transition entropy (holding frame constant), but the slope differs significantly across autonomy levels. This operationalises the autonomy-predictability trade-off the APM manifesto asserts but does not measure.
- **H2.** High-autonomy agents (Elyasaf levels 4–5) exhibit larger between-run variance on identical process configurations than low-autonomy agents ([[concepts/behavioral-variability|behavioural variability]]), quantifying a phenomenon named in [[sources/2025-fournier-agentic-ai-process-observability|Fournier 2025]].
- **H3.** Drift-injection (sudden transition-probability shifts mid-run) disproportionately degrades agents without [[concepts/self-modification|self-modification]] capability, extending Riess 2022's drift claim from predictive models to agentic execution.
- **H4.** [[concepts/framed-autonomy|Framed autonomy]] (Calvanese 2025's normative frames) measurably constrains failure modes compared to unframed agents on the same process, supplying the first empirical check on a capability the manifesto treats as postulate.

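
Transition entropy is the control variable behind RQ2 and H1, so it helps to pin down one possible reading of it. The sketch below assumes transition entropy means the mean Shannon entropy (in bits) over the rows of a first-order transition matrix. This is an illustrative assumption, not necessarily the definition SynBPS uses, and the function names are hypothetical.

```python
import math


def row_entropy(probs):
    """Shannon entropy in bits of one row of transition probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_transition_entropy(matrix):
    """Average row entropy of a first-order transition matrix.

    Assumption: each row is a probability distribution over next activities.
    """
    for row in matrix:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must sum to 1")
    return sum(row_entropy(row) for row in matrix) / len(matrix)


# A fully deterministic ("lasagna"-like) process: one successor per activity.
deterministic = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]

# A maximally unpredictable ("spaghetti"-like) process: uniform successors.
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]

low = mean_transition_entropy(deterministic)
high = mean_transition_entropy(uniform)
```

Under this reading, the deterministic matrix sits at the bottom of the entropy scale and the uniform matrix at the top (log2 of the number of activities), which gives the factorial design a bounded axis to place its entropy levels on.
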
## Method

**Architecture.** SynBPS-APM adds three components to the existing SynBPS Markov backbone:

1. **Agent-hook interface** at branching activities: when a case reaches a decision node, a registered LLM agent receives the current trajectory and the frame, and returns a next-activity choice.
2. **Frame specification language**: a JSON/DSL expressing allowed transitions, goal conditions, and [[concepts/normative-frame|normative constraints]] (drawn from the [[sources/2025-calvanese-autonomy-business-process-execution|Calvanese 2025 frame taxonomy]]).
3. **Trajectory-scoped logger** instrumented per [[concepts/agent-process-observability|Fournier's observability requirements]]: perceive / reason / act tuples captured alongside the standard event log.

**Experimental design.** Factorial: 4 transition-entropy levels × 3 autonomy levels ([[concepts/abps-autonomy-levels|Elyasaf L2/L3/L4]]) × {stationary, drifting} × 3 LLM backbones (per [[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm|BRAGE-style model selection]]) = 72 cells, each with N ≥ 30 replications.

**Measurements.** Per case: (a) goal attainment (binary), (b) frame-conformance (count of violations), (c) cycle time, (d) MAPE-K activations ([[concepts/mape-k-loop|Elyasaf]]), (e) [[concepts/behavioral-variability|run-to-run variance]]. Per batch: [[sources/2022-kubrak-prescriptive-ppm-slr|Kubrak's six PrPM dimensions]] scored.

**Baselines.** Oracle agent (knows transition probabilities); rule-based agent (deterministic policy); workflow-only execution (no agent). These triangulate the agent's marginal contribution.

**Analysis.** Mixed-effects ANOVA with random effect on replication seed; simple-slopes for H1; Brown-Forsythe on H2; change-point detection for H3 drift response; pre-registered per [[methods/systematic-literature-review|Kitchenham-style]] protocol to avoid post-hoc slicing.

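
The agent-hook interface, the frame, and the factorial grid described in this section can be sketched in a few lines. Everything here is hypothetical: `Frame`, `AgentHook`, `run_case`, the level labels, and the fallback to a permitted activity after a violation are illustrative assumptions, not part of the SynBPS codebase or the proposed DSL. A rule-based stand-in replaces the LLM agent so the sketch runs offline.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence


@dataclass(frozen=True)
class Frame:
    """Hypothetical minimal frame: allowed transitions plus a goal activity."""
    allowed: Dict[str, FrozenSet[str]]
    goal: str


# An agent hook maps (trajectory so far, frame) to a next-activity choice.
AgentHook = Callable[[Sequence[str], Frame], str]


def rule_based_agent(trajectory: Sequence[str], frame: Frame) -> str:
    """Deterministic baseline: pick the alphabetically first allowed activity."""
    return min(frame.allowed[trajectory[-1]])


def run_case(start: str, agent: AgentHook, frame: Frame, max_steps: int = 20) -> dict:
    """Run one case, counting frame violations and checking goal attainment."""
    trajectory: List[str] = [start]
    violations = 0
    while trajectory[-1] != frame.goal and len(trajectory) < max_steps:
        permitted = frame.allowed.get(trajectory[-1], frozenset())
        if not permitted:
            break  # dead end: no allowed continuation
        choice = agent(trajectory, frame)
        if choice not in permitted:
            violations += 1
            choice = min(permitted)  # assumption: fall back to a permitted activity
        trajectory.append(choice)
    return {
        "trajectory": trajectory,
        "goal_attained": trajectory[-1] == frame.goal,
        "violations": violations,
    }


frame = Frame(
    allowed={
        "A": frozenset({"B", "C"}),
        "B": frozenset({"D"}),
        "C": frozenset({"D"}),
    },
    goal="D",
)
result = run_case("A", rule_based_agent, frame)

# Factorial grid: 4 entropy levels x 3 autonomy levels x 2 regimes x 3 backbones.
ENTROPY_LEVELS = ("e1", "e2", "e3", "e4")      # placeholder labels
AUTONOMY_LEVELS = ("L2", "L3", "L4")
REGIMES = ("stationary", "drifting")
BACKBONES = ("llm_a", "llm_b", "llm_c")        # placeholder model names
cells = list(itertools.product(ENTROPY_LEVELS, AUTONOMY_LEVELS, REGIMES, BACKBONES))
```

Keeping the hook as a plain callable means the oracle, rule-based, and LLM-backed agents can share one harness, and the per-case dictionary already carries two of the listed measurements (goal attainment and violation count).
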
## Validity threats and responses

- **Ecological validity** (per [[concepts/rct-limitations|L19]]): the Markov backbone cannot model concurrency or resources, a limitation acknowledged here as in [[sources/2024-riess-synbps-simulation-framework|SynBPS 2024]]. Complement with one calibrated industry log for cross-check.
- **Construct validity** of "autonomy level": operationalised via prompt-scaffolding rigour, not LLM capability alone. Pre-registered operational definitions.
- **External validity** of LLM choice: 3 models is thin; mitigate by making the harness model-agnostic and releasing the benchmark so others can add models.

## Deliverables and venues

- **Software.** `SynBPS-APM` as a PyPI package and GitHub extension of the existing [[entities/mike-riess|mikeriess/SynBPS]] repo. Apache-2.0 or MIT. Benchmark suite and agent harness released separately.
- **Paper.** Target primary venue: **BPM 2027 main track** (method + evaluation infrastructure fits). Fallback: **SIMULATION (SCS)** for methodology emphasis (continuity with 2024 paper). Workshop as first venue if time-pressured: **PMAI'26 / AutoBiz**, where the APM manifesto co-authors gather.

## Connections

Extends [[syntheses/riess-research-arc|the Riess research arc]] capability #2 (control-variable manipulation via simulation) to APM. Supplies the missing evaluation layer to [[syntheses/abpms-to-apm-evolution]]. Feeds [[concepts/agentic-bpm]] and [[concepts/agent-process-observability]].