---
title: "Study sketch: Concept drift in LLM-agent trajectories"
type: synthesis
tags: [study-sketch, apm, concept-drift, observability, self-modification, riess]
sources:
  - "[[sources/2022-riess-metaheuristics-concept-drift-survey]]"
  - "[[sources/2025-fournier-agentic-ai-process-observability]]"
  - "[[sources/2025-elyasaf-self-modifying-abps]]"
  - "[[sources/2026-calvanese-agentic-bpm-manifesto]]"
  - "[[sources/2025-calvanese-autonomy-business-process-execution]]"
  - "[[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm]]"
  - "[[sources/2011-vanderaalst-process-mining-book]]"
  - "[[sources/2019-nolle-binet-anomaly]]"
  - "[[sources/2023-anjum-rocca-phi403-lecture-05-same-cause-same-effect]]"
created: 2026-04-20
updated: 2026-04-20
---

# Study sketch: Concept drift in LLM-agent trajectories

## Motivation and gap

[[sources/2022-riess-metaheuristics-concept-drift-survey|Riess 2022]] surveyed drift adaptation in *process-oriented* machine learning, where drift sources are process- or environment-level. When the *agents themselves* become first-class citizens per the [[sources/2026-calvanese-agentic-bpm-manifesto|APM manifesto]], a new drift category emerges, **agent-induced drift**: LLM version changes, prompt regressions, tool-registration changes, and upstream-data-source evolution. [[sources/2025-fournier-agentic-ai-process-observability|Fournier 2025]] operationalises agent-trajectory observability but does not address drift. [[sources/2025-elyasaf-self-modifying-abps|Elyasaf 2025]] proposes [[concepts/mape-k-loop|MAPE-K]] as the adaptation mechanism but does not ground it in Riess's drift-detection machinery.

[[sources/2023-anjum-rocca-phi403-lecture-05-same-cause-same-effect|PHI403 L5]] sharpens the philosophical stakes: if the same cause stops producing the same effect (agent versions A and B respond differently to identical frames), the regularity theory of causation breaks down and we must choose among three responses: refit, re-frame, or accept irreducible variability. That choice cannot be settled in the abstract; it requires empirical characterisation.

## Research questions

- **RQ1 (taxonomy).** What categories of drift manifest in LLM-agent trajectories beyond the process-oriented types catalogued in Riess 2022, specifically model-version drift, prompt-template drift, tool-availability drift, and upstream-data drift?
- **RQ2 (detection).** Can [[concepts/conformance-checking|process-mining conformance-checking techniques]] ([[sources/2011-vanderaalst-process-mining-book|van der Aalst]]), adapted to agent-trajectory logs, detect each drift type, and at what latency (cases-to-detection)?
- **RQ3 (adaptation).** Given a detected drift, which MAPE-K response (frame-tightening, prompt regeneration, or model re-selection) restores frame-conformance fastest, and under which drift type?

## Hypotheses

- **H1.** Agent-induced drift has a distinct temporal signature: it is *discontinuous* (abrupt at version changes), whereas process-oriented drift is typically *gradual* (distribution shift over weeks or months).
- **H2.** Conformance-alignment-cost change-point detection ([[sources/2019-nolle-binet-anomaly|BINet-style]]) on agent trajectories detects discontinuous drift within ≤10 completed cases, while gradual drift requires ≥50 cases, quantifying a detection-cost trade-off absent from Riess 2022.
- **H3.** Frame-tightening ([[concepts/framed-autonomy|reducing autonomy level]] per [[concepts/abps-autonomy-levels|Elyasaf's roadmap]]) recovers frame-conformance faster than model re-selection under model-version drift, but slower under upstream-data drift, where the frame itself is now mis-specified.
- **H4.** Drift-adaptation policies drawn from Riess 2022's metaheuristics taxonomy (population-based re-selection over a set of frame variants) outperform single-variant adaptation in heterogeneous-drift regimes, generalising Riess 2022's "Full Model Selection" thesis to frames.

## Method

**Data sources (two-pronged).**

1. **Controlled (SynBPS-APM, [[syntheses/study-sketch-synbps-apm|sibling study]]).** Inject each drift type at pre-registered time points; measure detection latency and adaptation recovery per policy.
2. **Observational (Telenor operational log).** 6–12 months of agent trajectories from the [[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm|BRAGE-context]] customer-service pipeline, spanning at least one LLM-model version change and one known prompt-template revision.

**Drift-injection protocol (controlled arm).**

- **Model-version drift.** Mid-run swap between two LLMs of similar benchmark scores but different fine-tuning lineage.
- **Prompt-template drift.** Rewrite the system prompt while preserving semantic content; measure the behavioural delta.
- **Tool-availability drift.** Remove or add one registered tool; agents must re-plan without the affected primitive.
- **Upstream-data drift.** Change the distribution of input cases (new product categories per BRAGE's taxonomy shift).

**Detection instruments.**

- **Alignment cost** of the observed trajectory vs. a frame-conformant trajectory (standard process mining).
- **Page-Hinkley** / **CUSUM** change-point detection on the alignment-cost time series.
- **Behavioural anomaly** per [[sources/2019-nolle-binet-anomaly|Nolle BINet]], adapted to agent-action sequences.
- **Observability metrics** from [[concepts/agent-process-observability|Fournier]]: reasoning-trace length distributions, tool-call patterns, frame-boundary proximity.

**Adaptation policies.**

- **Frame-tightening.** Drop autonomy by one Elyasaf level; narrow the allowed transitions.
- **Prompt regeneration.** LLM-based prompt rewriter, given the drift signature.
- **Model re-selection.** Population-based choice (per Riess 2022) from a registered pool.
- **Null (no adaptation).** Control baseline.

**Outcome measures.** Detection latency (cases), recovery latency (cases from detection to restored conformance), and total frame-violation cost during the drift episode.

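
To make the Page-Hinkley instrument and the detection-latency measure concrete, here is a minimal sketch in plain Python. It assumes one alignment-cost value per completed case has already been computed by some conformance checker; the class and function names, the parameter values `delta` and `threshold`, and the synthetic cost series are all illustrative assumptions, not part of the study design.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PageHinkley:
    """One-sided Page-Hinkley test for an upward shift in the mean.

    delta and threshold are assumed illustrative values, not tuned ones.
    """
    delta: float = 0.05      # magnitude of change tolerated without alarm
    threshold: float = 2.0   # alarm threshold (often written lambda)
    n: int = 0               # number of cases seen so far
    mean: float = 0.0        # running mean of the alignment cost
    cum: float = 0.0         # cumulative deviation m_t
    cum_min: float = 0.0     # running minimum M_t of the cumulative deviation

    def update(self, x: float) -> bool:
        """Consume one alignment cost; return True if an alarm fires."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold


def first_alarm(costs: Sequence[float], detector: PageHinkley) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if none fires."""
    for i, cost in enumerate(costs):
        if detector.update(cost):
            return i
    return None


def detection_latency(
    costs: Sequence[float], drift_index: int, detector: PageHinkley
) -> Optional[int]:
    """Cases from the injected drift point to the first alarm (inclusive).

    Returns None when no alarm fires or when the first alarm precedes the
    injected drift (a false alarm, which this sketch does not score).
    """
    alarm = first_alarm(costs, detector)
    if alarm is None or alarm < drift_index:
        return None
    return alarm - drift_index + 1


# Synthetic abrupt drift: cost steps from 0.0 to 1.0 at case index 50.
costs = [0.0] * 50 + [1.0] * 50
print(detection_latency(costs, drift_index=50, detector=PageHinkley()))  # prints 3
```

With an abrupt, noise-free step the alarm fires within a few cases; real alignment-cost series are noisy, so `delta` and `threshold` would have to be calibrated against the false-alarm rate on pre-drift data before latencies are compared across drift types.
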
**Analysis.** Survival analysis on detection and recovery latencies; factorial drift-type × adaptation-policy ANOVA; qualitative case study on the Telenor observational data.

## Validity threats

- **Confounding** between model-version change and upstream-data evolution in the observational arm; the SynBPS-APM controlled experiments disambiguate.
- **Ecological validity** of injected drift, addressed by sequencing: hypotheses are sharpened on controlled data, then validated on the Telenor log.
- **Ethical / IP constraints** on the Telenor arm: private data handled per BRAGE conventions (aggregated results public, raw trajectories private).

## Deliverables and venues

- **Paper.** Target: **BPM 2027 main or industry track** (a real industry case lands better there than at a methods venue). Fallback: **ICPM** or **Decision Support Systems** (continuity with the Paper III DSS submission track).
- **Extension to SynBPS-APM.** Drift-injection module as a companion to the base testbed.
- **Operational artefact.** If Telenor permits: an internal drift-dashboard prototype on [[entities/telenor|Telenor]]'s agent-monitoring stack.

## Connections

Closes the loop between [[syntheses/riess-research-arc|Riess research commitment #1]] (evaluation rigour) and the APM manifesto's [[concepts/self-modification|self-modification]] capability. Provides the drift-aware instantiation of [[concepts/mape-k-loop|MAPE-K]] that [[sources/2025-elyasaf-self-modifying-abps|Elyasaf 2025]] postulates without measurement. Feeds [[concepts/concept-drift]] and complements [[syntheses/study-sketch-synbps-apm|the SynBPS-APM study]] as its observational arm.