---
title: "Study sketch — Concept drift in LLM-agent trajectories"
type: synthesis
tags: [study-sketch, apm, concept-drift, observability, self-modification, riess]
sources:
  - "[[sources/2022-riess-metaheuristics-concept-drift-survey]]"
  - "[[sources/2025-fournier-agentic-ai-process-observability]]"
  - "[[sources/2025-elyasaf-self-modifying-abps]]"
  - "[[sources/2026-calvanese-agentic-bpm-manifesto]]"
  - "[[sources/2025-calvanese-autonomy-business-process-execution]]"
  - "[[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm]]"
  - "[[sources/2011-vanderaalst-process-mining-book]]"
  - "[[sources/2019-nolle-binet-anomaly]]"
  - "[[sources/2023-anjum-rocca-phi403-lecture-05-same-cause-same-effect]]"
created: 2026-04-20
updated: 2026-04-20
---

# Study sketch — Concept drift in LLM-agent trajectories

## Motivation and gap

[[sources/2022-riess-metaheuristics-concept-drift-survey|Riess 2022]] surveyed drift adaptation in *process-oriented* machine learning, where drift originates at the process or environment level. When the *agents themselves* become first-class citizens, as the [[sources/2026-calvanese-agentic-bpm-manifesto|APM manifesto]] proposes, a new drift category emerges: **agent-induced drift**, arising from LLM version changes, prompt regressions, tool-registration changes, and upstream-data-source evolution. [[sources/2025-fournier-agentic-ai-process-observability|Fournier 2025]] operationalises agent-trajectory observability but does not address drift. [[sources/2025-elyasaf-self-modifying-abps|Elyasaf 2025]] proposes [[concepts/mape-k-loop|MAPE-K]] as the adaptation mechanism but does not ground it in Riess's drift-detection machinery. [[sources/2023-anjum-rocca-phi403-lecture-05-same-cause-same-effect|PHI403 L5]] sharpens the philosophical stakes: if the same cause stops producing the same effect (agent versions A and B respond differently to identical frames), the regularity theory of causation breaks down, and we must choose between refitting, re-framing, or accepting irreducible variability. That choice cannot be settled in the abstract; it requires empirical characterisation.

## Research questions

- **RQ1 (taxonomy).** What categories of drift manifest in LLM-agent trajectories beyond the process-oriented types catalogued in Riess 2022 — specifically: model-version drift, prompt-template drift, tool-availability drift, upstream-data drift?
- **RQ2 (detection).** Can [[concepts/conformance-checking|process-mining conformance-checking techniques]] ([[sources/2011-vanderaalst-process-mining-book|van der Aalst]]) adapted to agent-trajectory logs detect each drift type, and at what latency (cases-to-detection)?
- **RQ3 (adaptation).** Given a detected drift, which MAPE-K response — frame-tightening, prompt regeneration, model re-selection — restores frame-conformance fastest, and under which drift type?

## Hypotheses

- **H1.** Agent-induced drift has a distinct temporal signature: it is *discontinuous* (abrupt at version changes), whereas process-oriented drift is typically *gradual* (distribution shift over weeks or months).
- **H2.** Change-point detection on conformance-alignment cost, complemented by [[sources/2019-nolle-binet-anomaly|BINet-style]] anomaly scoring, detects discontinuous drift on agent trajectories within ≤10 completed cases, whereas gradual drift requires ≥50 cases; this quantifies a detection-cost trade-off absent from Riess 2022.
- **H3.** Frame-tightening ([[concepts/framed-autonomy|reducing autonomy level]] per [[concepts/abps-autonomy-levels|Elyasaf's roadmap]]) recovers frame-conformance faster than model re-selection under model-version drift, but more slowly under upstream-data drift, where the frame itself has become mis-specified.
- **H4.** Drift adaptation policies drawn from Riess 2022's metaheuristics taxonomy (population-based re-selection over a set of frame variants) outperform single-variant adaptation in heterogeneous-drift regimes — generalising Riess 2022's "Full Model Selection" thesis to frames.

## Method

**Data sources (two-pronged).**
1. **Controlled — SynBPS-APM** ([[syntheses/study-sketch-synbps-apm|sibling study]]): inject each drift type at pre-registered time-points; measure detection latency and adaptation recovery per policy.
2. **Observational — Telenor operational log**: 6–12 months of agent trajectories from the [[sources/2025-riess-jorgensen-brage-benchmark-norwegian-llm|BRAGE-context]] customer-service pipeline, crossing at least one LLM-model version change and one known prompt-template revision.

**Drift-injection protocol (controlled arm).**
- **Model-version drift.** Mid-run swap between two LLMs with similar benchmark scores but different fine-tuning lineages.
- **Prompt-template drift.** Rewrite the system prompt while preserving its semantic content; measure the behavioural delta.
- **Tool-availability drift.** Remove or add one registered tool; agents must re-plan around the changed tool set.
- **Upstream-data drift.** Change the distribution of input cases (new product categories, following BRAGE's taxonomy shift).
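A minimal sketch of how the controlled arm could encode H1's two temporal signatures in a synthetic per-case alignment-cost series, assuming drift manifests as a shift in mean cost. The function name `synthetic_cost_series`, the step and ramp shapes, and all parameter defaults are hypothetical illustrations, not SynBPS-APM's actual injection module; in the real protocol the drift would come from swapping the model, prompt, tool set, or input distribution, and the cost would be measured rather than generated.

```python
import random
from typing import List


def synthetic_cost_series(
    n_cases: int,
    onset: int,
    kind: str,
    baseline: float = 1.0,
    shift: float = 2.0,
    ramp: int = 60,
    noise_sd: float = 0.3,
    seed: int = 0,
) -> List[float]:
    """Hypothetical per-case alignment cost with drift injected at a pre-registered case.

    kind="abrupt":  mean jumps by `shift` at `onset` (discontinuous signature).
    kind="gradual": mean rises linearly by `shift` over `ramp` cases from `onset`.
    """
    if kind not in ("abrupt", "gradual"):
        raise ValueError(f"unknown drift kind: {kind}")
    rng = random.Random(seed)  # seeded so injected runs are reproducible
    series = []
    for i in range(n_cases):
        if i < onset:
            drift = 0.0
        elif kind == "abrupt":
            drift = shift
        else:
            drift = shift * min(1.0, (i - onset + 1) / ramp)
        # Alignment costs are non-negative, so clip the noisy value at zero.
        series.append(max(0.0, baseline + drift + rng.gauss(0.0, noise_sd)))
    return series
```

Fixing `seed` and `onset` per run keeps the injection points pre-registered and the noise reproducible across adaptation policies.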

**Detection instruments.**
- **Alignment cost** of observed trajectory vs. frame-conformant trajectory (standard process-mining).
- **Page-Hinkley** / **CUSUM** change-point detection on the alignment-cost time series.
- **Behavioural anomaly scoring** per [[sources/2019-nolle-binet-anomaly|Nolle BINet]], adapted to agent-action sequences.
- **Observability metrics** from [[concepts/agent-process-observability|Fournier]] — reasoning-trace length distributions, tool-call patterns, frame-boundary proximity.
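A minimal sketch of the first two instruments, under two stated simplifications: the alignment cost is computed against a single frame-conformant reference trace (standard alignments are computed against a process model such as a Petri net), and only the Page-Hinkley test is shown (CUSUM would slot in the same way). Synchronous moves cost 0 and log or model moves cost 1, so a mismatched action costs 2. `trace_alignment_cost`, `PageHinkley`, `first_alarm`, and the `delta`/`threshold`/`min_cases` defaults are hypothetical names and values.

```python
from typing import Optional, Sequence


def trace_alignment_cost(observed: Sequence[str], reference: Sequence[str]) -> int:
    """Optimal alignment cost between an observed trace and ONE reference trace.

    Simplification: real conformance checking aligns against a process model.
    Synchronous moves cost 0; log moves and model moves cost 1 each.
    """
    n, m = len(observed), len(reference)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # only log moves remain
    for j in range(m + 1):
        dp[0][j] = j  # only model moves remain
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if observed[i - 1] == reference[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # synchronous move
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1])  # log or model move
    return dp[n][m]


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a cost stream."""

    def __init__(self, delta: float = 0.05, threshold: float = 5.0, min_cases: int = 10):
        self.delta = delta          # tolerated magnitude of change per case
        self.threshold = threshold  # alarm threshold (lambda)
        self.min_cases = min_cases  # warm-up before any alarm is allowed
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # m_T: cumulative sum of (x_t - running mean - delta)
        self.cum_min = 0.0  # M_T: minimum of m_t seen so far

    def update(self, x: float) -> bool:
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.n >= self.min_cases and self.cum - self.cum_min > self.threshold


def first_alarm(series: Sequence[float], detector: PageHinkley) -> Optional[int]:
    """Index of the first case that triggers an alarm, or None if none does."""
    detector.reset()
    for i, x in enumerate(series):
        if detector.update(x):
            return i
    return None
```

On a noise-free series that jumps from 1.0 to 3.0 at case index 50, `first_alarm(series, PageHinkley())` returns 52, the third post-onset case, which is the regime H2 predicts for discontinuous drift. `threshold` trades detection latency against false alarms and would need calibrating on the null-drift runs.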

**Adaptation policies.**
- **Frame-tightening** — drop autonomy one Elyasaf-level; narrow allowed transitions.
- **Prompt regeneration** — LLM-based prompt-rewriter given the drift signature.
- **Model re-selection** — population-based choice (per Riess 2022) from a registered pool.
- **Null (no adaptation)** — control baseline.
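A minimal sketch of the Plan step of a MAPE-K loop dispatching three of the four policies; prompt regeneration is omitted because it needs an LLM call. `Config`, `Policy`, `reselect`, `plan`, and the integer encoding of autonomy levels are hypothetical; in particular, representing Elyasaf's levels as integers where lower means less autonomy is an assumption. Re-selection here scores every registered configuration on the same recent window and keeps the lowest mean conformance cost, which is one simple reading of population-based selection, not Riess 2022's specific metaheuristic.

```python
from dataclasses import dataclass, replace
from enum import Enum
from statistics import mean
from typing import Any, Callable, Sequence


class Policy(Enum):
    NULL = "null"
    FRAME_TIGHTENING = "frame-tightening"
    MODEL_RESELECTION = "model-reselection"


@dataclass(frozen=True)
class Config:
    model: str           # hypothetical model identifier
    autonomy_level: int  # hypothetical integer encoding; lower = less autonomy


CostFn = Callable[[Config, Any], float]  # conformance cost of a config on one case


def reselect(pool: Sequence[Config], recent_cases: Sequence[Any], cost_fn: CostFn) -> Config:
    """Score every registered config on the same recent window; keep the cheapest.

    Ties are broken by model name so the choice is deterministic.
    """
    if not recent_cases:
        raise ValueError("re-selection needs at least one recent case")
    return min(pool, key=lambda c: (mean(cost_fn(c, case) for case in recent_cases), c.model))


def plan(policy: Policy, current: Config, pool: Sequence[Config],
         recent_cases: Sequence[Any], cost_fn: CostFn) -> Config:
    """Return the configuration to execute next under the chosen policy."""
    if policy is Policy.NULL:
        return current  # control baseline: no adaptation
    if policy is Policy.FRAME_TIGHTENING:
        # Drop one autonomy level, never below the floor of 0.
        return replace(current, autonomy_level=max(0, current.autonomy_level - 1))
    if policy is Policy.MODEL_RESELECTION:
        return reselect(pool, recent_cases, cost_fn)
    raise ValueError(f"unsupported policy: {policy}")
```

Scoring all variants on an identical window keeps the comparison paired; in practice `cost_fn` would replay or shadow-run recent cases, which is the expensive part of re-selection.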

**Outcome measures.** Detection latency (cases), recovery latency (cases from detection to restored conformance), total frame-violation cost during drift episode.
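The three outcome measures can be computed from a per-case cost series once an alarm index is known. This sketch assumes that conformance counts as restored after `window` consecutive cases at or below a `tolerance`, that detection latency counts cases inclusively from onset, and that the drift episode runs from onset to the first case of the restoring run; these thresholds, and the names `episode_outcome` and `EpisodeOutcome`, are hypothetical choices the study would need to pre-register. Undetected or unrecovered episodes return `None` so they can enter the survival analysis as censored observations.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EpisodeOutcome:
    detection_latency: Optional[int]  # cases from onset to alarm (inclusive); None = censored
    recovery_latency: Optional[int]   # cases from alarm to restored conformance; None = censored
    violation_cost: float             # summed alignment cost over the drift episode


def episode_outcome(costs: Sequence[float], onset: int, alarm: Optional[int],
                    tolerance: float, window: int = 5) -> EpisodeOutcome:
    if alarm is None:
        # Never detected: the whole post-onset tail counts as the episode.
        return EpisodeOutcome(None, None, sum(costs[onset:]))
    if alarm < onset:
        raise ValueError("alarm precedes onset (false alarm)")
    detection_latency = alarm - onset + 1
    run = 0
    for i in range(alarm + 1, len(costs)):
        run = run + 1 if costs[i] <= tolerance else 0
        if run == window:
            recovered_at = i - window + 1  # first case of the conforming run
            return EpisodeOutcome(detection_latency, recovered_at - alarm,
                                  sum(costs[onset:recovered_at]))
    # Detected but never recovered within the observation window.
    return EpisodeOutcome(detection_latency, None, sum(costs[onset:]))
```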

**Analysis.** Survival analysis on detection/recovery latencies; factorial drift-type × adaptation-policy ANOVA; qualitative case study on Telenor observational data.
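Survival analysis suits the latency measures because some drift episodes are never detected, or never recover, within the observation window; those are right-censored rather than missing. A minimal Kaplan-Meier sketch over detection latencies in plain Python follows; `kaplan_meier` is a hypothetical name, the factorial ANOVA is not shown, and a production analysis would more likely use an established statistics package.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[int], observed: Sequence[bool]) -> List[Tuple[int, float]]:
    """Kaplan-Meier estimate of S(t) = P(latency > t).

    durations: latency in cases, or the censoring time if the event was not observed.
    observed:  True if detection (or recovery) occurred, False if right-censored.
    Returns (t, S(t)) at each time with at least one observed event.
    """
    events = Counter(d for d, o in zip(durations, observed) if o)
    leaving = Counter(durations)  # everyone leaves the risk set at their duration
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk  # censored at t still count as at risk at t
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve
```

Comparing such curves across adaptation policies (or drift types) gives the latency contrast that the factorial design then tests formally.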

## Validity threats

- **Confounding** between model-version change and upstream-data evolution in the observational arm — SynBPS-APM controlled experiments disambiguate.
- **Ecological validity** of injected drift — addressed by sequencing: hypotheses sharpened on controlled data, validated on Telenor log.
- **Ethical / IP constraints** on the Telenor arm — private data handled per BRAGE conventions (aggregated results public, raw trajectories private).

## Deliverables and venues

- **Paper.** Target: **BPM 2027 main or industry track** (a real industry case lands better there than at a methods venue). Fallback: **ICPM** or **Decision Support Systems** (continuity with the Paper III DSS submission track).
- **Extension to SynBPS-APM** — drift-injection module as a companion to the base testbed.
- **Operational artefact.** If Telenor permits: an internal drift-dashboard prototype on [[entities/telenor|Telenor]]'s agent-monitoring stack.

## Connections

Closes the loop between [[syntheses/riess-research-arc|Riess research commitment #1]] (evaluation rigour) and the APM manifesto's [[concepts/self-modification|self-modification]] capability. Provides the drift-aware instantiation of [[concepts/mape-k-loop|MAPE-K]] that [[sources/2025-elyasaf-self-modifying-abps|Elyasaf 2025]] postulates without measurement. Feeds [[concepts/concept-drift]] and complements [[syntheses/study-sketch-synbps-apm|the SynBPS-APM study]] as its observational arm.