---
title: "β-learner distillation"
type: concept
tags: [llm-ppm, explainability, interpretability, reasoning-distillation, ppm-methodology]
sources:
  - "[[sources/2026-padella-llm-features-ppm]]"
created: 2026-05-11
updated: 2026-05-11
---

# β-learner distillation

A methodology introduced by [[sources/2026-padella-llm-features-ppm|Padella, de Leoni & Dumas 2026]] for **distilling the reasoning patterns of an LLM-based predictor into a catalogue of interpretable standalone models**. Each catalogued pattern is then re-implemented and benchmarked against the LLM to test whether the LLM merely *replicates* a known prediction strategy or performs *higher-order* reasoning across multiple strategies.

## Procedure

1. **Collect reasoning traces**. Run the LLM-based PPM predictor on a sample of running traces (Padella et al. used 50 per dataset and KPI, i.e. 150 per KPI across the three datasets), retaining the structured reasoning output alongside each predicted value.
2. **Manually code recurring patterns**. Following the visualisation methodology of "Landscape of Thoughts" (Zhou et al., arXiv:2503.22165), catalogue families of reasoning patterns. Padella et al. identified 4 families × 3 aggregations = 12 patterns for the Total Time regression KPI, and 4 patterns for the Activity Occurrence classification KPI.
3. **Re-implement each pattern as a standalone β-learner**. Each β-learner is a mathematically defined, reproducible model (typically classical: k-nearest-neighbours, a simple aggregation rule, a decision tree, etc.) that mimics the documented reasoning strategy.
4. **Benchmark β-learners against the LLM** on a held-out test set using paired statistical tests (Padella et al. used Wilcoxon signed-rank, α = 0.05).
5. **Coverage estimation**. Use Good-Turing frequency smoothing to estimate P(novel β-learner | new trace) and check whether the catalogue is saturating.
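
The coverage step can be sketched with the basic Good-Turing missing-mass estimate, P₀ ≈ N₁ / N, where N₁ is the number of patterns observed exactly once and N is the number of coded traces. This is a minimal illustration only: this note does not record the exact estimator variant used by Padella et al., and the function name and the example labels below are hypothetical.

```python
from collections import Counter


def good_turing_novel_probability(pattern_labels):
    """Estimate P(next trace shows an unseen pattern) as N1 / N.

    pattern_labels: one manually coded pattern label per reasoning trace.
    N1 = number of patterns seen exactly once, N = number of coded traces.
    """
    if not pattern_labels:
        raise ValueError("need at least one coded trace")
    counts = Counter(pattern_labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(pattern_labels)


# Hypothetical coding of 10 reasoning traces (not data from the paper).
labels = ["knn-act"] * 4 + ["knn-att"] * 3 + ["time-seq"] * 2 + ["path-pred"]
p0 = good_turing_novel_probability(labels)
print(p0)  # 0.1: one singleton pattern among ten traces
```

A catalogue is saturating when this estimate shrinks as more traces are coded, because fewer and fewer patterns remain singletons.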

## The β-learner catalogue (Padella et al. 2026)

**For Total Time prediction:**

| Family | Aggregation | Intuition |
|---|---|---|
| `knn-act` | mean / median / mode | k-NN on activity-sequence representations |
| `knn-att` | mean / median / mode | k-NN on attribute-vector representations |
| `time-seq` | mean / median / mode | Temporal-sequence aggregation of similar traces |
| `path-pred` | mean / median / mode | Future-path prediction with aggregated outcomes |
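
As an illustration of how one family with its three aggregations could be re-implemented, the sketch below builds a `knn-act`-style β-learner in plain Python. The similarity measure (length of the shared activity prefix), the value of `k`, and all function and variable names are assumptions made for this example, not the definitions used by Padella et al.

```python
from statistics import mean, median, mode

# The three aggregations listed in the table above.
AGGREGATORS = {"mean": mean, "median": median, "mode": mode}


def shared_prefix_length(a, b):
    """Number of leading activities two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def knn_act_predict(running_prefix, history, k=3, aggregation="median"):
    """Predict total time from the k most similar completed traces.

    history: list of (activity_sequence, total_time) pairs.
    Similarity is an assumed stand-in: shared-prefix length.
    """
    ranked = sorted(
        history,
        key=lambda item: shared_prefix_length(running_prefix, item[0]),
        reverse=True,
    )
    neighbour_times = [total_time for _, total_time in ranked[:k]]
    return AGGREGATORS[aggregation](neighbour_times)


# Hypothetical completed traces with total times (arbitrary units).
history = [
    (["A", "B", "C", "D"], 10.0),
    (["A", "B", "C", "E"], 12.0),
    (["A", "B", "F"], 20.0),
    (["A", "G"], 40.0),
    (["H"], 5.0),
]
prediction = knn_act_predict(["A", "B", "C"], history, k=3, aggregation="median")
print(prediction)  # 12.0: median of the three nearest traces' times
```

The `mode` aggregation is only meaningful when total times are discretised or binned first; on raw continuous values every neighbour time is usually unique.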

**For Activity Occurrence (classification):**

| Family | Intuition |
|---|---|
| Activity-Based | Inference from observed activity sequence patterns |
| State-Based | Inference from the last observed event/state |
| Att-Based | Inference from trace attribute values |
| Positive Evidence | Check whether the target activity already occurred |
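
Two of these classification patterns are simple enough to sketch directly. The example below combines a Positive Evidence rule with a State-Based frequency estimate; the fallback order, the 0.5 threshold, and all names are hypothetical choices for illustration rather than the paper's definitions.

```python
from collections import defaultdict


def positive_evidence(prefix, target):
    """Return True if the target already occurred, else None (rule does not fire)."""
    return True if target in prefix else None


def fit_state_based(history, target):
    """Per activity, estimate how often the target occurs later in the trace."""
    seen = defaultdict(int)
    hits = defaultdict(int)
    for trace in history:
        for i, activity in enumerate(trace):
            seen[activity] += 1
            if target in trace[i + 1:]:
                hits[activity] += 1
    return {a: hits[a] / seen[a] for a in seen}


def predict_occurrence(prefix, target, state_rates, threshold=0.5):
    """Use positive evidence when available, otherwise the last observed state."""
    evidence = positive_evidence(prefix, target)
    if evidence is not None:
        return evidence
    last = prefix[-1] if prefix else None
    return state_rates.get(last, 0.0) >= threshold


# Hypothetical completed traces; "X" is the target activity.
history = [["A", "B", "X"], ["A", "B", "C"], ["A", "C"], ["A", "B", "X", "C"]]
rates = fit_state_based(history, "X")

print(predict_occurrence(["A", "X"], "X", rates))  # True, target already seen
print(predict_occurrence(["A", "B"], "X", rates))  # True, X follows B in 2 of 3 cases
print(predict_occurrence(["A", "C"], "X", rates))  # False, X never follows C
```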

## Why it matters

Beyond a domain-specific finding, β-learner distillation is a **general explainability protocol** for opaque PPM predictors:

- It gives an interpretable post-hoc *vocabulary* for what the predictor is doing — useful for the [[concepts/explainability-apm|APM explainability]] requirement.
- It enables a *quantified test* of whether the LLM (or any opaque predictor) is doing something more than recapitulating a known method — by re-implementing β-learners as competing predictors and comparing.
- It can in principle be applied to non-LLM PPM models (CatBoost, PGTNet, attention models) — though derivation of reasoning patterns from non-text outputs is harder.

## Empirical headline from Padella et al.

The LLM **outperforms every individual β-learner** by 6–80 % (MAE for Total Time, F1 for Activity Occurrence), with the differences significant under the Wilcoxon signed-rank test. Good-Turing analysis indicates that ~0 novel β-learners are expected after 100 additional traces (m=100, P₀ < 0.014 across all use cases × KPIs). Interpretation: the LLM is *combining* and *aggregating* β-learners adaptively per trace, not picking one.
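
The paired comparison behind this result can be illustrated with a small Wilcoxon signed-rank sketch on per-trace absolute errors. The version below is a simplified, dependency-free implementation using the normal approximation without tie or continuity correction; in practice `scipy.stats.wilcoxon` would be the usual choice. The error values are invented for the example and are not results from the paper.

```python
import math


def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, p_value


# Hypothetical per-trace absolute errors for the LLM and one beta-learner.
llm_errors = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
beta_errors = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]

w_plus, p = wilcoxon_signed_rank(llm_errors, beta_errors)
print(w_plus, p < 0.05)  # W+ is 0 because the LLM error is lower on every trace
```

With samples this small an exact test is preferable to the normal approximation; the sketch is only meant to show the shape of the comparison at α = 0.05.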

## Limitations

- **Manual coding**: Padella et al. distilled patterns by hand. Neither automated rule mining nor LLM-aided distillation was tested, and scalability to many KPIs is open.
- **Catalogue completeness** depends on the diversity of reasoning traces sampled: Good-Turing offers a coverage argument but not a completeness guarantee.
- **Pattern-implementation fidelity**: re-implementations as classical models may fall short of the spirit of the LLM's reasoning; the resulting gaps then look like "higher-order reasoning" when they may be an implementation artifact.
- **Domain transferability**: patterns derived for BPI12 / Bac / Hospital may not generalise to other process domains.

## Related

[[concepts/llm-based-ppm]] · [[concepts/semantic-hashing-probe]] · [[concepts/explainability-apm]] · [[concepts/predictive-process-monitoring]] · [[sources/2026-padella-llm-features-ppm]]