---
title: "β-learner distillation"
type: concept
tags: [llm-ppm, explainability, interpretability, reasoning-distillation, ppm-methodology]
sources:
  - "[[sources/2026-padella-llm-features-ppm]]"
created: 2026-05-11
updated: 2026-05-11
---

# β-learner distillation

A methodology introduced by [[sources/2026-padella-llm-features-ppm|Padella, de Leoni & Dumas 2026]] for **distilling the reasoning patterns of an LLM-based predictor into a catalogue of interpretable standalone models**. Each catalogued pattern is then re-implemented and benchmarked against the LLM. The benchmark tests whether the LLM merely *replicates* a known prediction strategy or performs *higher-order* reasoning across multiple strategies.

## Procedure

1. **Collect reasoning traces.** Run the LLM-based PPM predictor on a sample of running traces, retaining the structured reasoning output alongside each predicted value. Padella et al. used 50 traces per dataset × KPI combination, i.e. 150 per KPI across their three datasets.
2. **Manually code recurring patterns.** Following the visualisation methodology of "Landscape of Thoughts" (Zhou et al., arXiv:2503.22165), catalogue families of reasoning patterns. Padella et al. identified 4 families × 3 aggregations = 12 patterns for the Total Time regression KPI, and 4 patterns for the Activity Occurrence classification KPI.
3. **Re-implement each pattern as a standalone β-learner.** This is a mathematically defined, reproducible model that mimics the documented reasoning strategy. It is typically classical, such as k-nearest neighbours, a simple aggregation rule, or a decision tree.
4. **Benchmark the β-learners against the LLM** on a held-out test set using paired statistical tests. Padella et al. used the Wilcoxon signed-rank test at α = 0.05.
5. **Estimate coverage.** Use Good-Turing frequency smoothing to estimate P(novel β-learner | new trace) and confirm that the catalogue is saturating.

## The β-learner catalogue (Padella et al. 2026)

**For Total Time prediction (regression):**

| Family | Aggregation | Intuition |
|---|---|---|
| `knn-act` | mean / median / mode | k-NN on activity-sequence representations |
| `knn-att` | mean / median / mode | k-NN on attribute-vector representations |
| `time-seq` | mean / median / mode | Temporal-sequence aggregation of similar traces |
| `path-pred` | mean / median / mode | Future-path prediction with aggregated outcomes |

**For Activity Occurrence prediction (classification):**

| Family | Intuition |
|---|---|
| Activity-Based | Inference from observed activity-sequence patterns |
| State-Based | Inference from the last observed event/state |
| Att-Based | Inference from trace attribute values |
| Positive Evidence | Check whether the target activity has already occurred |

## Why it matters

Beyond its domain-specific findings, β-learner distillation is a **general explainability protocol** for opaque PPM predictors:

- It provides an interpretable post-hoc *vocabulary* for what the predictor is doing, which serves the [[concepts/explainability-apm|APM explainability]] requirement.
- It enables a *quantified test* of whether the LLM (or any opaque predictor) does more than recapitulate a known method. The β-learners are re-implemented as competing predictors and compared against it.
- It can in principle be applied to non-LLM PPM models (CatBoost, PGTNet, attention models), although deriving reasoning patterns from non-text outputs is harder.

## Empirical headline from Padella et al.

The LLM **outperforms every individual β-learner** by 6–80 % in MAE (Total Time) or F1 (Activity Occurrence), with the differences significant under the Wilcoxon test. Good-Turing analysis indicates that approximately zero novel β-learners are expected in 100 additional traces (m = 100; P₀ < 0.014 across all use case × KPI combinations). Interpretation: the LLM *combines* and *aggregates* β-learners adaptively per trace rather than picking one.

## Limitations

- **Manual coding**: Padella et al. distilled patterns by hand. Neither automated rule mining nor LLM-aided distillation was tested.
Scalability to many KPIs remains open.
- **Catalogue completeness** depends on the diversity of the sampled reasoning traces. Good-Turing offers a coverage argument, not a completeness guarantee.
- **Pattern-implementation fidelity**: re-implementing a pattern as a classical model may not faithfully capture the LLM's reasoning. A performance gap can then look like "higher-order reasoning" when it may be an implementation artifact.
- **Domain transferability**: patterns derived for BPI12 / Bac / Hospital may not generalise to other process domains.

## Related

[[concepts/llm-based-ppm]] · [[concepts/semantic-hashing-probe]] · [[concepts/explainability-apm]] · [[concepts/predictive-process-monitoring]] · [[sources/2026-padella-llm-features-ppm]]
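
## Illustrative sketch

The following is a minimal Python sketch of two procedure steps: re-implementing a pattern as a β-learner (step 3) and estimating coverage (step 5). It is not Padella et al.'s implementation.

`knn_act` imitates the `knn-act` family for Total Time. It ranks historical prefixes by Levenshtein distance over activity sequences and aggregates the neighbours' remaining times by mean, median, or mode. Edit distance is an assumed similarity; the paper's actual representation is not described here.

`good_turing_p0` computes the classic Turing estimate N₁/N of the probability that the next coded observation shows a previously unseen pattern. Here N₁ is the number of patterns seen exactly once and N is the total number of observations. Treating this as the estimator behind P(novel β-learner | new trace), and leaving out any extension to m additional traces, are assumptions. All function names, the `(activity_prefix, remaining_time)` data layout, and the toy data are hypothetical.

```python
from collections import Counter
from statistics import mean, median, mode
from typing import Sequence, Tuple

# Hypothetical data layout (not from Padella et al.): each historical case is
# (activity_prefix, remaining_time), e.g. (("A", "B"), 12.0).
Case = Tuple[Tuple[str, ...], float]

# One β-learner per aggregation, mirroring the mean / median / mode variants.
AGGREGATORS = {"mean": mean, "median": median, "mode": mode}


def prefix_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def knn_act(prefix: Sequence[str], history: Sequence[Case],
            k: int = 3, agg: str = "mean") -> float:
    """Hypothetical knn-act β-learner: aggregate the remaining times of the
    k historical prefixes closest to `prefix` (ties keep history order)."""
    ranked = sorted(history, key=lambda case: prefix_distance(prefix, case[0]))
    neighbours = [remaining for _, remaining in ranked[:k]]
    return AGGREGATORS[agg](neighbours)


def good_turing_p0(observed_patterns: Sequence[str]) -> float:
    """Turing estimate of P(next observation is an unseen pattern) = N1 / N."""
    if not observed_patterns:
        raise ValueError("need at least one coded observation")
    counts = Counter(observed_patterns)
    n1 = sum(1 for c in counts.values() if c == 1)  # patterns seen exactly once
    return n1 / len(observed_patterns)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    history = [(("A", "B", "C"), 10.0), (("A", "B", "D"), 12.0),
               (("C", "E"), 30.0), (("F", "G", "H"), 50.0)]
    print(knn_act(("A", "B"), history, k=2, agg="mean"))  # → 11.0
    coded = ["knn-act-mean", "knn-act-mean", "knn-att-median",
             "time-seq-mode", "knn-act-mean"]
    print(good_turing_p0(coded))  # → 0.4
```

Each `agg` setting would count as a separate β-learner in the catalogue, matching the three aggregation variants per family. Swapping `prefix_distance` for another sequence similarity gives a different, equally hypothetical member of the same family.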