---
title: Outcome (Goal) Prediction
type: concept
tags: [ppm, prediction, classification, outcome]
sources:
  - "[[sources/2026-padella-llm-features-ppm]]"
created: 2026-04-13
updated: 2026-05-11
---

# Outcome Prediction

A [[concepts/predictive-process-monitoring|PPM]] task: given a prefix of an event trace, predict a **case-level outcome**, typically binary or small multi-class.

## Typical targets

- **SLA compliance**: will this case complete within the deadline?
- **Business outcome**: will this loan be repaid / approved / defaulted?
- **Positive / negative closure**: will this complaint be resolved favourably?
- **Constraint violation**: will a DECLARE rule be violated?
- **Activity occurrence**: will a specific (typically high-cost or high-rework) activity occur within the running case? Used as a classification KPI in [[sources/2026-padella-llm-features-ppm|Padella, de Leoni & Dumas 2026]] (`W_Nabellen incomplete dossiers` in BPI12, `Service Closure with BO Responsibility` in Bac, `LABORATORIO` in Hospital).

## Formulation

- **Input:** prefix of a running case.
- **Output:** categorical outcome `y ∈ {0, 1}` or `y ∈ {c₁, …, cₘ}`.
- **Training data:** extract prefixes from completed cases, labelled with the known ground-truth outcome.

## Benchmark (Teinemaa et al. 2016+)

A systematic literature review and benchmark by Teinemaa, Dumas, La Rosa, Maggi established the field's evaluation baseline; see [[sources/2016-teinemaa-outcome-ppm-review]].

## Evaluation

- **AUC-ROC / AUC-PR**: the standard in imbalanced settings.
- **Accuracy / F1**: less informative under class imbalance.
- **Earliness-accuracy trade-off**: a prediction earlier in the case is more actionable but harder; often plotted as accuracy vs prefix length.

## Philosophical caveats

An outcome prediction is a probability, but *which* kind?
Interpreted as frequentist, it gives the population-average rate of the outcome for cases with similar prefixes; applying that rate to an individual case commits the **[[concepts/rct-limitations|ecological fallacy]]**. Interpreted as credence, it expresses the model's epistemic uncertainty (see [[concepts/aleatoric-vs-epistemic-uncertainty]]). Interpreted as propensity, it claims an intrinsic tendency of the case itself ([[concepts/probabilistic-causation]]). The choice matters whenever a prediction drives an intervention; see [[sources/2023-anjum-rocca-phi403-lecture-19-what-rcts-do-not-show]] and [[sources/2023-anjum-rocca-phi403-lecture-18-risky-predictions]].

## LLM-based outcome prediction

[[sources/2026-padella-llm-features-ppm|Padella et al. 2026]] benchmark Gemini 2.5 Flash Thinking against a CatBoost classifier on the Activity Occurrence variant, giving the LLM 100 training traces. The non-hashed LLM achieves F1 of 0.77 / 0.98 / 0.90 on BPI12 / Bac / Hospital, matching or exceeding CatBoost trained on the full event log. Performance degrades slightly under [[concepts/semantic-hashing-probe|semantic hashing]] (-2 % to -7 %), confirming partial reliance on embodied prior knowledge for classification too.

## Related

[[concepts/predictive-process-monitoring]] · [[concepts/next-activity-prediction]] · [[concepts/remaining-time-prediction]] · [[concepts/trace-encoding]] · [[concepts/probabilistic-causation]] · [[concepts/llm-based-ppm]]
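
## Sketch: prefix extraction

As a rough illustration of the training-data step under Formulation, the sketch below extracts prefixes from completed cases and labels each one with an Activity Occurrence outcome. The function name `labelled_prefixes`, the toy activity names, and the rule of cutting prefixes before the target activity first occurs (to avoid leaking the label) are assumptions made for illustration, not the procedure used in the cited sources.

```python
from typing import Iterator

Trace = list[str]  # one completed case as an ordered list of activity names


def labelled_prefixes(
    trace: Trace, target: str, min_len: int = 1
) -> Iterator[tuple[Trace, int]]:
    """Yield (prefix, label) pairs for one completed case.

    label is 1 if `target` occurs anywhere in the completed trace, else 0.
    """
    label = int(target in trace)
    # Assumption: for positive cases, stop before the first occurrence of
    # the target so the prefix never contains the answer. For negative
    # cases, skip the full trace, since the case is then no longer running.
    cutoff = trace.index(target) if label else len(trace) - 1
    for k in range(min_len, cutoff + 1):
        yield trace[:k], label


# Hypothetical toy log: two completed cases.
log = [
    ["submit", "review", "rework", "close"],
    ["submit", "review", "close"],
]
dataset = [pair for trace in log for pair in labelled_prefixes(trace, "rework")]
```

Each pair in `dataset` can then be passed through a [[concepts/trace-encoding|trace encoding]] and fed to a classifier; grouping pairs by `len(prefix)` gives the per-prefix-length view needed for an earliness-accuracy plot.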