---
title: Systematic Literature Review (SLR)
type: method
tags: [literature-review, systematic-review, methodology, evidence-based]
sources: ["[[sources/2007-kitchenham-slr-guidelines]]", "[[sources/2023-qureshi-chatgpt-sr-automation]]", "[[sources/2024-agarwal-litllms-are-we-there-yet]]", "[[sources/2024-dennstaedt-llm-title-abstract-screening]]", "[[sources/2025-scherbakov-llms-as-tools-literature-reviews]]"]
created: 2026-04-20
updated: 2026-04-20
---

# Systematic Literature Review (SLR)

Hub page for the **method of conducting systematic literature reviews**. Anchored on [[sources/2007-kitchenham-slr-guidelines|Kitchenham & Charters 2007]] for the procedure, with LLM-era amendments from the four 2023–2025 papers in the batch (see [[concepts/llm-assisted-literature-review]]).

## Definition

An SLR is a secondary study that uses a **well-defined, pre-registered, auditable methodology** to identify, analyse and interpret all available evidence relevant to a specific research question. Distinguishing features versus a traditional narrative review: pre-specified protocol, inclusion/exclusion criteria, quality instrument, data extraction form, and reproducible synthesis.

## The three phases (Kitchenham 2007)

### 1. Planning
- **Identify the need** — confirm no existing SLR already answers the question (search DARE and similar).
- **Commission** (optional) — for contracted reviews; produce a commissioning document with question, scope, advisory group, budget, timetable.
- **Specify the research question(s)** — the most important activity. Use **PICOC** (Population, Intervention, Comparison, Outcome, Context) per Petticrew and Roberts. Question must be meaningful to both practitioners and researchers.
- **Develop the review protocol** — search strategy, inclusion/exclusion criteria, quality instrument, data-extraction form, synthesis plan, schedule.
- **Evaluate the protocol** — ideally peer-reviewed by independent experts.
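
The PICOC facets in the protocol can feed the search strategy directly. A minimal sketch, assuming the common convention of OR-ing synonyms within a facet and AND-ing the facets together; the `Picoc` class, `build_search_string`, and the example terms are hypothetical and not taken from the guidelines:

```python
from dataclasses import dataclass, field


@dataclass
class Picoc:
    """PICOC facets; each facet holds synonyms that are OR-ed together."""
    population: list[str] = field(default_factory=list)
    intervention: list[str] = field(default_factory=list)
    comparison: list[str] = field(default_factory=list)
    outcome: list[str] = field(default_factory=list)
    context: list[str] = field(default_factory=list)


def build_search_string(picoc: Picoc) -> str:
    """OR the synonyms within a facet, AND the non-empty facets together."""
    facets = [picoc.population, picoc.intervention, picoc.comparison,
              picoc.outcome, picoc.context]
    groups = []
    for terms in facets:
        if not terms:
            continue  # an unused facet places no constraint on the search
        quoted = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


# Illustrative terms only (assumption, not from any cited review).
example = Picoc(
    population=["software project"],
    intervention=["test-driven development", "TDD"],
    outcome=["defect density", "productivity"],
)
query = build_search_string(example)
print(query)
```

Each database has its own query syntax, so a string like this would normally be adapted per source and the adapted versions recorded in the protocol.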

### 2. Conducting
- **Identification of research** — multi-database search (automated + manual + citation chasing); address publication bias; use bibliography management; document the search trail for reproducibility.
- **Study selection** — apply inclusion/exclusion criteria; assess inter-rater reliability on the selection decisions and report kappa, with particular attention to ambiguous cases.
- **Study quality assessment** — use a graded instrument. Minimum (DARE): (i) criteria described and appropriate; (ii) search likely to cover all relevant studies; (iii) quality/validity assessed; (iv) basic data adequately described.
- **Data extraction** — pre-designed form filled consistently; reliability checks.
- **Data synthesis** — descriptive/narrative, quantitative (meta-analysis), qualitative, or mixed; include sensitivity analysis; address publication bias.
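
The kappa mentioned under study selection is usually Cohen's kappa for two reviewers: observed agreement corrected for the agreement expected by chance. A minimal sketch using only the standard library; the function name and the reviewer labels are illustrative assumptions:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions on six candidate studies.
reviewer_1 = ["include", "include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))  # 0.667 (observed 5/6, chance 0.5)
```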

### 3. Reporting
- **Dissemination strategy** — journal, tech report, poster, web, practitioner summary.
- **Main report format** — abstract, background, questions, methods, included/excluded studies, results, discussion, limitations, conclusions, appendices.
- **Evaluate the report** — external review against a quality checklist.

## Variants
- **Systematic mapping study** (scoping study) — a lighter-weight, broad map of the evidence landscape that can serve as a precursor to an SLR.
- **Tertiary review** — SLR of SLRs in a mature domain.

## Software-engineering adaptations (§3 of [[sources/2007-kitchenham-slr-guidelines|Kitchenham]])
SE is closer to the social sciences (Budgen similarity score of 0.83 with education and nursing) than to clinical medicine (0.17). Practical consequences:
- Cannot restrict to RCTs — must aggregate heterogeneous study types.
- Surrogate-measure risk (e.g. defects-in-testing as proxy for quality) must be explicit.
- Small primary-study populations — avoid over-restricting Population until practical-implications stage.

## LLM-era amendments

### What LLMs can do now (empirically)
- **Title/abstract screening**: Mixtral-8×7B achieves ~82% sensitivity / 75% specificity on biomedical SLRs ([[sources/2024-dennstaedt-llm-title-abstract-screening]]). Usable as a **first-pass filter** with a human second pass.
- **Retrieval and related-work drafting**: plan-then-generate reduces hallucinated citations by 18–26% ([[sources/2024-agarwal-litllms-are-we-there-yet]]).
- **Data extraction**: GPT-4o reaches ~83% precision / 86% recall vs. expert gold ([[sources/2025-scherbakov-llms-as-tools-literature-reviews]]); categorical/textual extraction stronger than numeric.
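
For reference, the screening figures above are the standard confusion-matrix ratios computed against human decisions. A minimal sketch; the function name and the ten example records are made up for illustration and do not reproduce any cited result:

```python
def screening_metrics(llm_includes, human_includes):
    """Sensitivity and specificity of LLM screening against human labels."""
    pairs = list(zip(llm_includes, human_includes))
    tp = sum(llm and gold for llm, gold in pairs)              # relevant, kept
    fn = sum((not llm) and gold for llm, gold in pairs)        # relevant, dropped
    tn = sum((not llm) and (not gold) for llm, gold in pairs)  # irrelevant, dropped
    fp = sum(llm and (not gold) for llm, gold in pairs)        # irrelevant, kept
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical decisions for ten records (True = include).
human = [True, True, True, True, False, False, False, False, False, False]
llm = [True, True, True, False, True, False, False, False, False, False]
sensitivity, specificity = screening_metrics(llm, human)
print(sensitivity, round(specificity, 3))  # 0.75 0.833
```

For a first-pass filter, sensitivity is arguably the figure to watch: a relevant study dropped at this stage is never seen by the human second pass, whereas a false positive only costs reviewer time.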

### What LLMs cannot do reliably
- **Factually grounded search-strategy construction** — fabricates MeSH/controlled vocabulary ([[sources/2023-qureshi-chatgpt-sr-automation]]).
- **Verified citation of real sources** — hallucinates references without retrieval grounding.
- **Reproducible output** — non-determinism breaks the reproducibility the Kitchenham protocol depends on; mitigate with fixed prompts, seeds where available, and majority voting across N runs (3 in Scherbakov).
- **Unsupervised synthesis** — expert content review remains required.
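
The majority-voting mitigation can be sketched as follows. `classify` is a hypothetical callable standing in for one LLM call with a fixed prompt (no real API is assumed), and returning `None` when no label wins a strict majority is a design choice of this sketch rather than something the sources prescribe:

```python
from collections import Counter


def majority_vote(votes):
    """Return the strict-majority label, or None when no label has one."""
    if not votes:
        raise ValueError("need at least one vote")
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def screen_with_voting(classify, record, n_runs=3):
    """Call the (hypothetical) classifier n_runs times and vote on the result."""
    votes = [classify(record) for _ in range(n_runs)]
    return majority_vote(votes), votes


# Stub classifier replaying canned answers in place of a real model call.
canned = iter(["include", "exclude", "include"])
decision, votes = screen_with_voting(lambda record: next(canned), "title + abstract text")
print(decision, votes)  # include ['include', 'exclude', 'include']
```

Keeping the raw `votes` alongside the decision preserves an audit trail of how stable the model was on each record.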

### Integration pattern (Scherbakov-style)
Covidence (or equivalent) + LLM plugin; 2 human reviewers calibrated on a subset → human consensus → LLM vote (N self-consistency runs, majority) → disagreements resolved by a senior reviewer. LLM as a **third reviewer**, not a solo reviewer.
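
One possible reading of this routing, as a sketch only: a record is finalised when the two human reviewers agree and the LLM majority matches them, and everything else goes to the senior reviewer. The function names, the label strings, and the rule that human-human disagreement also escalates are assumptions of this sketch, not details confirmed by the source:

```python
from collections import Counter
from typing import Optional, Sequence


def llm_majority(llm_votes: Sequence[str]) -> Optional[str]:
    """Strict-majority label across the LLM's self-consistency runs, else None."""
    if not llm_votes:
        return None
    label, count = Counter(llm_votes).most_common(1)[0]
    return label if count > len(llm_votes) / 2 else None


def route_record(human_a: str, human_b: str, llm_votes: Sequence[str]) -> str:
    """Return a final label, or "escalate" to hand off to the senior reviewer."""
    if human_a != human_b:
        return "escalate"  # assumption: no human consensus, so senior decides
    if llm_majority(llm_votes) != human_a:
        return "escalate"  # LLM (third reviewer) disagrees or has no majority
    return human_a


print(route_record("include", "include", ["include", "include", "exclude"]))  # include
print(route_record("exclude", "exclude", ["include", "include", "exclude"]))  # escalate
```

In this sketch the LLM can only trigger an extra human look; it never overrides the human consensus, which matches the third-reviewer framing.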

## Related pages
- [[concepts/llm-assisted-literature-review]] — shared concept tying the LLM-era papers.
- [[entities/barbara-kitchenham]] — canonical author.
- Existing SLRs in the wiki: [[sources/2019-verenich-survey-ppm]], [[sources/2020-rama-maneiro-deep-learning-ppm-review]], the benchmark cited in [[concepts/outcome-prediction]].