---
title: Systematic Literature Review (SLR)
type: method
tags: [literature-review, systematic-review, methodology, evidence-based]
sources: ["[[sources/2007-kitchenham-slr-guidelines]]", "[[sources/2023-qureshi-chatgpt-sr-automation]]", "[[sources/2024-agarwal-litllms-are-we-there-yet]]", "[[sources/2024-dennstaedt-llm-title-abstract-screening]]", "[[sources/2025-scherbakov-llms-as-tools-literature-reviews]]"]
created: 2026-04-20
updated: 2026-04-20
---

# Systematic Literature Review (SLR)

Hub page for the **method of conducting systematic literature reviews**. Anchored on [[sources/2007-kitchenham-slr-guidelines|Kitchenham & Charters 2007]] for the procedure, with LLM-era amendments from the four 2023–2025 papers in the batch (see [[concepts/llm-assisted-literature-review]]).

## Definition

An SLR is a secondary study that uses a **well-defined, pre-registered, auditable methodology** to identify, analyse and interpret all available evidence relevant to a specific research question. Distinguishing features versus a traditional narrative review: pre-specified protocol, inclusion/exclusion criteria, quality instrument, data extraction form, and reproducible synthesis.

## The three phases (Kitchenham 2007)

### 1. Planning

- **Identify the need**: confirm no existing SLR already answers the question (search DARE and similar).
- **Commission** (optional): for contracted reviews; produce a commissioning document with question, scope, advisory group, budget, timetable.
- **Specify the research question(s)**: the most important activity. Use **PICOC** (Population, Intervention, Comparison, Outcome, Context) per Petticrew and Roberts. Question must be meaningful to both practitioners and researchers.
- **Develop the review protocol**: search strategy, inclusion/exclusion criteria, quality instrument, data-extraction form, synthesis plan, schedule.
- **Evaluate the protocol**: ideally peer-reviewed by independent experts.

### 2. Conducting

- **Identification of research**: multi-database search (automated + manual + citation chasing); address publication bias; use bibliography management; document the search trail for reproducibility.
- **Study selection**: apply inclusion/exclusion criteria; inter-rater reliability for ambiguous cases (kappa reported).
- **Study quality assessment**: use a graded instrument. Minimum (DARE): (i) criteria described and appropriate; (ii) search likely to cover all relevant studies; (iii) quality/validity assessed; (iv) basic data adequately described.
- **Data extraction**: pre-designed form filled consistently; reliability checks.
- **Data synthesis**: descriptive/narrative, quantitative (meta-analysis), qualitative, or mixed; include sensitivity analysis; address publication bias.

### 3. Reporting

- **Dissemination strategy**: journal, tech report, poster, web, practitioner summary.
- **Main report format**: abstract, background, questions, methods, included/excluded studies, results, discussion, limitations, conclusions, appendices.
- **Evaluate the report**: external review against a quality checklist.

## Variants

- **Systematic mapping study** (scoping study): lighter-weight broad plot of the evidence landscape; precursor to an SLR.
- **Tertiary review**: SLR of SLRs in a mature domain.

## Software-engineering adaptations (§3 of [[sources/2007-kitchenham-slr-guidelines|Kitchenham]])

SE is closer to social sciences (Budgen similarity 0.83 to education/nursing) than clinical medicine (0.17). Practical consequences:

- Cannot restrict to RCTs; must aggregate heterogeneous study types.
- Surrogate-measure risk (e.g. defects-in-testing as proxy for quality) must be explicit.
- Small primary-study populations: avoid over-restricting Population until practical-implications stage.

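
The study-selection step of the Conducting phase asks for inter-rater reliability to be reported as kappa. The following is a minimal sketch of Cohen's kappa for two reviewers' include/exclude decisions. The function name `cohens_kappa` and the sample decisions are illustrative assumptions, not taken from Kitchenham's guidelines.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal frequencies,
    # summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for four candidate studies.
reviewer_1 = ["include", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

Here the reviewers agree on three of four studies (observed agreement 0.75) against a chance agreement of 0.5, so `kappa` comes out at 0.5. Which kappa threshold counts as acceptable is a protocol decision and is not fixed by this sketch.
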
## LLM-era amendments

### What LLMs can do now (empirically)

- **Title/abstract screening**: Mixtral-8×7B achieves ~82% sensitivity / 75% specificity on biomedical SLRs ([[sources/2024-dennstaedt-llm-title-abstract-screening]]). Usable as a **first-pass filter** with a human second pass.
- **Retrieval and related-work drafting**: plan-then-generate reduces hallucinated citations 18–26% ([[sources/2024-agarwal-litllms-are-we-there-yet]]).
- **Data extraction**: GPT-4o reaches ~83% precision / 86% recall vs. expert gold ([[sources/2025-scherbakov-llms-as-tools-literature-reviews]]); categorical/textual extraction stronger than numeric.

### What LLMs cannot do reliably

- **Factually grounded search-strategy construction**: fabricates MeSH/controlled vocabulary ([[sources/2023-qureshi-chatgpt-sr-automation]]).
- **Verified citation of real sources**: hallucinates references without retrieval grounding.
- **Reproducible output**: non-determinism breaks the reproducibility the Kitchenham protocol depends on; mitigate with fixed prompts, seeds where available, and majority voting across N runs (3 in Scherbakov).
- **Unsupervised synthesis**: expert content review remains required.

### Integration pattern (Scherbakov-style)

Covidence (or equivalent) + LLM plugin; 2 human reviewers calibrated on a subset → human consensus → LLM vote (N self-consistency runs, majority) → disagreements resolved by a senior reviewer. LLM as a **third reviewer**, not a solo reviewer.

## Related pages

- [[concepts/llm-assisted-literature-review]]: shared concept tying the LLM-era papers.
- [[entities/barbara-kitchenham]]: canonical author.
- Existing SLRs in the wiki: [[sources/2019-verenich-survey-ppm]], [[sources/2020-rama-maneiro-deep-learning-ppm-review]], the benchmark cited in [[concepts/outcome-prediction]].
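
As a note on the integration pattern under LLM-era amendments: the "LLM vote" step (N self-consistency runs, majority) and the escalation to a senior reviewer can be sketched as below. This is a minimal illustration under stated assumptions. The function names, the rule that a tied vote yields no decision, and the sample labels are all hypothetical and not taken from Scherbakov or from any Covidence plugin.

```python
from collections import Counter


def majority_vote(decisions):
    """Return the most frequent label across N LLM runs, or None on a tie."""
    if not decisions:
        raise ValueError("at least one LLM run is required")
    ranked = Counter(decisions).most_common()
    top_label, top_count = ranked[0]
    # Assumption: a tie for first place counts as "no LLM decision".
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


def needs_senior_review(human_consensus, llm_runs):
    """Flag a study when the LLM vote is tied or disagrees with the humans."""
    llm_decision = majority_vote(llm_runs)
    return llm_decision is None or llm_decision != human_consensus


# Hypothetical example: three runs, as in the N = 3 setting mentioned above.
runs = ["include", "include", "exclude"]
llm_decision = majority_vote(runs)
escalate = needs_senior_review("exclude", runs)
```

In this example the LLM vote is "include" while the human consensus is "exclude", so the study is flagged for the senior reviewer. Using an odd N avoids ties for binary decisions, which is one reason a three-run setting is convenient.
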