Meta-Analysis May 18, 2026 5 min read

How to Conduct a Systematic Review in 2026: Step-by-Step

A complete walkthrough of every stage — from defining your PICO question and registering your protocol to searching 9 databases, dual-reviewer screening, meta-analysis, GRADE, and PRISMA 2020 reporting.

Naeem Ur Rehman
Published May 18, 2026
01

Define your research question with PICO

Every systematic review begins with a clearly defined, structured research question. The most widely used framework is PICO — Population, Intervention, Comparison, and Outcome. Getting this right before you search is non-negotiable: your PICO determines your search strategy, your inclusion criteria, and ultimately the scope of your entire review.

| PICO element | What it defines | Example (hypertension review) |
|---|---|---|
| Population (P) | Who the evidence applies to: age, condition, setting | Adults aged 40+ with essential hypertension |
| Intervention (I) | The exposure, treatment, or test under investigation | ACE inhibitor monotherapy |
| Comparison (C) | The control condition or alternative treatment | Calcium channel blocker or placebo |
| Outcome (O) | The measured endpoint, primary and secondary | Systolic blood pressure reduction, cardiovascular events |

For diagnostic accuracy reviews, use PICO-D (adding the Diagnosis element). For qualitative or scoping reviews, the PCC framework (Population, Concept, Context) may be more appropriate.

💡
Common mistake: scoping too broadly

A PICO like "any intervention for any cardiovascular outcome in any adult" will return tens of thousands of records. Be specific enough that your review is answerable. You can always run separate reviews for different subpopulations or interventions.

02

Write and register your protocol

Before searching a single database, write a full protocol — including your PICO, eligibility criteria, search strategy, data extraction plan, risk of bias approach, and statistical analysis plan. Then register it publicly.

Why register? Pre-registration reduces the risk of outcome reporting bias, protects you from accusations of post-hoc analysis changes, and is increasingly required by high-impact journals. PRISMA 2020 (item 24a) asks you to report the register name and registration number, or to state that the review was not registered.

Where to register
  • PROSPERO — the international prospective register for systematic reviews (most widely recognised)
  • OSF (Open Science Framework) — broader scope, accepts all review types
  • Cochrane — if conducting a Cochrane systematic review
03

Build your search strategy

A comprehensive search strategy is what separates a systematic review from a narrative review. You need to capture every eligible study — including unpublished studies, grey literature, and studies in languages other than English — to minimise publication bias.

Which databases to search

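PRISMA 2020 requires you to report every database, register, and other source you searched, and methodological guidance such as the Cochrane Handbook recommends searching all databases relevant to your field. For clinical and health research, a reasonable minimum is: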

1. PubMed / MEDLINE (Essential): The primary database for biomedical literature. Free via the NCBI Entrez API. Over 36 million citations.
2. Scopus (Essential): Elsevier's multidisciplinary database. Broader than PubMed, covering engineering, social sciences, and arts.
3. Europe PMC: Covers life sciences literature including preprints. Particularly strong for funded UK and EU research.
4. OpenAlex: Open-source alternative to Web of Science. Over 250 million works, fully open API, no subscription required.
5. CrossRef: A major DOI registration agency. Useful for finding grey literature and preprints across all disciplines.
6. Cochrane CENTRAL: Specialised trials register. Best source for randomised controlled trial records, especially older ones.
⚠️
Single-database searches miss 15–40% of eligible studies

Research consistently shows that no single database contains all relevant literature. Searching PubMed alone is sufficient only for very narrow, PubMed-indexed topics — and even then, you risk missing trials published in non-indexed journals or available only as grey literature.

How to build the search string

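Your search string translates the PICO into database-specific syntax using Medical Subject Headings (MeSH) for PubMed, Emtree for Embase, and free-text synonyms for all databases. A well-constructed search combines the controlled-vocabulary terms and free-text synonyms for each PICO concept with OR, then joins the concept blocks with AND.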
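One common pattern is to OR the synonyms within each PICO concept and AND the concepts together. The sketch below shows that pattern for PubMed-style syntax. The `build_query` helper and the term lists are illustrative assumptions based on the hypertension example, not a validated strategy; a real search should be checked against each database's own syntax and reviewed by an information specialist.

```python
def build_query(concepts: dict[str, list[str]]) -> str:
    """OR the terms inside each concept, then AND the concept blocks together."""
    blocks = []
    for terms in concepts.values():
        if terms:  # skip concepts with no search terms (outcomes are often left out)
            blocks.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(blocks)


# Illustrative terms only: MeSH headings plus free-text title/abstract synonyms.
concepts = {
    "population": [
        '"Hypertension"[MeSH Terms]',
        "hypertens*[tiab]",
        '"high blood pressure"[tiab]',
    ],
    "intervention": [
        '"Angiotensin-Converting Enzyme Inhibitors"[MeSH Terms]',
        '"ACE inhibitor*"[tiab]',
    ],
    "comparison": [
        '"Calcium Channel Blockers"[MeSH Terms]',
        "placebo*[tiab]",
    ],
}

query = build_query(concepts)
print(query)
```

Keeping the terms in a structure like this makes it easier to translate the same concepts into another database's syntax and to paste the exact strings into your protocol.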

04

Deduplicate and screen: title/abstract stage

After running your searches across all databases, you will have a pool of records that contains significant overlap — the same study indexed in five databases counts as five records. Deduplication is the process of removing these duplicate records before screening begins.

In a typical multi-database search of 6–9 sources, 30–60% of records are duplicates. Accurate deduplication is essential — missed duplicates inflate your screening workload; over-aggressive deduplication removes legitimate unique records.
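As a rough illustration of the trade-off between missed duplicates and over-aggressive matching, the sketch below deduplicates on a normalised DOI and falls back to a normalised title plus year when no DOI is present. The record fields (`source`, `doi`, `title`, `year`) and the matching rules are assumptions for this example. A production tool would also need fuzzy matching and manual review of borderline pairs, and this simple version will not match a record that has a DOI against a copy that lacks one.

```python
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup_key(record: dict) -> tuple:
    """Prefer the DOI as the identity key; otherwise use title plus year."""
    doi = (record.get("doi") or "").lower().strip()
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)
    if doi:
        return ("doi", doi)
    return ("title", normalise_title(record.get("title", "")), record.get("year"))


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each key, preserving input order."""
    seen: dict[tuple, dict] = {}
    for record in records:
        seen.setdefault(dedup_key(record), record)
    return list(seen.values())


# Hypothetical records exported from four sources.
records = [
    {"source": "PubMed", "doi": "10.1000/xyz123", "title": "ACE inhibitors in hypertension", "year": 2019},
    {"source": "Scopus", "doi": "https://doi.org/10.1000/XYZ123", "title": "ACE Inhibitors in Hypertension.", "year": 2019},
    {"source": "OpenAlex", "doi": None, "title": "Calcium channel blockers versus placebo", "year": 2020},
    {"source": "CENTRAL", "doi": "", "title": "Calcium channel blockers versus placebo.", "year": 2020},
]

unique = deduplicate(records)
print(len(records), "records ->", len(unique), "unique")
```

Record how many duplicates were removed, because that count feeds directly into the PRISMA flow diagram.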

Dual-reviewer screening: why it is mandatory

Major methodological guidance, including the Cochrane Handbook and the JBI Manual, calls for independent dual-reviewer screening at both the title/abstract and full-text stages, and PRISMA 2020 requires you to report how many reviewers screened each record and whether they worked independently. Two reviewers screen each record independently, without seeing each other's decisions, to reduce selection bias. Disagreements are resolved by discussion or a third reviewer.

| Screening approach | Bias risk | Inter-rater agreement | Accepted by journals |
|---|---|---|---|
| Single reviewer | High | N/A | No |
| Dual reviewer, not blinded | Moderate | Moderate | Sometimes |
| Dual reviewer, blind (independent) | Low | Cohen's κ reported | Yes ✓ |

Inter-rater agreement — typically measured using Cohen's kappa (κ) — should be calculated and reported. A κ of 0.61–0.80 is considered substantial agreement; above 0.80 is almost perfect. Low κ indicates your inclusion criteria need clarification.
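For two reviewers, Cohen's κ is the observed agreement corrected for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch follows, assuming each reviewer's decisions are stored as equal-length lists of labels; the function name and the example decisions are made up for illustration.

```python
def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters who labelled the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of records")
    n = len(rater_a)
    # Observed agreement: share of records where the two decisions match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, per label.
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical decisions on ten records ("I" = include, "E" = exclude).
reviewer_1 = ["I", "I", "I", "I", "E", "E", "E", "E", "E", "E"]
reviewer_2 = ["I", "I", "I", "E", "E", "E", "E", "E", "E", "I"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Here the reviewers agree on 8 of 10 records, yet κ comes out at about 0.58 because chance agreement alone would be 0.52. That falls just below the 0.61 threshold for substantial agreement, so it would be worth revisiting the inclusion criteria.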

05

Full-text screening and PRISMA flow diagram

Studies that pass title/abstract screening move to full-text assessment. You retrieve the full paper for each record and apply your eligibility criteria rigorously. Every exclusion at this stage must be documented with a specific reason, as PRISMA 2020 requires you to report the number of excluded full-texts and reasons for exclusion.

📊
PRISMA 2020 flow diagram: the three stages
  • Identification: records identified from databases and registers, plus records from other sources (citation searching, grey literature, hand searching); duplicates removed before screening
  • Screening: records screened at title/abstract and records excluded; reports sought for retrieval; full-text reports assessed for eligibility and excluded with reasons
  • Included: studies, and reports of those studies, included in the review
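Because every box in the flow diagram is derived from the one before it, it helps to compute the counts rather than type them by hand. The sketch below does that arithmetic and checks that no box goes negative. The `FlowCounts` class, its field names, and the numbers are hypothetical, and it merges database and other-source records into one stream for simplicity, which the official PRISMA 2020 template keeps separate.

```python
from dataclasses import dataclass, field


@dataclass
class FlowCounts:
    """Raw tallies collected during searching and screening (hypothetical schema)."""
    identified_databases: int
    identified_other: int
    duplicates_removed: int
    excluded_title_abstract: int
    full_text_not_retrieved: int
    full_text_exclusions: dict[str, int] = field(default_factory=dict)

    def summary(self) -> dict[str, int]:
        """Derive each downstream box of the flow diagram from the raw tallies."""
        identified = self.identified_databases + self.identified_other
        screened = identified - self.duplicates_removed
        sought = screened - self.excluded_title_abstract
        assessed = sought - self.full_text_not_retrieved
        included = assessed - sum(self.full_text_exclusions.values())
        result = {
            "identified": identified,
            "screened": screened,
            "sought_for_retrieval": sought,
            "assessed_full_text": assessed,
            "included": included,
        }
        if min(result.values()) < 0:
            raise ValueError(f"Counts do not reconcile: {result}")
        return result


counts = FlowCounts(
    identified_databases=1200,
    identified_other=40,
    duplicates_removed=480,
    excluded_title_abstract=690,
    full_text_not_retrieved=2,
    full_text_exclusions={"wrong population": 20, "wrong outcome": 15, "not an RCT": 18},
)
flow = counts.summary()
print(flow)
```

Storing full-text exclusions as a reason-to-count mapping keeps the documented reasons and the reported totals in one place.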
06

Data extraction

For every included study, two reviewers independently extract the data you will use in your synthesis. Data extraction forms should be piloted on 2–3 studies before the main extraction begins, to ensure all reviewers interpret fields consistently.

What to extract

⚠️
The SD problem: when papers report SE or IQR

Many papers report standard error (SE) rather than standard deviation (SD), or present medians and interquartile ranges (IQR) rather than means. For meta-analysis, you need the mean and SD. Convert SE to SD using SD = SE × √n, where n is the sample size of that group. When only a median and IQR are reported, you can estimate the SD using the Wan et al. (2014) method and the mean using the Luo et al. (2018) method.
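Both conversions are short enough to sketch. The SE conversion is exact. The IQR conversion below follows the Wan et al. (2014) approximation for the scenario where only the quartiles and sample size are known, as I understand that formula: SD ≈ (Q3 − Q1) / (2 · Φ⁻¹((0.75n − 0.125) / (n + 0.25))). Treat it as an assumption to verify against the paper before using it in a real extraction; it also presumes roughly normal data. The function names and numbers are illustrative.

```python
import math
from statistics import NormalDist


def sd_from_se(se: float, n: int) -> float:
    """Exact conversion: SD = SE * sqrt(n), with n the group sample size."""
    return se * math.sqrt(n)


def sd_from_iqr(q1: float, q3: float, n: int) -> float:
    """Approximate SD from quartiles and sample size (Wan et al. 2014 style)."""
    if n < 2:
        raise ValueError("Need at least two participants")
    # Expected z-score of the upper quartile for a sample of size n.
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    return (q3 - q1) / (2 * z)


sd_a = sd_from_se(se=2.0, n=25)            # a group reported as mean with SE
sd_b = sd_from_iqr(q1=120.0, q3=133.5, n=100)  # a group reported as median with IQR
print(f"SD from SE: {sd_a:.2f}")
print(f"SD from IQR: {sd_b:.2f}")
```

For large samples the IQR estimate approaches the familiar rule of thumb SD ≈ IQR / 1.35. Flag every converted value in your extraction sheet so it can be excluded in a sensitivity analysis.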

07

Risk of bias assessment

Risk of bias (RoB) assessment evaluates the methodological quality of each included study — specifically, whether design or conduct limitations could have introduced systematic error into the results. The instrument you use depends entirely on the study design.

| Tool | Study design | Domains | Rating scale |
|---|---|---|---|
| RoB 2 | Randomised controlled trials | 5 domains | Low / Some concerns / High |
| ROBINS-I | Non-randomised intervention studies | 7 domains | Low / Moderate / Serious / Critical |
| Newcastle-Ottawa Scale | Cohort and case-control studies | 8 items across 3 domains | Stars (0–9) |
| QUADAS-2 | Diagnostic test accuracy studies | 4 domains | Low / High / Unclear |
| AXIS | Cross-sectional / prevalence studies | 20 items | Yes / No / Unsure |

Risk of bias judgements should be made at the outcome level, not just the study level — a study can have low RoB for its primary outcome but high RoB for secondary outcomes if measurement methods differed.

08

Meta-analysis: pooling effects

If sufficient studies report compatible outcomes, you can combine their effect estimates statistically in a meta-analysis. This produces a pooled estimate whose confidence interval is usually more precise than that of any individual study. Under a random-effects model, that interval also incorporates between-study variance.

Choosing the right statistical model

Fixed-effect vs Random-effects: the essential decision

Fixed-effect model

Assumes all studies estimate the same true effect. Appropriate only when studies are near-identical replicates. Produces narrower confidence intervals — misleadingly so when heterogeneity exists. Rarely appropriate in clinical research.

Random-effects model

Assumes the true effect varies across studies. The default for most clinical meta-analyses. Produces wider, more honest confidence intervals when heterogeneity is present. Requires an estimate of the between-study variance τ², commonly obtained with the DerSimonian-Laird or Paule-Mandel estimator.

DerSimonian-Laird (DL)

The most widely used τ² estimator. Computationally simple and well-understood. Can underestimate variance when k < 5 studies. The historical standard.

Paule-Mandel (PM)

Iterative method-of-moments estimator. Preferred when k < 10 studies because it is less biased. Produces more conservative (wider) confidence intervals. Increasingly recommended in recent methodological literature.
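To make the difference between the two models concrete, here is a minimal sketch of inverse-variance pooling with the DerSimonian-Laird estimate of τ². It assumes each study contributes an effect estimate and its sampling variance, and it uses a normal-approximation 95% interval. The function name and the four example studies are hypothetical, and established packages apply refinements that are omitted here.

```python
import math


def dersimonian_laird(effects: list[float], variances: list[float]) -> dict:
    """Fixed-effect and DL random-effects pooling of study-level estimates."""
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments estimate, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "pooled": pooled,
        "se": se,
        "tau2": tau2,
        "q": q,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
    }


# Hypothetical mean differences in systolic blood pressure (mmHg) from four trials.
effects = [-5.0, -3.0, -8.0, -1.0]
variances = [1.0, 1.0, 1.0, 1.0]

result = dersimonian_laird(effects, variances)
print(result)
```

With these inputs both models give the same point estimate (−4.25) because the studies are equally precise, but the random-effects standard error is roughly three times the fixed-effect one. That is the "narrower, misleadingly so" behaviour described above.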

Understanding heterogeneity: I², Q, and τ²

Heterogeneity is the variation in effect estimates across studies that exceeds chance. It is one of the most misunderstood concepts in meta-analysis.

⚠️
I² does not measure the amount of heterogeneity — it measures its proportion

I² is the percentage of variance attributable to between-study differences rather than sampling error. An I² of 80% does not tell you the effect estimates are dramatically inconsistent — it tells you 80% of variance is between-study. With large samples, even trivial between-study variance produces high I². Report τ² (the absolute variance) and the prediction interval alongside I².
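The dependence of I² on study precision is easy to demonstrate. The sketch below computes Cochran's Q and I² = max(0, (Q − df) / Q) × 100 for the same three effect estimates twice: once with the large sampling variances typical of small studies, and once with the tiny variances typical of very large studies. The numbers are invented for illustration.

```python
def q_and_i2(effects: list[float], variances: list[float]) -> tuple[float, float]:
    """Return Cochran's Q and I-squared (as a percentage)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Identical effect estimates: the spread between studies is only 0.04 units.
effects = [0.10, 0.12, 0.14]

q_small, i2_small = q_and_i2(effects, [0.04, 0.04, 0.04])        # small studies
q_large, i2_large = q_and_i2(effects, [0.00001, 0.00001, 0.00001])  # very large studies

print(f"Small studies: Q = {q_small:.2f}, I2 = {i2_small:.1f}%")
print(f"Large studies: Q = {q_large:.2f}, I2 = {i2_large:.1f}%")
```

The between-study spread is identical in both runs, yet I² moves from 0% to about 97.5%. Only the sampling error changed, which is why I² should be read alongside τ² and the prediction interval.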

The 95% prediction interval is the most clinically useful heterogeneity statistic: it describes the range within which the true effect in a new similar study would be expected to fall 95% of the time. A wide prediction interval crossing the null line means the intervention may be beneficial in some settings and harmful in others — a fundamentally different clinical message than a narrow confidence interval.
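A commonly used approximate formula for the 95% prediction interval combines the uncertainty in the pooled estimate with the between-study variance. The notation here is introduced for illustration: $\hat{\mu}$ is the random-effects pooled estimate, $\mathrm{SE}(\hat{\mu})$ its standard error, $\hat{\tau}^2$ the estimated between-study variance, and $k$ the number of studies.

```latex
\hat{\mu} \;\pm\; t_{k-2,\;0.975}\,\sqrt{\hat{\tau}^{2} + \mathrm{SE}(\hat{\mu})^{2}}
```

Because the multiplier comes from a $t$ distribution with $k-2$ degrees of freedom, the interval needs at least three studies and becomes very wide when $k$ is small. That width is an honest reflection of how little a handful of studies say about a new setting.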

Effect measures by outcome type

| Outcome type | Effect measure | Pooled on | Back-transformed |
|---|---|---|---|
| Continuous (same scale) | Mean Difference (MD) | Raw scale | N/A |
| Continuous (different scales) | Standardised Mean Difference (SMD) | SD units | N/A |
| Dichotomous | Odds Ratio (OR) or Risk Ratio (RR) | Log scale | Exponentiated for display |
| Dichotomous | Risk Difference (RD) | Raw proportion | N/A |
| Time-to-event | Hazard Ratio (HR) | Log scale | Exponentiated for display |
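For ratio measures, the "log scale, then exponentiate" workflow looks like this for a single study's odds ratio. The sketch assumes a standard 2×2 table and adds 0.5 to every cell when any cell is zero, which is one common continuity correction among several. The function names and trial counts are hypothetical.

```python
import math


def log_odds_ratio(events_t: int, total_t: int, events_c: int, total_c: int) -> tuple[float, float]:
    """Return (log OR, standard error of log OR) from a 2x2 table."""
    a, b = events_t, total_t - events_t   # treatment arm: events, non-events
    c, d = events_c, total_c - events_c   # control arm: events, non-events
    if 0 in (a, b, c, d):
        # Continuity correction so the log and the SE stay finite.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def back_transform(log_estimate: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Exponentiate a log-scale estimate and its 95% CI limits for display."""
    return (
        math.exp(log_estimate),
        math.exp(log_estimate - z * se),
        math.exp(log_estimate + z * se),
    )


# Hypothetical trial: 10/100 events on treatment versus 20/100 on control.
log_or, se = log_odds_ratio(10, 100, 20, 100)
odds_ratio, ci_low, ci_high = back_transform(log_or, se)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Pooling happens on `log_or` and `se`; only the final pooled value and its interval are exponentiated. That is why confidence intervals for ratios are asymmetric around the point estimate on the display scale.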
09

GRADE: rating certainty of evidence

GRADE (Grading of Recommendations Assessment, Development and Evaluation) is the international standard for rating how confident you are in your pooled estimate. GRADE produces a certainty rating — High, Moderate, Low, or Very Low — for each outcome, based on systematic evaluation of five downgrade domains and three upgrade criteria.

| Domain | Direction | When to apply |
|---|---|---|
| Risk of bias | ↓ Downgrade | Included studies have serious or critical methodological limitations |
| Inconsistency | ↓ Downgrade | Unexplained heterogeneity: wide prediction interval, high I² |
| Indirectness | ↓ Downgrade | Evidence does not directly answer the PICO question |
| Imprecision | ↓ Downgrade | Wide CI crossing the minimal important difference threshold |
| Publication bias | ↓ Downgrade | Funnel plot asymmetry, Egger regression significant (p < 0.1) |
| Large effect size | ↑ Upgrade | RR > 2 or < 0.5, and RoB is low |
| Dose-response gradient | ↑ Upgrade | Clear relationship between dose and effect magnitude |
| Plausible confounding | ↑ Upgrade | Confounding would reduce the observed effect, so true effect is larger |

RCT evidence starts at High certainty. Observational evidence starts at Low certainty. From there, you downgrade or upgrade based on the above domains, arriving at your final certainty rating for each outcome.

10

PRISMA 2020 reporting

PRISMA 2020 is the reporting standard that most journals expect for systematic reviews. The 2020 update by Page et al. (BMJ, 2021) revised the flow diagram and kept the checklist at 27 items, while splitting several items into sub-items and adding more detailed guidance.

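The most visible PRISMA deliverable is the flow diagram, a visual record of how many records were identified, screened, assessed for eligibility, and ultimately included. PRISMA 2020 revised this to include two separate identification streams: studies identified via databases and registers, and studies identified via other methods such as websites, organisations, and citation searching.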

Verflux auto-generates your PRISMA 2020 diagram

Verflux pulls the exact record counts from your search and screening modules and builds a PRISMA 2020-compliant flow diagram automatically. Counts are editable, and the final diagram exports as a 300 DPI PNG ready for journal submission — without touching PowerPoint or drawing software.

