Screening studies for a systematic review is the process of evaluating every record your search retrieves against a set of predefined criteria to decide what stays and what goes. It sounds simple. In practice, it is the single most time-consuming phase of any review, often accounting for 40 to 60 percent of total project hours.[1] It is also where the greatest risk of bias lives: miss one relevant study and your conclusions may shift; include one irrelevant study and your analysis gets muddier. Getting screening right is not optional. It is foundational.
Key Takeaways
- Screening happens in two stages: title and abstract screening first, then full-text screening of the shortlist.
- Dual independent screening with conflict resolution is the gold standard recommended by the Cochrane Handbook.
- Clear, testable inclusion and exclusion criteria (built from your PICO framework) prevent subjective decisions.
- AI-assisted screening can reduce workload by 60 percent or more while maintaining sensitivity above 95 percent.
- Every screening decision should be documented for PRISMA reporting and audit trail purposes.
What Is Study Screening in a Systematic Review?
Study screening is the structured evaluation of identified records to determine which meet your review's eligibility criteria. The Cochrane Handbook defines it as the process of assessing each record retrieved by the search strategy against predefined inclusion and exclusion criteria.[2] In practical terms, you start with thousands of records and progressively filter down to the dozens or hundreds that actually answer your research question.
The process is divided into two distinct stages. Title and abstract screening is the first pass, where reviewers read the title and abstract of each record and make a quick include, exclude, or uncertain decision. Full-text screening comes next, where the shortlisted records are read in their entirety and assessed against detailed eligibility criteria. According to Lefebvre and colleagues, this two-stage approach balances thoroughness with efficiency.[3]
The numbers involved can be daunting. A typical systematic review in medicine retrieves between 1,000 and 10,000 records from database searches. After title and abstract screening, between 5 and 15 percent usually proceed to full-text review. After full-text screening, the final included studies might represent less than 2 percent of the original search results.
Writing Effective Inclusion and Exclusion Criteria
Inclusion and exclusion criteria are the rules that determine which studies enter your review. They must be specific enough to apply consistently, broad enough to capture all relevant evidence, and defined before screening begins.
The best criteria flow directly from your PICO framework:
| PICO Element | Example Criterion | Screening Application |
|---|---|---|
| Population | Adults aged 18+ with Type 2 diabetes | Exclude paediatric studies, Type 1 diabetes, gestational diabetes |
| Intervention | SGLT2 inhibitors (any dose, any duration) | Exclude studies of other drug classes without SGLT2 arm |
| Comparator | Placebo or active comparator | Exclude single-arm studies without a comparison group |
| Outcome | HbA1c change at 12+ weeks | Exclude studies not reporting HbA1c or with follow-up under 12 weeks |
| Study Design | Randomised controlled trials | Exclude observational studies, case reports, narrative reviews |
Common additional criteria include language restrictions, date ranges, and publication status. The Cochrane Handbook recommends against language restrictions where possible, as excluding non-English studies can introduce bias.[2] If you must restrict by language, document the rationale in your protocol.
Pilot Testing: Before screening your full dataset, pilot test your criteria on 50 to 100 records. This reveals ambiguous criteria, calibrates reviewers, and prevents having to re-screen thousands of records after discovering a criterion is too vague. Calculate inter-rater agreement (Cohen's kappa) during the pilot. A kappa below 0.60 suggests your criteria need refinement.
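Cohen's kappa can be computed directly from the two reviewers' pilot decisions. A minimal sketch, assuming simple include/exclude labels recorded as parallel lists (the function name and data are illustrative, not any tool's API):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters screening the same records.

    rater_a, rater_b: parallel lists of decisions, e.g. "include"/"exclude".
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: proportion of records where the raters match.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal proportions.
    labels = set(rater_a) | set(rater_b)
    p_expected = sum(
        (rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical pilot of 10 records; the raters disagree on two.
a = ["include", "exclude", "exclude", "include", "exclude",
     "exclude", "include", "exclude", "exclude", "exclude"]
b = ["include", "exclude", "include", "include", "exclude",
     "exclude", "exclude", "exclude", "exclude", "exclude"]
print(round(cohens_kappa(a, b), 2))  # 0.52 — below 0.60, so recalibrate
```

Note that raw percentage agreement here is 80 percent, yet kappa is only 0.52 once chance agreement is discounted: this is exactly why kappa, not raw agreement, is the statistic to track during the pilot.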
Stage 1: Title and Abstract Screening
Title and abstract screening is the rapid first pass through your search results. The goal is to remove records that are clearly irrelevant while retaining anything that might be eligible.
The cardinal rule at this stage is: when in doubt, include. It is far better to carry a few extra records into full-text screening than to accidentally exclude a relevant study at the abstract stage. According to a study by Waffenschmidt and colleagues, even experienced reviewers miss between 2 and 5 percent of relevant records during title and abstract screening.[4]
Practical tips for efficient title and abstract screening:
- Screen in batches. Reviewing 200 to 300 abstracts per session with breaks reduces fatigue-related errors.
- Use a standardised form. Even a simple three-button interface (include, exclude, uncertain) is better than free-text notes.
- Record the reason for exclusion. While PRISMA only requires exclusion reasons at the full-text stage, tracking them at abstract level helps calibrate your team.
- Screen independently. Dual independent screening is the gold standard. Both reviewers should screen without seeing each other's decisions.
Speed varies widely. Manual screening of titles and abstracts typically takes 30 seconds to 2 minutes per record. For a review with 3,000 records screened by two reviewers, that is 50 to 200 hours of screening time before you even open a full-text PDF.
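The arithmetic behind those figures is simple to check. A quick sketch, using the per-record times quoted above:

```python
def screening_hours(n_records, n_reviewers, secs_per_record):
    """Total reviewer-hours for one screening pass."""
    return n_records * n_reviewers * secs_per_record / 3600

# 3,000 records, dual independent screening, 30 seconds to 2 minutes each.
low = screening_hours(3000, 2, secs_per_record=30)
high = screening_hours(3000, 2, secs_per_record=120)
print(low, high)  # 50.0 200.0 hours
```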
Dual Screening and Conflict Resolution
Dual independent screening means two reviewers independently assess every record, then compare decisions. The Cochrane Handbook strongly recommends this approach because single-reviewer screening consistently misses relevant studies.[2]
When two reviewers disagree on a record, that disagreement is called a conflict. Conflicts are resolved through one of three methods:
- Discussion between reviewers. The two reviewers review the record together and reach consensus. This is the most common approach.
- Third reviewer arbitration. A senior researcher makes the final decision. Used when discussion fails to resolve the conflict.
- Inclusive resolution. Any record that either reviewer includes moves forward. This maximises sensitivity at the cost of carrying more records to full-text screening.
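The three strategies can be expressed as a simple merge over the two reviewers' decisions. A hypothetical sketch (the function and labels are illustrative, not any screening tool's API): under discussion or arbitration a disagreement surfaces as a conflict needing human input, while inclusive resolution auto-resolves it to include.

```python
def resolve(decision_a, decision_b, strategy="discussion"):
    """Merge two reviewers' decisions on one record.

    Returns the agreed decision, "include" under inclusive resolution,
    or "conflict" when a human step (discussion or arbitration) is needed.
    """
    if decision_a == decision_b:
        return decision_a
    if strategy == "inclusive":
        # Either reviewer's include carries the record forward.
        return "include"
    return "conflict"

print(resolve("include", "exclude", strategy="inclusive"))  # include
print(resolve("include", "exclude"))                        # conflict
```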
Inter-rater reliability is measured using Cohen's kappa statistic. A kappa of 0.61 to 0.80 indicates substantial agreement; above 0.80 is almost perfect.[5] If your kappa is below 0.60 during early screening, stop and recalibrate. The problem is almost always ambiguous criteria, not incompetent reviewers.
Stage 2: Full-Text Screening
Full-text screening is the detailed assessment of each shortlisted study against your complete eligibility criteria. Unlike title and abstract screening, every exclusion at this stage must be documented with a specific reason.
PRISMA 2020 requires you to report the number of full-text articles excluded and the reasons for each exclusion, grouped by category.[6] Common exclusion reasons include wrong population, wrong intervention, wrong outcome, wrong study design, duplicate publication, and conference abstract only.
Full-text screening is slower per record but involves far fewer records. Expect to spend 5 to 15 minutes per full-text article, depending on complexity and how clearly the study reports its methods. For a typical review with 150 to 300 full texts to assess, this stage takes 15 to 75 hours across both reviewers.
Practical considerations at this stage:
- PDF retrieval. Not all full texts are freely available. Budget time for interlibrary loans, institutional access, and contacting authors for unpublished data.
- Multiple reports of the same study. A single trial may be published across multiple papers (protocol, primary results, secondary analyses, long-term follow-up). Link these together rather than treating them as separate studies.
- Borderline cases. When a study sits right on the boundary of your criteria, document why you included or excluded it. This transparency strengthens your review.
AI-Assisted Screening: How It Works and When to Use It
AI-assisted screening uses machine learning models to predict whether each record is likely to meet your inclusion criteria, based on its title and abstract text. The AI does not replace human judgement. It prioritises and triages, allowing reviewers to focus their attention where it matters most.
There are three main approaches to AI-assisted screening:
| Approach | How It Works | Typical Sensitivity |
|---|---|---|
| Active learning | Prioritises records most likely to be relevant; reviewer screens in order of predicted relevance | 90 to 95% |
| Semi-automated | AI flags a subset as safe to auto-exclude; reviewer screens the rest manually | 93 to 97% |
| Full AI triage | AI classifies every record as include/exclude; reviewer verifies AI decisions | 95 to 98% |
The critical metric for any AI screening tool is sensitivity (recall), not accuracy. In systematic review screening, a false negative (missing a relevant study) is far more damaging than a false positive (including an irrelevant study that gets caught at full-text stage). According to a validation study across 14 systematic reviews, AI-assisted screening achieved a pooled sensitivity of 97.3% with a median workload reduction of 63%.[7]
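To see why accuracy is the wrong yardstick, consider a corpus where only a small fraction of records are relevant. A short sketch with hypothetical confusion-matrix counts shows a tool that looks excellent on accuracy while missing a quarter of the relevant studies:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy and sensitivity from a screening confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on truly relevant records
    }

# 2,000 records, 40 truly relevant; the tool finds 30 and misses 10.
m = metrics(tp=30, fp=60, fn=10, tn=1900)
print(m)  # accuracy 0.965, sensitivity 0.75
```

An accuracy of 96.5 percent sounds impressive, but a sensitivity of 0.75 means ten relevant studies never reach full-text review, which is the failure mode a systematic review cannot afford.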
When is AI screening appropriate? AI screening works best for reviews with 500 or more records, clear inclusion criteria, and a standard biomedical vocabulary. For very small reviews (under 200 records) or highly specialised topics with unusual terminology, manual screening may be just as fast. For living reviews or review updates, AI screening is particularly valuable because the model can learn from your previous screening decisions.
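A minimal sketch of the active-learning idea: rank unscreened records by similarity to those already marked include, screen the top of the queue, and re-rank as decisions accumulate. Token overlap stands in for a trained model here purely for illustration; real tools use proper classifiers.

```python
from collections import Counter

def tokenize(text):
    return set(text.lower().split())

def rank_by_relevance(unscreened, included):
    """Order unscreened records so the most promising come first,
    scored by word overlap with already-included records."""
    vocab = Counter()
    for rec in included:
        vocab.update(tokenize(rec))
    return sorted(
        unscreened,
        key=lambda rec: sum(vocab[w] for w in tokenize(rec)),
        reverse=True,
    )

# Hypothetical records, matching the PICO example earlier in this article.
included = ["SGLT2 inhibitors lower HbA1c in type 2 diabetes RCT"]
unscreened = [
    "statin therapy in cardiovascular disease",
    "empagliflozin versus placebo HbA1c type 2 diabetes trial",
]
print(rank_by_relevance(unscreened, included)[0])
```

After each screened batch, the newly included records are appended to `included` and the queue is re-ranked, so relevant studies keep surfacing earlier and the tail of the queue becomes increasingly safe to deprioritise.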
Documenting Screening for PRISMA Compliance
The PRISMA 2020 flow diagram is the standard way to report study screening results. It tracks the flow of records through three phases: identification (records found and deduplicated), screening (records assessed at title/abstract level, then full texts assessed for eligibility), and inclusion (studies in the review).[6]
Your flow diagram must report:
- Total records identified from each database (before and after deduplication)
- Records excluded at title and abstract screening
- Full-text articles assessed for eligibility
- Full-text articles excluded, with reasons grouped by category
- Studies included in the review and, if applicable, in the quantitative synthesis (meta-analysis)
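These counts must be internally consistent: each phase is the previous phase minus its exclusions, and reviewers (and peer reviewers) will check the subtraction. A small sketch with hypothetical numbers:

```python
def prisma_flow(identified, duplicates, excluded_title_abstract,
                excluded_full_text):
    """Derive the downstream PRISMA flow counts from the exclusions."""
    screened = identified - duplicates
    full_text = screened - excluded_title_abstract
    included = full_text - excluded_full_text
    return {"screened": screened, "full_text": full_text,
            "included": included}

flow = prisma_flow(identified=3450, duplicates=450,
                   excluded_title_abstract=2760, excluded_full_text=190)
print(flow)  # {'screened': 3000, 'full_text': 240, 'included': 50}
```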
If you used automation tools during screening, PRISMA 2020 recommends reporting this in the methods section, including which tool was used, what role it played, and how human oversight was maintained. For a detailed walkthrough of the flow diagram, see our PRISMA flow diagram guide.
Common Screening Mistakes and How to Avoid Them
After working with hundreds of systematic review teams, certain screening errors appear repeatedly:
- Vague criteria. "Studies about diabetes" is not a criterion. "Adults aged 18+ diagnosed with Type 2 diabetes mellitus" is. Specificity prevents disagreements.
- Screening fatigue. Error rates increase significantly after 90 minutes of continuous screening. Take breaks. Screen in batches.
- Not pilot testing. Jumping straight into full screening without a pilot round almost always leads to re-screening later.
- Single-reviewer screening. It is tempting when pressed for time, but single-reviewer screening consistently misses 5 to 10 percent of relevant studies compared to dual screening.
- Inconsistent exclusion reasons. Use a standardised set of exclusion codes, not free text. "Wrong population" applied consistently is better than twenty variations of the same idea.
For a broader overview of the systematic review process that screening fits into, see our systematic review methods guide. For understanding how your screening results feed into the analysis phase, our systematic review vs meta-analysis explainer covers the downstream steps.
Frequently Asked Questions
How many reviewers should screen studies for a systematic review?
At least two independent reviewers should screen every record at both the title/abstract and full-text stages. The Cochrane Handbook recommends dual independent screening as the gold standard because single-reviewer screening misses 5 to 10 percent of relevant studies. A third reviewer should be available to resolve conflicts when the two primary reviewers disagree.
What is an acceptable inter-rater agreement for screening?
A Cohen's kappa of 0.61 or higher indicates substantial agreement and is generally considered acceptable for systematic review screening. Kappa values above 0.80 indicate almost perfect agreement. If your kappa falls below 0.60 during pilot screening, recalibrate your criteria and re-train reviewers before proceeding with the full dataset.
Can AI replace human reviewers in systematic review screening?
No. Current AI screening tools assist human reviewers by prioritising records, flagging likely includes and excludes, and reducing workload by 50 to 70 percent. However, human oversight remains essential. AI tools achieve high sensitivity (95 to 98 percent) but not perfection, and systematic reviews require a level of methodological transparency that demands human accountability for every inclusion decision.
How long does screening take for a typical systematic review?
Screening duration depends on the number of records and the number of reviewers. A review with 3,000 records typically requires 50 to 200 hours for title and abstract screening (across two reviewers) and 15 to 75 hours for full-text screening. AI-assisted screening can reduce the title and abstract phase to 10 to 40 hours by automating triage of clearly irrelevant records.
What exclusion reasons should I record during full-text screening?
PRISMA 2020 requires exclusion reasons at the full-text stage, grouped by category. Common categories include wrong population, wrong intervention, wrong comparator, wrong outcome, wrong study design, duplicate publication, full text unavailable, and conference abstract only. Use standardised codes rather than free text to ensure consistency across reviewers.
If study screening is the bottleneck in your systematic review workflow, AI-assisted tools can recover hundreds of hours without sacrificing the sensitivity your review depends on. At Systematicly, screening is built into a platform that carries your data from search through to publication-ready results. Start a free project at research.systematicly.com to try it with your own data.
Summary
Screening studies for a systematic review involves two stages: a rapid title and abstract pass to remove clearly irrelevant records, followed by detailed full-text assessment against your eligibility criteria. Dual independent screening with conflict resolution remains the gold standard. Clear, PICO-derived criteria, pilot testing, and consistent documentation are the foundations of reliable screening. AI-assisted tools like Systematicly can reduce screening workload by over 60 percent while maintaining sensitivity above 97 percent, and automatically generate your PRISMA flow diagram as you work.
Cut your screening time without cutting corners. Systematicly's AI screens your records with 97.3% sensitivity, flags conflicts automatically, and builds your PRISMA diagram in real time. Start your free project and see how many hours you save on your next review.
References
- Borah R, Brown AW, Capers PL, Kaiser KA. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open. 2017;7(2):e012545.
- Higgins JPT, Thomas J, Chandler J, et al., editors. Cochrane Handbook for Systematic Reviews of Interventions, Version 6.4. Cochrane. 2023.
- Lefebvre C, Glanville J, Briscoe S, et al. Searching for and selecting studies. In: Higgins JPT, Thomas J, editors. Cochrane Handbook for Systematic Reviews of Interventions. 2nd ed. Wiley; 2019:67-107.
- Waffenschmidt S, Knelangen M, Sieben W, Bühn S, Pieper D. Single screening versus conventional double screening for study selection in systematic reviews. BMC Med Res Methodol. 2019;19:132.
- McHugh ML. Interrater reliability: The kappa statistic. Biochem Med. 2012;22(3):276-282.
- Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
- Bishop M, Chen S. AI-assisted abstract screening achieves 97.3% sensitivity across 14 systematic reviews: A validation study. Systematicly Research Lab. 2026.