Are the machines ready to take over? Can artificial intelligence replace a human reviewer for literature screening and selection for systematic literature reviews?
Abstract
Objectives
The exponential increase in clinical research literature and aggressive timelines for Health Technology Assessment (HTA) submissions, especially with the upcoming EU HTA regulation, are making systematic literature reviews (SLRs) more resource-intensive and costly. Recent studies have shown that artificial intelligence (AI) could accelerate SLR preparation by serving as a second reviewer during title/abstract (TI/AB) screening. Here we present a case study testing LaserAI’s functionality for TI/AB screening.
Methods
The case study used an existing comprehensive clinical SLR of biologic treatments for Crohn’s disease, comprising an original review and eight updates, in which two human reviewers performed literature screening/selection and a third reviewer resolved conflicts. The original SLR was used to train the AI; inputs were the original search results (7272 records), studies selected for full text review (176 records), and final study inclusions (63 records). Subsequently, the AI replicated the human reviewers’ screening for all eight updates (3257 records), and its results were compared against the human literature screening/selection. The main outcomes were sensitivity and workload savings.
Results
Across all updates the human reviewers included 165 records for full text review, while the AI selected 466 records. In seven of eight updates, the AI identified all studies that had been included by the human reviewers, which corresponds to 100% sensitivity. However, in update 6, one of three studies included by the human reviewers was missed by the AI, resulting in 67% sensitivity for this update. The average workload saving across all updates was 45.4%.
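As a minimal sketch of the sensitivity arithmetic reported above, assuming the human reviewers' inclusions are treated as the reference standard: sensitivity is the share of human-included studies that the AI also selected. The function name and its parameters are illustrative only and are not part of LaserAI; the abstract does not define the workload-saving calculation, so it is not reproduced here.

```python
def sensitivity(found_by_ai: int, included_by_humans: int) -> float:
    """Share of human-included studies that the AI also selected.

    Hypothetical helper for illustration; human inclusions are assumed
    to be the reference standard.
    """
    if included_by_humans <= 0:
        raise ValueError("included_by_humans must be positive")
    if not 0 <= found_by_ai <= included_by_humans:
        raise ValueError("found_by_ai must be between 0 and included_by_humans")
    return found_by_ai / included_by_humans


# Update 6: the humans included three studies and the AI missed one of them.
update_6 = sensitivity(found_by_ai=2, included_by_humans=3)
print(f"Update 6 sensitivity: {update_6:.0%}")
```

With two of three studies found, the ratio is 2/3, which rounds to the 67% reported for update 6; an update in which no human-included study is missed gives 100%.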
Conclusions
The results of this study support the use of AI as a second reviewer for TI/AB screening during update searches. The AI demonstrated a good level of sensitivity, and its use could save considerable resources. Additional workload savings might have been achieved if we had retrained the AI after each update.
