What is it about?
This study compared how well two commonly used LLM chatbots, ChatGPT and Bard, identify articles for inclusion in systematic reviews in Otolaryngology-Head and Neck Surgery (OHNS). Their output was benchmarked against three established PRISMA‐compliant systematic reviews to assess their potential for streamlining the initial stage of systematic reviews.
Why is it important?
Large language models (LLMs) failed to fully replicate the peer‐reviewed methodologies and produced outputs containing inaccuracies, yet they also identified relevant articles, especially recent ones, that the reference reviews had missed. While human‐led PRISMA‐based reviews remain the gold standard, refining LLMs for literature reviews shows potential.
Perspectives
It is essential to exercise considerable caution before using any artificial intelligence in research. These models have proven effective in analyzing and evaluating data; however, their effectiveness depends entirely on the data they were trained on. Without thorough training of both the AI models and their human users, there is great risk in using these models for new investigations.
Jonathan Kuriakose
Northwestern University
Read the Original
This page is a summary of: Assessing Large Language Models for Early Article Identification in Otolaryngology—Head and Neck Surgery Systematic Reviews, Health Care Science, January 2026, Tsinghua University Press,
DOI: 10.1002/hcs2.70048.