What is it about?

A new study found that ChatGPT-4, a popular AI language model, gives inconsistent risk assessments for patients with chest pain not caused by injury, even though its scores correlate strongly overall with established risk-scoring tools. Because ChatGPT-4 returned different scores when given identical patient data, its reliability for clinical decision-making in these patients is in question.

Why is it important?

This study is the first to comprehensively evaluate ChatGPT-4's ability to assess heart attack risk in patients with chest pain not caused by injury. The findings are timely and important because they highlight the need for further refinement and customization of AI language models before they can be safely integrated into clinical practice for this purpose. Addressing these limitations could help unlock the potential of AI to improve cardiac risk assessment and patient care.


I find the inconsistency in ChatGPT-4's risk assessments concerning, as it could lead to confusion and suboptimal care if relied upon in clinical settings. However, I remain optimistic about the potential of AI to enhance medical decision-making if properly developed and validated. As a researcher, I believe this study underscores the importance of rigorous testing and continuous improvement of AI models before deploying them in high-stakes healthcare contexts. Collaborations between AI experts and healthcare professionals will be key to ensuring these tools are safe, reliable, and truly beneficial for patients.

Thomas F Heston MD
University of Washington

Read the Original

This page is a summary of: ChatGPT provides inconsistent risk-stratification of patients with atraumatic chest pain, PLOS ONE, April 2024, PLOS.
DOI: 10.1371/journal.pone.0301854