ChatGPT and patient inquiries related to laboratory medicine
We read your story on the use of ChatGPT to answer pathology-specific questions (June issue, page 1). We are pleased to share the findings of our recent study titled “ChatGPT vs Medical Professional: Analyzing Responses to Laboratory Medicine Questions on Social Media” (doi.org/10.1093/clinchem/hvae093). We believe this research marks an important first step in leveraging medical informatics and large language models to directly address patients’ questions related to laboratory medicine.
Our study delves into the expanding role of AI in health care, focusing on ChatGPT, a large language model developed by OpenAI. We evaluated ChatGPT's reliability and efficacy in addressing patient inquiries related to laboratory medicine. Unlike previous studies that focused mainly on theoretical applications or controlled settings,1,2 our research offers real-world insights: we compared ChatGPT's responses with those of medical professionals to patient questions posted on social media platforms such as Reddit and Quora. Experienced laboratory medicine professionals then evaluated both sets of responses for accuracy, relevance, clarity, and currency of information.
Our study highlights ChatGPT’s potential as a reliable public resource for laboratory medicine-related inquiries. We found that ChatGPT’s responses were often rated higher than those from medical professionals, consistent with recent literature published in JAMA,3 which reported higher satisfaction rates with ChatGPT-generated responses compared with those from physicians for general medical questions posted on social media.
Notably, the laboratory medicine evaluators showed no significant difference in preference between ChatGPT versions 3.5 and 4.0, demonstrating a consistent level of performance across versions. This consistency is notable given that ChatGPT is not trained specifically on medical data, and it suggests robustness in processing general medical information.
While ChatGPT's responses were mainly preferred for their accuracy and comprehensiveness, we observed that balancing thoroughness with brevity is crucial. Accurate and thorough answers may be highly rated by experts, but patients and laypersons often benefit more from concise, simple, and targeted responses. Although our study did not use tailored prompts to optimize ChatGPT's responses for brevity, ChatGPT can provide shorter, easier-to-understand answers when prompted to do so. This flexibility underscores the potential of large language models to adapt their outputs to meet varying communication needs.
One limitation of our study is the varied professional backgrounds of those who provided the online responses, which may not always reflect laboratory medicine expertise. However, this diversity mirrors the range of perspectives patients might encounter when seeking medical care in person or advice online.
Although professionalism was not a focus of this study, our evaluators made the interesting observation that ChatGPT's responses displayed a high degree of it. Previous studies found that ChatGPT exhibited greater empathy than physicians' responses3 and demonstrated superior emotional awareness compared with the general population,4 suggesting that its formal and empathetic language enhances its perceived reliability.
Despite the promise shown by large language models like ChatGPT, their limitations—including reliance on outdated medical knowledge and lack of specific medical training—underscore the need for more specialized solutions. Small language models, tailored with domain-specific knowledge, may offer more precise and relevant responses to health care questions while requiring fewer computational resources. AI’s future in health care likely lies in balancing the strengths of large and small language models to deliver comprehensive, accurate, and personalized medical information and services.
Our study offers a novel perspective by directly comparing ChatGPT-generated responses to those of human professionals in a real-world context, addressing a critical gap in existing literature. The results underscore the importance of ongoing engagement with AI technologies to refine and enhance their role in supporting diagnostic processes and patient care. As AI’s capabilities evolve, collaboration between AI developers and health care professionals will be crucial to enhance personalized medicine and improve health care outcomes.
1. Munoz-Zuluaga C, Zhao Z, Wang F, Greenblatt MB, Yang HS. Assessing the accuracy and clinical utility of ChatGPT in laboratory medicine. Clin Chem. 2023;69(8):939–940.
2. Cadamuro J, Cabitza F, Debeljak Z, et al. Potentials and pitfalls of ChatGPT and natural-language artificial intelligence models for the understanding of laboratory medicine test results. An assessment by the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) Working Group on Artificial Intelligence (WG-AI). Clin Chem Lab Med. 2023;61(7):1158–1166.
3. Ayers JW, Poliak A, Dredze M, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023;183(6):589–596.
4. Elyoseph Z, Hadar-Shoval D, Asraf K, Lvovsky M. ChatGPT outperforms humans in emotional awareness evaluations. Front Psychol. 2023;14:1199058.
Min Yu, MD, PhD, MBA, D (ABCC)
Director of Point of Care Testing and Point of Service Laboratories
Director of Beth Israel Deaconess HealthCare-Chestnut Hill
Department of Pathology
Beth Israel Deaconess Medical Center
Harvard Medical School, Boston
Mark Girton, MD, D (ABCC)
Clinical Assistant Professor
Department of Pathology
University of Michigan, Ann Arbor