  • Letter to the Editor
  • Open access

Assessing the performance of ChatGPT-4 and ChatGPT-4o in lung cancer diagnoses

To the Editor,

Lung cancer is a highly invasive and prevalent disease and a leading cause of cancer death globally [1]. Timely diagnosis is key to improving outcomes. Pulmonary CT is essential in detecting lung cancer, relying on signs such as lobulation, spiculation, pleural indentation, and the vacuolar sign, which help experts assess the type, location, and progression of the disease, aiding clinical decision-making. Recently, with the rapid development of artificial intelligence (AI), large language models (LLMs) such as ChatGPT-4o, ChatGPT-4, and Google Bard have introduced image-reading capabilities [2], offering new solutions for early lung cancer diagnosis [3], particularly in low- and middle-income regions with limited clinical expertise, where they could substantially affect cost-effectiveness and the allocation of local healthcare resources. Therefore, this study evaluated the accuracy and cost-effectiveness of ChatGPT versus clinical physicians in diagnosing lung cancer, using published cases.

This study, conducted in January 2025, reviewed 60 published lung cancer cases. We extracted medical history, images (CT, H&E, IHC, PET/CT, etc.), and multiple-choice options from each case to create 60 question documents. These were entered into ChatGPT-4 and ChatGPT-4o, prompting the models to provide the most likely and second most likely answers along with confidence ratings. Two chief lung oncologists independently conducted blinded evaluations of the same cases for comparison. For the cost analysis, we used the 2023 Eurozone average labor cost (35.6 EUR/hour, or 38.7 USD/hour).
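The cost comparison rests on simple arithmetic: time spent per case multiplied by the stated hourly labor rate. A minimal Python sketch, using an illustrative per-case time rather than the study's measured values:

```python
# 2023 Eurozone average labor cost, as used in the study
EUR_PER_HOUR = 35.6
USD_PER_HOUR = 38.7

def labor_cost_usd(minutes_per_case: float, n_cases: int = 60) -> float:
    """Total labor cost in USD for diagnosing n_cases at a given pace."""
    return (minutes_per_case / 60.0) * n_cases * USD_PER_HOUR

# Hypothetical example: a physician averaging 10 minutes per case
# across all 60 cases would cost roughly 387 USD in labor.
print(round(labor_cost_usd(10.0), 2))
```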

After evaluating responses from ChatGPT-4o, ChatGPT-4, and the two physicians, we found that for the top diagnosis, ChatGPT-4o (73.33%) was comparable to ChatGPT-4 (60.00%) and physician-2 (88.67%) (P = 0.121 and P = 0.068, respectively), but significantly less accurate than physician-1 (95.00%) (P = 0.001). For the top two diagnoses, ChatGPT-4o (86.67%) likewise showed no significant difference from ChatGPT-4 (73.33%) or physician-2 (95.00%), but remained significantly lower than physician-1 (98.33%) (P = 0.015). ChatGPT-4o reported higher confidence in its first diagnosis than ChatGPT-4 (P < 0.001), but lower confidence than either physician (P < 0.001). Its confidence in the second diagnosis dropped but remained higher than ChatGPT-4’s (P < 0.001), with no significant difference from the physicians. Both ChatGPT-4o and ChatGPT-4 required significantly less time and cost than the physicians (P < 0.001), with ChatGPT-4o being the fastest and most cost-effective (Fig. 1).
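The accuracy comparisons above (Pearson’s chi-squared test on correct vs. incorrect diagnoses) can be sketched in Python. The counts below are reconstructed from the reported percentages (73.33% of 60 cases ≈ 44 correct for ChatGPT-4o; 95.00% of 60 = 57 for physician-1) and scipy is assumed to be available, so this illustrates the method rather than reproducing the study's exact analysis:

```python
from scipy.stats import chi2_contingency

N = 60  # number of cases reviewed
# Rows: [correct, incorrect] counts for the top diagnosis,
# reconstructed from the reported percentages (assumption).
gpt4o_counts = [44, N - 44]   # ~73.33 %
phys1_counts = [57, N - 57]   # 95.00 %

chi2, p, dof, _ = chi2_contingency([gpt4o_counts, phys1_counts])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With these assumed counts the 2×2 test yields p < 0.05, consistent in direction with the reported significant difference between ChatGPT-4o and physician-1.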

Our findings show that ChatGPT-4o achieves high accuracy in lung cancer diagnosis, approaching the performance of clinical physicians, with clear advantages in time and cost. Whereas ChatGPT-4 struggles with longer inputs and multimodal data, ChatGPT-4o’s capabilities make it a strong tool for initial lung cancer diagnosis, especially in regions with limited medical expertise. ChatGPT-4o also maintains consistent performance in multitasking or emergency situations where physician accuracy might decline. However, the study’s small sample size and limited representation of lung tumor variability are notable limitations; larger studies covering more AI models are needed for further validation. Despite these limitations, our results suggest ChatGPT-4o could serve as a low-cost, rapid diagnostic aid, helping physicians improve diagnostic accuracy and offering guidance for non-medical professionals, laying a foundation for future AI-assisted lung cancer diagnosis and early intervention.

Fig. 1

Workflow and results comparing the accuracy, confidence, time, and cost of ChatGPT-4, ChatGPT-4o and physicians in diagnosing lung cancer. ChatGPT-4o served as the control. Accuracy comparisons used Pearson’s chi-squared test, while confidence, time, and cost were compared with independent t-tests. ***: statistically significant; ns: not significant

Data availability

Publicly available data were analyzed in this study.

Abbreviations

AI: Artificial intelligence

LLM: Large language models

ChatGPT: Chat generative pretrained transformer

CT: Computed tomography

H&E: Hematoxylin and eosin

IHC: Immunohistochemistry

PET/CT: Positron emission tomography/computed tomography

References

  1. Leiter A, Veluswamy RR, Wisnivesky JP. The global burden of lung cancer: current status and future trends. Nat Rev Clin Oncol. 2023;20(9):624–39. https://doi.org/10.1038/s41571-023-00798-3.


  2. Kanjee Z, Crowe B, Rodman A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA. 2023;330(1):78–80. https://doi.org/10.1001/jama.2023.8288.


  3. Huang S, Yang J, Shen N, et al. Artificial intelligence in lung cancer diagnosis and prognosis: current application and future perspective. Semin Cancer Biol. 2023;89:30–7. https://doi.org/10.1016/j.semcancer.2023.01.006.



Acknowledgements

We are grateful to the researchers who provided the original data.

Funding

This work was supported by the National Natural Science Foundation of China (No. 82473253).

Author information

Contributions

(I) Conception and design: CH Xie; (II) Administrative support: CH Xie and XF Dai; (III) Collection and assembly of data: JR Yang and X Cai; (IV) Data analysis and interpretation: JR Yang and X Cai; (V) Manuscript writing: JR Yang; (VI) Final approval of manuscript: All authors.

Corresponding authors

Correspondence to Xiaofang Dai or Conghua Xie.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

All authors have completed the ICMJE uniform disclosure form. The authors have no conflicts of interest to declare.

Consent for publication


Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Yang, J., Cai, X., Dai, X. et al. Assessing the performance of ChatGPT-4 and ChatGPT-4o in lung cancer diagnoses. J Transl Med 23, 346 (2025). https://doi.org/10.1186/s12967-025-06337-1



  • DOI: https://doi.org/10.1186/s12967-025-06337-1
