
A comprehensive evaluation of artificial intelligence–provided information on common ENT surgical procedures using the QAMAI tool

Published online by Cambridge University Press: 01 September 2025

Mitat Selçuk Bozhöyük*
Affiliation:
Department of Otorhinolaryngology, Republic of Turkey Ministry of Health, Bitlis Tatvan State Hospital, Bitlis, Turkey
Levent Yücel
Affiliation:
Department of Otorhinolaryngology, University of Health Sciences, Gülhane Training and Research Hospital, Ankara, Turkey
Corresponding author: Mitat Selçuk Bozhöyük; Email: mitat.selcuk@hotmail.com

Abstract

Objectives

This study aimed to evaluate the quality of the information that artificial intelligence (AI) applications provide about ENT surgeries, and its usability for patients.

Methods

ChatGPT 4.0, GEMINI 1.5 Flash, Copilot, Claude 3.5 Sonnet and DeepSeek-R1 were asked to provide detailed responses to patient-oriented questions about 15 ENT surgeries. Each AI application was queried three times, with a 3-day interval between sessions. Two ENT specialists evaluated all responses using the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool.
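
A minimal sketch of a repeated-query protocol of this shape is shown below. The `ask` wrapper, the prompt wording and the shortened procedure list are hypothetical placeholders; the abstract does not specify the exact prompts or chatbot interfaces used.

```python
import json
import time

# Hypothetical placeholders: the study's actual prompts, chatbot
# interfaces and full procedure list are not given in the abstract.
MODELS = ["ChatGPT 4.0", "GEMINI 1.5 Flash", "Copilot",
          "Claude 3.5 Sonnet", "DeepSeek-R1"]
PROCEDURES = ["tonsillectomy", "adenoidectomy", "septoplasty"]  # 15 in the study

def ask(model: str, question: str) -> str:
    """Hypothetical wrapper: replace with each platform's real interface."""
    return f"[{model}] response to: {question}"

def run_session(session_id: int) -> list[dict]:
    """Query every model about every procedure once."""
    records = []
    for model in MODELS:
        for procedure in PROCEDURES:
            question = f"As a patient, what should I know about {procedure}?"
            records.append({
                "session": session_id,
                "model": model,
                "procedure": procedure,
                "answer": ask(model, question),
            })
    return records

# Three query sessions separated by 3-day intervals, as in the Methods.
all_records = []
for session in (1, 2, 3):
    all_records.extend(run_session(session))
    if session < 3:
        time.sleep(3 * 24 * 3600)  # wait 3 days before the next session

with open("responses.json", "w", encoding="utf-8") as f:
    json.dump(all_records, f, ensure_ascii=False, indent=2)
```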

Results

Mean QAMAI scores for each AI application were as follows: ChatGPT 4.0 (27.56 ± 1.20), GEMINI 1.5 Flash (26.24 ± 1.26), Copilot (26.84 ± 1.35), Claude 3.5 Sonnet (28.24 ± 0.77) and DeepSeek-R1 (28.13 ± 0.84). A statistically significant difference was found among the applications (p < 0.001). Intraclass correlation coefficient (ICC) analysis indicated high stability across the repeated evaluations of all five AI applications (p < 0.001).
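
The abstract does not name the omnibus test used to compare the five applications, so the sketch below assumes a Kruskal-Wallis test on the QAMAI totals and assesses session-to-session stability with pingouin's intraclass_corr. The example scores are randomly generated stand-ins, not the study data.

```python
import numpy as np
import pandas as pd
from scipy import stats
import pingouin as pg  # pip install pingouin

# Hypothetical example data: one QAMAI total (range 6-30) per model,
# procedure and repeated query session.
rng = np.random.default_rng(0)
rows = []
for model in ["ChatGPT 4.0", "GEMINI 1.5 Flash", "Copilot",
              "Claude 3.5 Sonnet", "DeepSeek-R1"]:
    for procedure in range(15):
        for session in range(3):
            rows.append({"model": model, "procedure": procedure,
                         "session": session,
                         "qamai": int(rng.integers(24, 31))})
df = pd.DataFrame(rows)

# Omnibus comparison of QAMAI totals across the five applications
# (assumed Kruskal-Wallis; the abstract does not state the test).
groups = [g["qamai"].to_numpy() for _, g in df.groupby("model")]
h, p = stats.kruskal(*groups)
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.4f}")

# Test-retest stability for one model across the three sessions:
# ICC with targets = procedures, raters = sessions.
one = df[df["model"] == "Claude 3.5 Sonnet"]
icc = pg.intraclass_corr(data=one, targets="procedure",
                         raters="session", ratings="qamai")
print(icc[["Type", "ICC", "pval"]])
```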

Conclusion

AI has the potential to provide patients with accurate and consistent information about ENT surgeries, yet differences in QAMAI scores show that information quality varies between platforms.

Information

Type: Main Article

Copyright: © The Author(s), 2025. Published by Cambridge University Press on behalf of J.L.O. (1984) LIMITED.


Footnotes

Mitat Selçuk Bozhöyük takes responsibility for the integrity of the content of the paper.

Supplementary material

Bozhöyük and Yücel supplementary material (File, 598 KB)