Published online by Cambridge University Press: 01 September 2025
This study aimed to evaluate the quality and patient usability of the information provided by artificial intelligence (AI) applications regarding ENT surgeries.
ChatGPT 4.0, GEMINI 1.5 Flash, Copilot, Claude 3.5 Sonnet and DeepSeek-R1 were asked to provide detailed responses to patient-oriented questions about 15 ENT surgeries. Each AI application was queried three times, with a 3-day interval between each session. Two ENT specialists evaluated all responses using the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool.
Average QAMAI scores for each AI application were as follows: ChatGPT 4.0 (27.56 ± 1.20), GEMINI 1.5 Flash (26.24 ± 1.26), Copilot (26.84 ± 1.35), Claude 3.5 Sonnet (28.24 ± 0.77) and DeepSeek-R1 (28.13 ± 0.84). A statistically significant difference was found among the applications (p < 0.001). Intraclass correlation coefficient (ICC) analysis indicated high stability across the repeated evaluations for all five AI applications (p < 0.001).
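The abstract does not state which statistical tests the authors used beyond a between-group significance comparison and ICC analysis. The sketch below is an illustrative reconstruction only, using hypothetical QAMAI-style scores, an assumed Kruskal-Wallis test for the between-application comparison, and a manually computed ICC(3,1) for stability across the three query sessions; the score matrices, noise levels, and test choices are assumptions, not the study's actual data or methods.

```python
# Illustrative sketch only: hypothetical QAMAI-style scores, not the study's data.
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Hypothetical total QAMAI scores: 15 surgeries x 3 query sessions per application.
apps = ["ChatGPT 4.0", "GEMINI 1.5 Flash", "Copilot", "Claude 3.5 Sonnet", "DeepSeek-R1"]
means = [27.56, 26.24, 26.84, 28.24, 28.13]  # abstract means, used here as simulation targets
scores = {}
for app, m in zip(apps, means):
    per_surgery = rng.normal(m, 1.0, size=(15, 1))              # stable per-surgery level
    scores[app] = per_surgery + rng.normal(0, 0.3, size=(15, 3))  # small session-to-session noise

# Compare scores across the five applications (non-parametric alternative to one-way ANOVA).
h_stat, p_value = kruskal(*[s.mean(axis=1) for s in scores.values()])
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_value:.4f}")

def icc_3_1(x: np.ndarray) -> float:
    """ICC(3,1): two-way mixed effects, consistency, single measures.
    Rows = targets (surgeries), columns = repeated sessions."""
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Stability of each application's scores across the three query sessions.
for app, s in scores.items():
    print(f"{app}: ICC(3,1) = {icc_3_1(s):.2f}")
```

With the simulated per-surgery structure above, the ICC values come out high, mirroring the kind of test-retest stability the abstract reports; with purely random scores they would not.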
AI has the potential to provide patients with accurate and consistent information about ENT surgeries, yet differences in QAMAI scores show that information quality varies between platforms.
Mitat Selçuk Bozhöyük takes responsibility for the integrity of the content of the paper.