Abstract
Objective:
To investigate the effect of artificial intelligence (AI) on endoscopy training within the standardized training of otolaryngology residents.
Methods:
A total of 30 standardized training residents from the Department of Otolaryngology at Shenzhen Children’s Hospital were enrolled between January 2022 and December 2024 and were randomly divided into an experimental group and a control group. All residents completed a 1-month endoscopy rotation. The control group received traditional lectures and endoscopic operation training. The experimental group received the same conventional teaching and, in addition, could compare and confirm their diagnoses of common pediatric otolaryngology conditions against the automatic diagnosis of an AI model. The theoretical scores, practical skills assessments, and residents’ satisfaction ratings of both groups were compared.
Results:
The theoretical and practical scores of the experimental group were significantly higher than those of the control group (P < .05). The satisfaction rate with training in the experimental group was also significantly higher than that of the control group (P < .05). The experimental group reported significantly higher satisfaction than the control group in learning interest, learning efficiency, self-directed learning ability, endoscopic theoretical knowledge, endoscopic operational skills, and standardized endoscopic diagnostic reporting (P < .05).
Conclusions:
Compared with traditional teaching methods, an AI-assisted endoscopy training system can help improve participant satisfaction and skills. More intelligent otolaryngology training models are expected to be developed in the future.
Introduction
Modern medicine has advanced rapidly since the beginning of the 21st century. The continuous expansion of medical knowledge and evolving learning methods necessitate that doctors engage in lifelong learning. Standardized residency training plays a critical role in the ongoing education of medical graduates. It is designed to equip residents with essential theoretical knowledge and practical skills, as well as to deepen their understanding of key aspects of disease diagnosis and treatment principles. This training is vital for shaping residents into skilled clinical physicians. Traditional endoscopic training relies heavily on the instructor’s expertise and clinical experience, with students requiring extensive hands-on practice to master the techniques. However, without the right teaching strategies, students may experience a decline in both their intuitive understanding and their motivation to learn. Therefore, selecting appropriate teaching methods for otolaryngology residents is crucial for enhancing the effectiveness of their training.
With the rapid advancement of artificial intelligence (AI) technology, diagnostic procedures such as gastrointestinal endoscopy, bronchoscopy, otolaryngology endoscopy, colposcopy, and cystoscopy have become more efficient and accurate.1,2 AI, a branch of computer science, focuses on solving complex problems by emulating human intelligence. Its core technologies, including machine learning and deep learning, process vast amounts of data to build pattern recognition systems for prediction and decision-making. The essence of AI lies in its capacity for self-learning and continuous optimization, enabling it to improve performance across diverse tasks through iterative data analysis. AI has progressed significantly and can now autonomously classify and identify endoscopic images, segment them, and measure their parameters, as well as provide automated analyses and diagnoses of common diseases.3 Furthermore, class activation mapping can improve interpretability and enable more informed decision-making. This study harnesses AI technology to enhance training programs for otolaryngology endoscopy, aiming to improve teaching outcomes and elevate educational effectiveness.
Materials and Methods
Study Participants
A randomized controlled trial was conducted from January 2022 to December 2024. A total of 30 standardized training residents (hereafter referred to as “residents”) from the Department of Otolaryngology at Shenzhen Children’s Hospital were enrolled in the study. The group consisted of 3 undergraduate students, 25 master’s degree students, and 2 doctoral students. Participants were randomly assigned to 2 groups that differed in whether AI-assisted training software was used. The training lasted for 1 month and included procedures such as otoscopy, nasal endoscopy, and laryngoscopy. The experimental group (n = 15) comprised 5 males and 10 females, with ages ranging from 23 to 35 years and a mean age of 25.36 ± 3.62 years. The control group (n = 15) included 6 males and 9 females, with ages ranging from 22 to 31 years and a mean age of 24.85 ± 3.04 years. There were no statistically significant differences in the baseline demographic characteristics between the 2 groups.
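The allocation procedure itself is not described in this report. As a purely illustrative sketch, simple 1:1 randomization of 30 residents into 2 groups of 15 could look like the following; the resident identifiers, the seed, and the helper name `randomize_two_arms` are hypothetical and are not taken from the study.

```python
import random


def randomize_two_arms(participant_ids, seed):
    """Shuffle participants and split them evenly into two arms.

    Hypothetical helper: the study does not report its allocation method.
    """
    if len(participant_ids) % 2 != 0:
        raise ValueError("an even number of participants is needed for 1:1 allocation")
    rng = random.Random(seed)         # fixed seed keeps the allocation reproducible
    shuffled = list(participant_ids)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Illustrative identifiers R01..R30 (not from the study)
residents = [f"R{i:02d}" for i in range(1, 31)]
groups = randomize_two_arms(residents, seed=2022)
print(len(groups["experimental"]), len(groups["control"]))  # 15 15
```

Fixing the seed is one way to make such an allocation auditable afterwards; in practice a trial may instead use block or stratified randomization.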
This study was conducted in compliance with the principles of the Declaration of Helsinki by the World Medical Association and was approved by the Ethical Committee of Shenzhen Children’s Hospital (protocol number 2022115). Written informed consent was obtained from the 30 standardized training residents. For pediatric images used in endoscopy training, parents or guardians provided written informed consent for access to endoscopic images of each child participant. The authors did not have access to information that could identify individual participants during or after data collection.
Research Tools
The control group followed the conventional teaching methods outlined in the standardized training syllabus. Instruction was led by experienced instructors who first delivered lectures on the theoretical principles of otolaryngology endoscopy and the standards for image acquisition. This was followed by one-on-one training sessions focused on endoscopic operations. The clinical teaching was instructor-centered: instructors demonstrated the endoscopic characteristics of common otolaryngology diseases in detail and guided residents in operating the endoscope to capture images of various lesion sites. Instructors then guided residents in selecting endoscopic images and generating diagnostic reports. At the conclusion of the course, both theoretical knowledge and practical skills were assessed, the latter on the Airway Limited ORSIM endoscopy simulation system (New Zealand).
The experimental group received training enhanced by the Xception model in addition to the conventional teaching methods. Instructors taught the theoretical principles of otolaryngology endoscopy and provided one-on-one guidance on endoscopic operations. In parallel, the AI model automatically diagnosed the images reviewed during training, and residents could compare their own diagnoses with the AI output and confirm them. The deep learning model was built with Python-based tools for image data processing, model training, and result testing (training flow shown in Figure 1). The detailed process of model construction can be found in our previous reports.4-6 Based on the constructed deep learning model, we designed a classifier for common otolaryngology conditions. As shown in Figure 2, common diseases such as acute otitis media, otitis media with effusion, adenoid hypertrophy, and vocal cord polyps can be diagnosed automatically, and the region of interest targeted by the AI is displayed. Participants can confirm the diagnosis in real time.

Figure 1. The Xception model training flow.

Figure 2. Automatic diagnosis by the classifier.
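The text states that the classifier displays the region the AI focuses on, and the Introduction mentions class activation mapping. The sketch below illustrates the basic class activation mapping computation, a class-weighted sum of the final convolutional feature maps, in plain Python. It is a toy illustration with made-up numbers, not the authors' implementation, and the function names `class_activation_map` and `normalize_map` are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) for one target class."""
    if len(feature_maps) != len(class_weights):
        raise ValueError("one weight is needed per feature map")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def normalize_map(cam):
    """Rescale a map to [0, 1] so it can be overlaid on the image as a heat map."""
    lowest = min(min(row) for row in cam)
    highest = max(max(row) for row in cam)
    span = highest - lowest
    if span == 0:
        return [[0.0 for _ in row] for row in cam]
    return [[(value - lowest) / span for value in row] for row in cam]


# Toy input: two 2 x 2 feature maps and the weights linking them to one class
toy_feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
toy_weights = [0.5, 1.0]
heat_map = normalize_map(class_activation_map(toy_feature_maps, toy_weights))
print(heat_map)  # [[0.25, 0.0], [0.0, 1.0]]
```

In this toy case the bottom-right cell receives the highest activation, which is the kind of location a heat-map overlay would highlight for the resident.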
Resident Satisfaction Evaluation
Resident satisfaction was evaluated using a questionnaire survey (Supplemental Material), which analyzed the satisfaction levels of both groups with their respective training methods. The control group was guided by teachers to select endoscopic images and generate endoscopic diagnosis reports, while the experimental group additionally had access to automatic AI diagnosis of the images during training, which participants could compare with their own diagnoses and confirm. The comparison was conducted across 10 aspects: learning interest, learning efficiency, self-directed learning ability, endoscopic theoretical knowledge, endoscopic operational skills, management of endoscopic complications, standardized endoscopic diagnostic reporting, clinical critical thinking, doctor-patient communication skills, and teamwork abilities. Each aspect was scored on a scale of 0 to 10, with a total score of 100, where higher scores indicate greater satisfaction. The grading criteria were as follows:
Satisfied: 60 to 100 points.
Dissatisfied: <60 points.
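The scoring rule above (10 aspects, each scored 0 to 10, with a total of 60 or more counted as satisfied) can be sketched as follows. The function names and the example scores are illustrative assumptions; only the aspect list and the 60-point threshold come from the text.

```python
ASPECTS = (
    "learning interest",
    "learning efficiency",
    "self-directed learning ability",
    "endoscopic theoretical knowledge",
    "endoscopic operational skills",
    "management of endoscopic complications",
    "standardized endoscopic diagnostic reporting",
    "clinical critical thinking",
    "doctor-patient communication skills",
    "teamwork abilities",
)
SATISFIED_THRESHOLD = 60  # total score of 60 to 100 counts as satisfied


def grade_questionnaire(aspect_scores):
    """Return (total, grade) for one resident's ten 0-10 aspect scores."""
    if set(aspect_scores) != set(ASPECTS):
        raise ValueError("scores must cover exactly the 10 questionnaire aspects")
    for name, score in aspect_scores.items():
        if not 0 <= score <= 10:
            raise ValueError(f"score for {name!r} must be between 0 and 10")
    total = sum(aspect_scores[name] for name in ASPECTS)
    grade = "Satisfied" if total >= SATISFIED_THRESHOLD else "Dissatisfied"
    return total, grade


def satisfaction_rate(totals):
    """Percentage of residents whose total score reaches the threshold."""
    satisfied = sum(1 for total in totals if total >= SATISFIED_THRESHOLD)
    return 100.0 * satisfied / len(totals)


# Made-up example: a resident who scores 7 on every aspect
example = {name: 7 for name in ASPECTS}
print(grade_questionnaire(example))  # (70, 'Satisfied')
```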
Evaluation of Training Effectiveness
Theoretical knowledge was assessed using a custom-designed examination paper, while practical skills were evaluated with the endoscopy simulation system (Airway Limited ORSIM, Airway Simulation Limited, New Zealand). This simulation system quantitatively measures endoscopic performance using metrics such as operation time, number of wall contacts, endoscope bending angles, simulated blood oxygen saturation monitoring, and diagnostic accuracy. The maximum score for each assessment was set at 100 points.
Statistical Analysis
Statistical analysis was conducted using IBM SPSS Statistics V22.0 software. Continuous variables were expressed as mean ± standard deviation (x̄ ± s) and analyzed using the t-test, while categorical variables were presented as percentages and compared using the χ² test. P < .05 was considered statistically significant.
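For readers who want to check a group comparison from summary statistics alone, the following is a minimal sketch of an independent-samples t-test computed from means, standard deviations, and group sizes. It assumes the equal-variance (pooled) form of the test, which the text does not specify, and it obtains the two-sided P value by numerical integration of the t density so that only the standard library is needed. The function names are hypothetical; the example inputs are the theoretical exam scores reported in the Results.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Student t statistic and degrees of freedom, assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


def t_density(x, df):
    """Probability density of Student's t distribution."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_value, df, steps=2000):
    """Two-sided P value via Simpson's rule over [0, |t|] (steps must be even)."""
    upper = abs(t_value)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * total * h / 3  # probability mass between -|t| and |t|
    return max(0.0, 1.0 - central_area)


# Theoretical exam scores: 88.13 ± 7.37 vs 78.07 ± 12.10, n = 15 per group
t_value, df = pooled_t_statistic(88.13, 7.37, 15, 78.07, 12.10, 15)
p_value = two_sided_p(t_value, df)
print(f"t = {t_value:.2f}, df = {df}, P = {p_value:.3f}")
```

With equal group sizes the pooled and unequal-variance (Welch) forms give the same t statistic and differ only in degrees of freedom, so the choice matters little for this check.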
Results
The comparison of training satisfaction between the 2 groups of residents is shown in Table 1. The experimental group demonstrated significantly higher satisfaction levels than the control group in learning interest, learning efficiency, self-directed learning ability, endoscopic theoretical knowledge, endoscopic operational skills, and standardized endoscopic diagnostic reporting (P < .05). The satisfaction rate in the experimental group was 93.3%, significantly higher than the 60% observed in the control group (P = .031). The theoretical exam score for the experimental group was 88.13 ± 7.37, which was significantly higher than the control group’s score of 78.07 ± 12.10 (P = .01). The practical skills score for the experimental group was 89.40 ± 8.88, which was also significantly higher than the control group’s score of 78.80 ± 11.47 (P = .009), as shown in Table 2.
Table 1. Comparison of Satisfaction Between the 2 Groups.
Table 2. Comparison of Training Effect Between the 2 Groups.
t-test.
χ2 test.
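The satisfaction-rate comparison can be checked with a Pearson χ² test on a 2 × 2 table. The sketch below assumes that the reported rates of 93.3% and 60% correspond to 14 of 15 and 9 of 15 satisfied residents (an inference from the percentages and group sizes, not a count stated in the text) and that no continuity correction was applied. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: experimental, control; columns: satisfied, dissatisfied (inferred counts)
chi2, p_value = chi_square_2x2(14, 1, 9, 6)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")  # chi2 = 4.658, P = 0.031
```

Under these assumptions the result is consistent with the reported P = .031.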
Discussion
Endoscopic procedures play a vital role in the standardized training of otolaryngology residents. The otolaryngological cavities feature complex anatomical structures, narrow openings, and deep recesses, making them challenging to navigate. Moreover, these cavities are closely linked to neighboring areas such as the orbit, skull base, oral cavity, respiratory tract, and digestive tract, which adds to the complexity. As a result, residents must develop excellent hand-eye coordination and spatial awareness. Specialized instruments, such as the anterior rhinoscope, indirect nasopharyngoscope, and indirect laryngoscope, are commonly used for examinations in both clinical practice and teaching. However, different instructors may have varying approaches to observation and teaching, which can make it challenging for residents to quickly master standardized examination techniques and disease diagnoses. Many residents find the learning process monotonous and tedious. Since Messerklinger first performed endoscopic sinus surgery,7 the endoscopic system has become an indispensable “eye” in the field, playing a revolutionary role in the advancement of otolaryngology.
Traditional otolaryngology endoscopic training primarily relies on hands-on guidance from instructors, requiring residents to practice extensively to acquire endoscopic operation skills. However, due to their limited experience, residents often cause discomfort or unintentional injury to patients when performing endoscopy in the otolaryngological cavities, whose anatomical structures are delicate and complex. This challenge is particularly pronounced for beginners. Residents frequently encounter difficulties in tasks such as selecting appropriate images, identifying and marking anatomical landmarks, making accurate diagnoses, and writing comprehensive endoscopic reports. Additionally, traditional training models often employ subjective assessment criteria and lack standardized quantitative metrics. This makes it difficult for residents to receive timely and objective feedback, hindering their learning progress. Furthermore, residents come from diverse educational backgrounds and have varying levels of clinical expertise, making it difficult for instructors to meet individual teaching needs. To address these challenges, we applied AI to otolaryngology endoscopic training, to our knowledge for the first time.
With the rapid advancement of computer technology, particularly the maturation of deep learning, vision analysis, and natural language processing techniques, AI has demonstrated immense potential in medical education. By leveraging machine learning algorithms, AI can analyze vast amounts of medical data, enabling residents to better understand disease progression and identify optimal treatment strategies derived from big data insights.3 Machine learning algorithms can autonomously analyze and recognize complex features in endoscopic images. After training on extensive datasets, AI models have achieved high accuracy in tasks such as automatic image capture, lesion identification, and diagnostic reporting.1,8 Advancements in virtual reality (VR) technology have introduced more intuitive and immersive learning methods for medical simulation education. VR provides realistic operational experiences while simulating the anatomical structures and pathological conditions of various patients. This enables residents to enhance their preparedness for real-world clinical practice.9,10
The experimental group trained with the AI-assisted software achieved significantly higher scores than the control group in key satisfaction indicators, including learning interest (8.80 ± 1.20 vs 6.07 ± 2.12), learning efficiency (8.73 ± 1.28 vs 6.20 ± 1.97), and self-directed learning ability (8.67 ± 2.54 vs 6.13 ± 1.23), all with P < .05. This aligns with their higher overall satisfaction rate (93.3% vs 60%, P = .031), indicating that AI integration enhances engagement and perceived value in training. Notably, no significant differences emerged between groups in managing endoscopic complications (7.60 ± 1.72 vs 7.67 ± 1.59, P = .94), clinical critical thinking (7.47 ± 1.88 vs 7.68 ± 1.46, P = .86), doctor-patient communication (7.60 ± 1.35 vs 7.87 ± 1.30, P = .54), or teamwork (7.67 ± 1.11 vs 7.46 ± 1.45, P = .39). This pattern likely reflects the nature of these competencies: Complication management and critical thinking rely on long-term clinical experience and adaptability to complex, rare scenarios, where short-term AI training, which may focus on standardized procedures, has limited impact. Similarly, communication and teamwork are interpersonal skills shaped by face-to-face interaction, a strength of traditional teaching that AI has yet to replicate. These findings highlight that while AI excels at structured skill-building and knowledge transmission, it cannot fully replace human-centric training for experience-dependent or relational competencies. In addition, these subjective improvements were accompanied by objective performance gains: The experimental group scored significantly higher in theoretical exams (88.13 ± 7.37 vs 78.07 ± 12.10, P = .01) and practical skills (89.40 ± 8.88 vs 78.80 ± 11.47, P = .009). This suggests that AI tools not only support the acquisition of cognitive knowledge but also enhance procedural competencies, possibly due to repeated simulations, instant feedback, and visualization aids provided by the AI software.
The diagnosis of otitis media has become a prominent focus of AI research in otoscopy. Traditional diagnostic tools, such as hand-held otoscopes, pneumatic otoscopes, and tympanometry, are prone to diagnostic errors. Our team developed an Xception-based model,4,5 which classifies tympanic membrane images and distinguishes among normal membranes, acute otitis media, and secretory otitis media with an accuracy of 97.45% (Figure 1). With this tool, common otologic lesions can be identified and classified automatically, enabling residents to quickly familiarize themselves with the characteristics of various pathologies. Another innovation in otoscopic training involves remote endoscopic examinations. Residents can connect an otoscope to a smartphone and transmit images to their instructor for real-time remote teaching.11,12 This approach enhances accessibility and fosters more interactive and flexible learning opportunities.
The nasal endoscope is a critical tool for examining the structures and pathologies of the nasal cavity and sinuses. The application of AI has significantly improved training outcomes for residents. AI research has advanced deep learning-based video image analysis and registration algorithms, enabling 3-dimensional reconstructions and navigation systems of the nasal cavity for use during nasal endoscopic examination and surgery. These technologies assist residents in accurately identifying and navigating the complex anatomical structures of the nasal cavity, thereby reducing procedural difficulty and enhancing operational precision.13 Building on these advancements, our team developed an Xception-based model that can automatically measure parameters from various cross-sections of the adenoids in nasal endoscopy.6 This model is particularly useful for diagnosing and grading pediatric adenoid hypertrophy. Together, these innovations provide substantial support for clinical decision-making and enhance the quality of endoscopic training for residents.
AI-based classifiers offer valuable assistance in diagnosing laryngeal tumors using endoscopic images, particularly in differentiating between benign lesions, precancerous conditions, and malignant tumors.14,15 AI has demonstrated high accuracy in selecting key frames from video laryngoscopy, which can assist residents in automatically capturing and selecting images.16,17 This capability streamlines the diagnostic process and supports residents in improving their skills and efficiency. Stroboscopy-based laryngeal vibration imaging is a crucial technique for clinical voice assessment. Fehling et al18 utilized AI to develop a model for automatically segmenting the glottic anatomy in high-speed video endoscopic images, achieving segmentation results comparable to expert-level performance. This technology can significantly assist residents in learning to diagnose vocal cord lesions. Endoscopic training can also benefit from portable smartphone laryngoscope systems, which allow high-speed video images of patients with laryngeal diseases to be transmitted to smartphones, facilitating remote teaching and expanding the reach of training opportunities.12,19
Despite the promising application of AI in otolaryngology endoscopic training, its widespread adoption faces several challenges.20 First, the high cost of technology remains a significant barrier. The development and maintenance of AI-based endoscopic systems require substantial financial investment, which many educational institutions may find difficult to support. Second, concerns regarding data privacy and security are critical obstacles to the broader implementation of AI. Protecting patient data from breaches and ensuring confidentiality are pressing issues that must be addressed to gain trust and facilitate the adoption of AI technologies in medical education. Additionally, the integration of AI in endoscopic training raises ethical concerns. Students may become overly dependent on AI, potentially diminishing their ability to engage in independent learning and critical thinking. For instance, AI misdiagnoses or incorrect feedback could lead to residents making erroneous judgments, which could negatively impact their clinical practice. Finally, the evolving role of instructors presents another challenge. Teachers may need to shift from being traditional knowledge providers to becoming supervisors and guides throughout the training process. This transition requires educators to possess higher technical literacy and the ability to effectively facilitate learning using AI tools.
Our study had several limitations. First, the study was non-blinded and conducted at a single center, which may have weakened the strength of the evidence. Some trainees were excited about AI, which may have led to greater enthusiasm for endoscopic learning and higher satisfaction scores. Also, the practical skills scores observed in this study could have been overestimated or underestimated because the endoscopy simulation system does not exactly reproduce human anatomy. Moreover, the sample size was relatively small, with only 30 residents enrolled, potentially limiting generalizability. The intervention period was 1 month, which may not reflect the long-term retention or impact of AI-assisted training. Despite these limitations, the present study suggests that AI can play a meaningful role in supporting the teaching of endoscopic diagnosis.
Conclusion
The application of AI technology in otolaryngology endoscopic training is gradually transforming traditional teaching methods, offering benefits to students, doctors, and patients alike. For students, AI provides valuable assistance in improving endoscopic skills. For doctors, AI models enhance diagnostic accuracy and treatment efficiency, helping to reduce workload and allowing health care professionals to focus on more complex clinical tasks. For patients, AI integration can support more accurate, faster, and safer medical services. The limitations of this study are the relatively small sample size and the fact that the results come from a single tertiary care institution; future steps could involve multicenter randomized prospective controlled studies, which would provide more endoscopic image data to train the model, and application of these findings to medical education in various disciplines. As the technology continues to evolve, the synergy between AI and endoscopic training will become even more integrated, playing an increasingly crucial role in the advancement of otolaryngology education and practice.
Supplemental Material
Supplemental material, sj-pdf-1-ear-10.1177_01455613251371794 for Construction and Implementation of Otolaryngology Endoscopic Training Based on Artificial Intelligence by Guo Xu, Desheng Jia, Xuansheng Wang, Jing Chen, Hongguang Pan and Zebin Wu in Ear, Nose & Throat Journal
Footnotes
Acknowledgements
We are deeply grateful to every standardized training resident who agreed to participate in this study.
Ethical Considerations
All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The study was approved by the ethics committee of the Shenzhen Children’s Hospital (2022115).
Consent to Participate
All patients included in the study were under 18 years of age, and informed consent was obtained from the parents and/or legal guardians of all individual participants. Written informed consent was obtained from the 30 standardized training residents.
Consent for Publication
Informed consent for publication was obtained from all patients’ parents for the use of their medical records in writing this study. All the authors have approved the manuscript and agree with its submission to the journal.
Author Contributions
G.X.: conceptualization, methodology, drafting initial manuscript. D.J.: data curation, validation. X.W.: training AI model. J.C.: data curation, statistical analysis. H.P.: funding acquisition, project administration. Z.W.: writing—review and editing. All authors provided final approval and agree to be accountable for all aspects of the work.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Shenzhen Science and Technology Program (JCYJ20220530155603007) and the Shantou University Medical School 2023 Teaching Reform and Research Project (21).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Datasets used and/or analyzed during the current study are available from the corresponding author (Z.W., wuzbent@hotmail.com) on reasonable request.
Supplemental Material
Supplemental material for this article is available online.
References
