Abstract
Objectives:
The purpose of this study was to evaluate visual gaze patterns and the ability to correctly identify cancer among participants of different experience levels when viewing benign and malignant vocal cord lesions.
Methods:
Thirty-one participants were divided into groups based on level of experience. These included novice (medical students, PGY1-2 otolaryngology residents), intermediate (PGY3-5 otolaryngology residents, gastroenterology fellow), advanced practice providers (physician assistants, nurse practitioners, and speech language pathologists), and experts (board-certified otolaryngologists). Each participant was shown 7 images of vocal cord pathology including glottic cancer, infectious laryngitis, and granuloma and asked to determine the likelihood of cancer on a scale of certain, probable, possible, and unlikely. Eye tracking data were collected and used to identify the area of interest (AOI) that each participant fixated on first, fixated on the longest, and had the greatest number of fixations.
Results:
No significant differences were seen among groups when comparing AOI with first fixation, AOI with longest fixation, or AOI with most fixations. Novices were significantly more likely than the more experienced groups to rate the likelihood of cancer as low when viewing 1 of the 2 infectious laryngitis images (P < .001). There was no difference in likelihood of cancer rating among groups for the remaining images.
Conclusions:
There was no significant difference in gaze targets among participants of different experience levels evaluating vocal cord pathology. Symmetric appearance of vocal cord lesions may explain differences seen in likelihood of cancer rating among groups. Future studies with larger sample sizes will better elucidate gaze targets that lead to accurate diagnosis of vocal cord pathology.
Introduction
Eye tracking technology (ETT) follows and records a person’s gaze location resulting in a visual gaze pattern during a given task. In several fields of medicine, it has been used to detect differences in gaze patterns between novices and experts to diagnose glaucoma, 1 skeletal fractures, 2 and pigmented skin lesions 3 with the goal of using it as an education tool for training, assessment, and feedback. 4 ETT has previously been explored in the field of otolaryngology, specifically looking at gaze patterns during endoscopic sinus surgery 5 or facial analysis following parotidectomy. 6 However, there are no known studies evaluating its use in flexible laryngoscopy interpretation.
Flexible laryngoscopy with assessment of vocal cord pathology is an important diagnostic skill in otolaryngology training and education. However, training tools are lacking and there is no formal method to assess competency. Russell et al 7 used a video training tool to help novice learners evaluate abnormalities in the pharynx, hypopharynx, larynx, and subglottis, and found that this did not lead to better diagnostic accuracy or confidence when evaluating vocal cord abnormalities. One would assume that diagnostic accuracy improves with training. However, a study by Anis et al 8 surveyed a group of head and neck surgeons and found poor inter-rater reliability in identifying features associated with carcinoma, such as erythroplakia and aberrant microvasculature. This emphasizes the importance of developing better training tools to prepare and educate the next generation of surgeons.
The aim of this study was to use ETT to compare visual gaze patterns among participants of different experience levels while viewing vocal cord pathology in still images from flexible laryngoscopy. In addition, the likelihood of cancer diagnosis was compared among groups. By gaining insight into the gaze patterns of participants, we hope to improve our understanding of how vocal cord pathology is evaluated with the long-term goal of developing better teaching tools for its interpretation.
Methods
Participants
This study was approved by the Mayo Clinic Institutional Review Board (20-012365). Participants with varying experience in interpreting flexible laryngoscopy were recruited and consented. Groups were divided into novice (medical students and PGY1-2 otolaryngology residents with 2 years or less of flexible endoscopy experience), intermediate (PGY3-5 otolaryngology residents and a gastroenterology fellow with at least 2 years of flexible endoscopy experience), advanced practice provider (APP; nurse practitioners, physician assistants, and speech language pathologists with at least 5 years of flexible endoscopy experience), and expert (board-certified otolaryngologists).
Eye-Tracking Design
Seven still images of vocal cord pathology were selected for the study by the senior author (SLB). These included 4 images of glottic cancer, 2 of infectious laryngitis, and 1 of granuloma. A Tobii Pro (Stockholm, Sweden) X3 eye-tracking camera system (120 Hz) and iMotions (Copenhagen, Denmark) Biometric Research Platform software version 7.1.14670.18 were used to objectively measure how participants viewed the study images. The system used an infrared eye-tracking sensor mounted below the stimulus screen to capture each participant’s gaze pattern. Participants were instructed to view images of vocal cord pathology and then rate the likelihood of cancer after viewing each image. Images were displayed on a 24-inch liquid crystal display screen in a quiet room. The iMotions 9-point calibration and validation algorithm was run at the beginning of each data collection session to ensure tracking accuracy. Each participant was shown the 7 still images for 15 seconds each. After 15 seconds elapsed, the image disappeared and the participant was given 10 seconds to circle the likelihood of cancer on a scale of certain, probable, possible, and unlikely before the next image was shown. Images were presented to every participant in the same order.
Data Analysis
Multiple areas of interest (AOI) for each image were designated by the senior author using the iMotions software. These included the region of pathology and surrounding subsites within the larynx such as the false cords and subglottis (Figures 1 and 2). Time to first fixation, fixation duration, and total number of fixations were collected for each AOI. These data were then used to identify the AOI that each participant fixated on first (ie, the AOI with the shortest time to first fixation), fixated on the longest, and had the greatest number of fixations. Differences in the AOI of first fixation, AOI with the longest fixation, AOI with the most fixations, and likelihood of cancer rating among all 4 experience groups were analyzed using Kruskal-Wallis tests and Spearman rank correlation coefficients. Differences between novices compared with the other 3 experience groups combined were obtained using Wilcoxon rank sum and Fisher exact tests. A P-value of less than .05 was considered statistically significant.
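The AOI selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the iMotions export format or the analysis code used in the study: the record layout, the field names (`aoi`, `time_to_first_fixation_ms`, `fixation_duration_ms`, `fixation_count`), the function name `summarize_aois`, and all sample values are hypothetical. It also assumes that fixation duration is summed per AOI and that an AOI never fixated on has no time to first fixation.

```python
def summarize_aois(rows):
    """Return the AOI fixated on first, longest, and most often.

    `rows` holds one dict per AOI for a single participant and image.
    All field names are hypothetical and chosen for illustration only.
    """
    # Skip AOIs the participant never looked at (no time to first fixation).
    seen = [r for r in rows if r["time_to_first_fixation_ms"] is not None]
    if not seen:
        return {"first": None, "longest": None, "most": None}

    # First fixation = shortest time to first fixation.
    first = min(seen, key=lambda r: r["time_to_first_fixation_ms"])
    # Longest fixation = greatest fixation duration (assumed summed per AOI).
    longest = max(seen, key=lambda r: r["fixation_duration_ms"])
    # Most fixations = highest fixation count.
    most = max(seen, key=lambda r: r["fixation_count"])
    return {"first": first["aoi"], "longest": longest["aoi"], "most": most["aoi"]}


# Made-up metrics for one participant viewing one image (not study data).
example = [
    {"aoi": "right cord mass", "time_to_first_fixation_ms": 420.0,
     "fixation_duration_ms": 6100.0, "fixation_count": 14},
    {"aoi": "right cord vessel", "time_to_first_fixation_ms": 180.0,
     "fixation_duration_ms": 3900.0, "fixation_count": 17},
    {"aoi": "subglottis", "time_to_first_fixation_ms": None,
     "fixation_duration_ms": 0.0, "fixation_count": 0},
]

summary = summarize_aois(example)
print(summary)
```

In this sketch, ties are resolved in favor of the AOI listed first. A real analysis would need an explicit tie rule, as in the case noted under Table 2 where the software could not distinguish between 2 areas for 1 subject.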

Figure 1. Designated AOI of malignant vocal cord lesions. Glottic cancer 1 (A): 1 = right cord mass, 2 = right cord vessel, 3 = left cord, 4 = right false cord, 5 = left false cord, 6 = subglottis; glottic cancer 2 (B): 1 = left cord mass, 2 = left cord vessel, 3 = right cord, 4 = right false cord, 5 = left false cord, 6 = subglottis; glottic cancer 3 (C): 1 = left cord mass, 2 = left cord vessel, 3 = right cord mass, 4 = right false cord, 5 = left false cord, 6 = subglottis; glottic cancer 4 (D): 1 = left cord mass, 2 = anterior commissure, 3 = left anterior cord vessel, 4 = left posterior cord, 5 = right anterior cord mass, 6 = right posterior cord, 7 = right false cord, 8 = left false cord, 9 = subglottis.

Figure 2. Designated AOI of benign vocal cord lesions. Infectious laryngitis 1 (A): 1 = right cord mass, 2 = left cord mass, 3 = right anterior cord, 4 = left anterior cord, 5 = right false cord, 6 = left false cord, 7 = subglottis; infectious laryngitis 2 (B): 1 = left cord, 2 = right anterior cord, 3 = right posterior cord, 4 = subglottis; granuloma (C): 1 = left cord granuloma, 2 = left cord, 3 = right cord, 4 = subglottis.
Results
There were 31 participants in the study, including 10 novice, 10 intermediate, 5 APP, and 6 experts. The novice group consisted of 3 medical students, 3 PGY1 otolaryngology residents, and 4 PGY2 otolaryngology residents. The intermediate group consisted of 2 PGY3 otolaryngology residents, 4 PGY4 otolaryngology residents, 3 PGY5 otolaryngology residents, and 1 gastroenterology fellow. The APP group consisted of 1 nurse practitioner, 3 physician assistants, and 1 speech language pathologist. The expert group consisted of 2 laryngologists, 1 pediatric otolaryngologist, and 3 head and neck surgical oncologists. Fifty-eight percent of participants were male and 42% were female.
Most participants fixated first, fixated for the longest duration, and had the most fixations on the region of pathology when viewing both malignant and benign vocal cord lesions. There were no significant differences among groups when comparing AOI with first fixation, longest fixation duration, and most fixations in all 7 images (Tables 1-3). When comparing the novice group to the 3 more experienced groups, there was a significant difference in AOI of first fixation with novices more likely to fixate on the subglottis first when viewing infectious laryngitis 2 (Table 2, P = .009).
Table 1. Gaze Metrics and Cancer Certainty for Glottic Cancer Images 1 Through 4.
Abbreviation: NA, not applicable.
Summarized with n (%).
P-values for differences among all 4 experience groups analyzed as ordinal were obtained using Kruskal-Wallis tests and Spearman rank correlation coefficients.
P-values for differences between novices compared with the other 3 experience groups combined were obtained using Wilcoxon rank sum and Fisher exact tests.
Table 2. Gaze Metrics and Cancer Certainty for Infectious Laryngitis Images 1 and 2.
The software could not distinguish between these 2 areas for 1 subject. Right cord mass was selected for the purpose of this summary; associations were similar when right anterior cord was selected instead.
P-values < .05 are in bold.
Table 3. Gaze Metrics and Cancer Certainty for Vocal Cord Granuloma.
Most participants had a high suspicion of cancer when viewing glottic cancer 1, 2, and 4 with the majority rating the likelihood of cancer as certain or probable (87%, 64%, and 94%, respectively). After viewing glottic cancer 3, 32% of participants rated the likelihood of cancer as probable while 67% rated it possible or unlikely. Most participants had a low suspicion of cancer when viewing infectious laryngitis 1, infectious laryngitis 2, and granuloma with the majority rating the likelihood of cancer as possible or unlikely (64%, 84%, and 93%, respectively). A significant difference was observed among groups rating the likelihood of cancer for infectious laryngitis 1 (P = .004) with novices more likely to have a lower suspicion of cancer compared to the 3 more experienced groups (Table 2, P < .001). No significant differences were observed in the likelihood of cancer rating for the remaining images (Tables 1-3).
Discussion
This study evaluated the gaze patterns of participants with different experience levels and their diagnostic interpretation. In general, most participants directed their attention to the region of pathology. For the images of glottic cancer, the most common AOI fixated on was the lesion itself with the exception being glottic cancer 1. This image featured a prominent blood vessel posterior to the lesion (Figure 1A, right cord vessel), which drew the attention of most participants in each group. The only difference seen in gaze patterns was in the evaluation of infectious laryngitis 2. Most novices fixated first on the subglottis while most participants in the other 3 groups fixated first on the left vocal cord pathology (Table 2, P = .009). However, all participants fixated the longest on the left vocal cord, suggesting the novice group was ultimately able to locate the region of pathology.
If we define a “certain” or “probable” rating as correctly identifying a malignant lesion and a “possible” or “unlikely” rating as correctly identifying a benign lesion, diagnostic accuracy was overall similar across groups. After viewing glottic cancers 1, 2, and 4, most participants correctly rated the likelihood of cancer as “certain” or “probable.” After viewing infectious laryngitis 2 and granuloma, most participants correctly had a low suspicion of cancer, rating the likelihood of cancer as “possible” or “unlikely.” These images all showed predominantly unilateral vocal cord pathology, which may have influenced diagnostic accuracy.
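The scoring rule defined in this paragraph can be written out explicitly. The following is a minimal sketch, assuming ratings are recorded as the 4 lowercase labels used in the study. The function names `is_correct` and `accuracy` and the sample ratings are hypothetical and are not study data.

```python
# Ratings treated as a call of "malignant" versus "benign" under the rule above.
MALIGNANT_CALLS = {"certain", "probable"}
BENIGN_CALLS = {"possible", "unlikely"}


def is_correct(rating, truly_malignant):
    """True if the rating matches the lesion's actual status."""
    label = rating.strip().lower()
    if label not in MALIGNANT_CALLS | BENIGN_CALLS:
        raise ValueError(f"unknown rating: {rating!r}")
    called_malignant = label in MALIGNANT_CALLS
    return called_malignant == truly_malignant


def accuracy(ratings, truly_malignant):
    """Fraction of ratings that are correct for one image."""
    if not ratings:
        raise ValueError("no ratings supplied")
    return sum(is_correct(r, truly_malignant) for r in ratings) / len(ratings)


# Made-up ratings from 4 hypothetical viewers of one malignant image.
sample = ["certain", "probable", "possible", "unlikely"]
print(accuracy(sample, truly_malignant=True))
```

Collapsing the 4-level scale into 2 categories discards the distinction between, for example, "possible" and "unlikely", so accuracy computed this way is coarser than the ordinal ratings compared in the Results.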
Conversely, bilateral vocal cord pathology was more challenging. Most participants incorrectly rated the likelihood of cancer in glottic cancer 3 as “possible” or “unlikely.” Looking specifically at the gaze patterns for this image, 71% of participants focused on the left vocal cord lesion first but 58% of participants fixated the longest on the right vocal cord lesion. The detection of bilateral vocal cord pathology may have influenced the incorrect diagnosis of this lesion as benign. Infectious laryngitis 1 presented a similar challenge to participants. While most participants correctly rated this image as benign, all novice participants correctly had a low suspicion of cancer compared to 40% of intermediate, 60% of APP, and 50% of expert participants (P < .001). In this image, 74% of all participants fixated on the right vocal cord first. However, 60% of participants in the novice group fixated on the left vocal cord for the longest duration. This contrasts with the 76% of more experienced participants who maintained their gaze on the right vocal cord for the longest duration. It is possible that more participants in the novice group recognized bilateral vocal cord pathology in this image, which may have influenced their diagnosis of this lesion as benign.
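To make the novice comparison for infectious laryngitis 1 concrete, the percentages above can be converted to counts using the group sizes reported in the Results (10 novice, 10 intermediate, 5 APP, 6 expert): 10 of 10 novices versus 4 + 3 + 3 = 10 of 21 more experienced participants had a low suspicion of cancer. The sketch below runs a two-sided Fisher exact test on that dichotomized 2 × 2 table using only the Python standard library. It is an illustration and not a reproduction of the published analysis. The reported P-values may have been computed on the 4-level rating rather than on this 2-category collapse, so the result here should not be expected to match them. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Rows: novices, more experienced groups combined.
# Columns: low suspicion ("possible"/"unlikely"), high suspicion.
p_value = fisher_exact_two_sided(10, 0, 10, 11)
print(f"P = {p_value:.4f}")
```

On these counts the test gives a P-value of roughly .005, which points in the same direction as the reported finding while using less of the information in the ordinal scale.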
Our data suggest that symmetry influences diagnostic decision-making when viewing vocal cord pathology. The influence of symmetry on viewer gaze patterns is well documented in ETT studies on facial analysis. 9 Previous studies have shown that peripheral facial deformities, 10 hemifacial paralysis, 11 and unilateral coronal craniosynostosis 12 draw attention away from the central facial triangle of the eyes, nose, and mouth to the region of pathology. Previous work has also shown significantly increased gaze time on a unilateral total parotidectomy defect that was not seen in control patients or those who underwent superficial parotidectomy. 6 All of these studies included casual observers without medical expertise. Like the facial plastics literature, our study shows that novice viewers may naturally direct their attention to an asymmetric region of pathology. Furthermore, novice participants evaluating bilateral vocal cord lesions fixated on both sides, suggesting the importance of comparing sides for symmetry in making their diagnosis.
The importance of evaluating vascular patterns in glottic lesions has previously been recognized by the European Laryngological Society. 13 Aberrant microvasculature and erythroplakia have been noted to have significant associations with dysplastic and malignant lesions; however, inter-rater reliability among clinicians to identify such changes was poor. 8 This contrasts with our study, which did not identify differences in the ability of participants to identify vascular abnormalities on flexible laryngoscopy. Interestingly, the overall sensitivity of vascular stippling (a form of vascular aberrancy) for the presence of dysplasia and malignancy was only 51%. 14 This suggests that one cannot rely solely on the presence of high-risk features on flexible laryngoscopy in the diagnosis of vocal cord pathology. Clinical history is critical in the decision-making process and biopsies of suspicious lesions should be performed.
This study has several limitations. There were only 31 participants, which limited statistical power. Additionally, a control image of a normal larynx was not included, which may have affected novice participants who have no baseline for comparison. Because the same image order was presented to all participants, there was a possible learning effect, with participants becoming more comfortable with the ETT with each successive image. Finally, designation of the AOI by a single author introduced potential selection bias. Board-certified otolaryngologists may come to the same diagnostic conclusion but rate different regions of the vocal cord as more important to their decision-making process.
Conclusions
In general, participants of varying training levels had similar gaze patterns and diagnostic accuracy when evaluating vocal cord lesions. Focused ETT studies evaluating the influence of vascular aberrancy in the diagnosis of unilateral glottic lesions and AOI that lead to the correct diagnosis of bilateral glottic lesions are warranted. Determining fixation targets that correlate with diagnostic accuracy would not only improve medical education, but also lead to the development of computer-based algorithms that would aid in clinical decision-making.
Footnotes
Author Note
This material was presented as a poster at the Mayo Clinic Education Science and Scholarship Symposium held November 30th to December 2nd, 2021 and Combined Otolaryngology Spring Meeting held April 27th to May 1st, 2022. It has not been previously published in part or whole and is not currently under consideration for publication elsewhere.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for this study was provided by the Mayo Clinic Department of Otolaryngology, Rochester, MN.
