Abstract
Background
The bedside head impulse test (bHIT) is a clinical method of assessing the vestibulo-ocular reflex. It is a critical component of the bedside assessment of dizzy patients and helps differentiate acute stroke from vestibular neuritis. A previous study on senior Otolaryngology residents showed poor competence in performing and interpreting the bHIT and called for specific evaluations in the Competency By Design (CBD) curriculum to remedy this. This study aimed to assess whether those competencies have improved after full implementation of CBD in residency programs.
Methods
Thirty post-graduate year 4 Otolaryngology residents in Canada were evaluated on the use of the bHIT using a written multiple-choice question (MCQ) examination, interpretation of bHIT videos, and performance of a bHIT. Ratings of bHIT performance were completed by 2 expert examiners (DT, DL) using the Ottawa Clinic Assessment Tool.
Results
Only 6.7% (rater DT) and 20% (rater DL) of residents were found able to perform the bHIT independently. Inter-rater reliability was moderate (intraclass correlation 0.55). Mean scores were 70% (standard deviation 13.4%) for video interpretation and 58.7% (standard deviation 20.6%) for the MCQs. Video interpretation scores did not correlate with bHIT ratings (Pearson r = 0.11), but MCQ scores and bHIT ratings correlated moderately (Pearson r = 0.52).
Compared with the prior study, residents performed worse on the bHIT (mean score 3.14 vs 3.64, P < .01), and fewer residents performed the bHIT independently (6.7% vs 22% for rater DT; 20% vs 39% for rater DL). Residents also performed worse on the MCQs (58.7% vs 70.9%, P = .038), though similarly on video interpretation (70% vs 65%, P = .198).
Conclusion
Fourth-year Otolaryngology residents in Canada are not competent in performing the bHIT. These findings have implications for refining competency-based curricula in the evaluation of critical physical examination skills.
Introduction
The bedside head impulse test (bHIT) was described in 1988 as a quick and safe clinical method of assessing the angular vestibulo-ocular reflex (VOR).1 Because abnormalities of the VOR are a reliable sign of peripheral vestibular dysfunction, the bHIT forms an essential part of the examination of patients presenting with both acute and chronic dizziness.1-4 Despite this, the bHIT is often not performed in appropriate clinical settings, not done properly, or not interpreted appropriately.5-8 Other studies have shown that the clinical usefulness of the bHIT improves with increasing experience at performing the test.9
The bHIT forms an integral part of the HINTS examination (head impulse test, nystagmus, test of skew), a battery of clinical bedside examination techniques that, if done appropriately and well, can differentiate between peripheral vestibular dysfunction and posterior circulation stroke in those patients presenting with acute vestibular syndrome.3 A recent meta-analysis confirmed that the HINTS examination should be the standard of practice in this patient population, but also recognized that physicians without specialist training in its use and interpretation are less likely to find it helpful.10 The authors thus called for increased training and education.
Competency By Design (CBD) is the Canadian implementation of competency-based medical education (CBME) and represents a shift in medical education toward outcome-based design, implementation, and assessment.11 This shift required defining key competencies and training experiences for residency, of which the vestibular assessment is specifically required in Otolaryngology.12,13 In an informal survey conducted by the authors of academic otologists in Canada, all confirmed that specific teaching on the bHIT is included in various forms (eg, didactic lectures, bedside teaching). Workplace-based assessment of these competencies uses entrustable professional activities (EPAs), which are broad units of professional practice observable in the workplace.14 For otolaryngologists in Canada, this includes assessing patients with dizziness as part of Core EPA #20, with the bHIT forming essential knowledge for this task.15 However, as the EPA is currently formulated, residents can be judged competent in Core EPA #20 without having a bHIT directly observed, as they can be evaluated on clinical presentations where the use of a bHIT is not required (eg, a patient with benign paroxysmal positional vertigo). Of interest, the CBD curriculum for neurology was designed with an EPA that requires direct observation and evaluation of a HINTS examination (which includes the bHIT) in addition to the Dix-Hallpike test.
Five years ago, immediately before CBD was introduced in Otolaryngology residencies across Canada, our group published a study highlighting a lack of knowledge and training among final-year Otolaryngology residents in performing and interpreting the bHIT.8 As the first cohort of residents trained entirely in a CBME curriculum graduated in 2022, we aimed to determine whether bHIT skills have improved after the full implementation of CBME. We hypothesized that despite having a required EPA for assessing patients with dizziness, most senior residents would not be competent in performing and/or interpreting the bHIT, and that there would be no improvement in any of the 3 scored aspects of the study when compared to the pre-CBD era.
Methods
All post-graduate year 4 (PGY4) residents in Canadian Otolaryngology training programs were eligible for inclusion. PGY4 residents are expected to have completed their Core EPA #20 by the end of that year. Subjects were recruited and the study was performed at a Royal College of Physicians and Surgeons of Canada annual exam review course given to Otolaryngology residents in Calgary, Alberta, Canada, in May 2023. Residents from all Canadian programs typically attend this course. The only exclusion criterion was an active physical limitation that might restrict participants’ ability to perform the bHIT.
Thirty out of 34 eligible residents gave consent and participated in the study. Subjects were given a pre-test 10-item multiple-choice question (MCQ) quiz to assess knowledge of the clinical use of the bHIT and how to perform the test (Appendix A).
Subjects were then asked to perform the bHIT on one of the authors (RR) and instructed to perform the examination as they would during a typical clinical encounter. No feedback was given during the procedure, and competence in performing the procedure was judged by 2 expert evaluators (DL, DT) watching the subjects perform the test in real time. The evaluators used a modified version of an entrustability scale, the Ottawa Clinic Assessment Tool (OCAT), a previously validated tool for assessing clinical performance based on a 5-point Likert scale.16 The bHIT was rated on multiple components: patient instructions, positioning, and impulse characteristics including speed, amplitude, consistency (between sides and with any repeated bHITs), and use of distraction and unpredictability (Appendix B). One rating for each component was assigned for the entire encounter. Finally, the participants were shown a series of 10 bHITs on video monitors and asked to judge whether each bHIT was normal or abnormal. The MCQ quiz, videos, and evaluators (DL, DT) were all the same as in the 2018 study.
A rating of 4 (“independent with only minor corrections”) was considered a reasonable cut-off for competent performance of the bHIT. Statistical analysis of the data included calculating percentages of residents in 2 scenarios: scoring 4 or greater on each aspect of the entrustability scale or having a mean score of 4 or greater. Both were felt to be reasonable interpretations of being able to independently perform the procedure. Inter-rater reliabilities were calculated using intraclass correlations for both single-rater reliability (how reliable a single rater’s scores would be) and the reliability of the mean rating of both raters (how reliable the mean of 2 raters’ scores would be). Mean scores were calculated for the MCQ and video interpretation tests, and Pearson correlations were calculated between the different forms of assessment—mean rater bHIT scores, video interpretation score, and both mean MCQ scores and individual questions. Finally, the mean scores for the bHIT, video, and MCQ scores for the current cohort were compared with those for the prior study cohort using Student’s t-test.
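For readers wishing to reproduce this kind of analysis, the statistics described above (single-rater and mean-rating intraclass correlations, Pearson correlations between assessment forms, and a between-cohort t-test) can be sketched in Python. This is a minimal illustration using synthetic data; the variable names and simulated scores are hypothetical and do not reflect the study dataset.

```python
import numpy as np
from scipy import stats

def icc_two_way(ratings: np.ndarray):
    """ICC(2,1) single-rater and ICC(2,k) mean-rating reliability
    (two-way random effects, absolute agreement; Shrout-Fleiss forms).
    ratings: array of shape (n_subjects, k_raters)."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)          # per-subject means
    col_means = ratings.mean(axis=0)          # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between-subjects MS
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between-raters MS
    sse = np.sum((ratings - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                        # residual MS
    icc_single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_mean = (msr - mse) / (msr + (msc - mse) / n)
    return icc_single, icc_mean

rng = np.random.default_rng(0)
# Simulated latent skill for 30 residents, rated by 2 raters with noise,
# rounded to half-points on a 1-5 entrustability scale (all values hypothetical).
true_skill = rng.normal(3.1, 0.6, size=30)
raw = np.column_stack([true_skill + rng.normal(0, 0.5, 30) for _ in range(2)])
ocat = np.clip(np.round(raw * 2) / 2, 1, 5)
icc1, icck = icc_two_way(ocat)

# Correlation between a written score and the mean bHIT rating,
# and a two-sample t-test comparing MCQ scores across two cohorts.
mcq = rng.uniform(20, 100, 30)
r, p_r = stats.pearsonr(mcq, ocat.mean(axis=1))
t, p_t = stats.ttest_ind(mcq, rng.uniform(40, 100, 30))
```

The mean-rating ICC(2,k) will generally exceed the single-rater ICC(2,1), which is why averaging the 2 raters' scores yields higher reliability than either rater alone.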
Results
Only 2 (6.7%) and 6 (20%) of residents (rated by DT and DL, respectively) were able to perform the bHIT independently with an average score of 4 or greater. Inter-rater reliability of the modified OCAT was moderate for the continuous rating scale (intraclass correlation 0.55), and the reliability of the mean rating was good (0.71); using a cutoff of an average score of 4 resulted in only fair inter-rater agreement (kappa = 0.44). Residents were rated poorest on the use of distracting movements/unpredictability of the bHIT (mean score 2.85), positioning (mean 2.87), and amplitude (mean 2.97, most often too large), as compared to the overall mean bHIT score of 3.14.
The mean scores were 70% (standard deviation 13.4%) for video interpretation and 58.7% (standard deviation 20.6%) for the MCQs. Scores on the MCQ quiz correlated moderately with mean bHIT rating scores (Pearson r = .52), but there was no correlation between video interpretation and mean bHIT rating scores (Pearson r = .11).
Compared with the prior cohort, the current cohort performed worse on the bHIT (mean score 3.14 vs 3.64, P < .01), and fewer current residents were able to perform the bHIT independently (6.7% vs 22% for rater DT; 20% vs 39% for rater DL). Residents in the current cohort also performed worse on the MCQ quiz (58.7% vs 70.9%, P = .038), although they performed similarly on video interpretation (70% vs 65%, P = .198).
Discussion
The results confirmed our hypothesis that despite curricular changes with the implementation of CBD, the majority of fourth-year Canadian Otolaryngology residents are not competent in performing the bHIT as judged by 2 expert examiners using the modified OCAT. Regarding alternative assessment methods, in this cohort residents’ MCQ scores correlated with bHIT ability, while accuracy at interpreting video recordings of bHITs did not. This is the opposite of the result in the prior cohort and implies that neither measure can serve as a substitute for direct observation of bHIT skill. In both cohorts, amplitude (eg, too large), unpredictability (eg, too predictable), and positioning (eg, grasping the mandible, arms completely extended) were the areas where residents had the most difficulty.
Inter-rater reliability was only fair when making “competent or not” judgments based upon having all component OCAT scores of “4” or “5,” and even treating the score as a continuous variable resulted only in moderate inter-rater reliability. This reliability was slightly better than in the prior study, but concerns remain about untrained evaluators’ ability to accurately assess competency for this bedside skill. Rater training, using more raters, making a single “competent or not” judgment (rather than grading multiple components of the bHIT), and taking more measurements of a single resident over time are possible ways of improving the reliability of competency judgments. Indeed, multiple assessments made over time are a key aspect of competency-based medical education paradigms.
This study has several limitations. There were only 2 examiners from a single center, limiting judgments of true inter-rater reliability of the modified OCAT. Only Otolaryngology residents were evaluated in this study because of convenience sampling at an annual review course. Otolaryngology physicians are expected to be experts in the management of patients with dizziness, but other specialties also frequently evaluate dizzy patients (eg, neurology, emergency medicine). Due to feasibility considerations, it was impractical to enroll a similarly-broad cohort of emergency medicine and neurology residents. Accordingly, the results of this study may not generalize to the competence of all residents in Canada treating dizziness. Indeed, neurology residency training programs in Canada do have an EPA that mandates direct observation of a HINTS examination (which includes the bHIT), and this may translate into improved competence with this skill.
Furthermore, as a result of the restructuring occurring with CBD, current Otolaryngology residents challenge their nationwide licensing examination in the fourth year of residency and they were enrolled for this study at that point in their training (vs in the fifth year for the prior cohort). In addition, the COVID-19 pandemic altered the residency experience for this cohort of residents, often replacing typical clinical exposure with either reduced in-person or virtual clinics for an extended period. These factors may contribute to the poorer performance observed in the current cohort and may mask any gains that the CBD curriculum may have offered. However, residents were expected to have completed the EPA concerning dizziness by the time the course occurred, implying their training was complete in evaluating patients with dizziness even at this earlier stage.
Despite these limitations, the findings have implications for the ongoing refinement of competency-based curricula in Canada. The variable and mostly poor correlation of bHIT competence with both paper-based assessment of bHIT knowledge and video interpretation of bHIT examinations argues toward including a directly-observed clinic-based assessment of this skill as a specific component of EPAs concerning the evaluation of dizzy patients. More broadly, this study provides evidence that technical skills felt to be critical to competence in a specialty should be evaluated directly, rather than assumed to be present after indirect assessment or through assessment of related skills. It may be worthwhile for specialty committees to examine these critical skills and consider including direct assessment of them in their curricula.
Future research directions could include studying residents in other subspecialties. Specifically, comparing a cohort of neurology residents, who have direct assessment of bHIT mandated in their curricula, may prove informative. Including more examiners at different centers with different levels of expertise may help to broaden the applicability of the findings. Development and validation of other assessment methods or rater training programs that improve inter-rater reliability are of paramount importance if decisions are to be made regarding competency. Finally, developing a teaching module that could be integrated in both residency programs and continuing education in practice may help dissemination of the bHIT as a critical clinical skill.
Conclusions
Fourth-year Otolaryngology residents in Canada are not competent in performing the bHIT. Competency by design has not resulted in improved performance, though this finding may be confounded by both earlier testing of competence and effects of the COVID-19 pandemic on training. These findings have implications for refining competency-based curricula in Canada in the evaluation of critical physical exam skills.
Footnotes
Appendix A
Appendix B
Acknowledgements
The authors would like to thank Dr. Rob Hart for allowing us to perform the study at the review course and Jennifer Coish for her administrative support.
Author Contributions
D.L. contributed to protocol design, recruitment and testing of subjects, data analysis, and manuscript development and editing. D.T. contributed to protocol design, recruitment and testing of subjects, and manuscript development and editing. R.R. contributed to protocol design, recruitment and testing of subjects, and manuscript development and editing.
Availability of Data and Material
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Consent for Publication
Not applicable.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethics Approval and Consent to Participate
Ethics approval for this study was obtained from the Ottawa Hospital Research Institute Research Ethics Board (Protocol: 20230029-01H). Informed consent was obtained from each participant.
