Abstract
BACKGROUND:
International consensus on best practices for calculating and reporting vestibular function is lacking. Quantitative vestibulo-ocular reflex (VOR) gain using a video head impulse test (HIT) device can be calculated by various methods.
OBJECTIVE:
To compare different gain calculation methods and to analyze interactions between artifacts and calculation methods.
METHODS:
We analyzed 1300 horizontal HIT traces from 26 patients with acute vestibular syndrome and calculated the ratio between eye and head velocity at specific time points (40 ms, 60 ms) after HIT onset (‘velocity gain’), ratio of velocity slopes (‘regression gain’), and ratio of area under the curves after de-saccading (‘position gain’).
RESULTS:
There was no mean difference between gain at 60 ms and position gain, both showing a significant correlation (r2 = 0.77,
CONCLUSIONS:
There is no clear superiority of a single gain calculation method for video HIT testing. Artifacts cause small but significant reductions of measured VOR gains in HITs with higher, normal-range gains, regardless of calculation method. Artifacts in abnormal HITs with low gain increased measurement noise. A larger number of HITs should be performed to confirm abnormal results, regardless of calculation method.
Keywords
Introduction
There are different calculation methods of VOR gain measured quantitatively by a video head impulse test (vHIT) and there is no international consensus on how to best report vestibular function. The head impulse test, first described by Halmagyi and Curthoys [9], is a critical method to evaluate the vestibulo-ocular reflex (VOR) in patients with dizziness and balance disorders, but can be challenging to interpret clinically [11]. Thanks to new light-weight mobile and non-invasive video-oculography devices it is now possible to quantitatively measure VOR and to identify corrective saccades [15].
vHIT devices consist of a light-weight goggles frame, a mounted infrared high-speed camera for eye-tracking, and an inertial sensor for head acceleration measurements. These devices test VOR function of horizontal and vertical semicircular canals. The VOR gain is predominately mediated by the function of the ipsilateral canal with some influence from the contralateral canal. The accuracy of the measurement with such non-invasive devices has been validated against recordings with magnetic scleral search coils [14, 26], a laboratory gold standard for ocular motor recordings [23].
VOR gain measured with scleral magnetic search coils is calculated as the ratio of eye to head velocity [2, 12]. Gain can be measured over a time range/period or at specific time points after initiation of head movement such as 40 ms [8], 60 ms, or at the time of peak head velocity [4], here called ‘velocity gain.’ Another possibility to evaluate gain is a linear regression of eye and head velocity [3], which compares the slopes between head and eye velocity around peak head acceleration, here called ‘regression gain.’ Another method is calculating gain as the ratio of the areas under the curve (AUC) after de-saccading eye and head velocity curves [15], which compares eye and head position, here called ‘position gain.’
On the other hand, there are disruptive eye movements and measurement artifacts observed while testing dizzy patients [16]. Frequent artifacts can interfere with VOR gain measurements [16], causing different calculation methods to result in different gain values. This issue is important, as VOR gain is also now starting to be used as a triage tool for differentiating between peripheral and central causes of acute dizziness or vertigo [16, 19].
The aim of this study was to compare different gain calculation methods, to analyze the impact of artifacts on VOR gain, and to test whether specific gain calculation algorithms were particularly robust despite the presence of these artifacts.
Material and methods
Test subjects
The data set from the previous publication [17] was used for all vHIT gain calculations in this manuscript. The vHIT data were collected as part of a prospective cross-sectional multicenter study (OSF Saint Francis Medical Center, Peoria, Ill., USA, and Johns Hopkins Hospital, Baltimore, MD, USA) between August 2011 and December 2012. Twenty-six patients with an acute vestibular syndrome and a mix of underlying disease pathologies [25] were examined. Patient characteristics and diagnoses are summarized in previous publications reporting diagnostic accuracy [17] and type of artifacts [16] in the same population.
Video head impulse test (vHIT) assessment

Figure 1 depicts one clean (Fig. 1A–C, no artifacts) and one unclean (Fig. 1D–F, trace with artifacts) vHIT example (eye and head velocity profiles) from one patient with PICA stroke. The unclean vHIT shows trace oscillation artifacts due to intermittent pupil tracking loss. VOR gain has been calculated by (A and D) taking different time points at 40 ms and at 60 ms after HIT onset (‘velocity gain’), (B and E) applying a linear regression (‘regression gain’), or (C and F) comparing the area under the curve for eye (black) and head (grey) velocity (‘position gain’). Note that VOR gain from the same patient and the same vHIT trace resulted in different VOR gains for each calculation method ranging from 0.73–1.1 (traces with no artifact) and from 0.39–1.13 if artifacts changed the morphology of the bell-shaped slow phase curve.
A dataset with 1300 HIT traces in total was used as the basis for our study. While performing the horizontal head impulse test toward each ear, eye and head movements were recorded at the bedside with a vHIT device (ICS Impulse; formerly GN Otometrics, Taastrup, Denmark; now Otometrics division of Natus Medical Inc., Pleasanton, CA, USA). vHIT recordings can be categorized as reflecting either normal or abnormal vestibular function on the basis of VOR gain. A VOR gain close to 1.0 usually reflects a normal value, whereas a gain <0.8 is considered abnormal. A gain cutoff of ∼0.7 optimally distinguishes between peripheral and central pathology in patients with the acute vestibular syndrome [17]. Therefore, we classified HIT exams from patients with vestibular neuritis or vestibular strokes into groups with lower (<0.7) or higher gain values (>0.7), since artifacts might have a differential impact on low versus high gain values.
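The thresholds described above can be illustrated with a minimal sketch. The function names are hypothetical, and assigning a gain of exactly 0.7 to the higher-gain group is our assumption, since the text only specifies <0.7 and >0.7.

```python
def gain_group(gain, cutoff=0.7):
    """Study grouping: 'lower' below the ~0.7 cutoff, otherwise 'higher'.

    Assumption: a gain of exactly 0.7 is placed in the higher-gain group.
    """
    return "lower" if gain < cutoff else "higher"


def is_abnormal(gain, normal_limit=0.8):
    """A gain below 0.8 is considered abnormal."""
    return gain < normal_limit


# Illustrative gain values only, not study data.
for g in (0.45, 0.75, 0.95):
    label = "abnormal" if is_abnormal(g) else "normal-range"
    print(g, gain_group(g), label)
```

Note that a gain between 0.7 and 0.8 falls into the higher-gain group while still being below the normal limit, which is why the two thresholds are kept as separate checks.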
vHITs performed on a diseased ear in patients with clinically-diagnosed vestibular neuritis (i.e., peripheral cases) showed lower gains (i.e., abnormal) on the affected side. The contralateral, clinically-unaffected side showed higher gains within the normal range (though generally below 1.0, presumably because of loss of the contribution from the opposing, affected canal). By contrast, vHITs derived from patients with PICA strokes (i.e., central cases) had higher gains (>0.7, and generally within the normal range) in both ears. PICA stroke was diagnosed based on history, clinical features and diffusion weighted imaging [16]. AICA strokes were excluded because patients might have normal or abnormal HITs with variable VOR gains.
Data from 23 patients, 46 total ears, and 1070 HITs were available for analysis. We classified both clean data (without artifacts, Fig. 1A–C) and artifactual data (Fig. 1D–F) based on selection criteria reported in a previous publication [16]. Clean records were available for 22 patients, 42 ears, and 539 HITs. We distinguished six types of artifacts according to a previously published classification [16]: 1) covert saccades, 2) blinks, 3) trace oscillations, 4) phase shift, 5) several peaks, and 6) high gain; as a seventh category, we analyzed all artifacts combined (all artifacts pooled together). Traces with infrequent, non-disruptive artifacts (occurring after the head movement) or overt saccades were excluded from the analysis of artifactual data.
Mean VOR gain by calculation method, stratified by higher vs. lower gain HITs and pairwise comparisons by calculation method. Results reflect only “clean” traces, after all artifacts have been removed
Raw vHIT data were extracted from the vHIT device. We applied four different gain calculation methods on the same data using a customized Matlab script (Matlab R2014b, Mathworks, Natick, Mass., USA): (#1a) the ratio between eye and head velocity at 40 ms (Fig. 1A); (#1b) the ratio between eye and head velocity at 60 ms after onset of head movement (Fig. 1A); (#2) the ratio of velocity slopes using regression around peak acceleration (±15 ms) (Fig. 1B) and (#3) the ratio of area under the curves after de-saccading the whole slow phase VOR trace (Fig. 1C).
Calculations #1a and #1b reflect a ratio between velocities at given time points; calculation #2 compares regression slopes between eye and head velocity around peak head acceleration [3]; and calculation #3 analyzes eye and head positions before and after the head movement [14]. For calculation #1, the latency of 40 or 60 ms after head movement onset (head velocity exceeding 20 deg/s) was used. Calculation #2 is currently not used by commercial devices. For calculation #3 we used the built-in calculation algorithm from Otometrics (Otosuite® software, v1.2.18) since they use a proprietary method for de-saccading traces and for determining the area under the curve. For simplicity, we call these gain calculation methods here ‘velocity gain’ (specifying either 40 ms [#1a] or 60 ms [#1b]); ‘regression gain’ (#2); and ‘position gain’ (#3) (Fig. 1).
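The four calculations can be sketched on a synthetic trace as follows. This is an illustration under stated assumptions, not the Matlab script used in the study and not the proprietary Otometrics algorithm: the 4 ms sampling interval, the bell-shaped test signal, and all function names are hypothetical; eye velocity is assumed to be sign-inverted and already de-saccaded; and the regression gain is interpreted here as the ratio of the least-squares slopes of eye and head velocity against time.

```python
import math

ONSET_THRESHOLD = 20.0  # deg/s, head-velocity onset criterion from the text


def onset_index(head_vel, threshold=ONSET_THRESHOLD):
    """Index of the first sample where head velocity exceeds the threshold."""
    for i, v in enumerate(head_vel):
        if v > threshold:
            return i
    raise ValueError("head velocity never exceeds the onset threshold")


def velocity_gain(eye_vel, head_vel, dt_ms, latency_ms):
    """Eye/head velocity ratio at a fixed latency after HIT onset (#1a, #1b)."""
    i = onset_index(head_vel) + round(latency_ms / dt_ms)
    return eye_vel[i] / head_vel[i]


def _slope(y, dt_ms):
    """Least-squares slope of equally spaced samples y against time."""
    n = len(y)
    t = [k * dt_ms for k in range(n)]
    t_mean, y_mean = sum(t) / n, sum(y) / n
    num = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
    den = sum((a - t_mean) ** 2 for a in t)
    return num / den


def regression_gain(eye_vel, head_vel, dt_ms, half_window_ms=15.0):
    """Ratio of eye and head velocity slopes around peak head acceleration (#2)."""
    # Peak acceleration is located via the largest sample-to-sample increase.
    accel = [head_vel[k + 1] - head_vel[k] for k in range(len(head_vel) - 1)]
    peak = max(range(len(accel)), key=accel.__getitem__)
    w = round(half_window_ms / dt_ms)
    lo, hi = max(0, peak - w), min(len(head_vel), peak + w + 1)
    return _slope(eye_vel[lo:hi], dt_ms) / _slope(head_vel[lo:hi], dt_ms)


def position_gain(eye_vel, head_vel):
    """Ratio of areas under the velocity curves (#3); dt cancels out."""
    return sum(eye_vel) / sum(head_vel)


# Synthetic impulse: 150 ms bell-shaped head velocity, peak 200 deg/s,
# sampled every 4 ms (assumed), with an ideal eye response scaled by 0.9.
dt = 4.0
head = [200.0 * math.sin(math.pi * t / 150.0) ** 2 for t in range(0, 151, 4)]
eye = [0.9 * v for v in head]

gains = {
    "velocity_40ms": velocity_gain(eye, head, dt, 40.0),
    "velocity_60ms": velocity_gain(eye, head, dt, 60.0),
    "regression": regression_gain(eye, head, dt),
    "position": position_gain(eye, head),
}
for name, value in gains.items():
    print(f"{name}: {value:.2f}")
```

Because the synthetic eye trace is an exact scaled copy of the head trace, all four methods agree at approximately 0.9. The methods can only diverge when the eye trace departs from that ideal shape, for example through oscillations, a phase shift, or residual saccades.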

VOR gain of HITs with higher gain and lower (abnormal) gain for clean recordings for all four methods. Different letters represent significant differences among the methods inside the normal or abnormal HITs (For variables with the same letter, the difference between these variables is not statistically significant. Likewise, for variables with a different letter, the difference is statistically significant). Means and confidence intervals are model based, due to the nested data.
We recorded 539 HITs from 42 ears from 26 patients. HITs were thus nested within ears (there are several HITs per ear), and ears were nested within patients (data from two ears per patient). The effective sample size was the 26 patients. Due to the nested data, all analyses were fitted with mixed effects models. First, only clean recordings (i.e., without artifacts) were analyzed. VOR gain was fit using calculation method, HIT classification (low/high gain), and the interaction between calculation method and HIT classification as fixed effects, with random intercepts for test nested in ear nested in patient. Position gain was correlated with velocity gain at 60 ms for clean records only, with random intercepts for ear nested in patient. Finally, gain was fit using calculation method, pathology, and artifact, with all possible interactions among the three as fixed effects, and random intercepts for test nested in ear nested in patient. Each artifact was included in a separate model. Multiple comparisons were performed first between the methods for HITs with artifacts, and second between clean and artifact-laden HITs. Multiple comparisons were performed for each artifact and HIT category (low gain/high gain) separately. All analyses were performed in R [22], with the packages nlme [21] for mixed effects models, and emmeans [13] for multiple comparisons.
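For orientation, the first model described above (clean recordings only) can be written out as follows. The notation is ours and is only a sketch of the verbal description; the symbols are not taken from the original analysis code.

```latex
% i: patient, j: ear within patient, k: HIT (test) within ear, m: calculation method
% c = c(i,j,k): HIT classification (low/high gain) of that HIT
g_{ijkm} = \beta_0 + \alpha_m + \gamma_c + (\alpha\gamma)_{mc}
         + p_i + e_{j(i)} + h_{k(ij)} + \varepsilon_{ijkm},
\qquad
p_i \sim \mathcal{N}(0, \sigma_p^2),\quad
e_{j(i)} \sim \mathcal{N}(0, \sigma_e^2),\quad
h_{k(ij)} \sim \mathcal{N}(0, \sigma_h^2),\quad
\varepsilon_{ijkm} \sim \mathcal{N}(0, \sigma^2)
```

Here $\alpha_m$, $\gamma_c$, and $(\alpha\gamma)_{mc}$ are the fixed effects for calculation method, HIT classification, and their interaction, while $p_i$, $e_{j(i)}$, and $h_{k(ij)}$ are the random intercepts for patient, ear within patient, and test within ear. The random intercept for the test is identifiable because each HIT contributes one gain value per calculation method.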
Linear mixed effects model of VOR gain in dependence of calculation method, HIT “normality”1, and the interaction between method and HIT classification as fixed effect, and random intercepts for test nested in ear nested in patient. Model was fit on “clean” traces only, after all artifacts have been removed
1High vs. low-gain HIT (threshold cutoff value 0.7). 2Degrees of freedom in numerator or 3denominator.
VOR gain calculations after excluding artifacts (Clean HITs Only)

Correlation between position gain and velocity gain at 60 ms. Grey dots represent the individual HITs, and black dots are mean (±standard error) values per patient. The regression line is derived from the mixed effects model (+/–95% confidence intervals), and takes into account the nested structure of HITs within ears within patients. Note that the regression line shows an offset with an intercept of 0.46. Imperfect calculation algorithms including imperfect de-saccading, removal of negative gain values by the device or data lowpass filtering might lead to a skewed regression line.
Mean VOR gains for clean/filtered higher-gain HITs were all within normal limits, regardless of calculation method, but there were small, statistically significant differences across most calculation methods (range 0.89 to 0.97) (Table 1, Fig. 2). Importantly, these differences were not clinically significant with respect to classifying VOR gain “normality” (i.e., mean results for a given ear, regardless of calculation method, did not cross either the 0.8 [normal vs. abnormal] or 0.7 [central vs. peripheral] thresholds described in the Methods section) (Fig. 2). Similarly, HITs with higher gains were significantly different from abnormal HITs with lower gains, regardless of the calculation method (Table 1).
Mean VOR gains for clean/filtered lower-gain (abnormal) HITs were all below normal limits, and they also showed statistically significant differences across most calculation methods (range 0.45 to 0.63, Table 1, Fig. 2). In addition, there was a strong positive relationship between ‘velocity gain at 60 ms’ and ‘position gain’ for clean HITs (both high and low gain HITs) (conditional r2 = 0.77,
Interaction between gain calculation method and artifact type, stratified by high vs. low gain HITs. Shown are differences in mean gain results between unclean (with artifacts) and clean (without artifacts) HITs
Significance levels: ’***’<0.001; ’**’<0.01; ’*’<0.05. 1covert saccades.

VOR gain for clean (white bars) and artifactual (grey bars) higher gain and lower gain HITs for different artifacts and the four methods. Different letters represent significant differences among the methods for gains with artifacts (see legend Fig. 2). Stars denote significant differences between HITs with and without artifacts within a method: *
Figure 1D–F shows one example with trace oscillation artifacts (pupil tracking loss) resulting in variable VOR gain values for each calculation method. Gain measures for unclean HITs were more likely to vary by calculation method when gain was high, in the normal range, rather than low, in the abnormal range (Table 3, Fig. 4). HITs with higher gains and artifacts were statistically significantly lower than HITs without artifacts (–0.06 to –0.11, Table 3), but values still remained within normal limits (Fig. 4, Table S1). Abnormal HIT mean gains with artifacts were also generally lower than HITs without artifacts (–0.07 to +0.01, Table 3) but not statistically significantly so when averaged across artifact types (Table 3, “all artifacts” column).
The influence of each artifact type on mean VOR gain is shown in Tables 3, 4, and 5 and mean VOR gains with confidence intervals are shown in Fig. 4.
Discussion
The application of different VOR gain calculation methods for the same clean HITs (i.e., with artifacts removed) resulted in small but significant mean gain differences; nevertheless, gains remained within the expected normal or abnormal range. The most well studied VOR gain calculation methods (i.e., ‘velocity gain at 60 ms’ and ‘position gain’) showed similar mean VOR gain values and a strong, statistically-significant correlation in clean vHIT recordings.
Artifacts affected high gains more than low gains, independent of the calculation method. When averaged across artifact types, artifacts reduced higher-gain (normal) HITs by a small but statistically-significant margin (range –0.11 to –0.06) that did not push mean gains below the normal range. There was, however, clinically-meaningful heterogeneity in impact by artifact type (large gain reductions by suspected blink artifacts and large gain increases by suspected goggles slippage), and some interaction between artifact type and calculation method. By contrast, the overall impact of artifacts on abnormal, lower-gain HITs was smaller (range –0.07 to +0.01) and not statistically significant. There was still some interaction between artifact type and calculation method. However, in neither case was there a clinically meaningful interaction between the impact of all artifacts combined and calculation method.
Overall, these results suggest that the impact of calculation method on VOR gain measures is small and probably not clinically impactful, especially relative to the impact of artifacts. The interaction between calculation method and artifact type is inconsistent, with some calculation methods susceptible to specific artifact types and robust to other artifact types.
Scleral search coils have been used for gold standard VOR measurements using instantaneous gain calculations as a continuous function of time [3, 12]. This analytic approach allows a more complete representation of VOR gain during the entire time span of a head impulse compared to data read at a given time point. In some studies, the highest gain value calculated around peak head velocity was used to represent maximal VOR gain changes after ototoxic intratympanic injections [20]. Even in search coil studies, depending on the chosen gain calculation time point, the result might under- or overestimate ‘true’ VOR gain. In general, non-vestibular reflexes have longer latencies [10]. The calculation of gain at an early time point after HIT onset (e.g. 40 ms or 60 ms) or early time period (0–100 ms, increasing head velocity before peak head velocity) has the advantage of preventing inputs of non-vestibular origin such as optokinetic, smooth pursuit, cervico-ocular reflexes etc. [7]. Some authors choose even a shorter observation period or earlier calculation time points to minimize the influence of catch-up saccades on gain calculation [1, 27].
On the other hand, the reflection of gain only at a given time point [8] is said to be more prone to specific artifacts such as blinks, phase shifts or trace oscillations due to random recording noise or disturbances. Our findings differed from prior literature in that blink artifacts substantially reduced higher HIT gains across all calculation methods and had dissimilar impacts on the two instantaneous velocity gain calculations (40 ms vs. 60 ms).
Regression gain [6, 12] includes a larger time span (during increasing head velocity) and thus contains more information about the velocity profile. This method, however, might be more prone to artifacts due to a longer observation time from HIT start until peak head velocity and due to the chosen time interval. In our study, we found an interaction between this calculation method and the phase shift artifact for higher gain HITs; however, this effect was small and insignificant for abnormal HITs and it is also due to the selected time interval further away from head movement onset.
Although potentially susceptible to influence by delayed, non-vestibular inputs to eye movements, the calculation method of de-saccaded position gain [14] might correct for artifacts such as oscillations or phase shifts between eye and head velocity traces (goggle slippage). One would expect that calculating the ‘area under the curve’ would compensate for any trace oscillations or phase shifts; however, we did not find this calculation method to be more robust to such artifacts. The approach of de-saccading traces might be more susceptible to intrusive covert saccades since the de-saccading algorithms might not identify all kinds of fast phases during a HIT, but we found no statistically-significant effects for any calculation method.
Techniques of gain calculation and software-based correction of artifacts continue to advance. One approach to improve the accuracy of position gain calculation was proposed by Shen et al. [24] introducing the SHIMPS paradigm: Disruptive covert saccades were either absent or small and the direction of compensatory eye movements were opposite to the deficient slow phase VOR and therefore less interfering with de-saccading algorithms. Cleworth et al. [5] compared different analyzing techniques favoring two calculation techniques such as area under the curve during increasing head velocity or gain over 50–70 ms (around peak head acceleration or peak head velocity) post onset in order to decrease variances and inaccuracy. Software solutions might correct for specific artifacts such as goggle slippage [12] and, thus, achieve a more reliable VOR gain value. In addition, software algorithms have the ability to analyze data during data collection and discard invalid impulses which are not meeting predefined quality standards. Such solutions, paired with repeated measures, could theoretically improve both the accuracy and reliability of vHIT gain estimates.
On average, under real-world conditions, artifacts bias vHIT-based higher VOR gains downward slightly but do not impact diagnostic classification as normal vs. abnormal. Although different gain calculation methods can be differentially impacted by specific artifacts, on average, the impact of calculation method is small compared to the impact of artifacts themselves. Overall, gain impacts of the calculation method tend to average out over multiple artifact types, creating a picture of imprecision due to random variation, rather than systematic bias. We recommend repeated measures of >10–20 HITs per ear, which tends to neutralize artifact effects, likely resulting in more accurate mean gains. Although we found no apparent measurement superiority of a given calculation method, efforts should be taken to standardize measurements for research and clinical care. Groups such as the B
Limitations
Our study involved many head impulses, but from only 26 subjects with a specific clinical presentation (acute vestibular syndrome), so results may not generalize to other patient populations. Although we assessed the most commonly used calculation methods (including those employed by US FDA-approved, production-line vHIT systems currently being sold in the US and Europe), other methods were not tested in our study (e.g., point by point gain average over different time windows using whole or partial velocity profile, ascending or descending window, ±12 ms around peak head velocity).
Although there was a strong correlation between velocity and position gain, we found a skewed regression line (Fig. 3): At lower gains, calculated position gain was rather too high or – vice versa – velocity gain could have been too low. At higher gains velocity gain was higher than position gain. This phenomenon might be due to imperfect gain calculations including imperfect de-saccading algorithms, removal of negative gain values by the device or data low pass filtering. We had, however, no gold standard (scleral search coil) measures to determine the “true” gain for any of our traces. We used raw data from only a single brand of vHIT device in order to obtain uniform data, potentially limiting the generalizability of our results; however, lateral vHIT gains computed with the same technique appear to be similar across vHIT systems [5]. Finally, the data were limited to the lateral impulses only. HITs in the RALP or LARP plane seem to be more prone to artifacts and these artifacts might be more difficult to classify.
Conclusions
Our results showed no clear superiority of a particular gain calculation method for vHIT testing, with the results being most similar in the middle range (0.5–0.8) in which most VOR gain values fall. When averaged across multiple trials and patients, artifacts do influence gain by causing small but significant reductions in HITs with normal-range gains, whereas those with lower (i.e., abnormal) HIT gains remain unchanged. Artifacts increase measurement noise, and although different types of artifacts influence calculation methods differently, their effects still tend to average out. This suggests a larger number of HITs should be performed to confirm vHIT results, regardless of calculation method. Practically, we recommend an international consensus for vHIT gain measurement and reporting.
Potential conflicts of interest
None
Footnotes
Acknowledgments
This study was supported by the Swiss National Science Foundation PBBEP2 1365-73 and #320030_173081 (GM) and the Inselspital Bern (Insel-grant, #2602). Dr. David E. Newman-Toker’s effort was partially supported by a grant from the National Institutes of Health, National Institute of Deafness and Communication Disorders (U01DC013778). GN Otometrics and Interacoustics loaned VOG equipment for research.
