Abstract
Background and importance
Each year, 1.4 million patients attend a UK emergency department (ED) with a head injury. Mild traumatic brain injury affects up to 300/100 000 admitted patients/year and a greater number of non-admitted patients. Identifying which head-injured patients have concussion, and which of those will have a prolonged recovery, is critical for discharge planning. The Vestibular/Ocular Motor Screening test (VOMS) has been reported as a useful “sideline tool” to evaluate for sports-related concussion (SRC). VOMS has been assessed primarily for predicting which head-injured patients have concussion, and secondarily for predicting which concussed patients will have prolonged recovery. Originally described in 2014, VOMS has not been subject to systematic review or meta-analysis with regard to its predictive performance for concussion.
Objective
To assess the state of VOMS evidence for dichotomously classifying concussion status in patients with non-severe head injury.
Design
Systematic review.
Setting and participants
Studies comprising the review enrolled ambulatory head-injured adults and children, usually from sports-related settings, in Europe or the USA.
Exposure
VOMS.
Outcome measures
Presence of concussion; presence of prolonged recovery in concussed patients.
Main results
The review identified 17 studies, characterized by a wide variety of specific approaches to administering and scoring VOMS. While VOMS showed promise as a screening tool for concussion, marked study heterogeneity precluded generation of a pooled effect estimate for VOMS performance.
Conclusion
VOMS is potentially useful as a concussion screening tool. Available evidence from the SRC arena suggests sensitivity ranging from 58–96%, with specificity ranging from 46–92%. Directions for future VOMS research should include evaluation of standardized administration and scoring, potentially of a simpler VOMS (with fewer components), in a general head-injured population. Further analysis of precisely defined VOMS application may be useful to determine the proper place of VOMS screening for the head-injured.
Introduction
Non-severe head injury can be defined in practical terms as denoting cases who are conversant and ambulatory, and who either have negative brain imaging or who have no indication for such imaging. Such “minor” head injury cases can still have mild traumatic brain injury (TBI) and present with clinically important problems, relating to post-injury symptoms that impair return to pre-injury functional status.
Early identification of which head-injured cases have “concussion” (i.e., symptoms lasting beyond the initial hours after injury) is important for at least two reasons. First, in the immediate post-injury time frame there is often the need to decide on clearance of the head-injured case for return to work or sports. Second, early identification of cases with concussion can inform decision-making as to post-injury follow-up (e.g., in specialized head-injury clinics).
In the athletic arena, early identification of sports-related concussion (SRC) is a necessary component of post-injury screening. Importance of SRC screening is such that specialized screening tools have been developed for on-site (at sporting events) evaluation. One such tool is the Vestibular/Ocular Motor Screening (VOMS) test.
Since its 2014 description 1 by a group at the University of Pittsburgh Medical Center (UPMC), VOMS has been the subject of many reports. VOMS is a quick assessment of vestibular-ocular function intended as an adjunct for assessing ambulatory patients with mild or minor head injury. VOMS’ potential utility lies in its easily conducted and multifaceted evaluation of the ocular-vestibular system, which is particularly affected in concussion. VOMS takes only 5–10 min and does not require advanced technology.1,2
Whether utilized on the athletic field sidelines, the sports medicine clinic, or the Emergency Department (ED), VOMS has been the subject of multiple investigations assessing its ability to correctly classify the head-injured into concussed vs. non-concussed status.1–17 Despite the growing VOMS evidence base, and despite the existence of systematic reviews of other sideline SRC evaluation tools (e.g., King-Devick), 18 a 2022 literature search identified no systematic review with or without meta-analysis (MA) addressing VOMS diagnostic performance.
If VOMS is useful for rapid, easily executed evaluation of patients with possible concussion, its utility may extend across myriad settings. Due to the high importance of non-severe head injury, the obvious utility of an easily applied screen for concussion, and the growing size of the VOMS evidence base, the current review was executed to assess the state of the VOMS predictive evidence base. The review's aim was to assess VOMS’ diagnostic test performance for concussion identification.
Methods
Evidence sourcing: identifying studies to screen for inclusion in the review
To identify evidence relevant to this review, we first executed a PubMed search using VOMS (abbreviation or terms) as the initial search term. PubMed was searched (on 1 March 2022) using the strategy: (((VOMS) OR (vestibular ocular motor screen)) OR (vestibular ocular motor screening)) OR (vestibular ocular motor score).
The strategy noted above yielded 380 results. Extended search strategies included review of identified articles’ references, gray-literature search (Google Scholar), and email contact with authors of VOMS studies to determine whether there were other papers still in the editorial process.
Eligibility
All article abstracts were reviewed to ascertain potential eligibility for inclusion. Where review of the abstract was insufficient to clearly define study eligibility for this review, the full-text article was obtained and eligibility for inclusion was determined from it. Abstract reviews were executed independently by two authors (CT and ST), with disagreements resolved by discussion. There was no need to resort to the contingency plan (in case of inability to reach consensus) of having the study's third author adjudicate eligibility.
Inclusion criteria: articles reporting assessment of VOMS’ diagnostic test accuracy (DTA), specifically those that evaluated VOMS as an identifier of concussion (dichotomously categorized as present or absent), or those using VOMS to predict post-concussion recovery times (defined either dichotomously as “normal or prolonged”, or defined continuously as “number of days to return to pre-injury status”).
Exclusion criteria: articles were excluded if they were not original research that reported predictive value specifically associated with VOMS. Examples of excluded articles were those that only evaluated VOMS as a joint predictor (i.e., results were not reported for VOMS-only performance but rather for joint performance of VOMS plus other tools), or those that included only non-injured subjects (to assess VOMS’ internal validity).
Overview of studies included in this review
Table 1 provides an overview of studies1–17 included in this review. The table lists study design (categorized as per Mathes and Pieper's suggested approach for systematic reviews 19 ) as well as brief highlights of methods and/or findings. For brevity, studies in this review are identified by the first author's last name, with addition of the last two digits of publication year for the two pairs of studies that share a first author. Thus Elbin's 2018 5 and 2021 10 studies are denoted in this review as Elbin18 and Elbin21, and the 2021 and 2022 studies by Ferris are denoted in this review as Ferris21 11 and Ferris22 17 .
Studies (n = 17) covered in this review.
Abbreviations: VOMS = Vestibular/Ocular Motor Screening, SRC = sport-related concussion
SCAT3 = Sport Concussion Assessment Tool-3
ImPACT = Immediate Post-concussion Assessment and Cognitive Testing
*When age range was not reported, age is presented as mean ± standard deviation
VOMS was initially described for use in athletes with SRC, and the preponderance of the literature continues to emphasize VOMS use in injured athletes. For this review, all studies but two (Eagle 7 and Babicz 16 ) included only the SRC mechanism of injury.
Defining the reference standard: determination of whether concussion is present
The goal of this review was to assess VOMS’ predictive classification performance for concussion. Concussion in the VOMS literature is primarily SRC, which is defined per international consensus 20 as a head injury resulting in signs and symptoms.
During study planning, a secondary goal was to assess whether VOMS was useful in predicting delayed recovery from concussion. Delayed recovery was defined using a 30-day cutoff (which was employed by the four VOMS studies assessing this outcome).4,7,12,14 As the review proceeded, it quickly became clear that there were insufficient data to enable substantive assessment of VOMS as a predictor of prolonged recovery (either as dichotomous or continuous endpoint). The limited findings on VOMS prediction of prolonged recovery are therefore restricted to presentation in a supplement to the review (see Supplement 5).
In all studies included in this review, the reference standard clinical diagnosis of concussion was made at the time of patient care or evaluation as part of a study protocol. The diagnosis was not assigned post hoc (e.g., by chart review) in any of the included studies. Studies assessed in this review did not provide comprehensive, explicit definition of symptoms used to define concussion presence.
Defining the index test: VOMS constituents and scoring
The original VOMS paper by Mucha 1 includes details on VOMS testing constituents and administration. A brief discussion of the VOMS testing process is included in this portion of the review, since nomenclature is key to interpreting the evidence. The review's supplemental material (Supplement 1) provides more comprehensive information that helps inform decisions as to the different studies’ application and scoring of VOMS.
Table 2 lists VOMS components, which are usually referred to as domains (one study 17 uses the term “tasks”). The VOMS domains are occasionally categorized into two groups: ocular-motor or vestibular. The ocular-motor domains include the SmoothPurs, HorSacc, and VertSacc components of VOMS; the VOMS domains comprising the vestibular group include HorVOR, VertVOR, and VMS. 12
Vestibular/ocular motor screening (VOMS) domains.
As seen in the table, an all-domain VOMS performed during evaluation of a head-injured patient consists of up to nine scores: one representing baseline (pre-provocation) symptoms, seven representing post-provocation symptoms (for each of seven domains), and one representing a distance measurement. The seven post-provocation domains are often referred to as “symptom domains.” Domains are scored, and reported either as a “total” score, or a “change” score. The employment of total or change scoring is a source of known heterogeneity and potential ambiguity. These issues, which affect interpretation of the VOMS literature, are explored in detail in Supplement 1.
For purposes of brevity, the remainder of this review's text, tables, and figures employ abbreviated terms for the domains as listed in Table 2.
The domain with simplest scoring is NPCcm. As the examiner's finger approaches the subject's nose, NPCcm is scored as the distance at which diplopia is reported or exotropia is observed.
For domains other than NPCcm, scores are generated by assessments and 0–10 ratings of each of four symptoms: headache, dizziness, nausea, and fogginess. Studies included in this review did not provide details on definitions of any of the symptom terms. The pre-provocation score is generated at the commencement of VOMS testing (before any provocative maneuvers) by summing the test subject's 0–10 ratings for each of the four symptoms; potential pre-provocation score thus ranges 0–40.
For the post-provocation symptom domains, the four symptoms are assessed and the 0–40 score tallied. For the “total” score approach, this tally (the summed symptom score) is the domain score. For the “change” score approach, the pre-provocation score is subtracted from the post-provocation tally to yield the domain score. The pre-provocation score thus does not directly contribute to the VOMS score but it is indirectly contributory when change scores are reported.
While some studies use disparate approaches (see Supplement 1), the most commonly applied definition of VOMS positivity uses cutoffs as initially described by Mucha. 1 VOMS is dichotomously defined as positive or negative, with a positive test defined as present if either NPCcm exceeds 5 or if any symptom domain score exceeds 1 (regardless as to whether total or change scoring methods are used).
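The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration of the total vs. change calculation and of the most commonly applied positivity rule as stated in this section; the function names, the dictionary layout, and the example ratings are assumptions introduced for illustration and do not come from any of the reviewed studies.

```python
# Minimal sketch of VOMS symptom-domain scoring as described in the text.
# All names and example ratings are hypothetical.
SYMPTOMS = ("headache", "dizziness", "nausea", "fogginess")


def symptom_tally(ratings):
    """Sum the four 0-10 symptom ratings into a 0-40 tally."""
    for name in SYMPTOMS:
        if not 0 <= ratings[name] <= 10:
            raise ValueError(f"{name} rating must be between 0 and 10")
    return sum(ratings[name] for name in SYMPTOMS)


def domain_score(post, pre, method="total"):
    """Score one post-provocation domain by the total or change approach."""
    post_tally = symptom_tally(post)
    if method == "total":
        return post_tally
    if method == "change":
        # The pre-provocation tally is subtracted from the post-provocation tally.
        return post_tally - symptom_tally(pre)
    raise ValueError("method must be 'total' or 'change'")


def voms_positive(domain_scores, npc_cm, symptom_cutoff=1, npc_cutoff=5):
    """Positive if NPCcm exceeds 5 or any symptom domain score exceeds 1."""
    return npc_cm > npc_cutoff or any(
        score > symptom_cutoff for score in domain_scores.values()
    )


pre = {"headache": 2, "dizziness": 1, "nausea": 0, "fogginess": 0}
post = {"headache": 3, "dizziness": 2, "nausea": 0, "fogginess": 0}
total = domain_score(post, pre, method="total")
change = domain_score(post, pre, method="change")
```

With these hypothetical ratings the same examination yields a total score of 5 and a change score of 2 for the domain, which illustrates why the choice of scoring method matters when a fixed cutoff is applied.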
Evaluation of study methodology
Methodologies were evaluated using Quality Assessment of Diagnostic Accuracy Studies (QUADAS) and Preferred Reporting Items for a Systematic Review and MA of DTA Studies (PRISMA-DTA).21–23 The QUADAS and PRISMA-DTA parameters, evaluated in parallel by all this review's authors for all studies, were used to guide study evaluation rather than determine study inclusion. No otherwise-eligible studies were discarded from the review based on application of QUADAS or PRISMA-DTA criteria.
For all DTA MA, the Cochrane group recommends evaluation of selected items; Table 3 lists these items (all of which were included in the evaluation of studies reported in this review) and includes the type of bias addressed by each.
Abbreviations: VOMS = Vestibular/Ocular Motor Screening
In addition to the issues applicable to all diagnostic testing study reviews, this project also incorporated additional questions suggested by systematic review experts.21,23 These questions are outlined in Table 4.
Abbreviations: VOMS = Vestibular/Ocular Motor Screening
With up to 20 study issues and 17 studies, extensive table summaries of classification of every study's score on every issue were impractical. Fortunately, issues pertinent to methodology were nearly all applicable to large numbers of studies in the review set. This review was thus well-suited for a narrative approach to summarizing quality. 21
Statistical analysis
Analysis and plotting were executed with Stata 17MP (StataCorp, College Station, TX). For calculations and plotting, Stata's midas 24 and metandi25,26 modules were used where possible (a minimum of four studies are required for metandi). Where post hoc statistical calculations were performed for purposes of this review, significance was set at α of 0.05. Confidence intervals (CIs) were calculated using binomial exact methods at the 95% level; if point estimates were either 0% or 100% then one-sided 97.5% CIs were calculated.
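For readers unfamiliar with binomial exact intervals, the following Python sketch shows one way such an interval can be computed. It assumes the Clopper-Pearson construction, solved here by simple bisection with the standard library; it is not the Stata routine used in the review, and the function names are hypothetical. When the observed count is 0 or n, one bound is fixed at 0 or 1 and the other is a one-sided 97.5% limit, which matches the convention described above.

```python
# Sketch of a Clopper-Pearson (binomial exact) confidence interval.
# Assumption: this mirrors the general method, not the review's Stata code.
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_decreasing(func, tol=1e-10):
    """Bisection on [0, 1] for a function that falls from positive to negative."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, alpha=0.05):
    """Return (lower, upper) exact limits for k successes in n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    tail = alpha / 2
    # Lower limit: the p at which P(X >= k) equals alpha/2 (0 when k == 0).
    lower = 0.0 if k == 0 else _root_of_decreasing(
        lambda p: tail - (1 - binom_cdf(k - 1, n, p))
    )
    # Upper limit: the p at which P(X <= k) equals alpha/2 (1 when k == n).
    upper = 1.0 if k == n else _root_of_decreasing(
        lambda p: binom_cdf(k, n, p) - tail
    )
    return lower, upper
```

For example, exact_binomial_ci(0, 10) returns a lower limit of 0 and an upper limit equal to the one-sided 97.5% bound.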
Descriptive statistics in this review are reported using the same central tendency measures found in the individual studies’ results sections. This review follows the convention of most VOMS studies in referring to (integer) VOMS scores as continuous, and in descriptively reporting VOMS scores using mean ± standard deviation (SD). Further information on assessment and reporting for descriptive statistics is found in Supplement 2.
The review incorporates terms that are standard for describing DTA. Diagnostic performance characteristics are calculated from data on true positive (TP), false positive (FP), true negative (TN), and false negative (FN). These data, arranged into a classic 2 × 2 table, enable calculation of a breadth of performance measures. The performance measures reported in this review's main results are sensitivity, specificity, and area under the receiver operator characteristic curve (AUROC). Supplement 2 provides details on these calculations, as well as calculations on other occasionally useful performance measures including positive and negative likelihood ratios (LR + and LR-), diagnostic odds ratio (Dx OR), Youden's J, and diagnostic accuracy.
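As a brief illustration of how these measures follow from the 2 × 2 table, the sketch below applies the standard textbook definitions in Python. The function name and the example counts are hypothetical and are not drawn from any reviewed study; Supplement 2 remains the reference for the calculations actually used.

```python
# Sketch of standard diagnostic test accuracy measures from a 2 x 2 table.
# Counts in the example are invented for illustration only.
from math import inf


def _ratio(numerator, denominator):
    """Divide, returning infinity when the denominator is zero."""
    return numerator / denominator if denominator else inf


def dta_measures(tp, fp, tn, fn):
    """Return common performance measures for a dichotomous test."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": _ratio(sensitivity, 1 - specificity),
        "lr_neg": _ratio(1 - sensitivity, specificity),
        "dx_or": _ratio(tp * tn, fp * fn),
        "youden_j": sensitivity + specificity - 1,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical table: 50 concussed (40 VOMS-positive), 50 non-concussed (20 VOMS-positive).
example = dta_measures(tp=40, fp=20, tn=30, fn=10)
```

In this hypothetical table, sensitivity is 0.80 and specificity is 0.60, giving a Youden's J of 0.40 and a diagnostic odds ratio of 6.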
General approaches for MA followed recommendations outlined by Borenstein. 27 The first step in executing MA for the DTA findings was preparation of paired forest plots (sensitivity and specificity side by side). 22 Due to marked heterogeneity in studies’ definition and scoring of VOMS, paired forest plots were executed without generation of a pooled summary estimate. The plots did include 95% CIs for individual study point estimates (e.g., for sensitivity or specificity).
The heterogeneity that precluded pooled DTA parameter estimation precluded many other potential MA approaches. Supplement 2 provides extended information on an exploratory technique, generation of a hierarchical summary receiver operator characteristic curve (HSROC), which was utilized for hypothesis generation only.
For the essentially descriptive reporting of continuous-variable VOMS scores, random-effects MA was performed. Because the distribution of VOMS scores was not well known, a standardized mean difference approach was used; Hedges’ g (difference in means divided by pooled SD) was chosen to minimize risk of bias given the low study n; a priori cutoffs for effect importance were set at 0.2 (small), 0.5 (medium) and 0.8 (large). 27 Hedges’ g is intended to provide information beyond a p value. Rather than denote whether there is a difference between groups, Hedges’ g (which is the same as Cohen's d if two sample sizes are equal) attempts to convey the likely clinical importance (effect size) of the difference. Since the null value of Hedges’ g is 0, when the 95% CI for g does not include 0 there is some effect present. The classical cutoffs as used in this report are approximate guides: the higher the g, the greater the relative effect. For those findings with a “large” Hedges’ g (e.g., the pre-provocation scores in Figure 1(a)), the interpretation is that the pre-provocation VOMS has a large effect in indicating presence of concussion.
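The standardized mean difference can be illustrated with a short Python sketch. It computes the difference in means divided by the pooled SD, as described above, and optionally applies the widely used small-sample correction factor 1 − 3/(4(n1 + n2) − 9). Whether that correction was applied in this review's Stata analysis is an assumption, as are the function names, the "negligible" label for values below 0.2, and the example values.

```python
# Sketch of Hedges' g for two independent groups.
# Assumptions: the small-sample correction and the "negligible" label are
# illustrative choices; the example numbers are invented.
from math import sqrt


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation weighted by degrees of freedom."""
    return sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def hedges_g(mean1, sd1, n1, mean2, sd2, n2, small_sample_correction=True):
    """Standardized mean difference (group 1 minus group 2)."""
    g = (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)
    if small_sample_correction:
        g *= 1 - 3 / (4 * (n1 + n2) - 9)
    return g


def effect_label(g):
    """Classify effect size with the a priori cutoffs of 0.2, 0.5 and 0.8."""
    size = abs(g)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical domain scores: concussed 7.0 (SD 5.0), non-concussed 2.0 (SD 5.0), 30 per group.
g_example = hedges_g(7.0, 5.0, 30, 2.0, 5.0, 30)
```

With these hypothetical values the uncorrected difference is exactly one pooled SD, and the corrected g is slightly smaller but still in the large range.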

Figure 1. (a) Effect-size estimates for concussed vs. non-concussed VOMS scoring for Pre-provocation, SmoothPurs, & HorSacc domains. (b) Effect-size estimates for concussed vs. non-concussed VOMS scoring for VertSacc, NPCsx, & HorVOR domains. (c) Effect-size estimates for concussed vs. non-concussed VOMS scoring for VertVOR, VMS, & NPCcm domains.
Results
Constitution of set of studies comprising review's evidence base
After initial review of PubMed results, 16 articles remained and constituted the preliminary candidate set.1–16 Full text was obtained for all 16 articles, although one 14 was published only in abstract form. Extended search strategies identified an additional 2022 study 17 meeting inclusion criteria. The review therefore encompassed a final study set of 17 studies, as outlined in Table 1.
Summary of study methodology review
All studies had elements of accrual or spectrum bias. Some had examples of possible misclassification bias, particularly in the context of milder disease in which a correct concussion classification was made at the time of injury but in which symptoms had resolved by the time VOMS was performed. All studies included a risk of confirmation bias, because the concussion status of the patient would have been known by the practitioner performing the VOMS exam. There is unlikely to be significant risk of verification or incorporation bias. Within any one study, VOMS was administered by more than one practitioner. As such, it is possible (although unlikely) that examiner-dependent administration bias was an issue. Eleven of 17 studies featured an author institutional affiliation with UPMC, the institution in which VOMS was developed. Details on study methodology review are contained in Supplement 3.
VOMS scores as continuous variables: descriptive results and estimated effect sizes
Descriptive data, usually in the form of means, were reported for individual VOMS domains in 11 studies.1,2,4–6,8–10,12,13,15 Three studies10,12,15 reported both means and medians (with two of these studies12,15 reporting IQR). The reviewed studies did not consistently report “full VOMS” scoring results, but rather reported on results by domain. Supplement 4 includes figures on domains’ median VOMS scores; the supplement also reports tabulation of central tendencies in continuous-variable VOMS domain scores in concussed vs. non-concussed cases, organized by both domain and scoring method.
Figure 1 depicts effect-size estimates for continuous-variable VOMS scores for each domain, for concussed vs. non-concussed cases and reporting total and change scores where applicable (i.e., for domains other than pre-provocation and NPCcm). For all domains, the no-effect H0 (Θ = 0) was rejected at p < .01, meaning that for every domain the scores were significantly higher in concussed vs. non-concussed cases.
As shown in Figure 1, the pooled effect-size estimates for all domains except for NPCcm (effect size estimate, 0.49) met the 0.5 minimum criterion for medium-sized effect. The figure also shows that for each symptom domain, either the total or change score (or both) produced a large (i.e., > 0.8) effect-size estimate. For three domains (total-scored NPCsx, HorVOR, and VMS) the effect-size estimates were noteworthy in that the threshold for large effect was exceeded even by the lower bound of the effect estimates’ 95% CI. Heterogeneity assessment, limited by the low number of studies, is reported in Supplement 4. I² was below 75% for all assessments except three: SmoothPurs (total score), NPCsx (change score), and NPCcm.
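For context on the heterogeneity statistic, the sketch below shows the conventional calculation of I² from Cochran's Q using inverse-variance weights. It assumes the standard Higgins definition; the details of the heterogeneity assessment actually performed are in Supplement 4, and the function name and input values here are hypothetical.

```python
# Sketch of the I-squared heterogeneity statistic from Cochran's Q.
# Assumption: standard inverse-variance (fixed-effect) weights; inputs are invented.
def i_squared(effects, variances):
    """Return I-squared (as a percentage) for study effects and their variances."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Two hypothetical studies with very different effect sizes and equal precision.
high = i_squared([0.2, 1.0], [0.01, 0.01])
# Three hypothetical studies with identical effect sizes.
none = i_squared([0.5, 0.5, 0.5], [0.02, 0.03, 0.04])
```

Values near 0% indicate that observed variation is consistent with sampling error alone, whereas values above 75% are conventionally read as considerable heterogeneity.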
Multi-domain VOMS as a dichotomous predictor of concussion
The initial study plan called for analytic focus on studies using Mucha's initially described “full VOMS” comprising eight domains (not including the pre-provocation baseline, which is not independently counted in scoring). 1 However, only two reports (Elbin18 5 and Buttner 9 ) provided complete 2 × 2 table data for full-VOMS concussion prediction. This review's main results scope was thus widened to include any study employing a multi-domain VOMS version comprising at least half of the eight (non-pre-provocation) full-VOMS domains.
For the multi-domain VOMS studies (i.e., those including at least four domains in addition to pre-provocation), Figure 2 shows paired-forest sensitivity and specificity plotting and Figure 3 shows AUROC plotting. Supporting data for these figures are provided in Supplement 4.

Sensitivity and specificity of multi-domain VOMS for concussion diagnosis.

Area under receiver operator characteristic curve (AUROC) for multi-domain VOMS concussion diagnosis.
Neither Figure 2 nor Figure 3 includes summary statistics or pooled estimates (e.g., overall estimates of sensitivity calculated by combining all study results). Reliable calculation of such pooled estimates was precluded by methodological barriers including inter-study heterogeneity in VOMS constituents, total vs. change scoring methods, and varying approaches to results reporting. Some of these methodological inter-study barriers could be mitigated with performance of hierarchical summary receiver operator characteristic (HSROC) methods. However, there were insufficient study numbers to enable optimally reliable HSROC analysis. A preliminary (hypothesis-generating) HSROC analysis for multi-domain VOMS is included in Supplement 4.
Total sample size for each study: Buttner (100), Ferris21 (388), Elbin18 (63), Ferris22 (424), Elbin21 (294), Babicz (158)
Total sample size for each study: Elbin18 (63), Buttner (100), Elbin21 (294), Ferris22 (424), Ferris21 (388), Kontos (570)
Two components of Figure 3 require explanation. For one report (Elbin21 10 ), the study-reported ROC area (.73) was slightly higher than the Stata-calculated ROC area (.70); the study-reported ROC area is plotted with the Stata-calculated 95% CI (thus there is asymmetry of the CI around the point estimate). For the Ferris21 11 report, the 95% CI for the ROC area is too narrow to depict (both upper and lower bounds round to the point estimate).
While sensitivity, specificity, and AUROC are among the most familiar DTA measures, the VOMS evidence base includes information on other DTA parameters. These alternative measures include two that provide equal weight to sensitivity and specificity (diagnostic accuracy and Youden's J), and three that may be useful decision-making aids for clinical application (diagnostic odds ratio and positive and negative predictive values). Information for the multi-domain VOMS performance with regard to these alternative DTA measures is provided in Supplement 4.
Use of two-domain or three-domain VOMS to predict concussion
Reporting on VOMS diagnostic performance for two- or three-domain VOMS concussion assessment occurred in just three studies,1,10,13 which all reported AUROC but which all used different VOMS domain combinations. The studies’ findings are summarized in Table 5.
Area under receiver operator characteristic curve (AUROC) for two- and three-domain VOMS assessment for concussion.
Abbreviations: AUROC = area under receiver operator characteristic (curve), VOR = vestibular ocular reflex, NPC = near point of convergence, sx = symptoms, VMS = visual motion sensitivity
Use of single-domain VOMS to predict concussion
The studies covered in this review provide substantial data for single-domain VOMS prediction of concussion, especially when taking into account the large numbers of domains and DTA measures (sensitivity, specificity, AUROC, J, LR+, etc.). In order to focus results, the main-results graph for single-domain VOMS concussion DTA depicts the most important plot, that of paired sensitivity and specificity (Figure 4). Supporting data for the figure, as well as for other single-domain DTA measures, are provided in Supplement 4.

Sensitivity and specificity of single-domain VOMS for concussion diagnosis.
Total sample size for each study: Ferris22 (424), Elbin21 (294), Elbin18 (63), Mucha (142)
Summary of findings
In the head-injured, how well does VOMS predict concussion?
VOMS administration and reporting varied widely across the 17 studies. All individual domains except HorVOR (change score) had at least a medium effect size. Clinically, the greatest value VOMS can deliver is accurately ruling in concussion, because clinical follow-up can then be organised. If symptoms resolve prior to follow-up, the patient can simply cancel the appointment. Consequently, specificity is valued over sensitivity. The two highest specificities for the multi-domain VOMS were both obtained using a cut-off on a 280-point score: one with total scoring and the other with change scoring. Neither total nor change scoring appeared more reliable than the other. Treating any symptom domain score at all (total = 1 or change = 1) as positive was the next most useful strategy for ruling in concussion.
In concussed patients, how well does VOMS predict recovery timing?
The quality of evidence for VOMS predicting prolonged recovery is far poorer than that of classifying concussion. There is variation in reporting which precludes drawing firm conclusions. One study reported diagnostic test characteristics, which included routinely high sensitivities (86–97%), low specificities, and poor AUROCs (55–58%). There is insufficient evidence to recommend VOMS as a reliable predictor of prolonged concussion.
Issues affecting interpretation of the evidence
Some data suggest differences in VOMS in males versus females. 17 VOMS may also be affected by past medical history such as learning disorder, attention-deficit/hyperactivity disorder, or previous concussions. 17 Some studies adjust for these potential differences (e.g., generating multivariable models or utilizing sex-stratified reporting 12 ) but not all reports include such adjustment.
Most data15,17 indicate VOMS predictive utility does not vary significantly within the age groups evaluated in the existing evidence base. However, the focus of most evidence on high-school or college athletes leaves open the question of generalizability to different age groups.
One weakness of the VOMS evidence base is the potential that VOMS assessors were aware of subjects’ concussion status. Awareness that a patient is concussed could be associated with bias in VOMS execution and interpretation. The studies in this review were not characterized by blinding of VOMS examiners to patient diagnosis, thus there is bias risk. Complete removal of such bias could be difficult (e.g., concussion status may be suggested by pre-provocation symptom score), but it remains the case that suspected awareness of concussion status can bias results of diagnostic testing. Four studies are from the same registry, the National Collegiate Athletic Association-Department of Defense Concussion Assessment, Research, and Education (CARE) Consortium.8,11,13,17 One study, Ferris22 17 , pointed out that some of its subjects had been previously analyzed in other work from the same group. Other subjects in the CARE registry may also have been included in more than one investigation. Since the CARE-based studies did not all share the same primary study question or methodology, there is no inherent problem in using one registry as a basis for more than one analysis set. Potential limitations with regard to generalizability are acknowledged.
MA for diagnostic testing is recognized as being particularly challenging. One reason for this is increased heterogeneity as compared to interventional studies. While there are well-accepted techniques for combining different studies’ results even in the setting of heterogeneity, VOMS MA presents particular difficulties. The widespread use of differing cutoffs for VOMS positivity could potentially be addressed with advanced techniques such as hierarchical summary ROC curves (HSROC). However, the VOMS evidence is characterized not just by varying cutoffs, but also by additional sources of multidimensional heterogeneity: total vs. change scores, overall vs. domain scores, and “full” (all domains) vs. partial VOMS. These differing approaches are not per se a weakness of the VOMS literature – they represent ongoing efforts to define the best-performing approach – but they represent a substantial barrier to creating pooled or average estimates of test performance.
Study heterogeneity was substantial. VOMS researchers 10 have highlighted the issue, writing that “the manner in which VOMS is scored and utilized is not consistently applied.” Even proceeding with the assumption that all studies administered VOMS identically, with scores tallied (as directed by Mucha's initial paper 1 ) as post-provocation minus pre-provocation scores, there were myriad potential differences in reporting. Based on a strict, simple outcome definition (concussion or not) and a single identical interpretation of VOMS positivity (i.e., identical cutoff and definition of positive VOMS), there were few data amenable to MA. Experts in the field of concussion evaluation tools (including VOMS) have noted that complex machine learning modeling is the best method for studying the multidimensional and mixed-tool nature of available concussion screening approaches. 28 The multitude of possible combinations of VOMS domains, VOMS domain cutoffs, or incorporation of VOMS in conjunction with other tools, limited the “apples to apples” study-design approaches ideal for MA.
Further study could address whether all VOMS components are always required. NPC distance, for instance, seems potentially less useful than other domains. Perhaps most importantly, the VOMS evidence base will be improved by studies incorporating standardized, clearly defined methods for testing and score interpretation.
Supplement 1: details regarding VOMS components and scoring
VOMS components
The term “baseline” represents a potential source for confusion. For some authors, “baseline” can refer to a pre-injury assessment (e.g., executed in a pre-season medical examination) 17 but the term more commonly follows usage as per that of Mucha's 1 initial VOMS description: the baseline is a post-injury assessment executed before VOMS provocative maneuvers. In order to minimize confusion, this review uses the term “pre-provocation” as a replacement for “baseline,” as an indicator of VOMS symptom scoring executed just prior to provocative maneuvers.
VOMS studies’ reporting often, but not always, comprises the seven post-provocation symptom scores plus NPCcm. Three studies in this review report results from a VOMS testing procedure that excludes a domain: SmoothPurs 16 , VertVOR 1 , or VMS 3 . Reports focusing on change scores (as described below) do not always report NPCcm. 12 These modifications to the “full VOMS” may not affect a study's internal validity, but the non-identical nature of VOMS testing procedures should inform judgments regarding inter-study comparisons or pooled estimators of diagnostic performance.
VOMS scoring
VOMS consists of two types of elements: symptom scoring and a distance measurement. Symptom scores are subject-provided ratings of symptoms on a whole-number scale of 0 (no symptoms) to 10 (severe symptoms). Symptoms are rated before any provocative maneuvers (pre-provocation rating) and also after each VOMS provocative maneuver. The distance measurement is obtained using the provocative maneuver for near point of convergence (NPC); the usual approach is to perform NPC three times and record the average reading as NPCcm (the measurement is in cm but the reporting is often a score without units).
The symptom scores for a given domain are generated by summing the test subject's 0–10 ratings for each of four symptoms: headache, dizziness, nausea, and fogginess. The potential score for a given domain is thus 0 (for no symptoms at all) to 40 (for maximal symptoms on headache, dizziness, nausea, and fogginess). 1 A potential limitation of VOMS is that at least some of the symptoms (e.g., dizziness) may be subjective.
One area in which VOMS scoring may be unclear is the concept of absolute (“total”) scoring vs. incremental (“change”) calculation. There is potential lack of clarity as to whether subject-reported symptom scores are supposed to represent an absolute score (e.g., asking the subject to rate headache 0–10 after a provocation, without regard to pre-provocation headache level) or a change from pre-provocation baseline (e.g., asking the subject to rate on the 0–10 scale the post-provocation headache worsening from pre-provocation headache level). The initial description of VOMS by Mucha 1 is worded as follows (italics added for emphasis): “Patients verbally rate changes in headache, dizziness, nausea, and fogginess symptoms compared with their immediate preassessment state on a scale of 0 (none) to 10 (severe) after each VOMS assessment to determine if each assessment provokes symptoms.”
Exploring the meaning of subject-reported VOMS symptom scores is not simply a semantic exercise. If patient-reported VOMS scores are patient-defined increments as described in the preceding paragraph, 1 there is little sense in discussing “total” (i.e., score without subtracting pre-provocation) and “change” scores. To clarify how VOMS scoring is applied in actual practice and in the VOMS concussion prediction evidence base, this review's authors contacted VOMS experts. These experts confirmed that unless “change from pre-provocation” is explicitly identified, it is the total score (not incremental score over pre-provocation) that is used for concussion classification (personal communications, Drs. Anthony Kontos of UPMC and Robert Elbin of University of Arkansas, February 2022).
As evolving work (e.g., Elbin18 5 , Elbin21 10 ) illuminates potential performance differences between total and change scoring, the distinction between the two methods is important. In this review, unless scoring is labeled as “change” scoring (i.e., incremental), VOMS scores are “total” (i.e., absolute). Furthermore, unless clearly labeled otherwise (e.g., Ferris22 17 definition of “change from pre-season baseline”) the change score is the change from pre-provocation to post-provocation symptom scores.
One final nomenclature clarification: the “total” in total score refers to a totaling of the four 0–10 scores for headache, dizziness, nausea, and fogginess. The total thus ranges from 0–40. As generally used – but with occasional exception as noted in the next paragraph – the total score is not a summation of the VOMS domain scores. In a given study subject there are thus eight potential total symptom scores: one for the pre-provocation assessment and seven more for the seven symptom domains.
Exceptions to the rule regarding definition of the total score are found in two reports from Ferris (Ferris21 11 and Ferris22 17 ), that employ the term “total” to refer to summation of all seven post-provocation symptom domain scores. The summation generates a “total VOMS score” ranging from 0 to 280. Kontos 13 mentions the 0–280 score summation but uses the term “overall VOMS score”. For purposes of clarity in this review, the seven-domain summation score is referred to as the “0–280 score.”
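The scoring arithmetic described in this section can be sketched in a few lines of Python. This is a minimal illustration and not code from any reviewed study: the function names (`domain_total`, `score_0_280`) and the domain labels in the usage example are hypothetical. Only the 0–10 symptom scale, the four symptoms, the 0–40 range for one assessment, and the seven-domain 0–280 summation are taken from the text.

```python
SYMPTOMS = ("headache", "dizziness", "nausea", "fogginess")


def domain_total(ratings):
    """Return the 0-40 total for one assessment: the sum of four 0-10 ratings."""
    total = 0
    for symptom in SYMPTOMS:
        value = ratings[symptom]
        # Ratings are whole numbers from 0 (no symptoms) to 10 (severe symptoms).
        if not isinstance(value, int) or not 0 <= value <= 10:
            raise ValueError(f"{symptom} rating must be a whole number 0-10")
        total += value
    return total


def score_0_280(post_provocation):
    """Return the 0-280 score: the sum of the seven post-provocation totals."""
    if len(post_provocation) != 7:
        raise ValueError("expected seven post-provocation symptom domains")
    return sum(domain_total(r) for r in post_provocation.values())


# Usage with made-up ratings; the domain labels are illustrative placeholders.
pre = {"headache": 2, "dizziness": 1, "nausea": 0, "fogginess": 1}
post = {
    label: {"headache": 3, "dizziness": 2, "nausea": 0, "fogginess": 1}
    for label in ("domain_1", "domain_2", "domain_3", "domain_4",
                  "domain_5", "domain_6", "domain_7")
}
print(domain_total(pre))   # pre-provocation total for this subject
print(score_0_280(post))   # seven-domain summation
```

Keeping the per-assessment total and the seven-domain summation as separate functions mirrors the nomenclature distinction drawn above between the “total” score and the “0–280 score.”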
Definition of VOMS positivity
Although the definition of VOMS positivity is generally consistent in studies comprising this review set, three points warrant clarification. First to be addressed in this section is the definition of positivity of NPCcm. Next, the minimum total score that defines VOMS positivity is discussed. Finally, the minimum change score that defines VOMS positivity is explained.
With regard to NPCcm, as originally described by Mucha 1 the cutoff for VOMS positivity is NPCcm exceeding 5. The instructional appendix of the initial VOMS study mentions an NPCcm cutoff of 6, but the remainder of the seminal VOMS report uses the 5 cm value. With an allowance that there is little or no practical difference between NPCcm cutoffs of “>5” and “≥5”, the VOMS literature is largely consistent in utilization of the originally defined NPCcm cutoff of 5. 2,8,9,11,15,16 Two studies covered in this review included reporting on NPCcm cutoffs other than 5. The Anzalone 3 study employed an NPCcm cutoff of 6, and the multivariable regression modeling approach reported in Elbin21 10 incorporated an NPCcm cutoff of 3.
With regard to the total symptom scores, the initially described 1 cutoff of >1 (i.e., at least 2) was used in all but two studies employing dichotomous cutoffs. Anzalone 3 classified VOMS as positive for any non-zero total symptom score or for any patient exhibiting abnormal smooth pursuits or saccadic eye movements. Eagle 7 defined VOMS positivity at a higher cutoff of >2 (i.e., any VOMS symptom score at least 3 translated into a positive test). Other studies covered in this review that reported on VOMS total symptom scores either emphasized continuous (as opposed to dichotomous) total score results4–6,12,13 or did not report a definition of VOMS positivity. 14
As previously described, change scoring represents the increment in symptoms from pre-provocation scores. For any cases in which symptoms improved after provocation, the relevant domain symptom change score is coded as zero. 10 The change-score methodology is not applicable to either the pre-provocation domain or the assessment of NPCcm. For the VOMS symptom domains, change (from pre-provocation) defining VOMS positivity has occasionally been reported as a continuous modeling variable 11 or as a predictor with multiple potential cutoffs. 12 When a dichotomous approach to defining change-score positivity is used, for most studies2,5,9,15 the primary definition for VOMS positivity is the same cutoff (symptom score >1) as described by Mucha 1 for the total scoring approach. One of the most detailed evaluations of change scores, Elbin21 10 , used a change of >0 for all of the symptom domains. The Elbin21 10 study also reported a “net change” which was summed over all symptom domains; the cutoff for this composite net change was >2 (i.e., at least 3).
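The positivity rules in this section reduce to a few comparisons, sketched below in Python. The function names are hypothetical, and the sketch assumes that the zero floor for improved symptoms is applied to the domain-level total (one reading of the description above). The default cutoffs (symptom score >1, NPCcm >5) are those attributed to Mucha; other cutoffs mentioned in the text can be passed explicitly.

```python
def change_score(pre_total, post_total):
    """Increment from pre- to post-provocation, coded as zero if symptoms improved."""
    return max(0, post_total - pre_total)


def symptom_positive(score, cutoff=1):
    """Dichotomize a total or change score; positive when score exceeds cutoff."""
    return score > cutoff


def npc_cm(readings):
    """Average of the repeated NPC readings (usually three), in cm."""
    return sum(readings) / len(readings)


def npc_positive(readings, cutoff=5.0):
    """NPCcm positivity; positive when the averaged reading exceeds cutoff."""
    return npc_cm(readings) > cutoff


# Usage with made-up values.
worsened = change_score(pre_total=4, post_total=6)   # increment of 2
improved = change_score(pre_total=6, post_total=4)   # floored at zero
print(symptom_positive(worsened))             # cutoff >1
print(symptom_positive(improved, cutoff=0))   # a >0 cutoff, as in Elbin21
print(npc_positive([4.0, 6.0, 6.5]))          # mean of three readings vs. >5
```

Passing the cutoff as a parameter makes explicit that the same raw scores can be classified differently across studies, which is the comparability problem this section describes.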
Supplement 2: details regarding methodology used in this review
Central tendency and dispersion
Although VOMS scores are usually non-normal, central tendencies are almost always reported as means ± SD; this practice is sufficiently common that VOMS statistical experts have used an approach of reporting parametric statistics in order to facilitate cross-study comparisons. 17
Some studies report median VOMS, with range or interquartile range (IQR). Only one study (Ferris22 17 ) reports medians’ 95% CIs.
The near-universal reporting of VOMS using means and SDs was used to calculate means’ CIs. These CIs were calculated post hoc to enable comparative viewing of precision across various studies. Means’ CIs were calculated using the formula: mean ± (zα × SE). Since CIs were set to the 95% level, zα was 1.96; SE (standard error) was calculated as SD ÷ √n, where n is sample size.
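As a worked illustration of the CI formula, the following Python sketch computes a normal-approximation 95% CI from a reported mean, SD, and sample size. The function name `mean_ci` and the input values are hypothetical and are not data from any reviewed study.

```python
import math


def mean_ci(mean, sd, n, z=1.96):
    """Return (lower, upper) for mean +/- z * SE, where SE = SD / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


# Usage with made-up summary statistics: mean 10, SD 4, n = 16, so SE = 1.
lower, upper = mean_ci(mean=10.0, sd=4.0, n=16)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Because the calculation is parametric, it inherits the caveat noted above that VOMS scores are usually non-normal; it is suited to comparing precision across studies rather than to formal inference.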
Measures of diagnostic test performance
This review focuses on a dichotomous endpoint (concussed vs. non-concussed) and a dichotomous predictor of positive vs. negative VOMS. A complete set of diagnostic test characteristics can be assessed if (and only if) there are data for VOMS positivity status in both concussed and non-concussed subjects.
The standard cells in the traditional 2 × 2 classification table (see Supplement 2 Figure 1) are: TP (VOMS-positive n in concussed), FP (VOMS-positive n in non-concussed), TN (VOMS-negative n in non-concussed), and FN (VOMS-negative n in concussed). In this review, studies described as reporting “full 2 × 2 table data” report all four of the table's cells with at least a possibility of non-zero cell numbers for TP, FP, TN, and FN.
Even if they are not explicitly reported in a given study, various measures of diagnostic test performance can be calculated in reviews when the study results include raw data in the form of a 2 × 2 classification table. 23 Where raw data were available, Stata was used to calculate test performance post hoc. When this was executed, the Stata-reported results were presented only for those results that were not found in the original publication (i.e., the original publication's reported calculations were used when available).
Sensitivity, specificity, or both were reported in nearly all studies. Less commonly, overall performance was described using diagnostic accuracy (all correct test results, divided by all tests performed) or by Youden's J (subtracting 1.0 from summed sensitivity and specificity). J represents the maximum vertical distance of a receiver operating characteristic (ROC) curve from the null 45-degree line. A J of 0 represents a non-useful test and 1.0 a maximally useful test; 95% CI calculations for J are complex 29 and not included in most reports (or in this review). Optimizing J is a goal when investigators judge sensitivity and specificity to be equally important; some VOMS researchers have posited that this condition is met for concussion and therefore that J is an important metric by which to assess VOMS. 17
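The measures defined in this section follow directly from the four cells of the 2 × 2 table. The Python sketch below is illustrative only (the review's own post hoc calculations were performed in Stata); the function name and the cell counts are hypothetical.

```python
def diagnostic_measures(tp, fp, tn, fn):
    """Compute basic test characteristics from 2 x 2 classification table cells.

    tp: VOMS-positive, concussed      fp: VOMS-positive, non-concussed
    tn: VOMS-negative, non-concussed  fn: VOMS-negative, concussed
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both concussed and non-concussed subjects")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    youden_j = sensitivity + specificity - 1.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden_j": youden_j,
    }


# Usage with made-up counts: 100 concussed and 100 non-concussed subjects.
for name, value in diagnostic_measures(tp=80, fp=10, tn=90, fn=20).items():
    print(f"{name}: {value:.2f}")
```

The guard clause reflects the point made above that a complete set of test characteristics requires VOMS results in both concussed and non-concussed subjects.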
In addition to providing a basis for calculation of J, the ROC curve has another use as encountered in the VOMS evidence base. The area under the ROC curve (AUROC) serves as a basis for comparison between two predictors (e.g., in a study examining differential diagnostic utility of different VOMS cutoffs). AUROC results for VOMS have been suggested for interpretation as follows: ≥ 0.90 excellent, 0.89–0.80 good, 0.79–0.70 fair, 0.69–0.60 poor, < 0.60 fail. 12
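The suggested AUROC interpretation bands can be expressed as a simple lookup, sketched here in Python. The function name is hypothetical, and the sketch assumes the bands are contiguous, so that the “fair” band spans 0.70 up to (but not including) 0.80.

```python
def auroc_band(auroc):
    """Map an AUROC value to the qualitative label suggested for VOMS results."""
    if not 0.0 <= auroc <= 1.0:
        raise ValueError("AUROC must lie between 0 and 1")
    # Bands assumed contiguous: each lower bound is inclusive.
    if auroc >= 0.90:
        return "excellent"
    if auroc >= 0.80:
        return "good"
    if auroc >= 0.70:
        return "fair"
    if auroc >= 0.60:
        return "poor"
    return "fail"


# Usage with made-up AUROC values.
for value in (0.93, 0.84, 0.72, 0.61, 0.55):
    print(value, auroc_band(value))
```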
Unidimensional summary statistics (e.g., J, AUROC, diagnostic OR, overall test accuracy) may be of limited clinical use since such measures do not distinguish between optimizing sensitivity and specificity. In clinical practice, there is advantage to having conditional measures such as positive and negative likelihood ratios (LR+ and LR−), or positive and negative predictive values (PPV, NPV). Many studies covered in this review reported at least some conditional measures, for at least some VOMS-based scores.
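For readers less familiar with likelihood ratios, the standard definitions are LR+ = sensitivity ÷ (1 − specificity) and LR− = (1 − sensitivity) ÷ specificity. The Python sketch below applies them; the function name and input values are hypothetical, and returning infinity at the boundary is a choice made for this illustration.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivity and specificity must be in [0, 1]")
    # LR+ is unbounded when specificity is perfect (no false positives).
    lr_pos = math.inf if specificity == 1.0 else sensitivity / (1.0 - specificity)
    # LR- is unbounded when specificity is zero (no true negatives).
    lr_neg = math.inf if specificity == 0.0 else (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Usage with made-up values: sensitivity 0.80, specificity 0.90.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Unlike predictive values, likelihood ratios are computed within the concussed and non-concussed groups separately, so they do not depend on the ratio of the two groups in a study.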
For this review's endpoint of concussion diagnosis, PPV and NPV were not calculated. Any predictive value results appearing in the studies’ reports are not reproduced, because all studies that included VOMS assessment in non-concussed subjects used methods (“two-gate eligibility”) that artificially set prevalence. Artificial setting of prevalence translates into artificially created ratio of subjects with and without concussion, rendering non-useful any calculations for predictive values.30,31 (This limitation does not apply to the endpoint of VOMS prediction of recovery time, since all cases have concussion at study entry; predictive values are thus reported for the secondary endpoint and appear in Supplement 5.)
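The dependence of predictive values on prevalence can be shown with Bayes' theorem. The Python sketch below holds sensitivity and specificity fixed and varies prevalence; the function name and all numbers are hypothetical and are not drawn from any reviewed study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test (sensitivity 0.80, specificity 0.90) at two prevalences:
# 0.50 mimics a design enrolling equal numbers of cases and controls,
# 0.10 is an arbitrary lower value chosen for contrast.
for prevalence in (0.50, 0.10):
    ppv, npv = predictive_values(0.80, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With identical sensitivity and specificity, PPV falls substantially as prevalence falls, which is why predictive values computed under a design-determined case-to-control ratio do not transfer to clinical populations.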
Meta-analysis
The heterogeneity in VOMS application precluded the planned MA approach of calculating pooled estimates for key diagnostic performance measures. MA techniques such as hierarchical summary ROC (HSROC) are potentially useful to handle studies defining VOMS positivity at different scoring cutoffs, but HSROC is not able to effectively handle differences in the VOMS itself (e.g., as occur when studies utilize different sets of VOMS domains). With the limitation that there was an insufficient study n to allow formal HSROC, the available data were used to construct a preliminary (hypothesis-generating) HSROC curve which is reported in Supplement 4.
Studies of diagnostic test performance are characterized by higher heterogeneity than interventional studies. Summary measures such as I² are not recommended since such measures fail to account for key sources of heterogeneity in diagnostic test studies. 32 Despite agreement that I² is not applicable for MA of DTA studies, there is no consensus on what measures (if any) could replace it. 23 Thus no formal measures of heterogeneity are reported in this review.
There were insufficient data in this review to allow for stratified or grouped analysis. There were also insufficient data to execute MA for the secondary endpoints addressing VOMS prediction of delayed recovery from concussion.
Supplement 3: extended evaluation of reviewed studies’ methodology
For purposes of clarity and brevity (i.e., to avoid generating methodology scoring tables with impractically large numbers of cells) the methodological review is organized by issue, with grouped assessment of studies sharing similar characteristics with regard to the issue in question. The methodological review follows the listing of QUADAS and PRISMA-DTA issues as outlined in the review's main-body Tables 3 and 4.
Study subject accrual and spectrum bias
With regard to study subject accrual, potential methodological issues arise in both the concussion cases and the non-concussed (control) cases. Some of the issues may limit test performance characteristics that can be reliably calculated, and other issues may affect generalizability of results.
With regard to non-concussed control cases, there were nine studies1,5,6,9–11,13,14,17 that reported results in concussed and non-concussed cases. For one study 14 the only available report was a conference abstract which presented limited methods and results: the abstract's main endpoint was VOMS prediction of delayed concussion recovery, but the authors also reported negative predictive value (NPV), thus implying VOMS had been applied in both concussed and non-concussed cases. No further information is available for this study.
Of the other eight studies reporting on both concussed and non-concussed subjects, five1,6,9,10,13 generated concussed and control groups by employing a “two-gate” approach (i.e., subjects were selected by two gateways into the study, with selection into non-concussed or concussed groups based on known disease status). The three remaining5,11,17 studies (of the eight assessing both concussed and non-concussed cases) used a model by which concussed cases served as their own controls; pre-injury VOMS (i.e., administered before any chance of concussion) formed a baseline of non-concussed status against which the (same) subjects’ post-injury VOMS scores were assessed.
The accrual of non-concussed controls using either the two-gate or the “before and after” approach is not a methodological weakness per se, but the artificial dictation of disease prevalence has implications for DTA analysis (see Supplement 2). There were no studies in this review's evidence base that assessed VOMS in a single group, comprising subjects with and also without concussion, so no inferences can be drawn about VOMS’ positive predictive value (PPV) or NPV for concussion.
For concussed cases, the spectrum bias issue is related to the concussion populations comprising study subjects in the evidence base. Considered overall, the study set assessed in this review comprised populations of healthy high-school and college athletes. There are few if any data addressing the question as to whether VOMS performance in these young, healthy patients would be matched by VOMS performance in other groups. Available data suggest that assessment of VOMS in disparate populations would be useful. For example, in discussing relatively poorer VOMS results in their study, Buttner 9 pointed out that their subjects were ED cases (i.e., not those evaluated on the field or in a sports-medicine clinic) and were older (16–38 years rather than 9–18 years in Mucha 1 ).
The fact that VOMS is intended for use in ambulatory subjects focuses the technique (and the studies in this review set) within a relatively narrow, similar range in the acuity spectrum. Within the range of ambulatory concussion, there is an acknowledged potential for differences in acuity. Since there are very few data stratifying VOMS by concussion acuity, the role of varying severity within the range of ambulatory concussion remains to be elucidated.
An additional source of potential bias with VOMS assessment in different studies is the timing of test administration. The initial VOMS report 1 study eligibility criteria required injured cases to be seen within three weeks of injury. In fact, in Mucha's study most subjects were seen much more quickly: 94% of subjects underwent initial VOMS within two weeks of injury and over half had their first VOMS within the first post-injury week. 1 Despite the fact that the initial VOMS was usually performed more rapidly than the study's inclusion criterion of a three-week window, it remains the case that the initial VOMS report's 21-day window was broader than the eligibility criteria for any of the other studies assessed in this review (see Supplement 3 Table 1).
Timing of initial VOMS can be important since those with delayed presentation may represent higher-acuity (more persistent) concussion. Table 3 shows inter-study similarity in mean post-injury days (2.2–7.3) to initial VOMS. There remains a possibility that differences in time interval from head injury to first VOMS assessment could confound synthesis of VOMS performance results. This issue has been highlighted by Buttner, 9 who pointed out that VOMS may have different diagnostic performance in higher-acuity (delayed presentation) patients.
Sufrinko 4 found that mean days from injury to initial VOMS was not associated with risk of prolonged recovery. Conversely, Knell 12 found that cases with prolonged recovery had a significantly (p < .001) longer interval from injury to initial presentation (2 days vs. 1 day). Elbin18 5 reported differential VOMS classification in cases presenting within a week of injury, as compared to those presenting later.
The issue with regard to post-injury VOMS performance may be less one of timing, than of symptom level. One shortcoming of the evidence is that the data do not allow for precise stratification of VOMS performance depending on subject symptom level.
With regard to study subjects, one potential issue for DTA studies is the clarification of the unit of analysis. 23 The studies in this review used the subject as the unit of analysis, and this was always clear.
Misclassification and disease progression bias
A standard consensus 20 diagnosis for concussion (SRC) was applied in most of the studies covered in this review. The CARE registry, which provided data for at least four studies, uses essentially the same definition of concussion 11 : “change in brain function after a force to the head, which may be accompanied by temporary loss of consciousness, but is identified in awake individuals with measures of neurologic and cognitive dysfunction.” The definition of “concussion” is both consistent and broad.
For patients who sustain a head injury, identifying presence of any neurologic or cognitive dysfunction is a straightforward (if perhaps nonspecific) diagnostic method. Misclassification bias – diagnosing concussion where it should not be diagnosed – does not appear to be a likely source of error in VOMS studies. Since the VOMS studies in this review set relied on concussion diagnosis by clinicians (either on the playing field or in the clinic setting), the chance of systematic bias from misdiagnosis seems low. Furthermore, the chance of misclassifying non-concussed cases as concussion is also low; the “non-concussed” subjects were either evaluated in pre-season physical exams or selected from the community and had no head injury.
Where misclassification could be an issue is in the realm of differential presentation to the clinical setting in which VOMS is performed. It is conceivable that subjects sustaining a head injury and diagnosed (correctly) on the playing field as being concussed could have symptom resolution by the time they present for initial VOMS evaluation. This “disease progression” bias must be acknowledged as a potential source of error in either direction.
If correctly diagnosed concussion resolves by the time initial VOMS is performed, there is a high chance of bias against the performance of VOMS in identifying concussion (which is in fact already resolved). This risk would seem to increase with the time elapsed from initial injury (and concussion diagnosis leading to study eligibility) to initial VOMS. On the other hand, there is a suggestion that cases presenting to concussion clinics after longer delays have more serious concussion. 12 There is also a suggestion that VOMS diagnostic performance differs with varying time since injury. 5 Perhaps cases diagnosed on the field with concussion, who have concussion findings that are resolving over time, have less-severe disease when they present in delayed fashion for VOMS. In such cases, the longer the delay between injury and VOMS, the more the concussion symptoms have diminished as part of the natural history of the condition; more diminished concussion should be considered likely to be more difficult to detect.
There are thus possible sources of both misclassification bias (if concussion has resolved completely by the time of initial VOMS) and spectrum bias in either direction (more severe concussion being seen later, or concussion symptoms nearly resolving in later-seen cases). The existing evidence base does not contain sufficient information to fully judge the potential for these types of bias; for example, few studies describe in detail any independent, blinded confirmation of the diagnosis of concussion before VOMS is performed. Relying on clinical diagnosis of concussion is both practical and reasonable, but the studies in this review set leave open the possibility of differential classification related to whether concussion is in fact present at the time of VOMS.
Partial & differential verification bias
There was no partial verification bias in the study set. All concussed subjects in all studies had a formal clinical diagnosis.
The same general (broad) definition of concussion was used in all studies. The concussion diagnosis was made before the relevant VOMS testing was performed. There was therefore no risk of differential verification bias.
Incorporation bias
Since concussion was diagnosed independently of (and temporally separate from) VOMS, there was low likelihood of incorporation bias. The possibility of informal incorporation bias is nonetheless acknowledged for all studies in this review set: symptoms exacerbated by VOMS maneuvers (e.g., pronounced worsening of headache symptoms with head movement) could have played a role in prompting concussion diagnosis before VOMS was performed. This risk is unavoidable, but the temporal independence of concussion diagnosis and formal VOMS assessment renders important incorporation bias unlikely.
Information bias
Concussion diagnosis and VOMS testing were performed separately and independently. There is no concern for information bias due to differential VOMS result availability prior to concussion diagnosis.
Based on the available methodologic reporting in all studies reviewed, it appears that in all studies the individual administering the VOMS was likely aware of the subject's concussion status. Most studies enrolled only concussed patients. For those studies that incorporated healthy (non-concussed) controls, the non-concussed status of these patients would have been rendered known by the sampling method (e.g., VOMS administration as part of routine physical exam). 1
The awareness of concussion status could pose a threat of confirmation bias. The direction of the bias would be expected to be in favor of VOMS’ diagnostic performance. However, since VOMS symptom scores are patient-reported, and since patients would not be expected to be aware of various “expected” scores for concussion, this form of information bias is an unlikely source of substantial methodological limitation to the VOMS studies.
A final form of information bias to consider in DTA studies relates to availability of clinical data in the study setting as compared to the real-world setting. With the caveat that the most important piece of clinical information – the concussion status – was likely known to VOMS testers, there were no other likely sources of information bias relevant to data availability in study vs. clinical setting.
Selection and exclusion bias
Exclusion bias could result from actions at two time points. The first point entails selection bias (e.g., VOMS application in all eligible subjects). The second time point is post-accrual, and entails exclusion of cases from analysis.
While four studies1,4,5,16 provided information explicitly outlining consecutive enrollment, the other studies’ methodology section wording was less definitive as to whether sampling was consecutive. The four studies8,11,13,17 accruing cases from the nationwide CARE registry would not have been able to control sampling. Other studies in this review set were not explicit in stating consecutive sampling.
The question for a review is whether VOMS studies’ convenience sampling could be substantially problematic. It is possible that VOMS could be more difficult to use in some cases, but this difficulty was not encountered in the studies that did employ consecutive sampling, and there were few if any issues with subjects’ inability to complete VOMS and provide interpretable (definitive) results. There is thus no clear concern over selection bias resulting from convenience sampling.
The second time point at which exclusion can become an issue for VOMS studies, is in the more traditional realm of exclusion bias: excluding cases that have been accrued into the study. Drop-outs and excluded data should be specifically counted and explained, and this information was explicitly reported in nearly all studies.1–6,8–17
Post-accrual exclusion may have been an issue in two studies7,14 in which there were potential areas of minor tabulation discrepancies 7 or incomplete reporting due to the study's being reported only as a conference abstract. 14 These issues do not appear likely to influence the results of this review.
In two studies2,10 there was a related issue of exclusion of cases as outliers: one 10 excluded 16% of the dataset's concussion cases and the other 2 excluded a single outlier case. Based on limited evidence from at least one study 12 reporting sensitivity analysis results demonstrating no outlier effects, it seems unlikely this review's conclusions are substantially affected by these exclusions.
A priori establishment of study objectives and clearly defined VOMS cutoffs
All studies in this review set had clearly defined pre-specified objectives. In some cases, the pre-specified objectives did not include predefined VOMS cutoffs, but this was due to the fact that study goals often included elucidation of preferred VOMS cutoffs. Differences in VOMS cutoffs have been discussed elsewhere in this review; the problems posed relate less to internal validity than to the difficulty of drawing generalized conclusions from studies using different VOMS calculation methods and cutoffs.
VOMS administration
For VOMS to be fairly judged as a predictive test, the testing procedures should be consistent for all subjects in all locations. In a sense, it would be ideal if only one person executed the VOMS testing for each study. This was not the case in any study in this review. In seven5,8,10,11,13,15,17 of this review's studies, subjects were drawn from multiple centers. In all remaining studies, VOMS was not always performed by the same examiner across study subjects.
There were ten VOMS studies that were single-center. The initial Mucha report 1 described use of VOMS by physical therapists. Sufrinko 4 used neuropsychologists to assess VOMS. Knell 12 employed a team of six physicians, neuropsychologists, nurse practitioners, and certified athletic trainers to administer VOMS. Buttner 9 also used a specially trained team of investigators (varying backgrounds) to assess VOMS. The other six2,3,6,7,14,16 single-center studies noted VOMS was assessed in the clinical setting, without providing details on the assessors.
The VOMS is simple to administer. All studies comprising this review's evidence set cited the original work by Mucha 1 as their model for VOMS performance. The VOMS testing procedure is relatively straightforward, but detailed explanation of each component is beyond the scope of this review; such details (with illustrations) are available in the original Mucha publication. Since the initial VOMS description, there has been assessment of the validity of the VOMS procedure; these studies (in healthy adults) are not subject to discussion in this review but have demonstrated acceptable psychometric properties (internal consistency and test-retest reliability). 16 With some allowances for differing investigator preferences on VOMS components or scoring (as discussed elsewhere in this review), there seems to be low likelihood that the results of this review are affected by the VOMS testing procedures or personnel.
Intellectual or financial conflicts of interest
The VOMS is not a proprietary test. From its inception VOMS has been described as a simple, free method to rapidly make a diagnosis that otherwise requires complicated (or expensive) equipment. Nine studies1,2,5–7,10,11,13,17 included acknowledgments of potential conflicts of interest; none appeared likely to have caused undue influence on results.
While neither a conflict of interest nor necessarily a source of bias, it is noteworthy that a large proportion of the VOMS evidence base has been produced by the same centers and investigators. UPMC, where VOMS was developed and first described, features as an author institutional affiliation in 11 of this review set's 17 studies.1,2,4–8,10,11,13,17 Full validation of VOMS’ widespread utility will be advanced with broader extension of the evidence base.
Reporting of DTA results
In addition to reporting issues discussed elsewhere, the primary considerations recommended for review (particularly by PRISMA-DTA) 23 are inclusion of 2 × 2 classification table data and incorporation of clinical implications. The 2 × 2 classification table data were often unavailable in this review's study set, but in many cases this was due to the fact that there were no non-concussed cases. Absence of full 2 × 2 classification table data was not necessarily a limitation of an individual study, but lack of access to such data does pose challenges to a summary MA. For either the primary or secondary endpoints, the review's evidence set did not reach the minimum number (four) of studies required for use of the Stata diagnostic test MA commands.
In the arena of clinical implication, the studies in the review set all performed well. In all studies, potential application of the VOMS was addressed. The majority of results reports included endorsement of the VOMS use in some form, for some endpoint, but even for the studies that found VOMS was of limited use there were clearly stated clinical implications (i.e., the need for development of reliable testing for concussion).
Supplement 4: extended results
Descriptive results for VOMS domain means and medians
Supplement 4 Table 1 summarizes results from the studies’ reported means (with 95% CI where available) and medians (with IQR where available). Only one study (Ferris22 17 ) reported CIs around medians. When presented, medians’ dispersion was usually noted as IQR so this review follows that approach.
Extended calculations for MA of VOMS scored as a continuous variable
Supplement 4 Table 2 provides results complementing the MA forest plot for VOMS as a continuous variable, with a focus on effect size (i.e., the degree to which domain scores are higher in concussed vs. non-concussed cases). The table includes I² results, with the general guide that values exceeding 75% likely indicate unacceptable heterogeneity.
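As an illustration of how the I² statistic relates to between-study variation, the following minimal Python sketch derives I² from Cochran's Q under inverse-variance weighting. The effect sizes and variances are hypothetical values chosen for illustration, not data from this review, and the function name is an assumption.

```python
def i_squared(effects, variances):
    """Return I² (as a percentage) from study effect sizes and their variances.

    Uses Cochran's Q with inverse-variance (fixed-effect) weights.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0  # no observed dispersion around the pooled estimate
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical standardized mean differences and variances (illustration only)
i2 = i_squared([0.2, 0.8, 1.5], [0.04, 0.04, 0.04])
```

With these hypothetical inputs, I² exceeds the 75% guide value, which is the kind of result that would argue against reporting a pooled estimate.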
Multi-domain VOMS: sensitivity, specificity, AUROC, and additional measures
For multi-domain (at least four domains) VOMS, the review's main-body results section (Figure 2 and Figure 3) plots paired sensitivity and specificity as well as AUROC. Data for these figures and other multi-domain DTA parameters are provided in Supplement 4 Table 3.
Most of the values in Supplement 4 Table 3 are reported in the original studies. In some cases, such as diagnostic odds ratio (Dx OR), the table's values were calculated from raw data contained within the original reports. (The Dx OR is the ratio of odds of VOMS positivity in concussion, to odds of VOMS positivity in non-concussion.)
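As a minimal sketch of how a Dx OR can be calculated from 2 × 2 classification counts, the following Python fragment applies the definition given above and adds a conventional log-scale 95% CI. The counts are hypothetical (not taken from any study in this review), and the function and variable names are illustrative assumptions.

```python
import math

def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Return (Dx OR, lower 95% CI, upper 95% CI) from 2 x 2 counts.

    Assumes all four cells are non-zero; a table with an empty cell
    would need a continuity correction, which is not applied here.
    """
    # Odds of VOMS positivity in concussion / odds of positivity in non-concussion
    dor = (tp / fn) / (fp / tn)
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - 1.96 * se_log)
    upper = math.exp(math.log(dor) + 1.96 * se_log)
    return dor, lower, upper

# Hypothetical counts for illustration only
dor, lower, upper = diagnostic_odds_ratio(tp=40, fp=10, fn=10, tn=40)
```

The width of the resulting interval depends heavily on the smallest cell count, which is one reason small studies yield imprecise Dx OR estimates.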
Hierarchical summary receiver operator curve (HSROC) for multi-domain VOMS identification of concussion
The Stata module metandi (MA for diagnostic tests),25,26 which employs a mixed-effects approach for binomial modeling, requires at least four studies with full 2 × 2 classification data. Since fewer than four studies met strict criteria for assessment in MA, calculations and plotting were performed only on an exploratory basis and are reported only in this supplementary results section. The data may be useful for planning of future studies (e.g., for sample size calculations).
The data sources for the four sets of 2 × 2 classification data came from three studies: Elbin18 5 , Buttner 9 , and Elbin21 10 . One study (Elbin21) used a net-change scoring method that did not incorporate one of the VOMS components (NPCcm). Another study, Elbin18, contributed two “points” to the SROC analysis. Elbin18 reported 2 × 2 classification data for both total and change score techniques.
It must be explicitly acknowledged that defining the two (non-independent) scoring approaches from one study (Elbin18) as providing two 2 × 2 classification tables (i.e., two circles on the HSROC curve) is inconsistent with expert recommendations 22 that a single summary ROC (SROC) curve should not include two cutoffs from the same study. It is for this reason that the HSROC results are considered exploratory.
With caution recommended due to different VOMS components and double use of results from one study, the HSROC findings are summarized in Supplement 4 Table 4. These preliminary results suggest a relatively high VOMS sensitivity. There is an additional suggestion that specificity (negativity of VOMS in absence of concussion) is less favorable than sensitivity. The Dx OR estimates may indicate clinical promise: the odds of VOMS positivity in concussion are 10-fold the odds of VOMS positivity in non-concussion.
All of the HSROC preliminary findings come with the caveat that (as indicated by broad 95% CIs) precision is too low for robust conclusions. The imprecision is also manifest in the HSROC curve, which is shown in Supplement 4 Figure 1.
Single-domain VOMS: sensitivity, specificity, AUROC, and additional measures
The 17 studies’ results for individual VOMS domains are presented in Supplement 4 Table 5. Results are organized by domain; within domains results are ordered by cutoff score defining positivity, and then by total vs. change score method. Organization of results in this fashion facilitates visualization of different studies’ results for identical domain predictors. This allows for quick visualization of which domains and scoring methods have more supporting data, and also which domains are characterized by disparate findings in different studies (e.g., for total-scored SmoothPurs >0 the Mucha 1 AUROC of .62 was far lower than, and had a 95% CI that did not overlap with, the Kontos 13 AUROC of .90).
Supplement 5: VOMS prediction of prolonged concussion recovery
An initially planned secondary aim, focused on cases that did have concussion, was to determine whether VOMS could be useful to identify those who ultimately had a delayed return to normal (pre-injury) functional status. Paucity of data amenable to synthesis precluded substantive exploration of the delayed-recovery endpoint, but it was judged that the available information and findings were sufficient to warrant inclusion in a Supplement. The findings are presented here, separated into one section for the categorical endpoint (delayed vs. non-delayed concussion recovery) and one section for recovery days assessed as a continuous variable.
Assessment of prolonged concussion recovery as a dichotomous outcome
With regard to VOMS use to predict prolonged concussion recovery, the relevant evidence base was far more limited than that for VOMS concussion diagnosis. No studies reported full 2 × 2 table data for the dichotomous endpoint of delayed return to normal. The information that was available varied widely, and due to paucity of data and methodological heterogeneity the evidence was not amenable to synthesis or MA.
Four studies4,7,12,14 assessed delayed return to normal as a categorical outcome (all defined delay as 30+ days). The outcome of recovery days as a continuous variable was reported in five studies.3,8,12,14,17 Using delayed recovery as a categorical outcome, prediction models were often multivariable (with different covariates in different studies). No two studies reported the same results measure for the same VOMS domain.
In assessment of delayed concussion recovery as a dichotomous outcome, Sufrinko 4 and Eagle 7 reported ORs, Price 14 reported NPV, and Knell 12 reported sensitivity, specificity, AUROC, J, LR+, LR-, and predictive values. In a model that adjusted for non-VOMS predictors (post-injury days, headache history, post-concussion symptom score), Eagle reported that positive VMS was associated with a five-fold increase (OR 5.18, 95% CI 1.52–17.60) in odds of delayed recovery. (Although the OR as reported by Eagle remains valid as a ratio of odds, the incidence of delayed recovery exceeded the “rare-disease” requirements for the OR to be interpreted as representative of the risk ratio. 33 ) Supplement 5 Table 1 provides data from two studies’ DTA findings for VOMS prediction of delayed recovery.
Other than the VMS finding, Eagle reported that no other VOMS domains were associated with delayed recovery. 7 Non-utility of VOMS for prediction of delayed concussion recovery was also found by Sufrinko, 4 who reported no association between any VOMS domain and prolonged recovery.
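The caveat about interpreting an OR as a risk ratio when the outcome is common can be illustrated with a short Python sketch. The counts below are hypothetical (they are not Eagle's data) and were chosen only so that delayed recovery is frequent; the function name is an assumption.

```python
def odds_ratio_and_risk_ratio(exp_events, exp_nonevents, unexp_events, unexp_nonevents):
    """Return (odds ratio, risk ratio) for an exposed vs. unexposed comparison."""
    odds_ratio = (exp_events / exp_nonevents) / (unexp_events / unexp_nonevents)
    risk_exposed = exp_events / (exp_events + exp_nonevents)
    risk_unexposed = unexp_events / (unexp_events + unexp_nonevents)
    return odds_ratio, risk_exposed / risk_unexposed

# Hypothetical counts: test-positive 20 delayed / 20 not; test-negative 10 delayed / 50 not
or_value, rr_value = odds_ratio_and_risk_ratio(20, 20, 10, 50)
```

In this hypothetical table the OR is 5 while the risk ratio is 3, showing how the OR overstates the relative risk when the outcome is not rare.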
Knell's detailed results reporting included NPV of 88–91% for each of four levels of VOMS prediction stratified by sex and by overall-change cutoff (1 or 2). 12 Price's abstract, which did not provide methodologic details, reported a VOMS NPV of 81% (with no other results for categorical outcome of recovery delay). 14 Clinical utility of high NPVs, as suggested by Knell and Price data, is lessened by the tradeoff of high rates of false positives. As shown in Supplement 5 Table 1, specificity ranged from 10–29% in Knell's study (with accompanying PPV ranging from 21–34%). 12
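The tradeoff between a high NPV and a high false-positive rate can be sketched with Bayes' rule. The following Python fragment uses hypothetical sensitivity, specificity, and prevalence values (not figures reported by Knell or Price), and the function name is an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Hypothetical inputs: high sensitivity, low specificity, 25% prevalence of delayed recovery
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.20, prevalence=0.25)
```

Under these assumed inputs the NPV is high while the PPV is low, because most positive results arise from the large pool of patients who recover without delay.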
Assessment of recovery days as a continuous variable
Anzalone 3 reported that the number of positive VOMS domains was associated with the number of days required for concussion recovery (correlation coefficient 0.31). Anzalone also assessed hazard ratios of VOMS domains for association with recovery. Most individual domains were statistically significant in univariable modeling but none were significant in adjusted models. (Since no other studies used this methodology, the data are not reproduced here.)
In addition to paucity of data, methodological barriers to synthesis included inter-study variation in recovery times, as well as likelihood of skew in the continuous variable's distribution. Recovery-days reporting from four studies2,3,8,17 is shown in Supplement 5 Table 2. Only one study 17 reported central tendency as a median, and its findings (median of 13 with mean of 25) suggest inadequacy of reliance upon means for the recovery-days endpoint.
Supplemental Material
Supplemental material, sj-docx-1-ccn-10.1177_20597002231160941 for Vestibular/ocular motor screening (VOMS) score for identification of concussion in cases of non-severe head injury: A systematic review by Caroline E. Thomas, Stephen H. Thomas and Ben Bloom in Journal of Concussion
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
