Abstract
Study Design
This study is a scoping review.
Objective
There is a broad variability in the definition of degenerative cervical myelopathy (DCM) and no standardized set of diagnostic criteria to date.
Methods
We interrogated the Myelopathy.org database, a hand-indexed database of primary clinical studies conducted exclusively on DCM in humans between 2005 and 2021. The DCM inclusion criteria used in these studies were entered into 3 topic modeling algorithms: Hierarchical Dirichlet Process (HDP), Latent Dirichlet Allocation (LDA), and BERTopic. The emerging topics were subjected to manual labeling and interpretation.
Results
Of 1676 reports, 120 papers (7.16%) had well-defined inclusion criteria and were subjected to topic modeling. Four topics emerged from the HDP model: disturbance from extremity weakness and motor signs; fine-motor and sensory disturbance of upper extremity; a combination of imaging and clinical findings is required for the diagnosis; and “reinforcing” (or modifying) factors that can aid in the diagnosis in borderline cases. The LDA model showed the following topics: disturbance to the patient is required for the diagnosis; reinforcing factors can aid in the diagnosis in borderline cases; clinical findings from the extremities; and a combination of imaging and clinical findings is required for the diagnosis. BERTopic identified the following topics: imaging abnormality, typical clinical features, range of objective criteria, and presence of clinical findings.
Conclusions
This review provides quantitative evidence that only a minority of past studies in DCM provided meaningful inclusion criteria. The items and patterns found here can inform the development of diagnostic criteria for DCM.
Introduction
Background
Degenerative cervical myelopathy (DCM) is the most common non-traumatic cause of cervical spinal cord dysfunction in adults worldwide.1-5 During the last 2 decades, there has been an exponential increase in the number of published studies regarding DCM.6,7 This is not surprising considering that recent estimates suggest DCM could affect 2.3% of a healthy population.3,4,8,9 Furthermore, DCM can cause significant disability, result in loss of independence and impose a substantial financial burden.
Timely diagnosis has emerged as a key priority for DCM, given that duration of symptoms is currently one of the only modifiable predictors of surgical outcomes.10-14 Recent estimates suggest that up to 90% of DCM is undiagnosed,15-18 and that individuals may wait 2 to 5 years 19 to obtain a correct diagnosis. Despite these shortcomings, there is still no agreement in the literature on what constitutes DCM or even a standardized nomenclature in the field. This is a challenge in both a research setting and clinical practice and may impede the education of health care providers who first encounter such patients, such as primary care physicians and allied professionals.
Reviewing the inclusion criteria used for clinical research studies may offer a perspective on how DCM should be defined. Previous analysis used to inform the consensus adoption of an index term for DCM and form a minimum dataset suggests that the criteria used to enroll patients into research studies are heterogeneous and may vary substantially.20,21 However, aggregating this content and employing new natural language processing (NLP) methods could offer collective insights into how the field diagnoses DCM.
The authors of the current work hypothesize that DCM criteria used over the past decades vary widely in content and precision, since there is no standardized definition for DCM. However, common patterns in inclusion criteria from previous studies may be used to better define DCM. The objective of this study is (i) to demonstrate the infrequent use of well-defined and reproducible DCM criteria and (ii) to scope the literature and apply topic modeling methodology to map the patterns that have been used as inclusion criteria in primary DCM studies. Essentially, the reported inclusion criteria will be used as a “proxy” to reflect existing DCM diagnostic criteria used in medical literature.
Methods
Eligibility Criteria and Information Sources
For this work, primary human studies that included 15 or more subjects with DCM qualified for inclusion. Studies were only considered if they utilized predetermined clinical, radiological, and/or surgical criteria in order to select their patient population. Studies were excluded if they met any of the following criteria: (1) no inclusion or exclusion criteria were provided in the methods section, (2) the authors provided relevant clinical features of their patients in the methods section but did not specifically report inclusion criteria, (3) the study summarized inclusion criteria but did not distinguish between patients with traumatic vs non-traumatic myelopathy or between patients with myelopathy vs radiculopathy, (4) only nonspecific inclusion criteria were provided (tautological, eg, “signs and symptoms typical for DCM”), (5) the study only reported a standardized scale (eg, the modified Japanese Orthopedic Association [mJOA] score) as a way of qualifying patients’ neurologic function, or (6) the study was a review article, animal study, survey, editorial or commentary. If studies were based on data from the same dataset, only one was included in this scoping review. Lastly, only reports in the English language were considered. This review aligns with research Priority 3: Diagnostic Criteria of AO Spine RECODE-DCM. 17 Since this is a scoping review and no identifiable patient information was accessed, informed consent and institutional review board approval were not necessary.
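The eligibility rules in this section can be read as a simple screening predicate. The sketch below is only an illustration of that logic: the `Report` record and its field names are hypothetical and are not the screening tool used by the reviewers, and the rule about studies sharing a dataset is not modeled.

```python
from dataclasses import dataclass


# Hypothetical screening record; field names are illustrative assumptions.
@dataclass
class Report:
    n_dcm_subjects: int
    is_primary_human_study: bool      # not a review, animal study, survey, editorial
    in_english: bool
    states_inclusion_criteria: bool   # exclusion reasons (1) and (2)
    separates_dcm_from_mimics: bool   # reason (3): trauma and radiculopathy distinguished
    criteria_are_specific: bool       # reason (4): criteria are not tautological
    relies_only_on_scale: bool        # reason (5): eg, only an mJOA score is reported


def is_eligible(r: Report) -> bool:
    """Return True when a report passes every screening rule modeled here."""
    return (
        r.n_dcm_subjects >= 15
        and r.is_primary_human_study
        and r.in_english
        and r.states_inclusion_criteria
        and r.separates_dcm_from_mimics
        and r.criteria_are_specific
        and not r.relies_only_on_scale
    )


well_defined = Report(40, True, True, True, True, True, False)
scale_only = Report(40, True, True, True, True, True, True)
too_small = Report(12, True, True, True, True, True, False)

print(is_eligible(well_defined), is_eligible(scale_only), is_eligible(too_small))
```

Expressing each exclusion reason as a separate flag keeps the reason for rejection visible, which is what allows the most common reasons for exclusion to be tallied afterwards.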
Search Strategy
Myelopathy.org is a scientific and clinical charity dedicated to transforming outcomes in DCM through global research initiatives and by raising awareness among the public and health care professionals. Based on highly sensitive search filters for MEDLINE 22 and EMBASE, 23 a dataset of primary clinical articles exclusively on DCM was developed for the purpose of revealing insights into the research field, and supporting research.6,7,24 For the purposes of this scoping review, this database was accessed in order to identify all DCM papers published between 2005 and 2021.
Study Selection
Based on the recognition that the abstracts would not contain the relevant information to judge the inclusion criteria, full-text review was completed to screen for eligible studies. The rationale for this was that the authors’ primary purpose was to identify any possible DCM definitions across all published primary studies. Such definitions, if provided by the authors of each study, were expected to be found in the methods section of each report. Hence, screening by title/abstract was not relevant for the objective of this scoping review. Before screening, all reviewers were trained with a teach-back method from the senior authors on how to identify eligible reports based on the predefined eligibility criteria. All papers were equally assigned to 7 independent reviewers. Any articles that were considered relevant by the reviewers were included. In order to assure relevancy, the definitions extracted from the included articles were reviewed by the senior authors.
Data Collection Process
The methods section of each report was screened and inclusion/exclusion criteria were extracted. A data collection tool was developed by the research team in order to standardize data extraction for studies that satisfied our inclusion criteria. Data obtained from each study included the following: name and country of first author, publication year, study design, sample size, inclusion/exclusion criteria and DCM definition. Due to the large number of studies, each report was reviewed by only one reviewer and one senior reviewer.
Data Preparation
The raw text of the inclusion criteria from each eligible study was collected. Parts of the criteria were deleted if they were specific for the purposes of a particular research study but not generalizable for the diagnosis of DCM. Specifically, parts that stated other neurological pathologies (eg, amyotrophic lateral sclerosis, multiple sclerosis) as exclusion criteria were removed. In individual studies, these exclusion criteria were used to create a more homogenous research population in order to avoid confounders. For the purpose of this scoping review, it was agreed that these parts of inclusion criteria were not useful and may introduce noise into the analysis. There is no fundamental reason why other neurological pathologies cannot develop in the presence of DCM. Furthermore, many studies excluded OPLL when screening for eligible patients despite the fact that OPLL is currently considered within the spectrum of DCM. Therefore, statements of exclusion of OPLL patients were also deleted for the purpose of this analysis.
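As a rough illustration of this cleaning step, the sketch below drops sentences that exclude other neurological pathologies or OPLL from a block of criteria text. The phrase list and the sentence-splitting rule are assumptions made for the example; the source describes the deletion as a manual decision and does not specify an automated procedure.

```python
import re

# Hypothetical phrase list: the text names ALS, multiple sclerosis and OPLL
# only as examples of statements that were removed.
NON_GENERALIZABLE = re.compile(
    r"amyotrophic lateral sclerosis|multiple sclerosis|\bOPLL\b"
    r"|ossification of the posterior longitudinal ligament",
    re.IGNORECASE,
)
EXCLUSION_CUE = re.compile(r"\bexclu", re.IGNORECASE)


def strip_study_specific_exclusions(criteria: str) -> str:
    """Drop sentences that exclude other neurological pathologies or OPLL."""
    # Naive sentence split on whitespace that follows a period or semicolon.
    sentences = re.split(r"(?<=[.;])\s+", criteria.strip())
    kept = [
        s for s in sentences
        if not (EXCLUSION_CUE.search(s) and NON_GENERALIZABLE.search(s))
    ]
    return " ".join(kept)


raw = (
    "Patients had gait disturbance and cord compression on MRI. "
    "Patients with multiple sclerosis were excluded. "
    "Patients with OPLL were excluded."
)
cleaned = strip_study_specific_exclusions(raw)
print(cleaned)
```

A sentence is removed only when it contains both an exclusion cue and one of the listed conditions, so a criterion that merely mentions OPLL as part of the DCM spectrum would be kept.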
Data Analysis
The authors used the following methods for data analysis: (1) Topic modeling with algorithms not based on transformers. Topic modeling is an unsupervised NLP method that identifies word and phrase patterns within texts and automatically clusters these patterns. The Hierarchical Dirichlet Process (HDP), a nonparametric Bayesian approach to clustering grouped data, was utilized and implemented through the gensim Python package. 25 (2) Topic modeling with transformer-based algorithms. BERTopic is a newer topic modeling algorithm that incorporates transformers, which are the current state-of-the-art approach for NLP. 26 Transformers can encode contextual information and do not require removal of stop words. 27 More details regarding these analyses have been included in the Supplemental Material.
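Topic models that are not based on transformers usually operate on a bag-of-words representation of each document, which is why stop-word removal matters for them. The sketch below shows that preprocessing step using only the Python standard library. It is not the authors' pipeline: the stop-word list, the tokenizer and the function names are assumptions for illustration, and the resulting counts are the kind of input that a package such as gensim would then receive.

```python
import re
from collections import Counter
from typing import List, Tuple

# Small illustrative stop-word list (an assumption; real pipelines use larger lists).
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "or", "the", "to", "was", "were", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def bag_of_words(docs: List[str]) -> Tuple[List[str], List[Counter]]:
    """Return a sorted vocabulary and one token-count Counter per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, counts


docs = [
    "Gait disturbance with cord compression on MRI",
    "Weakness of the upper extremity and cord compression",
]
vocab, counts = bag_of_words(docs)
print(vocab)
print(counts[0]["cord"], counts[1]["weakness"])
```

Word order is discarded in this representation, which is the contextual information that transformer-based methods are described above as retaining.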
Analysis of the Meaning of the Identified Topics
The topics that emerged from the above topic modeling algorithms were manually labeled. Specifically, the topic labels were determined based on the topic words and the sample vetting. The meaning of each topic label was interpreted by the senior authors and an overarching theme was selected for each topic using the domain expertise of a multi-stakeholder working group, the RECODE-DCM Diagnostic Criteria Incubator. The topics were named based on the content of each topic. The final determination was reached through a consensus process after several iterations of proposals.
Results
Scoping Review
In total, 1676 papers published between 2005 and 2021 were retrieved for consideration in this scoping review. Of these, 120 studies specified inclusion criteria and were eligible for topic modeling (Figure 1). The most common reasons for exclusion were the following: (1) no inclusion or exclusion criteria were provided in the methods section, (2) the study did not elaborate on the clinical definition of DCM, (3) the study summarized inclusion criteria, but did not distinguish between patients with traumatic vs non-traumatic myelopathy or between patients with myelopathy vs radiculopathy. Table 1 lists the title, journal and year of publication of each included study.
Figure 1. Diagram of the included studies.
Table 1. Title, Journal and Year of Publication of Each Included Study.
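The proportion of eligible studies follows directly from the two counts reported above. A minimal check of that arithmetic, with variable names chosen only for this example:

```python
retrieved = 1676  # papers retrieved for consideration (from the text)
included = 120    # papers with well-defined inclusion criteria (from the text)

excluded = retrieved - included
proportion = included / retrieved

print(excluded)
print(f"{proportion:.2%}")
```

This reproduces the 7.16% figure given in the abstract and is consistent with the statement that fewer than 10% of studies used a reproducible set of inclusion criteria.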
HDP Results
Four topics emerged from the HDP model and are presented as word clouds in Figure 2. Word clouds are graphical representations of words used in a particular context. The more a certain word or term is represented in a source text, the greater its prominence will be in the word cloud. For HDP Topic 0, disturbance from upper and lower extremity weakness in addition to motor signs emerged as dominant findings (ie, not sensory disturbance). For HDP Topic 1, fine-motor and sensory disturbance of the upper extremity were considered important. For HDP Topic 2, there was an equal representation of imaging (eg, compression, MRI) and clinical (eg, weakness, motor symptoms and signs) terms, suggesting that a combination of imaging and clinical findings is required for a diagnosis of DCM. The interpretation of HDP Topic 3 is more challenging given the variety of terms, including severity, progression, alignment, instability, objective findings (eg, MRI, clinical signs), and intramedullary. These terms appear unrelated at first; however, the emerging theme is that these might be “reinforcing” (or modifying) factors that can aid in the diagnosis of borderline cases. An additional LDA sensitivity analysis was conducted; its model topics are shown as word clouds (Figure 3), and it is described in more detail in the Supplemental Material. In Figures 2 and 3, larger words occurred proportionally more frequently across the included papers.
Figure 2. Word clouds produced for the HDP model topics.
Figure 3. Word clouds produced for the LDA model topics.
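The relationship between word frequency and word size in a word cloud can be sketched as a linear scaling of counts to font sizes. The function below is an assumption-laden illustration: the point-size range and the example tokens are invented for the sketch, and actual word cloud software may weight and scale words differently.

```python
from collections import Counter


def font_sizes(tokens, min_pt=10, max_pt=40):
    """Map each word to a font size that grows linearly with its count."""
    counts = Counter(tokens)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return {
        word: min_pt + (max_pt - min_pt) * (n - low) / span
        for word, n in counts.items()
    }


# Hypothetical tokens, not data from the included studies.
tokens = ["weakness", "weakness", "weakness", "mri", "mri", "gait"]
sizes = font_sizes(tokens)
print(sizes)
```

Under this scaling the most frequent word receives the largest size and the rarest the smallest, which matches how Figures 2 and 3 are meant to be read.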

BERTopic Results
BERTopic identified 4 topics (Figure 4). Topic 0 was automatically classified by the algorithm as “cord_compression_imaging” based on the most common words. This topic indicates that the presence of imaging findings is necessary for the diagnosis of DCM. Topic 1 was “weakness_signs_symptoms” and summarized the typical clinical features of DCM including both signs and symptoms. Topic 2 label was generated as “compression_cord_gait” and represents the range of objective criteria needed for the diagnosis of DCM. Finally, Topic 3 or “symptoms_bilateral_weakness” represented the need for the presence of clinical findings. Hierarchical clustering (Figure 5) showed proximity of topics 0 and 2 as well as topics 1 and 3. Topic clustering (Figure 6) showed significant overlap of topics 0 and 2. The distribution of topics over time (Figure 7) showed a volatility in topic 0. Similarly, the distribution of topics in the various journals (Figure 8) demonstrated a high frequency of topic 0 in journals dedicated to spine surgery.
Figure 4. Key characteristics of the topics that emerged with BERTopic.
Figure 5. Hierarchical clustering of the BERTopic model topics.
Figure 6. Clustering of the BERTopic model topics.
Figure 7. BERTopic model Topics over time.
Figure 8. BERTopic model distribution in journals.
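The reported proximity of topics 0 and 2 and of topics 1 and 3 is already visible in the automatically generated labels. The sketch below computes a Jaccard overlap between the label words of each pair of topics. This is a simplified stand-in chosen for illustration and is not how BERTopic derives its hierarchical clustering; only the four labels are taken from the text.

```python
from itertools import combinations

# Topic labels as reported in the text for the BERTopic model.
LABELS = {
    0: "cord_compression_imaging",
    1: "weakness_signs_symptoms",
    2: "compression_cord_gait",
    3: "symptoms_bilateral_weakness",
}


def jaccard(a: str, b: str) -> float:
    """Share of distinct label words that two topic labels have in common."""
    words_a, words_b = set(a.split("_")), set(b.split("_"))
    return len(words_a & words_b) / len(words_a | words_b)


overlap = {
    (i, j): jaccard(LABELS[i], LABELS[j]) for i, j in combinations(LABELS, 2)
}

for pair, score in overlap.items():
    print(pair, score)
```

Only the pairs (0, 2) and (1, 3) share any label words, each with an overlap of 0.5, which mirrors the two groupings described for Figure 5.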
Discussion
Degenerative cervical myelopathy is a common but poorly characterized non-traumatic spine disorder. Based on the findings of this scoping review, fewer than 10% of studies from the past decades used a reproducible set of inclusion criteria. The majority of the reviewed studies included generic statements about DCM, such as “patients with DCM were included”, which were considered ill-defined and not reproducible. Given this heterogeneity, there is a pressing need to develop a widely accepted and standardized definition of DCM and create criteria to support timely diagnosis of this condition. Diagnostic criteria will improve consistency among studies and strengthen the external validity of future research endeavors. In prior work by Nouri et al, the term DCM was introduced as an overarching definition to describe non-traumatic, degenerative pathologies of the cervical spine causing spinal cord impairment secondary to mechanical compression, including cervical spondylotic myelopathy, OPLL, hypertrophy of the ligamentum flavum and degenerative disc disease. 3 This was selected as the best index term for the condition, and a formal definition was created. 1 Unfortunately, however, there is currently a lack of validated, reproducible, and standardized clinical, radiological and/or surgical diagnostic criteria of this hypernym. This absence extends to the International Classification of Diseases, 11th Revision. 17 Therefore, recognizing and diagnosing DCM may pose a challenge to physicians. 28
One of the reasons why DCM is often misdiagnosed is that it can present with a variable combination of clinical signs and symptoms. These clinical manifestations include but are not limited to neck pain or stiffness, arm paresthesias, decreased hand dexterity, upper or lower extremity weakness, gait instability, positive Hoffmann sign, increased upper and/or lower extremity deep tendon reflexes, and urinary, bowel and sexual dysfunction.29-31 Notably, none of the aforementioned signs or symptoms is considered pathognomonic for the diagnosis of DCM. In addition, individuals with DCM may have atypical symptoms that have been associated with a particular level of spinal cord compression. 32 The incidence of these symptoms can be as high as 37% 32 and can further complicate the diagnostic process of DCM. Our analysis showed that motor impairment and fine-motor and sensory disturbance of the upper extremity were among the most common clinical criteria used to define DCM. These clinical symptoms reflect some of the items used in the mJOA score, which is one of the accepted gold standards for evaluating functional impairment in patients with DCM. Interestingly, clinical signs (eg, positive Hoffmann or Babinski sign) were less commonly used to define DCM, despite representing an objective means to assess for spinal cord compression. This finding is similar to the conclusions of a recent systematic review that suggested the most commonly used scales for assessing spinal cord function in DCM were more subjective and based on patient reports, including the JOA, mJOA, the Neck Disability Index, the Nurick tool and the Short Form 36 quality of life measure. In contrast, only 8% of the included studies assessed objective neurological findings. 29 In addition, the high prevalence of asymptomatic cases with incidentally found spinal canal stenosis and/or spinal cord compression poses another challenge in developing diagnostic criteria for DCM as these findings must be interpreted in the context of relevant clinical symptoms and signs. 33 Based on the results of this scoping review, the diagnostic algorithm of DCM may consider subjective assessments more than objective criteria.
The current scoping review offered significant insights into how DCM is diagnosed in the literature. It is evident that there was not a clear and consistent diagnostic algorithm used for identifying patients with DCM. Several approaches were used to analyze the data. The senior authors concluded that a quantitative analysis of all or some of the criteria would not be as powerful for providing contextual insights. Similarly, qualitative synthesis or mixed methods would be too time intensive and introduce subjective biases. Therefore, NLP with topic modeling was selected since it is a contemporary method that provides novel insights compared to traditional analysis methods.
Themes that emerged from this analysis were that imaging with cord compression and motor function of the upper and lower extremities were weighted heavily in the diagnosis of DCM. Interestingly, clinical signs (eg, Hoffmann’s) as well as sensory features (eg, pain or paresthesias) were less important in diagnosing DCM, despite their reputed specificity or significance to patients, respectively. The interpretation of the emerging topics provides insight into what should be included in diagnostic criteria for DCM. Based on topic analysis, the following concepts should be considered when developing DCM diagnostic criteria:
• weakness and motor signs are required for the diagnosis
• fine-motor and sensory disturbance of upper extremity is required for the diagnosis
• combination of imaging and clinical findings is required for the diagnosis
• disturbance to the patient from the symptoms is required for the diagnosis
• presence of clinical findings is required for the diagnosis
• description of “reinforcing” (or modifying) factors that can aid in the diagnosis in borderline cases, such as spinal instability, cord signal changes etc.
• description of imaging abnormality
• description of typical clinical features
• description of the range of objective criteria
The concept of “modifying factors” is particularly significant as the presence of these may help to reduce the number of missed patients when too much focus is placed on the classic presentation. These factors may also increase the weight of specific imaging findings pertinent to the condition.
The ultimate goal of this work was to summarize previously published definitions of DCM in order to later develop standardized diagnostic criteria for this condition. This endeavor aims to improve patient care by facilitating earlier diagnosis and treatment and providing a reference tool for primary care physicians, allied health professionals and other specialists that encounter DCM. In addition, diagnostic criteria will help to standardize future research studies, enhance the generalizability of results and increase external validity. To date, there has been no review that has identified commonly used inclusion criteria to screen for eligible research participants. Notably, there is tremendous variability with respect to what has been used as criteria for diagnosing DCM. Although identifying patients with DCM may be simple to some specialists, there is a significant proportion of patients who are diagnosed, and subsequently treated, in a delayed fashion. Given the annual admission rates of DCM have markedly increased over the last 2 decades, there is a pressing need to identify patients early in their disease course and refer them for definitive management.3,34,35 Unfortunately, DCM is often misdiagnosed, particularly in milder forms, with a time between symptom onset and diagnosis often surpassing 3 years. This delay in diagnosis undoubtedly increases disease burden and results in incomplete postoperative recovery, impaired quality of life and life-long disability. 33
In addition to the insights provided here, the development of DCM diagnostic criteria should consider additional factors. 36 First, the degree of cervical canal stenosis and cord compression do not always correlate with the severity of DCM. As such, diagnostic criteria must emphasize the need to interpret these imaging findings in the context of relevant signs and symptoms. Second, patients with milder symptoms and subtler signs of cord compression may not fully meet criteria for diagnosis of DCM and should be classified into categories such as possible, probable or conditional. Finally, another important consideration is that each criterion must be well-defined in order to reduce ambiguity and variability in interpretation. For example, in the literature the definition of the term ‘weakness’ varied from subjective, functional impairment to an objective loss of muscle strength on the Medical Research Council 0-5 scale.
Limitations
This study has several limitations. First, the authors aimed to investigate DCM definitions in only the last 2 decades. This was decided due to the large number of publications on this topic. Second, due to the volume of existing reports, each publication was screened by a single author. However, all reviewers who screened the reports were trained in a teach-back manner in order to assure accuracy and reliability. Finally, objective evaluation metrics of the topic modeling process are not available. 27 Furthermore, there is no ground truth for the topic modeling process. There is also no assurance that the produced topics will be “informative or useful from a human point of view.” Topic modeling does, however, offer “interpretable, well represented and coherent groups of semantically similar documents”. While human interpretation of topics by domain experts is the standard, these methods carry the inherent limitations of subjective interpretation. However, the analysis of the data by a multi-disciplinary group with considerable expertise in the subject mitigates these shortcomings.
Conclusion
The current scoping review summarizes commonly used criteria for diagnosing DCM based on literature published in the last 2 decades. There is currently no universally accepted clinical definition of DCM. There is a pressing need to standardize nomenclature and develop diagnostic criteria for DCM in order to facilitate timely diagnosis of this condition and implement appropriate management strategies. This study constitutes the first step of an effort to create a validated and widely accepted definition of DCM and diagnostic criteria. This study further exemplifies how topic modeling can provide a novel way to gain insights from the literature.
Supplemental Material
Supplemental Material - Scoping Review with Topic Modeling on the Diagnostic Criteria for Degenerative Cervical Myelopathy
Supplemental Material for Scoping Review with Topic Modeling on the Diagnostic Criteria for Degenerative Cervical Myelopathy by Stavros Matsoukas, Carl Moritz Zipser, Freschta Zipser-Mohammadzada, Najmeh Kheram, Andrea Boraschi, Zhilin Jiang, Lindsay Tetreault, Michael G. Fehlings, Benjamin M. Davies, and Konstantinos Margetis in Global Spine Journal
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Stavros Matsoukas: none, Carl Moritz Zipser: grant from the Swiss Paraplegia Foundation (FoKo_2019_01) and a grant from the International Foundation for Research in Paraplegia (P 190), Freschta Zipser-Mohammadzada: none, Najmeh Kheram: none, Andrea Boraschi: Funded by the Swiss National Science Foundation through Project Nr. 205321_182683, Zhilin Jiang: none, Lindsay Tetreault: none, Michael G. Fehlings: none, Benjamin M. Davies: none, Konstantinos Margetis: none.
