Abstract
Background:
Impairment of higher language functions associated with natural spontaneous speech in multiple sclerosis (MS) remains underexplored.
Objectives:
We present a fully automated method for discriminating MS patients from healthy controls based on lexical and syntactic linguistic features.
Methods:
We enrolled 120 individuals with MS with Expanded Disability Status Scale scores ranging from 1 to 6.5 and 120 age-, sex-, and education-matched healthy controls. Linguistic analysis was performed with fully automated methods based on automatic speech recognition and natural language processing techniques, using eight lexical and syntactic features acquired from spontaneous discourse. Fully automated annotations were compared with human annotations.
Results:
Compared with healthy controls, lexical impairment in MS consisted of an increase in content words (p = 0.037), a decrease in function words (p = 0.007), and overuse of verbs at the expense of nouns (p = 0.047), while syntactic impairment manifested as shorter utterance length (p = 0.002) and a lower number of coordinate clauses (p < 0.001). A fully automated language analysis approach enabled discrimination between MS and controls with an area under the curve of 0.70. A significant relationship was detected between shorter utterance length and lower symbol digit modalities test score (r = 0.25, p = 0.008). Strong associations between a majority of automatically and manually computed features were observed (r > 0.88, p < 0.001).
Conclusion:
Automated discourse analysis has the potential to provide an easy-to-implement and low-cost language-based biomarker of cognitive decline in MS for future clinical trials.
Introduction
Multiple sclerosis (MS) is an inflammatory and neurodegenerative disease characterized by demyelination of axons and by axonal and neuronal cell loss in the spinal cord and brain, including cortical gray matter.1–3 Alongside sensorimotor and other physical dysfunctions, cognitive decline represents a major cause of disability and affects a large proportion of patients with MS.4–6 Cognitive dysfunction has been reported in all MS phenotypes and may be present from the very early stage of the disease.6,7 The most affected cognitive domains include information processing speed, complex attention, memory, and executive function. 8 The impact of cognitive decline on quality of life can be enormous, affecting family, social, and/or professional status.9,10 However, cognitive relapses or worsening can occur independent of neurological worsening or subjective cognitive deficits,11–13 and therefore early, regular cognitive monitoring can help clinicians recognize MS disease activity and tailor treatment recommendations.
Language decline is one of the primary domains reflecting the potential progression of cognitive dysfunction. Indeed, a recent study has reported that 75% of MS individuals self-reported some form of language impairment. 14 However, the majority of research concerning language disorders in MS is based on verbal fluency or naming deficits. 15 These simple tests, though revealing certain linguistic problems,15–17 do not give a complete picture of potential language dysfunction in MS. It is thus conceivable that assessing natural speech production during spontaneous discourse, comprising both the syntactic and lexical components of language, may provide a novel and more comprehensive framework for uncovering potential linguistic deficits in MS than naming or fluency tests.
While the syntactic domain covers the rules of word, clause, and utterance order, including the relationships among sentence elements and the language’s grammar, the lexical domain reflects the vocabulary and meaning of particular words. However, how the lexical and syntactic language domains are affected in MS remains poorly understood. Only two previous studies, both based on small samples of MS patients, have reported a decline in language grammar, shorter sentence length, and limited syntactic complexity and vocabulary.18,19
Nowadays, progress in digital technologies provides new possibilities in language processing, including automatic speech recognition (ASR) and natural language processing (NLP) methods. ASR involves transforming spoken language into text format, which requires the application of statistical models and algorithms to transcribe audio recordings into written text. In contrast, NLP uses machine learning algorithms to analyze the structure of sentences, words, and phrases to gain an understanding of the context and meaning behind them. Combining these techniques makes it possible to comprehensively analyze potential language abnormalities in patients and identify early signs of cognitive deterioration. Indeed, language assessments have already been extensively studied in other neurodegenerative disorders, such as Alzheimer’s disease or mild cognitive impairment.20–22 Such objective language-based biomarkers may significantly benefit from automated analysis of data acquired even outside the laboratory environment and provide a screening tool to detect and monitor disease progression, helping professionals make necessary interventions 23 and improve patient outcomes. However, patients with MS may present not only with cognitive alterations but also with a motor speech disorder, 24 both of which might make it difficult for ASR and NLP techniques to transcribe and annotate speech correctly.
Importantly, the concept of progression independent of relapse activity (PIRA), which has emerged in recent years, is today considered a critical part of disease progression in MS patients.25,26 Unfortunately, PIRA still often goes undetected, not only because of the low sensitivity of monitoring tools but also because of challenges associated with assessment burden (staff, time, financial support). In this context, identifying new sensitive and automated tools for monitoring cognitive/language performance is extremely important. As the first step in this effort, the present pilot, proof-of-concept study aims to assess whether syntactic and lexical language abnormalities derived from natural language samples can be detected in MS based on objective, fully automated assessment using ASR and NLP techniques. An additional aim was to explore the sensitivity of discourse analysis in MS through a fully automated approach using state-of-the-art ASR and NLP techniques in comparison with manual text transcription and annotation.
Materials and methods
Study design and participants
Czech patients with a clinically confirmed diagnosis of definite MS according to the revised McDonald Criteria 2010 27 were consecutively recruited at the Charles University and General University Hospital, Prague, Czech Republic. The inclusion criteria for MS patients were (1) relapse-free state for at least 30 days prior to testing, (2) completion of at least elementary education lasting 8 years, and (3) no neurological disorder (e.g. stroke, epilepsy, Huntington’s disease) or communication disorder unrelated to MS (e.g. stuttering, aphasia, apraxia of speech) that would significantly interfere with the speech recording protocol. Each patient was ranked by a board-certified neurologist according to the Expanded Disability Status Scale (EDSS). 28 The neuropsychological assessment included the symbol digit modalities test (SDMT) 29 and the Rao adaptation of the Paced Auditory Serial Addition Test-3 (PASAT-3). 30 The SDMT assesses rapid information processing, working memory, and visual scanning, while the PASAT-3 represents a multifactorial analysis that assesses auditory information processing speed and flexibility, cognitive processing speed, sustained and divided attention, working memory functions, and calculation ability.31,32 The Beck Depression Inventory-Second Edition (BDI-II) was used for the assessment of depressive symptoms 33 and the Fatigue Severity Scale (FSS) for the evaluation of the impact of fatigue. 34
In addition, an age-, sex-, and education-matched healthy control group with no history of neurological or communication disorders was included solely to facilitate the evaluation of the severity of language disorder in MS.
Speech examination
Speech data were recorded in one session in a quiet room with low ambient noise using a head-mounted condenser microphone (Beyerdynamic Opus 55, Heilbronn, Germany) placed approximately 5 cm from the participant’s mouth. Recordings were sampled at 48 kHz with 16-bit resolution.
Each participant underwent a speech assessment guided by a trained speech professional (M.N., T.T., J.R.). Participants were instructed to perform spontaneous discourse on a neutral, freely chosen topic. If the examiner observed signs of anxiety or excitement, the task was repeated with another self-chosen topic. The content of the discourse task was monitored and categorized into seven topics, including description of (1) current day (19%), (2) hobby (15%), (3) holiday (15%), (4) past event (14%), (5) childhood (12%), (6) work (11%), and (7) others (14%). The mean length of recordings was 123 s [standard deviation (SD) 16], and the mean number of words was 216 (SD 64). The task’s length was comparable with previous studies on lexical and syntactic features in patients with dementia.22,35
Speech transcription and annotation
The speech recordings were transcribed into text files using the Google Cloud Speech-to-Text API, 36 chosen after a comparison of leading ASR software 37 in terms of reported accuracy, documentation quality, and implementation difficulty. The paralinguistic phenomena, including empty and filled pauses (e.g. ‘mmm’, ‘ehm’), repetitive speech phenomena (i.e. repetition of phonemes or syllables), and non-verbal phenomena (i.e. laughing, coughing), were excluded from the transcripts; these phenomena occurred rarely and did not substantially influence recording lengths. Text files were further processed using the MorphoDiTa tool 38 with the Czech language model available from the LINDAT/CLARIN center. 39 Each word in the text was labeled with the corresponding word type. If a non-existing word was detected, it was eliminated from further analyses.
In addition to the automated analysis, we also conducted manual transcription and annotation. A speech specialist (M.Š.) transcribed and annotated each audio recording manually.
Linguistic analysis
We proceeded with the analysis using eight linguistic features that have been studied in neurodegenerative diseases in previous literature.22,35,40,41 The criteria for feature selection were as follows: (1) covering complex aspects of the lexical domain, syntax, and distinctive patterns of cognitive impairment (i.e. each feature’s computational principle should be distinctive, so that low correlation among the parameters could be expected), and (2) the possibility of easy and robust implementation using available NLP tools to fully automate the analysis process.
We selected the following four features to cover the lexical part of the language: content words, 40 function words, 40 moving-average type–token ratio, 42 and reference rate to reality; 35 for moving-average type–token ratio the window size was set to 57 words in accordance with our available data set and recommendations for determining the subject’s vocabulary. In addition, we investigated four syntactic features: n-grams, 43 coordinate clauses, 41 subordinate clauses, 41 and mean length of utterance. 44 See Table 1 for a detailed feature description.
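The moving-average type–token ratio mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the study's analysis was run in MATLAB). The function name `mattr`, the whitespace tokenization in the usage note, and the fallback to a plain type–token ratio for texts shorter than one window are assumptions made only for illustration.

```python
from typing import List


def mattr(tokens: List[str], window: int = 57) -> float:
    """Moving-average type-token ratio (MATTR).

    The type-token ratio (unique words / all words) is computed in every
    window of `window` consecutive tokens, and the per-window ratios are
    averaged. The default of 57 mirrors the window size reported in the text.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    if window < 1:
        raise ValueError("window must be a positive integer")

    # Assumption: for texts shorter than one window, fall back to the
    # plain type-token ratio of the whole text.
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)

    ratios = []
    for start in range(len(tokens) - window + 1):
        chunk = tokens[start:start + window]
        ratios.append(len(set(chunk)) / window)
    return sum(ratios) / len(ratios)
```

In use, a transcript would first be split into word tokens (for example `transcript.lower().split()`, or lemmas from a morphological tagger) and passed to `mattr(tokens)`. Because each window has the same length, the resulting value does not depend on the total length of the recording, unlike a plain type–token ratio.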
Table 1. Description of lexical and syntactic features.
By accepting only a limited number of features, we lower the probability of a type II error and reduce potential overfitting in the regression analysis. The linguistic features were only weakly correlated (Pearson: |r| < 0.49), except for content words and function words (Pearson: r = –0.88); despite this high correlation, we preferred to retain both content words and function words for the completeness of the lexical analysis. The analysis was conducted in MATLAB (MathWorks, Natick, MA).
Statistical analysis
A one-way analysis of covariance (ANCOVA) was used to discriminate MS from the control group based on linguistic features. Effect sizes were determined using eta squared (η²), with η² > 0.009 indicating a small, η² > 0.059 a medium, and η² > 0.139 a large effect. The analysis was adjusted for the content of discourse (covariate). Partial correlation was used to assess the relationship between linguistic features and clinical data within the MS group, with age, sex, education, and content of discourse as covariates. Pearson correlation was applied to test for significant relationships between pairs of linguistic features obtained from the automated and manual data sets. The magnitude of agreement between features obtained from both data sets was measured as the root mean squared error normalized by the mean observed value (NRMSE). 45 All analyses used a two-tailed p < 0.05 threshold for statistical significance. Moreover, we performed a binary logistic regression followed by leave-one-subject-out cross-validation to evaluate the ability of linguistic feature combinations to differentiate between groups (i.e. accuracy, sensitivity, and specificity). Several different classification scenarios were evaluated, including classifiers based on the combination yielding the best accuracy. The subset of language features providing the best accuracy was searched automatically using a grid-search approach. Overall diagnostic accuracy was reported as the area under the curve (AUC), determined from the receiver operating characteristic curve.
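As an illustration of the agreement measure described above, the following Python sketch computes the root mean squared error normalized by the mean observed value. Treating the manually derived feature values as the "observed" reference and the automated values as "predicted" is an assumption, as are the function and argument names; the original analysis was carried out in MATLAB.

```python
import math
from typing import Sequence


def nrmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error normalized by the mean of the observed values.

    Assumption: `observed` holds the feature values from manual annotation
    and `predicted` the values from the fully automated pipeline, one pair
    per participant.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one pair of values is required")

    mean_squared_error = sum(
        (o - p) ** 2 for o, p in zip(observed, predicted)
    ) / len(observed)
    mean_observed = sum(observed) / len(observed)
    if mean_observed == 0:
        raise ValueError("mean of observed values is zero; NRMSE is undefined")

    return math.sqrt(mean_squared_error) / mean_observed
```

For example, `nrmse([10.0, 10.0], [9.0, 11.0])` gives an RMSE of 1 divided by a mean of 10. Normalizing by the mean makes the error comparable across features measured on different scales, such as word proportions and utterance lengths.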
Results
Participants
The MS group consisted of 120 participants (89 females) with a mean age of 43.8 (SD 10.9, range 18–74) years and a mean education of 14.7 (SD 2.9, range 10–22) years (see Table 2). A total of 94 patients were diagnosed with relapsing-remitting MS, 15 with secondary progressive MS, 8 with primary-progressive MS, and 3 with clinically isolated syndrome. MS patients had EDSS scores between 1.0 and 6.5. In addition, the healthy control group consisted of 120 age- and sex-matched participants (88 females) with a mean age of 45.8 (SD 17.6, range 18–73) years and a mean education of 15.0 (SD 2.9, range 8–23) years. No significant differences between the MS and control groups were found for age (p = 0.45), education (p = 0.39), or sex (p = 1).
Table 2. Clinical characteristics of MS patients.
Data are shown as mean (SD, range).
BDI-II, Beck Depression Inventory-Second Edition; EDSS, Expanded Disability Status Scale; FSS, Fatigue Severity Scale; PASAT-3, Paced Auditory Serial Addition Test-3; SD, standard deviation; SDMT, symbol digit modalities test.
Linguistic features and sensitivity analysis
Considering lexical features, the MS group was significantly discriminated from healthy controls by an increase in content words (p = 0.037, η² = 0.018) and a decrease in function words (p = 0.007, η² = 0.030; see Figure 1). Further word type analysis showed that the increase of content words in MS was predominantly caused by the significantly higher number of verbs in MS compared with controls (p = 0.003, η² = 0.037; see Table S1). A lower reference rate to reality in the MS group compared with healthy controls (p = 0.047, η² = 0.016) confirms the overuse of verbs at the expense of nouns. No significant difference was found between MS and healthy controls using moving-average type–token ratio (p = 0.22).

Figure 1. Results of linguistic analysis for lexical and syntactic features.
For syntactic features, the MS group showed shorter mean utterance length (p = 0.002, η² = 0.039) as well as a lower number of coordinate clauses (p < 0.001, η² = 0.086) compared with controls. No significant difference between MS and healthy controls was found for subordinate clauses (p = 0.43) or n-grams (p = 0.61).
Considering sensitivity analysis (see Figure 2), combining six features, including content words, function words, reference rate to reality, coordinate clauses, subordinate clauses, and mean length of utterance, led to the best achieved AUC of 0.70 (accuracy of 63.3%, specificity of 63.6%, and sensitivity of 63.1%).
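For readers less familiar with these metrics, the sketch below shows how AUC, accuracy, sensitivity, and specificity can be derived from held-out classifier outputs. It is a generic illustration and not the authors' code. It assumes that each participant has a predicted score from the fold in which they were left out, and that labels are coded 1 for MS and 0 for control. All names are hypothetical.

```python
from typing import Sequence, Tuple


def auc(scores_ms: Sequence[float], scores_control: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation.

    Equals the probability that a randomly chosen MS participant receives a
    higher score than a randomly chosen control (ties count as one half).
    """
    if not scores_ms or not scores_control:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for s_ms in scores_ms:
        for s_hc in scores_control:
            if s_ms > s_hc:
                wins += 1.0
            elif s_ms == s_hc:
                wins += 0.5
    return wins / (len(scores_ms) * len(scores_control))


def confusion_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) for labels 1 = MS, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # proportion of MS correctly identified
    specificity = tn / (tn + fp)  # proportion of controls correctly identified
    return accuracy, sensitivity, specificity
```

The AUC summarizes discrimination across all possible decision thresholds, whereas accuracy, sensitivity, and specificity describe a single threshold. This is why an AUC of 0.70 can coexist with accuracy, sensitivity, and specificity values near 63%.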

Figure 2. Selected pairs of linguistic features contributing to the best classification accuracy with classification boundaries separating MS from controls.
Correlations between language and clinical markers
Examination of the relationship between linguistic features and clinical scales including EDSS, SDMT, and PASAT-3 revealed a correlation between mean length of utterance and SDMT (Partial correlation: r = 0.25, p = 0.008; see Figure 3). No significant relationships were observed between linguistic features and BDI-II and FSS.

Figure 3. Significant correlation between SDMT and mean length of utterance.
Comparing results from automated and manual data set
The ASR system achieved a word error rate of 22.4% compared with manually transcribed texts. Our analysis revealed very strong correlations (Pearson: r > 0.88, p < 0.001) between features extracted from the automated and manual data sets with NRMSE lower than 0.18, except for the mean length of utterance, where we found a moderate correlation (Pearson: r = 0.58, p < 0.001) with NRMSE of 0.44 (see Figure 4). The ANCOVA analysis allowed us to significantly differentiate MS patients from controls using the same linguistic features as in the case of automated analysis, except for content words, where we only achieved a trend toward significance (p = 0.086; see Figure S1).
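The word error rate reported above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the ASR output and the reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition. It is not the evaluation script used in the study, and the function name and the whitespace tokenization in the usage note are assumptions.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """Word error rate: word-level Levenshtein distance / reference length.

    `reference` is the manual transcript and `hypothesis` the ASR output,
    both given as lists of word tokens.
    """
    if not reference:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(reference)
```

For example, comparing `"i went to the store".split()` with `"i went to store".split()` yields one deletion over five reference words. Because insertions are counted, the rate can exceed 1 when the hypothesis is much longer than the reference.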

Figure 4. Relationship between features extracted from automated and manual data set.
Discussion
This study supports the hypothesis of higher language function deficits in both lexical and syntactic domains in MS patients, as determined by discourse analysis of spontaneous speech. Considering lexical features, the MS group showed a decrease in function words, suggesting less emphasis on grammar and proper sentence structure. The reduced use of function words in MS is compensated by an increased occurrence of content words. Indeed, detailed word type analysis revealed that MS patients tend to use verbs more frequently. Such overuse of verbs in MS was also demonstrated by a decreased reference rate to reality, which assesses the frequency of verbs in relation to nouns. For instance, the sentences ‘Yesterday, I went to the store and picked up some groceries like salad and carrots. Then I came back home and made dinner.’ could be formulated by an MS patient as ‘I went to store yesterday. I bought salad, carrot, came back home, and made dinner.’ In this example, there is lower use of function words such as ‘and’ or ‘then’ and instead higher use of the content words necessary to express the patient’s thought. No significant difference was found between MS and controls using moving-average type–token ratio, indicating that the vocabulary of MS patients is not restricted. Even though previous studies report some deficiencies in the lexical domain, their occurrence in MS seems to be rather rare.18,19,46 Thus, we may hypothesize that our findings in the lexical area are predominantly associated with grammatical deficits. This would also be in agreement with a previous study indicating difficulty with grammar in MS. 18
Regarding the syntactic domain, the MS group produced significantly shorter sentences and showed less tendency to build more complex sentences with multiple clauses, as suggested by a decline in coordinate clauses. The difficulties with sentence structure in MS are also supported by reduced function words. For instance, the sentence ‘I went to the store and bought some red apples but forgot to add oranges, so I wasn’t happy.’ could be reformulated by an MS patient as ‘I went to the store and bought apples. I didn’t buy oranges.’ In this example, we can see that the thought is similar, but the sentences are more fragmented, with no attempt to expand them into a longer sentence carrying minor additional information. In line with our findings, syntax deficiencies, including shorter utterance length and difficulty with sentence comprehension, have already been reported in MS.18,46,47 No statistical difference between MS and controls was found using n-grams, indicating no difficulty with word finding and no need to repeat the same phrase. This might be expected since reduced vocabulary in MS was not notable.
The alteration of language function in MS seems to reflect different neuropathological processes across brain regions rather than the cognitive decline captured by standardized neuropsychological assessment. Indeed, we could only find a weak but significant correlation between the mean length of utterance and SDMT, which captures a hallmark cognitive deficit in MS. The relevance of this finding can be supported by a previous study reporting that the frontal lobe region, known to be involved in cognitive impairment in MS, 48 plays a critical role in both syntactic processing and speed of processing. 49 Also, based on differences in speech performance between children and adults, a previous study has shown links between higher-order processing demands and syntactic complexity with utterance length. 50 However, our cognitive assessment only examined visual and auditory information processing speed. Hence, further studies are needed to investigate relationships between language performance and other cognitive domains, such as memory or executive functions, that are frequently affected in MS patients.51,52
This study aimed to systematically select linguistic features covering most of the language domains that can be analyzed using established techniques, including ASR and NLP, with minimum human control. The advantage of our solution, based on only eight language features with interpretable neurocognitive behavior, is that it does not require a large sample size to train a specific model and is ready to use in future clinical trials. Although we found highly significant differences between MS and controls across several language features, our classification experiment reached a relatively low discrimination accuracy with an AUC of 0.70. If diagnostic utility were the primary endpoint, combining more features that quantify different properties of language impairment would likely enhance the reported accuracy. However, such accuracy still seems reasonable considering our relatively short recordings with an average of 216 words. In the future, we expect significant enhancement of this approach through mass screening via smartphones outside a laboratory environment, allowing us to collect considerably longer audio recordings. 53
Despite a relatively high word error rate of 22.4% in the automated transcription, our automated approach yielded highly comparable results with manual analysis in terms of strong correlations and significant group differences between MS and controls among the language features in both data sets. This is crucial from the clinical point of view, as it is more important to achieve a correct estimate of the patient’s language performance than to obtain the precise transcription of individual words. The only exception was the mean length of utterance, characterized by the lowest correlation between manual and automated labels. This discrepancy is understandable, given the challenging task of predicting punctuation in spontaneous speech. 54 Although the Google Cloud Speech-to-Text API represents a state-of-the-art ASR system, it is trained mainly on data sets with healthy speech, which might present a notable disadvantage for our tasks. Motor speech disorder, that is, the spastic-ataxic dysarthria typically encountered in MS, 24 may cause further complications for the acoustic model of the ASR system, which attempts to recognize a word based on acoustic information. In addition, we have found that MS patients show different word selection and sentence structures from healthy controls. This might lead to difficulties for the language model, which estimates the probability of one word following another. Since research on tuning ASR systems using data sets with impaired speech already suggests significant improvements, 55 the accuracy of automated annotation could likely be significantly improved in the future. Furthermore, Czech poses a more challenging and complex task for ASR compared with more common languages such as English or German, 56 suggesting that follow-up research in such languages will likely result in better automated speech transcription of patients with MS.
This is a proof-of-concept study showing that automated language analysis could potentially detect pathological cognitive performance in MS. Current cognitive measures dedicated to MS care have low sensitivity to detect cognitive change and are influenced by the learning effect and by other factors such as depression or fatigue.52,57 Furthermore, traditional monitoring tools, such as the Brief International Cognitive Assessment for MS, are relatively time-consuming, personnel-intensive, and sometimes not well tolerated by patients, especially when administered frequently. Therefore, there is an urgent need for sensitive and easily applied biomarkers able to monitor disease activity. The proposed automated analysis of lexical and syntactic deficits could represent a new, potentially more sensitive, easy-to-implement, and low-cost language-based biomarker of cognitive decline in MS for future clinical trials and routine clinical practice. However, future longitudinal studies are needed to describe the evolution and dynamics of language performance over time, to investigate whether the decline in language performance reflects PIRA,25,58 and to explore whether the decline in language-based biomarkers increases the sensitivity for the detection of disease activity beyond established assessment tools such as EDSS or quantitative measures of walking and hand function. In addition, technical improvement of language-based biomarker software and passive screening of language performance through a mobile application (i.e. without requiring active participation of patients) could further improve the sensitivity of this technique to detect cognitive change and enable very frequent or continuous monitoring.
In summary, language-based biomarkers might have very high potential to change how disease progression, including PIRA, is detected, and thus enable clinicians to modify treatment as soon as possible to prevent further disability progression, leading to the improvement of everyday functioning, quality of life, and clinical care.
A potential limitation of this study is that our results are based solely on the Czech language. It is desirable to reproduce the findings across different languages to ensure the robustness of the linguistic features used to characterize cognitive language decline in MS. Also, we did not investigate BDI-II and FSS in healthy controls; therefore, we cannot entirely exclude the effect of depression and fatigue on cognitive linguistic function. 59 However, we did not find any significant correlation between linguistic features and BDI-II or FSS, indicating a rather low influence of depression and fatigue on language performance in our MS cohort. Admittedly, since the extent of cognitive dysfunction may differ across various clinical subtypes of MS 60 and the majority of our sample had a relapsing-remitting course, future studies are encouraged to assess the sensitivity of syntactic and lexical language features across these differing MS subtypes.
In conclusion, we present an objective, fully automated method for discriminating MS patients from healthy controls using linguistic analysis. MS participants manifest a decline in both lexical and syntactic language domains. Language impairment in MS manifests predominantly as shorter, less developed sentences and grammatical errors. If confirmed in future studies and with improved sensitivity, this approach might provide a fully automated, easy-to-implement, and low-cost language-based biomarker of cognitive decline and disease progression in MS for future clinical trials and clinical practice.
Supplemental Material
sj-docx-1-tan-10.1177_17562864231180719 – Supplemental material for Lexical and syntactic deficits analyzed via automated natural language processing: the new monitoring tool in multiple sclerosis
Supplemental material, sj-docx-1-tan-10.1177_17562864231180719 for Lexical and syntactic deficits analyzed via automated natural language processing: the new monitoring tool in multiple sclerosis by Martin Šubert, Michal Novotný, Tereza Tykalová, Barbora Srpová, Lucie Friedová, Tomáš Uher, Dana Horáková and Jan Rusz in Therapeutic Advances in Neurological Disorders
Supplemental Material
sj-docx-2-tan-10.1177_17562864231180719 – Supplemental material for Lexical and syntactic deficits analyzed via automated natural language processing: the new monitoring tool in multiple sclerosis
Supplemental material, sj-docx-2-tan-10.1177_17562864231180719 for Lexical and syntactic deficits analyzed via automated natural language processing: the new monitoring tool in multiple sclerosis by Martin Šubert, Michal Novotný, Tereza Tykalová, Barbora Srpová, Lucie Friedová, Tomáš Uher, Dana Horáková and Jan Rusz in Therapeutic Advances in Neurological Disorders
