Introduction
Stroke is recognized as one of the leading causes of death worldwide. In addition to its fatal consequences, stroke is a major contributor to long-term disability in adults, placing a substantial burden on healthcare systems globally.1–3 More than half of stroke survivors experience unfavorable outcomes,4 and older patients, in particular, tend to suffer functional decline between 18 and 60 months post-stroke.5
Given these challenges, it is crucial to develop an early warning system capable of accurately predicting a patient’s functional recovery following a stroke. Accurate outcome prediction can help patients and their families prepare for necessary post-acute care, while also enabling healthcare policymakers to strategically plan staffing and allocate resources for the medium- and long-term care of stroke patients.6–8
Previous research on functional outcome prediction has utilized various types of data. These include structured data, such as demographics, stroke subtypes (e.g., total anterior circulation infarcts, partial anterior circulation infarcts), and the presence of cerebellar symptoms9–14; imaging data, such as X-rays and angiographic images15–17; and unstructured text data, such as clinical and radiology reports.18–23
With the increasing availability of clinical text data and the advancement of text mining technologies, clinical text classification has become a prominent research area.24–26 Many existing studies have employed traditional text mining techniques for feature extraction and representation, such as term frequency-inverse document frequency (TF-IDF), alongside traditional machine learning models like k-nearest neighbors (KNN) and support vector machines (SVM).18–22,27–29
Meanwhile, deep learning techniques have been widely adopted for various text classification tasks. Models such as bidirectional encoder representations from transformers (BERT) for feature representation and convolutional neural networks (CNN) for prediction have demonstrated superior performance over traditional approaches in general text classification settings.30,31 However, to date, there is a lack of studies evaluating the effectiveness of deep learning methods for predicting functional outcomes following stroke.
This study aims to fill this gap by comparing the performance of several well-established traditional machine learning and deep learning approaches for functional outcome prediction. In addition, we explore the impact of feature fusion—where multiple types of features from various sources are combined—on model performance. Prior work has shown that feature fusion often leads to improved classification accuracy compared to single-feature approaches.32–35
The contributions of this paper are threefold. First, we systematically evaluate the predictive performance of traditional and deep learning methods for functional outcome prediction after stroke. Second, we identify the best-performing model, which can serve as a guideline for medical institutions to implement early warning systems and support proactive care planning. This model also provides a strong baseline for future research and potential performance enhancement. Third, we investigate the impact of concatenating multiple types of textual features on prediction accuracy, providing insights into the benefits of multi-feature integration in clinical text mining applications.
Methods
The text mining procedure
The text mining procedure for functional outcome prediction is illustrated in Figure 1. The dataset comprises narrative clinical text documents along with other relevant medical records. The process begins with text preprocessing, which involves selecting appropriate clinical notes and assigning class labels based on the patients’ functional outcomes. Subsequently, feature representation and model construction are carried out using both traditional machine learning and deep learning approaches. Finally, the predictive performance of each model is evaluated to identify the most effective method for forecasting functional outcomes following a stroke event.

Figure 1. The text mining procedure for functional outcome prediction.
Data collection
The experimental dataset was collected from a local hospital in Taiwan and includes records of over 6000 patients who were hospitalized for ischemic stroke between 2006 and 2022.
Narrative clinical notes, specifically admission notes documenting patients’ clinical symptoms, were extracted from the hospital’s electronic medical records (EMR) database. Figure 2 presents an example of such a note. After excluding records with missing data, a total of 5191 text documents corresponding to 5191 patients were retained for analysis.

Figure 2. An example of the narrative clinical notes: “This 65-year-old man has past medical history of hypertension with regular medical control at LMD for 2–3 years. This time, he suffered from dizziness with gradually progressive L’t side limb weakness and an unsteady gait was noted this afternoon. He denied fever, nausea, vomiting, diarrhea or numbness. Therefore, he visited our emergency room for help. His laboratory data were within the normal range. A CT scan of the brain showed no active brain lesion. After initial treatment at the emergency room, under the impression of stroke and hypertension, he was admitted to the neurologic ward for further treatment.”
Basic information about the experimental datasets.
Textual feature extraction and representation
In this study, each clinical text document was processed using four distinct feature extraction methods, resulting in four types of feature representations. The methods employed were bag-of-words (BOW), term frequency-inverse document frequency (TF-IDF), embeddings from language models (ELMo), and bidirectional encoder representations from transformers (BERT). Prior to feature extraction, text preprocessing was conducted, including the removal of specific punctuation marks (e.g., % and $) and the expansion of contractions (e.g., “can’t” to “cannot” and “don’t” to “do not”).
BOW and TF-IDF
The bag-of-words (BOW) method represents a document by calculating the term frequency of each word in a predefined dictionary, that is, the number of times a given term appears within the document. These word frequencies are then used as features to represent the document. However, not all words contribute equally to a document’s meaning; some may appear frequently but carry little informative value. To address this limitation, the term frequency-inverse document frequency (TF-IDF) method is employed. TF-IDF adjusts term weights by considering how common or rare a word is across the entire corpus, thereby emphasizing more informative and discriminative terms.36
TF-IDF is based on the weighting

tfidf(t, d) = tf(t, d) × log(N / df(t)),

where tf(t, d) is the frequency of term t in document d, N is the total number of documents in the corpus, and df(t) is the number of documents that contain term t. Terms that occur in many documents thus receive low weights, while terms concentrated in few documents receive high weights.
For BOW and TF-IDF feature extraction, stop word removal and lemmatization were performed as part of the preprocessing steps. The Scikit-learn library in Python was used to implement both BOW and TF-IDF feature extraction. Following the recommendations of Dessi et al.,37 Lin et al.,38 and Sheikh et al.,39 300 terms were selected to represent each document, resulting in 300-dimensional feature vectors.
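As a minimal sketch of this step (assuming Scikit-learn’s vectorizers and toy documents; stop word removal is shown via the built-in English list, while lemmatization, e.g. with NLTK, is omitted for brevity), the 300-term cap translates directly into the max_features argument:

```python
# Minimal sketch of BOW and TF-IDF extraction with Scikit-learn.
# The toy documents and variable names are illustrative, not the study's data.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

documents = [
    "patient suffered from dizziness and progressive limb weakness",
    "ct scan of the brain showed no active brain lesion",
]

# BOW: raw term counts over a vocabulary capped at the 300 most frequent terms.
bow = CountVectorizer(max_features=300, stop_words="english")
bow_features = bow.fit_transform(documents)        # shape: (n_docs, <=300)

# TF-IDF: same vocabulary cap, with counts re-weighted by inverse document frequency.
tfidf = TfidfVectorizer(max_features=300, stop_words="english")
tfidf_features = tfidf.fit_transform(documents)

print(bow_features.shape, tfidf_features.shape)
```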
ELMo and BERT
Contextualized text representation is a dynamic word embedding technique that differs from prediction-based methods like Word2Vec. It was introduced to address the issue of polysemy, where a single word can have multiple meanings depending on context, which traditional word embedding methods fail to capture. This approach leverages deep learning architectures such as bidirectional long short-term memory (BiLSTM) networks.40 Two representative models that utilize contextualized embeddings are Embeddings from Language Models (ELMo) and Bidirectional Encoder Representations from Transformers (BERT).
ELMo is based on a two-layer BiLSTM network that learns contextual information by processing input sequences in both forward and backward directions. The forward pass captures the current word along with its preceding context, while the backward pass captures the word along with its succeeding context. The final ELMo embedding is derived from a weighted sum of the internal states of the BiLSTM layers. In this study, ELMo features were extracted using pre-trained models from the AllenNLP library, producing 1024-dimensional feature vectors for each document.
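A hedged sketch of this extraction, assuming the AllenNLP 0.x ElmoEmbedder API and a simple average over layers and tokens to obtain one 1024-dimensional vector per document (the paper does not specify its pooling strategy):

```python
# Sketch of document-level ELMo features via AllenNLP's pre-trained ElmoEmbedder.
# The layer/token averaging below is an assumed pooling choice, not the authors'.
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()  # downloads the default pre-trained weights on first use

tokens = "he suffered from dizziness and limb weakness".split()
# embed_sentence returns a (3, num_tokens, 1024) array: the character-CNN layer
# plus the two BiLSTM layers described above.
layers = elmo.embed_sentence(tokens)

doc_vector = layers.mean(axis=(0, 1))  # one 1024-dimensional document vector
print(doc_vector.shape)                # (1024,)
```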
In contrast, BERT is built on the transformer architecture, which utilizes self-attention mechanisms rather than recurrent layers. Unlike traditional encoder-decoder models that struggle to retain long-range dependencies, BERT’s attention mechanism allows the model to capture relationships between all words in a sentence simultaneously. The bidirectional nature of BERT enables it to understand context from both preceding and succeeding words, making it highly effective for language modeling. In this architecture, the transformer encoder generates rich contextual embeddings by assigning attention weights that help the model focus on the most relevant parts of the input sequence.
For this study, BERT features were extracted using pre-trained models from Google’s TensorFlow library. Each document was represented by a 768-dimensional feature vector.
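The study extracts these features with Google’s TensorFlow tooling; as an assumed but functionally equivalent sketch, the Hugging Face transformers TF interface can produce the same 768-dimensional vectors, here by taking the final hidden state of the [CLS] token:

```python
# Hedged sketch of 768-dimensional BERT document features; the transformers
# library stands in for the authors' exact TensorFlow pipeline.
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = TFBertModel.from_pretrained("bert-base-uncased")

text = "He suffered from dizziness with gradually progressive limb weakness."
inputs = tokenizer(text, return_tensors="tf", truncation=True, max_length=512)
outputs = bert(inputs)

# Final hidden state of the [CLS] token as the document representation.
doc_vector = outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)
print(doc_vector.shape)
```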
Prediction techniques
This study applies four prediction (or classification) techniques to develop functional outcome prediction models: k-nearest neighbor (KNN), support vector machine (SVM), convolutional neural network (CNN), and long short-term memory (LSTM). Specifically, KNN and SVM are implemented using the Scikit-learn library, while CNN and LSTM are developed using the TensorFlow framework. To evaluate model performance, a 5-fold cross-validation strategy is employed, in which each experimental dataset is split into 80% training and 20% testing subsets. The area under the receiver operating characteristic (ROC) curve (AUC) is used as the primary evaluation metric to assess the predictive performance of the models.
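A minimal sketch of this evaluation protocol, with synthetic data standing in for the clinical feature matrices: StratifiedKFold with five splits yields the 80%/20% train/test partitions described above, and each fold is scored by AUC.

```python
# Sketch of 5-fold cross-validation scored by AUC; make_classification stands
# in for the real (imbalanced) clinical feature matrix and outcome labels.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=300,
                           weights=[0.7, 0.3], random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="linear", C=1.0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```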
KNN
The k-nearest neighbor (KNN) classifier operates by measuring the distance between an unknown test instance and its k nearest neighbors in the training set to determine its class label. Typically, the Euclidean distance is used as the distance metric. In the simplest case where k = 1, the class label of the single nearest neighbor is assigned to the test instance. When k is greater than 1 (e.g., k = 5), the final classification is determined by a majority vote among the class labels of the k nearest neighbors.41 In this study, the KNN classifier is implemented using the default settings of the Scikit-learn library, with k set to 5.
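To make the decision rule concrete, a from-scratch sketch (purely didactic; the study itself uses Scikit-learn’s implementation) of the Euclidean-distance majority vote with k = 5:

```python
# Didactic KNN: Euclidean distance to every training point, then a majority
# vote among the k nearest neighbors.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=5):
    distances = np.linalg.norm(X_train - x_test, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]                   # k closest training points
    votes = Counter(y_train[nearest])                     # class counts among them
    return votes.most_common(1)[0][0]                     # majority class label

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0], [1.1, 0.9]])
y_train = np.array([0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.0, 1.0]), k=5))  # -> 1
```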
SVM
A support vector machine (SVM) employs a kernel function to transform the original feature space of a two-class training dataset into a higher-dimensional space, where a separating hyperplane can be constructed to distinguish between the two classes. Training seeks the hyperplane that maximizes the margin between the classes, thereby improving the classifier’s ability to generalize. Commonly used kernel functions include the linear kernel, polynomial kernel, and radial basis function (RBF) kernel.42 In this study, the SVM classifier is implemented using the Scikit-learn library with a linear kernel and the default regularization parameter of C = 1.0.
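For reference, the kernels named above map onto Scikit-learn’s SVC as follows (toy data; only kernel and C are set, matching the configuration used here, with everything else left at library defaults):

```python
# Sketch of the SVC configurations corresponding to the kernels listed above.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, f"training accuracy = {clf.score(X, y):.3f}")
```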
CNN
A convolutional neural network (CNN) consists of an input layer, convolutional layers, pooling layers, a flattening layer, fully connected layers, and an output layer. Originally developed for computer vision tasks, CNNs utilize convolutional layers to extract local features and generate feature maps, while pooling layers reduce the dimensionality of the data, thereby lowering the number of training parameters and computational complexity. This process helps retain essential information while enabling the extraction of deeper hierarchical features.43
For text classification tasks, the input is typically represented as an n × d matrix, where n is the number of tokens in a document and d is the dimensionality of their embeddings. Convolutional filters slide over windows of adjacent tokens to extract local n-gram features, which pooling layers then condense before the fully connected layers produce the final class prediction.
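A hedged Keras sketch of such a text CNN; the sequence length (128 tokens) and embedding width (768, matching BERT) are assumed shapes rather than the study’s exact configuration:

```python
# Sketch of a 1D text CNN: convolution filters slide over windows of token
# embeddings (3-gram filters here), max pooling keeps the strongest response,
# and a sigmoid output predicts good vs. poor outcome.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 768)),                              # n x d input matrix
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),  # local n-gram features
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.summary()
```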
LSTM
Long short-term memory (LSTM) networks44 are a specialized type of recurrent neural network (RNN) designed to retain and utilize information over long sequences. Unlike traditional RNNs, LSTMs are capable of preserving dependencies between past and current inputs, making them particularly well-suited for sequential data. In LSTMs, the output at each time step is influenced not only by the current input but also by the information retained from previous time steps, allowing the model to capture temporal relationships effectively.
LSTMs address the common issues of vanishing and exploding gradients in standard RNNs through a gated architecture composed of three primary components: the forget gate, input gate, and output gate, along with a memory cell. These gates regulate the flow of information, determining what to keep, update, or discard from the memory cell. Each gate is controlled by learnable parameters, and their activation is determined by weighted combinations of the input data. This mechanism enables the model to protect and manage its internal memory state dynamically throughout the learning process.
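A matching Keras sketch of the LSTM classifier, with the same assumed input shape as the CNN example and illustrative layer sizes; the gates and memory cell described above are encapsulated in the single LSTM layer:

```python
# Sketch of an LSTM classifier over a sequence of token embeddings; the
# forget/input/output gates and memory cell live inside tf.keras.layers.LSTM.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 768)),  # sequence of token embeddings
    tf.keras.layers.LSTM(64),          # final hidden state summarizes the sequence
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```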
Results
Single text feature prediction models
AUC rates from different prediction models for different feature representations.
Figure 3 compares the AUC scores obtained from different prediction models at 30, 90, and 180 days following a stroke. Once again, the KNN and SVM models, when combined with BOW or TF-IDF features, consistently outperform models using BERT and ELMo representations. Moreover, these traditional classifiers and feature extraction methods demonstrate superior performance in predicting long-term functional outcomes. Among them, the combination of BOW features and the SVM classifier emerges as the most effective approach.

Figure 3. AUC rates from different prediction models for mRS scores at 30, 90, and 180 days after a stroke.
In contrast, among the deep learning methods, both CNN and LSTM perform better when using BERT for text feature representation compared to the other methods. However, regardless of the feature representation used, CNN and LSTM consistently underperform relative to the traditional classifiers KNN and SVM.
Prediction models obtained by concatenating multiple text features
AUC rates from different prediction models obtained by using different feature representation combinations.
Figure 4 compares the performance of various prediction models based on mRS scores at 30, 90, and 180 days post-stroke. For the KNN and SVM classifiers, the top three feature representation combinations are BOW + BERT, BOW + TF-IDF, and TF-IDF + BERT. Among these, BOW + BERT and BOW + TF-IDF achieve the best performance when used with the SVM classifier, with nearly identical results. However, the BOW + TF-IDF combination is recommended due to its lower feature dimensionality (600 features compared to 1068 for BOW + BERT), making it a more computationally efficient option. Additionally, this combination demonstrates superior performance for long-term functional outcome prediction.

Figure 4. AUC rates from different prediction models for mRS scores at 30, 90, and 180 days after a stroke, obtained with different feature representation combinations.
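The fusion itself is plain horizontal concatenation of per-document feature vectors; a toy sketch (random arrays standing in for real features) reproduces the dimensionalities cited above:

```python
# Feature fusion by concatenation; random arrays stand in for the real features.
import numpy as np

n_docs = 4
bow = np.random.rand(n_docs, 300)    # 300-dimensional BOW features
tfidf = np.random.rand(n_docs, 300)  # 300-dimensional TF-IDF features
bert = np.random.rand(n_docs, 768)   # 768-dimensional BERT features

bow_tfidf = np.hstack([bow, tfidf])  # (n_docs, 600)
bow_bert = np.hstack([bow, bert])    # (n_docs, 1068)
print(bow_tfidf.shape, bow_bert.shape)
```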
For the deep learning models, the TF-IDF + BERT feature representation yields the best performance when used with the CNN classifier. In contrast, the LSTM model shows only a slight improvement with TF-IDF + BERT compared to other feature combinations for short-term functional outcome prediction. Although the BOW + BERT combination enables LSTM to achieve high AUC scores for predicting the mRS score at 30 days post-stroke (mRS30), the long-term prediction performance remains comparable across the different feature representation combinations.
Discussion
Overall, the two experimental studies demonstrate that traditional text mining approaches, specifically feature representation methods such as BOW and TF-IDF combined with classification techniques like KNN and SVM, outperform deep learning-based approaches, that is, feature representations such as BERT and ELMo combined with classifiers such as CNN and LSTM, in the context of functional outcome prediction. Among the evaluated approaches, the best results are achieved using SVM in combination with BOW, BOW + TF-IDF, or BOW + BERT, all of which yield similarly high AUC scores.
Comparing prediction models based on individual and combined feature representations, the results indicate that a feature fusion strategy, that is, concatenating multiple types of text features, does not necessarily lead to improved prediction performance. Beyond AUC scores, it is therefore also critical to assess the impact of type I error in functional outcome prediction. Here, type I error refers to instances where the model incorrectly classifies patients with poor outcomes as having good outcomes. High type I error rates can mislead patients and their families in making follow-up care decisions and may negatively affect health policy planning, particularly in the allocation of medical personnel and resources for stroke rehabilitation.
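For concreteness, the sketch below computes this type I error rate from true and predicted labels; the encoding (1 = poor outcome, 0 = good outcome) is an assumption for illustration:

```python
# Type I error as defined above: among truly poor-outcome patients, the
# fraction predicted as having a good outcome.
import numpy as np

def type_i_error_rate(y_true, y_pred, poor=1, good=0):
    poor_mask = (y_true == poor)
    return np.mean(y_pred[poor_mask] == good)

y_true = np.array([1, 1, 1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1])
print(type_i_error_rate(y_true, y_pred))  # 0.25: 1 of 4 poor outcomes misclassified
```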
Type I errors from SVM by BOW, BOW + TF-IDF, and BOW + BERT.
However, a slight reduction in type I error is observed when using the BOW + TF-IDF feature representation with SVM, compared to using BOW alone as the baseline. Although the performance difference is modest, it holds practical significance in real-world applications. For instance, among 1000 stroke patients with poor outcomes at discharge (as indicated by their mRS scores), the SVM model using BOW + TF-IDF correctly classifies approximately four more patients than the model using only BOW.
Therefore, despite the increased computational complexity associated with extracting both BOW and TF-IDF features (resulting in 600 feature dimensions) and training the corresponding SVM model, the use of BOW + TF-IDF is recommended for feature representation in developing SVM-based functional outcome prediction models.
While this study provides an empirical comparison of several well-known traditional machine learning and deep learning approaches to identify the most effective feature representation methods and classification techniques for functional outcome prediction, several limitations remain that warrant further investigation:

(1) Class imbalance: The experimental datasets used in this study exhibit class imbalance. Future research could apply data re-sampling or augmentation techniques to re-balance the training sets and evaluate whether these methods improve prediction performance.

(2) Alternative BERT variants: In addition to conventional BERT embeddings, domain-specific models such as ClinicalBERT could be explored to assess whether contextual embeddings tailored to medical texts yield better results.

(3) Multimodal learning: Future work could incorporate multimodal learning techniques by integrating structured data, imaging data, and clinical text. Such an approach may enhance prediction accuracy by leveraging complementary information across modalities.

(4) Missing data handling: This study excluded records with missing data from the original EMR database. Employing imputation techniques to recover missing values would increase the dataset size and may lead to improved model performance. It would be worthwhile to investigate whether models trained on imputed datasets outperform those developed using only complete cases.
Conclusion
This study focuses on the application of text mining techniques for predicting the functional outcomes of stroke patients, incorporating both traditional machine learning and deep learning approaches. Specifically, bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF) were used as traditional feature representation methods, while k-nearest neighbor (KNN) and support vector machine (SVM) served as the corresponding prediction models. For deep learning-based methods, embeddings from language models (ELMo) and bidirectional encoder representations from transformers (BERT) were used as pre-trained feature representations, with convolutional neural networks (CNN) and long short-term memory networks (LSTM) employed as the prediction models.
Experimental results based on narrative clinical notes of stroke patients collected from a Taiwanese hospital demonstrated that traditional machine learning methods significantly outperformed their deep learning counterparts. In particular, the best performance was achieved using BOW for feature representation and SVM for classification.
Additionally, this study explored the use of a feature fusion strategy by concatenating multiple types of features. However, the results revealed that feature fusion does not necessarily enhance model performance. Among the various feature combinations, the best performance with SVM was observed using BOW + TF-IDF and BOW + BERT, both of which performed comparably to SVM with BOW alone, without statistically significant differences in AUC scores.
However, further analysis of type I error, defined as the misclassification of patients with poor outcomes into the good outcome class, revealed that the lowest error rate was achieved using the BOW + TF-IDF combination with the SVM model.
Therefore, considering both high prediction accuracy and reduced type I error, the combination of BOW + TF-IDF for feature representation and SVM for classification is recommended for functional outcome prediction in stroke patients.
Footnotes
Ethics considerations
This study was approved by the Ditmanson Medical Foundation Chia-Yi Christian Hospital Institutional Review Board (CYCH-IRB No. 2022086). Patient identifiers were removed to ensure patient confidentiality and privacy. In addition, patient consent was not required, and patient data will not be shared with third parties.
Author contributions
Conceptualization: Yu-Hsiang Su and Chih-Fong Tsai; Methodology: Chih-Fong Tsai; Software: Chih-Fong Tsai; Supervision: Yu-Hsiang Su; Validation: Chih-Fong Tsai; Resources: Yu-Hsiang Su; Data curation: Yu-Hsiang Su; Writing – Original Draft: Yu-Hsiang Su and Chih-Fong Tsai; Writing – Review & Editing: Yu-Hsiang Su and Chih-Fong Tsai.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ditmanson Medical Foundation Chia-Yi Christian Hospital (grant number R112-022-2).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
