Ensemble learning based functional independence ability estimator for pediatric brain tumor survivors

Abstract

A history of brain tumor strongly affects children’s cognitive abilities, performance of daily activities, quality of life, and functional outcomes. In light of the difficulties in cognition, communication, physical skills, and behavior that these patients may encounter, occupational therapists should perform a comprehensive needs-led assessment of their global functioning after recovery. Such an assessment would ensure that the patients receive adequate support and services at school, at home, and in the community. By predicting the functional activity performance of children with a history of brain tumor, clinical workers can determine the progress of their ability recovery and the optimal treatment plan. We selected several features for testing and employed common machine learning models to predict Functional Independence Measure (WeeFIM) scores. The ensemble learning models exhibited stronger predictive performance than did the individual machine learning models. The ensemble learning models effectively predicted WeeFIM scores. Machine learning models can help clinical workers predict the functional assessment scores of patients with childhood brain tumors. This study used machine learning models to predict the WeeFIM scores of patients with childhood brain tumors and to demonstrate that ensemble machine learning models are more suitable for this task than are individual machine learning models.

Keywords

cognitive and clinical estimation, ensemble learning, machine learning, pediatric brain tumor survivors, WeeFIM, occupational therapy

Introduction

Brain tumors are the most common pediatric solid tumors and have the highest cancer death rate among children. The incidence of primary central nervous system (CNS) tumors in children is approximately 0.003%.¹ In 2013, approximately 3050 pediatric patients were diagnosed with benign or malignant primary CNS tumors.² Brain and CNS tumors have more than 100 histopathological subtypes, each with varying incidence depending on patient age and subtype. The incidence of CNS tumors in children differs by country and region, ranging from 1.12–5.14 cases for every 100,000 people, with the highest incidence being in the United States.³

Gliomas arising from neuroglia cells are the most common form of childhood brain tumor (CBT). The incidence and survival rate of gliomas vary depending on tumor site and histopathological subtype. In particular, astrocytic tumors account for 40%–50% of CNS tumors in children.¹ Medulloblastomas are the most common embryonal tumors and comprise 10%–25% of CBTs.^4,5 These tumors only occur in the posterior fossa and may cause leptomeningeal spread; their treatment method combines surgery and radiotherapy (for patients younger than 3 years old).⁴ Atypical teratoid/rhabdoid tumor (AT/RT) is a rare and invasive embryonal tumor⁶ that is most often found in children younger than 3 years old and has an unfavorable prognosis.³ Current treatment methods for AT/RT remain controversial because of a lack of randomized controlled trials; thus, physicians have yet to reach a consensus on the optimal treatment method. Current treatments for AT/RT consist of multidrug therapy and radiotherapy.⁷ In addition, ependymoma is a rare tumor found in the neuroectoderm and is capable of local extension and metastasis.⁸ The tumor is the third most common CBT and accounts for 8%–10% of CNS tumors in children. The current treatment method for ependymoma is removal through surgery with strict safety measures followed by radiotherapy at the primary site of the tumor.⁷

The continuing expansion of oncology knowledge, increase in surgery treatment success rate, and advancements in medical treatment methods have contributed to an increase in the CBT survival rate.⁹ However, the literature has reported that survivors of brain tumors must overcome numerous problems after their recovery. For these patients, overcoming cancer is the first obstacle among many after diagnosis. Despite the advancements in research and treatment technology, these patients are subject to negative physiopsychological, sociopsychological, and neuropsychological influences.⁹ Approximately 60% of these patients have at least one disability, such as visual, motor, cognitive, and neural disability, or endocrine complications.⁹ The effects of tumor site and medical treatment may be related to impairments in patients’ cognitive, behavioral, or bodily functions, including those in executive, memory, motor, visual, spatial, and linguistic function.^9,10 These impairments may have long-term effects on the quality of life and functional outcomes of CBT survivors.¹¹ In light of the difficulties in cognition, communication, physical skills, and behavior experienced by these patients, occupational therapists should perform a comprehensive needs-led assessment of a child’s global functioning after recovery. This assessment ensures that the patient receives adequate support and services at school, at home, and in the community.¹¹ However, from an occupational therapy perspective, these impairments reflect the effect of cognitive function on a patient’s ability to live independently and demonstrate competence. Therefore, the early recognition of cognitive function disability is crucial to providing timely intervention and preventing the rare occurrence of secondary behaviors in the patient’s daily life and psychological trauma.¹²

The Wechsler Intelligence Scale for Children, 4th Edition (WISC-IV) is the gold standard for assessing the intellectual ability of children aged 6–16 years. The WISC-IV employs four indices, namely the verbal comprehension index (VCI), processing speed index (PSI), working memory index (WMI), and perceptual reasoning index (PRI)¹³ to assess patients, devise clinical treatment plans, and guide educational and nursing plans for gifted children and those with mental and learning disabilities.

The Functional Independence Measure for Children (WeeFIM) is an 18-item, 7-level ordinal scale instrument for measuring the performance of essential daily function in children. The scale assesses three domains, namely selfcare, mobility, and cognition, through interviews or observation of a child’s performance of a task on the basis of criterion standards.¹⁴ The WeeFIM is divided into two main functional streams, namely “dependent” (i.e. requiring assistance: score of 1–5) and “independent” (i.e. not requiring assistance: score of 6–7). Scores of 1 (total assistance) and 2 (maximal assistance) are classified as “complete dependence”; scores of 3 (moderate assistance), 4 (minimal contact assistance), and 5 (supervision or set-up) are classified as “modified dependence”; and scores of 6 (modified independence) and 7 (complete independence) are classified as “independent.” The WeeFIM is a commonly used instrument to evaluate children’s functional independence.
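The scoring bands described above can be written down as a small lookup. The following is a minimal sketch; the function names are illustrative and not part of the WeeFIM instrument. The total range of 18 to 126 follows directly from 18 items each scored from 1 to 7.

```python
N_ITEMS = 18                  # number of WeeFIM items
TOTAL_MIN = N_ITEMS * 1       # lowest possible total score (18)
TOTAL_MAX = N_ITEMS * 7       # highest possible total score (126)


def weefim_item_category(score: int) -> str:
    """Map a single WeeFIM item score (1-7) to its dependence category."""
    if not 1 <= score <= 7:
        raise ValueError("WeeFIM item scores range from 1 to 7")
    if score <= 2:
        return "complete dependence"   # 1 = total, 2 = maximal assistance
    if score <= 5:
        return "modified dependence"   # 3 = moderate, 4 = minimal, 5 = supervision
    return "independent"               # 6 = modified, 7 = complete independence


def weefim_total(item_scores) -> int:
    """Sum the 18 item scores into the WeeFIM total score."""
    if len(item_scores) != N_ITEMS:
        raise ValueError("expected 18 item scores")
    for s in item_scores:
        weefim_item_category(s)        # validates the 1-7 range
    return sum(item_scores)
```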

Machine learning (ML) models have been applied to various aspects of health care. Given the complexity of big health care data, a novel framework can provide new applications for identifying optimal innovative health-care management practices.¹⁵ Health-related data are stored in repositories managed and controlled by various entities. For example, the government manages electronic health records, health-care providers control electronic medical records, and patients directly manage their personal health records.¹⁶ Numerous studies have analyzed health records.^15–17 Some have applied medical imaging data to brain tumor classification^18–24 and grading.^25,26 Studies on ML prediction have also identified predictive risk factors for seizures on the basis of topographic brain tumor anatomy.²⁷ Radiomics features²⁸ can be used to predict H3 K27M mutation and survival after metastasis.²⁹ ML has been widely applied in various types of medical prediction. For example, ML models have been used to predict the incidence of cancer in Europe³⁰ and to assess the influence of data integration on the predictive ability of a cancer survival model.³¹ A deep-learning method was used to classify types of prostate cancer and served as an effective auxiliary tool for decision-making regarding prostate cancer.³² Microarray gene expression has been used as the basis for ensemble-learning feature selection in the diagnosis of prostate cancer.³³ ML has also been applied to dynamic sentiment analysis for breast cancer.³⁴ In addition, ML has been used to establish a preliminary screening and diagnosis tool for Guillain–Barré syndrome subtypes,³⁵ to predict the health condition of children with cerebral palsy,³⁶ and to preoperatively predict surgical morbidity in children.³⁷ However, no study has used ML models to predict the functional performance of CBT survivors.

This study used ML to determine the functional independence of patients with CBT and the importance of each WeeFIM index. This study made the following contributions: (1) applying ML algorithms to predict the WeeFIM score of patients with CBT; (2) analyzing the performance of each ML algorithm in WeeFIM score prediction; (3) analyzing the effect of each input variable on the performance of each ML algorithm; and (4) providing suggestions for subsequent studies on the use of ML algorithms to predict the functional independence ability of patients with CBT.

Proposed framework

Figure 1 presents the application framework. Patients with CBT mostly undergo surgery to remove the tumor or receive extensive radiotherapy or chemotherapy treatment during the inpatient period. After the patient begins to receive treatment, the medical team administers a rehabilitation intervention based on the patient’s condition to return the patient’s function to the premorbid state.³⁸ We collected clinical data as input data for the proposed ML models and employed the WeeFIM subscale scores and total score to predict the functional independence ability of patients with CBT. The prediction results can provide a reference for clinical personnel, including rehabilitation physicians, occupational therapists, and physiotherapists in the rehabilitation department, to determine the optimal rehabilitation plan for each patient.

Figure 1.

The proposed framework.

Methods

This section provides a brief introduction of the ML algorithms we applied for WeeFIM prediction, namely AdaBoost, decision tree (DT), multilayer perceptron (MLP), k-nearest neighbors regression (k-NNR), random forest (RF), support vector regression (SVR), and ensemble learning models, and details the verification procedures in the experiment. The following section explains the experiment.

k-Nearest neighbors regression

The k-NNR algorithm was the simplest ML algorithm adopted in this study and is among the oldest and easiest regression methods. The instance selection algorithm for k-NNR is a wrapper algorithm.^39,40 In this study, the number of neighbors was set to 5.
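To make the idea concrete, nearest-neighbor regression with five neighbors can be sketched in a few lines. This is an illustrative implementation only: it assumes Euclidean distance and an unweighted mean of the neighbors' targets, neither of which is stated in the text, and the function name is hypothetical.

```python
import math


def knn_predict(train_X, train_y, query, k=5):
    """Predict a target as the mean of the k nearest training targets.

    Assumes Euclidean distance and uniform neighbor weights.
    """
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair every training target with its distance to the query point.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


# Toy data: the point at 10 is far from the query and is ignored when k=5.
toy_X = [[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]]
toy_y = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
toy_prediction = knn_predict(toy_X, toy_y, [2.0], k=5)
```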

Support vector regression

The SVR is a supervised ML algorithm used to process regression tasks. The SVR balances model complexity with training errors and performs adequately in processing high-dimensional data.⁴¹ We employed the radial basis function as the kernel and assigned the function a kernel coefficient of 0.1.

Decision tree

Decision tree is a data mining classification technique commonly used in various fields to create classification systems based on multiple covariates and to develop prediction algorithms for a target variable.^42,43 DT has been widely applied in medical research.⁴² In this study, the maximum depth of the tree was set to 15.

Random forest

Random forest⁴⁴ combines the power of multiple DTs. Each tree depends on the values of a random vector that is sampled independently and with the same distribution for all trees in the forest. In this study, we set the number of trees to 100 and the maximum depth to 15.

Multilayer perceptron

Multilayer perceptrons,⁴⁵ also known as multilayer feedforward neural networks, are the most commonly used neural network classification technique. In this study, we selected the sigmoid function as the activation function and used a stochastic gradient-based optimizer (Adam) for weight optimization. The batch size was set to 4, and the maximum number of iterations was set to 100. One hidden layer (with 100 neurons) was embedded in the MLP model.

AdaBoost

AdaBoost⁴⁶ is a boosting algorithm with a sound theoretical basis and has been applied successfully in practice. AdaBoost is itself an ensemble algorithm but did not exhibit notable performance in our study. The maximum number of estimators was set to 100, and the linear loss function was employed.
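The six individual models and their stated hyperparameters could be configured as follows. This is a sketch that assumes scikit-learn; the text does not name the library used. Two mappings are also assumptions: the "kernel coefficient" is taken to be scikit-learn's `gamma`, and the "sigmoid" activation is taken to be the `"logistic"` option of `MLPRegressor`. All other settings are left at library defaults.

```python
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Hyperparameters follow the text; everything else is a library default.
models = {
    "k-NNR": KNeighborsRegressor(n_neighbors=5),
    "SVR": SVR(kernel="rbf", gamma=0.1),              # gamma = kernel coefficient (assumed)
    "DT": DecisionTreeRegressor(max_depth=15),
    "RF": RandomForestRegressor(n_estimators=100, max_depth=15),
    "MLP": MLPRegressor(
        hidden_layer_sizes=(100,),                    # one hidden layer, 100 neurons
        activation="logistic",                        # sigmoid activation (assumed mapping)
        solver="adam",
        batch_size=4,
        max_iter=100,
    ),
    "AdaBoost": AdaBoostRegressor(n_estimators=100, loss="linear"),
}
```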

MLP + MLP + MLP

In the first ensemble learning model, we combined three MLP frameworks. The sigmoid function was selected as the activation function, the stochastic gradient-based optimizer was employed, and the maximum number of iterations was set to 100. The batch sizes were 1, 4, and 16 for each MLP model to perform training on different batch sizes. By combining the three MLP models, the prediction accuracy of the learning framework increased.

RF + MLP + MLP

The second ensemble learning model combined two MLP frameworks and an RF framework. The activation function of the MLP frameworks was the sigmoid function, the stochastic gradient-based optimizer was employed, and the maximum number of iterations was set to 100. The batch sizes of the MLP frameworks were 1 and 4. In the RF framework, the number of trees was set to 100, and the maximum depth was set to 15.

RF + MLP + AdaBoost

For the third ensemble learning model, an MLP, an RF, and an AdaBoost framework were combined. The activation function of the MLP framework was the sigmoid function, and the stochastic gradient-based optimizer was employed. The maximum number of iterations and batch size were set to 100 and 4, respectively. The number of trees and the maximum depth of the RF framework were set to 100 and 15, respectively. The maximum number of estimators in the AdaBoost framework was set to 100, and the linear loss function was adopted.

RF + MLP

In the fourth ensemble learning model, we combined an MLP framework with an RF framework. We selected the sigmoid function as the activation function, adopted the stochastic gradient-based optimizer, and set the maximum number of iterations and batch size to 100 and 4, respectively. For the RF framework, we set the number of trees and maximum depth to 100 and 15, respectively.

The ensemble learning models (i.e. MLP + MLP + MLP, RF + MLP + MLP, RF + MLP + AdaBoost, and RF + MLP) were tested, and the prediction results were compiled. The output results were determined through voting to achieve higher prediction accuracy than that of a single prediction.
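One way to assemble the four ensembles is sketched below. It assumes scikit-learn and uses `VotingRegressor`, which averages the predictions of its members with equal weights by default; whether the authors used this class is not stated. The hidden-layer size of the ensemble MLPs, the random seeds, and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor, VotingRegressor
from sklearn.neural_network import MLPRegressor


def make_mlp(batch_size):
    # Sigmoid ("logistic") activation, Adam solver, 100 iterations, as in the text.
    return MLPRegressor(hidden_layer_sizes=(100,), activation="logistic",
                        solver="adam", batch_size=batch_size, max_iter=100,
                        random_state=0)


def make_rf():
    return RandomForestRegressor(n_estimators=100, max_depth=15, random_state=0)


def make_adaboost():
    return AdaBoostRegressor(n_estimators=100, loss="linear", random_state=0)


ensembles = {
    "MLP + MLP + MLP": VotingRegressor(
        [("mlp_b1", make_mlp(1)), ("mlp_b4", make_mlp(4)), ("mlp_b16", make_mlp(16))]),
    "RF + MLP + MLP": VotingRegressor(
        [("rf", make_rf()), ("mlp_b1", make_mlp(1)), ("mlp_b4", make_mlp(4))]),
    "RF + MLP + AdaBoost": VotingRegressor(
        [("rf", make_rf()), ("mlp_b4", make_mlp(4)), ("ada", make_adaboost())]),
    "RF + MLP": VotingRegressor(
        [("rf", make_rf()), ("mlp_b4", make_mlp(4))]),
}

# Toy data only, to show that the ensemble output is the mean of its members.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
y_demo = X_demo.sum(axis=1)

rf_mlp = ensembles["RF + MLP"].fit(X_demo, y_demo)
ensemble_pred = rf_mlp.predict(X_demo)
member_mean = np.mean([m.predict(X_demo) for m in rf_mlp.estimators_], axis=0)
```

With only 100 iterations the MLP members may emit a convergence warning on some data; this does not prevent prediction.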

This study adopted an ensemble averaging method to combine several ML models. Two commonly used ensemble averaging methods are the simple averaging method, represented by (1), and the weighted averaging method, represented by (2).

$$G(x) = \frac{1}{T} \sum_{t=1}^{T} G_{t}(x) \tag{1}$$

$$G(x) = \frac{1}{\sum_{t=1}^{T} c_{t}} \sum_{t=1}^{T} c_{t} \cdot G_{t}(x) \tag{2}$$

where x denotes the input value, G(x) denotes the final output value, G_t(x) is the output value of model t, T denotes the number of models, and c_t represents the weight of model t. The simple averaging method is the special case in which the weight of every model is set to 1, so that each model has equal importance. The weighted averaging method assigns a separate weight to the output of each model and calculates the weighted average. In this experiment, no substantial difference in performance was observed among the models; therefore, the simple averaging method was used to combine their predictions. The results revealed that the simple averaging method was superior to a single model in terms of prediction.
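The two averaging rules in (1) and (2) can be checked numerically with a short sketch. The function names and the example outputs are illustrative and are not values from the study.

```python
def simple_average(outputs):
    """Equation (1): unweighted mean of the T model outputs G_t(x)."""
    return sum(outputs) / len(outputs)


def weighted_average(outputs, weights):
    """Equation (2): mean of the outputs G_t(x) weighted by c_t."""
    if len(outputs) != len(weights):
        raise ValueError("one weight per model output is required")
    return sum(c * g for c, g in zip(weights, outputs)) / sum(weights)


# Hypothetical outputs of three models for one patient.
outputs = [60.0, 66.0, 63.0]
equal = simple_average(outputs)
same_as_equal = weighted_average(outputs, [1, 1, 1])   # reduces to equation (1)
skewed = weighted_average([60.0, 66.0], [1, 3])        # (60 + 3 * 66) / 4
```

Setting every weight to 1 reproduces the simple average, which is the configuration the study reports using.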

Figure 2 presents the processing flowchart of the algorithms. First, we preprocessed the data and converted features with concentrated data into input formats suitable for each ML algorithm. During data preprocessing, we ensured that each dimension had equal influence on the ML models. The leave-one-out cross-validation method, the most rigorous cross-validation method, was adopted for performance analysis to determine the advantages and disadvantages of each ML algorithm.
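The preprocessing and validation steps can be sketched in plain Python. Two points are assumptions: the text does not say which rescaling was used, so standardization (zero mean, unit variance per column) stands in for it here, and `fit_predict` is a hypothetical callable representing any of the models.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Rescale every column to zero mean and unit variance."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]   # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


def leave_one_out(X, y, fit_predict):
    """Hold out each sample once, train on the rest, and predict the held-out one."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(fit_predict(train_X, train_y, X[i]))
    return predictions


# Toy example with a baseline that always predicts the training mean.
toy_X = standardize_columns([[1, 10], [3, 30]])
toy_preds = leave_one_out([[0], [1], [2], [3]], [1, 2, 3, 4],
                          lambda train_X, train_y, x: mean(train_y))
```

With n samples, leave-one-out trains n models, which is affordable for a dataset of this size.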

Figure 2.

The processing flowchart of the ML models.

This study used three commonly used assessment indicators, namely mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R² score) for model assessment.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i} - \hat{y}_{i} \right| \tag{3}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}} \tag{4}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} \tag{5}$$

Here, n denotes the number of data points, $y_{i}$ denotes the actual value of the ith data point, $\hat{y}_{i}$ denotes the predicted value of the ith data point, and $\bar{y}$ is the mean of the actual values. Lower MAE and RMSE and a higher R² indicate better predictions. The indicators can be used to compare model performance, thereby providing a reference for selecting a model for medical research and other applications.
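The three indicators can be computed directly from their definitions. The sketch below uses made-up numbers purely to show the arithmetic; the function names are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, equation (3)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, equation (4)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination, equation (5)."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values: errors are 1, 0, and 2.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
example_mae = mae(y_true, y_pred)        # (1 + 0 + 2) / 3
example_rmse = rmse(y_true, y_pred)      # sqrt((1 + 0 + 4) / 3)
example_r2 = r2_score(y_true, y_pred)    # 1 - 5 / 8
```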

Results

The results section is divided into a description of the data and the results of prediction. The data description section provides an introduction to and preliminary analysis of the dataset; the prediction results section presents the design of the experiment and compares the WeeFIM prediction performance of the individual and ensemble ML models.

Data description

We referenced a dataset on patients with CBT^47–48 from May 2021 for an experiment in the Eugenio Medea Scientific Institute in Bosisio Parini, Italy. The original dataset consisted of 78 patients, five of whom we excluded as outliers. Therefore, the experiment dataset consisted of 73 patients with brain tumors diagnosed during the developmental stage, and their age ranged from 6 to 18 years.

The Shapiro-Wilk test was adopted to explore the normality of the 36 features. This test is among the most widely used normality tests and assesses whether a sample is consistent with a normal distribution. A smaller W value indicates the rejection of normality.⁴⁹ Figure 3 presents the Shapiro ranking, in which high values for age at diagnosis, tumor type, time since diagnosis, VCI, VSI, FSIQ, WMI, PSI, selfcare bathing, selfcare dressing (upper), selfcare dressing (lower), mobility stairs, cognition problem-solving, WeeFIM selfcare, WeeFIM cognition, and WeeFIM total were observed. This suggests that these features were approximately normally distributed.

Figure 3.

Shapiro ranking of the features.

Figure 4 presents the Pearson correlation matrix of the features. Red and blue indicate positive and negative correlations, respectively, with darker colors representing higher degrees of correlation. Figure 5 depicts the relationship between the WeeFIM scores in the dataset. The kernel density estimation plots on the diagonal indicate normality in the WeeFIM selfcare, WeeFIM cognition, WeeFIM mobility, and WeeFIM total data. A pair plot was used to represent the relationship between each pair of variables (e.g. whether it was linear or nonlinear and whether the two variables were significantly correlated). A slanted straight line represents a linear regression fit. Points closer to the regression line indicate a stronger correlation.
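Each cell of a Pearson matrix is the correlation coefficient of one pair of features. A minimal sketch of that computation follows; the function name and toy inputs are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant sequence has zero spread and no defined correlation.
    return covariance / (spread_x * spread_y)


r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly increasing
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # perfectly decreasing
```

Values near +1 would be drawn in dark red and values near −1 in dark blue under the color scheme described for Figure 4.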

Figure 4.

Pearson matrix of the features.

Figure 5.

The relationship between the WeeFIM scores in the dataset.

Most of the scores for WeeFIM total and WeeFIM selfcare were concentrated around the diagonal line (Figure 5). Therefore, the highest correlation was between WeeFIM total and WeeFIM selfcare, followed by that between WeeFIM total and WeeFIM mobility and that between WeeFIM total and WeeFIM cognition.

We classified features with concentrated data into four groups, namely age-related features (i.e. age at diagnosis, time since diagnosis, and age at assessment), tumor-related features (i.e. tumor site, tumor type, chemotherapy, radiotherapy, surgery, and hydrocephalus), intellectual assessment subtest scores (i.e. VCI, VSI, FSIQ, WMI, and PSI), and 18 subitem scores on the WeeFIM scale. We selected several features for testing and divided them into 15 sets from Set A to Set O. We then employed common ML models (RF, MLP, AdaBoost, SVR, DT, and NNR) to predict WeeFIM total, WeeFIM selfcare, WeeFIM mobility, and WeeFIM cognition scores.

Table 1 presents the selected features. Through the experiment design, this study explored the correlations among various assessment reports and predicted WeeFIM scores (e.g. the correlations among prediction models for age, tumor position, tumor type, surgery, and intelligence test and the correlation between prediction models for tumor position and tumor type) and the importance of each assessment report. Through experimental verification, we identified specific types of assessment reports as key predictors, thereby providing a reference for collecting data from patients to predict WeeFIM scores. Accurately and rapidly formulating a treatment plan before occupational therapy and using few assessment indicators for WeeFIM score prediction can ensure few resources are used, improve treatment, and accelerate patients’ recovery; these are the contributions of this study.

Table 1.

The feature selection of different sets.

	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
Age at diagnosis	✓	✓	—	—	✓	✓	—	✓	✓	✓	—	✓	—	—	—
Tumor site	✓	—	✓	—	✓	—	✓	✓	✓	—	✓	—	✓	—	—
Tumor type	✓	—	✓	—	✓	—	✓	✓	✓	—	✓	—	✓	—	—
Chemotherapy	✓	—	✓	—	✓	—	✓	✓	✓	—	✓	—	✓	—	—
Radiotherapy	✓	—	✓	—	✓	—	✓	✓	✓	—	✓	—	✓	—	—
Surgery	✓	—	✓	—	✓	—	✓	✓	✓	—	✓	—	✓	—	—
Hydrocephalus	✓	—	✓	—	✓	—	✓	✓	✓	—	✓	—	✓	—	—
Time since diagnosis	✓	✓	—	—	✓	✓	—	✓	✓	✓	—	✓	—	—	—
Age at assessment	✓	✓	—	—	✓	✓	—	✓	✓	✓	—	✓	—	—	—
VCI (verbal comprehension index)	✓	—	—	✓	—	✓	✓	✓	—	✓	✓	—	—	✓	—
VSI (visual spatial index)	✓	—	—	✓	—	✓	✓	✓	—	✓	✓	—	—	✓	—
FSIQ (full scale intelligence quotient)	✓	—	—	✓	—	✓	✓	✓	—	✓	✓	—	—	✓	—
WMI (working memory index)	✓	—	—	✓	—	✓	✓	✓	—	✓	✓	—	—	✓	—
PSI (processing speed index)	✓	—	—	✓	—	✓	✓	✓	—	✓	✓	—	—	✓	—
Selfcare eating	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Selfcare grooming	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Selfcare bathing	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Selfcare dressing (upper)	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Selfcare dressing (lower)	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Selfcare toileting	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Selfcare bladder	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Selfcare bowel	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Mobility bed/chair/wheelchair transfer	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Mobility toilet transfer	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Mobility tub/shower transfer	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Mobility walk/wheelchair	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Mobility stairs	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Cognition comprehension	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Cognition expression	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Cognition social interaction	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Cognition problem solving	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
Cognition memory	✓	✓	✓	✓	✓	✓	✓	—	—	—	—	—	—	—	✓
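Because every feature within a group is selected or omitted together in Table 1, the 15 sets can be encoded compactly as combinations of the four feature groups. The sketch below transcribes the table; the dictionary keys and the helper name are illustrative.

```python
FEATURE_GROUPS = {
    "age": ["Age at diagnosis", "Time since diagnosis", "Age at assessment"],
    "tumor": ["Tumor site", "Tumor type", "Chemotherapy", "Radiotherapy",
              "Surgery", "Hydrocephalus"],
    "iq": ["VCI", "VSI", "FSIQ", "WMI", "PSI"],
    "weefim_items": [
        "Selfcare eating", "Selfcare grooming", "Selfcare bathing",
        "Selfcare dressing (upper)", "Selfcare dressing (lower)",
        "Selfcare toileting", "Selfcare bladder", "Selfcare bowel",
        "Mobility bed/chair/wheelchair transfer", "Mobility toilet transfer",
        "Mobility tub/shower transfer", "Mobility walk/wheelchair",
        "Mobility stairs", "Cognition comprehension", "Cognition expression",
        "Cognition social interaction", "Cognition problem solving",
        "Cognition memory",
    ],
}

# Groups ticked for each set in Table 1.
FEATURE_SETS = {
    "A": ("age", "tumor", "iq", "weefim_items"),
    "B": ("age", "weefim_items"),
    "C": ("tumor", "weefim_items"),
    "D": ("iq", "weefim_items"),
    "E": ("age", "tumor", "weefim_items"),
    "F": ("age", "iq", "weefim_items"),
    "G": ("tumor", "iq", "weefim_items"),
    "H": ("age", "tumor", "iq"),
    "I": ("age", "tumor"),
    "J": ("age", "iq"),
    "K": ("tumor", "iq"),
    "L": ("age",),
    "M": ("tumor",),
    "N": ("iq",),
    "O": ("weefim_items",),
}


def features_for(set_name):
    """Return the flat list of feature names used by one set."""
    return [name for group in FEATURE_SETS[set_name]
            for name in FEATURE_GROUPS[group]]
```

Sets A–G and O are the ones that include the 18 WeeFIM subitem scores, whereas Sets H–N rely on age, tumor, and intellectual assessment features only.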

Prediction result

We employed the leave-one-out cross-validation method for performance analysis. Figure 6 presents the results of the WeeFIM total score prediction on Set A using the ML models, namely RF, MLP, AdaBoost, SVR, DT, NNR, and the four ensemble learning models. After the predicted values were compared with the actual values, the points were plotted in three colors according to the absolute difference |R|: blue represents |R| ≤ 1, green represents 1 < |R| ≤ 2, and orange represents |R| > 2.
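The color coding of the prediction errors can be expressed as a simple binning rule. In this sketch, R is taken to be the difference between the actual and predicted scores, and a difference of exactly 2 is assigned to the green band so that the bands do not overlap; both choices are assumptions, and the function name is illustrative.

```python
def residual_color(actual, predicted):
    """Return the plot color for one prediction based on |actual - predicted|."""
    r = abs(actual - predicted)
    if r <= 1:
        return "blue"      # |R| <= 1
    if r <= 2:
        return "green"     # 1 < |R| <= 2
    return "orange"        # |R| > 2


example_colors = [
    residual_color(100, 100.5),   # error 0.5
    residual_color(100, 98.5),    # error 1.5
    residual_color(100, 104),     # error 4
]
```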

Figure 6.

The WeeFIM-Total prediction results of the ML models from Set A. (a) MLP + MLP + MLP, (b) RF + MLP + MLP, (c) RF + MLP + Adaboost, (d) RF + MLP, (e) MLP, (f) Adaboost, (g) RF.

Figure 6 presents the cross-validated predictions by each model. The data predicted by the ensemble models were more concentrated than those predicted by the single ML models. The prediction of the MLP model generated more concentrated results than those generated by the other single models. Figure 7 presents the residuals of the results of WeeFIM total score prediction from Set A. The vertical and horizontal axes represent the residuals and predicted values, respectively.

Figure 7.

The residuals of the WeeFIM-Total prediction results from Set A. (a) MLP + MLP + MLP, (b) RF + MLP + MLP, (c) RF + MLP + Adaboost, (d) RF + MLP, (e) MLP, (f) Adaboost, (g) RF.

We evaluated the prediction error of the models in terms of the mean absolute error (MAE). Figures 8–11 present the prediction error in terms of MAE for cognition (WeeFIM cognition), mobility (WeeFIM mobility), selfcare (WeeFIM selfcare), and total (WeeFIM total), respectively. The four ensemble learning models had a low prediction error for Sets A–G and O. Therefore, the ensemble learning models are accurate. Among the single models, the MLP model had a lower prediction error than did the other learning models, indicating higher accuracy.

Figure 8.

The prediction error in terms of MAE (WeeFIM–Cognition).

Figure 9.

The prediction error in terms of MAE (WeeFIM–Mobility).

Figure 10.

The prediction error in terms of MAE (WeeFIM–Selfcare).

Figure 11.

The prediction error in terms of MAE (WeeFIM–Total).

Figures 12–15 present the prediction error in terms of RMSE for cognition (WeeFIM cognition), mobility (WeeFIM mobility), selfcare (WeeFIM selfcare), and total (WeeFIM total), respectively. The four ensemble learning models had low prediction error in terms of RMSE for Sets A–G and O, also indicating that they are accurate. The MLP learning model had lower prediction error in terms of RMSE than did the other learning models, suggesting a higher accuracy.

Figure 12.

The prediction error in terms of RMSE (WeeFIM–Cognition).

Figure 13.

The prediction error in terms of RMSE (WeeFIM–Mobility).

Figure 14.

The prediction error in terms of RMSE (WeeFIM–Selfcare).

Figure 15.

The prediction error in terms of RMSE (WeeFIM–Total).

We also evaluated prediction performance in terms of R² values. Figures 16–19 present the R² values for cognition (WeeFIM cognition), mobility (WeeFIM mobility), selfcare (WeeFIM selfcare), and total (WeeFIM total), respectively. The four ensemble learning models exhibited strong prediction performance for Sets A–G and O. Therefore, the models are accurate. The MLP learning model exhibited superior prediction to that of the other learning models, indicating higher accuracy.

Figure 16.

The prediction error in terms of R² Score (WeeFIM–Cognition).

Figure 17.

The prediction error in terms of R² Score (WeeFIM–Mobility).

Figure 18.

The prediction error in terms of R² Score (WeeFIM–Selfcare).

Figure 19.

The prediction error in terms of R² Score (WeeFIM–Total).

Discussion

Identifying possible treatment methods for patients with diagnosed brain tumor can enable clinical workers to predict side effects and prognosis. This knowledge assists them in determining proper rehabilitation services and in devising a comprehensive rehabilitation plan.⁵⁰ We demonstrated that the use of ML models can predict three WeeFIM indices and the total score, which can assist clinical workers in planning treatment and clinical strategies for clinical care.

Studies have indicated that children with a history of brain tumor have a high risk of cognitive disability, particularly in terms of processing speed, executive functions, memorization, and focus.^{10,12,51–54} This influences their cognitive function and results in poor occupational performance.¹² Most studies have explored the correlation between neurocognition assessments and the cognitive abilities of patients with CBT. Few have explored whether damage to a tumor site or pharmacotherapy affects quality of life and functional performance for patients with CBT.¹¹ These factors limit patients’ ability to exercise and their processing skills, thereby affecting their performance of daily tasks.⁹ Önal and Huri⁵⁵ revealed that patients with medulloblastoma exhibit poorer executive function and occupational performance than do patients without medulloblastoma.

We revealed that WeeFIM selfcare and the cognition comprehension item in WeeFIM total were the most crucial for predicting WeeFIM total. The cognitive understanding and functional performance of children with brain tumors are influenced by numerous factors. Therefore, compulsory cognition methods should be applied to increase or teach such patients new skills and to enable occupational therapists to focus on skills related to the patient’s education.⁵⁶ Our results indicate that occupational therapists should consider the effect of rehabilitation plans on patients’ lives and cognition.

In addition to the ML models mainly used in this study, we applied 10-fold cross-validation to logistic regression and SVR to verify the performance of other common models. For Set A, logistic regression achieved R² scores of 0.52 for WeeFIM-Cognition, 0.53 for WeeFIM-Mobility, 0.59 for WeeFIM-Selfcare, and 0.59 for WeeFIM-Total. SVR achieved R² scores of 0.8 for WeeFIM-Cognition, 0.72 for WeeFIM-Mobility, 0.86 for WeeFIM-Selfcare, and 0.87 for WeeFIM-Total. These results show that the performance of the two methods under 10-fold cross-validation is acceptable but not outstanding. Therefore, a combination of logistic regression and SVR may not provide superior performance.

From the perspective of ensemble learning, combining two models with higher performance usually yields better results. In the experiments, RF and MLP performed best among the single models. Therefore, we present the performance of RF + MLP in Figures 8–19. Moreover, to further explore combinations of three models, we conducted additional experiments with three-model combinations. However, because the number of algorithms and possible combinations is large, we list only some representative models in this study.

The ground truths of the data in this study were obtained by qualified clinical staff. The prediction results of the ML models were compared with these ground truths. The experimental results show that the predictions of the ensemble learning models are very close to the evaluation reports of the clinical staff, which also validates the feasibility of the proposed framework.

Moreover, principal component analysis (PCA) was also adopted in the leave-one-out cross-validation experiments. Figures 20 and 21 illustrate the WeeFIM–Total prediction results and residuals of the ML models with PCA (five components). Feature selection is an important issue in data mining. Although the original data (Set A) provide better results, PCA with five components still gives good prediction performance. This experiment shows that PCA is also a good method for dimension reduction. However, to reach the highest WeeFIM prediction accuracy, using the original data is suggested.
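A PCA step with five components can be combined with leave-one-out cross-validation as sketched below. The sketch assumes scikit-learn, which the text does not name. Placing the scaler and PCA inside the pipeline, so that they are refitted on each training fold, is a design choice made here and not a detail confirmed by the study. The data are synthetic stand-ins with 32 columns to mirror the size of Set A.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 20 samples, 32 features (not the study dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 32))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=20)

pipeline = make_pipeline(
    StandardScaler(),            # put all features on a comparable scale
    PCA(n_components=5),         # keep five principal components
    RandomForestRegressor(n_estimators=100, max_depth=15, random_state=0),
)

# One prediction per sample, each made by a model that never saw that sample.
loo_predictions = cross_val_predict(pipeline, X, y, cv=LeaveOneOut())
```

Fitting PCA inside each fold keeps the held-out sample from influencing the components used to predict it.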

Figure 20.

The WeeFIM–Total prediction results of the ML models with PCA (number of components is 5). (a) MLP + MLP + MLP, (b) RF + MLP + MLP, (c) RF + MLP + Adaboost, (d) RF + MLP, (e) MLP, (f) Adaboost, (g) RF.

Figure 21.

The residuals of the WeeFIM–Total prediction results with PCA (number of components is 5). (a) MLP + MLP + MLP, (b) RF + MLP + MLP, (c) RF + MLP + Adaboost, (d) RF + MLP, (e) MLP, (f) Adaboost, (g) RF.

Conclusion

We selected features and created 15 sets to predict WeeFIM scores using ML. Our results suggest that clinical physicians should evaluate the cognitive assessment results and daily cognitive performance of patients with CBTs. Clinical physicians should also focus on patients’ selfcare. This study is the first to create ensemble ML models for this purpose by using multiple ML models, namely RF, MLP, AdaBoost, SVR, DT, and NNR, and to use the leave-one-out cross-validation method for the performance evaluation. The prediction results indicate that the ensemble algorithm–based ML models possess considerable predictive capabilities and that the MLP model exhibited favorable prediction performance. Sets containing the WeeFIM subitem scores exhibited a strong correlation with WeeFIM cognition, WeeFIM mobility, WeeFIM selfcare, and WeeFIM total.

Highlights

• Most studies have explored the correlation between neurocognitive assessments and the cognitive abilities of patients with CBTs.

• By exploring the subitems of the WeeFIM scale, we proposed models that can assist clinical workers in evaluating the progress of a patient’s ability recovery and in forming the optimal treatment plan for intervention.

• We applied ML models to large-scale datasets to predict the daily functional performance of patients with CBTs. We also provided suggestions to increase the efficiency of health care services.

Footnotes

Authors’ contributions

Pei-Hua Lin and Ping-Huan Kuo designed this study. Ping-Huan Kuo wrote the program. Pei-Hua Lin designed the experiments. Pei-Hua Lin and Ping-Huan Kuo wrote this paper. All authors revised the manuscript and read and approved the version submitted.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ministry of Science and Technology, Taiwan, under Grant MOST 109-2221-E-194-053-MY3.

Research ethics and patient consent

All the data in this study were obtained from the public resource described in Ref. [47]. The database is publicly available and provided in Ref. [48]. This study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Human Research Ethics Committee, National Chung Cheng University (Application number: CCUREC111081701).

ORCID iD

Ping-Huan Kuo

References

1. Kanti, Kumar. Pediatric glioblastoma. In: Glioblastoma. Brisbane, Australia: Codon Publications, 2017, pp. 297–312.

2. McKean-Cowdin, Razavi, Barrington-Trimis, et al. Trends in childhood brain tumor incidence, 1973–2009. J Neurooncol 2013; 115: 153–160.

3. Johnson, Cullen, Barnholtz-Sloan, et al. Childhood brain tumor epidemiology: a brain tumor epidemiology consortium review. Cancer Epidemiol Biomarkers Prev 2014; 23: 2716–2736.

4. Millard, Braganca KCD. Medulloblastoma. J Child Neurol 2016; 31: 1341–1353.

5. Bartlett, Kortmann, Saran. Medulloblastoma. Clin Oncol 2013; 25: 36–45.

6. Hilden, Meerbaum, Burger, et al. Central nervous system atypical teratoid/rhabdoid tumor: results of therapy in children enrolled in a registry. J Clin Oncol 2004; 22: 2877–2884.

7. Udaka, Packer. Pediatric brain tumors. Neurol Clin 2018; 36: 533–556.

8. Reni, Gatta, Mazza, et al. Ependymoma. Crit Rev Oncol Hematol 2007; 63: 81–89.

9. Demers, Gélinas, Carret A-S. Activities of daily living in survivors of childhood brain tumor. Am J Occup Ther 2015; 70: 7001220040p1–7001220040p8.

10. Kesler, Lacayo. A pilot study of an online cognitive rehabilitation program for executive function skills in children with cancer-related brain injury. Brain Inj 2011; 25: 101–112.

11. Adcock, Burke. Children with brain tumours: a critical reflection on a specialist coordinated assessment. Br J Occup Ther 2014; 77: 429–433.

12. Önal, Huri. Cognitive functions of children with brain tumor in the treatment process. Br J Occup Ther 2021; 84: 164–172.

13. Keith, Fine, Taub, et al. Higher order, multisample, confirmatory factor analysis of the Wechsler Intelligence Scale for Children-Fourth Edition: what does it measure? School Psych Rev 2006; 35: 108–127.

14. Wong, Chan, et al. Functional independence measure (WeeFIM) for Chinese children: Hong Kong cohort. Pediatrics 2002; 109: E36.

15. Zhou, Zhang, Chen, et al. A novel framework for bringing smart big data to proactive decision making in healthcare. Health Inform J 2021; 27: 146045822110246.

16. Uddin, Stranieri, Gondal, et al. Rapid health data repository allocation using predictive machine learning. Health Inform J 2020; 26: 3009–3036.

17. Moll, Cajander. Oncology health-care professionals’ perceived effects of patient accessible electronic health records 6 years after launch: a survey study at a major university hospital in Sweden. Health Inform J 2020; 26: 1392–1403.

18. NQK, Hung TNK, et al. Radiomics-based machine learning model for efficiently classifying transcriptome subtypes in glioblastoma patients from MRI. Comput Biol Med 2021; 132: 104320.

19. Tandel, Tiwari, Kakde. Performance optimisation of deep learning models using majority voting algorithm for brain tumour classification. Comput Biol Med 2021; 135: 104564.

20. Tandel, Balestrieri, Jujaray, et al. Multiclass magnetic resonance imaging brain tumor classification using artificial intelligence paradigm. Comput Biol Med 2020; 122: 103804.

21. Deepak, Ameer. Brain tumor classification using deep CNN features via transfer learning. Comput Biol Med 2019; 111: 103345.

22. Noreen, Palaniappan, Qayyum, et al. Brain tumor classification based on fine-tuned models and the ensemble method. Comput Mater Contin 2021; 67: 3967–3982.

23. Tang, Zawaski, Francis, et al. Image-based classification of tumor type and growth rate using machine learning: a preclinical study. Sci Rep 2019; 9: 12529.

24. Novak, Zarinabad, Rose, et al. Classification of paediatric brain tumours by diffusion weighted imaging and machine learning. Sci Rep 2021; 11: 2987.

25. Grist, Withey, MacPherson, et al. Distinguishing between paediatric brain tumour types using multi-parametric magnetic resonance imaging and machine learning: a multi-site study. Neuroimage Clin 2020; 25: 102172.

26. Naser, Deen. Brain tumor segmentation and grading of lower-grade glioma using deep learning in MRI images. Comput Biol Med 2020; 121: 103758.

27. Akeret, Stumpo, Staartjes, et al. Topographic brain tumor anatomy drives seizure risk and enables machine learning based prediction. Neuroimage Clin 2020; 28: 102506.

28. Chen, Sun, et al. Automated machine learning based on radiomics features predicts H3 K27M mutation in midline gliomas of the brain. Neuro Oncol 2019; 22: 393–401. DOI: 10.1093/neuonc/noz184

29. Chen, Huang, Yan, et al. Two machine learning methods identify a metastasis-related prognostic model that predicts overall survival in medulloblastoma patients. Aging (Albany NY) 2020; 12: 21481–21503.

30. Sekeroglu, Tuncal. Prediction of cancer incidence rates for the European continent using machine learning models. Health Inform J 2021; 27: 146045822098387.

31. Guo, Bian, Modave, et al. Assessing the effect of data integration on predictive ability of cancer survival models. Health Inform J 2020; 26: 8–20.

32. Eminaga, Al-Hamad, Boegemann, et al. Combination possibility and deep learning model as clinical decision-aided approach for prostate cancer. Health Inform J 2020; 26: 945–962.

33. Gumaei, Sammouda, Al-Rakhami, et al. Feature selection with ensemble learning for prostate cancer diagnosis from microarray gene expression. Health Inform J 2021; 27: 146045822198940.

34. Balakrishnan, Idicula, Jones. Deep learning based analysis of sentiment dynamics in online cancer community forums: an experience. Health Inform J 2021; 27: 146045822110075.

35. Alarcón-Narváez, Hernández-Torruco, Hernández-Ocaña, et al. Toward a machine learning model for a primary diagnosis of Guillain-Barré syndrome subtypes. Health Inform J 2021; 27: 146045822110214.

36. Bertoncelli, Altamura, Vieira, et al. PredictMed: a logistic regression-based model to predict health conditions in cerebral palsy. Health Inform J 2020; 26: 2105–2118.

37. Cooper, Wei, Fernandez, et al. Pre-operative prediction of surgical morbidity in children: comparison of five statistical models. Comput Biol Med 2015; 57: 54–65.

38. Pruitt, Ayyangar, Craig, et al. Pediatric brain tumor rehabilitation. J Pediatr Rehabil Med 2011; 4: 59–70.

39. Nguyen, Morell, DeBaets. Large-scale distance metric learning for k-nearest neighbors regression. Neurocomputing 2016; 214: 805–814.

40. Song, Liang, et al. An efficient instance selection algorithm for k nearest neighbor regression. Neurocomputing 2017; 251: 26–34.

41. Zhang, O’Donnell. Support vector regression. In: Machine learning. Amsterdam, Netherlands: Elsevier, 2020, pp. 123–140.

42. Song Y-Y, Ying. Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiat 2015; 27: 130–135.

43. Patel, Rana, et al. A survey on decision tree algorithm for classification. International Journal of Engineering Development and Research 2014; 2(1): 1–5.

44. Cutler, Edwards, Beard, et al. Random forests for classification in ecology. Ecology 2007; 88: 2783–2792.

45. Park, Lek. Chapter 7 - Artificial neural networks: multilayer perceptron for ecological modeling. Developments in Environmental Modelling 2016; 28: 123–140.

46. Cao, Miao Q, Liu J, et al. Advance and prospects of AdaBoost algorithm. Acta Autom Sin 2013; 39: 745–758.

47. Oprandi, Oldrati, delle, et al. Processing speed and time since diagnosis predict adaptive functioning measured with WeeFIM in pediatric brain tumor survivors. Cancers (Basel) 2021; 13: 4776.

48. Chiara. Cognitive and clinical predictors of adaptive functioning in pediatric brain tumor survivors. Epub ahead of print. 2021. DOI: 10.5281/zenodo.4733570

49. Razali, Wah. Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. J Stat Model Anal 2011; 2: 21–33.

50. Shahpar, Mhatre PV, Huang ME. Update on brain tumors: new developments in neuro-oncologic diagnosis and treatment, and impact on rehabilitation strategies. PM&R 2016; 8: 678–689.

51. Wolfe, Walsh, Reynolds, et al. Executive functions and social skills in survivors of pediatric brain tumor. Child Neuropsychol 2013; 19: 370–384.

52. Dockstader, Wang, Bouffet, et al. Gamma deficits as a neural signature of cognitive impairment in children treated for brain tumors. J Neurosci 2014; 34: 8813–8824.

53. Chieffo, Tamburrini, Frassanito, et al. Preoperative neurocognitive evaluation as a predictor of brain tumor grading in pediatric patients with supratentorial hemispheric tumors. Child’s Nerv Syst 2016; 32: 1931–1937.

54. Taiwo Z, Na S, King. The neurological predictor scale: a predictive tool for long-term core cognitive outcomes in survivors of childhood brain tumors. Pediatr Blood Cancer 2017; 64: 172–179.

55. Önal, Huri. Relationships between executive functions and occupational performance of children with medulloblastoma. Br J Occup Ther 2021; 84: 251–258.

56. Tanner, Keppner, Lesmeister, et al. Cancer rehabilitation in the pediatric and adolescent/young adult population. Semin Oncol Nurs 2020; 36: 150984.