Abstract

Objective

Medulloblastoma (MB) is a highly malignant brain tumor. Early diagnosis and treatment are important to improve patients’ survival. However, it is difficult to distinguish MB from other brain tumors in magnetic resonance imaging (MRI) with the naked eye. This study proposed a new hybrid deep learning model named InceptentionNet, combining Inception and self-attention mechanisms to recognize MB with MRI images.

Methods

InceptentionNet integrated multiscale feature extraction and dynamic focus on relevant regions. This model was trained using a dataset with 736 MRI images, including 106 MB and 630 non-MB images. Other single convolutional neural network models, including MobileNet, Residual Network, Densely Connected Convolutional Network, Visual Geometry Group, and Inception, were also trained. All models’ performance was evaluated. In addition, we conducted external tests to verify the generalization of the model.

Results

The InceptentionNet model achieved an accuracy of 98.07% ± 0.77%, a precision of 91.43% ± 4.56%, a recall of 96.03% ± 3.61%, an F1-score of 93.54% ± 2.44%, and an area under the curve (AUC) of 99.41% ± 0.08%. In external tests, the model still performed best, achieving 90.94% accuracy and 92.79% AUC. These metrics indicated that our model performed well in distinguishing MB. InceptentionNet had the highest accuracy of all models compared, indicating that the hybrid model outperformed the single models. Additionally, images combined with attention heatmaps exhibited high clinical interpretability.

Conclusion

InceptentionNet demonstrates robust predictive capabilities and has potential as a diagnostic assistance tool. In the future, the model should be trained and validated on larger datasets and extended to multiclass classification.

Keywords

Medulloblastoma, convolutional neural network, self-attention, magnetic resonance imaging, brain tumor

Introduction

Medulloblastoma (MB) is a malignant embryonal tumor located in the posterior cranial fossa, originating from granule cell precursors.¹ Classified as grade IV in the World Health Organization pathological classification, MB exhibits highly malignant behavior, and 30% of patients have metastasis at diagnosis.^2,3 The mass effect caused by MB can lead not only to nausea, vomiting, and sensory and motor disorders, but also to brain herniation followed by brainstem compression and death.

MB is the most common primary malignant intracranial tumor in children but is rare in adults.^4,5 Surgery, chemotherapy, and craniospinal irradiation are the main treatments for MB patients.⁶ Given its highly malignant behavior, MB should be diagnosed and treated as early as possible. However, because interpreting magnetic resonance imaging (MRI) is complex and readings vary among radiologists, it is hard to distinguish MB from other brain tumors, especially those, such as ependymoma, that originate in the fourth ventricle. Pathology can identify MB, but obtaining tumor tissue requires biopsy or craniotomy, both of which carry risks for patients, and not all patients can undergo surgery. Hence, a noninvasive, automated, and highly efficient system for assisting in the diagnosis of MB is needed.

With the rapid development of deep learning in medical image recognition, many deep learning-based models have been applied to brain tumors, especially gliomas. For example, Voort et al.⁷ developed a single multitask convolutional neural network (CNN) using MRI scans to predict the mutation status and grade of gliomas. Xia et al.⁸ used a CNN model to distinguish between glioblastoma and lymphoma. Tang et al.⁹ proposed a new deep learning-based model to predict the tumor genotypes of glioblastoma using MRI scans. Chen et al.¹⁰ used a CNN model to predict the PTEN mutation of gliomas. However, only a few studies have focused on MB. Peng et al.¹¹ developed a fully automated pipeline to segment MB. Bareja et al.¹² used nnU-Net-based segmentation models to automatically delineate MB. However, these models cannot distinguish MB from other tumors or identify the subgroups of MB. Chen et al.¹³ used a mask regional CNN to distinguish the molecular subgroups of MB, but they used only one model, and its accuracy needs to be improved.

To focus on the most critical areas and improve the accuracy of the CNN model, self-attention has been introduced, allowing the CNN model to dynamically weight different parts of an image.¹⁴ Introducing a self-attention mechanism into a CNN can significantly enhance its ability to identify MB by improving feature extraction and refining the network's focus on tumor-specific characteristics. This article proposed a novel approach called InceptentionNet that combined the strengths of both CNN and self-attention mechanisms to achieve more precise identification of MB with MRI images (Figure 1).

Figure 1.

The workflow of this study.

Methods

Ethical approval

The public data used in this study are all from Kaggle (https://www.kaggle.com/datasets/waseemnagahhenes/brain-tumor-for-14-classes). Because Kaggle is a public online platform, use of these data does not require ethics committee approval. In addition, the external validation data were obtained from Shanghai Children's Medical Center. This study was approved by the Institutional Ethics Committee of Shanghai Children's Medical Center (approval number SCMCIRB-Κ2022159–1).

Data acquisition

Training data

The training data were obtained from the Kaggle dataset named Brain Tumor for 14 classes, which contains 131 MB images. Duplicate images were removed in a two-step process: (1) perceptual hash (pHash) comparison was applied to automatically flag near-identical images; (2) all flagged pairs were manually reviewed by a radiologist to verify redundancy before removal. After excluding 25 duplicate images, 106 MB images were enrolled in this study. For the non-MB group, 636 images were randomly selected from the other 13 brain tumor classes and the normal class; after excluding six duplicate images, 630 non-MB images were enrolled. In total, 736 images were included in this study.
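The automated flagging step can be illustrated with a minimal sketch. It assumes each image has already been reduced to a 64-bit perceptual hash by an imaging library; the file names, hash values, and the 5-bit distance threshold below are hypothetical and are not taken from the study.

```python
from itertools import combinations

def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")

def flag_near_duplicates(hashes: dict, max_distance: int = 5) -> list:
    """Return name pairs whose hashes differ by at most max_distance bits.

    max_distance is an assumed threshold; the study does not report one.
    """
    flagged = []
    for (name_a, h_a), (name_b, h_b) in combinations(sorted(hashes.items()), 2):
        if hamming_distance(h_a, h_b) <= max_distance:
            flagged.append((name_a, name_b))
    return flagged

# Hypothetical 64-bit hashes; a real pipeline would compute them from the images.
example_hashes = {
    "mb_001.png": 0xF0F0F0F0F0F0F0F0,
    "mb_002.png": 0xF0F0F0F0F0F0F0F1,  # differs from mb_001 by a single bit
    "mb_003.png": 0x0123456789ABCDEF,
}
flagged_pairs = flag_near_duplicates(example_hashes)
```

Pairs returned by such a function would then go to manual review, matching the second step of the process described above.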

External validation data

We retrospectively collected MRI images of 118 patients from Shanghai Children's Medical Center for external validation. This study was approved by the Ethics Committee of Shanghai Children's Medical Center (SCMCIRB-Κ2022159–1).

Data preprocessing

To prepare the MRI images for model training, a series of preprocessing steps were applied to enhance image quality and highlight critical features. Noise reduction was performed using a Gaussian filter with a standard deviation of 2, effectively smoothing nonessential details while preserving the edges of key structures. Contrast adjustment was achieved through histogram equalization, which redistributed pixel intensity values across a broader dynamic range to improve the visibility of subtle features. These steps ensured a cleaner and more informative dataset, crucial for robust model performance. For example, after denoising, the average peak signal-to-noise ratio of the images increased by approximately 15%, and contrast adjustment led to a 22% increase in the standard deviation of pixel intensities, indicating improved feature differentiation.
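The two preprocessing operations can be sketched in plain Python on a small grayscale image stored as nested lists. This is an illustrative sketch and not the study's code: the kernel truncation at three standard deviations, the border clamping, and the 8 × 8 test image are all assumptions.

```python
import math

def gaussian_kernel(sigma: float = 2.0) -> list:
    """1-D Gaussian kernel truncated at 3 sigma (assumed) and normalized to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [
            sum(kernel[k] * row[min(max(x + k - radius, 0), width - 1)]
                for k in range(len(kernel)))
            for x in range(width)
        ]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma: float = 2.0):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def equalize_histogram(image, levels: int = 256):
    """Map 8-bit intensities through the normalized cumulative histogram."""
    flat = [int(round(p)) for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    denom = max(len(flat) - cdf_min, 1)
    lut = [max(round((c - cdf_min) / denom * (levels - 1)), 0) for c in cdf]
    return [[lut[int(round(p))] for p in row] for row in image]

# Hypothetical low-contrast image with intensities between 100 and 114.
low_contrast = [[100 + 2 * ((x + y) % 8) for x in range(8)] for y in range(8)]
equalized = equalize_histogram(low_contrast)
enhanced = equalize_histogram(gaussian_blur(low_contrast, sigma=2.0))
```

Equalization stretches the narrow input range across the full 0 to 255 scale, which is the redistribution of intensities described above.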

To artificially expand the dataset and simulate real-world clinical variations, augmentation techniques such as random rotation (angles ranging between 0° and 45°), horizontal flipping, and zooming (scaling factors randomly selected between 0.8 and 1.2) were applied. These methods increased the dataset size from 736 images to 2944 images, with the number of samples in each tumor class augmented by a factor of four. This not only addressed the issue of data imbalance but also enhanced the model's robustness to variations in orientation and scale, which are common challenges in medical imaging. For instance, after applying these augmentations, the model's accuracy improved by approximately 5%, demonstrating the effectiveness of the augmented dataset in improving the model's generalization capability.
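The sampling of augmentation parameters and the resulting dataset size can be sketched as follows. The parameter ranges come from the text; the 50% flip probability, the fixed seed, and the choice of four parameter sets per image (the text does not say whether the unmodified original counts as one of the four) are assumptions.

```python
import random

def sample_augmentation(rng: random.Random) -> dict:
    """Draw one set of augmentation parameters within the ranges stated in the text."""
    return {
        "rotation_deg": rng.uniform(0.0, 45.0),
        "horizontal_flip": rng.random() < 0.5,  # flip probability is an assumption
        "zoom": rng.uniform(0.8, 1.2),
    }

def build_augmentation_plan(n_images: int, copies_per_image: int, seed: int = 0) -> list:
    """Pair every image index with copies_per_image sampled parameter sets."""
    rng = random.Random(seed)
    return [
        (index, sample_augmentation(rng))
        for index in range(n_images)
        for _ in range(copies_per_image)
    ]

plan = build_augmentation_plan(n_images=736, copies_per_image=4)
mb_count, non_mb_count = 106 * 4, 630 * 4
```

Under a uniform factor of four, the plan contains 2944 entries, and the class counts become 424 MB and 2520 non-MB images.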

Model construction

Proposed deep learning model

In recent years, many CNN-based models have been applied to medical image analysis because of their strong performance. Inception, also known as GoogLeNet, is a CNN architecture that captures features at different scales, yielding excellent performance in image classification and object detection.¹⁵ For example, Rao et al.¹⁶ used the Inception V3 model to classify medical radiology images and obtained 99.98% accuracy. Singh et al.¹⁷ used the Inception V3 model to distinguish between healthy and blood cancer cells using a cellular morphology dataset. However, like other CNN models, the Inception model does not capture long-range dependencies across the image, which might be important for MB identification given the complex tumor structures. Attention-based models, in contrast, can capture global dependencies but lack the spatial locality that is essential for medical image detection.^18,19 A new model is therefore needed to address these limitations and adapt to MB identification.

To address the limitations of conventional CNN models in capturing both local and global information, we developed a hybrid deep learning architecture named InceptentionNet, which integrates an Inception-based spatial feature extractor with a multihead self-attention mechanism. The overall architecture is illustrated in Figure 2.

Figure 2.

The architecture of the InceptentionNet model.

The model consists of three main components. First, the stem module applies a 3 × 3 convolution (stride = 1, 64 filters) to process the input image and generate initial low-level features. Second, a modified Inception block is used to extract multiscale features through parallel branches with 1 × 1, 3 × 3, and 5 × 5 convolutions, each followed by batch normalization and ReLU activation. To preserve spatial information, we replaced the traditional max-pooling path with a 3 × 3 convolution followed by stride-2 downsampling. The outputs of the parallel branches are concatenated along the channel dimension to form a comprehensive feature representation.

After feature extraction, we introduced a multihead self-attention module consisting of four attention heads, each with 64-dimensional query and key vectors. The attention module dynamically reweighted the spatial features according to their contextual relevance. The attention scores were computed using scaled dot-product attention, followed by softmax normalization, allowing the model to highlight tumor-relevant regions—such as the posterior fossa or cerebellar vermis—with greater precision.
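Scaled dot-product attention for a single head can be sketched in plain Python. The toy vectors below are hypothetical and use 4-dimensional queries and keys for readability, whereas the model described above uses four heads with 64-dimensional query and key vectors; the sketch illustrates the mechanism only and is not the study's implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of query, key, and value vectors."""
    d_k = len(keys[0])
    weights = []
    for q in queries:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights

# Hypothetical toy inputs: two query positions attending over three key positions.
queries = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
outputs, weights = scaled_dot_product_attention(queries, keys, values)
```

Each row of `weights` sums to one, so every output vector is a weighted average of the value vectors, with more weight on positions whose keys align with the query.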

This architectural design was informed by both radiological knowledge and deep learning best practices. MBs often exhibit diverse intratumoral appearances (e.g. necrosis, cysts, calcification) and variable locations in the posterior fossa, which makes local feature extraction and global context modeling equally important. The combination of Inception's multiscale feature extraction and self-attention's contextual sensitivity allows the model to better handle such complexity, thereby improving its ability to distinguish MB from other tumor types.

The InceptentionNet model was trained using the Adam optimizer with an initial learning rate of 0.005. To ensure efficient convergence while avoiding overfitting, we applied a learning rate decay strategy, which automatically reduced the learning rate when the validation loss plateaued. Training was conducted over a maximum of 40 epochs, with early stopping enabled—training was halted if the validation loss did not improve for 10 consecutive epochs. A mini-batch size of 8 was used throughout the training process, which was selected empirically based on GPU memory constraints and training stability. To further enhance generalization and reduce overfitting, a dropout rate of 0.3 was applied after each dense layer in the classification head. All hyperparameters were determined through preliminary grid search. These training configurations ensure a reproducible and reliable optimization process for the proposed model.
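The interaction between learning rate decay and early stopping can be sketched by replaying a sequence of validation losses. The initial learning rate, the 40-epoch cap, and the early-stopping patience of 10 come from the text; the decay factor of 0.5, the decay patience of 3 epochs, and the loss sequences are assumptions, since the text does not report them.

```python
def run_schedule(val_losses, initial_lr=0.005, max_epochs=40, stop_patience=10,
                 lr_patience=3, lr_factor=0.5):
    """Replay validation losses and return (epochs_run, final_lr).

    lr_patience and lr_factor are assumed values, not taken from the study.
    """
    best = float("inf")
    epochs_since_best = 0
    epochs_since_lr_change = 0
    lr = initial_lr
    epochs_run = 0
    for loss in val_losses[:max_epochs]:
        epochs_run += 1
        if loss < best:
            best = loss
            epochs_since_best = 0
            epochs_since_lr_change = 0
            continue
        epochs_since_best += 1
        epochs_since_lr_change += 1
        if epochs_since_lr_change >= lr_patience:
            lr *= lr_factor  # decay once the loss has plateaued
            epochs_since_lr_change = 0
        if epochs_since_best >= stop_patience:
            break  # early stopping
    return epochs_run, lr

# Hypothetical loss curves: one that plateaus after epoch 3, one that keeps improving.
plateau_result = run_schedule([1.0, 0.8, 0.7] + [0.7] * 20)
improving_result = run_schedule([1.0 / (i + 1) for i in range(50)])
```

With these assumed settings, the plateauing run stops after 13 epochs with the learning rate halved three times, while the steadily improving run uses all 40 epochs at the initial rate.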

Model comparisons

To obtain the benchmarking performance and highlight the improvements, the InceptentionNet was compared with other single models based on CNN, including MobileNet, Residual Network (ResNet), Densely Connected Convolutional Network (DenseNet), Visual Geometry Group (VGG), and Inception.

MobileNet is a CNN model designed specifically for mobile and embedded devices.²⁰ Because it is efficient in computation and memory, it is well suited to devices with limited processing power. Because it performs well in visual recognition tasks such as image classification, it is also used for medical image recognition. For example, Huang et al.²¹ used the MobileNet-V2/IFHO model to detect early-stage diabetic retinopathy, achieving an average precision of 97.521%.

ResNet introduced the concept of residual learning, which enables the training of deeper networks and significantly improves CNN performance.²² Sahli et al.²³ used the ResNet model combined with SVM for glioblastoma segmentation and classification. Shin et al.²⁴ used ResNet-50 to distinguish glioblastoma from solitary brain metastasis.

DenseNet is a CNN model in which each layer is connected to every other layer in a feed-forward manner.²⁵ DenseNet is also highly effective in image classification and object detection. For example, Guo et al.²⁶ used a DenseNet-based model to classify glioma subtypes and achieved an accuracy of 0.878. Alshammari et al.²⁷ used a DenseNet-based model to classify brain metastasis tumors with an accuracy of 98.5%.

VGG is a popular model for image classification tasks and is notable for its depth.²⁸ VGG-based models have been widely applied in medical image recognition. Basha et al.²⁸ used VGG-16 to segment brain tumors. Saluja et al.²⁹ used VGG-16 for glioma grade classification.

Inception uses multiple convolutional and pooling operations in parallel within a single layer and therefore captures features at different scales well.³⁰ Wu et al.³¹ used an Inception model to classify glioma and encephalitis. Khaliki et al.³² used Inception-V3 to classify different brain tumors.

Statistical analysis

To evaluate the performance of the new model and the baseline models, several metrics were analyzed, including accuracy, precision, recall, F1-score, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). Accuracy assesses the overall correctness of predictions: Accuracy = (True positives + True negatives)/(True positives + True negatives + False positives + False negatives). Precision assesses the correctness of positive predictions: Precision = True positives/(True positives + False positives). Recall assesses the ability to identify all actual positives: Recall = True positives/(True positives + False negatives). F1-score is the harmonic mean of precision and recall: F1-score = 2 × (Precision × Recall)/(Precision + Recall). AUC assesses overall discrimination ability.
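The formulas above translate directly into code. The sketch below computes the four count-based metrics from confusion-matrix counts and estimates AUC as the fraction of positive and negative pairs that the scores rank correctly, with ties counted as half. The counts, labels, and scores are hypothetical, and the pairwise AUC is one standard way to compute the area under the ROC curve, not necessarily the routine used in the study.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_auc(labels, scores) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=8, tn=10, fp=2, fn=0)
auc = roc_auc(labels=[1, 1, 0, 0], scores=[0.9, 0.4, 0.5, 0.1])
```

In this toy example, perfect recall with two false positives gives a precision of 0.8, and the F1-score falls between the two values because the harmonic mean is pulled toward the lower one.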

Results

Experimental setup

In this study, all images from Kaggle were used to train every model with five-fold cross validation, which supports robust performance estimates, limits the risk of overfitting to a single split, and keeps the comparison consistent across models. For each model, hyperparameter tuning was performed using grid search to identify the optimal learning rate, batch size, and regularization parameters. All models were trained until convergence, with early stopping triggered when the validation loss showed no significant improvement over 10 consecutive epochs. Training was conducted on an NVIDIA RTX 3090 GPU with 24GB of memory, using the PyTorch deep learning framework in Python (version 3.9). Model training and validation were monitored using performance metrics, including accuracy, precision, recall, F1-score, ROC curve, and AUC, to ensure comprehensive evaluation.
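A five-fold split over the 736 images can be sketched with the standard library alone. The shuffling seed is an assumption, and the sketch uses a plain random split because the text does not state whether the folds were stratified by class or whether augmentation was applied before or after splitting.

```python
import random

def k_fold_indices(n_samples: int, k: int = 5, seed: int = 42) -> list:
    """Return k (train_indices, validation_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # the seed is an assumed value
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits

splits = k_fold_indices(n_samples=736, k=5)
fold_sizes = [len(validation) for _, validation in splits]
```

Each image appears in exactly one validation fold, so every reported metric is averaged over predictions on data the corresponding model did not see during training.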

Cross-validation results

During training, the hyperparameters described in the Methods were used for the hybrid model: the Adam optimizer with an initial learning rate of 0.005 and a batch size of 8. The training process was monitored across epochs (Figure 3(a)).

Figure 3.

Accuracy and receiver operating characteristic (ROC) curves. Accuracy of InceptentionNet at different epochs (a). ROC curves of InceptentionNet (b), MobileNet (c), Residual Network (d), Densely Connected Convolutional Network (e), Visual Geometry Group (f), and Inception (g).

For comparison, we also implemented five widely used CNN architectures (Inception, MobileNet, ResNet, DenseNet, and VGG) under the same experimental settings. The results for all models are reported as mean ± standard deviation across the five folds, as shown in Table 1. Our model achieved the highest accuracy (98.07% ± 0.77%) and F1-score (93.54% ± 2.44%), with a precision of 91.43% ± 4.56%, a recall of 96.03% ± 3.61%, and an AUC of 99.41% ± 0.08%. Inception reached 97.52% ± 0.51% accuracy and 100.00% ± 0.00% recall but a lower precision (85.45% ± 2.58%) and AUC (92.21% ± 0.01%), whereas DenseNet reached a comparable AUC (99.43% ± 0.15%) and a higher precision (96.09% ± 4.42%) but a lower and less stable recall (82.86% ± 13.40%). InceptentionNet therefore offered the best balance of precision and recall, as reflected in the F1-score, which is critical for imbalanced diagnostic tasks.

Table 1.

Performance metrics of the models.

Model	Accuracy	Precision	Recall	F1-score	AUC
Inception	97.52% ± 0.51%	85.45% ± 2.58%	100.00% ± 0.00%	92.13% ± 1.50%	92.21% ± 0.01%
MobileNet	92.96% ± 3.53%	93.35% ± 8.72%	58.41% ± 29.73%	64.74% ± 28.19%	98.30% ± 0.30%
ResNet	62.84% ± 27.88%	38.19% ± 28.43%	35.08% ± 34.94%	17.97% ± 5.00%	64.46% ± 11.39%
DenseNet	96.93% ± 1.44%	96.09% ± 4.42%	82.86% ± 13.40%	88.01% ± 7.00%	99.43% ± 0.15%
VGG	88.33% ± 4.91%	67.77% ± 19.82%	76.51% ± 32.83%	60.97% ± 23.09%	96.93% ± 0.50%
InceptentionNet	98.07% ± 0.77%	91.43% ± 4.56%	96.03% ± 3.61%	93.54% ± 2.44%	99.41% ± 0.08%

AUC: area under curve; ResNet: residual network; DenseNet: densely connected convolutional network; VGG: visual geometry group.

In contrast, ResNet and VGG exhibited markedly lower performance. ResNet in particular had low recall (35.08% ± 34.94%) and AUC (64.46% ± 11.39%), suggesting limited suitability for MB identification tasks under the current dataset conditions. MobileNet achieved relatively strong precision (93.35% ± 8.72%) but suffered from unstable recall (58.41% ± 29.73%), indicating a higher rate of false negatives. These results indicate that InceptentionNet balances precision and recall while remaining stable across folds, which is essential for clinical decision support systems in medical imaging.

In addition, Figure 3(b–g) shows the ROC curves and AUC values of each model in the five-fold cross validation. The results include the specific results of each fold and the corresponding average results (including 95% confidence interval).

External-validation results

In addition to performing five-fold cross-validation on the publicly available dataset, we further evaluated the generalizability and real-world clinical applicability of our proposed model by conducting external testing using clinical samples collected from a hospital setting. Specifically, we retrospectively collected MRI images of 118 patients from Shanghai Children's Medical Center, including 66 MB images and 52 non-MB images. This study complies with the Declaration of Helsinki and was approved by the Hospital Ethics Committee (No.: SCMCIRB-Κ 2022159–1); because the data were obtained from a public database, informed consent was waived. The results of the external evaluation are presented in Table 2. Our model achieved an accuracy of 90.94%, a precision of 97.30%, a recall of 91.96%, an F1-score of 94.56%, and an AUC of 92.79%. Its accuracy, precision, F1-score, and AUC were the highest among all models, although its recall was lower than that of the other models. Notably, its accuracy exceeded that of the baseline Inception model by more than 1 percentage point, supporting the predictive capability of our approach.

Table 2.

Performance metrics of the models in the external test.

Model	Accuracy	Precision	Recall	F1-score	AUC
InceptentionNet	90.94%	97.30%	91.96%	94.56%	92.79%
MobileNet	87.50%	87.25%	100.00%	93.19%	88.00%
ResNet	79.82%	85.10%	92.63%	88.70%	67.22%
DenseNet	88.99%	88.60%	100.00%	93.95%	87.05%
VGG	89.45%	90.87%	97.45%	94.05%	87.27%
Inception	89.79%	90.21%	98.79%	94.31%	91.62%

AUC: area under curve; ResNet: residual network; DenseNet: densely connected convolutional network; VGG: visual geometry group.

The ROC curves of all models evaluated on the external dataset are shown in Figure 4. Our proposed model achieved the highest AUC of 92.79%, with its ROC curve lying closest to the top-left corner, indicating good discriminatory power on real-world clinical data. These results suggest that our model is not overfitted to the public dataset and generalizes to authentic clinical scenarios. This external validation highlights the potential of our model to be integrated into clinical workflows as decision support for physicians.

Figure 4.

The ROC curve of external validation.

Visualization and clinical interpretability analysis

To enhance clinical interpretability, we generated Grad-CAM-based heatmaps overlaid on representative MRI images (Figure 5), which visualize the spatial regions most strongly attended to by the InceptentionNet model during classification. These heatmaps provide an intuitive visual explanation of the model's decision-making process by highlighting areas that contribute most significantly to the final prediction.

Figure 5.

Images combined with attention heatmaps for medulloblastoma and non-medulloblastoma.

The highlighted regions were found to correspond well with radiological hallmarks commonly associated with MB. Specifically, the model consistently focused on tumor localization within the fourth ventricle and adjacent midline structures—areas that are known to be frequent sites of MB occurrence. In addition, the attention maps accurately captured features related to mass effect, including compression of the brainstem and displacement of surrounding tissue, which are critical indicators in radiological assessment. Moreover, the model was able to identify signal heterogeneity patterns—such as hypointense or hyperintense regions on T1- or T2-weighted imaging—suggestive of necrosis, cystic degeneration, and calcification, all of which are commonly seen in advanced-stage MBs.

By focusing on these clinically relevant features, the generated heatmaps not only provide post-hoc interpretability but also demonstrate alignment with the diagnostic reasoning typically employed by expert neuroradiologists. This alignment further supports the feasibility of integrating the model into real-world radiological workflows as a complementary diagnostic aid.

Discussion

In this research, we proposed a new model named InceptentionNet, which consists of Inception and self-attention modules. The model has both spatial awareness and global focus and is particularly suitable for identifying MB, given the complex tumor structures. According to the performance metrics, InceptentionNet outperformed the other single CNN models in identifying MB images. In external testing, its accuracy was 90.94% and its AUC was 92.79%, demonstrating strong predictive capability. Images combined with heatmaps suggested that InceptentionNet also had good clinical interpretability.

The strong performance of our hybrid model might be attributed to the model architecture and its suitability for the MB identification task. Traditional CNN models focus on local features, whereas introducing self-attention allows the model to establish a global context across the whole image.^33,34 The model can therefore capture relationships regardless of the physical distance between pixels or features, which is very important for brain tumor recognition given the complex structures involved.³⁵ The combination of Inception and self-attention modules can enhance feature representation: the Inception module captures multiscale features through parallel convolutions of different sizes, and the self-attention mechanism refines that representation. Additionally, the attention mechanism provides intuitive insight into the model's decision-making process, because observing which areas receive higher attention weights shows how the hybrid model “views” the input. These advantages contribute to the high performance of our hybrid model in recognizing MB. Moreover, combining different mechanisms might help the model generalize to other data, since it captures features from multiple perspectives.

In addition to reporting average performance metrics, we analyzed the misclassified cases to better understand the limitations of the proposed model. Most false positive predictions occurred when non-MB tumors—such as cystic ependymomas or midline gliomas—shared anatomical locations or imaging characteristics with MBs. These tumors often exhibit similar intensities and occupy regions in the posterior fossa, leading the model to incorrectly classify them as MB due to overlapping visual cues. Conversely, false negatives were primarily associated with atypical presentations of MB. In these cases, the lesions were either very small or demonstrated weak contrast relative to surrounding brain tissue, which made accurate identification more challenging even for human observers. Some of these errors also appeared in MRI slices with motion artifacts or partial volume effects, suggesting that image quality plays a significant role in model sensitivity.

This research also has clinical value. MB is a malignant brain tumor that requires early diagnosis and treatment, but it is difficult to distinguish MB from other tumors, such as ependymoma, on MRI scans with the naked eye.³⁶ Pathology is the gold standard for identifying MB, but it requires tumor tissue obtained by biopsy or craniotomy, both of which carry risks for patients. Our model can identify MB noninvasively with strong performance, giving it potential for clinical deployment in support of radiologists.

There are several limitations in this research. Firstly, we focused only on binary classification; classification of pathological grades or genetic mutations was not included in this study. In future research, we will extend the model to multiclass tasks. Secondly, the external test data came from a single center. Data from multiple centers should be used to confirm the model's generalizability in the future. Although the training dataset used in this study is publicly available and facilitates reproducibility, it lacks comprehensive clinical annotation and may not fully represent real-world patient variability, which should be addressed in future multicenter studies. Finally, although our model has clinical potential, a deployable system has not been developed. Further development is necessary to build an integrated system for clinical translation.

Conclusion

This study developed a hybrid model named InceptentionNet, which integrates Inception and self-attention mechanisms for the recognition of MB in MRI images. InceptentionNet can capture multiscale features and focus on critical regions, demonstrating high performance compared with other single CNN models. Additionally, images combined with attention heatmaps exhibited high clinical interpretability. Therefore, this model has the potential to assist radiologists in recognizing MB. In future work, the model should be trained and validated with more data and developed into an accessible diagnostic tool.

Footnotes

ORCID iDs

Chenhao Fang

Chao Li

Hong Chen

Xianzhen Chen

Zhaoli Shen

Ethical considerations

This study complies with the Declaration of Helsinki and was approved by the Hospital Ethics Committee (No.: SCMCIRB-Κ 2022159–1). Because the data were obtained from a public database, informed consent was waived.

Author contributions

Conceptualization: CF, CL, HC, XC, and ZS; data curation: CL, HL, QZ, and SL; formal analysis: CF and CL; funding acquisition: HL, HC, XC, and ZS; investigation: CF and CL; methodology: CF, CL, HL, QZ, and SL; project administration: CF and CL; resources: CF and SL; software: CF and CL; supervision: HC, XC, and ZS; validation: HL and QZ; visualization: CL and HL; writing—original draft: CF, CL, HL, QZ, SL, HC, XC, and ZS; writing—review and editing: HC, XC, and ZS.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Science Foundation of Fujian Province, National Natural Science Foundation of China, and Scientific Research Project of National Key clinical specialty construction project (Grant Nos. 2023J01583, 32200781, and 2022YBL-ZD-04).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Guarantors

CF and CL.

References

Franceschi

Giannini

Furtner

, et al. Adult medulloblastoma: updates on current management and future perspectives. Cancers (Basel) 2022; 14: 3708.

Fouladi

Gajjar

Boyett

, et al. Comparison of CSF cytology and spinal magnetic resonance imaging in the detection of leptomeningeal disease in pediatric medulloblastoma or primitive neuroectodermal tumor. J Clin Oncol 1999; 17: 3234–3237.

Louis

Perry

Wesseling

, et al. The 2021 WHO classification of tumors of the central nervous system: a summary. Neuro Oncol 2021; 23: 1231–1251.

Thorbinson

Kilday

. Childhood malignant brain tumors: balancing the bench and bedside. Cancers (Basel) 2021; 13: 6099.

Ostrom

Price

Neff

, et al. CBTRUS statistical report: primary brain and other central nervous system tumors diagnosed in the United States in 2016–2020. Neuro Oncol 2023; 25: iv1–iv99.

Northcott

Robinson

Kratz

, et al. Medulloblastoma. Nat Rev Dis Primers 2019; 5: 11.

van der Voort

Incekara

Wijnenga

MMJ

, et al. Combined molecular subtyping, grading, and segmentation of glioma using multi-task deep learning. Neuro Oncol 2023; 25: 279–289.

Xia

, et al. Deep learning for automatic differential diagnosis of primary central nervous system lymphoma and glioblastoma: multi-parametric magnetic resonance imaging based convolutional neural network model. J Magn Reson Imaging 2021; 54: 880–887.

Tang

Jin

, et al. Deep learning of imaging phenotype and genotype for predicting overall survival time of glioblastoma patients. IEEE Trans Med Imaging 2020; 39: 2100–2109.

10.

Chen

Lin

Zhang

, et al. Deep learning radiomics to predict PTEN mutation status from magnetic resonance imaging in patients with glioma. Front Oncol 2021; 11: 734433.

11.

Peng

Kim

Patel

, et al. Deep learning-based automatic tumor burden assessment of pediatric high-grade gliomas, medulloblastomas, and other leptomeningeal seeding tumors. Neuro Oncol 2022; 24: 289–299.

12.

Bareja

Ismail

Martin

, et al. nnU-Net-based segmentation of tumor subcompartments in pediatric medulloblastoma using multiparametric MRI: a multi-institutional study. Radiol Artif Intell 2024; 6: e230115.

13.

Chen

Fan

, et al. Molecular subgrouping of medulloblastoma based on few-shot learning of multitasking using conventional MR images: a retrospective multicenter study. Neurooncol Adv 2020; 2: vdaa079.

14. Niu, Zhong. A review on the attention mechanism of deep learning. Neurocomputing 2021; 452: 48–62.

15. Szegedy, Liu, Jia, et al. Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7–12 June 2015, pp.1–9.

16. Ashwath Rao, Kini, Nostas. Content-based medical image retrieval using pretrained inception V3 model. In: Proceedings of the International Conference on Paradigms of Communication, Computing and Data Sciences. Singapore: Springer Singapore, 2022, pp.641–652.

17. Singh, Sharma, Aggarwal, et al. InceptionV3 in medical imaging: enhancing precision in acute lymphoblastic leukaemia diagnosis. In: 2024 2nd International Conference on Computer, Communication and Control (IC4), 8–10 Feb 2024, pp.1–6.

18. Baffour, Qin, Wang, et al. Spatial self-attention network with self-attention distillation for fine-grained image recognition. J Vis Commun Image Represent 2021; 81: 103368.

19. Vaswani, Shazeer, Parmar, et al. Attention is all you need. arXiv:1706.03762, 2017.

20. Howard, Zhu, Chen, et al. MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861, 2017.

21. Huang, Sarabi, Ragab. MobileNet-V2/IFHO model for accurate detection of early-stage diabetic retinopathy. Heliyon 2024; 10: e37293.

22. He, Zhang, Ren, et al. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, June 2016, pp.770–778.

23. Sahli, Ben Slama, Zeraii, et al. ResNet-SVM: fusion based glioblastoma tumor segmentation and classification. J Xray Sci Technol 2023; 31: 27–48.

24. Shin, Kim, Ahn, et al. Development and validation of a deep learning-based model to distinguish glioblastoma from solitary brain metastasis using conventional MR images. AJNR Am J Neuroradiol 2021; 42: 838–844.

25. Huang, Liu, van der Maaten, et al. Densely connected convolutional networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017; 243: 2261–2269.

26. Guo, Wang, Chen, et al. Multimodal MRI image decision fusion-based network for glioma classification. Front Oncol 2022; 12: 819673.

27. Alshammari. DenseNet_HybWWoA: a DenseNet-based brain metastasis classification with a hybrid metaheuristic feature selection strategy. Biomedicines 2023; 11: 1354.

28. Simonyan, Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2015.

29. Saluja, Trivedi, Sarangdevot. Advancing glioma diagnosis: integrating custom U-net and VGG-16 for improved grading in MR imaging. Math Biosci Eng 2024; 21: 4328–4350.

30. Szegedy, Liu, Jia, et al. Going deeper with convolutions. arXiv:1409.4842, 2014.

31. Differentiation of glioma mimicking encephalitis and encephalitis using multiparametric MR-based deep learning. Front Oncol 2021; 11: 639062.

32. Khaliki, Başarslan. Brain tumor detection from images and comparison with transfer learning methods and 3-layer CNN. Sci Rep 2024; 14: 2664.

33. Zhao, Wang, Zhang, et al. A review of convolutional neural networks in computer vision. Artif Intell Rev 2024; 57: 99.

34. Liu, et al. Self-attention convolutional neural network for improved MR image reconstruction. Inf Sci (Ny) 2019; 490: 317–328.

35. Bauer, Wiest, Nolte, et al. A survey of MRI-based medical image analysis for brain tumor studies. Phys Med Biol 2013; 58: R97–129.

36. de Bont, Packer, Michiels, et al. Biological background of pediatric medulloblastoma and ependymoma: a review from a translational research perspective. Neuro Oncol 2008; 10: 1040–1060.

Precise identification of medulloblastoma in MRI images using a convolutional neural network integrated with a self-attention mechanism
