Abstract
White matter hyperintensity (WMH) is associated with cognitive impairment. In this study, 79 patients with WMH from hospital 1 were randomly divided into a training set (62 patients) and an internal validation set (17 patients). In addition, 29 WMH patients from hospital 2 were used as an external validation set. Cognitive status was determined based on neuropsychological assessment results. A deep learning convolutional neural network, VB-Net, was used to automatically identify and segment whole-brain subregions and WMH. The PyRadiomics package in Python was used to automatically extract radiomic features from the WMH and bilateral hippocampi. DeLong tests revealed that the random forest model based on combined features had the best performance for the detection of cognitive impairment in WMH patients, with an AUC of 0.900 in the external validation set. Our results provide clinicians with a reliable tool for the early diagnosis of cognitive impairment in WMH patients.
Significance Statements
• Our VB-Net could segment the WMH and whole-brain subregions automatically and accurately.
• RF models based on radiomic features of WMH and bilateral hippocampi performed excellently in detecting cognitive impairments.
• Age, education level and Hachinski score were risk factors for cognitive impairment in patients with WMH.
Introduction
White matter hyperintensities (WMH) are common in the elderly population, and the WMH burden typically increases with age. 1 According to previous reports, more than 80% of people aged 60 to 70 years have WMH, and the prevalence of WMH among people aged 80 to 90 years can reach 100%. 2 Many studies have shown that WMH is a risk factor for cognitive impairment. 3 WMH can increase the risk of all-cause dementia by 14% and accelerate the progression of cognitive impairment to dementia. 4 Interventions in the early stages of cognitive impairment can help delay the onset of irreversible dementia. However, at present, the diagnosis of cognitive impairment in WMH patients still relies on clinical symptoms and neuropsychological tests, which are highly subjective, time-consuming, and easily influenced by patient cooperation. 5 Therefore, new technologies for the rapid and accurate identification of cognitive impairment in WMH patients are needed to help patients receive targeted treatment in time and to improve their prognosis and quality of life.
Magnetic resonance imaging (MRI) can be used to evaluate the brain structure and function of patients with WMH in a noninvasive, radiation-free, and high-resolution manner. 6 Radiomics, developed by Dutch professor Philippe Lambin in 2012, 7 enables the quantitative analysis of physiological and pathological changes in lesions by extracting, in a high-throughput manner, large numbers of image features from raw MRI data that are invisible to the naked eye. Previous studies have shown that hippocampus-based radiomics can be used for imaging-based diagnosis of mild cognitive impairment (MCI) and Alzheimer’s disease (AD). 8 Hanseeuw BJ et al reported reduced hippocampal volume (HV), cortical metabolism and thickness in patients with MCI. 9 The internal texture features of WMH have also been found to be closely related to the occurrence of cognitive impairment. 10 However, the diagnosis of cognitive impairment in patients with WMH by combining analyses of WMH and the hippocampus in MR images has not been reported. At present, the segmentation of brain subregions and WMH relies mostly on manual or semiautomatic delineation, which is very time-consuming and subjective. In this study, patients with WMH with Fazekas scores of 2 or 3 were included and divided into a group with cognitive impairment and a group without cognitive impairment based on their cognitive status. A convolutional network was applied to automatically perform whole-brain subregion and WMH segmentation. Radiomic features of WMH and bilateral hippocampi were extracted and used to train machine learning models for the rapid and accurate diagnosis of cognitive impairment in patients with WMH.
Methods
Participants
A total of 122 patients with WMH admitted to the Second Affiliated Hospital of Chongqing Medical University (Hospital 1) between September 2019 and June 2023 were retrospectively included. WMH was evaluated using the modified Fazekas scale on T2 fluid attenuated inversion recovery (T2-FLAIR) sequences (0 = no lesion, 1 = punctate lesion, 2 = lamellar fusion, and 3 = extensive fusion). The Fazekas score for each patient was determined by 2 radiologists with more than 10 years of experience who were blinded to the patients’ clinical information and cognitive status. The inclusion criteria were as follows: (1) aged 45-80 years and (2) Fazekas grade 2 or 3. The exclusion criteria were as follows: (1) acute intracranial macrovascular diseases such as ischaemic stroke and cerebral haemorrhage; (2) metabolic encephalopathy, ischaemic hypoxic encephalopathy, or other nonvascular white matter lesions; (3) other coexisting intracranial lesions such as tumours, dementia, craniocerebral trauma, or other diseases; (4) incomplete clinical data; (5) incomplete imaging data; and (6) severe imaging artefacts. All patients completed a formal neuropsychological assessment including the following tests: the Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), Clinical Dementia Rating (CDR), Geriatric Depression Scale (GDS), Activities of Daily Living (ADL) and Hachinski Ischaemia Index scale (HIS). According to the cognitive status based on neuropsychological test results, all patients were subdivided into 2 groups: WMH with cognitive impairment and WMH without cognitive impairment. Finally, 42 patients with cognitive impairment and 37 patients without cognitive impairment from hospital 1 were included in this study. They were randomly divided into a training set (62 patients) and an internal validation set (17 patients) at a ratio of approximately 8:2.
In addition, we collected MRI data from 29 patients, including 14 patients with cognitive impairment and 15 patients without cognitive impairment, from Chongqing University Central Hospital (Hospital 2) between February 2020 and July 2023 as an external validation set. The inclusion and exclusion criteria were the same as those described above. The flow chart of subject recruitment is shown in Figure 1.
Figure 1. Flow chart of subject recruitment.
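The random division of the 79 Hospital 1 patients into 62 training and 17 internal validation cases can be illustrated with a minimal sketch. The patient identifiers, the seed and the helper name `split_patients` are hypothetical, and the source does not state whether the split was stratified by cognitive status, so a plain shuffle is assumed here.

```python
import random


def split_patients(patient_ids, n_validation, seed=0):
    """Shuffle the IDs reproducibly and hold out n_validation of them."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split can be repeated
    return ids[n_validation:], ids[:n_validation]


# 79 hypothetical patient identifiers standing in for the Hospital 1 cohort
patient_ids = [f"P{i:03d}" for i in range(1, 80)]
train_ids, val_ids = split_patients(patient_ids, n_validation=17)

print(len(train_ids), len(val_ids))
```

Fixing the validation count at 17 rather than passing a fraction reproduces the reported set sizes exactly; the external validation set from Hospital 2 is kept entirely separate from this split.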
Clinical and Laboratory Characteristics
Clinical and laboratory characteristics, including sex, age, smoking status, alcohol abuse status, diabetes status, hypertension status, coronary heart disease status, and body mass index (BMI), were directly obtained via standardized forms. The laboratory characteristics included fasting blood glucose (GLU), glycosylated haemoglobin (HbA1c), total cholesterol (TC), low density lipoprotein (LDL), high density lipoprotein (HDL), free fatty acid (FFA), triglyceride (TG), thrombin time (TT), prothrombin time (PT), activated partial thromboplastin time (APTT), fibrinogen (FIB) and D-dimer.
MRI Scanning
MR images of all the subjects treated at Hospital 1 were obtained using a 3.0T MRI scanner (Achieva 3.0T scanner, Philips, Netherlands). The scanning sequence included T1-weighted imaging (T1WI), T2-FLAIR, and diffusion-weighted imaging (DWI). The parameters of the T1WI sequence were as follows: repetition time (TR) = 7.9 ms, echo time (TE) = 39 ms, field of view (FOV) = 256 × 256, layer thickness = 2 mm, and layer spacing = 1 mm. The parameters of the T2-FLAIR sequence were TR = 4800 ms, TE = 279 ms, FOV = 256 × 256, layer thickness = 1.6 mm, and layer spacing = 0.8 mm. The parameters of the DWI sequence were as follows: TR = 6274 ms, TE = 68 ms, FOV = 220 × 220, matrix = 88 × 88, thickness = 2.5 mm, and interslice gap = 30 mm. MRI data from Hospital 2 were obtained using a 1.5T MRI scanner (uMR560 1.5T, United Imaging Limited, China). The parameters of the T1WI sequence were as follows: TR = 10.4 ms, TE = 4.4 ms, FOV = 256 × 240, layer thickness = 1.0 mm, and layer spacing = 0 mm. The parameters of the T2-FLAIR sequence were TR = 6500 ms, TE = 464.5 ms, FOV = 256 × 240, layer thickness = 1 mm, and layer spacing = 0 mm. The parameters of the DWI sequence were TR = 3000 ms, TE = 97.6 ms, FOV = 230 × 230, matrix = 128 × 100, thickness = 6 mm, and interslice gap = 30 mm.
Brain Subregion Segmentation and Volume Extraction
The preprocessing steps included skull stripping, bias correction, and image resampling to 1 mm isotropic resolution. A 3D VB-Net was trained for brain subregion segmentation, 11 which takes each sample T1 image as input and outputs the corresponding brain parcellation label map. The network parameters were adjusted based on the difference between the reference brain partition and the output brain partition, and training continued until the network converged, so that the output label image closely matched the corresponding partition image of the sample. Throughout the training process, we adopted a cascade strategy of layer-by-layer segmentation to better handle the complexity and difficulty of the brain segmentation problem. Additionally, the segmentation performance of the network was enhanced by providing additional information to the lower network, progressively achieving the fine division of brain regions, midbrain regions and brain structures. This model was trained on 1800 T1WI images from the publicly available Consortium for Reliability and Reproducibility (CoRR) dataset and the Chinese Brain Molecular and Functional Mapping (CBMFM) project, 12,13 with an average Dice overlap of 0.92 with the ground truth. The entire brain was automatically divided into 109 subregions according to the DK atlas (Supplemental material 1), 14 including 20 subregions in the frontal lobe, 22 subregions in the temporal lobe, 12 subregions in the parietal lobe, 12 subcortical nuclei, white matter structures and other structures. The segmentation process took less than half a minute for each patient. Then, the volumes of the 109 brain regions were extracted automatically.
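The Dice overlap used above to compare a predicted segmentation with the ground truth can be written out as a short sketch. The function name and the toy 3D masks are illustrative assumptions and are not taken from the authors' pipeline; treating two empty masks as perfect agreement is also a convention chosen here.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom


# Toy 3D masks: the prediction covers 8 voxels, the reference covers 12,
# and all 8 predicted voxels lie inside the reference region.
pred_mask = np.zeros((4, 4, 4), dtype=bool)
true_mask = np.zeros((4, 4, 4), dtype=bool)
pred_mask[1:3, 1:3, 1:3] = True
true_mask[1:3, 1:3, 1:4] = True

example_dice = dice_coefficient(pred_mask, true_mask)
print(round(float(example_dice), 3))  # 2 * 8 / (8 + 12) = 0.8
```

For a multi-label parcellation such as the 109-subregion map, the same function would be applied per label and the values averaged.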
WMH Segmentation
The WMH on FLAIR images was automatically segmented using a 2D VB-Net, which incorporated the advantages of an efficient encoder-decoder framework for feature embedding, residual connections for information flow, and bottleneck layers for model compression. 15 This 2D VB-Net was trained on a large MR dataset of 1045 WMH subjects. 16 To handle differences in pixel spacing between scanners, the image and labelled areas were resampled evenly in the x and y directions with a spacing of 0.5. The pixel spacing in the z direction remained unchanged. The input two-dimensional block for training was 256 × 256, which was randomly sampled from the image volume. The Kaiming initialization strategy was adopted for weight initialization, and the optimizer was adaptive moment estimation (Adam), with the first moment coefficient = 0.9, the second moment coefficient = 0.999, the learning rate = 0.001, and the mini-batch size = 48. The number of bottleneck structures at the high levels was reduced. Levels were defined by vertical depth in the network: the original input and output resolution was level 1, and the resolution after 4 downsampling steps was level 5. A lower level therefore had a greater spatial resolution but fewer feature maps, and vice versa. Because the number of segmentation categories is relatively small, reducing the bottleneck structure would not impact the segmentation accuracy but would help reduce the model parameters and hence improve the robustness. The output block was adjusted, and 2 convolutional layers were added to the original design to generate the segmentation probability map. The WMH segmentation model was directly applied to the FLAIR images to identify the WMH regions of interest (ROIs). T1WI images were registered to FLAIR using the medical image registration software Advanced Normalization Tools (ANTs) with affine registration.
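The in-plane resampling and random block sampling described above can be sketched as simple geometry. This is an illustration only: the helper names are hypothetical, the spacing of 0.5 is assumed to be in millimetres (the source gives no unit), and the actual interpolation of the image is left out.

```python
import random


def resampled_inplane_shape(shape_xy, spacing_xy, target_spacing=0.5):
    """Pixel counts along x and y after resampling to target_spacing.

    The physical extent (pixels * spacing) is preserved; z is left untouched.
    """
    return tuple(
        max(1, round(n * s / target_spacing)) for n, s in zip(shape_xy, spacing_xy)
    )


def random_patch_origin(shape_xy, patch=256, rng=None):
    """Top-left corner of a random patch x patch block inside one slice."""
    rng = rng or random.Random()
    nx, ny = shape_xy
    if nx < patch or ny < patch:
        raise ValueError("slice is smaller than the patch; pad it first")
    return rng.randrange(nx - patch + 1), rng.randrange(ny - patch + 1)


# A 256 x 256 slice with an assumed 1.0 mm pixel spacing becomes 512 x 512 at 0.5 mm.
new_shape = resampled_inplane_shape((256, 256), (1.0, 1.0))
origin = random_patch_origin(new_shape, patch=256, rng=random.Random(0))
print(new_shape, origin)
```

Resampling to a common in-plane spacing means that a 256 × 256 block covers the same physical area regardless of which scanner produced the image.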
WMH and Hippocampal Feature Extraction
Using the PyRadiomics package in Python (version 2.1.2, https://pyradiomics.readthedocs.io/), 824 radiomics features were automatically extracted from the WMH regions in the FLAIR/T1WI sequences and the bilateral hippocampi in the T1WI sequence of each patient. These features included first-order statistics and texture features derived from the original images and wavelet transformations of the original images. Eight types of wavelet features were obtained and labelled LLL, LLH, LHL, LHH, HLL, HLH, HHL, and HHH. Additionally, 14 shape features were extracted from the original images. Finally, z-score normalization was used to standardize all the radiomic features, and the reproducibility of the features was assessed using a pipeline that met the recommendations of the Image Biomarker Standardization Initiative.
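The z-score normalization step can be illustrated as follows. The source does not say on which subset the mean and standard deviation were estimated; this sketch assumes they are computed on the training set and then reused for the validation sets, which is a common way to avoid information leakage. The function names and the toy feature matrices are hypothetical.

```python
import numpy as np


def fit_zscore(train_features):
    """Per-feature mean and standard deviation estimated on the training set."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # constant features: avoid division by zero
    return mean, std


def apply_zscore(features, mean, std):
    """Standardize features with previously fitted statistics."""
    return (features - mean) / std


# Toy matrices: rows are patients, columns are radiomic features.
train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
validation = np.array([[3.0, 12.0]])

mean, std = fit_zscore(train)
z_train = apply_zscore(train, mean, std)
z_validation = apply_zscore(validation, mean, std)
print(z_train)
print(z_validation)
```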
Feature Dimensionality Reduction, Model Building, and Evaluation
The volumes of the 109 brain subregions and the radiomics features of WMH-T1WI, WMH-FLAIR and each hippocampus each underwent dimensionality reduction and were modelled separately; all the features were then fused to establish a combined model. The dimensionality reduction process included the Max-Relevance and Min-Redundancy (MRMR) and the least absolute shrinkage and selection operator (LASSO) methods. Finally, the random forest (RF) classifier was trained on the training set, tuned on the internal validation set and evaluated on the independent external validation set. 17
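A minimal sketch of LASSO-based feature selection followed by a random forest classifier, using scikit-learn, is given below. It is not the authors' implementation: the data are synthetic, the MRMR step is omitted because scikit-learn does not provide it, and all hyperparameters (number of trees, cross-validation folds, seeds) are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature table: 62 training and 17 validation cases.
n_train, n_val, n_feat = 62, 17, 40
X = rng.normal(size=(n_train + n_val, n_feat))
noise = rng.normal(scale=0.5, size=n_train + n_val)
y = (X[:, 0] + 0.8 * X[:, 1] + noise > 0).astype(int)  # only 2 informative features
X_train, X_val = X[:n_train], X[n_train:]
y_train, y_val = y[:n_train], y[n_train:]

# Standardize with statistics from the training set only.
scaler = StandardScaler().fit(X_train)
Xs_train, Xs_val = scaler.transform(X_train), scaler.transform(X_val)

# LASSO regression on the 0/1 label; features with non-zero weights are kept.
lasso = LassoCV(cv=5, random_state=0).fit(Xs_train, y_train)
selected = np.flatnonzero(lasso.coef_)
if selected.size == 0:  # fallback if LASSO shrinks every coefficient to zero
    selected = np.arange(n_feat)

# Random forest trained on the retained features.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(Xs_train[:, selected], y_train)

val_prob = rf.predict_proba(Xs_val[:, selected])[:, 1]
val_auc = roc_auc_score(y_val, val_prob)
print(f"{selected.size} features kept, validation AUC = {val_auc:.3f}")
```

In practice, the hyperparameters would be tuned on the internal validation set and the final model applied unchanged to the external validation set.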
The flow chart of our study is shown in Figure 2.
Figure 2. Flow chart of WMH and brain subregion segmentation (A), feature extraction from WMH and bilateral hippocampi (B), and feature dimension reduction and model construction (C).
Statistical Analysis
Statistical analyses were performed using the R language (version 4.0.4) and Python with the scikit-learn library. Continuous variables were compared using Student's t test or the Mann-Whitney U test, depending on whether they were normally distributed. The chi-square test was used to compare categorical variables. A two-tailed P value <.05 indicated statistical significance. The performance of the classifier model on the test subset was assessed by the average accuracy, sensitivity/recall, specificity, precision and 95% confidence intervals (CIs) based on a case probability cut-off of 0.5, as well as the F score and the area under the curve (AUC). The DeLong test was used to compare the performance of each model.
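The threshold-based metrics listed above follow directly from the confusion matrix at the 0.5 probability cut-off. The sketch below uses a hypothetical helper and toy data; it assumes that a probability equal to the cut-off counts as positive and that the "F score" refers to F1, neither of which is stated in the source.

```python
def classification_metrics(labels, probabilities, cutoff=0.5):
    """Accuracy, sensitivity, specificity, precision and F1 at a probability cut-off."""
    preds = [1 if p >= cutoff else 0 for p in probabilities]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy example: 1 = cognitive impairment, 0 = no cognitive impairment.
labels = [1, 1, 1, 0, 0, 0]
probabilities = [0.9, 0.6, 0.4, 0.3, 0.7, 0.1]
metrics = classification_metrics(labels, probabilities)
print(metrics)
```

In this toy example there are 2 true positives, 2 true negatives, 1 false positive and 1 false negative, so every metric equals 2/3.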
Results
Patients' Characteristics in the Training Set, Testing Set and External Testing Set.
Note. P values were calculated using the two-sample t test for continuous variables and the chi-squared test for discrete variables. *P < .05. MMSE, Mini-Mental State Examination; MoCA, Montreal Cognitive Assessment; GDS, Geriatric Depression Scale; ADL, Activities of Daily Living; GLU, glucose; TC, total cholesterol; LDL-C, low density lipoprotein cholesterol; HDL-C, high density lipoprotein cholesterol; FFA, free fatty acids; TG, triglyceride; PT, prothrombin time; APTT, activated partial thromboplastin time; TT, thrombin time; FIB, fibrinogen.
Features Retained After Dimensionality Reduction by LASSO.

Receiver operating characteristic (ROC) curves for diagnostic models in the training set (A), testing set (B) and external testing set (C).

Calibration curves for nomogram goodness of fit in the training set (A), testing set (B) and external testing set (C). The 45° line indicates that the probability predicted by the model matches the actual probability. The closer the 2 curves are, the higher the accuracy.

Decision curves of different models in the training set (A), testing set (B) and external testing set (C). The Y-axis represents the net benefit, and the X-axis represents the threshold probability. The RF model had a greater overall net benefit in detecting cognitive impairment in patients with WMH.
Discussion
Many studies have shown that WMH can cause cognitive impairment and accelerate the transition from mild cognitive impairment to dementia.18,19 Once a patient is diagnosed with irreversible dementia, 20 families and societies face a heavy burden in both developed and developing countries.21,22 Therefore, early and accurate identification of cognitive impairment in WMH patients has important clinical significance. At present, cognitive impairment in patients with WMH is mainly diagnosed through clinical symptoms and neuropsychological assessments, which are highly subjective and time-consuming, so their clinical application is greatly limited. In this study, we used deep learning to automatically identify and segment brain subregions and WMH and established models using radiomics and machine learning to achieve automatic diagnosis. Our results indicated that the RF model based on radiomic features of WMH and bilateral hippocampi had excellent accuracy, with AUC values of 0.976, 0.937, and 0.900 in the training set, internal validation set, and external validation set, respectively. Our method automatically detected cognitive impairment in patients with WMH for the first time, without the need for human intervention, and could be completed within 3 minutes. The method is objective and accurate and has important clinical application value. Physicians only need to import the MR images of a patient to quickly and automatically obtain a diagnostic result, which greatly improves the efficiency of clinical practice. Furthermore, external independent validation was used to assess the reliability and generalizability of the results. Previously, Chu T et al used the volume and texture features of the hippocampus based on T1-weighted MR images to establish a support vector machine model to diagnose MCI, with an AUC of 0.90. Their method needed more than 4 hours to process a patient, making it a time-consuming strategy. 23 Liu M et al proposed a multimodel deep learning framework for automatic hippocampal segmentation and MCI diagnosis using structural MRI data, with an accuracy of 76.2%. 24 Leandrou S et al extracted radiomic features from the entorhinal cortex and hippocampus to distinguish normal controls (NC), MCI and AD. The results revealed that the F1 scores of XGBoost for the discrimination of NC vs AD, NC vs MCI, and MCI vs AD were 0.949, 0.818 and 0.810, respectively. 25 To our knowledge, our research was the first to diagnose cognitive impairment in WMH patients using artificial intelligence.
MRI can be used to quantitatively evaluate changes in brain structure and is popular for WMH imaging in clinical practice. Previously, Maillard et al 26 suggested that early detection of microstructural damage in the brain needed advanced MRI techniques, such as diffusion tensor imaging (DTI). Our method was based on basic T1WI and FLAIR sequences, which are routine scanning sequences used in clinical practice, and did not need advanced or expensive additional imaging sequences, thereby reducing the economic burden on patients. Most previous studies on WMH were based on 2D FLAIR sequences. 27 In this study, thin-layer 3D FLAIR sequences were used for the first time, which helped ensure that the results were reliable. With so many 3D images, segmenting WMH and hippocampi layer by layer using traditional manual drawing methods would require at least 30 minutes, greatly reducing the efficiency of clinical diagnosis. In addition, long-term manual delineation can easily lead to visual fatigue and operational errors. The 3D-VB network we developed was based on U-Net and could automatically segment WMH and brain subregions. It achieved a Dice coefficient of 0.878 against manual segmentation by 2 observers, indicating high accuracy.
In this study, after dimensionality reduction, 19 features were considered to have important diagnostic value and were used for model building, including 2 first-order features and 14 texture features from WMH and the hippocampus 28 and 2 volume features from the temporal and parietal lobes. Previous studies have also revealed that first-order and texture radiomic features play important roles in the diagnosis of other diseases, such as dementia and Parkinson’s disease. The first-order features describe the distribution of voxel intensity in the image area through basic metrics, thereby reflecting changes in WMH and the internal structure of the hippocampus. The remaining 14 texture features were obtained by calculating the statistical correlation between adjacent voxels, providing a measure of the spatial arrangement of voxel intensities. The grey level co-occurrence matrix (GLCM) quantifies how often pairs of voxels with given intensities occur at a predetermined distance along a fixed direction. 29 The grey level dependence matrix (GLDM) quantifies the number of connected voxels within a given distance that depend on the centre voxel. 30 The grey level size zone matrix (GLSZM) quantifies zones of connected voxels that share the same grey level intensity. When the microstructure of the WMH or hippocampus changes, the intensity and continuity of voxels change and are reflected by these texture features. GLSZM and GLCM can reflect mild pathological changes, including increased water content and decreased myelin sheath content, which make white matter fibres rough and blurry. In addition, we found that 2 volume features of the temporal and parietal lobes were retained. 31 Although changes in the hippocampus and temporal lobe related to cognitive impairment have been well documented, changes in the parietal lobe have not been fully explored.
Verfaillie SC et al reported that the parietal cortex was thinner in patients with subjective cognitive decline (SCD) who progressed to non-AD dementia than in patients with stable SCD. 32 Delvenne JF et al reported that white matter fibres projecting from the posterior region of the corpus callosum to the parietal lobe region were damaged in MCI and AD patients, as reflected by the relevant diffusion parameters of DTI. 33
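To make the texture matrices discussed above more concrete, the following sketch counts grey level co-occurrences in a small 2D image for a single offset. It is a simplified illustration rather than the PyRadiomics implementation, which works on discretized 3D regions of interest and aggregates over several directions; the function name and the toy image are assumptions.

```python
import numpy as np


def glcm(image, levels, offset=(0, 1), symmetric=True):
    """Count how often grey level i has grey level j at the given pixel offset."""
    d_row, d_col = offset
    rows, cols = image.shape
    matrix = np.zeros((levels, levels), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r, c], image[r2, c2]] += 1
    if symmetric:
        matrix = matrix + matrix.T  # count each pair in both directions
    return matrix


# Toy image with 3 grey levels; offset (0, 1) pairs each pixel with its right neighbour.
toy_image = np.array([[0, 0, 1],
                      [1, 2, 2]])
toy_glcm = glcm(toy_image, levels=3)
print(toy_glcm)
```

Texture features such as contrast or homogeneity are then computed from the normalized matrix, so a change in how intensities are arranged alters the feature values even when the overall intensity histogram is unchanged.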
This study also identified several clinical risk factors, such as age, education level, and Hachinski scores, for cognitive impairment in patients with WMH. Many previous studies have shown that WMH burden increases significantly with age and that old age is a key risk factor for cognitive impairment. Educational level might be a protective factor because the level of education inversely correlates with the likelihood of cognitive impairment. Harkness et al 34 noted that good educational experience and rich intellectual activities strengthen an individual’s ability to resist cognitive impairment. In addition, this study revealed for the first time that the Hachinski score was also a risk factor for cognitive impairment in patients with WMH. The Hachinski scale is a common clinical tool used to identify vascular dementia. 35 Our finding suggests that impairment of the cerebral blood vessels is also an important factor in cognitive impairment.
This study had several limitations. First, due to the retrospective design and small sample size, selection bias might have occurred. Second, all patients in the sample were Asian. Therefore, multiethnic populations from different countries need to be examined to further validate the model. Finally, to achieve automatic and rapid diagnosis, this study did not include DTI sequences. Future research should develop automatic analysis methods for DTI images and comprehensively use multimodal MRI data to further improve diagnostic accuracy.
Conclusion
In this study, we established an automated and rapid method to identify cognitive impairment in WMH patients. A deep learning VB-Net was used to automatically segment white matter hyperintensities and brain subregions. Machine learning and radiomics were used to accurately diagnose cognitive impairment. The results of our study provide doctors with a reliable tool for the early diagnosis of cognitive impairment in practice.
Supplemental Material
Supplemental Material - Automatic Detection of Cognitive Impairment in Patients With White Matter Hyperintensity Using Deep Learning and Radiomics
Supplemental Material for Automatic Detection of Cognitive Impairment in Patients With White Matter Hyperintensity Using Deep Learning and Radiomics by Junbang Feng, Xingyan Le, Li Li, Lin Tang, Yuwei Xia, Feng Shi, Yi Guo, Yueqin Zhou, and Chuanming Li in American Journal of Alzheimer’s Disease & Other Dementias®
Footnotes
Acknowledgments
The authors thank the participants and referring technicians for their selfless and valuable assistance in this study.
Author Contributions
All the authors contributed to the study conception and design. Junbang Feng and Xingyan Le: Provision of study materials or patients. Junbang Feng, Xingyan Le, Li Li and Lin Tang: Collection and assembly of data. Yuwei Xia, Feng Shi: Data analysis and interpretation. All authors: Manuscript writing. All authors: Final approval of the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Fundamental Research Funds for the Central Universities of China (Project NO. 2022CDJYGRH-004); the Natural Science Foundation Project of Chongqing (CSTB2024NSCQ-MSX1265); the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202400117); and the Chongqing medical scientific research project (Joint project of Chongqing Health Commission and Science and Technology Bureau) (2022QNXM013).
Ethical Statement
Data Availability Statement
The dataset used and analysed in this study is available from the corresponding author upon request.
