Abstract
Brain surgery offers the best chance of seizure-freedom for patients with focal drug-resistant epilepsy, but only 50% achieve sustained seizure-freedom. With the explosion of data collected during routine presurgical evaluations and recent advances in computational science, we now have a tremendous potential to achieve precision epilepsy surgery: a data-driven tailoring of surgical planning. This review highlights the clinical need, the relevant computational science focusing on machine learning, and discusses some specific applications in epilepsy surgery.
Review
Brain surgery offers the best chance of seizure-freedom for patients with focal drug-resistant epilepsy (DRE), but only 50% achieve sustained seizure-freedom. Efforts to improve outcomes have focused on better epilepsy localization with advanced imaging and electrophysiology. Presurgical testing has grown exponentially in breadth and depth. With this data explosion comes a tremendous potential to improve patient care, a potential that remains largely untapped. This review summarizes the case for machine learning (ML), its challenges, and its opportunities as a path to close the gap between data availability and data utility in the context of epilepsy surgery.
The Case for Machine Learning
The time is ripe with the convergence of two forces: the clinical need has never been stronger, and the computational tools are now available.
Clinical case: the need for precision epilepsy surgery. While it has been shown[1,2] that epilepsy specialists are no better than chance in predicting outcomes, the decision of whether and what to resect or ablate for the best outcome in any ONE patient is still largely subjective. Noninvasive presurgical tests (scalp video-EEG, structural and functional imaging, and neuropsychological assessment) are typically reviewed in a multidisciplinary surgery conference. The group decides on patient candidacy and the anatomical extent of the procedure based on their collective experience. If localization is still elusive, invasive intracranial EEG is recommended. Extensive correlations of invasive and noninvasive findings with outcomes have been identified, and multiple predictive models exist. Yet, outcomes have not been revolutionized: only 60% to 70% of patients maintain seizure-freedom a decade postresection for temporal lobe epilepsy (TLE), and 45% to 50% after extra-TLE (ExTLE) surgery.[3]
Intracranial EEG implantation without a resulting resection has grown 10-fold in the past 20 years,[4] exposing patients to excessive risk and cost. We need a paradigm shift to move past predicting surgical outcomes to "precision epilepsy surgery": data-driven planning of the ideal surgery for any ONE patient. Fortunately, we may be ready: the computational landscape has steadily matured, starting with classical ML, then deep learning, and now foundation models.
Computational case: the machine learning evolution. Machine learning is a branch of artificial intelligence (AI) and computer science that focuses on the use of data and algorithms to imitate the way humans learn, gradually improving in accuracy.[5] "Classical" ML usually requires structured data and is therefore very dependent on human experts, who determine the set of features used to understand the differences between data inputs. Given the complexity of presurgical tests and the sheer number of "annotations" needed for meaningful analysis, one can intuitively grasp the limitations of such a manpower-intensive method. "Deep" machine learning can use labeled datasets to inform its algorithm (in which case it is known as supervised learning), but it does not necessarily require them. Deep learning can ingest unstructured data in its raw form (e.g., text or images) and can automatically determine the set of features that distinguish different categories of data. This reduces the need for human intervention and greatly facilitates research with large datasets. Deep learning has therefore been extensively applied to imaging data for detection, segmentation, and classification tasks. The problem is that deep learning does not eliminate the need for large datasets: an estimated 1000 to 4000 samples per class are required for reliable and reproducible prediction.[6-9]
While improving model performance over the past decade was limited to optimizing architectures and scaling up training datasets, both strategies are particularly challenging in epilepsy surgery, where cohorts are typically small. Foundation AI models were initially introduced in natural language processing in 2018 but have since been applied in computer vision and robotics.[10-12] Vision transformers revolutionized how AI models approach sequential data and images by allowing complex predictions in smaller datasets. They are enabled by transfer learning and scale. The idea of transfer learning is to take knowledge learned from one task (e.g., object recognition in images) and apply it to another task (e.g., activity recognition in videos). The premise is that different phenotypes are often correlated, so knowledge learned from predicting one should better prepare us to learn the second. Learning is thus accomplished by first pretraining the model on a surrogate task (often just a means to an end) in a large dataset (e.g., a large cohort with epilepsy) and then adapting it to the downstream task of interest via fine-tuning in a small boutique dataset (e.g., the specialized smaller subgroup undergoing epilepsy surgery). Conceptually interesting, this approach has yet to be fully explored in epilepsy and epilepsy surgery.
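The pretrain-then-fine-tune recipe can be sketched with everyday Python tools. In this deliberately simplified example, PCA stands in for the self-supervised pretraining step (real foundation models learn far richer representations with transformers), and all cohorts and labels are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "Pretraining" cohort: a large unlabeled dataset (e.g., a broad epilepsy
# cohort). Features are synthetic stand-ins for imaging/EEG measurements.
X_large = rng.normal(size=(5000, 100))

# Learn a reusable low-dimensional representation on the large cohort
# (proxy for the self-supervised pretraining stage).
encoder = PCA(n_components=10).fit(X_large)

# "Fine-tuning" cohort: a small boutique surgical dataset with outcome
# labels (hypothetical seizure-freedom labels).
X_small = rng.normal(size=(60, 100))
y_small = rng.integers(0, 2, size=60)

# Adapt to the downstream task: train a lightweight head on the frozen
# encoder's representation of the small cohort.
clf = LogisticRegression().fit(encoder.transform(X_small), y_small)
pred = clf.predict(encoder.transform(X_small))
```

The key design point is that only the small head is fit on the boutique cohort; the representation is learned where data are plentiful.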
Challenges for ML in Precision Epilepsy Surgery
The prevailing dictum is that surgery "fails" when the brain region causing seizures is inaccurately localized or incompletely resected/ablated.[13-15] Yet, the notion that focal DRE extends beyond an isolated "focus" to dysfunctional networks causing seizures and cognitive dysfunction has been extensively published.[15-20]
If we hypothesize that multifactorial interactions between electrophysiologic, anatomic, and functional underpinnings of the epileptic network drive surgical outcomes, it follows that optimal surgical outcome prediction and surgical planning improve with integrating multimodality data at the individual patient level. Presurgical tests measure various facets of the epileptic network: structure through high-resolution brain MRI, connectivity through diffusion-weighted imaging and functional MRI (fMRI), and electrophysiological activation through EEG. While research has progressed within each field, precision surgery requires a comprehensive multimodality approach. A 3-fold challenge now prevents this goal:
Current tools for individualized prediction remain oversimplistic. Nomograms are widely used to assist in clinical decision-making, mostly in cancer.[21,22] We pioneered nomograms in epilepsy,[23] creating the first prediction models and online risk calculators for individualized prediction of seizure-freedom, cognitive, and mood outcomes,[24-28] with excellent clinical utility (available at https://riskcalc.org/, used >6000 times/year in the United States alone in 2021). Accuracy is reasonable (c-index from 0.65-0.81, with a c-index of 1 indicating perfect discrimination, and 0.5 equivalent to chance). Yet, nomogram inputs are limited to a few options in an online risk calculator interface. This oversimplifies the available data, limits multimodality integration, and caps predictive performance. For example, while a random-forest approach identified that a network of atrophic regions outside the surgical bed significantly influences seizure outcome of TLE and ExTLE surgery (c-index up to 0.89),[29,30] volumetric MRI measurements are too complex to capture in a nomogram, so the subtlety of high-resolution brain MRI is reduced to "normal" versus "abnormal" in the online calculators. Complementing the computational power of ML with the nomograms' clinical utility would be transformative.
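To make the c-index concrete, here is a minimal sketch of how concordance is computed for uncensored data (real survival analyses use Harrell's c-index, which additionally handles censored follow-up; the risk scores and event times below are invented):

```python
def c_index(risk, event_time):
    """Concordance index: fraction of comparable patient pairs in which the
    patient with higher predicted risk has the earlier event.
    1.0 = perfect discrimination, 0.5 = chance; risk ties count as half."""
    conc, total = 0.0, 0
    n = len(risk)
    for i in range(n):
        for j in range(i + 1, n):
            if event_time[i] == event_time[j]:
                continue  # tied times: not a comparable pair here
            total += 1
            if risk[i] == risk[j]:
                conc += 0.5
            else:
                higher_risk = i if risk[i] > risk[j] else j
                earlier_event = i if event_time[i] < event_time[j] else j
                if higher_risk == earlier_event:
                    conc += 1.0
    return conc / total

# Toy example: predictions perfectly ordered against time to seizure
# recurrence (months), so every pair is concordant.
risk = [0.9, 0.7, 0.4, 0.1]
times = [1.0, 2.0, 3.0, 4.0]
print(c_index(risk, times))  # → 1.0
```

A constant risk score for every patient would instead yield 0.5, the chance level quoted above.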
Categorical classification of surgery (temporal vs extra-temporal) in current ML models oversimplifies the rich anatomical variability of epilepsy surgery. Even when advanced imaging or neurophysiology serve as prediction inputs, the surgical resection/ablation, arguably the most crucial outcome driver, is coarsely classified as TLE or ExTLE surgery. Researchers manually segment the surgical lacuna to quantitatively delineate the resection[31] or postablation cavity,[32] but manual segmentation is time-consuming and subject to inter-rater variation, limiting clinical translation. For example, a recent nomogram predicts seizure-freedom and cognitive decline after surgery in patients with a structurally normal hippocampus.[27]
Although outcomes varied by extent of hippocampal resection, the final online calculator simply stratifies surgery by "hippocampus spared" versus "hippocampus resected." A flexible representation of all surgical approaches being considered for a given patient must be easily incorporated into predictive models to optimize clinical value. Some efforts are now focusing on this topic,[33-35] but more needs to be done.
Machine learning offers a promising approach to Big Data but faces significant challenges in a rare disease like epilepsy. Support Vector Machine (SVM) classifiers[36,37] on T1-weighted MRI and a neural network classifier for Diffusion Tensor Imaging (DTI)-based structural connectomes[38,39] predicted postoperative seizure-freedom with sensitivity and specificity of 85% in small cohorts. Similarly, an SVM classifier applied to functional connectivity from stereo-EEG in 23 patients having TLE surgery predicted seizure-freedom at 1-year follow-up with a sensitivity of 90% and specificity of 85%.[40]
Deep learning algorithms can automatically annotate EEG data and detect seizures.[41] Many publications studied extracted EEG or imaging features.[38,42-44]
Elegant research is attempting to recreate an epileptic network using invasive stereo-EEG data. This is a very promising landscape. Yet, ML-based research in epilepsy surgery is still challenged by the following:
1. Most, if not all, studies rely on cohort group comparisons with unclear translation at the individual patient level: ML features that correlate with risk in the population are identified, but it is unclear how to apply that information to an individual patient.
2. Most studies dive deep into a single modality, mainly electrophysiology or imaging data. This approach advances understanding in that particular dimension but, in the bigger picture, provides an isolated view of the epileptic network.
3. Many employ technologies unavailable or unnecessary in most surgical cases (e.g., intracranial EEG or advanced functional imaging), limiting their relevance to patients evaluated in sophisticated surgical programs already committed to invasive testing.
4. Many are built on small, underpowered cohorts and so are overfit: models are trained too well on their development cohorts, showing excellent sensitivity and specificity in their index publication but generalizing poorly.
Because the strength of the relationship between brain data and clinical outcomes (effect size) is low,[45] a very large sample size (at least several thousand people) is needed to train predictive algorithms to deliver robust and generalizable predictions. Epilepsy is a rare disease, and only a small fraction of patients undergoes brain surgery. We need innovative ML approaches specifically designed to address this mismatch between data requirements and data availability and to reliably integrate multimodal data. For wide translation, these models should be built using routinely acquired noninvasive data.
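The overfitting hazard is easy to demonstrate: with far more features than patients, a classifier can "learn" labels that are pure noise, while cross-validation exposes chance-level generalization. A toy sketch with synthetic data:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# A small "cohort" (30 patients), many features (500), and labels that
# are pure noise: there is no real signal to learn.
X = rng.normal(size=(30, 500))
y = rng.integers(0, 2, size=30)

# Training accuracy is near-perfect: the model memorizes the noise.
clf = SVC(kernel="linear", C=10.0).fit(X, y)
train_acc = clf.score(X, y)

# Cross-validated accuracy hovers around chance (0.5).
cv_acc = cross_val_score(SVC(kernel="linear", C=10.0), X, y, cv=3).mean()
```

The gap between `train_acc` and `cv_acc` is exactly the gap between an index publication's development-cohort performance and real-world generalization.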
Opportunities of ML
Multiple promising innovative ML approaches now exist; a few are chosen here for illustration. Graph Convolutional Neural Networks, a type of CNN, allow fusing multimodal inputs in a domain-guided fashion. They are particularly useful for ML models with multichannel EEG data because connectivity between multiple channels can be explained with domain knowledge.[46]
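A single graph-convolution layer in the Kipf-and-Welling style can be sketched in a few lines of numpy; the electrode graph, features, and weights below are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy electrode graph: 4 EEG channels; edges encode domain knowledge
# about which channels are neighbors or functionally connected
# (hypothetical adjacency).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

X = rng.normal(size=(4, 8))  # per-channel features (e.g., band powers)
W = rng.normal(size=(8, 3))  # learnable weights of one graph-conv layer

# One graph-convolution layer: add self-loops, symmetrically normalize
# the adjacency, propagate features along edges, apply a ReLU.
A_hat = A + np.eye(4)
d = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
H = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)
print(H.shape)  # (4, 3): each channel's output mixes its neighbors'
```

This is what "domain-guided fusion" means in practice: the adjacency matrix, not the data alone, dictates which channels share information.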
Wide and Deep learning leverages the strengths of deep learning to train models using high-dimensional data (e.g., MRI images) (deep), while simultaneously incorporating information from lower-dimensional data (e.g., demographic or clinical features) that can be represented with simple regression models (wide). On a more technical level, deep learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but nonlinear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level; this is particularly valuable for discovering intricate structures in high-dimensional data. Wide and deep learning has performed better than wide-only and deep-only models.[47,48]
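The architecture can be illustrated as a single forward pass with random, untrained weights (scaled to keep activations in a reasonable range). The high-dimensional input and the clinical features are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# One patient: "deep" input = high-dimensional data (e.g., a flattened
# image patch); "wide" input = hypothetical low-dimensional clinical
# features (age, epilepsy duration, lesional MRI flag).
x_deep = rng.normal(size=256)
x_wide = np.array([34.0, 12.0, 1.0])

# Deep path: two nonlinear layers build increasingly abstract
# representations of the raw input.
W1 = rng.normal(size=(256, 32)) / np.sqrt(256)
W2 = rng.normal(size=(32, 8)) / np.sqrt(32)
h = np.maximum(x_deep @ W1, 0.0)   # level-1 representation
h = np.maximum(h @ W2, 0.0)        # level-2, more abstract

# Wide path: clinical features enter the output layer directly,
# exactly like a simple regression model would use them.
w_deep = rng.normal(size=8) / np.sqrt(8)
w_wide = rng.normal(size=3) * 0.01
b = 0.0

# Joint output: both paths feed one logistic unit.
logit = h @ w_deep + x_wide @ w_wide + b
p_seizure_free = 1.0 / (1.0 + np.exp(-logit))
```

In training, both paths are optimized jointly, so the deep representation learns only what the simple wide features cannot already explain.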
Hypothesizing that the epileptic network has intrinsic high-dimensional structural, electrophysiological, and connectivity asymmetries beyond those identified by standard clinical variables and clearly extracted features, this is a potentially attractive approach in epilepsy surgery research. Foundation AI models, which pretrain in large datasets and then fine-tune in smaller cohorts, have shown remarkable improvements in prediction performance, at least doubling accuracy compared to using the small dataset alone,[49] particularly when the phenotypes evaluated in pretraining and fine-tuning are strongly correlated. Successful applications have been demonstrated in multiple epileptic network measures. Structural MRI: To build a source model generally applicable to various tasks of 3D MRI analysis, a medical transformer model was pretrained in a self-supervised manner, with masked encoding vector prediction as a proxy task, using large-scale normal, healthy brain MRI from 3 public datasets (Information eXtraction from Images,[50] Cambridge Centre for Ageing and Neuroscience,[51] and Autism Brain Imaging Data Exchange[52]). The pretrained model outperformed state-of-the-art learning methods on the downstream tasks of brain disease diagnosis, brain age prediction, and brain tumor segmentation; it efficiently reduced the number of parameters by up to 92% for classification and regression tasks and 97% for segmentation, and it still performed well when only partial training samples were used.[49]
EEG: Self-supervised speech recognition work called wav2vec 2.0[53] was adapted to EEG: arbitrary unlabeled EEG segments from the Temple University Hospital (TUH) EEG Corpus[54] (>10 000 people) were first encoded as a sequence of learned vectors, which then transferred successfully to unseen EEG datasets recorded from unseen subjects (sample sizes from 9-105 patients), different hardware, and different tasks (e.g., sleep staging) with up to 87% accuracy. Similarly, the TUH EEG seizure corpus was used to predict multiclass seizure type classification with a weighted F1 score (a measure of precision and recall) of up to 0.901[41] (1 being perfect).
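The weighted F1 score can be computed with scikit-learn; the seizure-type labels below are invented purely for illustration:

```python
from sklearn.metrics import f1_score

# Hypothetical multiclass seizure-type labels for 8 EEG segments
# (0 = focal, 1 = generalized, 2 = absence; illustrative only).
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0]

# "Weighted" averages the per-class F1 scores, weighted by each class's
# frequency, so common seizure types count proportionally more.
score = f1_score(y_true, y_pred, average="weighted")
print(score)  # → 0.75
```

Here the per-class F1 scores are 2/3, 0.8, and 0.8 for classes with supports 3, 2, and 3, giving (3·2/3 + 2·0.8 + 3·0.8)/8 = 0.75.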
(1 being perfect). Resting State f-MRI (Rs-fMRI): Using 55x55 resting state functional connectivity (RSFC) from the UK Biobank (N = 36 848 people) dataset, He et al
49
recently demonstrated that meta-matching can greatly boost the prediction of new phenotypes in the small independent dataset of the Human Connectome Project (HCP; N = 1019 with 419x419 RSFC matrices) in many scenarios within psychiatry. For example, translating a UK Biobank model to 100 HCP participants yields an 8-fold improvement in variance explained with an average absolute gain of 4.0% (minimum = −0.2%, maximum = 16.0%) across 35 phenotypes. Of note, (i) predictive gain was directly correlated with the strength of the correlation between the phenotypes in the UK Biobank and the phenotype of interest in the HCP; and (ii) A minimum of 10 000 subjects were needed in the large dataset but only 20 to 50 subjects in the HCP sample were required for superiority in absolute prediction emphasizing the importance of appropriately choosing the outputs of pretraining, and the boosting potential of this approach, even in small cohorts.
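The spirit of meta-matching can be sketched as a two-step stacking procedure: models are first fit on many source phenotypes in a large cohort, and their predictions then serve as features for a new, correlated phenotype in a small cohort. This is a heavily simplified illustration with synthetic data, not the published algorithm:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Large cohort: connectivity-like features plus 5 source phenotypes that
# all share one latent signal (hypothetical generative model).
Xb = rng.normal(size=(2000, 50))
latent_b = Xb[:, 0] + 0.5 * Xb[:, 1]
Yb = np.stack([latent_b + 0.3 * rng.normal(size=2000) for _ in range(5)],
              axis=1)

# Step 1: fit one model per source phenotype on the large cohort.
base = LinearRegression().fit(Xb, Yb)

# Small cohort: a new target phenotype correlated with the same latent
# signal (the meta-matching premise of correlated phenotypes).
Xs = rng.normal(size=(40, 50))
ys = Xs[:, 0] + 0.5 * Xs[:, 1] + 0.3 * rng.normal(size=40)

# Step 2: use the base models' predictions as features and fit only a
# light stacking model on the 40-subject cohort.
Z = base.predict(Xs)          # 5 predicted source phenotypes per subject
meta = Ridge().fit(Z, ys)
r2 = meta.score(Z, ys)        # variance explained (in-sample)
```

Because only a handful of stacking weights are estimated on the small cohort, the approach sidesteps the sample sizes that fitting on raw features would demand.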
Conclusion
Significant advances in computational methods can now enable progress in predicting outcomes of epilepsy surgery. These methods can better capture the anatomical subtleties of surgical procedures and facilitate individualized outcome prediction toward the ultimate goal of precision epilepsy surgery. This data-rich scope of work requires collaborative research and is ripe with opportunity.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
