Sage Journals: Discover world-class research

Abstract

Artificial intelligence is the future of clinical practice and is increasingly utilized in medical management and clinical research. The release of ChatGPT in 2022 brought generative AI to the headlines and rekindled public interest in software agents that would complete repetitive tasks and save time. Artificial intelligence/machine learning underlies applications and devices which are assisting clinicians in the diagnosis, monitoring, formulation of prognosis, and treatment of patients with a spectrum of neuromuscular diseases. However, these applications have remained in the research sphere, and neurologists as a specialty are running the risk of falling behind other clinical specialties which are quicker to embrace these new technologies. While there are many comprehensive reviews on the use of artificial intelligence/machine learning in medicine, our aim is to provide a simple and practical primer to educate clinicians on the basics of machine learning. This will help clinicians specializing in neuromuscular and electrodiagnostic medicine to understand machine learning applications in nerve and muscle ultrasound, MRI, electrical impedance myography, nerve conduction studies and electromyography, and clinical cohort studies, as well as the limitations, pitfalls, regulatory and ethical concerns, and future directions. The question is not whether artificial intelligence/machine learning will change clinical practice, but when and how. How future neurologists will look back upon this period of transition will be determined not by how much changed or by how fast clinicians embraced this change but by how much patient outcomes were improved.

Keywords

Neuromuscular Medicine Neuromuscular Imaging Electrodiagnostic Medicine Artificial Intelligence Machine Learning

Introduction

Artificial intelligence (AI) is rapidly changing the face of society and inevitably, clinical practice.^1–7 While the history of AI stretches back decades, most recently, the release of ChatGPT in 2022 brought generative AI to the headlines and rekindled public interest in domestic robots, singularity and software agents that would complete repetitive tasks and enable more time for tasks which require dedicated human expertise. Since then, there has been a shift in public perception as industries have scrambled to add ‘genAI’ as a suffix or prefix to their products or work practices. However, this has not always been accompanied by an increase in understanding. The deluge of tech industry hyperbole perpetuated by opportunists, mixed with genuine technological advances and publications, has made it challenging for the average clinician to discern innovation from marketing. Particularly in this age of social media, there is a multitude of opinions which range across extremes, either magnifying the impact of the technology or foreshadowing dystopian outcomes. Some of us are privately drowning in a sea of unfamiliar terms, which provide abundant raw fodder for malapropisms in the workplace.

There is no shortage of comprehensive reviews of the use of AI in medicine.^8–12 This primer will review the basics of machine learning (ML) and describe the utility of ML models in neuromuscular and electrodiagnostic medicine, with the hope that by providing an understanding of the fundamentals, opportunities, and pitfalls of ML, a decision to implement such tools can be made from a position of knowledge. In addition to the forthcoming discussion, a glossary of ML terms is provided for reference (Table 1).

Table 1.

Important terminology.

Terminology	Description
Classification	One of two main types of supervised learning techniques (regression the other). Classification models attempt to predict the correct label of given input data e.g., disease present or absent. A prediction task is a classification when the target variable is discrete e.g., disease subtype, and a regression when the target variable is continuous e.g., blood pressure.
Dimensionality Reduction	The process of reducing the number of features or variables in a dataset while preserving its essential information and structure. This reduces the computational complexity of data analysis, helps ML algorithms perform better, removes redundant or highly correlated features, and can improve the efficiency of subsequent data modeling.
Feature	An individual measurable property or characteristic of a dataset that is used as input for a ML model for training.
Foundation Model	A deep learning neural network and form of generative AI trained on enormous datasets able to perform a wide range of disparate tasks such as natural language processing, image classification, image and video generation. Differentiated from traditional ML models by their size and general purpose nature. Use self-supervised training to create labels from input data. Able to continue to learn from inputs during inference. Examples include BERT, OpenAI’s GPT series, Anthropic’s Claude series, Amazon Titan. LLMs are a type of foundation model.
Hyperparameters	Parameters which are set by the model designer and control the learning process of a ML model. Examples include the number of layers of artificial neural networks, the number of neurons in each layer, starting point of every neuron in the network, learning rate. See also, parameters.
Inference	An AI model in action. The process that a trained ML model uses to draw conclusions from brand-new data. The process of running data points into a ML model to calculate an output such as a single numerical score.
Labeling	The process of identifying raw data (images, text files, videos, etc.) with one or more meaningful and informative labels to provide context for a machine learning model to learn from, e.g., the presence of a tumor in a chest x-ray.
Large Language Model (LLM)	A type of machine learning model designed for natural language processing tasks such as language generation. LLMs are trained on vast text datasets and can perform translation, summarization, question answering. E.g., GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers). LLMs are typically associated with text but have been adapted for non-language applications.
Parameters	Internal variables which are learned by the ML model during training. See also, hyperparameters.
Performance Metrics	Performance metrics are used to measure the predictive ability of the ML model. Example metrics include accuracy (proportion of correct predictions), F1-score (harmonic mean of precision and recall), precision (proportion of positive predictions that are correct), recall (sensitivity, i.e., proportion of actual positives correctly identified) and Area Under the ROC Curve (the ROC curve plots recall against the false positive rate).
Pre-processing	Preparing data to be used for model training and drawing inferences. The input variables are the parameters that make up the input data. Some of the pre-processing steps include data cleaning (removal of inappropriate entries, choosing the list of input variables for the data etc), feature engineering (assigning integers to categorical variables, statistical measures for variables with continuous values), missing data imputation, standardizing/normalizing (defining a standard range of values for all variables so that every parameter is treated fairly by the model).
Regression	In statistics, regression is a method that attempts to determine the strength and character of the relationship between a dependent variable and one or more independent variables. In ML it is similar – a supervised ML technique used to predict a continuous numerical output based on a set of input features. The goal of the regression algorithm is to plot a best-fit line or a curve between the data presented. The three main metrics that are used for evaluating the trained regression model are variance, bias and error. If the variance is high, it leads to overfitting and when the bias is high, it leads to underfitting. Based on the number of input features and output labels, regression is classified as linear (one input and one output), multiple (many inputs and one output) and multivariate (many outputs).
Supervised Learning	A ML technique which uses labelled datasets to train AI algorithm models to identify underlying patterns and relationships between features and outputs. Training data used in supervised learning is manually created and should be similar to the intended input data that the finished model will process. It must be free of data bias to avoid subsequent algorithmic bias.
Testing Data	Data used for (a) estimating the generalizability of a trained model, (b) drawing inferences from a trained model. This is usually a subset of the original data that is neither used in training nor validation. It can be independent data from a distinct data source.
Training Data	A set of information, or inputs, used to teach ML models to recognize patterns and make accurate predictions or decisions.
Unsupervised Learning	A branch of ML that is aimed at learning the distribution of the input data, in the absence of labels.
Validation Data	Data used to decide when to stop the training process. At the end of every training epoch/iteration, the validation data is used to estimate if the model training should cease or continue. In some cases, the training data can be used as the validation data, but that does not allow for estimating the generalizability of the model during the training process.
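
The performance metrics defined in Table 1 can be illustrated with a minimal Python sketch. The function name `classification_metrics` and the ten example labels are hypothetical and chosen only for illustration (1 denotes disease present, 0 denotes disease absent); they are not drawn from any study.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: four patients with disease, six without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

In this toy example one diseased patient is missed and one healthy patient is flagged, so accuracy is 0.8 while precision, recall and F1 are each 0.75, which shows why accuracy alone can flatter a model when the classes are imbalanced.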

Key concepts in machine learning

As a grounding, it is worthwhile to first understand how the different terms seen in contemporary publications relate to each other (Figure 1). AI is used to describe the field of computer science determined to create machines that replicate human intelligence.^6,13 Within the field of AI, machine learning refers to a large set of statistical and mathematical methods enabling computers to be trained to recognize patterns and relationships between clinical inputs and outcomes of interest using existing data, without explicit programming. Deep learning falls within ML and refers to multi-layer artificial neural networks that are trained on extremely complex patterns to make predictions.¹³

Figure 1.

Relationship between AI, ML, DL, and generative AI.

While this publication focuses on machine learning, it would be remiss to not mention generative AI. As can be seen in Figure 1, generative AI is born of machine learning. It recognizes patterns and makes predictions to generate content. Generative AI in the form it is today, whether it be OpenAI's ChatGPT,¹⁴ Anthropic’s Claude,¹⁵ Google’s Gemini,¹ or Meta’s Llama,¹⁶ would not have been possible without prior work such as the landmark research paper from Google researchers in 2017, entitled “Attention Is All You Need”,¹⁷ describing an augmentation to artificial neural networks known as a transformer. The transformer enables longer patterns of data to be analyzed and processed in parallel and is why large language models (LLMs) such as ChatGPT can ‘magically’ comprehend long passages of text, hold conversations, and generate equally long content. Yet, despite this impressive ability to manipulate language, current generative AI models lack genuine understanding of the meaning behind the text they generate. They are no more than stochastic parrots,¹⁸ generating content by statistically stitching together sequences of linguistic forms observed in their training data, relying on probabilistic information about word combinations rather than semantic understanding. This is an important point as it explains in part their fundamental flaws, one of which is hallucination: instances where the model generates text that is factually incorrect or nonsensical, despite appearing grammatically correct and superficially coherent. This phenomenon arises from the probabilistic nature of LLMs and their reliance on statistical patterns in the training data rather than deep semantic understanding of content.

LLMs prioritize fluency and coherence over factual accuracy, causing them to produce plausible-sounding but entirely fabricated statements especially when faced with ambiguous prompts or incomplete information. Hallucinations undermine the reliability of AI in any setting but are particularly dangerous in a clinical setting where factual accuracy drives decision-making.

There are ways to minimize this – prompt engineering, retrieval augmented generation, establishing safety guardrails within the software, but none are adequate solutions. For now, it is perhaps prudent to exercise caution and unwise to rely on LLMs in clinical practice, no matter how fluent they appear to be.

ML algorithms and models

Fundamental to ML are the algorithms - mathematical procedures and techniques that allow computers to learn from data, identify patterns, make predictions, or perform tasks without explicit programming. Which algorithm(s) to use for a particular problem is at the discretion of the data scientist who should be properly armed with knowledge of contemporary techniques. Some options are listed in Table 2. These are far from comprehensive. Many terms will be recognized by those who are familiar with statistical methodologies. In medical applications, it is common to employ several combinations of ML algorithms to enhance accuracy and reliability. Diagnostic imaging may use principal component analysis combined with a neural network for dimensionality reduction and pattern recognition. Disease prediction models may use a combination of gradient boosting and logistic regression, or support vector machine and K-nearest neighbor. Patient monitoring models could use ensemble methods to combine multiple classifiers.

Table 2.

Some commonly used machine learning algorithms.^19–122

ML algorithm	Category	Key aspects	Applications	Strengths	Weaknesses
Decision trees	Supervised training	Solves classification and regression by continuously splitting data based on a certain parameter.	Data classification and regression – disease diagnosis, patient risk stratification, treatment decisions.	Suitable for both regression and classification problems. Easily interpreted. Able to handle complex, non-linear relationship between variables. Able to work with both numerical and categorical variables	Prone to overfitting. Sensitivity to data variations – slight changes in input data can lead to different tree structures. Tree structure can become unstable as complexity of data increases. Large trees are difficult to interpret.
Gradient boosting algorithms	Supervised training	Ensemble method which iteratively combines many weaker algorithms (typically decision trees) in a sequential, adaptive learning process. Each new tree corrects the errors made by previous trees and is trained on the residual errors of the ensemble, ultimately minimizing a loss function through gradient descent	Predictive modeling	High predictive accuracy. Flexible and powerful. Works with both categorical and numerical data. Can be used for both classification and regression. Able to handle different types of data structures.	Prone to overfitting. Sensitive to outliers. Computationally demanding. Requires careful hyperparameter tuning.
K-nearest neighbor (KNN)	Supervised training	Nonparametric algorithm that classifies data points based on their proximity and association to other available data	Recommendation engines e.g., disease diagnosis, patient risk assessment, image recognition e.g., tumor classification	Simple, no assumptions on data distribution. Versatile – in multiclass classification problems it can classify data into multiple categories	Processing time increases with size of dataset - less appealing for classification tasks. Dependent on the choice of the ‘k’ value, the number of nearest neighbors considered
Naïve Bayes	Supervised training	Uses principle of class conditional independence from Bayes’ theorem	Text classification, spam identification, recommendation systems	Fast prediction, suitable for solving multi-class prediction problems. Provided the assumption of the independence of features holds true, can perform better than other models and requires much less training data. Better suited to categorical than numerical input variables	Rare in real-life for all features to be independent. ‘Zero-frequency problem’ where zero probability is assigned to a categorical variable whose category in the test dataset was not available in the training dataset
Random forest	Supervised training	A collection of uncorrelated decision trees (the random forest) which are merged to reduce variance and increase accuracy. This is a type of ensemble learning algorithm.	Data classification and regression – disease diagnosis, patient risk stratification, treatment decisions	High accuracy through the use of multiple decision trees. Able to manage large datasets. By averaging results of trees, model tends to generalize better, making it more robust to overfitting.	Complex – less interpretable. Challenging to visualize and understand the entire decision-making process. Higher computational costs than single decision trees, longer training time.
Regression (linear, nonlinear, logistic, polynomial)	Supervised training	Identify the relationship between a continuous dependent variable and one or more independent variables	Predict future outcomes	Regression models are generally easy to implement and computationally efficient. They make no assumptions about distributions	Each type of regression modelling has particular weaknesses. Over-fitting. Sensitivity to outliers, which can skew results, leading to inaccurate models
Support vector machine (SVM)	Supervised training	Separates classes of data points with a decision boundary or hyperplane by maximizing the distance between groups of data	Data classification and regression. Text classification, image recognition.	Excels where there is a clear margin of separation between classes. Performs well in high-dimensional spaces. Good for non-linear relationships. Able to transform data into higher dimensions to find a separating hyperplane	Inefficient when managing very large datasets. Dependent on the choice of parameters. Lacks probability estimates.
Artificial Neural Networks (ANN)	Supervised, Unsupervised, Semi-supervised training	Inspired by the human brain, ANNs comprise layers of interconnected nodes (neurons) organized into input, hidden, and output layers. The network processes data through weighted connections and nonlinear activation functions to learn complex patterns.	Ubiquitous in current AI – computer vision, natural language processing, recommendation systems, disease diagnosis, drug discovery, treatment planning	Excels with high dimensional unstructured data. Versatile – able to handle many types of data (images, text, audio). Excellent at identifying complex patterns. Able to automatically learn hierarchical representations of data through multiple layers of abstraction. Scales well with more data. May be combined with other ANNs. Can work in combination with other ML algorithms.	Requires large amounts of training data. Sensitive to data quality. High computational demands. Long training times. On small data, the same outcome may be achieved with other ML algorithms faster and with less compute. “Black box” – difficult to explain decisions, difficult to debug, challenging to audit. Many hyperparameters to tune. Can be unstable during training. Requires expertise to optimize. Overfitting risk – sensitive to noise in training data
Hierarchical cluster analysis (HCA)	Unsupervised training	Creates a tree-like hierarchy of clusters (dendrogram) without requiring a pre-specified number of clusters. May be agglomerative or divisive. Agglomerative clustering is considered a “bottom-up” approach: data points are isolated as separate groupings initially, and then merged together iteratively on the basis of similarity until one cluster has been achieved. Divisive clustering is less commonly used and takes a “top-down” approach: a single data cluster is divided based on the differences between data points.	Phylogenetic analysis and gene expression studies, grouping individuals based on survey responses, image segmentation and object recognition	Does not require predefined clusters, flexible, provides detailed view of the hierarchical structure of the data	Computationally intensive, sensitive to noise and outliers, difficulties choosing the right linkage criterion
K-means clustering	Unsupervised training	Datasets are classified into a particular number of clusters (K) such that the data points within a cluster are homogeneous and distinct from the data in other clusters. The data points closest to a given centroid will be clustered under the same category. Soft or fuzzy k-means clustering allows data points to belong to multiple clusters and is a type of overlapping clustering.	Market segmentation, document clustering, image segmentation, and image compression	Robust, efficient with large datasets, computationally light relative to other clustering methods, simple to implement	Assumes spherical clusters, sensitive to initial conditions, requires specification of the number of clusters (k) beforehand.
Principal Component Analysis (PCA)	Unsupervised training	Dimensionality reduction technique finds the most important directions (components) in data by focusing on variance. Unlike most ML algorithms that try to predict or classify, PCA's primary goal is to transform and compress data while preserving its essential structure	Dimensionality reduction. Data pre-processing.	Reduction in feature dimensions. Principal components capture the most variance in the data. The process of transformation to principal components identifies and removes correlated features. New features are independent of each other.	Interpretation can be challenging. Sensitive to the scale of features – normalize data before applying PCA.
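
To make one row of Table 2 concrete, the following is a minimal Python sketch of K-nearest neighbor classification using only the standard library. The function name `knn_predict`, the two-feature data points and the labels "normal" and "abnormal" are hypothetical and purely illustrative; a real application would use an established library and properly scaled clinical features.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labelled training point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbours is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature measurements with known labels.
train_points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.2, 2.9), (2.9, 3.1)]
train_labels = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]

print(knn_predict(train_points, train_labels, (1.0, 1.0), k=3))
print(knn_predict(train_points, train_labels, (3.0, 3.0), k=3))
```

The sketch also shows the weaknesses listed in the table: every prediction scans the whole training set, and the result depends on the chosen value of k.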

ML algorithms can be broadly classified into supervised or unsupervised learning methods, although there are additional categories such as semi-supervised learning, when labelling is incomplete, and reinforcement learning, where algorithms learn through trial and error by interacting with an environment to maximize rewards.

Hyperparameters influence the learning trajectory and the final accuracy of the model and can be manually tuned by the model developer prior to training. Examples of hyperparameters can be found in Table 1. In contrast, parameters are defined as internal variables which are learned by the ML model during training and cannot be manually tuned. Regardless of the method category, each ML algorithm has a set of hyperparameters which control the learning of the model, and these are set before a model is trained.

In research and business meetings today, the phrase “neural network” is often uttered sagely, sometimes without true understanding of its meaning. A neurologist who understands the human biological neural network should be somewhat familiar with its computational equivalent, and if not, at least strongly curious. A primer on machine learning would therefore be remiss without discussing the artificial neural network (ANN). The life sciences equivalent of the ANN in terms of impact might resemble the polymerase chain reaction technique – both methods earned their developers a Nobel Prize – John Hopfield and Geoffrey Hinton were awarded theirs in 2024 for their work advancing the ANN.¹²³ From a history of fluctuating popularity and high computational power requirements, the ANN is now the foundation of global-scale AI tools today across all industries and domains.¹²⁴ The aforementioned transformer which generative AI today relies upon, is itself a type of artificial neural network.

How an ANN works is loosely analogous to the way biological neurons work together. An ANN consists of nodes arranged in layers – an input layer, one or more hidden layers not visible to the user, and an output layer. Each individual node can be viewed as its own linear regression model, composed of input data, weights, a bias (threshold), and an output. A node can be thought of as a function: a thing which calculates a number. Each node is connected to others and has its own associated weight and threshold for activation. A node only activates and transmits data to the next layer if its output reaches a specific threshold value. Thus, activations in one layer determine the activations in the next until they reach the output layer, where each node represents a particular outcome and contains the probability of that outcome given the inputs. For example, each output node could be a character in the case of an ANN designed for optical character recognition. This is similar to how neurons communicate with each other across synapses. Chemical neurotransmitters bring signals from the terminal axons of preceding neurons across the synaptic gap to the dendrites of a neuron. These neurotransmitter signals may be “weighted” to activate or deactivate the neuron using electricity. The neuron is activated to transmit signal to the following neuron only when the summed signal from all its dendrites passes the electrical threshold for activation, triggering an electrical wave known as an action potential. The action potential of the activated neuron then propagates down its axon to the terminal axons, which synapse with the dendrites of the following neurons in the next “layer”. The terminal axons then convert the electrical signal into neurotransmitter signals which cross the synapse and bind to the dendrites of the following neuron in the next “layer”. If the threshold is reached, then the neuron in that layer is activated.
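
The description of a node as a weighted sum plus a bias passed through an activation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a clinical model: the sigmoid function is used here as a smooth stand-in for the all-or-nothing threshold described above, and the function names, weights, biases and layer sizes are all hypothetical.

```python
import math


def sigmoid(z):
    """Smooth activation that squashes any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, bias):
    """One node: weighted sum of inputs plus bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


def layer_output(inputs, weight_rows, biases):
    """One layer: every node receives the same inputs but has its own weights and bias."""
    return [node_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Pass activations from layer to layer until the output layer is reached."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer_output(activations, weight_rows, biases)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[1.0, -1.0, 0.5]], [0.2]),                                  # output layer
]
print(forward([1.0, 2.0], layers))
```

With untrained, arbitrary weights the output carries no meaning; training is the process of adjusting those weights and biases so that the output becomes useful.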

Backpropagation, short for “backward propagation of errors,” is the method used to train a neural network. It involves calculating the error (the difference between predicted and actual values) and propagating this error backward through the network to update the weights. During training, initial weights and biases are set, then for each input data, a guess is made as to the output. This is compared with the known answer and if incorrect, the weights and biases are updated and the process repeated.

This back-and-forth process is guided by a loss function, and an optimization algorithm which is used to minimize the loss function. The loss function quantifies how well the predictions match the answer. It measures the error between the predicted and true outputs and provides a signal to guide weight updates. Examples of loss functions include mean squared error and cross-entropy loss. Gradient descent, the optimization algorithm commonly used in neural networks, iteratively adjusts a model's parameters in the direction of the steepest descent (negative gradient) of the loss function to minimize error. An analogy would be that of a ball rolling around a skateboard park, trying to settle in the deepest bowl (global minimum). The ball naturally rolls downhill (following the gradient), picking up speed on steep ramps (larger gradients) and slowing down on flatter surfaces. If the ball gets stuck in a shallow bowl (local minimum), it might need a push or adjustment to move toward the deepest bowl, ensuring it finds the lowest possible point. At this point the weights and biases for each node are optimal and the network has ‘learned’. This is a topic where informal video^125–128 is more effective for teaching than formal literature.^129,130 Both are provided in the references.
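
The interplay of loss function and gradient descent can be sketched for the simplest possible case: a single node with one weight and one bias fitting a straight line, with mean squared error as the loss. This is a deliberately reduced illustration, not full backpropagation through a multi-layer network; the function names, learning rate, number of epochs and data are hypothetical, and because this loss surface has a single bowl there are no local minima to escape.

```python
def mse(w, b, xs, ys):
    """Mean squared error between predictions w*x + b and the true values."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Repeatedly step the weight and bias against the gradient of the MSE loss."""
    w, b = 0.0, 0.0  # initial guess for the parameters
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Move "downhill": a small step in the direction of the negative gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3), mse(w, b, xs, ys))
```

Here the learning rate and the number of epochs are hyperparameters chosen before training, while `w` and `b` are the parameters learned from the data. Too large a learning rate makes the updates overshoot and diverge, while too small a value makes training slow.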

There are many possible augmentations to the neural network – Transformers, the Convolutional Neural Network, the Recurrent Neural Network, Generative Adversarial Networks, Graph Neural Networks – these are only a few of the options available. Each advances the traditional ANN in a particular way to produce better results for certain applications.

While variants and augmentations of the classical ANN dominate AI today, they have not rendered other ML algorithms obsolete any more than the tractor eliminated the need for picks and shovels. Sometimes, a simpler algorithm can achieve the same result with less time and computing power. This is especially the case when data is small, or there is enough prior knowledge that deep learning is unnecessary, or a situation requires a method that is explainable and interpretable in its process. It is a question of determining the best tool for the problem.

ML model training

An ML model must be properly trained before it can be deployed. This process is divided into training and testing and shown as a data flow diagram in Figure 2.

Figure 2.

Overview of machine learning model training. Pre-processed data is divided into independent training, validation and testing datasets. Processes 1, 2, and 3 are sequential and dependent upon the completion of the previous process. Labelled training data is run through a supervised machine learning model (1.2). Unlabelled data is run through an unsupervised machine learning model (1.3). Validation data (2) is used to ascertain stopping criteria for training. The training (1) and validation processes (2) run through multiple iterations until satisfactory accuracy as assessed by performance metrics is achieved. The trained model is then tested using the third dataset (3) and the outputs are referred to as inferences.

ML models typically require large amounts of data in order to draw meaningful inferences. Both supervised and unsupervised approaches require input data with sufficient parameters that are descriptive of the problem under study.¹³¹ In this context, parameters are the input variables, as distinct from the learned model parameters defined in Table 1. Examples of parameters include demographic data, laboratory values, drug prescriptions, and imaging data. This data consists of continuous values, such as grayscale units for imaging, or categorical values, such as gender and ethnicity. Complex data such as images or -omics datasets can be used. Simpler machine learning systems use limited input variables, such as age, gender, and other demographic features, and have less information to predict an outcome.

Initially, data used for ML needs to be preprocessed, generating parameters which are the input variables. Parameters can be further processed by feature engineering. Supervised learning methods train the ML algorithms to more accurately “label” or classify data into categories that are determined by experts or gold standard tests. Supervised learning methods require data which is already “labelled” into categories. For example, patients may be labelled with better or worse prognosis,^132–137 as responders or non-responders to treatment,¹³⁸ or with longer or shorter duration of disease.¹³⁹ Other examples of clinical data which may be labelled for supervised learning include labelling imaging findings with their clinical diagnoses^7,19–22,140–142 and labelling EMG audio or visual data with various EMG waveforms.^7,23 This is the more commonly used form of ML, which requires less data than unsupervised learning (see below), and can achieve human-level performance.

When labelling of data is not possible or impractical due to the volume of data, unsupervised learning approaches are used. The unsupervised ML algorithm processes the unlabelled data to cluster patients with similar characteristics together or learn the distributions of most of the data to identify outliers or patterns. Applications of clustering approaches include clustering similar clinical phenotypes of diseases,^24,136 clustering similar imaging findings,²⁵ and clustering associated gene expression networks.^26,27

Data are always divided into a training dataset, which is used to train the model, and an independent testing set, which is used to assess the accuracy of the model (Figure 2). The training set is used to learn parameters and tune hyperparameters to achieve the best ML accuracy on the independent test set. A small subset of the training set (referred to as the validation set) is used to validate the model during the training process to make decisions on the number of training iterations (consecutive training events) and tuning of hyperparameters. If the dataset is small, a technique known as cross-validation can be used to improve reproducibility. In a cross-validation study, the whole dataset is divided into n subsets, with each model trained on (n-1) subsets and the remaining subset used for testing. In this way, each subset will be a test set in at least one model.
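
The division of data described above can be sketched in Python as follows. This is a minimal illustration, assuming a simple random split and hypothetical 15% validation and test fractions; the function names and the 100 placeholder patient identifiers are invented for the example, and real studies often need stratified or patient-level splitting instead.

```python
import random


def train_val_test_split(records, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle the records once, then cut them into three non-overlapping sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) so each sample is tested exactly once."""
    indices = list(range(n_samples))
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# Hypothetical dataset of 100 patient identifiers.
patient_ids = list(range(100))
train, val, test = train_val_test_split(patient_ids)
print(len(train), len(val), len(test))

# Five-fold cross-validation on a small dataset of 10 samples.
for train_idx, test_idx in k_fold_indices(10, 5):
    print("train on", train_idx, "test on", test_idx)
```

In the cross-validation loop each model is trained on four of the five subsets and tested on the remaining one, so that every sample contributes to a test result.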

One of several issues which may manifest during the training process is overfitting, in which the model perfectly fits the data used in training but also mistakenly learns irrelevant features or random fluctuations in that particular dataset as significant concepts influencing the outcomes, when there is actually no logical relationship between them. This could happen when too many parameters are available to fit the data. For example, a model trained too well on a dataset from patients in Europe may be very accurate in answering questions on this particular set of European patients but have very poor performance on a North American database of patients. Regularization techniques can be applied to prevent overfitting: these either stop the training process early, before the model overfits to the dataset, or increase the effective dataset size by augmenting the training data with synthetic data that have been artificially modified, so that the model is exposed to a diversity of data distributions and can generalize well. Alternatively, the number of parameters may be reduced.
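Stopping the training process early can be sketched as follows. This is a minimal illustration that assumes the validation loss is recorded after every training iteration; the loss values and the `patience` setting are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, best_loss), halting once the validation loss has
    failed to improve for `patience` consecutive epochs."""
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # the model has started to overfit; stop training
    return best_epoch, best_loss

# Hypothetical validation losses: they fall at first, then rise as overfitting begins.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
best_epoch, best_loss = early_stopping_epoch(val_losses)
print(best_epoch, best_loss)
```

The model saved at the best epoch is the one kept. Later iterations fit the training data more closely but perform worse on data the model has not seen.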

After training, the independent testing dataset is run through the model and the results are assessed against known outcomes. If the model meets predefined requirements, it is deployed.
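The final assessment step can be sketched as a comparison of model predictions with known outcomes. The predictions, the outcomes and the acceptance thresholds below are hypothetical; real requirements would be set in advance for the specific clinical use.

```python
def evaluate(predicted, actual):
    """Sensitivity, specificity and accuracy for binary labels (1 = disease).
    Assumes both classes are present in `actual`."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

def meets_requirements(metrics, requirements):
    """True only if every predefined minimum is reached."""
    return all(metrics[name] >= minimum for name, minimum in requirements.items())

actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # known outcomes in the test set
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]   # model output on the same patients
requirements = {"sensitivity": 0.70, "specificity": 0.80}  # hypothetical thresholds

metrics = evaluate(predicted, actual)
deployable = meets_requirements(metrics, requirements)
print(metrics, deployable)
```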

Data quality

While an understanding of ML algorithms is important as it helps anticipate the shortcomings of a particular model, it is equally important for a clinician to be able to critically assess the quality of the data used to train and validate a model. If possible, it is worthwhile spending time reviewing and discussing the training data with the responsible data scientist – determine whether the training data is from a reputable source, how it was quantified, how it was collected, how data quality and accuracy of content were verified, and whether the data has been used for other models – as this information will impact data quality and subsequent model quality. Check if there is any reason for bias – such as missing data, inaccurate labelling, or not including patients of a particular ethnicity. Even if presented with a trained model, these questions are still relevant.

The tech industry claims to have exhausted the global supply of quality training material for AI^28,29 and in certain areas of medicine, “big data” is an inaccurate term – sparse, fragmented, or poorly documented data is more accurate. Consequently, many advocate for, and are using, synthetic data for training.^30,31 In this context, synthetic data can be defined as data generated by AI for the purpose of training new AI. Synthetic data addresses the challenges associated with using real world data for training – namely sparse data, fragmented data, data privacy regulations, data security, and resource constraints. Examples include autonomous vehicle training and financial fraud detection models trained using synthetic data, and even health records have been generated for training AI to avoid using patient records. To what extent is this acceptable? Currently there is no regulation around the use of synthetic data, and it is in the interests of the tech industry to foster this practice – it is cheaper and faster to use synthetic data. Synthetic data contains all the biases and flaws of the AI which generated it and these negative qualities are transferred to the next AI. When considering synthetic data, it is particularly important to understand how the data was generated, how it was validated, the quality assessment, how the data will be used and how model performance will be assessed. Example questions for each of these areas are presented in Table 3.

Table 3.

Questions to ask of synthetic data.

Category	Details
Data Generation Process	What method was used to generate the synthetic data? How closely does it mirror the statistical properties of real data? What assumptions were made during the generation process? Are there documented biases in the generation method?
Validation	How was the synthetic data validated against real data? What metrics were used to measure similarity? Have privacy-preserving properties been verified? What tests were performed to detect anomalies or artifacts?
Quality Assessment	Does the synthetic data maintain important relationships between variables? How well does it preserve rare but significant cases? Are the edge cases and outliers represented appropriately? What is the level of noise compared to real data?
Usage Context	Is the synthetic data appropriate for the specific use case? What are the known limitations of the dataset? How well does it generalize to different scenarios? Are there documented success cases with similar applications?
Model Performance	How do models trained on synthetic data perform compared to real data? What is the transfer performance to real-world applications? Are there specific tasks where the synthetic data performs poorly?
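Some of the validation and quality questions in Table 3 can be checked numerically. The sketch below is one simple way to do this under stated assumptions: it compares means, standard deviations and the correlation between two variables in a real and a synthetic dataset. The variable names and values are hypothetical, and these summary statistics are illustrative rather than a complete validation.

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)

def compare_datasets(real, synthetic, var_a, var_b):
    """Summarise how far the synthetic data drifts from the real data."""
    report = {}
    for name in (var_a, var_b):
        report[name] = {
            "mean_diff": statistics.fmean(synthetic[name]) - statistics.fmean(real[name]),
            "stdev_diff": statistics.stdev(synthetic[name]) - statistics.stdev(real[name]),
        }
    # Does the synthetic data preserve the relationship between the two variables?
    report["correlation_gap"] = abs(
        pearson(synthetic[var_a], synthetic[var_b]) - pearson(real[var_a], real[var_b])
    )
    return report

# Hypothetical values: age in years and a functional score.
real = {"age": [40, 50, 60, 70], "score": [10, 8, 6, 4]}
synthetic = {"age": [42, 52, 58, 68], "score": [9, 8, 6, 5]}

report = compare_datasets(real, synthetic, "age", "score")
print(report)
```

Here the means match but the synthetic ages are less spread out than the real ones, which hints that rare cases at the extremes may be under-represented.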

AI applications in clinical medicine

The progression of AI into healthcare applications has been predicated on quality data for training such as high-resolution medical imaging, biosensors with continuous output of physiologic metrics, -omics (epigenomics, genomics, transcriptomics, proteomics), and electronic medical records, as well as advancements in computer processing and cloud computing. At present, AI processing power is greatest through cloud computing, which enables moderately powerful devices with a good Internet connection to access the power of AI inferencing servers in a distant location.

AI has been applied in clinical decision support tools to improve diagnostic and prognostic accuracy. AI tools developed by Zebra Medical Vision and Aidoc are FDA-approved to triage or enhance radiological diagnoses in neurology, pulmonology and cardiology from X-rays, mammograms and CT scans. Google’s DeepMind for eye diseases operates under CE marking for European Union usage and is utilized at Moorfields Eye hospital to analyze optical coherence tomography (OCT) for the detection of macular disease. AI tools also have good potential for treatment optimization to make more cost-effective treatment decisions, optimization of resource utilization, and improvement in quality of life. For example, the AI-based Sepsis Immunoscore is FDA-approved to identify patients at risk of sepsis. The MiniMed 670G system is FDA-approved as the first hybrid closed-loop insulin delivery system. Watson for Oncology (WFO) is an artificial intelligence assistant decision system developed by IBM with Memorial Sloan Kettering Cancer Center (MSK) to recommend appropriate chemotherapy regimens for specific cancer patients.^32,33 While it is not FDA-approved, it is widely deployed internationally.

Uses of machine learning in clinical practice and research in neuromuscular and electrodiagnostic medicine

In neuromuscular and electrodiagnostic medicine, there are many opportunities to use ML in analyzing clinical data gathered from subspecialty tests such as nerve conduction studies (NCS),^34–36 electromyography (EMG),^23,37–39 electrical impedance myography (EIM),^40,41 muscle MRI,^{19,25,42–47,139–142} nerve ultrasound,^129–132 muscle ultrasound,^22,37,48,140 muscle biopsies,^49–51 patient biorepositories, and electronic medical records (EMR).^{1,24,27,52–65,132–136}

EMG, NCS, neuromuscular ultrasound, muscle MRIs and muscle biopsies are sub-specialty tests which are routinely used by neuromuscular specialists in clinical practice for diagnosis and monitoring. It is well-known that EMG, which is currently manually performed and interpreted by trained electromyographers, is labor intensive and has significant inter-operator variability. The accuracy of analysis of the EMG waveform data depends heavily on the training and experience of the electromyographer. Consistency may become an issue when there is a need for repeat studies for monitoring or diagnosis, and the same electromyographer is not available to perform the study. ML tools have the potential to fill this gap by enhancing the accuracy and consistency of EMG.^23,38 EMG data is complex and unstructured: it is a time-series signal with high frequency and changes in amplitudes, collected at rest and on voluntary movement from various muscles chosen by the electromyographer based on clinical acumen. In contrast, routine NCS data is structured in the form of discrete values relating to parameters like amplitude and conduction velocity, usually collected from routine nerves in the upper and lower limbs. This lends itself well to using ML to automatically detect common conditions such as polyneuropathy³⁴ or entrapment neuropathies and ensure they are not missed during diagnosis or reporting. Neuromuscular ultrasound is another tool used in clinical practice by trained neuromuscular sonographers in the diagnosis of nerve or muscle diseases. It is a newer subspecialty compared to NCS/EMG and there are fewer widely accepted guidelines on diagnostic parameters such as nerve cross sectional area (CSA) and diameter, and echogenicity of muscle and nerve fascicles.
ML tools, which are already in clinical practice for other forms of imaging such as MRI and CT diagnosis for other diseases, can address this gap and provide clinical insights for neuromuscular specialists to develop diagnostic criteria for various neuropathies and myopathies. ML can automate the precise drawing of boundaries of nerves and muscles in images or videos for correct analysis,^46,66 and correctly identify common entrapment neuropathies.^{20–22,67,68} Muscle MRIs are generally used to distinguish between inherited and autoimmune muscle diseases and monitor disease activity; the use of ML to objectively screen and analyze multiple muscles may provide a practical and accurate way to diagnose and monitor muscle diseases which often affect the whole body, and obviate the need for muscle biopsies which are invasive and prone to sampling bias.^{19,42–44,141,142} None of the neuromuscular specialty tests currently has an FDA-approved ML device in clinical practice, but such tools have shown promise in research.

ML can also aid clinicians in diagnosis and monitoring, clinical prognostication, personalized treatment, patient classification, and understanding the disease mechanisms of neuromuscular diseases. Again, the field has not embraced these new technologies in clinical practice but research is promising. These are summarized in Figure 3 and elaborated in the sections below. Key studies are summarized in the Appendix.

Figure 3.

Machine learning applications in neuromuscular diseases.

ML can be used to improve diagnosis and monitoring of neuromuscular diseases

ML may be useful for prediagnosis (pre-screening patients), peridiagnosis (improving the accuracy and efficiency of diagnosis by real-time assistance at the time of diagnosis), and postdiagnosis (acting as quality control to detect diagnostic errors before patients are affected), and may complement clinician evaluations while reducing clinical workload. ML and clinician evaluation can be more accurate and efficient in combination.

Pre-diagnosis may be used to screen the highest-risk patients for further evaluation and management at a higher level of care. It can also identify new patient risk factors for clinicians to take note of. Supervised ML on clinical data has been used in Taiwan to detect, with high accuracy, patients admitted with myasthenia gravis who are more likely to have prolonged and severe disease and require intensive care interventions,^63,137 suggesting that these models have the potential to be developed into clinical predictive tools. In these studies, the presence of thymoma predicted ICU admission and prolonged hospital stay. Interestingly, the use of rescue treatments such as intravenous steroids and intravenous immunoglobulin (IVIG) was also associated with prolonged hospital stay. These models have to be validated in other international patient cohorts as the Asian myasthenia gravis disease phenotype may differ from that of patients in Europe or the United States. Supervised machine learning has also been used on ALS clinical data to predict survival and progression rate with good accuracy,^53,69,70,134 which opens avenues for patients to be screened remotely for inclusion in clinical trials. It also enables the enrolment in clinical trials of fast progressors or short survivors – patients who are most likely to show treatment response in the shortest period of time.

During diagnosis (peri-diagnosis), an ML model has potential to improve the accuracy or efficiency of diagnosis by assisting clinicians in real time to detect abnormalities more quickly and consistently. In diseases where early diagnosis is challenging due to heterogeneous clinical presentations and etiology, and routine clinical diagnosis requires clinical history and examination by a neuromuscular specialist, usually accompanied by electrophysiological tests, supervised ML can facilitate early diagnosis and access to treatment, by the automatic analysis of routine data ranging from clinical to blood analytes to electrophysiological to imaging either alone or in combination.^{57,60,70,71,132} It can also distinguish between subsets of similar diseases to optimize clinical management – supervised ML has been used to distinguish between ALS and lower motor neuron disease using patient blood analytes with good accuracy, and identified immunological markers as important discriminators of these two diseases,¹³² which suggests that they should be managed as two diseases and not two spectrums of one disease as these diseases are commonly viewed.

Genomics, transcriptomics, proteomics and metabolomics, and molecular networks, have also been used as data for ML, which give insights into the molecular complexity of diseases and propel the discovery of network biomarkers and new therapeutic targets.^{51,54,55,58,64,134} Supervised ML identified diagnostic biomarkers with good accuracy from proteomics performed in patient-derived stem-cell differentiated motor neurons,¹³⁴ ALS-related genes taken from brain biopsies⁵⁵ and RNA taken from post-partum samples.⁵⁸ Although patient sample sizes were invariably small, as expected for rare neuromuscular diseases, with some datasets being imbalanced with much more or fewer disease cases,⁵⁸ it is promising that ML was able to extract important disease associated features even from highly heterogenous -omics data.

ML image analysis lends itself well to peri-diagnosis and post-diagnosis. Ultrasound has been used to assess for the common entrapment neuropathy, CTS, through the evaluation of median nerve morphology, but operator dependency and lack of standard protocols limit its widespread use. Supervised ML can be used in ultrasound analysis to detect CTS^{20,21,36,66,67} by detecting median nerve cross sectional area²¹ – commonly used in clinical practice, or other variables not commonly used in clinical practice such as the volume,⁶⁷ echogenicity and thickness of the epineurium and surrounding tissues.²⁰ The median nerve can be effectively delineated by ML,⁷⁰ with automatic cross sectional area measurement aligning well with manual measurement, implying the potential of reducing dependence on the sonographer and catching human errors. Supervised ML models have identified variables in predicting CTS severity,³⁶ which can assist clinical decision making about surgical decompression.

Muscle imaging, together with detailed clinical examination and muscle biopsy, is one of the main tools for deep phenotyping and diagnosis of neuromuscular disorders. It can give clues to the underlying pathogenicity of variants of unknown significance and facilitate diagnosis in cases with inconclusive genomics, but again is limited by clinician/operator experience and hardware, and it is time and labor-intensive to analyze large amounts of imaging data.^68,140 Limitations with visual image interpretation by human operators can be mitigated with ML. Supervised ML has been used to support quantitative whole-body MRI analysis for diagnosis of myopathies,⁴² sonographic diagnosis of inherited and acquired myopathies,³⁷ and MRI analysis of muscle involvement patterns and muscle imaging texture to distinguish between inherited and acquired myopathies such as muscular dystrophies, congenital myopathies and idiopathic inflammatory myopathies,^{42,44,139,141,142} and to correlate with disease duration and disability.^42,71,139 In some studies, disease rarity contributed to a low sample size,³⁷ which did not cover all the various stages of disease, or few myopathies were covered, which compromised the practical utility of the model unless the clinical suspicion of the particular rare myopathy was already high.⁴² Other studies were limited by the data – when images of only certain muscles⁴² or parts of muscles were available.¹⁴¹ Accuracies ranged from moderate to good, suggesting potential for improvement by including more homogenous and whole body data from more patients across international neuromuscular centres. Interestingly, one study used unsupervised ML on MRI to discriminate between muscles with fatty replacement, edema or neither in patients with STIM1 tubular aggregate myopathy with good accuracy,²⁵ and the model was able to identify alterations in muscle classified as normal by human operators – suggesting good potential in post-diagnosis, to reduce human error.

Supervised ML has been used with EIM to diagnose motor neuron diseases and muscular dystrophies as well as to correlate with muscle mass,^40,41,71 and to characterize EMG signals that would be helpful for diagnosis.^23,38 In addition, it has potential in automating the analysis and reporting of routine NCS to save time, for common conditions such as diabetic polyneuropathy.³⁴

ML may be used to improve prognostication

There is potential for ML in the design of clinical decision support systems (CDSS), to assist clinicians with counselling and decision-making based on prior successful diagnoses, treatment, and prognostication.⁴ ML has the advantage of providing results without variability from fatigue or environmental factors and having the ability to exhaustively review every part of the data every time. Such models may be used to stratify patients in clinical trials, predict individual patient disease trajectories and drug responses, and predict quality of life and caregiver burden. Research in rare diseases, however, is limited by already small patient databases, and sparse longitudinal data may be further affected by selection and attrition bias, which makes it challenging to accurately analyze the disease course of heterogenous clinical populations.

In ALS and spinal muscular atrophy (SMA), disease progression may be complex and non-linear.^65,71–73 This poses challenges to clinicians, patients and caregivers when trying to make plans – preparing for disability, loss of income, expensive personalized disease-modifying treatment and care plans, and death. As these diseases have considerable clinical and biological heterogeneity, patient stratification is also necessary to enrol a homogenous patient population in clinical trials. Many models of ALS progression have used supervised ML to predict survival, weight change, and progression rate, with the most common predictors consistently being age, ALS Functional Rating Scale (ALSFRS) score, site of onset and disease duration.^60,69,70 However, only the European Network for the Cure of ALS (ENCALS) model has been relatively reliable⁷⁴ and is available to registered clinicians as a personalized prediction tool, which suggests that future research needs to pay attention to methodological pitfalls and external validation in ML models. Clinical laboratory predictors, such as creatinine, creatine kinase, and phosphorus, have been identified as potential biomarkers of ALS progression and clinical outcome.^69,72,75 Novel prognostic and monitoring biomarkers from genomics, transcriptomics, proteomics, lipidomics to serum metabolomics from patient-derived tissues have likewise been identified by supervised and unsupervised ML and may additionally shed light on pathological mechanisms relevant to both pre-symptomatic and symptomatic phenotypes of disease.^{52,54,56,59,62,64,132,134} Unsupervised ML approaches can cluster ALS patients in an unbiased manner to allow clinicians to draw insights from clinical progression patterns⁷³ or gene expression²⁶ to predict prognosis.

SMA treatment has been revolutionized by innovative survival motor neuron (SMN) protein-repleting, disease-modifying drugs, which are, however, extremely expensive.⁷⁶ Being able to predict individual trajectories can help shed light on clinical efficacies and durability of such drugs for particular patient subsets and create personalized treatment plans – when to start or stop a treatment, and whether to combine or switch therapies as children with SMA grow.^65,71 In ALS, the clinical trials on edaravone used cohort enrichment to select for patients most likely to show progression to elucidate treatment effects. Interestingly, ML was used to show a statistically significant treatment effect in a cohort of participants with broader disease characteristics than the inclusion criteria, to mimic real-world clinical practice where edaravone is administered to all ALS patients.⁷⁷ Supervised and unsupervised ML have been used on multi-modal clinical, laboratory, electrophysiological and imaging data to predict response to edaravone treatment in ALS,⁷⁷ pulse corticosteroid treatment in chronic inflammatory demyelinating polyradiculoneuropathy (CIDP)¹³⁶ and intravenous immunoglobulin (IVIG) in inflammatory myopathies,⁷⁸ as well as to repurpose drugs for neuromuscular diseases.⁵⁰ Such research tools require validation in real-world settings to confirm their accuracies across racially heterogenous populations in different clinical centers, before they can be appropriately deployed to support clinicians to select treatments based on anticipated clinical response.

For diseases like ALS for which there is no cure and current therapies can only modestly reduce the rate of disease progression, the focus is on managing expectations and maintaining quality of life for patients and their caregivers.⁷⁹ ML models have been used to identify predictors of quality of life and caregiver burden in ALS^1,80 in order to personalize support for patients and their caregivers. Using supervised ML, a CDSS was designed to notify clinicians when an individual with ALS is experiencing low quality of life so that support measures can be extended to them.⁸¹ Supervised ML also suggested that implementing telehealth-based interventions can help with caregiver burden.⁸⁰

ML may be used to classify patients in new ways and shed light on underlying molecular mechanisms of neuromuscular diseases

The way clinicians classify patient subtypes based on clinical findings and diagnostic tests may not meaningfully identify patient subgroups and instead represent human constructs applied to the data based on empirical observations. ALS patients may be classified as fast progressors or slow progressors based on their clinical phenotype, however among the fast progressors there may still be a number of different biological processes underlying pathology, which brings biological heterogeneity to patients of the same clinical phenotype.¹³⁴ In an unbiased manner, patients could instead be grouped by affected biological pathways followed by clinical phenotyping of the biological groups to correlate molecular mechanisms with disease progression and identify patients most likely to benefit from drugs that target those affected biological pathways.⁸² The ability to determine the correct number and nature of subgroups would aid in understanding the disease and support clinical care and clinical trial design.

Many newly designed therapeutic strategies undergoing clinical trials, such as antisense oligonucleotides (ASO), target pathology associated with specific genetic causes, such as SOD1 and C9orf72 ALS.^83,84 In one study, unsupervised ML was used to identify ALS subtypes in deeply-phenotyped, population-based collections of patients from Italy, then supervised ML was used to build predictor models that could accurately classify individual patients.²⁴ Although the ML models were robust with usage of independent development and validation datasets, and the clinical parameters used were standard across the ALS field, all data originated from the Northern Italian population, and studies in other countries are required to test the models’ generalizability.

While identifying specific genetic-phenotype correlations identifies druggable targets for specific groups of patients and responsive genetic subtypes of patients from clinical trials, identification of common alterations across the ALS spectrum improves understanding of the mechanisms underlying common neurodegenerative pathways and opens new therapeutic scenarios for large portions of patients. In one study, ML was used to integrate layers of biological information from transcriptomics and deep-sequencing analysis. ML was used to analyze whole blood and spinal cord transcriptomes of ALS patients to identify top predictor gene sets, as well as data from whole genome sequencing of samples from the AnswerALS database to identify signatures that could discriminate between ALS and control samples.⁸⁵ This study showed that ALS significantly correlates with ageing and DNA damage signatures and provides insights into how different genetic mutations and divergent molecular mechanisms can converge over time into a singular presentation of ALS.

Machine learning may be used for drug discovery

ML may be used in combination with computational methods in high-throughput drug screening (HTDS) to test large libraries of compounds quickly and at low cost and aid lead compound optimisation.^86,87 Using computational biology, millions of drugs can be tested for a predicted effect, such as binding a particular molecular target, and for other virtual parameters such as central nervous system penetrance, off-target effects, and pharmacokinetics/pharmacodynamics. One such study screened 1.5 million compounds for binding SOD1 in a way that can stabilize its dimer form, which is postulated to be neuroprotective in familial ALS, and found fifteen possible leads which also prevented SOD1 aggregation in lab-based assays.⁸⁸ To reduce off-target effects, the same group screened another 2.2 million compounds which could dock to specific SOD1 sites with limited off-target binding.⁸⁹ ML approaches like AlphaFold (AF), a 3D geometry modelling algorithm, can predict 3D protein conformation at high accuracies and precisely estimate protein interactions.⁹⁰ Supervised and unsupervised ML have been used to combine -omics data analysis to predict drug properties, such as with PandaOmics, a cloud-based software platform that applies ML to multimodal -omics for therapeutic target and biomarker discovery.⁹¹ PandaOmics has been applied on expression patterns of central nervous system samples and patient stem-cell derived motor neurons to discover novel therapeutic targets.⁹¹ The acceleration of drug discovery and development by PandaOmics can be seen in the development of a new ALS drug which underwent AI-assisted drug discovery, FB1006.⁹² It took less than two years from target identification to the completion of investigator-initiated clinical trial enrolment.
Other AI-driven ways to improve drug discovery and development include incorporating ML in molecular simulations, de novo drug design, drug repurposing, prediction of drug-target interactions⁹³ and synthesis of new molecules (synthesis pathway generator).⁹⁴

Pitfalls of machine learning

A summary of potential barriers to the adoption of machine learning is in Table 4. Of these, some points will be emphasized. To “first do no harm”, clinicians need to be aware of challenges which may limit their usage of ML, and implications on medical malpractice liability. Some ML models supply recommendations without directly explaining the underlying reasons for those results because of the “black box” of parameters which the model automatically derives and uses.⁹⁵ The opacity of “black box” medicine has implications on medical malpractice liability – how would liability apply to clinicians who are unable to understand the underlying mechanisms of the ML algorithms which recommend patient treatments?⁹⁶ Should they themselves understand how ML algorithms are developed and verified before using them, or is it sufficient to rely on the assurances of the technology developer? As it would not be practical to expect clinicians to be experts in ML, this primer serves to educate clinicians in the basics of ML so that they are equipped to evaluate the expertise of the developer.

Table 4.

Potential barriers to adoption of ML.

Category	Challenges
Data	Data scarcity, bias within training data, heterogeneity, and issues with rare disorders
Model Interpretability	Black-box nature and lack of explainable ML
Clinical Integration	Workflow disruption, software interoperability and integration issues with hospital systems, and training requirements for using AI safely and effectively within clinical practice
Regulation	Validation in real-world settings and lengthy regulatory approval processes
Ethics	Patient data privacy, fairness, bias, and accountability for ML-driven management decisions
Technical	Generalization, noise in neuromuscular data, and multimodal data integration
Adoption	Clinical skepticism, resistance to change
Economic	High costs, infrastructure requirements, and global disparities

There are ways to make ML models more transparent and allow clinicians to understand how inputs influence predictions, such as by highlighting the region in a medical image the model uses to make predictions, so it is apparent what the predictions are based on. Such “interpretable AI” models can make predictions understandable and traceable and allow the clinician experts to manually modify mistaken concepts in the ML model’s decision-making pipeline. This can enhance clinical adoption, regulatory compliance, and informed decision-making.⁹⁷

With big data, there are risks of publishing false positive research findings,⁹⁸ particularly where studies conducted in a field are smaller - as might be the case in rare neuromuscular disorders, or when effect sizes are smaller – such as with heterogenous clinical populations in ALS, and when there is a greater number of tested relationships – for example using ML to test many parameters in a small number of patients. Clinicians need to be aware that while ML can accelerate the generation of hypotheses from multi-omics data, any single study merely provides a partial picture which can only be properly understood in the context of more testing outside this study and in the relevant field.
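The effect of testing many relationships can be made concrete with a short calculation. This is a sketch under a simplifying assumption that the tests are independent, which is rarely true of correlated clinical parameters. The Bonferroni correction shown is one common, conservative adjustment and is not a method named in this article.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests
    when no true effect exists: 1 - (1 - alpha) ** n_tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the overall rate near alpha."""
    return alpha / n_tests

alpha = 0.05
for n_tests in (1, 20, 100):
    rate = familywise_error_rate(alpha, n_tests)
    threshold = bonferroni_threshold(alpha, n_tests)
    print(f"{n_tests:>3} tests: chance of a false positive {rate:.2f}, "
          f"corrected threshold {threshold:.4f}")
```

With 20 parameters each tested at a threshold of 0.05, the chance of at least one spurious "significant" finding is roughly 64%, which is why findings from small cohorts with many tested parameters need replication.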

Overfitting (explained previously) and training on incomplete data may limit generalizability to the appropriate patient population. Data quality (explained previously) is critical for the performance, generalization, and trustworthiness of machine learning models, as poor-quality data can lead to inaccurate predictions and inefficiencies. Tools such as PROBAST can be used to assess the risk of bias and applicability of diagnostic or prognostic prediction model studies.⁹⁹

Risk mitigation

Clinicians need to know how to evaluate an ML model for clinical practice and research.

First, there should be independent comparison to an appropriate reference or gold standard. Gold standard is usually expert opinion on the diagnosis or prognosis of disease, or gold standard diagnostic tests.

Second, data should be from patients to whom the diagnostic test will be applied in clinical practice. If data is derived only from clinical trials, for example the PRO-ACT database which amalgamates clinical data from ALS clinical trials,^{62,77,100,134} then this dataset may be biased and therefore should undergo validation in the general clinic patient population.^{24,56,59,81,133}

Third, data from the training and validation sets should be independent from the testing set. As in all studies, the methods for obtaining the datasets and the procedure of analysis should be described in sufficient detail to be reproduced.

Fourth, results should be reported in the form of performance metrics as described in Table 1.

Fifth, the ML results should be repeatable and reproducible in the patient population of interest, and factors that may affect reproducibility in different institutions with different hardware and clinical practitioners should be considered.
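Two of the points above, keeping the testing set independent and reporting standard performance metrics, can be sketched in a few lines. This is a minimal illustration on synthetic data, not a prescribed method. The record layout, the `patient_id` field and both function names are assumptions made for the example. Splitting by patient rather than by recording is one common way to stop data from the same person appearing in both training and testing sets.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Assign whole patients (not individual recordings) to train or test."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Synthetic example: 10 hypothetical patients with 3 recordings each.
records = [
    {"patient_id": f"P{i:02d}", "recording": j}
    for i in range(10)
    for j in range(3)
]
train, test = split_by_patient(records)
shared = {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
print(f"train={len(train)} test={len(test)} shared patients={len(shared)}")

# Gold-standard labels versus model predictions on a small held-out set.
print(diagnostic_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

If recordings were split at random instead, several studies from one patient could land on both sides of the split and make the model look more accurate than it would be on new patients.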

Importantly, the clinician needs to be aware of institutional mechanisms and international guidelines on assessing validity of ML models and making direct comparisons of ML models.

Ethical use of ML models

ML models may amplify societal biases and exacerbate healthcare inequalities by underperforming in groups that are already disadvantaged by factors such as race, gender and socioeconomic background.¹⁰¹ Patient privacy is also a concern, as recent public–private partnerships that collaborate on AI may have poorly protected privacy, and highly sophisticated algorithmic systems may cause data breaches even in anonymized datasets.¹⁰²

International consensus on key principles for AI in healthcare currently includes control of bias, explainability, transparency, and systems of oversight and validation, and is reflected in European Union (EU)¹⁰³ and US regulatory frameworks.¹⁰⁴ Developers of AI technologies in healthcare should provide evidence of safety and validation that is approved by regulators.

Regulatory considerations

Regulators are rarely popular when a new technology does not quite fit into existing categories. Some authors advocate regulatory guidelines on how to safely implement and assess AI, and on how to understand the specific capabilities and limitations of its medical use.¹⁰⁵ However, it is more accurate to say that regulators such as the U.S. Food and Drug Administration (FDA) are responsible for ensuring high standards of therapeutics and medical devices in the interest of protecting public health and safety,¹⁰⁶ and while the FDA does produce guidelines, these typically explain its current thinking to guide those applying for marketing approval. Guidelines on how to implement approved products are not its responsibility.

The FDA classifies AI software as “Clinical Decision Support (CDS) software”. If CDS software is classified as a medical device, it requires FDA marketing approval. However, CDS software may be excluded from the FDA's definition of a medical device and classified as Non-Device CDS if it meets the following criteria:¹⁰⁴

The software function does NOT acquire, process, or analyze medical images, signals, or patterns.

The software function displays, analyzes, or prints medical information normally communicated between health care professionals (HCPs).

The software function provides recommendations (information/options) to a HCP rather than provide a specific output or directive.

The software function provides the basis of the recommendations so that the HCP does not rely primarily on any recommendations to make a decision.

Any software which provides a diagnosis or categorizes a patient's risk based on processing clinical information is likely to be classified as a medical device. Despite this stance, the FDA approved over 950 medical devices with artificial intelligence features between 1995 and the end of 2024.¹⁰⁷ This, however, has been inadequate for many industry developers, who feel it is inappropriate for AI software to be assessed against the same rigid parameters as a traditional medical device. Additionally, contention has arisen around requirements that software remain unchanged following approval, something which is reasonable for most hardware and software devices but would prevent an AI system from adjusting its underlying machine learning model weights to adapt to local conditions. Altering the ML model weights will result in inconsistent outputs, and it is not unreasonable to expect that a device produces the same outcome from the same inputs consistently.

In response, in June 2024 the FDA published “guiding principles” for ML-enabled medical devices to support the development of “safe, effective and high-quality artificial intelligence/machine learning technologies that can learn from real-world use and, in some cases, improve device performance”,¹⁰⁸ addressing concerns that AI-enabled software did not fit well into the existing paradigm of medical device approval. This was followed by guidance in December 2024 which indicated that, for software where modifications to the AI model are implemented automatically by software (also known as “continuous learning”), manufacturers can use a Predetermined Change Control Plan (PCCP) to prospectively specify and seek premarket authorization for modifications to an AI-enabled device software function (AI-DSF).¹⁰⁹ This appears to address prior criticisms that AI software that continues to learn or adjust its ML model weights would require repeated FDA submissions and approvals. The use of a PCCP appears to allow modifications to be implemented to an AI-DSF without triggering the need for a new marketing submission. Premarket authorization for an AI-DSF with a PCCP must be established through either the PMA pathway, 510(k) pathway, or De Novo pathway, as appropriate, because a PCCP must be reviewed and established as part of a marketing authorization for a device before a manufacturer implements any modifications under that PCCP.

A draft guidance issued the following month, on January 7, 2025, entitled “Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations”,¹¹⁰ requests that vendors provide detailed information about the technical characteristics of the underlying AI model(s) themselves and the algorithms and methods that were used in their development. This includes optimization methods; training paradigms (e.g., supervised, unsupervised or semi-supervised learning, federated learning, active learning); regularization techniques employed; training hyperparameters; and summary training performance.

The FDA has emphasized the importance of transparency in AI model training and operations, and the need to identify and reduce bias in AI models. Bias is defined as “a potential tendency to produce incorrect results in a systematic, but sometimes unforeseeable way, which can impact safety and effectiveness of the device within all or a subset of the intended use population (e.g., different healthcare settings, different input devices, sex, age, etc.).”¹¹⁰ To guard against bias, the FDA recommends “addressing representativeness in data collection for development, testing, and monitoring throughout the product lifecycle, as well as evaluating performance across subgroups of intended use.” This will entail collecting evidence to evaluate whether a device benefits all relevant demographic groups similarly.
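Evaluating performance across subgroups can be as simple as computing the same metric separately for each group and comparing the results. The sketch below is a hedged illustration on synthetic data. The subgroup labels, the function names and the 10-percentage-point gap used for flagging are all assumptions made for the example and are not thresholds taken from FDA guidance.

```python
from collections import defaultdict


def sensitivity_by_subgroup(cases):
    """Sensitivity per subgroup from (subgroup, true_label, predicted_label).

    Only cases with true_label == 1 (disease present) contribute.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for subgroup, truth, pred in cases:
        if truth == 1:
            if pred == 1:
                true_pos[subgroup] += 1
            else:
                false_neg[subgroup] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


def flag_underperforming(per_group, max_gap=0.10):
    """Subgroups whose sensitivity trails the best subgroup by more than max_gap.

    max_gap is an arbitrary illustrative value, not a regulatory threshold.
    """
    best = max(per_group.values())
    return sorted(g for g, s in per_group.items() if best - s > max_gap)


# Synthetic example: two hypothetical clinical sites, all cases disease-positive.
cases = (
    [("site_A", 1, 1)] * 4 + [("site_A", 1, 0)] * 1
    + [("site_B", 1, 1)] * 3 + [("site_B", 1, 0)] * 2
)
per_group = sensitivity_by_subgroup(cases)
print(per_group)
print("needs review:", flag_underperforming(per_group))
```

In practice the same comparison would be repeated for specificity and other metrics. Subgroups with few cases call for confidence intervals, not point estimates alone.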

These publications, issued in the final days of the FDA under the Biden Administration, and the discussions with industry that surround them, indicate a desire by the current FDA to develop guidance and resources for a total product life cycle approach to the oversight of AI-enabled devices. Stakeholders will naturally be observing carefully how this regulatory position develops under the new administration.

Future trends

In an ideal future, ML will have a positive impact on a neuromuscular clinician’s practice, enhancing diagnostics, improving treatment, and driving research, among other application areas summarized in Table 5. What is more likely, though, is that change will be led by the progression of AI in consumer technology, and neurology is at risk of lagging behind other specialties which have historically been faster to embrace technology and devices, such as radiology and surgery.

Table 5.

AI/ML application areas.

Domain	Examples
Administration	Documentation automation, speech transcription, consultation summaries, resource allocation
Diagnostics	Neuroimaging, EMG signal analysis, early disease detection
Drug Discovery	AI-assisted drug development, clinical trial optimization
Education	AI-based specialist training tools, ethical frameworks
Monitoring	Wearables, telemedicine, digital biomarkers
Patient Engagement	Chatbots, cognitive training tools, mental health apps
Rehabilitation	Robotic therapy, brain-computer interface therapy, gamified rehabilitation
Research	Genomics, proteomics, epidemiological insights
Risk Assessment	Disease risk prediction, prognostic modeling
Surgical Assistance	Robotic surgery, preoperative planning
Treatment Personalization	Precision medicine, predictive analytics

There are some obvious trends, such as the impact of generative AI. Large language models like ChatGPT and Foundation Models have already pervaded consumer technology. It is likely that agentic workflows, AI which operates mostly without supervision to execute a pre-defined role or function, will play a role in clinical workplaces of the future, for tasks which do not require FDA medical device approval such as document summarization, research topic monitoring and summarization, drug-drug interaction alerts based on literature mining, and patient education.

In the introduction, the known shortcomings of current generative AI, in particular its lack of true understanding of material and tendency to confabulate (“hallucinations”) were mentioned. AI-driven CDSS will benefit significantly from the introduction of logic and reasoning, and causality.¹¹¹ Discussing the research in automated reasoning^112,113 is out of scope of this primer but is essential for AI to move to the next stage of evolution. Today, cloud-computing vendors such as Amazon are offering automated reasoning checks as a service to prevent factual errors from LLM hallucinations.¹¹⁴ Error checking such as this may become mandatory for clinical software in the future.

With respect to early disease identification and tracking, perhaps ambient AI models,¹¹⁵ which use information from contactless sensors in physical spaces such as the clinic or at home, can be used to analyze voice recordings, physical movement, and gait, and become approved for use in tracking clinical progression of motor neuron diseases such as ALS and SMA, or chronic muscular dystrophies, at the clinic or at home. Similarly, ambient AI models may be used to monitor and prevent impending respiratory failure of inpatients with myasthenia gravis or CIDP in the general wards and high dependency unit.

CDSS based on multi-modal ML models with neuromuscular ultrasound or MRI imaging and NCS/EMG data may become commonplace and automate routine diagnoses, screen for urgent conditions, distinguish between neuromuscular diseases and mimickers, anticipate progression, and predict response to therapies like corticosteroids or biologics.

Collaboration between clinicians and data scientists will be key to making ML useful for healthcare and medical research. Standardization and open sourcing of ML models by college boards and tertiary referral centers, and sharing of patient databases, could give low-resource hospitals the means to improve the quality of their care.

Recent FDA guidance documents indicate an understanding of the significance of AI in clinical decision support, but also a desire for transparency in how AI is trained and an acute awareness of the risks of bias. It is likely there will be a strong preference in the clinic for ML models whose output predictions are explainable and trustworthy. This may necessitate modifications to the training process, or more detailed documentation of how training is conducted and how bias is detected and managed. Safety features to detect model performance drift and degradation are also likely to be necessary. A future where AI systems are also required to maintain continuing education certification is certainly plausible.
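One simple form such a safety feature could take is a rolling check of agreement between the model's predictions and clinician-confirmed diagnoses. The sketch below is an assumption-laden illustration, not a validated monitoring method. The class name, the window size and the tolerance are hypothetical, and real deployments would typically add statistical tests and checks on the input data itself.

```python
from collections import deque


class DriftMonitor:
    """Flags when rolling accuracy falls below a validated baseline.

    baseline_accuracy: accuracy measured during validation.
    window: number of most recent confirmed cases to consider.
    tolerance: allowed drop below baseline before flagging (illustrative).
    """

    def __init__(self, baseline_accuracy, window=50, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # True where prediction was correct

    def record(self, prediction, confirmed_label):
        """Store whether the prediction matched the confirmed diagnosis."""
        self.outcomes.append(prediction == confirmed_label)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_detected(self):
        # Wait for a full window so a few early errors do not raise an alarm.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


# Synthetic example: ten correct predictions followed by three errors.
monitor = DriftMonitor(baseline_accuracy=0.90, window=10, tolerance=0.05)
for _ in range(10):
    monitor.record(prediction=1, confirmed_label=1)
print(monitor.rolling_accuracy(), monitor.drift_detected())
for _ in range(3):
    monitor.record(prediction=1, confirmed_label=0)
print(monitor.rolling_accuracy(), monitor.drift_detected())
```

A flag from a monitor like this would prompt human review of the model and its recent inputs. It is not a diagnosis of what has changed.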

The use of cloud-based AI services is associated with data privacy and data transmission latency issues. The future will most likely see institutions use a mix of cloud-based AI, and AI developed to run using only local computing resources, so-called ‘inferencing at the edge’.^83,84 Inferencing at the edge will allow hospitals and clinics to incorporate ML/AI tools into their workflow while still maintaining compliance with best practices in patient data privacy and data security.

Conclusion

ML underlies applications and devices which are assisting clinicians in the diagnosis, monitoring, formulation of prognosis, and treatment of patients with a spectrum of neuromuscular diseases. The prevalence of these applications will only grow. However, how future practitioners of clinical medicine will look back upon this period of transition will be determined not by how much changed or by how fast clinicians embraced this change, but by how much patient outcomes were improved and how safeguards were applied to prevent harm to patients when using these new technologies. These are key metrics which differentiate clinical practice from other industries. For this to occur, the same degree of rigor must be applied to AI as is applied to any new clinical practice, and this must be done with an understanding not only of the impact but also of the underlying science. Clinicians should be able to assess the validity and impact of AI applications just as they do for other diagnostic or prognostic tools, and doing this requires some degree of understanding of machine learning processes. Understanding the relevance to clinical practice and research in neuromuscular and electrodiagnostic medicine alone may not be sufficient.

The history of medicine has a number of examples where new technologies, rules and tools have, in retrospect, not improved patient outcomes and have even caused harm. Concerns about AI in general which have already manifested in some form include AI-driven automation resulting in loss of jobs, the spread of fake news, invasion of privacy, and AI-powered weaponry. In healthcare, failures in medical AI (breaches in patient confidentiality, erroneous medical evaluations, and racial bias due to flawed training and the limitations of current machine learning) could erode public trust in healthcare. Both technical and humanistic challenges regarding the social and ethical implications of AI must be discussed at each stage of clinical trial and implementation.

The tech industry grapples with ethical issues only when forced to and is unlikely ever to accept responsibility for the negative impacts of AI or the casualties of improperly used AI. It therefore falls to clinicians to ensure that their understanding of machine learning is of sufficient breadth and depth to assess the specific challenges and ethics surrounding the medical use of these powerful technologies in a fair and balanced manner.

Supplemental Material

Supplemental material, sj-docx-1-jnd-10.1177_22143602251329240 for A neuromuscular clinician’s primer on machine learning by Crystal Jing Jing Yeo, Savitha Ramasamy, F Joel Leong, Sonakshi Nag and Zachary Simmons in Journal of Neuromuscular Diseases

Footnotes

Funding

The authors report no financial disclosures and conflicts of interest. We confirm that we have read the Journal’s position on issues involved in ethical publication and affirm that this report is consistent with those guidelines.

Supplemental material

Supplemental material for this article is available online.

References

1. Antoniadi, Galvin, Heverin, et al. A Clinical Decision Support System for the Prediction of Quality of Life in ALS. J Pers Med 2022; 12: 435.

2. Khaliq, Oberhauser, Wakhloo, et al. Decoding degeneration: the implementation of machine learning for clinical detection of neurodegenerative disorders. Neural Regen Res 2022; 18: 1235–1242.

3. Khosravi, Zare, Mojtabaeian, et al. Artificial Intelligence and Decision-Making in Healthcare: a Thematic Analysis of a Systematic Review of Reviews. Health Serv Res Manag Epidemiol 2024; 11: 23333928241234863.

4. Lysaght, Lim, Xafis, et al. AI-Assisted Decision-making in Healthcare: The Application of an Ethics Framework for Big Data in Health and Research. Asian Bioeth Rev 2019; 11: 299–314.

5. Weissman. FDA Regulation of Predictive Clinical Decision-Support Tools: What Does It Mean for Hospitals? J Hosp Med 2021; 16: 244–246.

6. Emanuel, Wachter. Artificial Intelligence in Health Care: Will the Value Match the Hype? JAMA 2019; 321: 2281–2282.

7. Nodera, Osaki, Yamazaki, et al. Deep learning for waveform identification of resting needle electromyography signals. Clin Neurophysiol Off J Int Fed Clin Neurophysiol 2019; 130: 617–623.

8. Lee, Bubeck, Petro. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. N Engl J Med 2023; 388: 1233–1239.

9. Rajpurkar, Lungren. The Current and Future State of AI Interpretation of Medical Images. N Engl J Med 2023; 388: 1981–1990.

10. Haug, Drazen. Artificial Intelligence and Machine Learning in Clinical Medicine, 2023. N Engl J Med 2023; 388: 1201–1208.

11. Beam, Drazen, Kohane, et al. Artificial Intelligence in Medicine. N Engl J Med 2023; 388: 1220–1221.

12. Carini, Seyhan. Tribulations and future opportunities for artificial intelligence in precision medicine. J Transl Med 2024; 22: 411.

13. LeCun, Bengio, Hinton. Deep learning. Nature 2015; 521: 436–444.

14. ChatGPT LLM – OpenAI. https://openai.com/index/chatgpt/ (accessed 8 January 2025).

15. Claude LLM – Anthropic. https://www.anthropic.com/claude (accessed 8 January 2025).

16. Llama LLM - Meta. Meta Llama, https://www.llama.com/ (accessed 8 January 2025).

17. Vaswani, Shazeer, Parmar, et al. Attention is All you Need. In: Advances in Neural Information Processing Systems. Curran Associates, Inc., https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (accessed 8 January 2025), 2017.

18. Bender, Gebru, McMillan-Major, et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, New York, NY, USA, pp.610–623: Association for Computing Machinery.

19. Yang, Zheng, Xie, et al. A deep learning model for diagnosing dystrophinopathies on thigh muscle MRI images. BMC Neurol 2021; 21: 13.

20. Shinohara, Inui, Mifune, et al. Using deep learning for ultrasound images to diagnose carpal tunnel syndrome with high accuracy. Ultrasound Med Biol 2022; 48: 2052–2059.

21. Di Cosmo, Fiorentino, Villani, et al. A deep learning approach to median nerve evaluation in ultrasound images of carpal tunnel inlet. Med Biol Eng Comput 2022; 60: 3255–3264.

22. Burlina, Billings, Joshi, et al. Automated diagnosis of myositis from muscle ultrasound: Exploring the use of machine learning and deep learning methods. PloS One 2017; 12: e0184059.

23. Nodera, Osaki, Yamazaki, et al. Classification of needle-EMG resting potentials by machine learning. Muscle Nerve 2019; 59: 224–228.

24. Faghri, Brunn, Dadu, et al. Identifying and predicting amyotrophic lateral sclerosis clinical subgroups: a population-based machine-learning study. Lancet Digit Health 2022; 4: e359–e369.

25. Lupi, Spolaor, Favero, et al. Muscle magnetic resonance characterization of STIM1 tubular aggregate myopathy using unsupervised learning. PloS One 2023; 18: e0285422.

26. Cong, Shintani, Imanari, et al. A New Approach to Drug Repurposing with Two-Stage Prediction, Machine Learning, and Unsupervised Clustering of Gene Expression. Omics J Integr Biol 2022; 26: 339–347.

27. Placek, Benatar, Wuu, et al. Machine learning suggests polygenic risk for cognitive dysfunction in amyotrophic lateral sclerosis. EMBO Mol Med 2021; 13: e12595.

28. Yann LeCun [@ylecun]. Sources of reliable data are getting exhausted. The cost of manual ‘post-training’ is growing quickly. Yet, the performances on bemchmarks are clearly saturating. So no, Auto-Regressive LLMs in their current form will not take us to human-level AI. That doesn’t mean they are not. Twitter, https://x.com/ylecun/status/1829972956518166881 (accessed 10 January 2025), 2024.

29. Wiggers. Elon Musk agrees that we’ve exhausted AI training data. TechCrunch, https://techcrunch.com/2025/01/08/elon-musk-agrees-that-weve-exhausted-ai-training-data/ (accessed 10 January 2025), 2025.

30. Gonzales, Guruswamy, Smith. Synthetic data in health care: A narrative review. PLOS Digit Health 2023; 2: e0000082.

31. Giuffrè, Shung. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. NPJ Digit Med 2023; 6: 186.

32. Jie, Zhiying. A meta-analysis of Watson for Oncology in clinical application. Sci Rep 2021; 11: 5792.

33. Cavallo. Confronting the Criticisms Facing Watson for Oncology, https://ascopost.com/issues/september-10-2019/confronting-the-criticisms-facing-watson-for-oncology/ (accessed 8 January 2025), 2019.

34. Haque, Reaz MBI, Chowdhury MEH, et al. Performance Analysis of Conventional Machine Learning Algorithms for Diabetic Sensorimotor Polyneuropathy Severity Classification Using Nerve Conduction Studies. Comput Intell Neurosci 2022; 2022: 9690940.

35. Hernandez-Torruco, Canul-Reich, Frausto-Solis, et al. Towards a predictive model for Guillain-Barré syndrome. Annu Int Conf IEEE Eng Med Biol Soc IEEE Eng Med Biol Soc Annu Int Conf 2015; 2015: 7234–7237.

36. Park, Kim, Lee S-E, et al. Machine learning-based approach for disease severity classification of carpal tunnel syndrome. Sci Rep 2021; 11: 17464.

37. Nodera, Sogawa, Takamatsu, et al. Texture analysis of sonographic muscle images can distinguish myopathic conditions. J Med Investig JMI 2019; 66: 237–247.

38. Yousefi, Hamilton-Wright. Characterizing EMG data using machine-learning tools. Comput Biol Med 2014; 51: 1–13.

39. Uncini, Aretusi, Manganelli, et al. Electrodiagnostic accuracy in polyneuropathies: supervised learning algorithms as a tool for practitioners. Neurol Sci Off J Ital Neurol Soc Ital Soc Clin Neurophysiol 2020; 41: 3719–3727.

40. Pandeya, Nagy, Riveros, et al. Using machine learning algorithms to enhance the diagnostic performance of electrical impedance myography. Muscle Nerve 2022; 66: 354–361.

41. Cheng K-S, Y-L, Kuo L-C, et al. Muscle Mass Measurement Using Machine Learning Algorithms with Electrical Impedance Myography. Sensors 2022; 22: 3087.

42. Fabry, Mamalet, Laforet, et al. A deep learning tool without muscle-by-muscle grading to differentiate myositis from facio-scapulo-humeral dystrophy using MRI. Diagn Interv Imaging 2022; 103: 353–359.

43. Morrow, Sormani. Machine learning outperforms human experts in MRI pattern analysis of muscular dystrophies. Neurology 2020; 94: 421–422.

44. Felisaz, Colelli, Ballante, et al. Texture analysis and machine learning to predict water T2 and fat fraction from non-quantitative MRI of thigh muscles in Facioscapulohumeral muscular dystrophy. Eur J Radiol 2021; 134: 109460.

45. Monforte, Bortolani, Torchia, et al. Diagnostic magnetic resonance imaging biomarkers for facioscapulohumeral muscular dystrophy identified by machine learning. J Neurol 2022; 269: 2055–2063.

46. Wang, Zhou, Hou, et al. Assessment of idiopathic inflammatory myopathy using a deep learning method for muscle T2 mapping segmentation. Eur Radiol 2023; 33: 2350–2357.

47. Gómez-Andrés, Díaz-Manera, Alejaldre, et al. Muscle imaging in laminopathies: Synthesis study identifies meaningful muscles for follow-up. Muscle Nerve 2018; 58: 812–817.

48. Srivastava, Darras, et al. Machine learning algorithms to classify spinal muscular atrophy subtypes. Neurology 2012; 79: 358–364.

49. Pinal-Fernandez, Casal-Dominguez, Derfoul, et al. Machine learning algorithms reveal unique gene expression profiles in muscle biopsies from patients with different types of myositis. Ann Rheum Dis 2020; 79: 1234–1242.

50. Cong, Shintani, Imanari, et al. A New Approach to Drug Repurposing with Two-Stage Prediction, Machine Learning, and Unsupervised Clustering of Gene Expression. Omics J Integr Biol 2022; 26: 339–347.

51. Amici, Pinal-Fernandez, Christopher-Stine, et al. A network of core and subtype-specific gene expression programs in myositis. Acta Neuropathol (Berl) 2021; 142: 887–898.

52. Bean, Al-Chalabi, Dobson RJB, et al. A Knowledge-Based Machine Learning Approach to Gene Prioritisation in Amyotrophic Lateral Sclerosis. Genes 2020; 11: 668.

53. Beaulieu, Berry, Paganoni, et al. Development and validation of a machine-learning ALS survival model lacking vital capacity (VC-Free) for use in clinical trials during the COVID-19 pandemic. Amyotroph Lateral Scler Front Degener 2021; 22: 22–32.

54. Das, Kaur, Gour, et al. Intersection of network medicine and machine learning towards investigating the key biomarkers and pathways underlying amyotrophic lateral sclerosis: a systematic review. Brief Bioinform 2022; 23: bbac442.

55. Founta, Dafou, Kanata, et al. Gene targeting in amyotrophic lateral sclerosis using causality-based feature selection and machine learning. Mol Med Camb Mass 2023; 29: 12.

56. Goutman, Boss, Guo, et al. Untargeted metabolomics yields insight into ALS disease mechanisms. J Neurol Neurosurg Psychiatry 2020; 91: 1329–1338.

57. Imamura, Yada, Izumi, et al. Prediction Model of Amyotrophic Lateral Sclerosis by Deep Learning with Patient Induced Pluripotent Stem Cells. Ann Neurol 2021; 89: 1226–1233.

58. Karim, West, et al. Molecular Classification and Interpretation of Amyotrophic Lateral Sclerosis Using Deep Convolution Neural Networks and Shapley Values. Genes 2021; 12: 1754.

59. Lee, Stingone, Chan, et al. Utilizing machine learning and lipidomics to distinguish primary lateral sclerosis from amyotrophic lateral sclerosis. Muscle Nerve 2023; 67: 306–310.

60. Pancotti, Birolo, Rollo, et al. Deep learning methods to predict amyotrophic lateral sclerosis disease progression. Sci Rep 2022; 12: 13738.

61. Zhang, et al. Prognostic models for amyotrophic lateral sclerosis: a systematic review. J Neurol 2021; 268: 3361–3370.

62. Zhou, Manser. Does including machine learning predictions in ALS clinical trial analysis improve statistical power? Ann Clin Transl Neurol 2020; 7: 1756–1765.

63. Chang C-C, Yeh J-H, Chen Y-M, et al. Clinical Predictors of Prolonged Hospital Stay in Patients with Myasthenia Gravis: A Study Using Machine Learning Algorithms. J Clin Med 2021; 10: 4393.

64. Lam, Arif, Song, et al. Machine Learning Analysis Reveals Biomarkers for the Detection of Neurological Diseases. Front Mol Neurosci 2022; 15: 889728.

65. Coratti, Lenkowicz, Patarnello, et al. Predictive models in SMA II natural history trajectories using machine learning: A proof of concept study. PLoS ONE 2022; 17: e0267930.

66. Wang J-C, Shu Y-C, Lin C-Y, et al. Application of deep learning algorithms in automatic sonographic localization and segmentation of the median nerve: A systematic review and meta-analysis. Artif Intell Med 2023; 137: 102496.

67. Kuroiwa, Jagtap, Starlinger, et al. Deep Learning Estimation of Median Nerve Volume Using Ultrasound Imaging in a Human Cadaver Model. Ultrasound Med Biol 2022; 48: 2237–2248.

68. Wijntjes, van Alfen. Muscle ultrasound: Present state and future opportunities. Muscle Nerve 2021; 63: 455–466.

69. Din Abdul Jabbar, Guo, Nag, et al. Predicting amyotrophic lateral sclerosis (ALS) progression with machine learning. Amyotroph Lateral Scler Front Degener 2024; 25: 242–255.

70. Zhang, et al. Prognostic models for amyotrophic lateral sclerosis: a systematic review. J Neurol 2021; 268: 3361–3370.

71. Srivastava, Darras. Machine learning algorithms to classify spinal muscular atrophy subtypes. Neurology 2012; 79: 358–364.

72. Din Abdul Jabbar, Guo, et al. Describing and characterising variability in ALS disease progression. Amyotroph Lateral Scler Front Degener 2024; 25: 34–45.

73. Ramamoorthy, Severson, Ghosh, et al. Identifying patterns in amyotrophic lateral sclerosis progression from sparse longitudinal data. Nat Comput Sci 2022; 2: 605–616.

74. Westeneng H-J, Debray TPA, Visser, et al. Prognosis for patients with amyotrophic lateral sclerosis: development and validation of a personalised prediction model. Lancet Neurol 2018; 17: 423–433.

75. Gordon, Lerner. Insights into Amyotrophic Lateral Sclerosis from a Machine Learning Perspective. J Clin Med 2019; 8: 1578.

76. Yeo CJJ, Simmons, De Vivo, et al. Ethical Perspectives on Treatment Options with Spinal Muscular Atrophy Patients. Ann Neurol 2022; 91: 305–316.

77. Brooks, Pioro, Beaulieu, et al. Evidence for generalizability of edaravone efficacy using a novel machine learning risk-based subgroup analysis tool. Amyotroph Lateral Scler Front Degener 2022; 23: 49–57.

78. Danieli, Tonacci, Paladini, et al. A machine learning analysis to predict the response to intravenous and subcutaneous immunoglobulin in inflammatory myopathies. A proposal for a future multi-omics approach in autoimmune diseases. Autoimmun Rev 2022; 21: 103105.

79. Yeo CJJ, Simmons. Discussing edaravone with the ALS patient: an ethical framework from a U.S. perspective. Amyotroph Lateral Scler Front Degener 2018; 19: 167–172.

80. Antoniadi, Galvin, Heverin, et al. Prediction of caregiver burden in amyotrophic lateral sclerosis: a machine learning approach using random forests applied to a cohort study. BMJ Open 2020; 10: e033109.

81. Antoniadi, Galvin, Heverin. A Clinical Decision Support System for the Prediction of Quality of Life in ALS. J Pers Med 2022; 12: 435.

82. Morimoto, Takahashi, Ito, et al. Phase 1/2a clinical trial in ALS with ropinirole, a drug candidate identified by iPSC drug discovery. Cell Stem Cell 2023; 30: 766–780.e9.

83. Tran, Moazami, Yang, et al. Suppression of MUTANT C9ORF72 Expression by a Potent Mixed Backbone Antisense Oligonucleotide. Nat Med 2022; 28: 117–124.

84. Miller, Cudkowicz, Shaw, et al. Phase 1-2 Trial of Antisense Oligonucleotide Tofersen for SOD1 ALS. N Engl J Med 2020; 383: 109–119.

85. Catanese, Rajkumar, Sommer, et al. Multiomics and machine-learning identify novel transcriptional and mutational signatures in amyotrophic lateral sclerosis. Brain J Neurol 2023; 146: 3770–3782.

86. Cavasotto, Di Filippo. Artificial intelligence in the early stages of drug discovery. Arch Biochem Biophys 2021; 698: 108730.

87. McGown, Stopford. High-throughput drug screens for amyotrophic lateral sclerosis drug discovery. Expert Opin Drug Discov 2018; 13: 1015–1025.

88. Ray, Nowak, Brown, et al. Small-Molecule-Mediated Stabilization of Familial Amyotrophic Lateral Sclerosis-Linked Superoxide Dismutase Mutants against Unfolding and Aggregation. Proc Natl Acad Sci U S A 2005; 102: 3639–3644.

89. Nowak, Cuny, Choi, et al. Improving binding specificity of pharmacological chaperones that target mutant superoxide dismutase-1 linked to familial amyotrophic lateral sclerosis using computational methods. J Med Chem 2010; 53: 2709–2718.

90. Mubeen, Masood, Zafar, et al. Insights into AlphaFold’s breakthrough in neurodegenerative diseases. Ir J Med Sci 2024; 193: 2577–2588.

91. Pun, Liu BHM, Long, et al. Identification of Therapeutic Targets for Amyotrophic Lateral Sclerosis Using PandaOmics - An AI-Enabled Biological Target Discovery Platform. Front Aging Neurosci 2022; 14: 914017.

92.

FB1006: AI-discovered drug advances to clinical trials for ALS treatment. News-Medical, https://www.news-medical.net/news/20240227/FB1006-AI-discovered-drug-advances-to-clinical-trials-for-ALS-treatment.aspx (accessed 12 January 2025), 2024.

93. Qiu, Cheng. Artificial intelligence for drug discovery and development in Alzheimer’s disease. Curr Opin Struct Biol 2024; 85: 102776.

94. Aal E Ali, Meng, Khan MEI, et al. Machine learning advancements in organic synthesis: A focused exploration of artificial intelligence applications in chemistry. Artif Intell Chem 2024; 2: 100049.

95. Streu. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access, https://ieeeaccess.ieee.org/featured-articles/survey_explainablexai/ (accessed 12 January 2025), 2023.

96. Big Data, Health Law, and Bioethics. https://www.cambridge.org/core/books/big-data-health-law-and-bioethics/CE1C789E3DCF36BEED020A31A76EB48D (accessed 12 January 2025).

97. Murdoch, Singh, Kumbier, et al. Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci U S A 2019; 116: 22071–22080.

98. Ioannidis JPA. Why most published research findings are false. PLoS Med 2005; 2: e124.

99. Wolff, Moons KGM, Riley, et al. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Ann Intern Med 2019; 170: 51–58.

100. Beaulieu, Berry, Paganoni

101. Chen, Szolovits, Ghassemi. Can AI Help Reduce Disparities in General Medical and Mental Health Care? AMA J Ethics 2019; 21: E167–179.

102. Murdoch. Privacy and artificial intelligence: challenges for protecting health information in a new era. BMC Med Ethics 2021; 22: 122.

103. Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, amending Directive 2001/83/EC, Regulation (EC) No 178/2002 and Regulation (EC) No 1223/2009 and repealing Council Directives 90/385/EEC and 93/42/EEC (Text with EEA relevance.), https://eur-lex.europa.eu/eli/reg/2017/745/oj/eng (accessed 8 January 2025).

104. FDA. Clinical Decision Support Software. Guidance for Industry and Food and Drug Administration Staff, https://www.fda.gov/regulatory-information/search-fda-guidance-documents/clinical-decision-support-software (accessed 8 January 2025), 2022.

105. Aung YYM, Wong DCS, Ting DSW. The promise of artificial intelligence: a review of the opportunities and challenges of artificial intelligence in healthcare. Br Med Bull 2021; 139: 4–15.

106. Fleming, Demets, McShane. Discussion: The role, position, and function of the FDA—The past, present, and future. Biostat Oxf Engl 2017; 18: 417–421.

107. FDA. Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. FDA, https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices (accessed 10 January 2025), 2024.

108. FDA. Transparency for Machine Learning-Enabled Medical Devices: Guiding Principles. FDA, https://www.fda.gov/medical-devices/software-medical-device-samd/transparency-machine-learning-enabled-medical-devices-guiding-principles (accessed 10 January 2025), 2024.

109. FDA. Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions, https://www.fda.gov/regulatory-information/search-fda-guidance-documents/marketing-submission-recommendations-predetermined-change-control-plan-artificial-intelligence (accessed 10 January 2025), 2024.

110. FDA. Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations, https://www.fda.gov/regulatory-information/search-fda-guidance-documents/artificial-intelligence-enabled-device-software-functions-lifecycle-management-and-marketing (accessed 10 January 2025), 2025.

111. Jiao, Wang, Liu, et al. Causal Inference Meets Deep Learning: A Comprehensive Survey. Research 7: 0467.

112. Blaauwbroek, Cerna, Gauthier, et al. Learning Guided Automated Reasoning: A Brief Survey. Log Type Syst Theory Pract. Epub ahead of print 2024. DOI: https://doi.org/10.48550/arxiv.2403.04017.

113. What is Automated Reasoning? - Automated Reasoning Explained - AWS. Amazon Web Services, Inc. https://aws.amazon.com/what-is/automated-reasoning/ (accessed 11 January 2025).

114. Barth. Prevent factual errors from LLM hallucinations with mathematically sound Automated Reasoning checks (preview) | AWS News Blog, https://aws.amazon.com/blogs/aws/prevent-factual-errors-from-llm-hallucinations-with-mathematically-sound-automated-reasoning-checks-preview/ (accessed 11 January 2025), 2024.

115. Illuminating the dark spaces of healthcare with ambient intelligence. Nature, https://www.nature.com/articles/s41586-020-2669-y (accessed 12 January 2025).

116. What Is Edge AI? | IBM. https://www.ibm.com/think/topics/edge-ai (accessed 8 January 2025), 2023.

117. Vayner. What is AI inference at the edge, and why is it important for businesses? TechRadar, https://www.techradar.com/pro/what-is-ai-inference-at-the-edge-and-why-is-it-important-for-businesses (accessed 8 January 2025), 2024.

118. Taneja. The Era of “Move Fast and Break Things” Is Over. Harvard Business Review, https://hbr.org/2019/01/the-era-of-move-fast-and-break-things-is-over (accessed 8 January 2025), 2019.

119. What Is Supervised Learning? | IBM. https://www.ibm.com/think/topics/supervised-learning (accessed 8 January 2025).

120. What Is Unsupervised Learning? | IBM. https://www.ibm.com/think/topics/unsupervised-learning (2021, accessed 8 January 2025).

121. Thiele, Windebank, Siddiqui. Motivation for using data-driven algorithms in research: A review of machine learning solutions for image analysis of micrographs in neuroscience. J Neuropathol Exp Neurol 2023; 82: 595–610.

122. Woodman, Mangoni. A comprehensive review of machine learning algorithms and their application in geriatric medicine: present and future. Aging Clin Exp Res 2023; 35: 2363–2397.

123. The Nobel Prize in Physics 2024. NobelPrize.org, https://www.nobelprize.org/prizes/physics/2024/press-release/ (accessed 10 January 2025).

124. Wang, Wyble. Hopfield and Hinton’s neural network revolution and the future of AI. Patterns 2024; 5: 101094.

125. Neural Networks Explained in 5 minutes – YouTube. https://www.youtube.com/watch?v=jmmW0F0biz0 (accessed 10 January 2025).

126. But what is a neural network? | Deep learning chapter 1. https://www.youtube.com/watch?v=aircAruvnKk (accessed 10 January 2025), 2017.

127. Gradient descent, how neural networks learn | DL2. https://www.youtube.com/watch?v=IHZwWFHWa-w (accessed 10 January 2025), 2017.

128. Backpropagation, step-by-step | DL3. https://www.youtube.com/watch?v=Ilg3gGewQ5U (accessed 10 January 2025), 2017.

129. Hinton. How Neural Networks Learn from Experience. Epub ahead of print 13 September 2002. DOI: https://doi.org/10.7551/mitpress/1888.003.0011.

130. Rumelhart, Hinton, Williams. Learning representations by back-propagating errors. Nature 1986; 323: 533–536.

131. LeCun, Bengio, Hinton. Deep learning. Nature 2015; 521: 436–444.

132. Greco, Chiesa, Da Prato, et al. Using blood data for the differential diagnosis and prognosis of motor neuron diseases: a new dataset for machine learning applications. Sci Rep 2021; 11: 3371.

133. Grollemund, Chat, Secchi-Buhour M-S, et al. Development and validation of a 1-year survival prognosis estimation model for Amyotrophic Lateral Sclerosis using manifold learning algorithm UMAP. Sci Rep 2020; 10: 13378.

134. Huber, Pandey, Chhangani, et al. Identification of potential pathways and biomarkers linked to progression in ALS. Ann Clin Transl Neurol 2023; 10: 150–165.

135. Zhong, Ruan, Yan, et al. Short-term outcome prediction for myasthenia gravis: an explainable machine learning model. Ther Adv Neurol Disord 2023; 16: 17562864231154976.

136. Chang C-W, L-S, Lyu R-K, et al. Establishment of a new classification system for chronic inflammatory demyelinating polyneuropathy based on unsupervised machine learning. Muscle Nerve 2022; 66: 603–611.

137. Chang C-C, Yeh J-H, Chiu H-C, et al. Utilization of Decision Tree Algorithms for Supporting the Prediction of Intensive Care Unit Admission of Myasthenia Gravis: A Machine Learning-Based Approach. J Pers Med 2022; 12: 32.

138. Brooks, Pioro, Beaulieu, et al. Evidence for generalizability of edaravone efficacy using a novel machine learning risk-based subgroup analysis tool. Amyotroph Lateral Scler Front Degener 2022; 23: 49–57.

139. Gómez-Andrés, Díaz, Munell, et al. Disease duration and disability in dysferlinopathy can be described by muscle imaging using heatmaps and random forests. Muscle Nerve 2019; 59: 436–444.

140. Gómez-Andrés, Oulhissane, Quijano-Roy. Two decades of advances in muscle imaging in children: from pattern recognition of muscle diseases to quantification and machine learning approaches. Neuromuscul Disord NMD 2021; 31: 1038–1050.

141. Nagawa, Suzuki, Yamamoto, et al. Texture analysis of muscle MRI: machine learning-based classifications in idiopathic inflammatory myopathies. Sci Rep 2021; 11: 9821.

142. Verdú-Díaz, Alonso-Pérez, Nuñez-Peralta, et al. Accuracy of a machine learning muscle MRI-based tool for the diagnosis of muscular dystrophies. Neurology 2020; 94: e1094–e1102.

Supplementary Material

Please find the following supplemental material available below.
