DP-MDLA Net: Detection of smooth pursuit abnormalities in Parkinson’s disease

Abstract

Objectives:

To develop and validate a smartphone video-based framework using deep learning for quantifying smooth-pursuit abnormalities in Parkinson’s disease.

Methods:

Smartphone videos (N = 54) from 18 patients with confirmed Parkinson’s disease were rigorously annotated to identify 1767 event-level samples (2-second windows), comprising 941 normal and 826 abnormal smooth-pursuit events. Ocular landmarks were extracted using MediaPipe FaceLandmarker. Preprocessing steps included canthus-referenced spatial normalization, Kalman smoothing, and blink filtering. Event samples were encoded as kinematic feature sequences and classified using DP-MDLA Net, a dual-path multi-scale dilated-LSTM attention architecture that fuses convolutional and recurrent representations.

Results:

Under a random split regimen for event samples, the framework achieved 96.59% accuracy, 97.50% precision, 95.12% recall, 96.03% F1-score, and an AUC of 0.9939 on the test set (n = 176). Five-fold cross-validation yielded a mean accuracy of 93.04% (SD 1.86%) and a mean AUC of 0.9735 (SD 0.0102). Subject-independent validation (disjoint split by patient) produced an accuracy of 93.57% and an AUC of 0.9693. Ablation without normalization decreased accuracy to 84.09% and AUC to 0.9323, indicating the critical role of landmark-based spatial alignment.

Conclusion:

The framework enables robust event-level quantification of smooth-pursuit abnormalities from smartphone video, supporting portable bedside assessment and standardized longitudinal monitoring of Parkinson’s disease without specialized equipment.

Keywords

Parkinson’s disease, smooth pursuit, smartphone, deep learning, eye movement analysis, computer vision

Introduction

Parkinson’s disease (PD) is the second most common neurodegenerative disorder after Alzheimer’s disease, affecting an estimated 10 million people worldwide.¹ In addition to the core motor symptoms—tremor,² rigidity,³ and bradykinesia⁴—visual impairments and oculomotor abnormalities are also prevalent,^5,6 yet often overlooked in clinical practice. Compared with other neurological assessment modalities, eye-movement examination is non-invasive, rapid, quantitative, and less subjective. With advances in eye-tracking technology, its auxiliary diagnostic value for PD and other neurological disorders has attracted increasing attention.^6–8

Smooth-pursuit eye movement (SPEM) refers to the ability of the eyes to continuously and uniformly follow a slowly moving target.⁹ Numerous studies have confirmed that this function is markedly impaired in PD patients.^6,10,11 A typical manifestation is saccadic pursuit,¹² in which the patient fails to generate sustained pursuit and instead compensates with a series of catch-up saccades.¹³ These compensatory saccades are frequently dysmetric,¹⁴ predominantly hypometric.¹⁵ Such abnormalities mainly reflect basal ganglia dysfunction¹⁰ and can indicate the severity of motor and cognitive deficits.¹⁶

Current quantitative detection of SPEM abnormalities relies primarily on high-precision eye trackers.¹⁷ Although these devices offer high angular resolution and sampling rates and can systematically evaluate saccades, pursuit, and vestibulo-ocular reflex,¹⁸ they are expensive, require complex operation and laboratory conditions, and demand skilled personnel for data annotation and interpretation. These factors limit their large-scale deployment in primary care and home settings.^14,19

Driven by advances in computer vision and deep learning, researchers have attempted to replace dedicated eye trackers with generic camera devices.^8,14,20 This approach substantially lowers hardware costs and improves portability; however, existing methods still face challenges in data stability, detection reliability, and clinical usability due to illumination variation, camera angle, head motion, tremor interference, and limited capacity for temporal feature mining.

To address these challenges, the present study focuses on the automatic detection of abnormal eye movements during the smooth-pursuit task in PD patients and proposes a solution based on smartphone video acquisition and deep learning analysis. Specifically, we employ a canthus-normalization algorithm to enhance the comparability and robustness of mobile data, design a multi-level feature engineering pipeline to enrich the quality and representativeness of the signals, and construct a deep model that integrates multi-scale dilated convolution with a dual-channel attention mechanism for accurate identification of complex abnormalities. Compared to traditional methods, the proposed approach lowers detection cost, improves portability and applicability, and better meets clinical requirements in terms of data processing and model performance. This work provides a new technological pathway for quantitative monitoring of smooth pursuit abnormalities and demonstrates promising potential for tracking disease progression and treatment response.

Materials and methods

Participants

We enrolled inpatients with an established clinical diagnosis of PD from the Department of Neurology, Affiliated Zhongshan Hospital of Dalian University. All cases had been diagnosed prior to this study in routine clinical care. At enrollment, a senior neurologist reviewed clinical history and examination findings to confirm eligibility for inclusion in the PD cohort. Inclusion criteria were age 40 years or older; prior PD diagnosis; hospitalization for at least 1 month at the time of enrollment; ability to understand and perform fixation and smooth pursuit tasks; unobstructed periocular region with key landmarks clearly visible in study videos; adequate visual acuity with correction; and written informed consent. Exclusion criteria were acute ocular conditions affecting oculomotor assessment or marked ptosis; severe cognitive impairment precluding cooperation; uncorrected visual impairment preventing clear fixation; coexisting major movement disorders; or refusal to participate.

A total of 18 patients were included (age 45–72 years; mean 60.1, standard deviation 7.8; 11 men and 7 women). Disease duration ranged from 2 to 15 years (mean 4.9, standard deviation 2.9). One participant wore a face mask during recording, with both eyes fully unobstructed. According to clinical workflow and participant tolerance, several patients underwent multiple recording sessions, yielding 54 raw video segments. All participants provided written informed consent, and the study was approved by the Ethics Committee of the Affiliated Zhongshan Hospital of Dalian University. Demographic, disease course, and functional information were obtained at the bedside through routine clinical interviews conducted by the attending neurologist, based on reports from patients or their family members.

Clinically, all participants exhibited bradykinesia to varying degrees; three showed severe rigidity; one reported persistent muscle pain²¹; and one had leg tremor that prevented ambulation. Three patients walked slowly with a cane, while the remainder could walk independently but with reduced gait speed and difficulties in sit-to-stand transitions and turning. Tremor was observed in six cases, including mild kinetic tremor of the hands²² and marked rest tremor²³ involving, respectively, the right hand, both legs, tongue, and chin (n = 4), which persisted during recording.

Video acquisition

Recordings were captured with a smartphone (iPhone 14 Pro Max). As illustrated in Figure 1, participants sat on the bedside $\sim$40 cm from a 16-inch laptop display (2560 $\times$ 1600, 240 Hz, 16:10). A custom program presented a white circular target (diameter $\sim 0.6^{\circ}$) for a horizontal smooth-pursuit task comprising 12 cycles (one cycle: left-to-right sweep and return). The target moved at constant velocity; each left-to-right sweep lasted 3.2 seconds, with a 1-second pause at each edge before reversal. The motion covered nearly the full screen width, corresponding to a horizontal excursion of $\sim 41^{\circ}$ peak-to-peak and a mean pursuit speed of $\sim 6.4^{\circ}$/s during motion segments. Each session lasted 100 seconds. During the task, the rear camera recorded the face at 1080p/60 fps. No chin rest was used; participants were instructed to minimize head motion.

Figure 1.

Workflow of video acquisition and eye movement coordinate extraction. The left panel shows the data acquisition setup during a smooth pursuit tracking task performed by the subject. The right panel illustrates the automatic extraction of facial landmarks using the MediaPipe FaceLandmarker model, followed by the selection of eye region landmarks to obtain eye movement coordinates.

Clip segmentation and annotation

Raw videos were annotated using 2-second temporal windows as the basic analysis unit. The first and last 5 seconds of each recording were excluded to avoid start-up and task-disengagement effects.

Each 2-second window was independently reviewed in slow motion (frame-by-frame) by two raters: an experienced neurologist specializing in movement disorders and a research assistant trained in oculomotor pattern recognition. Annotation was performed on continuous video streams using custom timestamping software. Windows were assigned to one of two categories: abnormal event or non-abnormal event. Discrepancies between raters were resolved through consensus discussion.

Abnormal events were defined as the presence of catch-up saccades or other saccadic intrusions interrupting smooth pursuit. Catch-up saccades were identified as rapid eye movements that either advanced the gaze position to “catch up” with the moving target (single-step saccade) or overshot the target followed by a corrective saccade in the opposite direction (two-step pattern). Non-abnormal events were characterized by sustained, uniform horizontal smooth pursuit matching the target velocity without saccadic interruptions or abrupt velocity changes.

Representative examples are illustrated in Figure 2. The non-abnormal event (upper panel) shows synchronized sinusoidal trajectories of both eyes maintaining smooth tracking over the 10-second recording period. The abnormal events (lower panels) demonstrate three characteristic patterns: Abnormal Sample 1 exhibits a single-step corrective saccade; Abnormal Sample 2 shows rapid gaze lag followed by catch-up; and Abnormal Sample 3 displays multiple corrective saccades with overshoot.

Figure 2.

Representative examples of annotated eye movement patterns. Upper panel: Non-abnormal event. Lower panels: Three abnormal events showing different abnormal patterns. Data shown after preprocessing.

Windows containing obvious noncompliance (e.g. participant looking away from the screen, excessive talking, or head movement exceeding the camera frame) were excluded prior to category assignment.

Following annotation and quality control procedures, the final dataset comprised 1767 labeled windows from 18 PD patients, of which 826 were classified as abnormal events and 941 as non-abnormal events.

Ocular landmark extraction

Accurate landmark localization is critical for quantitative oculomotor analysis. We therefore adopted Google’s MediaPipe FaceLandmarker,²⁴ which combines FaceDetector and FaceMesh-V2 to provide stable three-dimensional facial landmarks under varying illumination and viewpoints. The overall workflow for ocular landmark extraction is illustrated in Figure 1.

Raw videos were recorded at 1080p/60 fps. Landmark extraction was performed by configuring MediaPipe’s video mode to process at 30 fps, which balances temporal resolution with cross-device compatibility (most consumer smartphones default to 30 fps recording) and reduces inter-frame jitter from the landmark detector; consequently, each 2-second analysis window contains 60 annotated frames.

For each frame, the model outputs 478 three-dimensional facial landmarks. To focus on oculomotor analysis, we selected 42 key points for each frame, including 16 peri-ocular points, four iris points, and one pupil point for each eye.

Canthus-based spatial normalization

Static tremor, posture changes, head motion, and variable camera distance introduce substantial noise into PD eye-movement videos. To mitigate these factors we devised a canthus-based normalization scheme (Figure 3(a)). For the right eye, the outer canthus (landmark 33) and inner canthus (landmark 133) define the horizontal bounds, mapped to 0 and 1, respectively. Given the raw horizontal coordinate $x$ of any right-eye landmark, the normalized value $x^{'}$ is

x^{'} = \frac{x - x_{min}}{x_{max} - x_{min}}

(1)

where $x_{min}$ and $x_{max}$ are the horizontal coordinates of the outer and inner canthi in the same frame, respectively. The left eye is normalized independently using its own canthi. Because the reference is updated frame-wise, global disturbances such as head pose, facial expression, and tremor are effectively suppressed, yielding more comparable data across sessions and subjects.
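The mapping in Equation (1) can be illustrated with a minimal Python sketch. The function name `normalize_x` and the pixel values are hypothetical; the sketch only assumes that the outer canthus maps to 0 and the inner canthus to 1, as stated above, and uses the signed canthus distance so the same function applies to either eye.

```python
def normalize_x(x, x_outer, x_inner):
    """Map a raw horizontal coordinate to the canthus-referenced scale of Eq. (1).

    x_outer and x_inner are the outer- and inner-canthus x-coordinates taken
    from the same frame; they map to 0 and 1, respectively.
    """
    width = x_inner - x_outer
    if width == 0:
        raise ValueError("degenerate frame: canthi coincide")
    return (x - x_outer) / width


# Hypothetical pixel coordinates for one right-eye frame.
frame_a = {"outer": 400.0, "inner": 480.0, "pupil": 430.0}
# Same gaze, but the head has shifted 25 px and moved closer (eye 1.5x wider).
frame_b = {"outer": 425.0, "inner": 545.0, "pupil": 470.0}

xa = normalize_x(frame_a["pupil"], frame_a["outer"], frame_a["inner"])
xb = normalize_x(frame_b["pupil"], frame_b["outer"], frame_b["inner"])
print(xa, xb)  # both frames give the same normalized pupil position
```

Because both canthi are re-read in every frame, a translation or a change in apparent eye size leaves the normalized pupil position unchanged in this example.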

Figure 3.

Illustration of the data preprocessing workflow. (a) Normalization of eye movement coordinates based on canthus (eye corner) positions. (b) Blink detection and filtering to remove invalid data segments. (c) Extraction of eye movement kinematic parameters, including displacement, velocity, and acceleration, from the preprocessed data.

Trajectory smoothing

High-frequency spikes caused by camera shake and landmark jitter hinder reliable detection of abnormal movements. Each of the 42 landmark trajectories was therefore smoothed with a one-dimensional (1D) linear Kalman filter.²⁵ By tuning process and measurement noise parameters, the filter suppresses random noise while preserving genuine large-amplitude fluctuations such as saccades.
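As a rough illustration of this step, the sketch below implements a scalar Kalman filter with a random-walk state model. The state model, the initial variance, and the noise values `q` and `r` are assumptions chosen for the example; the process and measurement noise settings actually used in the study are not reported here.

```python
def kalman_smooth_1d(z, q=1e-4, r=1e-2):
    """Causal scalar Kalman filter with a random-walk state model.

    z : sequence of noisy measurements (one landmark coordinate over time)
    q : process-noise variance (larger -> follows fast changes more closely)
    r : measurement-noise variance (larger -> smoother output)
    """
    x_est = z[0]  # initial state estimate taken from the first measurement
    p = 1.0       # initial estimate variance (assumed)
    out = [x_est]
    for meas in z[1:]:
        # Predict: the state is assumed constant, so only uncertainty grows.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x_est = x_est + k * (meas - x_est)
        p = (1.0 - k) * p
        out.append(x_est)
    return out


# Synthetic check: a fixed position of 0.5 with alternating +/-0.05 jitter.
noisy = [0.5 + (0.05 if i % 2 else -0.05) for i in range(60)]
smooth = kalman_smooth_1d(noisy)
raw_spread = max(noisy[30:]) - min(noisy[30:])
smooth_spread = max(smooth[30:]) - min(smooth[30:])
print(raw_spread, smooth_spread)
```

Raising `q` relative to `r` makes the filter track abrupt changes such as saccades more closely at the cost of passing more jitter, which is the trade-off described above.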

Blink detection

Blinks introduce abrupt coordinate changes and may confound feature extraction; previous studies routinely exclude blink frames.^14,19 We propose a simple vertical-coordinate rule (Figure 3(b)): a blink is flagged when the $y$-coordinate of the upper-eyelid center approaches or surpasses that of the inner canthus. Both complete and incomplete blinks are detected, and the entire 2-second (60-frame) segment is discarded, thereby improving the signal-to-noise ratio of subsequent analysis.
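The rule can be sketched as follows. The function names and the `margin` tolerance are hypothetical, and the sketch assumes image coordinates in which $y$ increases downward, so a closing eyelid has a growing $y$ value.

```python
def is_blink_frame(y_upper_lid, y_inner_canthus, margin=0.0):
    """Flag a frame when the upper-eyelid centre reaches the inner-canthus level.

    `margin` is a hypothetical tolerance that expresses "approaches" numerically.
    """
    return y_upper_lid >= y_inner_canthus - margin


def window_has_blink(lid_ys, canthus_ys, margin=0.0):
    """Return True if any frame in the window is flagged as a blink."""
    return any(is_blink_frame(l, c, margin) for l, c in zip(lid_ys, canthus_ys))


# Hypothetical 60-frame windows in normalized image coordinates.
canthus = [0.50] * 60
open_lid = [0.40] * 60                      # lid stays above the canthus
blink_lid = [0.40] * 20 + [0.51] * 5 + [0.40] * 35  # lid drops for 5 frames

print(window_has_blink(open_lid, canthus))   # window kept
print(window_has_blink(blink_lid, canthus))  # whole window discarded
```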

Eye-movement feature extraction

For each 2-second sample (60 frames), we engineered a multi-dimensional kinematic feature set (Figure 3(c)). The horizontal pupil position forms the primary time series. From successive frames, we derived instantaneous displacement, velocity, and acceleration to characterize smoothness and transient fluctuations as follows:

d_{t} = | x_{t} - x_{t - 1} |

(2a)

v_{t} = \frac{| x_{t} - x_{t - 1} |}{Δ t}

(2b)

a_{t} = \frac{| v_{t} - v_{t - 1} |}{Δ t}

(2c)

where $x_{t}$ is the normalized horizontal coordinate in frame $t$ ($t = 2, \dots, N$), $N = 60$, and $\Delta t = 1/30$ seconds. To summarize overall behavior within the window, mean velocity, mean acceleration, and peak acceleration were further computed:

\bar{v} = \frac{1}{N} \sum_{t = 1}^{N} v_{t}

(3a)

\bar{a} = \frac{1}{N} \sum_{t = 1}^{N} a_{t}

(3b)

a_{max} = \max_{1 \leq t \leq N} a_{t}

(3c)

The resulting feature vector supplies rich, targeted information for subsequent abnormality-recognition model training.
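Equations (2a) to (3c) can be sketched in a few lines of Python. The function name is hypothetical, and one detail is an assumption: entries that have no predecessor (the first displacement and velocity, and the first two accelerations) are set to zero so that every series has length $N$ and the sums in Equations (3a) and (3b) run over the full window.

```python
def kinematic_features(x, dt=1.0 / 30.0):
    """Per-frame displacement, velocity, acceleration and window summaries."""
    n = len(x)
    d = [0.0] * n
    v = [0.0] * n
    a = [0.0] * n
    for t in range(1, n):
        d[t] = abs(x[t] - x[t - 1])       # Eq. (2a)
        v[t] = d[t] / dt                  # Eq. (2b)
    for t in range(2, n):
        a[t] = abs(v[t] - v[t - 1]) / dt  # Eq. (2c)
    summary = {
        "mean_v": sum(v) / n,             # Eq. (3a)
        "mean_a": sum(a) / n,             # Eq. (3b)
        "max_a": max(a),                  # Eq. (3c)
    }
    return d, v, a, summary


# Synthetic traces: a constant-velocity ramp, and the same ramp with a
# sudden 0.1 jump at frame 30 standing in for a catch-up saccade.
smooth_x = [0.2 + 0.01 * t for t in range(60)]
saccadic_x = [xv + (0.1 if t >= 30 else 0.0) for t, xv in enumerate(smooth_x)]

_, _, _, s_smooth = kinematic_features(smooth_x)
_, _, _, s_sacc = kinematic_features(saccadic_x)
print(s_smooth["max_a"], s_sacc["max_a"])
```

In this toy example the peak acceleration is essentially zero for the ramp and large for the trace with the jump, which is the contrast these features are meant to expose.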

DP-MDLA Net for horizontal pursuit-abnormality detection

We propose a Dual-Path Multi-scale Dilated-LSTM Attention Network (DP-MDLA Net) tailored to recognize abnormal horizontal pursuit in eye-movement recordings (Figure 4). The model accepts the pre-processed time series as input and is designed as a dual-path network,^26,27 in which two parallel branches extract complementary features. The first path combines multi-scale dilated convolutions (a dilated convolutional neural network, CNN)²⁸ with a bidirectional long short-term memory (BiLSTM) network²⁹ and local attention³⁰; the second path models the sequence directly with a BiLSTM plus local attention. The two path-wise representations are fused through a global attention module³⁰ and passed to a classifier for anomaly detection.

Figure 4.

The architecture of our proposed Dual-Path Multi-scale Dilated-LSTM Attention Network (DP-MDLA Net) model.

Design motivation

Horizontal SPEM possesses strong temporal dependence and complex dynamics. In normal tracking the eye moves at (quasi-)constant velocity, whereas abnormal behavior is characterized by abrupt velocity changes such as catch-up saccades. Dilated convolution can capture motion patterns at multiple temporal scales without a large computational overhead.³¹ By varying the dilation rate the network attends to both fine-grained variations and long-range trends,³² which is well suited for sequences containing sudden onsets of abnormality.

BiLSTM excels at modeling long-term dependencies in sequential data.³³ Applying BiLSTM on top of convolutional features reinforces temporal integration, and the bidirectional structure captures context from both past and future frames. The gated LSTM cell also alleviates the vanishing-gradient problem of vanilla recurrent neural networks (RNNs).³⁴

Since the raw coordinate series, instantaneous derivatives and aggregated statistics differ markedly in information density and difficulty, DP-MDLA Net adopts a dual-path design. Local attention allows each path to focus adaptively on the most discriminative timestamps or dimensions, whereas global attention merges the complementary information, thus balancing richness and tractability and enhancing the detection of complex anomalies.

Input definition

X \in \mathbb{R}^{T \times D}

(4)

where $T$ denotes the sequence length and $D$ denotes the feature dimension (raw coordinates, instantaneous, and aggregated parameters). The same $X$ is fed in parallel to the two extraction paths: one undergoes multi-scale dilated convolutions, the other is encoded directly by BiLSTM.

Multi-scale dilated-convolution module

Eye-movement recordings are highly sequential: smooth transitions and sudden jumps jointly determine whether the behavior is normal. Therefore anomaly detection must exploit both short-term variations and longer temporal trends. Path 1 of DP-MDLA Net adopts a multi-scale dilated CNN to capture features at different time scales in parallel.

Let $X \in \mathbb{R}^{T \times D}$ be the input sequence for a single sample. After a transpose operation, three 1D convolutions with dilation rates $d_{1} = 1$, $d_{2} = 2$, and $d_{3} = 3$ are applied:

F_{i} = \mathrm{ReLU}(\mathrm{Conv1D}_{d_{i}}(X^{\top})), \quad i \in \{1, 2, 3\}

(5)

The resulting feature maps are concatenated along the channel axis,

F_{CNN} = Concat (F_{1}, F_{2}, F_{3})

(6)

yielding a representation that simultaneously retains fine-grained details and broader contextual cues—essential for detecting abrupt but context-dependent pursuit failures.
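A NumPy sketch of Equations (5) and (6) is given below for illustration. The kernel size (3) and channel count (128) follow the configuration reported later for DP-MDLA Net; the "same" zero padding, the random weights, and the feature dimension `D = 6` are assumptions made only so the example runs.

```python
import numpy as np


def dilated_conv1d(x, w, b, dilation):
    """'Same'-padded 1D dilated convolution followed by ReLU.

    x : (D, T) input, channels first (the transpose X^T of Eq. (5))
    w : (C_out, D, K) kernel with odd K
    b : (C_out,) bias
    """
    c_out, _, k = w.shape
    t_len = x.shape[1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, t_len))
    for j in range(k):
        # Tap j reads the input shifted by j * dilation samples.
        seg = xp[:, j * dilation: j * dilation + t_len]  # (D, T)
        out += w[:, :, j] @ seg
    out += b[:, None]
    return np.maximum(out, 0.0)


rng = np.random.default_rng(0)
T, D, C, K = 60, 6, 128, 3  # D = 6 is a placeholder feature dimension
X = rng.normal(size=(T, D))

branches = []
for rate in (1, 2, 3):  # dilation rates d_1, d_2, d_3
    w = rng.normal(scale=0.1, size=(C, D, K))
    branches.append(dilated_conv1d(X.T, w, np.zeros(C), rate))

F_cnn = np.concatenate(branches, axis=0)  # Eq. (6): stack along channels
print(F_cnn.shape)  # (384, 60)
```

With kernel size 3, a dilation rate of $d$ lets each output frame see inputs at offsets $-d$, $0$, and $+d$, so the three branches cover 3, 5, and 7 frames respectively without adding parameters per branch.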

BiLSTM architecture

Eye-movement sequences exhibit strong temporal dependencies and non-stationary dynamics. To capture both short- and long-range patterns, DP-MDLA Net employs a long short-term memory (LSTM) network³⁵ and its bidirectional extension (BiLSTM). By means of a cell state and multiple gating mechanisms, LSTM can model temporal information more effectively than conventional RNNs.

As illustrated in Figure 5, an LSTM cell contains a forget gate $f_{t}$ , input gate $i_{t}$ , output gate $o_{t}$ , and a candidate memory ${\tilde{c}}_{t}$ . The gate computations are unified as follows:

z_{t} = \sigma(W_{z} x_{t} + U_{z} h_{t - 1} + b_{z}), \quad z \in \{f, i, o\}

(7)

where $\sigma(\cdot)$ denotes the sigmoid function, $x_{t} \in \mathbb{R}^{D}$ is the input vector at time $t$, $W_{z}$ and $U_{z}$ are trainable weight matrices, and $b_{z}$ is the bias.

Figure 5.

Schematic diagram of the long short-term memory (LSTM) cell structure.

The candidate memory is

{\tilde{c}}_{t} = \tanh (W_{c} x_{t} + U_{c} h_{t - 1} + b_{c})

(8)

and the states are updated by

c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ {\tilde{c}}_{t}

(9)

h_{t} = o_{t} ⊙ \tanh (c_{t})

(10)

where $\odot$ is the element-wise product and $\tanh(\cdot)$ is the hyperbolic tangent.

Because understanding pursuit failure requires context from both past and future frames, we deploy a bidirectional variant in which forward ( $\vec{\cdot}$ ) and backward ( $\overset{\leftarrow}{\cdot}$ ) hidden states are concatenated:

h_{t}^{(BiLSTM)} = [\vec{h_{t}}; \overset{\leftarrow}{h_{t}}]

(11)

Although BiLSTM already captures sequential dependencies, it may still overlook the few critical instants when the subject fails to follow the stimulus. Therefore, a subsequent attention mechanism³⁶ is employed to highlight the most discriminative time steps and improve anomaly detection accuracy.
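Equations (7) to (11) can be traced with a small NumPy sketch. The hidden size of 8, the input dimension of 6, and the random initialization are placeholders for illustration, and the helper names are hypothetical; this is not the training implementation.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM update following Eqs. (7)-(10).

    params maps each of "f", "i", "o", "c" to a tuple (W, U, b).
    """
    def affine(name):
        W, U, b = params[name]
        return W @ x_t + U @ h_prev + b

    f_t = sigmoid(affine("f"))          # forget gate, Eq. (7)
    i_t = sigmoid(affine("i"))          # input gate, Eq. (7)
    o_t = sigmoid(affine("o"))          # output gate, Eq. (7)
    c_tilde = np.tanh(affine("c"))      # candidate memory, Eq. (8)
    c_t = f_t * c_prev + i_t * c_tilde  # Eq. (9)
    h_t = o_t * np.tanh(c_t)            # Eq. (10)
    return h_t, c_t


def run_lstm(X, params, hidden):
    h, c = np.zeros(hidden), np.zeros(hidden)
    outs = []
    for x_t in X:
        h, c = lstm_step(x_t, h, c, params)
        outs.append(h)
    return np.stack(outs)


def bilstm(X, fwd, bwd, hidden):
    """Concatenate forward and time-reversed backward states, Eq. (11)."""
    h_f = run_lstm(X, fwd, hidden)
    h_b = run_lstm(X[::-1], bwd, hidden)[::-1]
    return np.concatenate([h_f, h_b], axis=1)


def init_params(rng, d_in, hidden):
    return {g: (rng.normal(scale=0.1, size=(hidden, d_in)),
                rng.normal(scale=0.1, size=(hidden, hidden)),
                np.zeros(hidden)) for g in "fioc"}


rng = np.random.default_rng(0)
T, D, hidden = 60, 6, 8
X = rng.normal(size=(T, D))
fwd, bwd = init_params(rng, D, hidden), init_params(rng, D, hidden)
H = bilstm(X, fwd, bwd, hidden)
print(H.shape)  # (60, 16)
```

Each row of `H` holds the forward state, which has seen frames up to $t$, next to the backward state, which has seen frames from $t$ to the end of the window.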

Attention mechanisms

Attention redistributes focus across a sequence by assigning weights to its elements.^37,38 In horizontal smooth-pursuit anomaly detection this allows the network to emphasize segments that contain catch-up saccades or velocity drops while down-weighting normal portions. DP-MDLA Net integrates a two-tier attention scheme: local attention is applied within each path, and global attention fuses the two paths.

Local attention

Given the BiLSTM outputs $\{h_{t}\}_{t = 1}^{T}$ of one path, the importance of frame $t$ is computed by

α_{t} = softmax (v^{⊤} \tanh (W_{attn} h_{t}))

(12)

where $W_{attn}$ and the context vector $v$ are learnable. The path-level context vector is

a = \sum_{t = 1}^{T} α_{t} h_{t}

(13)

which selectively aggregates the most informative hidden states.
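The local attention pooling of Equations (12) and (13) can be sketched in NumPy as below. The softmax is taken over the time axis, which is the reading assumed here; the projection size and the random inputs are placeholders.

```python
import numpy as np


def local_attention(H, W_attn, v):
    """Attention pooling over time.

    H      : (T, d) hidden states of one path
    W_attn : (k, d) projection matrix
    v      : (k,) context vector
    Returns the weights alpha (T,) and the pooled vector a (d,).
    """
    scores = np.tanh(H @ W_attn.T) @ v   # v^T tanh(W h_t) for every t
    scores = scores - scores.max()       # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # Eq. (12)
    a = alpha @ H                        # Eq. (13)
    return alpha, a


rng = np.random.default_rng(0)
T, d, k = 60, 16, 16
H = rng.normal(size=(T, d))
alpha, a = local_attention(H, rng.normal(size=(k, d)), rng.normal(size=k))
print(alpha.shape, a.shape, alpha.sum())
```

When all scores are equal, the weights become uniform and the pooled vector reduces to the plain average of the hidden states; learned scores shift that average toward the frames judged most informative.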

Global attention

Let $a_{CNN}$ and $a_{direct}$ denote the local contexts of the dilated-CNN path and the direct path, respectively. They are merged via

β = softmax (u^{⊤} \tanh (Q [a_{CNN}; a_{direct}]))

(14)

g = β ⊙ [a_{CNN}; a_{direct}]

(15)

where $Q$ and the global context vector $u$ are trainable. Through the attention mechanism, the resulting vector $g$ combines complementary fine- and coarse-scale information, which is then forwarded to the classifier for final predictions.

Classifier and training procedure

After obtaining the global attention vector $g$ , a fully connected layer followed by a softmax activation outputs the probability of each class as follows:

\hat{y} = softmax (W_{cls} g + b_{cls})

(16)

where $W_{cls}$ and $b_{cls}$ are trainable weights and bias, respectively, and $\hat{y} \in \mathbb{R}^{C}$ denotes the predicted class distribution.

To provide a systematic view of the end-to-end optimization, Algorithm 1 details the training loop. Step 1 initializes all parameters with Xavier initialization. Steps 2 and 3 iterate over the data for $N_{epoch}$ epochs. Each mini-batch undergoes canthus-based normalization, multi-scale dilated convolution, BiLSTM encoding, local attention, global attention, and, finally, classification. Cross-entropy loss is computed and the network is updated via back-propagation until convergence.

Experiments

To evaluate the effectiveness of DP-MDLA Net in detecting horizontal smooth-pursuit abnormalities, we conducted experiments on a dataset collected from patients with Parkinson’s disease. This section details the dataset, evaluation metrics, baseline methods, training protocol, and experimental results, along with interpretation and analysis.

Dataset

Through the preprocessing steps outlined earlier, we obtained a total of 1767 samples, each corresponding to a 2-second eye-movement sequence. Among these, 941 samples were labeled as normal (negative) and 826 as abnormal (positive). The dataset was randomly split into training, validation, and test sets in an 8:1:1 ratio, using a fixed random seed (42) to ensure reproducibility. Specifically, the training set contains 1415 samples (753 negative and 662 positive), while both the validation and test sets comprise 176 samples each (94 negative and 82 positive).
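The reported subset sizes are consistent with a per-class split in which 10% of each class (rounded down) goes to validation, 10% to testing, and the remainder to training. The sketch below illustrates that allocation rule; whether the split was actually stratified, and which library produced it, is not stated, so the function and its shuffling are assumptions.

```python
import random


def stratified_split(labels, seed=42):
    """Per-class 8:1:1 split; validation and test each get floor(n / 10)."""
    rng = random.Random(seed)
    split = {"train": [], "val": [], "test": []}
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_hold = len(idx) // 10
        split["val"] += idx[:n_hold]
        split["test"] += idx[n_hold:2 * n_hold]
        split["train"] += idx[2 * n_hold:]
    return split


labels = [0] * 941 + [1] * 826  # 0 = normal, 1 = abnormal
split = stratified_split(labels)
counts = {
    name: (sum(labels[i] == 0 for i in idx), sum(labels[i] == 1 for i in idx))
    for name, idx in split.items()
}
print(counts)  # (negative, positive) counts per subset
```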

Evaluation metrics

We adopted five standard classification metrics: accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). All metrics were computed using scikit-learn v1.6.0. Let $TP$, $TN$, $FP$, and $FN$ denote the counts of true positives, true negatives, false positives, and false negatives, respectively. The metrics are defined as follows:

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(17a)

\mathrm{Precision} = \frac{TP}{TP + FP}

(17b)

\mathrm{Recall} = \frac{TP}{TP + FN}

(17c)

\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(17d)

The AUC is computed as the integral under the ROC curve.
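As a worked example of Equations (17a) to (17d), the sketch below evaluates the four count-based metrics in plain Python. The counts are those listed for fold 1 in Table 3 (TP = 145, FP = 4, FN = 21, TN = 184); the function name is hypothetical and stands in for the scikit-learn calls used in the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


m = classification_metrics(tp=145, fp=4, fn=21, tn=184)
print({name: round(value, 4) for name, value in m.items()})
```

Rounded to four decimals, these values agree with the fold 1 row of Table 2.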

Baseline models and training configuration

To provide a comprehensive comparison, we implemented several representative deep learning baselines as follows:

1D-CNN. This model is a conventional 1D CNN with 64 channels and a kernel size of 5, serving as a purely convolutional reference. A dropout rate of 0.1 is used before the final fully connected layer.

BiLSTM. A BiLSTM network with hidden size 128 (dropout 0.1) is employed, enabling the model to capture temporal dependencies in both forward and backward directions.

Transformer. This baseline is a lightweight transformer encoder with four attention heads and a hidden dimension of 32, designed to leverage global self-attention for sequence modeling.

DCNN-BiLSTM-Attn. This hybrid model integrates three dilated convolutions (dilation rates 1, 2, and 3; 128 channels; and kernel size 3) for multi-scale feature extraction, followed by a BiLSTM layer (hidden size 256) and a local attention mechanism to enhance sequential modeling.

BiLSTM-Attn. In this variant, a BiLSTM (hidden size 256) is directly applied to the raw feature sequence, with a local attention layer, omitting any convolutional front-end.

DP-MDLA Net. Our proposed model adopts the same dilation rates and hidden sizes as previously described, including 128 CNN channels (kernel size 3), BiLSTM hidden size 256, two stacked layers, and a dropout rate of 0.1.

All models were trained under an identical protocol: batch size of 2, initial learning rate $1 \times 10^{- 5}$ , AdamW optimizer (weight decay $1 \times 10^{- 5}$ ),³⁹ cross-entropy loss, and a cosine-annealing learning rate scheduler with warm restarts every 20 epochs.⁴⁰ Training was conducted in multiple consecutive stages, each consisting of 200 epochs. At the end of each epoch, the model was evaluated on the validation set, and the checkpoint with the highest validation accuracy was saved. In total, each model was trained for $\sim$ 800 epochs, and the model with the highest validation accuracy was selected for final evaluation on the test set.
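The learning-rate schedule can be sketched with the standard cosine-annealing-with-warm-restarts formula. The restart period of 20 epochs and the initial rate of $1 \times 10^{-5}$ come from the protocol above; the floor `lr_min = 1e-6` is borrowed from the cross-validation configuration and is an assumption for this stage, as is the use of a fixed (non-growing) period.

```python
import math


def cosine_warm_restart_lr(epoch, lr_max=1e-5, lr_min=1e-6, period=20):
    """Learning rate at a given epoch for cosine annealing with restarts."""
    t_cur = epoch % period  # epochs elapsed since the last restart
    cos_term = 1.0 + math.cos(math.pi * t_cur / period)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term


for epoch in (0, 10, 19, 20):
    print(epoch, cosine_warm_restart_lr(epoch))
```

The rate starts at its maximum, decays along a half cosine to just above the floor by epoch 19, and jumps back to the maximum at epoch 20.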

Results and analysis

Table 1 summarizes the quantitative results, while Figure 6 provides a visual comparison of the test-set metrics across models. DP-MDLA Net achieves the highest accuracy, precision, F1-score, and AUC among all models and ranks third in recall.

Figure 6.

Performance comparison of baseline models and the proposed DP-MDLA Net on the test set across five evaluation metrics: accuracy, precision, recall, F1-score, and AUC. DP-MDLA Net achieves the highest accuracy, precision, F1-score, and AUC, and ranks third in recall. DP-MDLA Net: Dual-Path Multi-scale Dilated-LSTM Attention Network; AUC: area under the ROC curve.

Table 1.

Performance comparison of the proposed model and baselines on smooth-pursuit abnormality detection in Parkinson’s disease.

Model	Accuracy	Precision	Recall	F1	AUC
1D-CNN	0.8409	0.8375	0.8171	0.8272	0.9375
BiLSTM	0.8636	0.8452	0.8659	0.8554	0.9580
Transformer	0.9205	0.8696	0.9756	0.9195	0.9808
DCNN-BiLSTM-Attn	0.8920	0.8316	0.9634	0.8927	0.9729
BiLSTM-Attn	0.9205	0.8864	0.9512	0.9176	0.9789
DP-MDLA Net	0.9659	0.9750	0.9512	0.9603	0.9939

1D-CNN: one-dimensional convolutional neural network; BiLSTM: bidirectional long short-term memory; DCNN-BiLSTM-Attn: dilated convolutional neural network and bidirectional long short-term memory attention; DP-MDLA Net: Dual-Path Multi-scale Dilated-LSTM Attention Network.

Examining the accuracy results reveals a clear trend. The convolution-only 1D-CNN achieves 0.8409, reflecting its limited capacity for modeling long-range temporal dependencies. Replacing convolution with bidirectional recurrence raises accuracy to 0.8636 (BiLSTM). The transformer, benefiting from global self-attention, achieves 0.9205. Integrating attention into the dilated CNN and BiLSTM hybrid (DCNN-BiLSTM-Attn) yields an accuracy of 0.8920, while attaching attention directly to BiLSTM results in 0.9205. In comparison, DP-MDLA Net achieves an accuracy of 0.9659, an improvement of 4.54 percentage points over the best-performing baselines. Precision (0.9750 vs. 0.8864 for BiLSTM-Attn) and F1-score (0.9603 vs. 0.9195 for the transformer) follow the same pattern. Recall is the exception: at 0.9512, DP-MDLA Net matches BiLSTM-Attn but trails the transformer (0.9756) and DCNN-BiLSTM-Attn (0.9634). The gain therefore comes mainly from fewer false positives at a comparable true-positive rate.

Figure 7 shows the ROC curves and corresponding AUC values for all models. All models achieve an AUC >0.93, suggesting that the task is intrinsically separable. Notably, DP-MDLA Net obtains an AUC of 0.9939; its ROC curve is nearly indistinguishable from the ideal top-left boundary, demonstrating near-perfect discrimination between normal and abnormal pursuit patterns in PD. These results demonstrate that while attention mechanisms and global context modeling improve performance, the integration of multi-scale dilated convolutions with sequential and attention-based modules in DP-MDLA Net brings further gains, particularly in reducing false positives and improving robustness to subtle abnormal events.

Figure 7.

ROC curves of all methods on the test set. DP-MDLA Net almost coincides with the ideal top-left corner (AUC = 0.9939). ROC: receiver operating characteristic; DP-MDLA Net: Dual-Path Multi-scale Dilated-LSTM Attention Network; AUC: area under the ROC curve.

Five-fold cross-validation

To validate the robustness of DP-MDLA Net across different data partitions, we conducted five-fold stratified cross-validation on the entire dataset (1767 samples: 941 normal and 826 abnormal). Each fold was trained independently for 200 epochs using batch size 2, AdamW optimizer (learning rate $1 \times 10^{- 5}$ and weight decay $1 \times 10^{- 5}$ ), and cosine annealing scheduler ( $T_{max} = 20$ and $η_{min} = 1 \times 10^{- 6}$ ). For each fold, the checkpoint with the highest validation accuracy was retained for evaluation.

Table 2 summarizes the performance across all five folds. The model achieved a mean accuracy of 93.04% $\pm$ 1.86%, with precision, recall, F1-score, and AUC of 95.36% $\pm$ 2.00%, 89.47% $\pm$ 2.72%, 92.31% $\pm$ 2.07%, and 97.35% $\pm$ 1.02%, respectively. The low standard deviations across all metrics demonstrate stable performance regardless of data partitioning, confirming strong generalization capability.

Table 2.

Five-fold cross-validation results and summary for DP-MDLA Net.

Fold	Accuracy	Precision	Recall	F1-score	AUC
1	0.9294	0.9732	0.8735	0.9206	0.9702
2	0.9492	0.9623	0.9273	0.9444	0.9862
3	0.9093	0.9290	0.8727	0.9000	0.9622
4	0.9490	0.9682	0.9212	0.9441	0.9822
5	0.9150	0.9355	0.8788	0.9062	0.9670
Mean	0.9304	0.9536	0.8947	0.9231	0.9735
SD	0.0186	0.0200	0.0272	0.0207	0.0102

DP-MDLA Net: Dual-Path Multi-scale Dilated-LSTM Attention Network; AUC: area under the ROC curve.
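The summary rows of Table 2 can be checked from the per-fold values. The short sketch below does this for the accuracy column; it uses the sample standard deviation (denominator $n - 1$), which is the convention that appears to reproduce the reported figure, although the convention is not stated explicitly.

```python
import statistics

# Per-fold accuracy values copied from Table 2.
fold_accuracy = [0.9294, 0.9492, 0.9093, 0.9490, 0.9150]

mean_acc = statistics.mean(fold_accuracy)
sd_acc = statistics.stdev(fold_accuracy)  # sample SD, n - 1 denominator
print(round(mean_acc, 4), round(sd_acc, 4))
```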

To gain insights into the model’s failure modes, we analyzed the distribution of prediction errors across the five-fold cross-validation. Table 3 summarizes the confusion matrix statistics for each fold. On average, the model produced 7.2 false positives (FP) and 17.4 false negatives (FN) per fold, corresponding to a false-positive rate (FPR) of 3.83% and false-negative rate (FNR) of 10.53%. The higher FNR indicates that the model occasionally fails to detect subtle abnormal pursuit patterns, while the low FPR demonstrates reliable recognition of normal smooth pursuit.

Table 3.

Error distribution across five-fold cross-validation.

Fold	TP	FP	FN	TN	FPR/FNR
1	145	4	21	184	2.13%/12.65%
2	153	6	12	183	3.17%/7.27%
3	144	11	21	177	5.85%/12.73%
4	152	5	13	183	2.66%/7.88%
5	145	10	20	178	5.32%/12.12%
Mean	147.8	7.2	17.4	181.0	3.83%/10.53%
SD	4.15	3.03	4.16	3.08	1.50%/2.65%

TP: true positives; FP: false positives; FN: false negatives; TN: true negatives; FPR: false-positive rate; FNR: false-negative rate.
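The rates in Table 3 and the per-fold metrics in Table 2 follow from the same confusion counts. As a worked example, the sketch below derives them for fold 1 using the standard definitions; the function name `confusion_metrics` is ours.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # share of normal windows flagged as abnormal
        "fnr": fn / (fn + tp),  # share of abnormal windows that were missed
    }

# Counts for fold 1, copied from Table 3.
fold1 = confusion_metrics(tp=145, fp=4, fn=21, tn=184)
print({name: round(value, 4) for name, value in fold1.items()})
```

Because FNR equals one minus recall, the FNR column of Table 3 carries the same information as the recall column of Table 2.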

Figure 8 visualizes the average confusion matrix. The high values along the main diagonal (TN $=$ 181.0 and TP $=$ 147.8) and low off-diagonal errors (FP $=$ 7.2 and FN $=$ 17.4) confirm the model’s robust discriminative ability. The higher FNR (FN>FP) reflects a classification bias that prioritizes specificity over sensitivity.

Figure 8.

Average confusion matrix across five-fold cross-validation. The higher FNR (10.53%) relative to FPR (3.83%) reflects a conservative classification tendency. FPR: false-positive rate; FNR: false-negative rate.

The fold-to-fold variability in FNR (7.27%–12.73%, SD $=$ 2.65%) is higher than in FPR (2.13%–5.85%, SD $=$ 1.50%), likely reflecting heterogeneity in abnormal pursuit presentations across patient subsets. Overall, the error analysis shows that DP-MDLA Net maintains a conservative classification tendency with consistently low FPRs across all folds.

Subject-independent evaluation

To further assess the model’s generalization capability on unseen patient data, we reorganized the dataset using a strict subject-independent split strategy: the training set comprised 12 patients (1178 samples: 590 normal and 588 abnormal), the validation set included three patients (247 samples: 144 normal and 103 abnormal), and the test set contained three patients (342 samples: 207 normal and 135 abnormal). This ensured that no samples from the same patient appeared across the training, validation, and test sets, thereby eliminating the influence of patient-specific cues on model performance.
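A patient-level partition of this kind can be sketched as follows. The code assigns whole patients, rather than individual windows, to each subset. It is an illustration on toy data: the function name `split_by_patient`, the patient identifiers, and the random assignment are our assumptions, since the text does not state how patients were allocated.

```python
import random

def split_by_patient(samples, n_val, n_test, seed=0):
    """Partition (patient_id, sample) pairs so each patient is in one subset."""
    patients = sorted({pid for pid, _ in samples})
    random.Random(seed).shuffle(patients)
    test_ids = set(patients[:n_test])
    val_ids = set(patients[n_test:n_test + n_val])
    split = {"train": [], "val": [], "test": []}
    for pid, sample in samples:
        if pid in test_ids:
            split["test"].append((pid, sample))
        elif pid in val_ids:
            split["val"].append((pid, sample))
        else:
            split["train"].append((pid, sample))
    return split

# Toy data: 18 hypothetical patients with 5 windows each (not the study data).
toy = [(f"P{p:02d}", w) for p in range(18) for w in range(5)]
split = split_by_patient(toy, n_val=3, n_test=3)
ids = {name: {pid for pid, _ in part} for name, part in split.items()}
print({name: len(group) for name, group in ids.items()})
```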

The training configuration consisted of a batch size of 2, the AdamW optimizer (learning rate $1 \times 10^{- 5}$ and weight decay $1 \times 10^{- 5}$ ), and a cosine annealing learning rate scheduler ( $T_{max} = 20$ and $η_{min} = 1 \times 10^{- 6}$ ). After training for 200 epochs, the checkpoint with the best validation performance was selected for test set evaluation.

On the test set, the model achieved an accuracy of 93.57%, a precision of 96.69%, a recall of 86.67%, and an F1-score of 91.38%. Figure 9 presents the model’s ROC curve, with an AUC of 0.9693. Accuracy and AUC remain close to the cross-validation means (93.04% and 0.9735), indicating that classification performance largely carries over to unseen patients, although recall is lower than the cross-validation mean (86.67% versus 89.47%).

Figure 9.

ROC curve of the model on the subject-independent test set. The model achieves an AUC of 0.9693. ROC: receiver operating characteristic; AUC: area under the ROC curve.
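For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen abnormal window receives a higher score than a randomly chosen normal one. The sketch below computes it in this pairwise (Mann-Whitney) form on toy scores; it is not the evaluation code used in the study, and the function name `roc_auc` is ours.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and scores, not study data: 3 of 4 pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```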

Ablation study: Impact of canthus-based normalization

To validate the effectiveness of the proposed canthus-based spatial normalization method, we conducted an ablation study by training and evaluating the DP-MDLA Net without this preprocessing step. The experimental setup was identical to the baseline comparison: an 8:1:1 random split (1415 training, 176 validation, and 176 test samples), 800-epoch training with batch size 2, AdamW optimizer (learning rate $1 \times 10^{- 5}$ and weight decay $1 \times 10^{- 5}$ ), and cosine annealing scheduler ( $T_{max} = 20$ and $η_{min} = 1 \times 10^{- 6}$ ).

In the ablation variant, raw eye-movement coordinates extracted from MediaPipe were directly used without canthus-based normalization (equation (1)), meaning the model received absolute pixel coordinates susceptible to camera distance, head pose, facial expression, and tremor variations.
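To illustrate why absolute pixel coordinates are fragile, the sketch below normalizes an iris position against the two canthi of the same eye and shows that the result is unchanged when the whole frame is rescaled and shifted, as happens when camera distance or framing changes. This is one plausible form of canthus referencing chosen for illustration, not a reproduction of equation (1); the function name and all coordinates are hypothetical, and the sketch does not address head rotation.

```python
import math

def canthus_normalize(iris, inner, outer):
    """Express the iris centre relative to the canthi of the same eye.

    The offset from the canthus midpoint is divided by the canthus-to-canthus
    distance, giving a unitless position. Assumed form, not equation (1).
    """
    mid_x = (inner[0] + outer[0]) / 2
    mid_y = (inner[1] + outer[1]) / 2
    width = math.dist(inner, outer)
    return ((iris[0] - mid_x) / width, (iris[1] - mid_y) / width)

def rescale(point, factor=2.0, shift=(40.0, -15.0)):
    """Simulate a closer camera (uniform scaling) plus a framing shift."""
    return (factor * point[0] + shift[0], factor * point[1] + shift[1])

# Hypothetical pixel coordinates for one frame.
iris, inner, outer = (110.0, 52.0), (130.0, 50.0), (90.0, 50.0)
original = canthus_normalize(iris, inner, outer)
rescaled = canthus_normalize(rescale(iris), rescale(inner), rescale(outer))
print(original, rescaled)
```

The raw iris coordinate changes from (110, 52) to (260, 89) under this transformation, while the normalized position is the same in both cases.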

Table 4 presents the quantitative comparison between the full model (with normalization) and the ablation variant (without normalization). Removing the canthus-based normalization resulted in substantial performance degradation: accuracy dropped from 96.59% to 84.09% (a decrease of 12.50 percentage points), precision decreased from 97.50% to 88.57%, recall fell from 95.12% to 75.61%, F1-score declined from 96.03% to 81.58%, and AUC reduced from 0.9939 to 0.9323. These results clearly demonstrate that the canthus-based normalization is crucial for suppressing noise introduced by global disturbances and enhancing the model’s ability to detect subtle abnormal eye movement patterns in PD patients.

Table 4.

Ablation study on canthus-based normalization.

Variant	Acc.	Prec.	Rec.	F1	AUC
w/o Norm.	0.8409	0.8857	0.7561	0.8158	0.9323
w/ Norm.	0.9659	0.9750	0.9512	0.9603	0.9939
Gain (pp)	+12.50	+8.93	+19.51	+14.45	+6.16

Acc.: accuracy; Prec.: precision; Rec.: recall; F1: F1-score; AUC: area under the ROC curve; pp: percentage points (metric difference multiplied by 100).

Discussion

Factors contributing to high precision

The high precision (97.50%) achieved by DP-MDLA Net reflects its ability to capture the brief pursuit interruptions that characterize parkinsonian oculomotor control. Wu et al.⁴¹ documented frequent “non-compensatory” saccades during pursuit, that is, intrusions that arise even when gaze is already aligned with, or ahead of, the target, together with progressive binocular misalignment. These transient intrusions are the type of events our model targets and can be overlooked by metrics that average pursuit gain or spectral content over longer windows. The dual-path architecture represents both the slow-phase envelope and the sharp kinematics of catch-up saccades, and hierarchical attention aligns the two streams to improve discrimination in borderline windows while limiting false positives.

AUC performance characteristics

The AUCs are uniformly high across evaluated time windows, suggesting that discriminative information is present throughout the tracking epoch rather than being confined to initiation or a particular phase. Early instrumented studies⁴² noted that saccadic peak velocity is largely preserved in PD, whereas amplitudes are reduced (hypometria) and pursuit gain varies widely across individuals. Subsequent reviews⁴³ cautioned that medication state and task design compound this variability, limiting the diagnostic value of aggregate pursuit metrics. In contrast, event-level features that capture the microstructure of pursuit interruptions—beyond gain alone—appear more stable. Multi-scale integration helps accommodate inter-individual differences by pooling evidence across temporal resolutions, while canthus-referenced normalization reduces acquisition-specific drift. Preprocessing steps such as Kalman filtering and blink removal further enhance separability, and the dual-path design contributes modest, consistent gains in the most difficult windows.

Clinical utility and implications

Beyond a binary label, DP-MDLA Net enables continuous, event-level phenotyping of pursuit control. Recent work by Li et al.⁴⁴ showed that automated analysis of pursuit velocity stability and catch-up saccade frequency can separate patients with PD from healthy controls using engineered features. We apply the same idea to smartphone video, avoiding specialized eye-tracking hardware while maintaining strong discriminative performance under ward-side conditions. The model yields window-level scores that can be aggregated into session-level indices to track change over time. Because recordings can be repeated with minimal burden, clinicians can monitor within-patient trajectories and medication-state fluctuations, complementing examiner-rated scales with standardized, reproducible measurements. In outpatient and home settings, a rising rate or amplitude of catch-up saccades across sessions could flag deteriorating pursuit control and prompt timely review.
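As an illustration of how window-level scores might be rolled up into a session-level index, the sketch below reports the fraction of windows flagged as abnormal and the mean score for each session. The paper does not define a specific index, so the function name `session_index`, the 0.5 threshold, and the example scores are all hypothetical.

```python
from statistics import mean

def session_index(window_scores, threshold=0.5):
    """Summarize one recording session from per-window abnormality scores."""
    if not window_scores:
        raise ValueError("no windows in session")
    flagged = [score >= threshold for score in window_scores]
    return {
        "abnormal_fraction": sum(flagged) / len(flagged),
        "mean_score": mean(window_scores),
    }

# Hypothetical scores for two sessions of the same patient.
baseline = session_index([0.10, 0.20, 0.70, 0.15])
follow_up = session_index([0.60, 0.80, 0.30, 0.90])
print(baseline["abnormal_fraction"], follow_up["abnormal_fraction"])
```

Comparing such indices across sessions is one simple way to express within-patient change; any clinically meaningful threshold would need to be established separately.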

Distinct neurodegenerative disorders show different patterns of impairment across pursuit and saccadic control, and task- and feature-level choices can help differentiate these profiles. In this setting, a low-cost, event-level digital marker is well suited for research stratification and trial endpoints while remaining usable at the bedside. Digital oculomotor markers also complement fluid biomarkers in PD monitoring: blood-based measures such as neurofilament light chain⁴⁵ reflect axonal injury but require sampling and laboratory infrastructure, whereas smartphone-based pursuit assessment provides non-invasive functional readouts that can be obtained frequently at low cost. Combining both modalities may improve longitudinal monitoring by linking structural disease burden with functional control.

Feasibility and innovation

Several design choices distinguish DP-MDLA Net and support practical use. First, reliance on smartphone video removes the need for infrared eye trackers, chin rests, or controlled lighting. Whereas classical studies used magnetic search coils⁴² or clinical-grade oculography, our approach trades absolute positional precision for accessibility and shows that velocity-derived features carry sufficient signal for robust event detection. Second, canthus-referenced normalization compensates for camera distance, view angle, and involuntary head motion without prior calibration, lowering barriers in resource-constrained clinics and for patients with motor disability. Finally, the dual-path, multi-scale architecture with hierarchical attention targets nuisance variation at its source and supports event-level detection from commodity video. The resulting accuracy (96.59%) and AUC (0.9939) are within the range reported by instrumented studies for related tasks, motivating prospective validation and pilot integration into clinical workflows.

Limitations and future directions

This work should be interpreted in light of its scope and design. Our focus was event-level detection of pursuit abnormalities in PD, analogous to quantifying the oculomotor items in rating scales such as the UPDRS, rather than differential diagnosis. Trained raters labeled 2-second windows as abnormal or normal based on dysmetric catch-up saccades, gaze jumps, and tracking instability. We did not collect concurrent infrared eye-tracker recordings as a reference standard. Because all recordings were obtained from confirmed PD patients, the training data do not include healthy controls or disease controls such as essential tremor, progressive supranuclear palsy, or cerebellar ataxia.

This design fits the primary objective of building a robust detector to support longitudinal monitoring and complement examiner ratings, but it limits our ability to assess diagnostic specificity or to distinguish PD from other movement disorders. Future work will include same-session validation against clinical-grade eye tracking to establish quantitative correspondence between smartphone-derived and reference measurements, and the recruitment of healthy and disease-control cohorts to test generalization to differential diagnosis.

Although canthus-based normalization mitigates many sources of variation, video quality still depends on illumination, hand-held stability, and device type, which in turn affects landmark detection by MediaPipe or similar frameworks. The present study focuses on horizontal pursuit; vertical tracking and other oculomotor abnormalities remain to be explored. Moreover, the network outputs a binary label and does not grade severity. A clinically useful severity scale will need to be defined with movement-disorder specialists and incorporated into the learning objective. The dataset contains 1767 samples from 18 individuals, limiting demographic diversity. Finally, two mobile devices are currently required—one for stimulus presentation and one for video capture. As mobile hardware evolves, merging these functions into a single unit would further simplify the setup and enhance portability.

Despite these constraints, our results show that smartphone-based video analysis can provide a reliable and scalable alternative to equipment-intensive eye tracking for assessing parkinsonian smooth-pursuit deficits.

Conclusion

We present a smartphone-based framework for detecting smooth-pursuit abnormalities in PD that combines canthus-referenced normalization, targeted preprocessing, and DP-MDLA Net, a dual-path, multi-scale dilated-LSTM attention model. Evaluated on 1767 two-second eye-movement sequences from 18 patients, the method achieved 96.59% accuracy, 97.50% precision, 95.12% recall, 96.03% F1-score, and an AUC of 0.9939 on held-out test data, outperforming CNN, LSTM, and transformer baselines.

Generalization was supported by five-fold cross-validation (mean accuracy 93.04% and mean AUC 0.9735) and a subject-independent split (accuracy 93.57% and AUC 0.9693). An ablation study showed that canthus-based normalization is critical, increasing accuracy by 12.50 percentage points and improving all other metrics.

By removing the need for specialized eye trackers, the framework supports low-cost, portable bedside assessment and event-level phenotyping from commodity video. Window-level outputs can be aggregated into session indices to track within-patient change and medication-state fluctuations, providing a standardized, reproducible complement to examiner-rated scales for longitudinal monitoring of smooth-pursuit impairment in PD.

Footnotes

Acknowledgements

We are deeply grateful to all participants who generously took part in this study.

ORCID iD

Zhiyuan Tan

Ethical approval

This study was approved by the Ethics Review Committee for Scientific Research Projects of Affiliated Zhongshan Hospital of Dalian University (REC number: KY2023-103-1). The study complied with the principles outlined in the Declaration of Helsinki, ensuring that all participants were treated ethically, their rights were respected, and their privacy was protected throughout the research process. Written informed consent was obtained from all participants prior to enrollment.

Consent to participate

Written informed consent was obtained from all participants prior to the study. Participants were informed that their participation was voluntary, that they could withdraw at any time without consequences, and that their data would be anonymized and stripped of identifying information during analysis.

Author contributions

ST and SSL provided the conceptual framework for this research and played a leading role in guiding its direction. ZYT conducted the experiments, collected and processed the data, and designed the models. JZ contributed expertise in the medical field and provided suggestions for experiment design. LWK provided additional assistance, including literature review, experimental support, and guidance on manuscript preparation. QFS provided funding support. YNL and QJZ participated in data annotation. ZYT drafted the initial manuscript. All authors reviewed the manuscript and approved the final version.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Interdisciplinary Project of Dalian University (grant numbers DLUXK-2024-QN-014 and DLUXK-2025-FX-001).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Guarantor

Use of generative AI

Generative artificial intelligence tools (ChatGPT-4 by OpenAI and Claude by Anthropic) were used exclusively for language editing, translation, and improving the clarity and readability of the manuscript text. These tools were not involved in data collection, analysis, interpretation, figure generation, or any substantive intellectual contribution to the research.

Data availability statement

The data supporting the findings of this study are available on request from the corresponding author.

References

1. Parkinson’s Foundation. Statistics, 2025. Available from: https://www.parkinson.org/understanding-parkinsons/statistics (2025, accessed 1 Jun 2025).

2. Zhong, Liu, et al. Association of motor subtype and tremor type with Parkinson’s disease progression: an exploratory longitudinal analysis. J Parkinson’s Disease 2025; 15: 111–121.

3. Baradaran, Tan, Liu, et al. Parkinson’s disease rigidity: relation to brain connectivity and motor performance. Front Neurol 2013; 4: 67.

4. Gao, Smith, Lones, et al. Objective assessment of bradykinesia in Parkinson’s disease using evolutionary algorithms: clinical validation. Translat Neurodegener 2018; 7: 18.

5. Sun, Beylergil, Gupta, et al. Monitoring eye movement in patients with Parkinson’s disease: What can it tell us? Eye and Brain 2023; 15: 101–112.

6. Zhang, Yang, et al. Abnormal eye movements in Parkinson’s disease: from experimental study to clinical application. Parkinson Relat Disord 2023; 115: 105791.

7. Sekar, Panouillères, Kaski. Detecting abnormal eye movements in patients with neurodegenerative diseases: current insights. Eye and Brain 2024; 16: 3–16.

8. Meng, Zhao. Webcam-based eye movement analysis using CNN. IEEE Access 2017; 5: 19581–19587.

9. Fujita, Kawaguchi, Toyomoto, et al. Istradefylline improves impaired smooth pursuit eye movements in Parkinson’s disease. Neurol Therapy 2023; 12: 1791–1798.

10. Visser, Bour, Lee, et al. Eye movement abnormalities in essential tremor versus tremor dominant Parkinson’s disease. Clin Neurophysiol 2019; 130: 683–691.

11. Zhou, Wang, Lin, et al. Oculomotor impairments in de novo Parkinson’s disease. Front Aging Neurosci 2022; 14: 985679.

12. Frei. Abnormalities of smooth pursuit in Parkinson’s disease: a systematic review. Clin Parkinson Related Disord 2021; 4: 100085.

13. Hermann, Robert, Lagadec, et al. Correction to: catch-up saccades in vestibular hypofunction: a contribution of the cerebellum? The Cerebellum 2024; 23: 144.

14. Chang, Chen, Stephen, et al. Accurate detection of cerebellar smooth pursuit eye movement abnormalities via mobile phone video and machine learning. Scient Rep 2020; 10: 18641.

15. Lage, Sánchez-Rodríguez, Rivera-Sánchez, et al. Oculomotor dysfunction in idiopathic and LRRK2-Parkinson’s disease and at-risk individuals. J Parkinson’s Disease 2024; 14: 797–808.

16. Wang, Zhao, et al. Clinical and oculomotor correlates with freezing of gait in a Chinese cohort of Parkinson’s disease patients. Front Aging Neurosci 2020; 12: 237.

17. Wang, et al. Smooth pursuit and reflexive saccade in discriminating multiple-system atrophy with predominant Parkinsonism from Parkinson’s disease. J Clin Neurol 2024; 20: 194.

18. Berndt, Kirkpatrick, Taviano, et al. Tertiary eye movement classification by a hybrid algorithm. arXiv preprint arXiv:1904.10085, 2019. DOI: 10.48550/arXiv.1904.10085.

19. Azami, Chang, Arnold, et al. Detection of oculomotor dysmetria from mobile phone video of the horizontal saccades task using signal processing and machine learning approaches. IEEE Access 2022; 10: 34022–34031.

20. Koch, Voss, Cisneros-Franco, et al. Eye movement function captured via an electronic tablet informs on cognition and disease severity in Parkinson’s disease. Scient Rep 2024; 14: 9082.

21. Lei, Tang, Jing, et al. Antinociceptive role of the thalamic dopamine D3 receptor in descending modulation of intramuscular formalin-induced muscle nociception in a rat model of Parkinson’s disease. Exper Neurol 2024; 379: 114846.

22. Parihar, Alterman, Papavassiliou, et al. Comparison of VIM and STN DBS for Parkinsonian resting and postural/action tremor. Tremor Other Hyperkinet Movem 2015; 5: 321.

23. Duanmu, Wen, Qin, et al. Differential influences of rest tremor on brain fiber architecture in essential tremor and Parkinson’s disease. Parkinson Related Disord 2024; 123: 106559.

24. Google. Facelandmarker [internet], 2025. [cited 2025 June 1]. Available from: https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker?hl=zh-cn.

25. Wadehn, Weber, Mack, et al. Model-based separation, detection, and classification of eye movements. IEEE Trans Biomed Eng 2020; 67: 588–600.

26. Jang, Kim, Cho. Dual path denoising network for real photographic noise. IEEE Signal Process Lett 2020; 27: 860–864.

27. Wang, Chen. Dual-path decoder architecture for semantic segmentation of wheat ears. Appl Intell 2024; 55: 128.

28. Koltun. Multi-scale context aggregation by dilated convolutions. In: Proceedings of the international conference on learning representations (ICLR). http://arxiv.org/abs/1511.07122. Published as a conference paper at ICLR 2016.

29. Schuster, Paliwal. Bidirectional recurrent neural networks. IEEE Trans Signal Process 1997; 45: 2673–2681.

30. Luong, Pham, Manning. Effective approaches to attention-based neural machine translation. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP). pp.1412–1421. http://arxiv.org/abs/1508.04025. Published in EMNLP 2015.

31. Jia, Peng, et al. A multi-scale dilated residual convolution network for image denoising. Neural Process Lett 2023; 55: 1231–1246.

32. Hou, et al. C-BDCLSTM: a false emotion recognition model in micro blogs combined char-CNN with bidirectional dilated convolutional LSTM. Appl Soft Comput 2022; 130: 109659.

33. Shu, Duan, Shao, et al. Precipitation spatio-temporal forecasting in China via DC-CNN-BiLSTM. Water 2025; 17: 1381.

34. Balaji, Brindha, Elumalai, et al. Automatic and non-invasive Parkinson’s disease diagnosis and severity rating using LSTM network. Appl Soft Comput 2021; 108: 107463.

35. Hochreiter, Schmidhuber. Long short-term memory. Neural Comput 1997; 9: 1735–1780.

36. Bahdanau, Cho, Bengio. Neural machine translation by jointly learning to align and translate. In: Proceedings of the international conference on learning representations (ICLR). https://arxiv.org/abs/1409.0473. Presented as an oral presentation at ICLR 2015.

37. Vaswani, Shazeer, Parmar, et al. Attention is all you need. Adv Neural Inform Process Syst 2017; 30: 5998–6008.

38. Brauwers, Frasincar. A general survey on attention mechanisms in deep learning. IEEE Trans Knowled Data Eng 2023; 35: 3279–3298.

39. Loshchilov, Hutter. Decoupled weight decay regularization. In: Proceedings of the international conference on learning representations (ICLR). https://arxiv.org/abs/1711.05101. Published in ICLR 2019.

40. Loshchilov, Hutter. SGDR: Stochastic gradient descent with warm restarts. In: Proceedings of the international conference on learning representations (ICLR). https://arxiv.org/abs/1608.03983. Published in ICLR 2017.

41. Wu, Cao, Dali, et al. Eye movement control during visual pursuit in Parkinson’s disease. PeerJ 2018; 6: e5442.

42. Rottach, Riley, DiScenna, et al. Dynamic properties of horizontal and vertical eye movements in Parkinsonian syndromes. Ann Neurol: Off J Am Neurol Assoc Child Neurol Soc 1996; 39: 368–377.

43. Antoniades, Kennard. Ocular motor abnormalities in neurodegenerative disorders. Eye 2015; 29: 200–207.

44. Li, Butala, Moro-Velazquez, et al. Automating the analysis of eye movement for different neurodegenerative disorders. Comput Biol Med 2024; 170: 107951.

45. Virata MCA, Catahay, Lippi, et al. Neurofilament light chain: a biomarker at the crossroads of clarity and confusion for gene-directed therapies. Neurodegener Dis Manag 2024; 14: 227–239.