Abstract

Background

Sleep stage identification is critical in multiple areas (e.g. medicine or psychology) to diagnose sleep-related disorders. Previous studies have reported that the performance of machine learning algorithms in sleep stage classification varies with the biosignals used and the feature-extraction process.

Methods

To compare as many conditions as possible, 414 experimental conditions were applied, considering the combination of different biosignals, biosignal length, and window length. Five biosignals in polysomnography (i.e. electrocardiogram (ECG), electroencephalogram (EEG), electromyogram (EMG), electrooculogram left, and electrooculogram right) were used to identify optimal signal combinations for classification. In addition, three different signal-length conditions and six different window-length conditions were applied. The validity of each condition was examined via classification performance from the XGBoost classifiers trained using 10-fold cross-validation. Furthermore, results considering feature importance were examined to validate the experimental results in terms of model explanation.

Results

The combination of EEG + EMG + ECG with a 40 s window and 120 s signal length resulted in the best classification performance (precision: 0.853, recall: 0.855, F1-score: 0.853, and accuracy: 0.853). Compared to other conditions and feature importance results, EEG signals showed a relatively higher importance for classification in the present study.

Conclusion

We determined the optimal biosignal and window conditions for the feature-extraction process in machine learning algorithm-based sleep stage classification. Our experimental results can inform researchers conducting related studies. To generalize our results, more diverse methodologies and conditions should be applied in future studies.

Keywords

Sleep stage classification, polysomnography, biosignal, machine learning, classification algorithm

Introduction

The number of people with sleep-related disorders is increasing continuously, while the underlying causes can be diverse.^1–3 To date, and especially during the coronavirus disease 2019 (COVID-19) pandemic, the prevalence of sleep disturbances has widely increased, affecting various subpopulations. Al Maqbali et al. ⁴ in a meta-analytic study examined the psychological impact of stress and sleep disturbances associated with the COVID-19 pandemic on nurses working in hospitals. The authors suggested that in the context of the COVID-19 pandemic, experiences of sleep disturbance and depression among nurses were found to be higher than those related to previous Middle East respiratory syndrome and severe acute respiratory syndrome pandemics. Deng et al. ⁵ investigated rates of sleep disturbance in college students in a systematic review. The prevalence of sleep disturbances and associated risk of mental illness was found to be increased in association with the duration of the ongoing pandemic as well as higher age. Contrary to the aforementioned previous studies, however, Ara et al.⁶ conducted a web-based survey to identify sleep disturbance during the COVID-19 lockdown in the general population, including 1128 individuals from Bangladesh. They found various factors, such as working from home or doing online classes, to be linked with the presence of sleep disorders during the pandemic.

The classification of sleep stages is important when examining sleep disorders and disturbances. Various methodologies have been applied to measure the depth or stage of sleep. Haythornthwaite et al.⁷ attempted to develop a sleep diary assessment for patients with chronic pain based on diverse categories of questions for evaluation (e.g. difficulties falling asleep, early awakening, and quality of sleep). Currie et al.⁸ collected sleep reports from patients with alcohol-dependency to evaluate sleep problems. They found similar difficulties falling asleep among alcoholics with short-term and long-term abstinence.

In recent research, physiological data collected from participants have been widely used to overcome biases associated with self-reports in the form of sleep diaries. Yong et al.⁹ conducted polysomnography studies including 124 participants with Parkinson's disease, and revealed altered sleep architecture and reduced sleep duration in these patients. They analyzed variations in several biosignals, including electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), and electrocardiogram (ECG). In addition, Goyal et al.¹⁰ identified risk factors related to obstructive sleep apnea in COVID-19 patients using a polysomnography dataset. Data from EEG, EOG, EMG, and body position were used to compare apnea levels and sleep depth.

In previous studies, machine learning algorithms have widely been used to find latent patterns in multiple biosignals and variables. Arslan et al.¹¹ used machine learning classification models to automatically score sleep stages using multichannel data from polysomnography. The framework proposed by the authors showed superior performance in sleep stage classification compared with other models used in previous studies. Furthermore, Satapathy et al.¹² proposed machine learning models for the classification of sleep stages. Their systems focused on sleep irregularities based on abnormal sleep patterns. They used a polysomnographic dataset and evaluated the performance of their model. The respective framework achieved a higher classification accuracy than the models proposed previously.

Similar to the existing studies mentioned above, we confirmed that diverse biosignal data in the context of polysomnography can be applied to develop sleep stage classification using machine learning models. In addition, we reviewed several studies related to feature extraction conditions and types of biosignals. Wongsirichot and Hanskunatai¹³ compared four machine learning algorithms (k-means clustering, k-nearest neighbor, support vector machine, and multilayer perceptron) based on four biosignals (EEG, muscle movement, ECG, and thoracic respiratory efforts) in a polysomnography dataset. The authors determined that the classification performance of the machine learning algorithms changed with the combinations of biosignals. Based on their results, they suggested the importance of investigating the optimal features for sleep-level detection. The influence of EEG features on machine learning classifiers was validated by Satapathy et al.¹⁴ Twelve features were calculated from the EEG signals in polysomnography datasets. The classification performances of the machine learning algorithms were compared in three different sets of feature conditions (of 12, 9, and 5 features). Each result for the three feature conditions showed different performance through combinations of EEG features. Santaji et al.¹⁵ applied three different epoch lengths (of 1, 2, and 10 s) in the EEG feature-extraction process to identify the effects of EEG signal length on the sleep scoring performance of machine learning classifiers. The authors verified that the classification performance of three classifiers (decision tree, support vector machine, and random forest) can be altered based on the feature extraction conditions. Based on associated studies, including the aforementioned three studies, we evaluated the classification performance of machine learning algorithms using several combinations of biosignals (i.e. ECG, EEG, EOG left (EOGL), EOG right (EOGR), and EMG) in this study. 
Furthermore, different combinations of windows and signal lengths during feature extraction were compared. Finally, the performance of each model was validated by examining feature importance.

Methods

Overview

To compare the influences of different biosignals and several feature-extraction conditions on the performance of machine learning algorithms for sleep stage classification, we composed a five-step research scheme. First, a total of five biosignals (ECG, EEG, EMG, EOG from left-eye movements, and EOG from right-eye movements) were extracted from the polysomnography dataset (sleep heart health study dataset). Second, 64 features were calculated from the 5 selected biosignals under diverse window and signal length conditions. Third, utilizing the 64 extracted features, we created all possible combinations based on the biosignals (e.g. combinations with two signals: ECG + EEG or ECG + EMG). Fourth, each dataset of predefined conditions was used to train and evaluate the machine learning classification algorithm (XGBoost classifier). Finally, the classification performance of the XGBoost classifier was evaluated using four performance indices. In addition, feature importance results were derived from the trained algorithms for the experimental condition with the highest classification performance. A detailed depiction of the present research scheme is presented in Figure 1.

Figure 1.

Overview of the research scheme. ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.

Data source

An open-source polysomnography dataset from the sleep heart health study (SHHS) was used.^16,17 The SHHS is a multicenter cohort study conducted by the National Heart Lung and Blood Institute in the United States aiming to investigate cardiovascular and other consequences of sleep-disordered breathing. A total of 9736 participants (mean age: 40 years) were tested for associations between sleep-related breathing and the risk of heart disease, stroke, and hypertension. The dataset was collected over two cycles. The first cycle (SHHS visit 1) included surveys from 6441 participants enrolled between November 1, 1995, and January 31, 1998. Second-cycle surveys (SHHS visit 2) were conducted from January 2001 to June 2003 and included 3295 participants. The final dataset comprised polysomnography and survey data. In the polysomnography dataset, the collected biosignals were saved in EDF file format, with one EDF file per participant. Consequently, 9736 EDF files were included in the SHHS dataset. Each biosignal in the EDF files was labeled with sleep level scores in 30 s intervals. A total of six sleep level scores were included in the polysomnography dataset (i.e. an awake level and levels 1 to 5 based on sleep depth). For the survey dataset, the responses to the survey questions were included in two Excel files (i.e. SHHS1.xlsx and SHHS2.xlsx). The detailed subcategories of the variables of the SHHS dataset are listed in Table 1.

Table 1.

Categories of variables in the SHHS dataset.

No.	Category	No.	Category	No.	Category
1	Demographics	5	Medication	9	Family history CVD
2	SES	6	Smoking	10	Diabetes
3	Obesity/overweight	7	Alcohol intake	11	Lipids
4	Blood pressure/hypertension	8	Subclinical CVD	12	Respiratory diseases and symptoms

CVD: cardiovascular disease; SES: socioeconomic status; SHHS: sleep heart health study.

Feature extraction from biosignals

An overview of the feature-extraction process is shown in Figure 2. Of the two categories of data in the SHHS dataset (i.e. biosignal and survey data), we used only five biosignals (ECG, EEG, EMG, EOGL, and EOGR) from the polysomnography dataset; the survey data were not used. To examine the influence of signal length and window length during the feature-extraction process on classification, we created 18 conditions (i.e. 3 signal length conditions × 6 window length conditions = 18 conditions). For signal length, 3 conditions were applied (60, 90, and 120 s). Sliced signals were used for feature extraction based on 6 window-length conditions (15, 20, 30, 40, 50, and 60 s). Before feature extraction, we extracted consecutive intervals of the biosignals with the same sleep labels so that the aforementioned conditions could be applied to each biosignal (e.g. a 2 min length of biosignal with the same sleep level, i.e., 4 consecutive 30 s intervals with the same sleep score). The sampling frequency of each of the five biosignals was also considered in the feature-extraction process. For example, ECG signals were measured at a sampling frequency of 250 Hz, so a 60 s signal length corresponds to 15,000 samples (250 Hz × 60 s). Additionally, 64 features were extracted from the 5 biosignals. The features are listed in Appendices A and B (additional descriptions of each feature are included in Appendices C and D). Furthermore, 23 combinations of biosignals were used to validate the usability of each signal. The combinations are listed in Table 2. As a result, we evaluated the classification performance under 414 conditions (23 combinations of biosignals × 18 signal and window length conditions = 414 conditions) in this study.
To reflect the characteristics of the biosignals as much as possible, samples of all biosignals utilized in this study were normalized to a range of 0 to 1 before feature extraction.
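The segment selection and normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, segments are assumed not to overlap, and the text does not state whether normalization was applied per segment or per recording, so the helper simply rescales whatever list it receives.

```python
from itertools import groupby


def same_label_segments(labels, epochs_per_segment):
    """Return (start_epoch, label) pairs for non-overlapping segments made of
    `epochs_per_segment` consecutive epochs that share one sleep label."""
    segments = []
    position = 0
    for label, run in groupby(labels):
        run_length = len(list(run))
        # Cut each run of identical labels into as many full segments as fit.
        for k in range(run_length // epochs_per_segment):
            segments.append((position + k * epochs_per_segment, label))
        position += run_length
    return segments


def min_max_normalize(samples):
    """Rescale samples linearly to the range 0 to 1."""
    low, high = min(samples), max(samples)
    if high == low:  # constant signal: avoid division by zero
        return [0.0 for _ in samples]
    return [(s - low) / (high - low) for s in samples]


EPOCH_SECONDS = 30    # labeling interval stated in the text
SIGNAL_SECONDS = 120  # one of the three signal-length conditions

# Toy per-epoch sleep labels (illustrative values, not SHHS data).
labels = [0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1]
segments = same_label_segments(labels, SIGNAL_SECONDS // EPOCH_SECONDS)
print(segments)
print(min_max_normalize([2.0, 4.0, 6.0]))
```

With a 120 s signal length, each segment spans four 30 s epochs, so runs shorter than four epochs contribute no segment in this sketch.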

Figure 2.

Example of feature-extraction process (15 s length window and 120 s length ECG signals). ECG: electrocardiogram.
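The condition in Figure 2 (15 s window, 120 s ECG signal) can be worked through numerically. The sketch below assumes consecutive, non-overlapping windows and drops any incomplete trailing window; neither choice is stated in the text, and `split_into_windows` is a hypothetical helper.

```python
def split_into_windows(samples, sampling_hz, window_seconds):
    """Split a signal into consecutive, non-overlapping windows.

    Any trailing samples that do not fill a whole window are dropped
    (an assumption of this sketch)."""
    window_size = int(sampling_hz * window_seconds)
    n_windows = len(samples) // window_size
    return [samples[i * window_size:(i + 1) * window_size] for i in range(n_windows)]


SAMPLING_HZ = 250      # ECG sampling frequency given in the text
SIGNAL_SECONDS = 120
WINDOW_SECONDS = 15

signal = [0.0] * (SAMPLING_HZ * SIGNAL_SECONDS)  # placeholder 120 s ECG segment
windows = split_into_windows(signal, SAMPLING_HZ, WINDOW_SECONDS)
print(len(signal), len(windows), len(windows[0]))
```

Under these assumptions, a 120 s ECG segment holds 30,000 samples and yields eight windows of 3,750 samples each. Window lengths that do not divide the signal length evenly (e.g. 50 s in 120 s) leave a remainder whose handling would need to be decided separately.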

Table 2.

Combinations of five biosignals in the SHHS dataset.

No.	Number of signals	Combination	No.	Number of signals	Combination
1	4	ECG + EEG + EMG + EOGL	13	2	EEG + ECG
2		ECG + EEG + EMG + EOGR	14		EEG + EMG
3	3	ECG + EMG + EOGL	15		EEG + EOGL
4		ECG + EMG + EOGR	16		EEG + EOGR
5		EEG + ECG + EMG	17		EMG + EOGL
6		EEG + ECG + EOGL	18		EMG + EOGR
7		EEG + ECG + EOGR	19	1	ECG
8		EEG + EMG + EOGL	20		EEG
9		EEG + EMG + EOGR	21		EMG
10	2	ECG + EMG	22		EOGL
11		ECG + EOGL	23		EOGR
12		ECG + EOGR

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.
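The count of 23 combinations and 414 conditions can be reproduced with a short enumeration. The rule used here, that a combination contains at most one of the two EOG channels, is inferred from the rows of Table 2 and is not stated explicitly in the text.

```python
from itertools import combinations, product

BASE_SIGNALS = ["ECG", "EEG", "EMG"]
EOG_OPTIONS = [None, "EOGL", "EOGR"]  # assumption: at most one EOG channel
SIGNAL_LENGTHS = [60, 90, 120]        # seconds
WINDOW_LENGTHS = [15, 20, 30, 40, 50, 60]  # seconds

signal_combinations = []
for size in range(len(BASE_SIGNALS) + 1):
    for base in combinations(BASE_SIGNALS, size):
        for eog in EOG_OPTIONS:
            combo = base + ((eog,) if eog else ())
            if combo:  # skip the empty combination
                signal_combinations.append(combo)

# Every biosignal combination is crossed with every signal and window length.
conditions = list(product(signal_combinations, SIGNAL_LENGTHS, WINDOW_LENGTHS))
print(len(signal_combinations), len(conditions))
```

The enumeration gives 8 subsets of the three base signals times 3 EOG options, minus the empty set, which matches the 23 rows of Table 2; crossing these with the 18 length conditions gives 414.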

Machine learning algorithm for validation

Among various machine learning classification algorithms, we selected extreme gradient boosting (XGBoost) classifiers based on previous studies on similar research topics.^18–20 These supervised algorithms can be used for both regression and classification problems. In our cases, we applied the XGBoost algorithm to classify sleep stages. Because these algorithms are ensembles of decision tree models, the classification and regression tree (CART) algorithm is the basis of the XGBoost algorithms. Predicted values from multiple CART algorithms (i.e. the decision tree model) were summarized to calculate the final prediction. The final prediction of the XGBoost algorithm is calculated using the following equation:

\hat{y}_{i} = \sum_{k=1}^{K} f_{k}(x_{i}), \quad f_{k} \in F

(1)

where $\hat{y}_{i}$ indicates the summed prediction for each decision tree from $x_{i}$, and $f_{k}$ denotes the decision tree models with CART algorithms. Based on the predicted value $\hat{y}_{i}$, the objective function of the XGBoost classifier checks the differences between the prediction and target using the loss function. The objective function of the XGBoost classifier is as follows:

Obj = \sum_{i=1}^{n} l(y_{i}, \hat{y}_{i}) + \sum_{k=1}^{K} \Omega(f_{k})

(2)

In Equation (2), $l$ means loss function for comparing the target value $(y_{i})$ and prediction value $(\hat{y}_{i})$. In addition, to prevent overfitting of the algorithms, a regularization term for the decision tree model was added in the function. In conclusion, the XGBoost algorithms determine the prediction values from the predicted values of the trained multiple decision tree models.
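Equations (1) and (2) can be illustrated with a toy ensemble of two depth-1 trees. This is a simplified sketch, not the XGBoost implementation: it uses a scalar output and a squared-error loss for readability, whereas multi-class classification uses a different loss, and the penalty takes the form given in the original XGBoost paper (a per-leaf cost plus an L2 term on leaf weights), which the text above does not spell out. All names and values are illustrative.

```python
def make_stump(threshold, left_weight, right_weight):
    """A depth-1 regression tree that returns one of two leaf weights."""
    def stump(x):
        return left_weight if x < threshold else right_weight
    stump.leaf_weights = (left_weight, right_weight)
    return stump


def ensemble_prediction(trees, x):
    """Equation (1): sum the outputs f_k(x) of all K trees."""
    return sum(tree(x) for tree in trees)


def omega(tree, gamma=0.0, lam=1.0):
    """Regularization of one tree: gamma * (number of leaves)
    + 0.5 * lam * (sum of squared leaf weights)."""
    weights = tree.leaf_weights
    return gamma * len(weights) + 0.5 * lam * sum(w * w for w in weights)


def squared_error(y, y_hat):
    """Stand-in loss l(y, y_hat); chosen only to keep the arithmetic simple."""
    return (y - y_hat) ** 2


def objective(trees, inputs, targets):
    """Equation (2): summed loss over samples plus summed tree penalties."""
    data_term = sum(
        squared_error(y, ensemble_prediction(trees, x))
        for x, y in zip(inputs, targets)
    )
    return data_term + sum(omega(tree) for tree in trees)


trees = [make_stump(0.5, -1.0, 1.0), make_stump(0.2, 0.0, 0.5)]
inputs = [0.1, 0.3, 0.9]
targets = [-1.0, 0.0, 1.5]
predictions = [ensemble_prediction(trees, x) for x in inputs]
print(predictions, objective(trees, inputs, targets))
```

Here the data term is 0.25 (only the second sample is mispredicted, by 0.5) and the two tree penalties are 1.0 and 0.125, so the objective evaluates to 1.375.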

Evaluation metrics

To evaluate the classification performance of the XGBoost classifiers, we utilized four evaluation metrics (precision, recall, F1-score, and accuracy). To obtain these metrics, we computed confusion matrices from the trained classifiers. From each confusion matrix, true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values were calculated. The TP and TN values indicate the numbers of correctly classified samples, whereas FP and FN denote the numbers of incorrectly classified samples. Finally, we obtained the four evaluation indices using the following equations:

\text{Precision} = \frac{TP}{TP + FP}

(3)

\text{Recall} = \frac{TP}{TP + FN}

(4)

\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(5)

\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}

(6)
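Equations (3) to (6) can be applied to a multi-class confusion matrix by treating one class at a time as "positive". The sketch below uses a toy 3-class matrix (rows are true classes, columns are predicted classes); the one-vs-rest counting is an assumption, since the text does not state how per-class values were combined across the six sleep levels.

```python
def one_vs_rest_counts(confusion, cls):
    """Return (TP, FP, TN, FN) for class `cls` of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp               # true cls, predicted otherwise
    fp = sum(row[cls] for row in confusion) - tp  # predicted cls, true otherwise
    tn = total - tp - fn - fp
    return tp, fp, tn, fn


def evaluation_metrics(tp, fp, tn, fn):
    """Precision, recall, F1-score, and accuracy as in Equations (3) to (6)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy confusion matrix (illustrative values, not results from this study).
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]
counts = one_vs_rest_counts(confusion, 0)
scores = evaluation_metrics(*counts)
print(counts, scores)
```

For class 0 in this toy matrix, the counts are TP = 8, FP = 2, TN = 18, and FN = 2, giving precision and recall of 0.8 each.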

Tools

XGBoost classifiers were built, and data preprocessing was performed using Python (version 3.7.1; scikit-learn, version 2.4.1) and R (version 4.0.3).
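The 10-fold cross-validation used for training and evaluation partitions the rows of each dataset into 10 folds, each serving once as the test set. The following standard-library sketch shows only the index bookkeeping; it uses contiguous, unshuffled folds, which is an assumption, because the text does not state whether the folds were shuffled or stratified by sleep level. `k_fold_indices` is a hypothetical helper, not a scikit-learn function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    The first n_samples % k folds receive one extra sample so that
    every sample appears in exactly one test fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(25, 10))
for fold_number, (train, test) in enumerate(folds, start=1):
    print(fold_number, len(train), test)
```

Each of the 10 trained classifiers is evaluated on its own held-out fold, which is why feature importance results can be reported once per fold.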

Results

Classification performances of machine learning classifier

Based on the extracted features, 414 final datasets for the experimental conditions were used to apply the XGBoost classifier model. In each final dataset, an average of 1,200,000 rows was included with features of signal combinations. For example, in the case of a dataset with a 120 s length signal and a 15 s length window from ECG signals, the dimension of the dataset was (1,233,053, 18). Here, 1,233,053 denotes the number of rows, and 18 indicates the number of ECG features in the dataset.

Using the aforementioned 414 datasets for evaluation, we compared the classification performances between the 414 experimental conditions to determine the optimized window and biosignal length for sleep stage identification. Among the tested experimental conditions, classification performance of “EEG + EMG + ECG” with 40 s window and 120 s signal length showed the highest evaluation metric values (precision: 0.853, recall: 0.855, F1-score: 0.853, and accuracy: 0.853). Table 3 and Appendices E, F, G, H, and I present the full results.

Table 3.

Averaged classification performances with features from a 40 s length window.

Signal condition (combinations)	Window and signal length : 40 s and 120 s				Window and signal length : 40 s and 90 s				Window and signal length : 40 s and 60 s
Signal condition (combinations)	Precision	Recall	F1-score	Accuracy	Precision	Recall	F1-score	Accuracy	Precision	Recall	F1-score	Accuracy
ECG + EEG + EMG + EOGL	0.850	0.841	0.845	0.849	0.835	0.832	0.835	0.833	0.733	0.735	0.731	0.734
ECG + EEG + EMG + EOGR	0.851	0.843	0.839	0.846	0.835	0.833	0.832	0.833	0.732	0.734	0.730	0.734
ECG + EMG + EOGL	0.848	0.853	0.854	0.853	0.805	0.805	0.802	0.802	0.453	0.471	0.458	0.471
ECG + EMG + EOGR	0.847	0.843	0.840	0.854	0.811	0.816	0.825	0.817	0.459	0.478	0.465	0.478
EEG + ECG + EMG	0.853	0.855	0.853	0.853	0.826	0.821	0.811	0.821	0.735	0.732	0.730	0.731
EEG + ECG + EOGL	0.849	0.847	0.844	0.848	0.828	0.828	0.820	0.818	0.684	0.687	0.683	0.687
EEG + ECG + EOGR	0.848	0.845	0.845	0.840	0.832	0.830	0.831	0.824	0.680	0.673	0.671	0.673
EEG + EMG + EOGL	0.852	0.847	0.847	0.843	0.831	0.832	0.835	0.824	0.745	0.740	0.737	0.739
EEG + EMG + EOGR	0.853	0.848	0.851	0.845	0.833	0.832	0.837	0.831	0.741	0.737	0.734	0.736
ECG + EMG	0.446	0.463	0.452	0.463	0.439	0.458	0.445	0.458	0.429	0.450	0.434	0.450
ECG + EOGL	0.463	0.469	0.462	0.462	0.330	0.340	0.333	0.340	0.451	0.463	0.460	0.463
ECG + EOGR	0.463	0.465	0.474	0.466	0.331	0.344	0.334	0.344	0.457	0.461	0.461	0.451
EEG + ECG	0.840	0.839	0.842	0.844	0.829	0.827	0.829	0.826	0.682	0.682	0.679	0.681
EEG + EMG	0.829	0.829	0.831	0.835	0.819	0.823	0.824	0.818	0.740	0.734	0.732	0.734
EEG + EOGL	0.832	0.828	0.823	0.827	0.818	0.822	0.825	0.821	0.683	0.685	0.681	0.684
EEG + EOGR	0.831	0.837	0.836	0.830	0.822	0.820	0.822	0.818	0.673	0.672	0.669	0.671
EMG + EOGL	0.475	0.470	0.482	0.490	0.497	0.503	0.492	0.495	0.426	0.447	0.431	0.447
EMG + EOGR	0.493	0.479	0.485	0.497	0.503	0.504	0.495	0.491	0.432	0.453	0.437	0.453
ECG	0.302	0.308	0.303	0.308	0.302	0.308	0.303	0.308	0.281	0.289	0.282	0.289
EEG	0.814	0.816	0.817	0.822	0.819	0.823	0.820	0.818	0.700	0.703	0.698	0.702
EMG	0.397	0.418	0.403	0.418	0.397	0.421	0.402	0.421	0.401	0.426	0.405	0.426
EOGL	0.455	0.450	0.451	0.449	0.460	0.461	0.452	0.458	0.265	0.278	0.264	0.278
EOGR	0.451	0.453	0.456	0.453	0.461	0.455	0.462	0.470	0.267	0.290	0.246	0.290

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.

Feature importance of experimental condition with best classification performances

To validate model performance in terms of the features important for classification, we examined the feature importance of the trained classifiers for the model with the highest classification performance (i.e. “EEG + ECG + EMG” condition with 40 s window and 120 s signals). Ten sets of feature importance results were compared because 10-fold cross-validation was used for model training and evaluation. The four highest-ranked features were identical across all 10 folds, and all four were extracted from EEG signals (“DELTA,” “Higuchi_Fractal_Dimension,” “Petrosian_Fractal_Dimension,” and “Detrended_Fluctuation_Analysis”). Features extracted from the other signals (ECG and EMG) ranked differently from fold to fold. The top 10 features by importance for each fold are detailed in Table 4.

Table 4.

Top 10 feature importance of EEG + ECG + EMG condition with 40 s window and 120 s signals.

1CV		2CV		3CV		4CV		5CV
Feature	F-score	Feature	F-score	Feature	F-score	Feature	F-score	Feature	F-score
DELTA	132	DELTA	134	DELTA	125	DELTA	133	DELTA	136
Higuchi_Fractal_Dimension	67	Higuchi_Fractal_Dimension	76	Higuchi_Fractal_Dimension	75	Higuchi_Fractal_Dimension	71	Higuchi_Fractal_Dimension	63
Petrosian_Fractal_Dimension	59	Petrosian_Fractal_Dimension	66	Petrosian_Fractal_Dimension	56	Petrosian_Fractal_Dimension	55	Petrosian_Fractal_Dimension	59
Detrended_Fluctuation_Analysis	44	Detrended_Fluctuation_Analysis	51	Detrended_Fluctuation_Analysis	42	Detrended_Fluctuation_Analysis	48	Detrended_Fluctuation_Analysis	41
HRV_TINN	39	HRV_TINN	48	HRV_RMSSD	41	WAMP	45	WAMP	39
HRV_MCVNN	38	Hjorth_mobility	41	WAMP	36	HRV_SDNN	23	HRV_TINN	32
Hurst_Exponent	34	WAMP	27	HRV_MeanNN	24	Hurst_Exponent	23	Hjorth_mobility	27
WAMP	26	HRV_MCVNN	26	Hurst_Exponent	23	Hjorth_mobility	23	Hurst_Exponent	20
HRV_SDNN	26	Hurst_Exponent	20	HRV_SDNN	19	HRV_MCVNN	18	HRV_SDNN	16
HRV_MeanNN	25	WL	16	Hjorth_mobility	15	HRV_IQRNN	16	PKF	16
6CV		7CV		8CV		9CV		10CV
Feature	F-score	Feature	F-score	Feature	F-score	Feature	F-score	Feature	F-score
DELTA	134	DELTA	128	DELTA	123	DELTA	133	DELTA	134
Higuchi_Fractal_Dimension	79	Higuchi_Fractal_Dimension	71	Higuchi_Fractal_Dimension	80	Higuchi_Fractal_Dimension	85	Higuchi_Fractal_Dimension	74
Petrosian_Fractal_Dimension	71	Petrosian_Fractal_Dimension	60	Petrosian_Fractal_Dimension	60	Petrosian_Fractal_Dimension	68	Petrosian_Fractal_Dimension	61
Detrended_Fluctuation_Analysis	42	Detrended_Fluctuation_Analysis	49	Detrended_Fluctuation_Analysis	39	Detrended_Fluctuation_Analysis	63	Detrended_Fluctuation_Analysis	54
WAMP	35	HRV_TINN	34	WAMP	29	WAMP	38	Hjorth_mobility	30
Hurst_Exponent	32	WAMP	29	HRV_TINN	21	Hurst_Exponent	34	HRV_TINN	27
Hjorth_mobility	25	HRV_SDNN	25	HRV_MCVNN	21	Hjorth_mobility	28	HRV_SDNN	26
WL	19	WENT	23	Hurst_Exponent	20	HRV_SDNN	23	HRV_MCVNN	19
HRV_SDNN	18	Hjorth_mobility	19	HRV_IQRNN	14	MDF	20	Hurst_Exponent	18
Hjorth_complexity	18	HRV_MCVNN	19	HRV_CVNN	13	HRV_MadNN	20	RMS	18

ECG: electrocardiogram; EEG: electroencephalogram; EMG: electromyogram; EOGL: electrooculogram left; EOGR: electrooculogram right.
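The agreement across folds in Table 4 can be checked programmatically. The sketch below copies the five highest-ranked features per fold from Table 4 and counts how many leading rank positions are shared by all folds; `shared_rank_prefix` is a hypothetical helper introduced for illustration.

```python
from collections import Counter


def shared_rank_prefix(rankings):
    """Number of leading rank positions on which every ranking agrees."""
    depth = 0
    for features_at_rank in zip(*rankings):
        if len(set(features_at_rank)) != 1:
            break
        depth += 1
    return depth


# Ranks 1 to 4 are the same in every fold of Table 4.
COMMON_TOP_FOUR = [
    "DELTA",
    "Higuchi_Fractal_Dimension",
    "Petrosian_Fractal_Dimension",
    "Detrended_Fluctuation_Analysis",
]
# Rank-5 feature for folds 1CV to 10CV, in order, as listed in Table 4.
FIFTH_BY_FOLD = [
    "HRV_TINN", "HRV_TINN", "HRV_RMSSD", "WAMP", "WAMP",
    "WAMP", "HRV_TINN", "WAMP", "WAMP", "Hjorth_mobility",
]

rankings = [COMMON_TOP_FOUR + [fifth] for fifth in FIFTH_BY_FOLD]
fifth_counts = Counter(FIFTH_BY_FOLD)
print(shared_rank_prefix(rankings), fifth_counts.most_common())
```

The shared prefix has length four, while the fifth rank varies between folds, which is the pattern described in the text.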

Discussion

In this study, we tested the classification performance of XGBoost classifiers under diverse conditions to determine the optimal window and signal conditions for sleep stage classification. Before conducting our research, we reviewed previous work to establish the rationale for our research topic (i.e. identifying optimal signal and window conditions for sleep stage classification with machine learning algorithms). First, regarding the two keywords of machine learning and sleep stage classification, Surantha et al.²¹ utilized an extreme learning machine and support vector machine (SVM) to classify sleep stages with diverse class conditions. Aboalayon et al.²² compared five supervised classification machine learning algorithms (decision tree, neural network, k-nearest neighbors, naive Bayes, and SVM) for sleep stage classification tasks.

Second, regarding optimal feature extraction conditions (third keyword), Satapathy et al.²³ validated 12 statistical features from input biosignals to find optimal feature sets in sleep-level identification. The usability of each feature was verified using the accuracy results of the random forest algorithms. Santaji and Desai²⁴ extracted nine EEG features, including time and frequency domain features, to detect rapid eye movement (REM) sleep and non-REM sleep stages. In addition, different conditions for the amplitude and frequency ranges of EEG signals were used in the feature-extraction process. For several EEG features with various conditions, the utility of each feature was compared with the performance of machine learning algorithms. Based on the aforementioned studies, we concluded that our research aims were appropriate.

To construct a research scheme for our study, we considered several previous studies with similar research topics. Şen et al.²⁵ applied five different machine learning classification algorithms (random forest, feed-forward neural network, decision tree, support vector machine, and radial basis function neural network) to identify the sleep levels. Their study consisted of three stages: feature extraction from EEG signals, feature selection, and classification using machine learning algorithms. In the first stage, 41 features in 4 different categories (time, nonlinear, frequency-based, and entropy) were extracted from the EEG signals. Among 41 features, the highest effective features were selected with associated 5 algorithms (“fast correlation based filter,” “mRMR algorithm,” “fisher score algorithm,” “t-test algorithm,” and “ReliefF algorithm”) in the feature selection stage. In the last stage, five machine learning classifiers were used to compare the classification performance in sleep scoring.

Ugi et al.²⁶ proposed a sleep stage classification framework with a machine learning classifier in two classes (awake and sleep). Four phases were included in their research (“segmentation and filtering,” “feature extraction,” “estimation,” and “performance check”). The ECG signal collected from each participant was segmented using 30 s epochs and filtered using a finite impulse response filter at a band frequency of 0.05 ∼ 35 Hz in the first phase. In the second phase (feature-extraction process), three ECG features (mean, variance, and standard deviation) were extracted from each segment signal with 30 s length. The three extracted features were applied to the SVM classifiers for sleep stage classification in the estimation phase. Finally, the classification performance of the optimized SVM models was evaluated using three metrics (accuracy, precision, and recall).

Satapathy and Loganathan²⁷ suggested a classification methodology that uses dual-channel EEG signals for automated sleep staging. Their research was composed of three steps (“feature extraction from EEG signals,” “feature selection,” and “classification with machine learning algorithms”). In the first step, linear and nonlinear features are extracted from the input signals. In the second step, the optimal features were selected from the extracted feature sets using the ReliefF weight algorithm. Random forest classification model was trained and evaluated using a 10-fold cross-validation strategy in the final step.

Similar to previous studies, including those mentioned above, we included several common steps (“feature extraction from biosignals,” “classification with machine learning algorithm,” and “performance evaluation with metrics”) in our research scheme. However, in our research, we focused on validating the influence of window length and biosignals in the feature-extraction process. To compare the effects of window length and biosignal length in feature extraction, 6 window length conditions (15, 20, 30, 40, 50, and 60 s) and 3 biosignal length conditions (60, 90, and 120 s) were used. Additionally, a total of five biosignals (ECG, EEG, EMG, EOGL, and EOGR) and their combinations were utilized to check the optimal combination of biosignals in sleep stage classification. Furthermore, unlike previous studies that compared several machine learning classifiers, only a single machine learning algorithm was used in this research to concentrate on the effects of the signal and window length for classification. Among the diverse set of choices of available machine learning classifiers, based on previous studies, we utilized XGBoost classifiers. Siyuan et al.²⁸ compared three machine learning algorithms (XGBoost, AdaBoost, and SVM) in sleep staging research. In their experimental results, XGBoost classifiers showed better performance (accuracy: 90.6%) than AdaBoost and SVM classifiers, which have been widely applied in related studies. In addition, Choi et al.²⁹ used XGBoost classifiers to develop a framework for detecting extreme drowsiness using short-time segment EEG signals. The authors showed the possibilities of these algorithms for classification with a relatively insufficient biosignal length.

To interpret our experimental results, we compared them with those of previous studies. First, the condition with EEG + EMG + ECG, 40 s window, and 120 s signal length showed the best classification performance (precision: 0.853, recall: 0.855, F1-score: 0.853, and accuracy: 0.853) among all experimental conditions. Choi et al.²⁹ observed similar trends (accuracy: 0.788, sensitivity: 0.788, and specificity: 0.787) in the classification performances of the XGBoost classifiers in similar research settings. Similar to our study, the authors applied only filtering methods without detailed preprocessing steps. They extracted features in a 2 s window from EEG signals. Additionally, their framework classified two classes (extremely drowsy and normal). Hei et al.³⁰ also suggested an XGBoost algorithm-based sleep stage classification framework with similar performance levels (average accuracy: 0.830). Their research design was similar to that of this study: they applied only filtering methods to preprocess the EEG and EOG signals, and each feature was calculated using a 30 s window.

Second, for the three biosignals in the best-performing condition (EEG, ECG, and EMG), we compared the relative importance of each biosignal through the performance of other combinations (e.g. EEG + EMG or EEG + ECG) with the same window and signal length. Among the combinations of two biosignals, the ECG + EMG condition showed precision: 0.446, recall: 0.463, F1-score: 0.452, and accuracy: 0.463, whereas the conditions including EEG performed better (EEG + EMG: precision: 0.829, recall: 0.829, F1-score: 0.831, and accuracy: 0.835; EEG + ECG: precision: 0.840, recall: 0.839, F1-score: 0.842, and accuracy: 0.844). Similarly, among the single-biosignal conditions, EEG showed the best performance (precision: 0.814, recall: 0.816, F1-score: 0.817, and accuracy: 0.822). Bin Heyat et al.³¹ compared the performance of several combinations of ECG, EMG, and EEG signals. The EEG signal conditions exhibited the best classification performance under the experimental conditions.

Finally, in the feature importance results, four EEG features (“DELTA,” “Higuchi_Fractal_Dimension,” “Petrosian_Fractal_Dimension,” and “Detrended_Fluctuation_Analysis”) consistently occupied the top four ranks. This is consistent with the aforementioned result that EEG signals are the most important for classification. Furthermore, the delta wave of the EEG signal is related to sleep.^32,33 These results further support the validity of the present results.

Strengths and limitations

This study has several strengths and limitations. Regarding the strengths, first, diverse combinations (i.e. 414 experimental conditions) of five biosignals and feature-extraction conditions (signal and window length) were compared to determine the optimal conditions for sleep stage classification. Second, we validated our results using the feature importance of the trained XGBoost classifiers with the highest classification performance. However, our study also had some limitations. First, we used only one machine learning algorithm (i.e. the XGBoost classifier) to investigate the optimal conditions for sleep stage classification. Nevertheless, this algorithm has been widely applied in previous studies, where it has attained higher performance than other algorithms. Second, we did not apply other data-driven algorithms (e.g. deep learning algorithms), which could identify other latent patterns in biosignals for sleep stage classification. Third, rather than using all six stages, different sleep stage combinations could be applied to find meaningful features for classification (e.g. classifying between the awake stage and stage 1). We consider that our experimental results can be used as preliminary data for related studies. Furthermore, we plan to examine other sleep stage conditions and patterns in future studies.

Conclusion

Accurate sleep stage classification is critical in various fields, including medicine and psychology. In this study, we compared several window lengths and biosignal conditions in feature extraction to determine the optimal combination of biosignals and feature-extraction conditions for machine learning-based sleep stage classification. To examine the influence of each condition on classification performance, 414 experimental conditions, including different biosignal combinations, were applied. We found that the combination of EEG, ECG, and EMG with a 40 s window and a 120 s signal length showed the best classification performance (precision: 0.853, recall: 0.855, F1-score: 0.853, and accuracy: 0.853) across all evaluation metrics used in the present research setting. In addition, we found that the importance of EEG features was higher than that of ECG and EMG features. Moreover, we validated the importance of EEG features for sleep stage classification by comparing our results with those of previous studies. In conclusion, we confirmed that our results are reasonable in both quantitative (i.e. classification performance) and qualitative (i.e. feature importance) terms. Our research can provide evidence regarding window and signal length for researchers who want to conduct similar studies. Furthermore, to generalize our experimental results, we will conduct additional analyses in future studies.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076231163783 - Supplemental material for Validation of the influence of biosignals on performance of machine learning algorithms for sleep stage classification

Supplemental material, sj-docx-1-dhj-10.1177_20552076231163783 for Validation of the influence of biosignals on performance of machine learning algorithms for sleep stage classification by Junggu Choi, Seohyun Kwon, Sohyun Park and Sanghoon Han in Digital Health

Footnotes

Acknowledgements

We would like to thank Editage for editing and reviewing this manuscript for English language.

Contributorship

JC and SK contributed to the conception and design of the study. JG, SK, and SP contributed to the acquisition of data. JC, SK, SP, and SH contributed to the analysis and interpretation of the data. JC contributed to the drafting of the manuscript.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical approval

Not applicable.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Yonsei Signature Research Cluster Program of 2021 (grant number 2021-22-0005).

ORCID iD

Junggu Choi

Supplemental material

Supplemental material for this article is available online.

Guarantor

SHH.

Appendix

References

1. Zhang, et al. An updated of meta-analysis on the relationship between mobile phone addiction and sleep disorder. J Affect Disord 2022; 305: 94–101.

2. Abad, Guilleminault. Sleep and psychiatry. Dialogues Clin Neurosci 2022; 7: 291–303.

3. Anderson, Pavese. Longitudinal studies of sleep disturbances in Parkinson’s disease. Curr Neurol Neurosci Rep 2022; 10: 635–655.

4. Al Maqbali, Al Sinani, Al-Lenjawi. Prevalence of stress, depression, anxiety and sleep disturbance among nurses during the COVID-19 pandemic: A systematic review and meta-analysis. J Psychosom Res 2021; 141: 110343.

5. Deng, Zhou, Hou, et al. The prevalence of depressive symptoms, anxiety symptoms and sleep disturbance in higher education students during the COVID-19 pandemic: A systematic review and meta-analysis. Psychiatry Res 2021; 301: 113863.

6. Ara, Rahman, Hossain, et al. Identifying the associated risk factors of sleep disturbance during the COVID-19 lockdown in Bangladesh: A web-based survey. Front Psychiatry 2020; 11: 580268.

7. Development of a sleep diary for chronic pain patients. J Pain Symptom Manage 1991; 6: 65–72.

8. Currie, Clark, Rimac, et al. Comprehensive assessment of insomnia in recovering alcoholics using daily sleep diaries and ambulatory monitoring. Alcoholism Clin Exp Res 2003; 27: 1262–1269.

9. Yong, Fook-Chong, Pavanni, et al. Case control polysomnographic studies of sleep disorders in Parkinson's disease. PLoS One 2011; 6: e22511.

10. et al. Obstructive sleep apnea is highly prevalent in COVID19 related moderate to severe ARDS survivors: Findings of level I polysomnography in a tertiary care hospital. Sleep Med 2022; 91: 226–230.

11. et al. Automated sleep scoring system using multi-channel data and machine learning. Comput Biol Med 2022; 146: 105653.

12. Accurate machine learning-based automated sleep staging using clinical subjects with suspected sleep disorders. In: Emergent converging technologies and biomedical systems. Singapore: Springer. 2021, pp. 363–379.

13. Wongsirichot, Hanskunatai. A classification of sleep disorders with optimal features using machine learning techniques. J Health Res 2017; 31: 209–217.

14. et al. Performance analysis of machine learning algorithms on automated sleep staging feature sets. CAAI Transact Intell Technol 2021; 6: 155–174.

15. Santaji, Desai. Automatic sleep stage classification with reduced epoch of EEG. Evol Intell 2022; 15: 2239–2246.

16. et al. The national sleep research resource: Towards a sleep data commons. J Am Med Inform Assoc 2018; 25: 1351–1358.

17. et al. The sleep heart health study: design, rationale, and methods. Sleep 1997; 20: 1077–1085.

18. et al. Design and FPGA implementation of an high efficient XGBoost based sleep staging algorithm using single channel EEG. In: International Conference on Cognitive Systems and Signal Processing. Springer. 2018, pp. 294–303.

19. et al. Automatic sleep staging based on XGBOOST physiological signals. In: Proceedings of the 11th International Conference on Modelling, Identification and Control (ICMIC2019). Springer. 2020, pp. 1095–1106.

20. et al. Classifying sleep stages automatically in single-channel against multi-channel EEG: A performance analysis. In: Disruptive technologies for big data and cloud applications. Singapore: Springer. 2022, pp. 527–537.

21. Sleep stage classification using extreme learning machine and particle swarm optimization for healthcare big data. J Big Data 2021; 8: 1–17.

22. A comparison of different machine learning algorithms using single channel EEG signal for classifying human sleep stages. In: 2015 Long island systems, applications and technology. IEEE. 2015, pp. 1–6.

23. et al. Performance analysis of machine learning algorithms on automated sleep staging feature sets. CAAI Transact Intell Technol 2021; 6: 155–174.

24. Santaji, Desai. Analysis of EEG signal to classify sleep stages using machine learning. Sleep Vigil 2020; 4: 145–152.

25. et al. A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms. J Med Syst 2014; 38: 1–21.

26. Electrocardiogram feature selection and performance improvement of sleep stages classification using grid search. Bull Electr Eng Informat 2022; 11: 2033–2043.

27. Satapathy, Loganathan. A study of human sleep stage classification based on dual channels of EEG signal using machine learning techniques. SN Comput Sci 2021; 2: 1–16.

28. et al. Sleep staging prediction model based on XGBoost. In: 2021 International Conference on Electronic Information Engineering and Computer Science (EIECS). IEEE. 2021, pp. 350–353.

29. Choi, Kim, et al. XGBoost-based instantaneous drowsiness detection framework using multitaper spectral information of electroencephalography. In: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. 2018, pp. 111–121.

30. Hei, et al. Sleep staging classification based on a new parallel fusion method of multiple sources signals. Physiol Meas 2022; 43: 045003.

31. Bin Heyat, et al. A novel hybrid machine learning classification for the detection of bruxism patients using physiological signals. Appl Sci 2020; 10: 7410.

32. Amzica, Steriade. Electrophysiological correlates of sleep delta waves. Electroencephalogr Clin Neurophysiol 1998; 107: 69–83.

33. Alpha-delta sleep. Electroencephalogr Clin Neurophysiol 1973; 34: 233–237.
