Abstract
Most studies of sleep staging in rats use both multichannel electroencephalogram (EEG) and electromyogram (EMG) recordings, so it would be convenient and meaningful in some fields if sleep staging in rats could be realized using a single EEG channel. In this study, we used a single bipolar cortical EEG electrode at the frontal–parietal location with a 0.5–30 Hz filter band and a clustering sleep-staging algorithm including seven classification parameters. The agreements between the computer and two independent raters were 96.9 ± 1.1% for Wake, 97.1 ± 1.4% for non-rapid eye movement (NREM) sleep, and 91.4 ± 2.5% for rapid eye movement (REM) sleep, and the overall agreement was 96.7 ± 0.7%. These results indicate that the accuracy of sleep staging remains high even though only a single EEG channel was used, and that a system based on this scheme would be suitable for real-time and long-term studies of sleep.
Scoring systems of sleep–wake states are employed widely in various sleep-related studies because of their high performance and reliability. In general, all algorithms used in these systems are based on various electrophysiological properties of different vigilance states.
Nowadays, most studies of sleep staging in rats exploit the characteristics of both electroencephalogram (EEG) and electromyogram (EMG) and discriminate the various stages by comparison with thresholds. 1–4 Over long recording periods, however, applying visually determined thresholds from one session to all the recordings from the same animal reduces the accuracy of sleep staging, because these characteristics may change with experimental manipulations (drugs, learning, etc.); 5 hence the thresholds must be checked and adjusted periodically to remain reliable. 3 Such a procedure is cumbersome and interrupts real-time processing. A non-threshold-based method is therefore valuable in practice.
Some pharmacological treatments may induce dissociation between the EEG and the behaviour of the animal, so the field of applications of a system relying exclusively on EEG analysis may be restricted, 6 and only a few studies to date have been based solely on EEG features. 7–9 Similarly, the field of applications of a system based on EEG and EMG would also be restricted, both for the same reason and because rats sometimes augment the EMG signal and so ‘fool’ the computer into scoring sleep as wakefulness. 2 Each system is therefore problem-oriented, and it is important to have a clear purpose when developing a new scoring system. 6
Systems based on the characteristics of EEG and EMG need recording cables consisting of at least four wires. However, variations in recording cable weight and flexibility can have a significant impact on sleep and activity in mice, 10 and such variations may likewise affect sleep in rats. Most protocols applied would induce sleep disturbances, hence damaging the reliability of the system. 6 In contrast to EMG, many studies have shown that the EEG of rats has distinct statistical characteristics; 11,12 moreover, EEG can be acquired with electrodes at precisely defined coordinates, and the contact between an EEG electrode and the rat's skull may hold up better over time than that between an EMG electrode and the dorsal neck muscle. A system involving only a single EEG channel may therefore be more effective for long-term sleep studies.
In 1994, Karasinski and collaborators implanted electrodes over the right parietal cortex and the cerebellum of the rat to form a bipolar cortical EEG electrode, and realized sleep staging by calculating the standard deviation, skewness, kurtosis, number of zero crossings, and number of relative maxima and minima. 7 This method is based on cluster analysis instead of thresholds and uses only the EEG recording; thus, it is suitable for the long-term study of sleep, especially in studies on circadian rhythmic mechanisms of the sleep–wake cycle. 7 The accuracy of this study was 83.0% for rapid eye movement (REM) sleep, 96.0% for Wake and 97.1% for non-rapid eye movement (NREM) sleep. Compared with other studies, 1,5 the accuracy for REM sleep was lower. Later, Robert et al. 8 developed an algorithm based on artificial neural networks and thresholds for the same data, and the results were 95.31 ± 1.06% for Wake, 97.43 ± 1.66% for NREM and 92.33 ± 2.27% for REM.
Applying the same algorithm to data acquired from different electrodes gives different sleep scoring results, and applying different algorithms to the same data from the same electrode may also give quite different results. The algorithm adopted by Karasinski et al. 7 omitted the low EEG frequencies (0–3.18 Hz), so about half of the delta (δ) band was ignored in sleep staging. 6 As the δ and theta (θ) waves are the typical waveforms during NREM and REM sleep, respectively, and both contribute to sleep staging, using other electrode locations, a wider filter band and more classification parameters may lead to a better sleep staging result.
The purpose of this study is to optimize a strategy of sleep staging in rats for the long-term study of sleep with a single EEG channel and a non-threshold-based algorithm. Based on previous studies, 7,8 two bipolar cortical EEG electrodes were tested: one was the same as that used by Karasinski et al. and the other was a frontal–parietal bipolar electrode. Two non-threshold-based algorithms were tested: one was the same as that developed by Karasinski et al. and the other was an extended version with a different band-pass filter and two more EEG parameters. In our study, the results of the four cases were compared with visual analysis results of 22 h EEG and EMG recordings of five rats by two independent raters. Based on the differences among the four cases in accuracy of sleep staging, the best one for sleep staging using a single EEG channel was found and is recommended.
Materials and methods
Animals
Five male Sprague–Dawley rats were provided by the Institute of Laboratory Animal Science of Sichuan Province, China. They were housed under controlled temperature conditions (22 ± 1°C) and maintained on a 12:12 light–dark cycle (lights on at 08:00 h). Water and food were available without restriction. Rats weighed 319.8 ± 7.7 g (mean ± SD) at the time of surgery. All procedures used in this work were approved by the Animal Care Committee of the University of Electronic Science and Technology of China.
Surgery
Sterile surgery was performed under deep anaesthesia induced by intraperitoneal pentobarbital sodium (60 mg/kg). Before surgery, the rats were given atropine (1 mg/kg) subcutaneously to reduce respiratory tract secretions.
To fix body position, the rats were placed in a stereotaxic apparatus (Stoelting 516003, Illinois, USA) with a pair of blunt ear bars. Two bipolar cortical EEG electrodes made of miniature stainless-steel screws (φ 0.8 mm) were implanted in the skull of each rat by turning each screw four full turns, to a depth of about 1.3 mm. Figure 1 shows the two bipolar electrodes and 20 s of typical EEG–EMG waves for each of the three sleep–wake stages with different bipolar electrodes. A frontal electrode was implanted 4 mm anterior to the bregma and on the midline. A right parietal electrode was implanted 3.8 mm posterior to the bregma and 2 mm lateral to the midline. Another right parietal electrode was implanted 5 mm posterior to the bregma and 1.5 mm lateral to the midline. A midline electrode above the cerebellum was implanted 11 mm posterior to the bregma (Figure 1).

Figure 1. Electrode placements and 20 s of typical EEG–EMG tracings in different sleep–wake stages for different bipolar electrodes. The horizontal dashed line in bold in the rat head denotes the position of the bregma. EEG = electroencephalogram; EMG = electromyogram; REM = rapid eye movement; NREM = non-rapid eye movement.
The two screws used as parietal electrodes had to be burnished to avoid a short circuit. Electrode pair E1 was the one that we proposed. It should be noted that the frontal electrode was on the sagittal suture; hence the surgery had to be performed carefully to avoid injury to the major blood vessels under the midline. Electrode pair E2 was the same as that used by Karasinski et al. To record EMG for visual classification, two Formvar-insulated nichrome wires (bare only where they contacted the muscle) were sutured bilaterally into the dorsal neck muscles. All electrode leads, made of Formvar-insulated nichrome wire, were welded to a connector fixed to the skull of the rat with dental acrylic. The impedances between the EEG electrodes and the corresponding connectors were measured as 7.6 ± 0.5 Ω (mean ± SD). The input impedance of the amplifiers of the recording apparatus (Chengyi, RM6280C; Sichuan, China) used in this study was larger than 100 MΩ.
At the completion of the surgery, all rats were given subcutaneous analgesic (0.05 mg/kg, buprenorphine hydrochloride; Qingyao, Qinghai, China) and intramuscular antibiotic (70,000 U per rat, Benzathine Benzylpenicillin; Huabeizhiyao, Hebei, China). Each rat was housed singly for 14 days to recover before the following experiments. Finally, all rats were euthanized by an overdose of intraperitoneal pentobarbital sodium at the end of the study.
Data acquisition
The recording procedure was as follows: the first two days were used for habituation, the third day (20:00–18:00 h) served as the control and the fourth and fifth days were used for an acoustic experiment. The data used for sleep staging in this study were those from the third day. Since husbandry procedures, such as cage cleaning and general health checks, may induce changes lasting 1.5–2 h in behavioural, immunological and other stress indicators and physiological profiles, 13 husbandry (cleaning the cage and replacing food and water) in our experiment was carried out once a day at 18:00 h; thus, the recording apparatus recorded continuously from 20:00 to 18:00 h (next day). The amplifier gains were set to 10,000 and 5000 for EEG and EMG, respectively. Band-pass filters were set to 0.16–100 Hz for EEG and 8.3–500 Hz for EMG. The notch filter of the amplifiers was kept on to eliminate possible 50 Hz interference. The sampling frequency was set to 1000 Hz. The experiments were performed in a noise-attenuated room in which the environmental background noise was 32.2 ± 3.0 dB (mean ± SD) and the other environmental variables (lights, food, water and temperature) were maintained as in the home cage.
Data processing
Before automated sleep staging, for each rat and each bipolar electrode, an expert observer selected representative waves (400 s) for each vigilance state from the recording data. To ensure the stability and reliability of sleep staging accuracy over the whole EEG data, the following three steps were adopted. First, about 200 s of representative waves were selected during the light phase and another 200 s or so during the dark phase. Second, the half-hour before and after the light was turned on and the half-hour after the light was turned off were excluded from the selection of representative waves. Finally, representative waves during each phase were selected randomly from the remaining 20.5 h (22 h minus the 1.5 h excluded). In addition, the same time windows were adopted in selecting the data representing a vigilance state from both E1 and E2 so that the sleep staging accuracies of the two electrode pairs could be compared equitably.
These waves were first band-pass filtered, downsampled to 512 Hz, divided into epochs of 8 s (N = 4096 samples) and detrended for each epoch. Five or seven features of each vigilance state were calculated from the N samples $x_i$ of an epoch (with a band-pass filter from 3.18 to 25 Hz for the first five features below, and from 0.5 to 30 Hz for all seven features):
Standard deviation ($Y_{sd}$).
Skewness ($Y_{sk}$).
Fisher's kurtosis ($Y_{ku}$).
Number of zero crossings ($Y_{zc}$): the number of changes of sign of $x_i$.
Number of relative maxima and minima ($Y_{mm}$): the number of maxima and minima of $x_i$.
Power spectral densities of the δ band ($Y_{\delta}$, 0.5–6 Hz) and the θ band ($Y_{\theta}$, 6–10 Hz).
The first five features were the same as those in the study by Karasinski et al. 7 The newly included $Y_{\delta}$ and $Y_{\theta}$ were calculated by Welch's method and they were the means of the values of the corresponding frequencies. The resolution of the power spectral density was 0.5 Hz and the boundaries of the EEG frequency bands were based on the result of principal component analysis of EEG in the rat. 14
For each case of calculation of the five and seven features, respectively, the final result for each rat and each bipolar electrode was represented as $\{Y_{pi}^{s}\}$, where p ∈ {five or seven different variables}, s ∈ {Wake, REM, NREM} and i = 1 … $M_s$, with $M_s$ the number of representative epochs for a vigilance state. For each rat and each bipolar electrode, the means (μ) and standard deviations (SD, σ) of the five or the seven parameters for the selected representative waves of each vigilance state were calculated as the mean $\mu_p^s$ and standard deviation $\sigma_p^s$ of $Y_{pi}^{s}$ over the $M_s$ representative epochs.
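The feature definitions above can be sketched in code. This is a minimal illustration, not the authors' implementation: the moment formulas use the standard population definitions, the Welch estimate assumes 2 s Hann segments with 50% overlap (which gives the stated 0.5 Hz resolution), and each band is taken to include its lower edge but not its upper edge. The function names `time_domain_features` and `band_power` are hypothetical.

```python
import numpy as np

FS = 512       # sampling rate after downsampling (Hz), from the text
EPOCH_S = 8    # epoch length (s), from the text; 8 * 512 = 4096 samples


def time_domain_features(x):
    """Five time-domain features of one detrended epoch (assumed standard definitions)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    sd = np.sqrt(np.mean(d ** 2))              # population SD (assumed normalisation)
    sk = np.mean(d ** 3) / sd ** 3             # skewness
    ku = np.mean(d ** 4) / sd ** 4 - 3.0       # Fisher's (excess) kurtosis
    signs = np.sign(x)
    signs = signs[signs != 0]                  # ignore samples that are exactly zero
    zc = int(np.sum(signs[1:] != signs[:-1]))  # number of sign changes
    slope = np.sign(np.diff(x))
    slope = slope[slope != 0]                  # ignore flat segments
    mm = int(np.sum(slope[1:] != slope[:-1]))  # number of relative maxima and minima
    return {"sd": sd, "sk": sk, "ku": ku, "zc": zc, "mm": mm}


def band_power(x, fs, band, seg_s=2.0):
    """Mean of a Welch-style PSD over band = (low, high) in Hz, low inclusive."""
    x = np.asarray(x, dtype=float)
    nseg = int(seg_s * fs)                     # 2 s segments -> 0.5 Hz resolution
    step = nseg // 2                           # 50% overlap (assumption)
    win = np.hanning(nseg)
    scale = fs * np.sum(win ** 2)
    psds = []
    for start in range(0, len(x) - nseg + 1, step):
        seg = x[start:start + nseg]
        seg = seg - seg.mean()
        spec = np.abs(np.fft.rfft(seg * win)) ** 2 / scale
        spec[1:-1] *= 2.0                      # one-sided PSD; assumes nseg is even
        psds.append(spec)
    psd = np.mean(psds, axis=0)
    freqs = np.fft.rfftfreq(nseg, d=1.0 / fs)
    low, high = band
    mask = (freqs >= low) & (freqs < high)
    return float(psd[mask].mean())
```

Under these assumptions, the seven features of one epoch `x` would be the five values returned by `time_domain_features(x)` plus `band_power(x, FS, (0.5, 6.0))` and `band_power(x, FS, (6.0, 10.0))`.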
For the raw EEG signals acquired from the two bipolar electrodes, the data preprocessing was the same as that for the representative waves. The data were then divided into epochs of 8 s. Epochs were discarded as artefacts when they were classified as Wake or REM by the two raters and their standard deviations were at least 1.4 times as large as the maximum standard deviation of the Wake or REM representative waves, respectively. Then the five or the seven parameters ($Y_p$) were calculated and the normalized distance between $Y_p$ and each of the typical vigilance states was calculated by equation (6).
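Equation (6) is not reproduced in this text, so the sketch below assumes one plausible form of a normalized distance: each feature is scaled by the mean and standard deviation of the representative epochs of a state, and the epoch is assigned to the state with the smallest Euclidean distance in that scaled space. The exact formula used in the study may differ, and the names `state_templates`, `normalized_distance` and `classify` are hypothetical.

```python
import math


def state_templates(rep):
    """rep maps a state name to a list of feature vectors from representative epochs.

    Returns {state: (mu, sigma)} with the per-feature mean and sample SD.
    """
    templates = {}
    for state, rows in rep.items():
        m = len(rows)
        p = len(rows[0])
        mu = [sum(r[j] for r in rows) / m for j in range(p)]
        sigma = [
            math.sqrt(sum((r[j] - mu[j]) ** 2 for r in rows) / (m - 1))
            for j in range(p)
        ]
        templates[state] = (mu, sigma)
    return templates


def normalized_distance(y, mu, sigma):
    """Assumed form: Euclidean distance between y and mu after scaling by sigma."""
    return math.sqrt(sum(((yj - mj) / sj) ** 2 for yj, mj, sj in zip(y, mu, sigma)))


def classify(y, templates):
    """Assign an epoch to the state whose template is nearest."""
    return min(templates, key=lambda s: normalized_distance(y, *templates[s]))
```

Because the templates are recomputed from each rat's own representative waves, a scheme of this kind needs no fixed thresholds, which is the property the text emphasizes.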
Visual scoring by raters (EEG and EMG)
According to the generally recognized criteria, 2 i.e. low-amplitude, high-frequency EEG accompanied by high-level EMG during the Wake stage, high-amplitude EEG associated with low EMG during NREM sleep, and EEG comprised mainly of θ band and accompanied by flat EMG during REM sleep, two independent raters analysed the 22 h EEG and EMG recordings of five rats for each bipolar electrode, respectively. Before visual scoring, raters reviewed the representative waves used in the above algorithms. They were confident about distinguishing subtle classification criteria for different animals and different channels.
To reduce the influence of human subjectivity, the following three steps were taken. First, for each rat, the channels of the signal acquisition system were assigned randomly to the two bipolar electrodes, so that the raters did not know the anatomical coordinates corresponding to a recording when they were engaged in visual scoring or selecting the representative waves. Second, the two raters reviewed the data independently. Finally, each rater classified the data of a given rat from the two channels on two consecutive days, which improved reliability.
Computer method 1 (M1, EEG only)
The data preprocessing of the raw EEG signals acquired from E1 and E2 was the same as that for the representative waves with a band-pass filter from 0.5 to 30 Hz. The seven features ($Y_{sd}$, $Y_{sk}$, $Y_{ku}$, $Y_{zc}$, $Y_{mm}$, $Y_{\delta}$ and $Y_{\theta}$) were calculated in order to classify the artefact-free epochs to corresponding states using equation (6).
Computer method 2 (M2, EEG only)
The raw EEG signals acquired from E1 and E2 were analysed by the algorithm (M2) developed by Karasinski et al. The differences between M1 and M2 were the filter bands, 0.5–30 Hz for M1 but 3.18–25 Hz for M2; and the number of parameters, seven for M1 but only five for M2.
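The two methods and two electrode pairs define the four combinations compared in this study. As a small sketch, they can be written down as a configuration; the dictionary layout and the short feature keys are illustrative choices, while the filter bands, feature counts and electrode descriptions come from the text.

```python
from itertools import product

# Filter band (Hz) and feature set of each method, as described in the text.
METHODS = {
    "M1": {"band_hz": (0.5, 30.0),
           "features": ("sd", "sk", "ku", "zc", "mm", "delta", "theta")},
    "M2": {"band_hz": (3.18, 25.0),
           "features": ("sd", "sk", "ku", "zc", "mm")},
}

# Bipolar electrode pairs: E1 is the proposed pair, E2 the pair of Karasinski et al.
ELECTRODES = {
    "E1": "frontal-parietal",
    "E2": "parietal-cerebellar",
}

# Every method paired with every electrode: M1E1, M1E2, M2E1, M2E2.
COMBINATIONS = [m + e for m, e in product(METHODS, ELECTRODES)]
```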
Difference in the EEG features between E1 and E2
To check the difference in the EEG features between E1 and E2, we calculated the means (ν) and standard deviations (τ) of each parameter in different vigilance states of the five rats for each bipolar electrode. Each parameter from each bipolar electrode of a rat in each vigilance state was obtained from its 400 s representative waves. The difference in the EEG features between E1 and E2 was determined using the statistical test with ‘combination between algorithm and bipolar electrode’ as the variable.
Statistical analyses
Results from the four combinations (M1E1, M1E2, M2E1 and M2E2) were analysed using two-way within-subject analysis of variance (ANOVA) (i.e. two-way repeated-measures ANOVA) with the factors ‘rater’ and ‘combination’, and both main effects and interactions were examined. To determine the difference in the EEG features between E1 and E2, results of EEG features in each vigilance state were analysed using the paired-samples t-test or the one-way repeated-measures ANOVA with the factor ‘combination’. For significant ANOVAs, data were further analysed for multiple comparisons using Tukey's post hoc test. In both one-way and two-way ANOVAs, the Greenhouse–Geisser epsilon (ϵ) is reported where the Greenhouse–Geisser correction was necessary. Effect size estimates for t-tests and ANOVAs were determined with Cohen's d and partial η², respectively (Cohen's d or partial η² = 0.20 is a small effect size, 0.50 is a medium effect size and 0.80 is a large effect size). 15 For each case, the kappa statistic, κ, which estimates the overall agreement beyond chance between the computer and each rater or between the two raters, was computed. A significance level of P < 0.05 was used in all comparisons.
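As an illustration of the agreement statistics, the sketch below computes the overall agreement and κ from a concordance matrix, assuming that κ here is Cohen's kappa (the text does not name the variant). The function name and the example counts are hypothetical and are not taken from the tables of this study.

```python
def agreement_and_kappa(matrix):
    """matrix[i][j] = number of epochs scored as state i by one scorer and j by the other.

    Returns (overall agreement, Cohen's kappa).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    p_obs = sum(matrix[i][i] for i in range(k)) / n          # observed agreement
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    kappa = (p_obs - p_exp) / (1.0 - p_exp)
    return p_obs, kappa


# Hypothetical 3 x 3 matrix (Wake, NREM, REM); not data from this study.
example = [[40, 5, 5],
           [5, 40, 5],
           [5, 5, 40]]
p_obs, kappa = agreement_and_kappa(example)
```

For this made-up matrix the observed agreement is 0.8 and the chance agreement is 1/3, so κ is 0.7.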
Results
In this study, we tested two algorithms (M1 and M2) and two bipolar cortical electrodes (E1 and E2). A further processing step was implemented to eliminate REM sleep epochs appearing in Wake, i.e. an epoch was only identified as REM after three NREM epochs, but not after five Wake epochs. 7 A comparative study was then implemented between the results of the computer and visual scoring (rater). Finally, all the results of five rats (22 h data for each rat) were categorized according to different electrodes and different algorithms. The agreements shown in Tables 1–4 were calculated over the pooled classification results of the five rats. In Table 5, the agreements between a rater and the computer were first calculated for each rat, and then the means and standard deviations were calculated from the agreements of the five rats. All the data shown in Tables 1–5 were based on artefact-free epochs, except for the data in Table 4 which were additionally based on consensus epochs for which the two raters gave the same classification. Table 6 shows the differences in EEG properties among the four combinations.
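The two ways of summarizing agreement described above (pooled over all rats for Tables 1–4, per-rat mean ± SD for Table 5) can differ when rats contribute different numbers of epochs. A minimal sketch with made-up counts follows; the function names are hypothetical and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def pooled_agreement(per_rat):
    """per_rat: list of (agreeing_epochs, total_epochs), one tuple per rat."""
    return sum(a for a, _ in per_rat) / sum(t for _, t in per_rat)


def per_rat_agreement(per_rat):
    """Mean and sample SD of the agreement computed separately for each rat."""
    rates = [a / t for a, t in per_rat]
    return mean(rates), stdev(rates)


# Hypothetical counts for two rats; not data from this study.
counts = [(80, 100), (190, 200)]
pooled = pooled_agreement(counts)       # weights every epoch equally
avg, sd = per_rat_agreement(counts)     # weights every rat equally
```

Here the pooled agreement is 270/300 = 0.9, whereas the per-rat mean is (0.80 + 0.95)/2 = 0.875, which is why the pooled tables and Table 5 need not report identical percentages.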
Table 1. Matrix of concordance between the two raters when the pooled classification results of five rats (22 h data for each rat) were used for each bipolar electrode
Only artefact-free epochs were used in the analysis. NREM = non-rapid eye movement; REM = rapid eye movement
Table 2. Matrix of concordance between rater 1 and the computer when the pooled classification results of five rats (22 h data for each rat) were used for each combination
Only artefact-free epochs were used in the analysis. NREM = non-rapid eye movement; REM = rapid eye movement
Table 3. Matrix of concordance between rater 2 and the computer when the pooled classification results of five rats (22 h data for each rat) were used for each combination
Only artefact-free epochs were used in the analysis. NREM = non-rapid eye movement; REM = rapid eye movement
Table 4. Matrix of concordance between the computer and the raters when the pooled classification results on consensus epochs of five rats (22 h data for each rat) were used for each combination
Only artefact-free and consensus epochs were used in the analysis. NREM = non-rapid eye movement; REM = rapid eye movement
Table 5. Comparative results of the consistency of performance for different combinations
Only artefact-free epochs were used in the analysis. For each combination and each rater, the means and standard deviations of agreements between the rater and the computer of five rats were calculated for the three vigilance states respectively and globally. Values in the Average column were the means of the corresponding values between the two raters. All values were denoted as means ± SD. NREM = non-rapid eye movement; REM = rapid eye movement
Table 6. Statistical results of seven parameters in different vigilance states
This table was derived from the data-sets of 400 s representative waves from each bipolar electrode in each vigilance state of each rat. For the first two parameters, $Y_{\delta}$ and $Y_{\theta}$, their results in each vigilance state were analysed using the paired-samples t-test. For the other five parameters, their results in each vigilance state were analysed using one-way repeated-measures ANOVA with the factor ‘combination’. The values of epsilon (ϵ) of Greenhouse–Geisser are also denoted in this table when Greenhouse–Geisser correction is necessary. Effect size estimates for t-tests and ANOVAs were determined with Cohen's d and partial η², respectively (Cohen's d or partial η² = 0.20 is a small effect size, 0.50 is a medium effect size and 0.80 is a large effect size). The symbols ‘>’ denote that the means (ν) of the seven parameters given by the combinations at the left side of ‘>’ are significantly larger than those at the right side, and no significant difference exists among the combinations at the same side of ‘>’ for each case. NREM = non-rapid eye movement; REM = rapid eye movement
*P < 0.05
**P < 0.001
†Large effect size
‡Small effect size
¶Medium effect size
Agreement between rater 1 and rater 2
The overall agreements between rater 1 and rater 2 were 97.2% (κ = 0.949, P < 0.001) and 96.2% (κ = 0.932, P < 0.001) for E1 and E2, respectively (Table 1, 22 h data for each rat).
Agreement between rater 1 and the computer
The overall agreements between rater 1 and the computer were 96.9% (κ = 0.945, P < 0.001), 93.0% (κ = 0.875, P < 0.001), 93.8% (κ = 0.888, P < 0.001) and 94.0% (κ = 0.892, P < 0.001) for M1E1, M1E2, M2E1 and M2E2, respectively. For each combination, the agreements for Wake and NREM were much better than those for REM, and REM sleep was generally overestimated by the computer. Among the four combinations, M1E1 showed the best agreement (Table 2, 22 h data for each rat).
Agreement between rater 2 and the computer
The overall agreements between rater 2 and the computer were 96.4% (κ = 0.935, P < 0.001), 92.8% (κ = 0.873, P < 0.001), 93.7% (κ = 0.886, P < 0.001) and 92.9% (κ = 0.873, P < 0.001) for M1E1, M1E2, M2E1 and M2E2, respectively. The results for Wake and NREM were also much better than those for REM. Again REM sleep was overestimated by the computer, and M1E1 was the best of the four combinations (Table 3, 22 h data for each rat).
Performance of different combinations
Table 4, derived from 22 h data for each rat, shows concordance between the computer and the raters when the pooled classification results of consensus epochs, for which the two raters gave the same classification, were used for each bipolar electrode. The overall agreements between the computer and the raters were 98.1% (κ = 0.966, P < 0.001), 94.7% (κ = 0.905, P < 0.001), 95.1% (κ = 0.911, P < 0.001) and 95.3% (κ = 0.915, P < 0.001) for M1E1, M1E2, M2E1 and M2E2, respectively. The accuracies of Wake and NREM for M1E2, M2E1 and M2E2 were much higher than those of REM, and REM sleep was overestimated by the computer. For M1E1, though the accuracies of Wake and NREM were almost as high as the other combinations, the accuracy of REM was much higher than that of the others. Thus, it is reasonable to conclude that M1E1 is the best of the four.
Table 5 shows the means and the standard deviations of the accuracies of the data from the five rats for each combination, vigilance state and rater. For Wake and REM stages, the means of accuracies from M1E1 were the largest, and the standard deviations of accuracies of M1E1 were the smallest. M2E1 performed a little better than M1E1 did for NREM. However, if applied to all the three vigilance stages, M1E1 is still considered to be the most accurate combination (Table 5).
Statistical results of agreements between raters and the computer for the four combinations
The results of agreements between raters and the computer were analysed by ANOVA: (1) no significant difference was found in accuracies between the two raters (P > 0.05); (2) for Wake (F(3,12) = 6.018, Greenhouse–Geisser ϵ = 0.490; P < 0.05, partial η² = 0.601), REM (F(3,12) = 37.015, P < 0.001, partial η² = 0.902) and Overall (F(3,12) = 22.398, P < 0.001, partial η² = 0.848), the main effects of the factor ‘combination’ were all significant and effect sizes for both REM and Overall were large in contrast to the medium effect size for Wake; (3) for Wake, the accuracy of M1E1 was significantly higher than that of M2E1 and M2E2 (Tukey's test, P < 0.05); (4) for REM, the accuracies of both M1E1 and M1E2 were significantly higher than those of M2E1 and M2E2 (Tukey's test, P < 0.05); and (5) for Overall, the accuracy of M1E1 was significantly higher than that of M1E2, M2E1 and M2E2 (Tukey's test, P < 0.05).
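The reported effect sizes can be cross-checked from the F statistics and their degrees of freedom using the standard identity partial η² = F·df_effect / (F·df_effect + df_error). The short sketch below applies it to the three F(3,12) values quoted above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


wake = partial_eta_squared(6.018, 3, 12)      # reported as 0.601
rem = partial_eta_squared(37.015, 3, 12)      # reported as 0.902
overall = partial_eta_squared(22.398, 3, 12)  # reported as 0.848
```

Rounded to three decimals, the three values agree with those stated in the text.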
Difference in the EEG features among the four combinations
For each combination, the means (ν) and standard deviations (τ) of each parameter (7 and 5 parameters) in each vigilance state obtained from its 400 s representative waves over the five rats are shown in Figure 2.

Figure 2. Means (ν) and standard deviations (τ) of seven and five parameters of sleep EEG for the three different vigilance states and for the four combinations. These parameters for each combination of a rat in each vigilance state were obtained from its 400 s representative waves. The means (ν) and standard deviations (τ) of each parameter for the five rats were calculated and plotted. The means of $Y_{sk}$ and $Y_{ku}$ were quite low, so they were multiplied by 200 for better display. The top row denotes the means (ν) of the seven and the five parameters in each vigilance state for M1E1, M1E2, M2E1 and M2E2. The bottom row denotes the standard deviations (τ) of the seven and the five parameters in each vigilance state for M1E1, M1E2, M2E1 and M2E2. $Y_{\delta}$ = power spectral density of the δ band; $Y_{\theta}$ = power spectral density of the θ band; $Y_{sd}$ = standard deviation; $Y_{sk}$ = skewness; $Y_{ku}$ = kurtosis; $Y_{zc}$ = number of zero crossings; $Y_{mm}$ = number of relative maxima and minima; EEG = electroencephalogram; REM = rapid eye movement; NREM = non-rapid eye movement
For each vigilance state and each parameter, different combinations gave significantly different results, with the exception of $Y_{\delta}$ in NREM (P > 0.05), $Y_{\theta}$ in each vigilance state (P > 0.05) and $Y_{sd}$ in REM (F(3,12) = 4.866, Greenhouse–Geisser ϵ = 0.336; P > 0.05) (Table 6). Each case with a significant difference had a large effect size, except $Y_{ku}$ in NREM, which had a medium effect size.
Discussion
This work aimed to improve a sleep staging strategy using a single EEG channel and a non-threshold-based algorithm for the long-term study of sleep in rats. We tested two different bipolar electrodes (E1, E2) and two different algorithms (M1, M2), with one combination (M2E2) being the same as that in the study by Karasinski et al. The results show that, although REM epochs were overestimated to some degree in all four combinations, M1E1 is the best solution across the three vigilance stages. It should be noted, however, that Robert et al. applied an algorithm based on artificial neural networks and thresholds to the same data as the study by Karasinski et al., 7 and obtained 95.31 ± 1.06% for Wake, 97.43 ± 1.66% for NREM and 92.33 ± 2.27% for REM. 8 This result indicates that the REM accuracy of the study by Karasinski et al. can be improved considerably by refining the sleep-staging algorithm. In our study, the agreements between the computer and two independent raters for M1E1 were 96.9 ± 1.1% for Wake, 97.1 ± 1.4% for NREM and 91.4 ± 2.5% for REM (Table 5). The accuracies for both NREM and REM in our study were a little lower than those of the study by Robert et al. 8 These works confirm that, on the one hand, even with a single EEG channel, sleep staging in the rat can be realized at an accuracy comparable to that achieved with both EEG and EMG 1,5 and, on the other, that accuracy can be improved through various strategies. Because the present study focused on a sleep staging strategy using a single EEG channel and a non-threshold-based algorithm, the comparisons reported below are mainly restricted to the differences between the scheme by Karasinski et al. and ours, both of which are non-threshold-based methods.
Most agreements between rater 1 and rater 2, especially those for REM, were slightly lower for E2 than for E1 (Table 1), which shows that the raw signals from E2 were a little more difficult to score visually than those from E1. The discrepancies between the two raters were seen mainly in the classification of transitions, probably owing to differences in individual experience and in the interpretation of the classification criteria. Transitions are inherently difficult for raters to classify, so these minor discrepancies are understandable.
In this study, we compared a fronto-parietal location (E1) with the parieto-cerebellar location (E2) that Karasinski et al. 7 selected for EEG acquisition. Our idea was based on the following fact: sleep spindles and slow-wave activity (SWA; EEG power between 0.5 and 4.0 Hz, mainly reflecting the δ waves) are the typical waves during NREM sleep, and the optimal electrode placements for these waves are over the frontal and parietal cortex. 16,17 Furthermore, although interhemispheric sleep EEG asymmetry had not been found in the frontal cortex, 12 the waking EEG in complex behavioural tasks and the NREM sleep EEG after complex behavioural tasks showed a significant, substantial power increase in the frontal hemisphere contralateral to the dominant paw. 18 As rats may differ in handedness, an electrode on the midline could generally balance the interhemispheric EEG asymmetry caused by handedness and reduce its effect on sleep staging. A frontal midline point was therefore selected as one site of our electrode pair. To choose the best frontal electrode, we previously compared four different frontal sites in 10 rats using M1, and the results showed that the frontal midline point (+4, 0L) was the best. 19 During REM sleep, the EEG is comprised of very regular waves with a dominant frequency in the θ band. 2,6,11 Since θ oscillations originate in the hippocampus and several extrahippocampal regions, 20 and power in the θ band exhibits a right-hemispheric predominance, 12 we selected the other site of our electrode pair above the right hippocampus (−3.8P, −2L).
The original method (M2) set the band-pass filter from 3.18 to 25 Hz; such a choice ignores the contribution of the low frequencies (0–3.18 Hz) and, in other words, omits about half of the δ band (0.5–6 Hz). 6 As the δ band is one of the typical waveforms during NREM sleep, 2,6,11 the new algorithm (M1), with a band-pass filter from 0.5 to 30 Hz, would likely improve the accuracy of sleep staging by covering a wider range of EEG frequencies. M2 uses only five parameters, whereas M1 considers two more (the power spectral densities of the δ band and the θ band). For both the δ band and the θ band, the amplitude in one of the vigilance states differs significantly from that in the other two states. 2 These two parameters may therefore help to discriminate different states of the brain. Indeed, it is M1E1, with seven parameters, that gives the best result.
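The statement that a 3.18 Hz lower cut-off omits "about half" of the δ band can be checked with simple arithmetic on the band edges given in the text. The sketch below measures the omitted share of the bandwidth only (not of the signal power, which depends on the spectrum); the function name is illustrative.

```python
def omitted_fraction(band, highpass_hz):
    """Fraction of a band's width lying below a high-pass cut-off frequency."""
    low, high = band
    cut = min(max(highpass_hz, low), high)   # clamp the cut-off into the band
    return (cut - low) / (high - low)


delta_omitted = omitted_fraction((0.5, 6.0), 3.18)   # (3.18 - 0.5) / (6 - 0.5)
```

The result is 2.68/5.5, roughly 0.49, consistent with "about half" of the 0.5–6 Hz band.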
The concordance matrices between the computer and the raters under different conditions, and the differences in the EEG features over the four combinations, show that the accuracy of sleep staging was improved by optimizing the algorithm and the coordinates of the electrode pair (Tables 2–6 and Figure 2). The most distinct improvement was in the accuracy of REM sleep staging. The agreement between raters and the computer (M2E2) on consensus epochs was 96.0% for Wake, 97.1% for NREM and 83% for REM in the study by Karasinski et al., whereas in our study the agreement was 97.9% for Wake, 98.8% for NREM, 95.1% for REM and 98.1% overall (Table 4, M1E1). When we adopted the same electrode and algorithm (M2E2) as Karasinski et al., the agreement between raters and the computer on consensus epochs was 94.4% for Wake, 98.6% for NREM and 76.9% for REM (Table 4, M2E2), which is close to the result reported by Karasinski et al. This indicates that the comparative studies are objective and credible.
The results suggest that (1) different algorithms result in different accuracies of sleep staging; and (2) different electrode placements induce differences in the raw data and hence affect the accuracy of sleep staging, although this effect is milder than that caused by different algorithms. To achieve good accuracy of sleep staging, a suitable algorithm with correspondingly optimal electrode locations is therefore necessary.
The EEG has been found to differ across regions and sleep stages in rats, and the changes in low-frequency EEG differ along the anteroposterior and left–right axes. 12 For example, the δ power spectral densities in both Wake and REM were significantly different between E1 and E2 (Figure 2 and Table 6). In fact, it is now widely accepted that sleep is not only a global process but also has a local use-dependent component that is manifested as regional differences in SWA. 21,22 Oscillations at about 3 Hz (SWA) were prominent in the most rostral regions of the anterior midline cortex, including the medial prefrontal region (mPFC, under the frontal electrode of E1), and the δ band power was significantly different between the anterior and posterior halves of the cortex (including the hippocampus). 23 Furthermore, the absolute mean power of cerebellar activity is several-fold lower than that at the cerebral level; 24 as a reference, the cerebellum (under the cerebellar electrode of E2) therefore behaves more like a resting reference at infinity than other cortical regions do. As the EEG acquired from a bipolar electrode is the difference between the signals at its two sites, and the two parietal electrodes in our study were very close to each other, E1 may have collected more δ information than E2 (Table 6).
As shown in Figure 2, the changes in the EEG parameters were quite similar between the data acquired from the two bipolar electrodes, which illustrates that the EEG signals are highly correlated between channels. However, statistical tests revealed many significant differences among the four combinations (Table 6), and these differences resulted from different bipolar electrodes (e.g. Y δ in Wake and REM), different algorithms (e.g. Y sd in Wake) and different combinations of bipolar electrodes and algorithms (e.g. Y mm in Wake, NREM and REM). It is these differences that resulted in the different accuracies of sleep staging among the four combinations. In summary, E1 provided more EEG information than E2, and M1 made the most of the data, so M1E1 was the optimal combination for sleep scoring.
Although θ activities have been recorded locally from several extrahippocampal regions, 20 they are generally believed to originate mainly from the hippocampus in rodents. 25 On the one hand, the hippocampal θ rhythm in rats appears with striking regularity when the animals engage in exploratory behaviour, which includes movement, sniffing and orienting, and in REM sleep. 25 Hippocampal θ activities fall into two categories: movement-related (Type I, with a frequency range of 7–12 Hz), which is observed with walking and rearing; and immobility-related (Type II, with a frequency range of 4–9 Hz), which is associated with grooming, alert immobility and REM sleep. 23,25–27 Cortical θ oscillations in rats are observed during wakefulness and REM, 23,28,29 and during the awake state these oscillations are behaviour-dependent, like the hippocampal θ rhythm (Type I and Type II). 23 It can therefore be deduced that the hippocampal θ rhythm and the cortical θ rhythm are similar during wakefulness and REM. On the other hand, simultaneous cellular recordings in the hippocampus and recordings of cortical and hippocampal field potentials have elucidated the cellular mechanisms of the θ rhythm and established the relationship between the frequency of θ oscillations and the firing frequency of the θ cells. 25,30 Furthermore, it has been demonstrated that the cortical θ is closely coupled with the hippocampal θ rhythm. 23,31 In other words, even though there may be other θ origins besides the hippocampus, the cortical θ and the hippocampal θ are tightly correlated, and thus the hippocampal θ rhythm can be represented by the cortical θ rhythm. In the present study, the hippocampal θ rhythm itself could not be recorded directly with our electrodes, but the cortical θ activity, which is closely related to the hippocampal θ rhythm, 23,31 could be recorded by our fronto-parietal electrode.
Because similar cortical and hippocampal Type II θ activities occur in both Wake and REM sleep, 23,25–27 the representative waves of REM sleep resemble the θ signals in Wake with grooming and alert immobility. Consequently, epochs in Wake with grooming and alert immobility could easily be mistaken for REM, which explains the overestimation of REM sleep. However, because the amplitude of the θ band in REM sleep is significantly different from those in the other two states 2 and because the electrophysiological features of the EEG in other frequency bands also differ significantly between vigilance states, the overestimation of REM sleep may be reduced by a more accurate selection of the representative wave. In fact, overestimation of REM sleep did not occur in the study by Karasinski et al. 7 We speculate that this is because they used a different method to select the representative wave: they selected the most typical representative waves from the results calculated by the five parameters, whereas we selected them from raw EEG and EMG recordings. Although overestimation exists in all four combinations, it is very low in M1E1 (Table 4). M1E1 is therefore suitable for the long-term study of sleep because it is non-threshold-based and uses EEG recording alone. 7
It should be noted, however, that the present study was based only on healthy normal animals under baseline conditions without any manipulations or pharmacological treatments. As pharmacological treatments may induce dissociation between the EEG and the behaviour of the animal, 6 and the EEG properties of an unhealthy rat may differ from those of a healthy rat, the classification accuracy of sleep staging is expected to decline when M1E1 is used under conditions different from those in this study, such as in unhealthy rats or under manipulations and pharmacological treatments. In general, the current clustering method with data from E1 (M1E1) is suitable for the long-term study of sleep, especially in studies on circadian rhythmic mechanisms of the sleep–wake cycle under baseline conditions without manipulations or pharmacological treatments. In addition, the results reported above were obtained from 22 h recordings instead of whole 24 h recordings, i.e. the 2 h of data around the light–dark change were omitted in this study. Since there are many transitions during the light–dark change, and transition epochs are difficult to classify correctly, the exclusion of these 2 h of data would also affect the evaluation of the methods (M1E1 and M2E2), and the classification accuracy of sleep staging for M1E1 is expected to decline when this method is used for 24 h recordings, especially under drug or behavioural manipulations in which the number of transitions increases dramatically.
In summary, comparative studies were carried out across two different non-threshold-based algorithms using EEG only and two different sets of electrode coordinates. Our results indicate that the accuracy of sleep staging, especially for REM sleep, can be improved by increasing the number of classification parameters and by optimizing the band-pass filter and the coordinates of the electrode pair for a single EEG channel. These results also show that different algorithms and different electrode placements lead to significantly different accuracies of sleep scoring. To obtain good accuracy of sleep staging, we need not only a robust algorithm based on typical features, but also an optimal electrode placement that captures the crucial physiological information related to sleep. Our results show that the new method (M1E1) can be implemented easily and is suitable for the long-term study of sleep.
Footnotes
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Nos. 60736029, 30870655, 30525030) and the 863 project 2009AA02Z301.
