Abstract
Background and Purpose
Electroencephalography (EEG) is a widely used non-invasive method for studying brain dynamics because of its excellent temporal resolution. However, the non-stationarity, inter-subject variability and class imbalance of EEG data make it difficult to automatically discriminate between brain states corresponding to different cognitive or sensory conditions. This study classifies EEG recordings into discrete brain states recorded before and during auditory (mantra) stimulation. It uses a deep hybrid architecture combining convolutional neural networks (CNNs), bidirectional long short-term memory (BiLSTM) networks and an attention mechanism, designed to improve discriminative learning from wavelet-based time–frequency features.
Methods
EEG data were recorded from experienced practitioners (mean age: 37 ± 6 years; mean practice: 5 years) under two experimental conditions: (a) at rest before the auditory stimulus and (b) while listening to mantras. Time–frequency representations of each segment were obtained using wavelet transforms and fed into a hybrid model combining convolutional, recurrent and attention layers. Adaptive learning rate scheduling and early stopping were used during model optimisation to promote stable convergence.
Results
Baseline models performed moderately: CNN (76.92%), long short-term memory (LSTM) (75.30%), CNN+LSTM (84.62%) and CNN+BiLSTM (88.65%). The proposed CNN–BiLSTM–Attention model achieved an independent test accuracy of 99.46%, substantially outperforming all baselines. Receiver operating characteristic (ROC) analysis yielded an area under the curve (AUC) close to 1.0, confirming high discriminative capability.
Conclusion
The proposed framework's ability to distinguish between resting and mantra-listening EEG states shows that combining convolutional, recurrent and attention mechanisms substantially improves spatial–temporal feature learning. These results indicate the model's robustness and its potential for neurophysiological monitoring and real-time cognitive state detection.
Introduction
Electroencephalography (EEG) is a popular method for tracking brain activity because it is non-invasive, portable and has high temporal resolution. 1 It has found use in clinical diagnosis, affective computing, brain–computer interfaces (BCIs) and cognitive neuroscience. 2, 3 Nevertheless, accurate classification is difficult, particularly when data are scarce, because EEG signals are intrinsically non-stationary, highly sensitive to noise and subject to substantial inter-subject variability. 4 Conventional EEG classification methods extract handcrafted features, such as power spectral density, entropy and wavelet coefficients, and pass them to classifiers such as support vector machines (SVMs) and k-nearest neighbours (k-NN). 5
Although these techniques work well under controlled conditions, their limited capacity to capture intricate spatial–temporal patterns often results in poor generalisation, and they perform poorly on imbalanced datasets. 6 Deep learning techniques have shown great promise in EEG analysis, 7 but they require large amounts of data. Convolutional neural networks (CNNs) are well suited to modelling spatial correlations across electrode channels, while recurrent neural networks (RNNs), especially long short-term memory (LSTM) and its bidirectional variant (BiLSTM), capture the temporal dynamics of brain activity. 8, 9
Recent research 10, 11 shows that hybrid CNN–RNN architectures can improve classification by exploiting both temporal and spatial representations. Attention mechanisms, which adaptively highlight important temporal or spectral features, have further improved EEG-based models. 12 The wavelet transform has also proven effective at decomposing non-stationary EEG into informative time–frequency representations, frequently surpassing Fourier-based methods in clinical and cognitive EEG tasks. 13 Beyond conventional EEG classification, a growing body of research has examined how auditory stimuli, especially mantras and meditation sounds, affect brain activity.
Studies of the sacred syllable 'OM' have reported deep relaxation states, improved brain synchronisation and increased alpha and theta oscillations. 14–16 These findings suggest that EEG recorded during mantra-based auditory stimulation contains distinctive spatial–temporal patterns that can be used for classification. This article presents a hybrid CNN–BiLSTM–Attention model for EEG classification from wavelet-based features. In contrast to previous efforts, the method integrates convolutional, recurrent and attention mechanisms to enable robust feature learning across spatial, temporal and discriminative dimensions. Evaluated on EEG signals recorded before and during mantra stimulation, the model achieved a test accuracy of 99.46%, significantly outperforming baseline models and highlighting the potential of deep hybrid frameworks for meditation-related EEG research.
Despite significant advances, several modelling gaps remain in EEG-based deep learning. Many current studies use only CNN or only RNN architectures, which limits the joint modelling of spatial and temporal relationships. Moreover, EEG classification frameworks rarely incorporate attention mechanisms, which can improve discriminative feature selection. Furthermore, although wavelet-based representations can handle non-stationary EEG inputs, they are often neglected in hybrid deep networks, especially for cognition- or meditation-related paradigms such as mantra listening. These gaps motivate an architecture that combines spatial, temporal and discriminative information for reliable EEG classification with sparse and imbalanced data. The main contributions of this study are as follows:
(i) A deep hybrid CNN–BiLSTM–Attention model is proposed to simultaneously extract spatial, temporal and discriminative representations from EEG signals.
(ii) Wavelet-based time–frequency features are used as model inputs to enhance interpretability and handle signal non-stationarity.
(iii) On mantra-related EEG data, the proposed model outperforms baseline architectures (CNN, LSTM, CNN+LSTM and CNN+BiLSTM) in classification performance.
(iv) The study provides empirical evidence that attention-enhanced hybrid architectures can improve the decoding of brain states during auditory meditation tasks.
The rest of the article is structured as follows. Section 2 reviews the literature. Section 3 describes the proposed CNN–BiLSTM–Attention architecture and the training methodology. Section 4 presents the experimental results, including comparisons with baseline models. Section 5 provides the discussion and comparative analysis, and Section 6 concludes with the main findings and suggestions for further research.
Literature Survey
Early EEG classification studies combined handcrafted features such as power spectral density and entropy with machine learning classifiers. 17, 18 Although effective, these methods performed poorly on difficult tasks because they could not capture higher-level representations. With the development of deep learning, CNNs have successfully captured spatial interdependencies across EEG channels. 19 LSTM and BiLSTM networks, which were designed to mitigate the vanishing-gradient problem that affects standard RNNs on long sequences, are well suited to modelling temporal dependencies. 20
Hybrid CNN–RNN architectures that combine spatial and temporal learning have been developed to increase classification accuracy in EEG tasks. More recently, attention mechanisms have been added to improve discriminative power by focusing on the most relevant EEG features. 21 Because EEG is non-stationary, wavelet transforms offer an effective means of time–frequency decomposition, and a number of studies report that wavelet-based representations outperform Fourier approaches for classification. 22–25
Despite these developments, little research has combined wavelet-transformed EEG features with CNN, BiLSTM and attention. At the same time, interest has grown in how auditory stimuli, especially mantras and meditative sounds, affect brain dynamics. According to neurophysiological research, 26, 27 reciting the sacred syllable 'OM' produces notable changes in EEG activity, such as increased alpha and theta power and improved synchronisation between cortical areas. These oscillatory changes are associated with reduced stress responses, attentional modulation and relaxation.
Furthermore, EEG investigations of auditory stimulus processing and mantra meditation have emphasised the role of limbic and temporal lobe structures in producing altered states of consciousness. 28 Machine learning and deep learning techniques have also been applied to EEG recorded during auditory or contemplative exercises. 27 For instance, CNN-based models have been shown to accurately distinguish EEG states induced by listening to music. 29
One study 30 demonstrated that RNNs could consistently discriminate between different brain responses to mantra repetition. These results support the capacity of deep neural architectures to reveal subtle neural patterns associated with auditory stimuli. However, the existing literature remains fragmented: traditional handcrafted methods struggle to generalise, while deep learning models often overlook the combined importance of spatial, temporal and discriminative feature extraction. Additionally, very few studies have applied advanced hybrid deep learning to EEG classification under contemplative auditory stimuli.
A further challenge is that EEG recordings are frequently contaminated by artefacts from muscle activity, eye blinks and other physiological or environmental noise sources. These artefacts distort the neural signals and degrade classification accuracy. 31 Effective artefact removal is therefore an essential pre-processing step. Conventional techniques such as adaptive filtering and independent component analysis (ICA) are widely used, but more sophisticated strategies have recently emerged. Deep CNNs have been used successfully for automated identification and removal of various EEG artefacts, and wavelet-based filtering combined with meta-heuristic optimisation has been shown to suppress muscle artefacts efficiently. 32 These developments underline the importance of robust denoising pipelines, especially in EEG studies of cognition and meditation, where subtle oscillatory variations must be preserved.
Both deep learning and conventional machine learning techniques continue to benefit EEG analysis. Algorithms such as support vector machines, random forests and k-nearest neighbours have been widely applied to handcrafted spectral, temporal and entropy-based features. Recent studies 33, 34 have improved these frameworks with advanced signal decomposition methods such as variational mode decomposition, which facilitates the extraction of frequency-domain information for classification tasks.
The present study addresses this gap by proposing a hybrid CNN–BiLSTM–Attention architecture trained on wavelet-transformed EEG data. By integrating spatial, temporal and attention-based mechanisms, the proposed model attains state-of-the-art performance in differentiating EEG states before and during mantra-based auditory stimulation, providing a strong foundation for EEG analysis in meditation research.
Methods
Data Acquisition
EEG recordings were obtained from eleven experienced mantra practitioners. All participants were female, aged 37 ± 6 years, with an average of 5 years of practice, and were recruited from a mantra-chanting group. To ensure a healthy sample, candidates were screened beforehand, and those with any documented history of neurological or psychiatric conditions were excluded. Informed consent was obtained before any experiment began. Data were captured with the Emotiv Flex 32-channel EEG device (Emotiv Inc., USA; 128 Hz sampling rate; electrode placement based on the 10–20 system). Recordings took place in a quiet room, and electrode impedance was kept below 10 kΩ. The experiment had two conditions. Before the stimulus (resting state), participants were instructed to sit comfortably, close their eyes and relax, and EEG was recorded for one minute. In the subsequent stimulus condition (mantra chanting state), EEG was recorded for two minutes while participants listened to the mantra 'Om' through an audio device. Each subject completed three trials per condition. Each file contained a multi-channel EEG matrix (channels × samples), sampled at 128 Hz and saved in MATLAB .mat format.
Data Pre-processing
The pre-processing pipeline converted the raw EEG recordings into representations suitable for deep learning. Before feature extraction and classification, the data passed through the following steps:
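Band-pass filtering:
$$x_{bp}(t) = \sum_{k=0}^{M} b_k\,x(t-k) - \sum_{m=1}^{N} a_m\,x_{bp}(t-m)$$
where $x_{bp}(t)$ is the band-pass filtered signal, $x(t)$ is the raw EEG signal, $a_m$ and $b_k$ are the infinite impulse response (IIR) filter coefficients, and $N$ and $M$ are the filter orders.
Notch filtering:
$$H(z) = \frac{1 - 2\cos\omega_0\,z^{-1} + z^{-2}}{1 - 2r\cos\omega_0\,z^{-1} + r^{2}z^{-2}}, \qquad \omega_0 = \frac{2\pi f_0}{f_s}$$
where $H(z)$ is the notch filter transfer function, $f_0 = 50$ Hz is the notch frequency, $f_s = 128$ Hz is the sampling rate and $r = 0.95$ is the pole radius controlling the bandwidth.
Artefact removal: components corresponding to ocular and muscle artefacts were visually identified and removed.
Epoching:
$$x_i(t) = \tilde{x}\big(t + (i-1)T_s\big), \qquad 0 \le t < T_e$$
where $x_i(t)$ denotes the $i$th EEG epoch, $\tilde{x}(t)$ is the filtered, artefact-cleaned signal, $T_e$ is the epoch duration (window length) and $T_s$ is the step size.
Baseline correction:
$$x_{corr}(t) = x_i(t) - \mu$$
where $x_{corr}(t)$ is the baseline-corrected signal and $\mu$ is the mean amplitude of the segment.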
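The filtering and baseline-correction steps above can be sketched with SciPy as follows. This is a minimal illustration, not the authors' code: the notch uses the pole–zero form of H(z) with f0 = 50 Hz, fs = 128 Hz and r = 0.95, while the fourth-order Butterworth band-pass with a 0.5–40 Hz pass band is a hypothetical choice, because the text states neither the filter family nor the cut-off frequencies. All function names are illustrative.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 128.0  # sampling rate in Hz (from the text)


def notch_coefficients(f0=50.0, fs=FS, r=0.95):
    """(b, a) for H(z) = (1 - 2cos(w0) z^-1 + z^-2) / (1 - 2r cos(w0) z^-1 + r^2 z^-2)."""
    w0 = 2.0 * np.pi * f0 / fs
    b = np.array([1.0, -2.0 * np.cos(w0), 1.0])         # zeros on the unit circle at +/- w0
    a = np.array([1.0, -2.0 * r * np.cos(w0), r ** 2])  # poles at radius r set the notch width
    return b, a


def bandpass_coefficients(low=0.5, high=40.0, order=4, fs=FS):
    """HYPOTHETICAL Butterworth band-pass: the pass band and order are assumptions."""
    return butter(order, [low, high], btype="bandpass", fs=fs)


def preprocess(eeg):
    """Band-pass then notch filter every channel of a (channels, samples) array.

    lfilter applies the causal difference equation
    y[t] = sum_k b_k x[t-k] - sum_m a_m y[t-m]   (with a_0 = 1).
    """
    b_bp, a_bp = bandpass_coefficients()
    b_n, a_n = notch_coefficients()
    x_bp = lfilter(b_bp, a_bp, eeg, axis=-1)
    return lfilter(b_n, a_n, x_bp, axis=-1)


def baseline_correct(segment):
    """Subtract each channel's mean amplitude: x_corr(t) = x_i(t) - mu."""
    return segment - segment.mean(axis=-1, keepdims=True)
```

`scipy.signal.iirnotch` parameterises a notch by a quality factor Q rather than a pole radius, which is why the coefficients are written out explicitly here. Causal `lfilter` mirrors the difference equation; a zero-phase alternative such as `scipy.signal.filtfilt` would avoid phase distortion at the cost of causality.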
Segmentation into Fixed Windows
Because EEG is a continuous signal, it was divided into short temporal segments to capture non-stationary dynamics. A window length of 5 seconds (640 samples) was used, following previous studies. 35, 36 Non-overlapping windows were extracted, producing multiple segments per recording. Each segment retained the full set of EEG channels.
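With non-overlapping windows, the epoching equation uses T_s = T_e, so segmentation reduces to slicing each recording into consecutive 640-sample blocks. The NumPy sketch below assumes that a trailing partial window, if any, is discarded; `segment_windows` is a hypothetical helper name.

```python
import numpy as np


def segment_windows(eeg, fs=128, win_sec=5.0, step_sec=None):
    """Split a (channels, samples) array into (n_windows, channels, win_len).

    step_sec=None means non-overlapping windows (step = window length).
    Samples that do not fill a complete final window are dropped (an assumption).
    """
    win_len = int(round(win_sec * fs))
    step = win_len if step_sec is None else int(round(step_sec * fs))
    n_channels, n_samples = eeg.shape
    starts = list(range(0, n_samples - win_len + 1, step))
    if not starts:
        return np.empty((0, n_channels, win_len), dtype=eeg.dtype)
    return np.stack([eeg[:, s:s + win_len] for s in starts])
```

At 128 Hz, a one-minute resting recording (7,680 samples) yields 12 windows and a two-minute stimulus recording (15,360 samples) yields 24.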
Wavelet Decomposition
Time–frequency information was extracted with the discrete wavelet transform (DWT), using the Daubechies-4 (db4) wavelet with five decomposition levels. Previous EEG research 37, 38 has frequently adopted db4 because its shape resembles EEG waveform morphology and it preserves the energy distribution across frequency bands. To capture oscillations in progressively narrower frequency bands, each channel segment was decomposed into five sets of detail coefficients and one set of approximation coefficients. Because the coefficient arrays differ in length between levels, all were zero-padded to a constant length for consistency. The result for each segment was a 3D tensor of shape (Channels, padded_length, levels+1). The overall system flow is described in Figure 1.
Overview of the Proposed Pipeline Used in the Present Study.
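A minimal PyWavelets sketch of the db4, five-level decomposition with end-of-array zero padding is shown below. It assumes the library's default 'symmetric' signal-extension mode, which the text does not state. For a 640-sample window this mode gives coefficient lengths of 26 (cA5), 26 (cD5), 46, 86, 165 and 323 (cD1), so padding to the longest band yields the 323-length axis reported under Data Preparation. `wavelet_tensor` is an illustrative name.

```python
import numpy as np
import pywt  # PyWavelets


def wavelet_tensor(segment, wavelet="db4", level=5):
    """Return a (channels, padded_length, level + 1) tensor for a (channels, samples) segment.

    Bands are ordered [cA5, cD5, cD4, cD3, cD2, cD1], as returned by pywt.wavedec;
    shorter coefficient arrays are zero-padded at the end to the longest length.
    """
    coeffs = pywt.wavedec(segment, wavelet, level=level, axis=-1)
    max_len = max(c.shape[-1] for c in coeffs)
    padded = [np.pad(c, ((0, 0), (0, max_len - c.shape[-1]))) for c in coeffs]
    return np.stack(padded, axis=-1)
```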
Data Storage and Indexing
Each wavelet-decomposed window was stored as a NumPy .npy file. A structured comma-separated values (CSV) index file recorded the following metadata for each window: condition, subject, trial, channel count, decomposition levels and window length. This index was later used to load data efficiently for model training.
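The storage scheme can be sketched with NumPy and the standard `csv` module. The file-naming pattern, the column names and the label encoding (before = 0, during = 1) are hypothetical; the text only lists which metadata fields the index holds.

```python
import csv
from pathlib import Path

import numpy as np

# Hypothetical column names for the index file
INDEX_FIELDS = ["file", "condition", "subject", "trial", "n_channels", "levels", "window_len"]


def save_window(root, tensor, condition, subject, trial, window_id, levels=5, window_len=640):
    """Save one wavelet tensor as .npy and return its index row (hypothetical naming scheme)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    name = f"{condition}_s{subject:02d}_t{trial}_w{window_id:03d}.npy"
    np.save(root / name, tensor)
    return {"file": name, "condition": condition, "subject": subject, "trial": trial,
            "n_channels": tensor.shape[0], "levels": levels, "window_len": window_len}


def write_index(root, rows):
    """Write all index rows to <root>/index.csv."""
    with open(Path(root) / "index.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=INDEX_FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def load_from_index(root):
    """Load every indexed tensor into X and map conditions to labels (before=0, during=1)."""
    root = Path(root)
    with open(root / "index.csv", newline="") as f:
        rows = list(csv.DictReader(f))
    X = np.stack([np.load(root / r["file"]) for r in rows])
    y = np.array([1 if r["condition"] == "during" else 0 for r in rows])
    return X, y
```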
Data Preparation
The wavelet-transformed EEG data were arranged into structured NumPy arrays of dimensions (32 × 6 × 323), corresponding to 32 channels, 6 frequency sub-bands and 323 time steps, respectively. The metadata indexed the signals by experimental condition (before and during). To ensure robust evaluation, the dataset was divided into 70% training, 15% validation and 15% test sets using stratified sampling to preserve class proportions, yielding 862 training, 185 validation and 185 test samples.
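A stratified 70/15/15 split can be obtained with two calls to scikit-learn's `train_test_split`: first hold out 30% of the data, then divide that portion in half. The random seed and the helper name `stratified_split` are assumptions. With 1,232 windows (the sum of the reported counts), scikit-learn rounds the held-out size up, which gives exactly 862/185/185. The dummy labels in the check use 437 'before' and 795 'during' windows, chosen to match the stated 35.47%/64.53% proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split(X, y, seed=42):
    """70/15/15 train/validation/test split that preserves class proportions."""
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    # Split the held-out 30% evenly into validation and test sets
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```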
Handling Class Imbalance
The dataset was moderately imbalanced, with 64.53% of samples in the during condition and 35.47% in the before condition. To prevent the model from favouring the majority class, class weights were computed with the compute_class_weight function, giving weights of 1.408 for before and 0.775 for during. These weights were applied during training to increase the penalty for misclassifying the minority class.
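The reported weights can be reproduced with scikit-learn's 'balanced' heuristic, n_samples / (n_classes × n_c), where n_c is the number of samples in class c. The sketch assumes that mode and that the weights were computed on the training labels; for example, a hypothetical training set of 306 'before' and 556 'during' windows (862 in total) gives 1.408 and 0.775. The wrapper name `balanced_class_weights` is illustrative.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight


def balanced_class_weights(y_train):
    """Return {class_label: weight} using n_samples / (n_classes * count_per_class)."""
    classes = np.unique(y_train)
    weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
    return {int(c): float(w) for c, w in zip(classes, weights)}
```

The returned dictionary has the form expected by the `class_weight` argument of Keras `Model.fit`.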
Hybrid CNN–BiLSTM–Attention Model
A hybrid model combining CNN, BiLSTM and attention layers was developed to capture both the temporal and spatial relationships in EEG signals. The architecture, shown in Figure 2, consists of convolutional, BiLSTM and attention modules.
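Because the exact layer configuration is given only in Figure 2, the Keras sketch below is a hypothetical instance of a CNN–BiLSTM–attention classifier, not the authors' network. It assumes inputs of shape (32, 6, 323) as stated under Data Preparation, treats the 323 coefficient positions as time steps with the 32 × 6 channel/sub-band values flattened into 192 features, and uses illustrative layer sizes (64 filters, kernel size 5, 64 LSTM units, 64 attention units). Only the dropout rate of 0.3 comes from the text.

```python
import numpy as np
from tensorflow import keras

layers = keras.layers


def build_cnn_bilstm_attention(input_shape=(32, 6, 323), n_classes=2,
                               filters=64, lstm_units=64, attn_units=64, dropout=0.3):
    """Hypothetical CNN-BiLSTM-attention classifier; all sizes except dropout are illustrative."""
    n_ch, n_bands, n_steps = input_shape
    inputs = keras.Input(shape=input_shape)
    # (channels, sub-bands, steps) -> (steps, channels * sub-bands)
    x = layers.Permute((3, 1, 2))(inputs)
    x = layers.Reshape((n_steps, n_ch * n_bands))(x)
    # Convolutional block: local patterns along the step axis, mixing all channel/band features
    x = layers.Conv1D(filters, kernel_size=5, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Dropout(dropout)(x)
    # Recurrent block: forward and backward context at every step
    h = layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True))(x)
    h = layers.Dropout(dropout)(h)
    # Additive attention pooling: score each step, softmax over time, weighted sum of h
    e = layers.Dense(attn_units, activation="tanh")(h)
    e = layers.Dense(1, use_bias=False)(e)       # (steps', 1)
    alpha = layers.Softmax(axis=1)(e)            # weights sum to 1 over time
    context = layers.Dot(axes=1)([alpha, h])     # (1, 2 * lstm_units)
    context = layers.Flatten()(context)
    outputs = layers.Dense(n_classes, activation="softmax")(context)
    return keras.Model(inputs, outputs, name="cnn_bilstm_attention")
```

For example, `build_cnn_bilstm_attention().predict(np.zeros((2, 32, 6, 323)), verbose=0)` returns a (2, 2) array of class probabilities. If the stored tensors are in (32, 323, 6) order, the `Permute` would need adjusting accordingly.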
Training Strategy
The model was trained with the Adam optimiser (initial learning rate 1 × 10−3), sparse categorical cross-entropy loss and a batch size of 16. To improve convergence and prevent overfitting, early stopping monitored the validation loss with a patience of 8 epochs, ReduceLROnPlateau halved the learning rate after 4 epochs without improvement, and dropout layers (rate = 0.3) followed the CNN and BiLSTM layers.
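The training configuration maps directly onto standard Keras components. In the sketch below, the optimiser, loss, batch size, patience values, halving factor and the 20-epoch cap (from the Results section) follow the text. Monitoring validation loss for the learning-rate schedule and restoring the best weights on early stopping are assumptions, and `make_callbacks` and `compile_and_fit` are hypothetical helper names.

```python
from tensorflow import keras


def make_callbacks():
    """Early stopping (patience 8) and LR halving after 4 stagnant epochs."""
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=8, restore_best_weights=True)  # restore_best_weights: assumption
    reduce_lr = keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=4, verbose=1)      # monitor: assumption
    return early_stop, reduce_lr


def compile_and_fit(model, train, val, class_weight, max_epochs=20):
    """Compile with Adam(1e-3) and sparse categorical cross-entropy, then fit with batch size 16."""
    (x_train, y_train), (x_val, y_val) = train, val
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model.fit(x_train, y_train,
                     validation_data=(x_val, y_val),
                     epochs=max_epochs,
                     batch_size=16,
                     class_weight=class_weight,   # e.g. {0: 1.408, 1: 0.775}
                     callbacks=list(make_callbacks()),
                     verbose=2)
```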
Evaluation Metrics
The performance of the proposed model was assessed on the independent test set using accuracy, the confusion matrix and the receiver operating characteristic area under the curve (ROC-AUC) (see Figure 3). Accuracy and loss curves from the training histories were also examined to track convergence and generalisation. For a fair comparison, the baseline models in Table 1 were trained under conditions identical to those of the proposed model.
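The evaluation metrics, together with the Matthews correlation coefficient (MCC) used in the ablation study, can be computed with scikit-learn. This sketch assumes the model outputs two-class softmax probabilities with column 1 for the during class; `evaluate` is an illustrative name.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef, roc_auc_score


def evaluate(y_true, probs):
    """Accuracy, confusion matrix, MCC and ROC-AUC from (n_samples, 2) softmax outputs."""
    y_pred = np.asarray(probs).argmax(axis=1)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "confusion": confusion_matrix(y_true, y_pred, labels=[0, 1]),  # rows: true, cols: predicted
        "mcc": matthews_corrcoef(y_true, y_pred),
        "auc": roc_auc_score(y_true, probs[:, 1]),  # AUC uses the during-class probability
    }
```

For the toy labels [0, 0, 1, 1] with during-class probabilities [0.1, 0.4, 0.35, 0.8], it returns an accuracy of 0.75 and an AUC of 0.75.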
Performance Comparison of Baseline Models with Classification Metrics.
Results
The proposed CNN–BiLSTM–Attention model (see Figure 2 for its architecture) was trained for a maximum of 20 epochs with adaptive learning rate scheduling and early stopping. Figure 3 shows the training and validation performance across epochs. The model converged rapidly, improving steadily up to the fifth epoch. During the first epochs, the validation loss fell from 1.0424 to 0.0949, while the validation accuracy rose from 64.32% to 96.76% by the fourth epoch. From epoch 5 onwards, the network approached saturation, with validation accuracy fluctuating only slightly above 97%. The learning rate reduction, triggered at epoch 14, further stabilised training and yielded smooth convergence. At the end of training, the model had a validation loss of 0.0479 and a validation accuracy of 99.46%. The model's accuracy of
Overview of the Proposed Model Architecture.
Performance Evaluation of the Proposed Hybrid CNN–BiLSTM–Attention Model.
Before evaluating the proposed CNN–BiLSTM–Attention architecture with class-imbalance handling, a number of baseline experiments were carried out to establish reference performance. Table 1 compiles the baseline models' test accuracies and compares the proposed model with the baseline and other hybrid models, and Table 2 compares the proposed work with recent literature.
Summary of Contemporary Work Related to the Proposed Work Where OM is Used as Stimuli.
The standalone CNN and LSTM models performed reasonably well, with accuracies of 72.00%. Integrating CNN with recurrent layers (CNN+LSTM and CNN+BiLSTM) raised performance to 79.00% and 83.00%, respectively. With a test accuracy of 99.46%, however, the proposed CNN–BiLSTM–Attention model performed markedly better than all baselines. This substantial improvement, obtained despite class imbalance, demonstrates how well the hybrid architecture captures spatial, temporal and discriminative representations of EEG signals. The findings show that whereas CNN and LSTM performed reasonably well on their own, combining them into a hybrid CNN+BiLSTM increased classification accuracy to 79.00%. Adding an attention mechanism (CNN+BiLSTM+Attention) further improved performance to 83.00%, although class imbalance remained a limiting problem.
Discussion
This section presents a comparative analysis (see Table 2) that emphasises the importance of combining convolutional, recurrent and attention-based mechanisms for reliable EEG classification. With class-imbalance handling, the hybrid CNN–BiLSTM–Attention model achieved a test accuracy of 99.46%, compared with 72.00% for CNN, 71.00% for LSTM, 79.00% for CNN+LSTM and 83.00% for CNN+BiLSTM. Two key elements account for this improvement: (a) the synergistic combination of spatial feature extraction (CNN), temporal modelling (BiLSTM) and discriminative focusing (attention); and (b) the explicit treatment of class imbalance, which ensured stable learning for the under-represented condition. These results align with recent developments in deep learning for EEG analysis. Recurrent models, especially LSTM and BiLSTM, have shown promise in capturing temporal dependencies of brain activity, 30 whereas CNN-based architectures such as EEGNet have generalised well in motor imagery and event-related potential (ERP) research. 39 Hybrid CNN–RNN models that jointly model spatial and temporal dynamics have been shown to improve performance, 40 and the integration of attention mechanisms in EEG meditation stage classification has further underscored the value of selectively weighting salient features. 41 The current findings support this line of evidence by showing that applying attention on top of CNN–BiLSTM networks substantially increases discriminative power.
Wavelet-based time–frequency features are a further vital component. Because EEG signals are non-stationary, Fourier-based spectral approaches often miss brief oscillatory fluctuations. Wavelet transforms provide adaptive time–frequency resolution and have improved classification accuracy across a number of cognitive and clinical EEG applications. 42 The strong results obtained here support the usefulness of wavelet-based pre-processing for representing EEG features.
The context of mantra-based auditory stimulation adds further relevance. Previous neurophysiological investigations have shown that chanting 'OM' raises alpha and theta power, improves cortical synchronisation and modulates limbic and prefrontal circuits. 43, 44 Building on these findings, the current results show that deep hybrid models with wavelet features can achieve nearly flawless classification, suggesting that contemporary architectures can efficiently capture subtle neural signatures of mantra-induced states. Overall, the data indicate that auditory stimuli such as OM chanting produce consistent and discriminative brain responses that are well suited to deep learning-based classification. The findings advance both the methodology of EEG analysis and the neuroscientific understanding of brain dynamics associated with mantras.
Despite the high classification accuracy, some uncertainties and limitations should be noted. Even after pre-processing and artefact rejection, EEG recordings remain susceptible to noise, motion artefacts and inter-subject variability, which can alter the derived features. Likewise, the deep learning models used here depend on stochastic optimisation, hyperparameter tuning and parameter initialisation, all of which can affect convergence and generalisation. Because the study was restricted to a moderate sample size and a specific experimental paradigm using OM chanting, more extensive validation across different populations and stimuli is required. To improve robustness and interpretability, future research should examine cross-subject transfer learning, multimodal fusion (such as EEG–fNIRS or EEG–galvanic skin response (GSR)) and adaptive, explainable architectures. Sensitivity analysis could further quantify the effects of noise and parameter changes, strengthening the case for the validity of EEG-based classification of meditation states.
Ablation Study Observations
Table 3 summarises the ablation study of the proposed model, which quantifies how much each component contributes to the hybrid architecture. Most significantly, removing the attention module caused a drastic decline in performance: the Matthews correlation coefficient (MCC) fell from 0.9882 to 0.4190 and the test accuracy from 0.9946 (full model) to 0.7459. This result shows that selective temporal weighting is essential for aggregating features across the BiLSTM output sequence and attaining high discriminative power. The full model achieved near-perfect classification (accuracy ∼ 0.995) with the fastest training time (777.92 s, excluding the 'No Spatial Mixing' variant). Removing the dilated convolutions reduced accuracy only slightly (to 0.9892), indicating that this component is not essential, but it unexpectedly increased training time to 1013.00 s. This suggests that, for this wavelet representation, standard convolutional connectivity is nearly as effective, or that the specific dilation rate used added little to feature extraction. Taken together, the results show that the attention-driven mapping from temporal sequence to features, combined with CNN feature extraction, is an essential structural prerequisite for the model's ability to distinguish between the EEG states recorded in this dataset.
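The ablation result hinges on how attention pools the BiLSTM output sequence. The NumPy sketch below shows one common form, additive attention pooling with scores e_t = vᵀ tanh(W h_t + b) normalised by a softmax over time; whether the authors used exactly this form is an assumption, and `attention_pool` is an illustrative name. Setting the scoring vector v to zero makes every weight equal, so the pooled vector degenerates to the plain temporal mean, a rough analogue of removing the selective weighting.

```python
import numpy as np


def attention_pool(h, W, b, v):
    """Additive attention pooling over a (T, d) sequence h.

    W: (d, d_a), b: (d_a,), v: (d_a,). Returns the (d,) context vector
    and the (T,) attention weights, which sum to 1.
    """
    scores = np.tanh(h @ W + b) @ v        # one scalar score per time step
    scores = scores - scores.max()         # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ h, alpha                # weighted sum over time
```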
Ablation Study Results Summary.
Conclusion
A CNN–BiLSTM–Attention hybrid model using wavelet-based features was developed for EEG classification under mantra stimulation. With a test accuracy of 99.46%, the framework significantly outperformed the traditional CNN, LSTM and hybrid baselines. Attaining state-of-the-art performance required both the explicit treatment of class imbalance and the integration of spatial, temporal and attention-based mechanisms. From a neurocognitive standpoint, the results indicate that OM chanting and associated mantra practices produce consistent, objectively quantifiable changes in EEG dynamics. From a computational standpoint, the study shows that combining deep hybrid architectures with wavelet-based time–frequency analysis yields a strong and broadly applicable method for non-stationary EEG signals. Future studies should extend this approach to real-time BCI applications, cross-subject generalisation and other meditation modalities. The findings suggest that deep learning techniques can significantly advance both EEG-based meditation research and applied cognitive monitoring.
Footnotes
Acknowledgment
All authors approved the final version. The authors thank the Department of Information Technology for its support in conducting this research.
Authors’ Contributions
Tony Bayan: Formal analysis, conceptualisation, writing–original draft, data curation, editing, analysis.
Daisy Das: Formal analysis, conceptualisation, writing–original draft, data curation, editing.
Nabamita Deb: Conceptualisation, editing, supervision.
Statement of Ethics
The Gauhati University Ethics Committee approved the research presented in the article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
Data Availability
Data are available from the corresponding author on request.
Patient Consent
All subjects signed informed consent before EEG recording.
