Sage Journals: Discover world-class research

Abstract

The imbalance in data from intensive care units, where stroke patients who also have heart attacks make up a minority, makes this prediction task extremely difficult. Handling imbalanced data is a well-known difficulty in data mining. The main contribution of this work is a method that accurately identifies and categorises minority-class data, even in highly imbalanced datasets with small class sizes. This work predicts stroke from balanced and compressed data from the MIMIC III dataset using the proposed Convolutional Neural Network-Gated Recurrent Unit with Imbalanced Data Handling (CNN-GRU-IDH). The method also reduces the amount of data transferred by compressing healthcare data with the Lempel-Ziv-Markov chain Algorithm (LZMA). Class imbalance is addressed with the Synthetic Minority Over-sampling Technique (SMOTE), and, as a novel element, the Improved Multi-Objective Wolf Pack Algorithm (IMOWPA) is used to choose an appropriate number of nearest neighbours, K, for SMOTE. On this dataset, the proposed model outperforms existing models, achieving an accuracy of 87.66% and an F1 score of 85.63% with 70% of the data used for training and 30% for testing. The CNN-GRU-IDH approach, which forecasts the incidence of strokes, serves as the main classification technique. This study advances patient-specific early stroke prediction, which could help save lives and lower mortality rates.

Graphical abstract

This is a visual representation of the abstract.

Keywords

Convolutional Neural Network-Gated Recurrent Unit; SMOTE; stroke prediction; Improved Multi-Objective Wolf Pack Algorithm; imbalanced data

Introduction

Ischaemic stroke, commonly referred to simply as stroke, is a localised softening or necrosis of brain tissue caused by insufficient cerebral blood flow, which leads to ischaemia and hypoxia and can result in hemiplegia, speech impairment, coma, etc.^1,2 Myocardial infarction, also known as a heart attack, arises when deficient blood flow or blockages interrupt the blood supply to the heart muscle.^3,4 Stroke complicated by a heart attack, defined as cerebral infarction occurring concurrently with a heart attack, carries an alarmingly high mortality rate of up to 54%.⁵ Acute left heart failure, cardiogenic shock and ventricular arrhythmia are the three main causes of death in these cases.^1,4,6 Given the rapid onset of heart attacks, any delay in diagnosis can be critical and may lead to sudden death in these patients. This study therefore seeks to predict heart attacks in stroke patients by examining medical markers other than troponin. Troponin is a heart muscle protein that regulates cardiac contraction. When the heart muscle is damaged, troponin levels in the blood rise sharply, which makes troponin a key marker for diagnosing myocardial infarction. The goal is to enable rapid treatment for stroke patients who are having a heart attack. Key medical indicators such as SpO₂ and heart rate help evaluate oxygen delivery and cardiac function and are vital for detecting cardiovascular distress. Glucose and creatinine levels provide insight into metabolic and renal health, which strongly influence the risk of heart disease. Respiratory rate and systolic blood pressure (SBP) are additional indicators of cardiopulmonary function and hypertension. Based on these characteristics, a data processing technique has been developed to predict heart attacks in stroke patients.⁷

Big data, the Internet of Things (IoT) and artificial intelligence (AI) are advancing quickly, opening up possibilities for applications that will transform industrial services. These opportunities bring significant challenges, such as latency, dependability and the efficient processing of the enormous volumes of data generated by IoT devices while maintaining a good quality of service. Deep learning and active learning enable interactive systems such as autonomous vehicles, face recognition and expression analysis, and speech recognition with natural language processing, and they can also be integrated to optimise deployment.^6,8 Notably, deep learning (DL) has proved essential for understanding how diseases spread, particularly in light of the present global health crisis.¹ AI-based healthcare systems enable continuous monitoring, automatic anomaly detection and timely alerts by analysing real-time data collected from sensors and electronic records, with AI algorithms such as machine learning (ML), DL and their latest evolutions extracting critical patterns from these data. This helps predict the required outcomes and shows how intelligent systems built on AI algorithms can shift healthcare from a reactive to a proactive approach that improves patient well-being and outcomes.

Data mining and ML are essential tools for analysing and making predictions from large medical datasets in contemporary healthcare, with a particular focus on prediction models for illness diagnosis. Because medical data contain a large number of variables, key characteristics must be chosen carefully to ensure reliable diagnosis, so these important variables need to be studied thoroughly.⁹ It is also crucial to handle minority cases. Clinical experience suggests that stroke patients who experience heart attacks are far less prevalent than stroke patients who do not, which leads to an imbalanced dataset. For instance, in the Medical Information Mart for Intensive Care III (MIMIC III) database there are around 30 times more stroke patients without heart attacks than cases with both conditions (82 cases).^10,11 Traditional ML techniques tend to prioritise characteristics of the majority class and disregard the minority class, often producing experimental results with high accuracy but low precision.¹⁰ Conventional oversampling and undersampling procedures used with earlier classification algorithms do not balance the data distribution effectively for feature extraction. To address these issues, this research introduces novel integrated approaches at the data level that balance the distribution of imbalanced (class-level unequal) stroke datasets and improve classifier performance on them.
Early attempts at multivariate vital-sign prediction have faced challenges in performance and generalisability. Hu et al.¹¹ showed that even neural networks trained on essential attributes still needed improvement to predict deterioration beyond early warning scores. Similarly, Blackwell et al.¹² reported that a single general model performs worse than event-specific models in detecting in-patient deterioration. These limitations highlight the need for more robust approaches that overcome them and handle high-dimensional clinical time-series data.

This article's primary contributions are:

First, the MIMIC III dataset is pre-processed. Then, the healthcare data are compressed using LZMA (Lempel-Ziv-Markov chain Algorithm) to reduce the amount of data transferred.

SMOTE (Synthetic Minority Oversampling Technique)¹³ is used to address imbalanced data, and the IMOWPA (Improved Multi-Objective Wolf Pack Algorithm) optimisation technique is used to identify the K value of SMOTE.

A CNN-GRU (Convolutional Neural Network-Gated Recurrent Unit) classifier is trained on the balanced data to predict the likelihood that stroke patients will also experience a heart attack. Performance is evaluated on the MIMIC III database.

The remainder of the paper is organised as follows: the second section summarises related work, the third section briefly explains the proposed model, the fourth section presents the results and validation analysis, and the fifth section provides the summary and conclusion.

Related works

Sowjanya and Mrudula¹⁴ developed a comprehensive two-step approach to address the challenges posed by imbalanced datasets in predictive modelling. In the first step, they presented two improvements, Distance-based SMOTE (D-SMOTE)¹⁵ and Bi-phasic SMOTE (BP-SMOTE), to boost SMOTE's effectiveness; coupling these variants with carefully selected classifiers raised prediction accuracy above that of standard SMOTE. In the second step, they created a stacked ensemble framework combining DL, ML and ensemble algorithms. This framework achieved a considerable gain in accuracy over individual ML algorithms such as Naive Bayes, Decision Trees and Neural Networks, as well as over ensemble techniques such as Voting, Bagging and Boosting. Notably, they introduced two techniques, Stacking CNN¹⁶ and Stacked recurrent neural network (RNN),¹⁷ which merge DL with the stacking methodology. These techniques achieved considerably higher precision, with accuracy between 96% and 97%. Among the datasets used in their study were the Framingham dataset, the Wisconsin Hospital Breast Cancer Data and the Novel Coronavirus 2019 dataset.

Woźniak et al.¹⁸ proposed an integrated IoT system model that uses a Decision Tree model, BiLSTM DL and an effective data balancing strategy. They conducted comprehensive data-preparation experiments with specially adjusted versions of well-known balancing algorithms such as adaptive synthetic sampling and SMOTE-Tomek. With their system, medical teams and patients can obtain automated diagnoses produced by the DL model, share documents securely and evaluate questionnaires. Their results, with accuracy above 96%, precision above 88% and recall above 96%, confirmed the efficacy of the proposed paradigm for illness diagnosis.

Peng et al.,¹⁹ in work published in the International Journal of Statistics in Computing, presented a fault detection and diagnosis (FDD) method that combines active learning with semi-supervised learning. By carefully selecting samples from an unlabelled dataset for annotation, the method identifies both known and unknown failure types while lowering labelling costs. They created a new network-based semi-supervised classifier with dynamic graph construction to predict labels for imbalanced data and identify distinct classes. Their analysis of real and simulated datasets for vehicle air intake equipment showed that their technique outperformed state-of-the-art methods for fleet-level FDD.

Xu et al.²⁰ proposed a Global Contextual Multiscale Fusion Network (GCMFN) to enhance machine health condition recognition. The network incorporates a multi-dilated fusion layer and a non-local activation module for effective multi-scale feature discovery. They further improved diagnostic performance under imbalanced conditions with an online label smoothing approach. Experiments on benchmark datasets from various machine condition monitoring tasks demonstrated GCMFN's potential as a diagnostic tool.

Pan et al.²¹ created Power-law-based SMOTE (PL-SMOTE) to address the imbalance issue in multi-class serum surface-enhanced Raman spectroscopy (SERS) data. Their strategy incorporated a modifying factor to balance the proportion of minority cases against class overlap. By tuning the settings of a deep neural network model together with PL-SMOTE, they developed a cancer screening model with high macro-averaged recall and F2 scores. The method may also be applied beyond SERS cancer screening to other multi-class imbalanced settings.

Alourani et al.²² suggested a DL-based approach to predict patient mortality using the MIMIC III dataset. They evaluated their model with several metrics, including accuracy, F1 score, recall, precision and execution time. The results showed that the model could predict patient death accurately, which may help establish patient-care priorities and reduce mortality rates. They also compared their results with state-of-the-art models.

Zhang et al.²³ suggested a scoring-aided federated learning architecture that improves the selection of mobile clients with higher data quality and better transmission conditions for model uploading. Their client selection technique incorporates channel state information and data rate into the scoring algorithm, addressing the problems posed by long-tailed data. In experiments, the framework outperformed the standard FedAvg algorithm.

Shafi et al.²⁴ investigated advanced computer vision methods, as well as ML and DL models, for diagnosing dental diseases from X-ray and near-infrared images. They drew attention to problems with interpretability, socioeconomic inequality and data scarcity. Their review identified research challenges, surveyed X-ray and near-infrared imaging systems in detail and assessed existing techniques on openly available benchmarks. The survey concluded by discussing ethical issues and potential research directions in oral disease detection.

Raza et al.²⁵ used an artificial neural network-based method to forecast maternal health risks. They presented DT-BiLTCN, a deep neural network design that combines bidirectional long short-term memory networks, decision trees and temporal convolutional networks. They used synthetic minority oversampling to correct the class imbalance and predicted threats to pregnant women's health with high accuracy.

Silveira et al.²⁶ presented an oversampling method based on human and automatic augmentation and tested several classifier selection methods. They applied dynamic classifier selection techniques with decision trees, random forests and multi-class AdaBoost decision trees. Their method achieved high accuracy in predicting early-stage chronic kidney disease from small, imbalanced datasets.

This study combines IMOWPA optimisation, the LZMA²⁷ data compression method, SMOTE-based balancing, CNN-GRU classification and unique modelling approaches²⁸ into a comprehensive strategy for handling data imbalance, underlining how important accurate stroke prediction is for healthcare.

Proposed method

Figure 1 represents the overall flow of the proposed work.

Figure 1.

Proposed work flow.

Dataset analysis and data pre-processing

The MIMIC III database is used as the main data source for evaluating the proposed model. MIMIC III, developed at MIT, is a comprehensive database of medical data. It consists of 26 CSV files combining medical data from a large cohort of 46,520 patients, with an emphasis on the adult population of 38,606 people. The 'Icu-stays' file documents the times of patients' admissions to and discharges from the intensive care unit, while the 'Chart events' file records comprehensive medical examination data.

The MIMIC III database was accessed through PostgreSQL. The 'd-icd-diagnoses' table in this database contains the illness categories assigned to each patient. Stroke cases were identified by filtering on the ICD9-Code value for cerebral infarction. Additional filters then kept only samples from patients older than 18 years whose ICU stays lasted longer than 12 hours. This filtering yielded 2488 stroke samples. As described in the reference,²⁹ troponin levels were used to determine whether a heart attack had occurred in each sample: a troponin label of 1 indicates a heart attack and a label of 0 indicates its absence. As a result, the 2488 stroke samples were divided into 2406 samples without heart attacks and 82 samples with heart attacks.
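The cohort selection and labelling just described can be sketched in Python as below. This is a minimal illustration rather than the authors' pipeline: it assumes the relevant fields have already been extracted from MIMIC III (for example through PostgreSQL queries) into plain dictionaries, and the field names `icd9_codes`, `age`, `icu_hours` and `troponin_label`, as well as both helper names, are hypothetical. The ICD9 codes for cerebral infarction are passed in as a parameter rather than listed here.

```python
from collections import Counter

def select_stroke_cohort(stays, stroke_codes, min_age=18, min_icu_hours=12):
    """Keep stays with a stroke ICD9 code, age > min_age and ICU stay > min_icu_hours.

    `stays` is a list of dicts with hypothetical keys: 'icd9_codes' (iterable of str),
    'age' (years), 'icu_hours' (ICU length of stay in hours), 'troponin_label' (0 or 1).
    """
    cohort = []
    for stay in stays:
        if not set(stay["icd9_codes"]) & set(stroke_codes):
            continue  # no cerebral-infarction code: not a stroke case
        if stay["age"] <= min_age or stay["icu_hours"] <= min_icu_hours:
            continue  # keep adults only, with stays longer than 12 hours
        cohort.append(stay)
    return cohort

def class_counts(cohort):
    """Count samples without (0) and with (1) a heart attack from the binary troponin label."""
    return Counter(stay["troponin_label"] for stay in cohort)
```

On the study's cohort, `class_counts` would reproduce the 2406 versus 82 split reported above; here it simply shows how the imbalance is measured.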

Statistics on missing data for the 2488 stroke samples were computed from this database. Two items, haemoglobin A1c and triglycerides, had an excessive number of missing values and were therefore removed (shown in purple). Missing values in the remaining medical indicators were filled with the mean of the existing values of the corresponding indicator. This produced a dataset of 2488 stroke patients and 34 medical markers. Including extra medical markers in analysis and prediction is not always useful, because it can increase computational cost and introduce interference between indicators. A T-test was therefore used to determine the effect of each medical indicator on troponin status. Two hypotheses were formulated: $H_{0}: \mu_{1} = \mu_{2}$, which assumes that the two samples (for each indicator, the samples with and without a heart attack) have the same mean, and $H_{1}: \mu_{1} \neq \mu_{2}$, which assumes that their means differ. A two-sample T-test is then used to test whether the means of the two samples differ. Formula (1) gives the T value, and a t-distribution table was used to estimate the range of p. If the p-value falls below the cutoff, $H_{0}$ is rejected in favour of $H_{1}$, indicating that the medical index is strongly associated with troponin and can therefore help anticipate heart attacks.
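The missing-data handling described above can be illustrated with the following sketch, which drops indicators whose fraction of missing values exceeds a threshold and fills the remaining gaps with the indicator's mean. The threshold `max_missing_frac=0.5` is an assumption for illustration only; the text says just that haemoglobin A1c and triglycerides had an excessive amount of missing data. Missing values are represented as `None`, and the function name is hypothetical.

```python
from statistics import mean

def clean_indicators(rows, max_missing_frac=0.5):
    """Drop sparse indicators, then mean-impute the remaining ones.

    rows: list of dicts mapping indicator name -> value (None if missing).
    max_missing_frac: hypothetical cutoff; keep it below 1 so every kept column has data.
    """
    columns = list(rows[0].keys())
    kept = [
        c for c in columns
        if sum(r[c] is None for r in rows) / len(rows) <= max_missing_frac
    ]
    # Mean of the observed (non-missing) values of each kept indicator.
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in kept}
    return [{c: means[c] if r[c] is None else r[c] for c in kept} for r in rows]
```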

t = \frac{{\bar{X}}_{1} - {\bar{X}}_{2}}{\sqrt{\frac{(n_{1} - 1) S_{1}^{2} + (n_{2} - 1) S_{2}^{2}}{n_{1} + n_{2} - 2} (\frac{1}{n_{1}} + \frac{1}{n_{2}})}}

(1)

Here, ${\bar{X}}_{1}$ and ${\bar{X}}_{2}$ are the mean values of the two samples, $n_{1}$ and $n_{2}$ are the sample sizes, and $S_{1}$ and $S_{2}$ are the sample standard deviations, which capture the variability within each set. The p-value of each of the 34 medical indicators was then computed and compared with a significance level of 0.05. Indicators with p-values below this cutoff were kept for further analysis, and those above it were discarded. Eight medical indicators were found to be statistically significant and selected: Glucose-Max, Heartrate-Mean, Glucose-min, Glucose-Mean, Creatinine, Resprate-Max, Sysbp and Heartrate-Min. Given their clear influence on the results of the study, these markers should be prioritised in further research.
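Formula (1) and the 0.05 selection rule can be sketched as follows. The statistic is computed exactly as in Formula (1); for selection, the sketch assumes that with roughly 2486 degrees of freedom the two-sided 0.05 cutoff is well approximated by |t| > 1.96. For exact p-values, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=True)` returns the same pooled statistic together with its p-value. The function names and the `groups` layout are hypothetical.

```python
import math
from statistics import mean, stdev

def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance, following Formula (1)."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)  # sample standard deviations (n - 1 denominator)
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (mean(sample1) - mean(sample2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def select_indicators(groups, t_crit=1.96):
    """Keep indicators whose |t| exceeds t_crit.

    groups: dict mapping indicator name -> (values with heart attack, values without).
    t_crit=1.96 approximates the two-sided 0.05 cutoff only for large degrees of
    freedom; for small samples, use the exact t quantile or SciPy's p-value instead.
    """
    return [name for name, (a, b) in groups.items() if abs(pooled_t(a, b)) > t_crit]
```

For example, `pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])` gives a pooled variance of 2.5 and t = -1.0.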

Process for compression/decompression using LZMA

After pre-processing, the LZMA model is used to compress and then decompress the healthcare data, with the goal of efficiently reducing data transmission and storage. Pavlov developed LZMA by building on the LZ77 compression model. It combines dynamic dictionary compression over a sliding window with range (interval) coding, offering fast processing, high compression ratios and modest memory requirements for decompression. A significant feature of LZMA is its support for dictionary sizes from 4 KB up to 4 GB; larger dictionaries provide more space for match searches and therefore higher compression ratios. To speed up the search for longer string matches, LZMA stores many candidate match positions in a hash table, and a binary search tree or hash-chain (linked list) structure is used for efficient retrieval. To optimise efficiency for various dictionary sizes, the LZMA encoder also configures several hash functions for 2, 3 and 4 adjacent bytes.
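Python's standard-library `lzma` module can illustrate the compress/decompress round trip. The sketch below assumes the records are serialised as JSON before compression, which the text does not specify, and it uses the `.xz` container with the LZMA2 filter (a successor to the original LZMA stream format). The 1 MiB dictionary size and the function names are arbitrary illustrative choices.

```python
import json
import lzma

def compress_records(records, dict_size=1 << 20):
    """Serialise records to JSON and compress them with LZMA2 in an .xz container."""
    payload = json.dumps(records).encode("utf-8")
    # dict_size is the dictionary (sliding-window) size in bytes; 4 KiB is the minimum.
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters)

def decompress_records(blob):
    """Inverse of compress_records: decompress, then parse the JSON back into records."""
    return json.loads(lzma.decompress(blob).decode("utf-8"))
```

For a raw `.lzma` stream closer to Pavlov's original format, `format=lzma.FORMAT_ALONE` with the `lzma.FILTER_LZMA1` filter can be used instead; highly repetitive vital-sign records compress to a small fraction of their JSON size.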

Data balancing using SMOTE

In the MIMIC III dataset, one class is significantly under-represented compared with the other, revealing a major class imbalance. This study uses SMOTE to reduce the difficulties caused by this imbalance. SMOTE corrects the imbalance by generating synthetic data points for the minority class. The K-Nearest Neighbour (KNN) technique, which locates data points that are close to one another in feature space, is an essential part of SMOTE. SMOTE selects a minority instance and its nearest neighbours and then generates synthetic instances on the line segments connecting the chosen point to its neighbours. Because these synthetic samples are new points rather than exact copies of the original minority data, this interpolation reduces the risk of overfitting that comes with simple duplication.

Users can set the number of synthetic instances to be created, which gives SMOTE flexibility. If the required amount of synthetic data is small relative to the size of the original dataset, SMOTE randomly chooses the original instances used for generation. Conversely, when many synthetic instances are needed relative to the original dataset size, synthetic instances are generated for every minority sample according to a predefined oversampling ratio. The main inputs of SMOTE are the oversampling ratio (N), the number of nearest neighbours (k) and the number of minority instances (T). The procedure first finds the nearest neighbours and selects one of them; it then interpolates between the original minority instance and that neighbour to create a synthetic instance. For example, if a minority-class instance is represented as $(x_{1}, x_{2})$ and its chosen nearest neighbour as $(\hat{x}_{1}, \hat{x}_{2})$, the synthetic data point is obtained with equation (2).

(X_{1}, X_{2}) = (x_{1}, x_{2}) + \mathrm{random}(0, 1) \times \Delta

(2)

where

\Delta = (\hat{x}_{1} - x_{1}, \hat{x}_{2} - x_{2})

and random(0, 1) denotes a value chosen uniformly at random between 0 and 1. The SMOTE procedure is given in Algorithm 1.

Algorithm 1: SMOTE Technique
Input: minority-class training samples (X), oversampling percentage (N) and number of nearest neighbours (K), identified by IMOWPA.
Output: array S of synthetic minority samples, which are appended to the training set.
$n_{min}$ = #minority observations; m = #attributes
if N < 100 then
Stop: warning (N must be at least 100)
end if
$N = int(\frac{N}{100})$
$S_{(n_{min} \cdot N) \times m}$ is an empty array for synthetic samples
$newindex \leftarrow 1$
for $i \leftarrow 1$ to $n_{min}$ do
Compute the K nearest neighbours of i and save their indices in nn_array
$r \leftarrow N$
while $r \neq 0$ do
$K_{c} \leftarrow$ random integer in [1, K]
for $j \leftarrow 1$ to m do
$diff \leftarrow X[nn\_array[K_{c}]][j] - X[i][j]$
$gap \leftarrow uniform(0, 1)$
$S[newindex][j] \leftarrow X[i][j] + gap \times diff$
end for
$newindex \leftarrow newindex + 1$
$r \leftarrow r - 1$
end while
end for
return S
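A compact, runnable sketch of the SMOTE procedure in Algorithm 1 is shown below, using only the Python standard library and brute-force Euclidean nearest-neighbour search. K is simply an argument here; the IMOWPA search that selects it in this study is not reproduced. Following equation (2), one random gap is drawn per synthetic point, whereas Algorithm 1 draws one per attribute (a one-line change). The function names are illustrative.

```python
import math
import random

def k_nearest(X, i, k):
    """Indices of the k nearest neighbours of X[i] by Euclidean distance, excluding i itself."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]

def smote(X_min, N, k, seed=0):
    """Generate int(N / 100) synthetic samples per minority sample (N is a percentage >= 100)."""
    if N < 100:
        raise ValueError("N must be at least 100")
    if not 1 <= k < len(X_min):
        raise ValueError("k must be between 1 and the number of minority samples minus 1")
    rng = random.Random(seed)
    per_sample = int(N // 100)
    synthetic = []
    for i, x in enumerate(X_min):
        neighbours = k_nearest(X_min, i, k)
        for _ in range(per_sample):
            nb = X_min[rng.choice(neighbours)]  # random neighbour among the k nearest
            gap = rng.random()                  # random(0, 1) in equation (2)
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

The synthetic rows are then appended to the training set with the minority label. Libraries such as imbalanced-learn provide an equivalent `SMOTE` class whose `k_neighbors` parameter plays the role of K.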

K-value identification of SMOTE by IMOWPA

The Wolf Pack Algorithm (WPA) is inspired by the predatory hunting behaviour of wolves. It models three intelligent behaviours observed in wolf packs: siege, calling and roaming. The algorithm has a hierarchical structure in which the leader wolf represents the best-performing solution. A renewal process maintains the quality of the pack by keeping the strongest wolves and eliminating the weaker ones, as described in the reference.³⁰ Because WPA was originally designed for continuous function optimisation, it tends to converge prematurely and become trapped in local optima. To overcome these drawbacks and adapt the algorithm to the stroke problem, its three intelligent behaviours have been improved. These improvements broaden the search space and help discover globally optimal Pareto solutions, leading to more balanced and accurate results.

Population initialization

The quality of the initial solutions directly affects the algorithm's performance. Random initialisation is widely used because it encourages diversity in the starting population, but it cannot guarantee solution quality. The initial population is therefore produced using three different rules:

A rule that aims to minimise the maximum completion time;

A rule that aims to reduce energy consumption; and

Random generation.

To improve the quality of the initial solutions, each rule generates a fixed share of the population: 40% for the first rule, 40% for the second rule and 30% for the random generation technique. This allocation is intended to give the optimisation process a better starting point.

Ranking of non-dominated crowds

After individuals are sorted into distinct layers, the non-dominated crowding ranking technique assesses the degree of crowding within each layer. As the iterations progress, this method updates the positions of the artificial wolves, allowing the pack to keep the most promising solutions and remove the less effective ones. Figure 2 illustrates the non-dominated sorting procedure.
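The layering step can be sketched with the standard fast non-dominated sort from NSGA-II, assuming both objectives are minimised. Each solution is a tuple of objective values, and the function returns the layers (Pareto fronts) as lists of indices, best layer first. This is a generic illustration of the technique, not the authors' implementation, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a Pareto-dominates b (minimisation): no worse in every objective, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """Split objective vectors into Pareto fronts (lists of indices), best front first."""
    n = len(points)
    dominated_sets = [[] for _ in range(n)]  # dominated_sets[i]: indices that i dominates
    domination_count = [0] * n               # how many points dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_sets[i].append(j)
            elif dominates(points[j], points[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated_sets[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        k += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front
```

For the points (1, 5), (2, 3), (3, 1), (4, 4) and (5, 5), the first three form the first layer, (4, 4) the second and (5, 5) the third.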

Figure 2.

Non-dominated sorting.

After stratification, individuals in the same layer must still be distinguished from one another, and the crowding distance is used for this purpose. Formula (3) shows how the crowding distance of an individual is calculated. Individuals with larger crowding distances lie in sparser regions of the layer, so the crowding distance can be used to assess how evenly the solution set is distributed.

P[i]_{distance} = \frac{P[i+1].f_{1} - P[i-1].f_{1}}{f_{1}^{\max} - f_{1}^{\min}} + \frac{P[i+1].f_{2} - P[i-1].f_{2}}{f_{2}^{\max} - f_{2}^{\min}}

(3)

where $P[i]_{distance}$ is the crowding distance of individual i; $P[i].f_{1}$ and $P[i].f_{2}$ are the values of individual i's two objective functions; $f_{1}^{\max}$ and $f_{1}^{\min}$ are the maximum and minimum values of objective function $f_{1}$; and $f_{2}^{\max}$ and $f_{2}^{\min}$ are the maximum and minimum values of objective function $f_{2}$.
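Formula (3) corresponds to the crowding distance used in NSGA-II. In the sketch below, following the usual NSGA-II convention (an assumption, since the text does not spell it out), the layer is sorted separately for each objective, so every term is non-negative, and the two boundary individuals for each objective receive an infinite distance so that they are always preserved. The function name is hypothetical.

```python
import math

def crowding_distance(front):
    """Crowding distance of each objective vector in one Pareto layer (same order as input)."""
    n = len(front)
    distance = [0.0] * n
    if n == 0:
        return distance
    for k in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][k])
        f_min, f_max = front[order[0]][k], front[order[-1]][k]
        distance[order[0]] = distance[order[-1]] = math.inf  # keep boundary solutions
        if f_max == f_min:
            continue  # objective is constant on this layer; avoid division by zero
        for pos in range(1, n - 1):
            i = order[pos]
            distance[i] += (front[order[pos + 1]][k] - front[order[pos - 1]][k]) / (f_max - f_min)
    return distance
```

For the layer (1, 5), (2, 3), (3, 1), each normalised term for the middle point is 1, giving a distance of 2.0, while both end points are infinite.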

Intelligent behaviour design

To guarantee a wide range of feasible solutions and improve the algorithm's local search capability, crossover and mutation operators inspired by genetic algorithms have been introduced into each of the three intelligent behaviours of the WPA framework. A non-dominated ranking technique and an elite retention strategy further strengthen the algorithm's ability to find feasible solutions. Specialised crossover and mutation procedures have been constructed to preserve the integrity of solutions and prevent infeasible outcomes after the intelligent behaviours are applied; these procedures account for both the encoding method and the particularities of the stroke problem. Specifically, a Partially Mapped Crossover is used for the summoning behaviour, a dual-layer mutation for the roaming behaviour and a mutation operator for the besieging behaviour.

There are two types of wandering behaviour: machine wandering and process wandering. As Figure 3 shows, process wandering randomly selects stepa1 process codes, each corresponding to a distinct work piece. The selected codes are then arranged in a random order and placed back into the vacated positions of the original process-code sequence. The resulting changes in the position vector guide the movement of the detecting wolves.
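Process wandering can be illustrated on an operation-based process code such as the example [1,1,2,3,3,1,2,2,3] used in the discussion of the siege behaviour. The sketch below is one interpretation of the description: it picks stepa1 distinct work pieces, takes one randomly chosen occurrence of each, and shuffles those codes among their positions. Because it only permutes existing codes, every work piece keeps the same number of operations, so the result remains a valid process code. The function name is hypothetical.

```python
import random

def process_wander(code, step, rng):
    """Shuffle one occurrence of each of `step` randomly chosen, distinct work pieces."""
    jobs = sorted(set(code))
    if not 1 <= step <= len(jobs):
        raise ValueError("step must be between 1 and the number of distinct work pieces")
    chosen_jobs = rng.sample(jobs, step)
    # One random occurrence (position) of each chosen work piece.
    positions = [rng.choice([p for p, g in enumerate(code) if g == job]) for job in chosen_jobs]
    values = [code[p] for p in positions]
    rng.shuffle(values)  # new random order for the selected codes
    child = list(code)
    for p, v in zip(positions, values):
        child[p] = v  # reinsert into the vacated positions
    return child
```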

Figure 3.

Process wandering behaviour.

For machine wandering with step size stepa2 = 1, a process that can be handled by at least two machines is selected at random, and a machine is chosen at random from that process's own candidate machine set. In the calling behaviour, the wolf pack is evaluated using the non-dominated crowding degree ranking technique, and a subset of the top Pareto solutions is chosen at random to form the Xleader solution set. The Partially Mapped Crossover (POX) approach is then applied as follows and as shown in Figure 4.

Figure 4.

Flowchart for POX crossover operation. POX: Partially Mapped Crossover.

First, the work-piece serial numbers are randomly divided into two complementary, non-empty sets, Q1 and Q2. The elements of parent X1 whose work-piece numbers belong to Q1 are copied to child X1' at the same positions. The serial numbers of parent Xleader that belong to Q2 are then inserted, in order, into the remaining open positions of child X1'. Conversely, the elements of parent Xleader that belong to Q1 are copied to child Xleader' at their original positions, and the Q2 elements of parent X1 are inserted, preserving their order, into the open positions of child Xleader'. Figure 5 depicts the POX crossover procedure graphically.
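The crossover just described can be sketched for the operation-based encoding as follows, where each gene is a work-piece number and both parents contain the same multiset of genes. `parent1` plays the role of X1 and `parent2` the role of Xleader; the function also returns the random set Q1 so that its behaviour can be checked. This is an illustrative sketch under those assumptions, not the authors' code.

```python
import random

def pox_crossover(parent1, parent2, rng):
    """Return (child1, child2, q1) following the Q1/Q2 procedure described above."""
    jobs = sorted(set(parent1))
    if len(jobs) < 2:
        raise ValueError("need at least two distinct work pieces")
    # Q1 is a random non-empty proper subset of work pieces; Q2 is its complement.
    q1 = set(rng.sample(jobs, rng.randint(1, len(jobs) - 1)))

    def build(keep_parent, fill_parent):
        # Keep Q1 genes of keep_parent at their positions, leave the rest open (None).
        child = [g if g in q1 else None for g in keep_parent]
        # Fill open slots with fill_parent's Q2 genes in their original order.
        fill = iter(g for g in fill_parent if g not in q1)
        return [g if g is not None else next(fill) for g in child]

    return build(parent1, parent2), build(parent2, parent1), q1
```

Because each child takes its Q1 genes from one parent and its Q2 genes from the other, every work piece keeps its number of operations, so no repair step is needed for this encoding.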

Figure 5.

Structure of GRU model. GRU: Gated Recurrent Unit.

The siege behaviour applies only to the process code, so the machine code can be adjusted afterwards as required. The customisable integer parameter 'stepc', the siege step size, acts as a signal for intensifying the wandering behaviour. Consider the process code of artificial wolf Xi, [1,1,2,3,3,1,2,2,3], which contains nine codes in total. It is important to note that an inappropriate siege step size can easily shift the range in which the best solution is found during the siege stage. The step size is usually chosen at random between 0 and 9.

In practice, it is typically set to between one-third and one-half of the total number of codes. The siege behaviour must also be incorporated into the update of the wolf pack: the R artificial wolves with the lowest prey-odour concentration (that is, the highest objective function values) are eliminated, and R new artificial wolves are then created using the non-dominated crowding sorting technique. The value of R usually falls in the range [M/(2b), M/b], where b is the population update percentage factor and M is the number of artificial wolves.
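The population renewal rule can be sketched as below. For simplicity it ranks wolves by a single scalar score where lower is better (for example, the non-dominated rank) and obtains new wolves from a caller-supplied generator; both `score` and `new_wolf` are hypothetical hooks rather than parts of the described method, and b is assumed to be at least 1.

```python
import math
import random

def renew_population(pack, score, b, new_wolf, rng):
    """Replace the R worst wolves, with R drawn from [M/(2b), M/b]; assumes b >= 1.

    score: hypothetical key function, lower is better (e.g. non-dominated rank).
    new_wolf: hypothetical generator called as new_wolf(rng) for each replacement.
    """
    M = len(pack)
    low, high = math.ceil(M / (2 * b)), math.floor(M / b)
    R = rng.randint(low, high) if low <= high else 0
    survivors = sorted(pack, key=score)[: M - R]  # drop the R highest (worst) scores
    return survivors + [new_wolf(rng) for _ in range(R)], R
```

With M = 10 wolves and b = 2, for example, R is drawn from the integers 3 to 5, and the pack size stays at 10.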

Flow of the algorithm

The flow of the IMOWPA algorithm is summarised in the following steps:

Step 1: Initialise the algorithm's parameters.

Step 2: Initialise the external set Q = ∅, compute the objective function values of every artificial wolf in the initial population, sort the population into non-dominated layers using fast non-dominated sorting, and update Q.

Step 3: Check whether the maximum number of walks, Tmax, has been reached. If it has not, the best-performing artificial wolves are selected as detecting wolves and perform the double walk behaviour, which modifies both the process code and the machine code, and their positions are updated accordingly. This procedure is repeated until Tmax is reached, after which the algorithm proceeds to Step 4.

Step 4: The remaining artificial wolves are designated as aggressive wolves, and the detecting wolves start the summoning behaviour. In this step, one of the detecting wolves is chosen at random to carry out a POX operation, and the POX crossover is used to update the positions of the aggressive wolves according to the prey odour concentration computed for each artificial wolf.

Step 5: The aggressive wolves cooperate with the detecting wolves to carry out the siege behaviour. The siege target is chosen at random from the positions of the artificial wolves with the best fitness value for each optimised sub-objective. Once the siege behaviour is complete, the position of each artificial wolf is updated, its objective function values are computed and recorded, and the better Pareto solutions are identified. The individuals are then sorted and the external archive set is updated.

Step 6: Renew the population according to the 'survival of the strongest' rule.

Step 7: Check whether the termination condition has been met. If it has, output the Pareto-optimal solution set as the set of optimal solutions; otherwise, return to Step 3.
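Steps 2 and 5 both rely on sorting the pack into non-dominated layers and keeping the better Pareto solutions. A sketch of the widely used NSGA-II style fast non-dominated sort, with crowding distance as the within-layer tie-breaker, is shown below. It assumes every objective is minimised and that each wolf is represented only by its tuple of objective values; the function names are illustrative rather than taken from the paper.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs):
    """Split solutions into non-dominated layers; returns index lists, layer 0 first."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # dominated[p]: indices that p dominates
    count = [0] * n                     # count[p]: number of solutions dominating p
    layers = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                count[p] += 1
        if count[p] == 0:
            layers[0].append(p)          # nobody dominates p: first Pareto layer
    while layers[-1]:
        nxt = []
        for p in layers[-1]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:        # q is only dominated by earlier layers
                    nxt.append(q)
        layers.append(nxt)
    return layers[:-1]                   # drop the trailing empty layer


def crowding_distance(objs, layer):
    """Crowding distance of each index in one layer; boundary points get infinity."""
    if not layer:
        return {}
    dist = {i: 0.0 for i in layer}
    for m in range(len(objs[layer[0]])):
        ordered = sorted(layer, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue                     # all equal on this objective: no spread
        for k in range(1, len(ordered) - 1):
            gap = objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist
```

Within a layer, wolves with a larger crowding distance lie in sparser regions of the Pareto front, so preferring them helps keep the archived solutions spread out.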

Classification using CNN-GRU

In this stage of the procedure, the CNN-GRU model classifies the data to determine whether stroke is present. The CNN is a multi-layer neural network composed of convolution, pooling and fully connected (FC) layers. The convolution and pooling layers perform feature extraction, while the FC layer classifies the medical data. The layers are stacked so that the outputs of each layer serve as the inputs to the next. The network parameters are trained and fine-tuned with the back propagation (BP) algorithm.

The Long Short-Term Memory (LSTM) network¹³ has three fundamental gates: the input gate, the forget gate and the output gate. The GRU is a simpler model with only two gates: the reset gate and the update gate. These gates judge the relevance of incoming data, so that vital information is retained while less significant information is discarded. In the FC network of a CNN, by contrast, neural units within the same layer are not directly connected, so each feature map is classified without reference to the others. The GRU network is therefore used in place of the CNN's FC layer, turning the classification problem into a sequence-based task: when categorising a feature map, the network takes into account the classification outcomes of the preceding feature maps in the same hidden layer, which improves the CNN's detection effectiveness. The recommended GRU framework is shown in Figure 6. Additionally, dropout is used to reduce overfitting in the CNN-GRU method by randomly disabling some neural units in the hidden layer.
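To make the two-gate structure concrete, the following pure-Python sketch implements one GRU time step in the formulation of Cho et al. (reference 17), plus a loop that feeds a sequence of feature vectors through it. The parameter dictionary layout (`W_z`, `U_z`, `b_z` and so on) is a hypothetical convention chosen here; the paper does not give its layer sizes or initialisation, and a practical model would be built with a deep-learning framework.

```python
import math


def _sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def _affine(w, x, u, h, b):
    """Compute W x + U h + b, with matrices stored as lists of rows."""
    return [sum(wi * xi for wi, xi in zip(w_row, x))
            + sum(ui * hj for ui, hj in zip(u_row, h)) + bi
            for w_row, u_row, bi in zip(w, u, b)]


def gru_step(x, h_prev, p):
    """One GRU step:
    z = sigmoid(W_z x + U_z h + b_z)        (update gate)
    r = sigmoid(W_r x + U_r h + b_r)        (reset gate)
    c = tanh(W_h x + U_h (r * h) + b_h)     (candidate state)
    h_new = z * h + (1 - z) * c
    """
    z = _sigmoid(_affine(p["W_z"], x, p["U_z"], h_prev, p["b_z"]))
    r = _sigmoid(_affine(p["W_r"], x, p["U_r"], h_prev, p["b_r"]))
    reset_h = [ri * hj for ri, hj in zip(r, h_prev)]  # reset gate masks the old state
    cand = [math.tanh(a) for a in _affine(p["W_h"], x, p["U_h"], reset_h, p["b_h"])]
    # The update gate interpolates between the previous state and the candidate.
    return [zi * hj + (1.0 - zi) * ci for zi, hj, ci in zip(z, h_prev, cand)]


def gru_sequence(xs, h0, p):
    """Run the GRU over a sequence, e.g. one feature vector per CNN feature map."""
    h = list(h0)
    for x in xs:
        h = gru_step(x, h, p)
    return h
```

Some texts swap the roles of z and 1 - z in the final interpolation; the two conventions are equivalent up to relabelling the update gate.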

Figure 6.

Accuracy analysis.

Feature extraction within the CNN-GRU framework uses a dedicated CNN input followed by convolution and pooling layers of configurable sizes. Each feature map produced by the first convolution layer is computed with equation (4).

\mathrm{O\_conv}_{j}^{l} = \mathrm{sigmoid}\left(\sum_{i=1}^{M} \mathrm{convn}\left(a_{i}^{l-1}, k_{ij}^{l}\right) + b_{j}^{l}\right)

(4)

where a_{i}^{l-1} is the ith of the M feature maps in the previous layer l-1, k_{ij}^{l} denotes the kernels of the lth layer and b_{j}^{l} is the bias of the jth output feature map. The value of the convolution function convn at position (m, n) is calculated with equation (5).

\mathrm{convn}\left(a_{i}^{l-1}, k_{ij}^{l}\right)[m][n] = \sum_{w=0}^{k-1} \sum_{v=0}^{k-1} k_{ij}^{l}[w][v] \, a_{i}^{l-1}[m+w][n+v]

(5)

where k = 5 is the convolution kernel width. In the pooling layer, every resulting feature map value at location (m, n) is computed with the average pooling of equation (6), where a_{j}^{l-1} is the jth feature map of the preceding convolution layer l-1 and K = 2 is the pooling kernel width.

\mathrm{aver\_pool}_{j}^{l}[m][n] = \frac{1}{4} \sum_{u=0}^{K-1} \sum_{v=0}^{K-1} a_{j}^{l-1}[m+u][n+v]

(6)
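Equations (4) to (6) can be checked on small inputs with the sketch below. It follows the equations as written: convn is a valid (unpadded) 2-D cross-correlation, each output map sums the convolved input maps plus a bias before the sigmoid, and the pooling window is anchored at every (m, n), i.e. stride 1. Many CNNs instead pool non-overlapping K x K blocks (stride K), which would index a[K*m + u][K*n + v]. The function names are hypothetical, and the kernels in the usage check are tiny for readability rather than the paper's k = 5.

```python
import math


def convn(a, k):
    """Equation (5): valid 2-D cross-correlation of map a with kernel k (no padding, stride 1)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(a) - kh + 1, len(a[0]) - kw + 1
    return [[sum(k[w][v] * a[m + w][n + v] for w in range(kh) for v in range(kw))
             for n in range(out_w)]
            for m in range(out_h)]


def conv_layer_map(inputs, kernels, bias):
    """Equation (4): output map j = sigmoid(sum_i convn(a_i, k_ij) + b_j).

    inputs: the M feature maps of layer l-1; kernels: the matching kernels k_ij; bias: b_j.
    """
    maps = [convn(a, k) for a, k in zip(inputs, kernels)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[1.0 / (1.0 + math.exp(-(sum(mp[m][n] for mp in maps) + bias)))
             for n in range(cols)]
            for m in range(rows)]


def average_pool(a, K=2):
    """Equation (6) as written: mean of the K x K window anchored at (m, n), stride 1."""
    out_h, out_w = len(a) - K + 1, len(a[0]) - K + 1
    return [[sum(a[m + u][n + v] for u in range(K) for v in range(K)) / (K * K)
             for n in range(out_w)]
            for m in range(out_h)]
```

With K = 2 the divisor K * K equals the factor 1/4 in equation (6).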

Next, each feature in the feature maps produced by the final CNN pooling layer is connected to a corresponding GRU unit. The GRU outputs are then passed through the softmax function of equation (7) to classify the medical data. During testing, the GRU outputs are combined by a voting scheme to determine the final detection outcome.

y_{k} = \frac{\exp\left(w_{k} \cdot h\right)}{\sum_{k'} \exp\left(w_{k'} \cdot h\right)}

(7)
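The classification head of equation (7) and the test-time voting step can be sketched as follows. The text does not specify the voting rule, so a simple majority vote with ties resolved in favour of the first label encountered is assumed here; the function names are hypothetical.

```python
import math
from collections import Counter


def softmax_from_hidden(h, w):
    """Equation (7): y_k = exp(w_k . h) / sum_k' exp(w_k' . h); w lists one weight vector per class."""
    scores = [sum(wi * hi for wi, hi in zip(w_k, h)) for w_k in w]
    top = max(scores)  # subtracting the max avoids overflow and leaves y_k unchanged
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(h, w):
    """Index of the most probable class for one GRU output h."""
    probs = softmax_from_hidden(h, w)
    return max(range(len(probs)), key=probs.__getitem__)


def vote(labels):
    """Assumed majority vote over per-output predictions; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]
```

For example, three GRU outputs predicting classes 1, 0 and 1 would be combined by `vote([1, 0, 1])` into a final prediction of class 1.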

Results and discussion

Experimental arrangement

In this part, the WEKA 3.8.6 environment is used to assess the efficacy of the DL models. WEKA is a free, Java-based data mining tool distributed under the GNU General Public Licence (GPL), and it offers a large library of models for data mining tasks. The tests were run on a PC with Windows 10 Home (64-bit operating system, x64-based processor), an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz (2.59 GHz) and 16 GB of memory.

Performance metrics

Several evaluation metrics were used to assess the effectiveness of the proposed approach: accuracy (Acc), area under the Receiver Operating Characteristic curve (AUC), F1 score (F1), precision (P), recall (RC) and Average Precision (AP). The computational cost of training was also assessed through the average training time per epoch, abbreviated 'e-Time'. Accuracy is calculated with equation (8):

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(8)

Here, TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. These counts form the basis for evaluating classification models.

Precision (P) is computed with equation (9):

\mathrm{Precision} = \frac{TP}{TP + FP}

(9)

Recall (RC) is computed with equation (10):

\mathrm{Recall} = \frac{TP}{TP + FN}

(10)

The F1 score is computed with equation (11):

F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(11)
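Equations (8) to (11) translate directly into code. The sketch below adds zero-division guards, which are an assumption since the text does not say how empty denominators are handled; the function names are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (equations (8) to (11))."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard: no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0     # guard: no positive samples
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """Equation (11) applied to precision and recall given directly (on any common scale)."""
    return 2 * precision * recall / (precision + recall)
```

As a consistency check, `f1_from(87.50, 83.83)` is about 85.63, which matches the F1 reported for the proposed model in Table 2.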

The ROC curve compares the true positive rate with the false positive rate across decision thresholds, and AUC is the area under this curve. AP summarises the precision-recall (P-RC) curve as the mean of the precision achieved at each threshold, weighted by the increase in recall from the previous threshold.
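For completeness, the following sketch computes AUC and AP from binary labels and classifier scores without external libraries. AUC is computed through its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half), and AP as the sum over thresholds of the precision multiplied by the increase in recall, with tied scores treated as one threshold. This mirrors common library definitions but is an illustration, not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC via its rank interpretation; labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AP = sum over thresholds of (R_n - R_{n-1}) * P_n, with thresholds at distinct scores."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("AP is undefined without positive samples")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume all samples tied at this score as a single threshold.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap
```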

Table 1 and Figures 7 to 11 report the results for the 60%/40% training/testing split. 1D-CNN achieved 83.46% Acc, 87.45% AUC, 80.85% F1, 82.10% P, 79.64% RC and an AP of 0.7431. ANN achieved 84.78% Acc, 88.51% AUC, 81.87% F1, 85.62% P, 78.44% RC and an AP of 0.7661. CapsNet achieved 84.77% Acc, 88.07% AUC, 82.84% F1, 81.87% P, 83.83% RC and an AP of 0.7572. RNN achieved 84.79% Acc, 89.13% AUC, 82.63% F1, 82.63% P, 82.63% RC and an AP of 0.7590. The proposed model achieved 86.88% Acc, 90.58% AUC, 85.03% F1, 85.03% P, 85.03% RC and an AP of 0.7886, the best value on every metric except P, where ANN scores higher (85.62%).

Figure 7.

AUC analysis.

Figure 8.

F1 analysis.

Figure 9.

P analysis.

Figure 10.

RC analysis. RC: Recall.

Figure 11.

AP validation. AP: Average Precision.

Table 1.

Analysis of the proposed model for the 60%/40% training/testing split.

Model	Acc (%)	AUC (%)	F1 (%)	P (%)	RC (%)	AP
1D-CNN	83.46	87.45	80.85	82.10	79.64	0.7431
ANN	84.78	88.51	81.87	85.62	78.44	0.7661
CapsNet	84.77	88.07	82.84	81.87	83.83	0.7572
RNN	84.79	89.13	82.63	82.63	82.63	0.7590
Proposed model	86.88	90.58	85.03	85.03	85.03	0.7886

Acc: accuracy; AP: Average Precision; AUC: area under the ROC curve; RC: Recall; RNN: recurrent neural network.

Table 2 and Figures 6 to 11 report the results for the 70%/30% training/testing split. 1D-CNN achieved 84.82% Acc, 87.97% AUC, 80.95% F1, 81.51% P, 80.41% RC and an AP of 0.7339. ANN achieved 84.51% Acc, 86.49% AUC, 81.62% F1, 85.06% P, 78.44% RC and an AP of 0.7618. CapsNet achieved 85.83% Acc, 87.87% AUC, 83.33% F1, 85.99% P, 80.84% RC and an AP of 0.7791. RNN achieved 86.88% Acc, 90.58% AUC, 85.03% F1, 85.03% P, 85.03% RC and an AP of 0.7886. The proposed model achieved 87.66% Acc, 91.35% AUC, 85.63% F1, 87.50% P, 83.83% RC and an AP of 0.8044, the best value on every metric except RC, where RNN scores higher (85.03%). Because models tend to memorise the dominant class, imbalanced datasets can make them more susceptible to overfitting. By providing a more balanced training set, SMOTE helps to reduce this risk, which can improve generalisation and, in turn, accuracy on the test data.

Table 2.

Analysis of the proposed model for the 70%/30% training/testing split.

Model	Acc (%)	AUC (%)	F1 (%)	P (%)	RC (%)	AP
1D-CNN	84.82	87.97	80.95	81.51	80.41	0.7339
ANN	84.51	86.49	81.62	85.06	78.44	0.7618
CapsNet	85.83	87.87	83.33	85.99	80.84	0.7791
RNN	86.88	90.58	85.03	85.03	85.03	0.7886
Proposed model	87.66	91.35	85.63	87.50	83.83	0.8044

Acc: accuracy; AP: Average Precision; CNN: Convolutional Neural Network; RC: Recall.

According to Table 3, IMOWPA efficiently optimises the K value of SMOTE, which improves the generation of synthetic data for imbalanced datasets. IMOWPA achieves the highest accuracy (0.9021), precision (0.8941) and F1 score (0.8859) of all the optimisers, although HSO attains a slightly higher recall (0.8890 versus 0.8778), making IMOWPA the most effective optimiser overall. Compared with IMOWPA, the Firefly Algorithm (FA), Ant Colony Optimisation (ACO), Cuckoo Search Optimisation (CSO), Harmony Search Optimisation (HSO) and Whale Optimisation Algorithm (WOA) may converge more slowly, leading to longer optimisation times.

Table 3.

Optimisation analysis.

Optimiser	Acc	P	RC	F1 score
FA	0.8804 ± 0.0028	0.8507 ± 0.0045	0.8695 ± 0.0032	0.8600 ± 0.0035
ACO	0.8864 ± 0.0028	0.8604 ± 0.0032	0.8748 ± 0.0041	0.8675 ± 0.0032
HSO	0.8977 ± 0.0033	0.8693 ± 0.0055	0.8890 ± 0.0029	0.8790 ± 0.0040
WOA	0.7197 ± 0.0025	0.6364 ± 0.0044	0.6730 ± 0.0038	0.6542 ± 0.0034
CSO	0.8335 ± 0.0020	0.7870 ± 0.0029	0.8158 ± 0.0030	0.8011 ± 0.0024
Proposed	0.9021 ± 0.0039	0.8941 ± 0.0047	0.8778 ± 0.0049	0.8859 ± 0.0046

Acc: accuracy; ACO: Ant Colony Optimisation; AP: Average Precision; CSO: Cuckoo Search Optimisation; FA: Firefly Algorithm; HSO: Harmony Search Optimisation; RC: Recall; WOA: Whale Optimisation Algorithm.
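The K that IMOWPA tunes is the neighbourhood size of SMOTE. A minimal pure-Python sketch of SMOTE (Chawla et al., reference 13) is given below to show where K enters: each synthetic sample is placed at a random point on the segment between a minority sample and one of its K nearest minority-class neighbours. It uses Euclidean distance and the common variant that draws the base sample at random rather than iterating over every minority sample; the function name `smote` is illustrative, and the IMOWPA search over K is not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=random):
    """Create n_synthetic minority samples by SMOTE-style interpolation.

    minority: feature vectors from the minority class only.
    k: number of nearest minority neighbours (the value an optimiser such as IMOWPA would tune).
    """
    n = len(minority)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = max(1, min(k, n - 1))  # cannot use more neighbours than exist
    # Precompute the k nearest minority neighbours of every sample (Euclidean distance).
    neighbours = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(minority[i], minority[j]))[:k]
        for i in range(n)
    ]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(n)                      # base minority sample
        nb = minority[rng.choice(neighbours[i])]  # one of its k nearest neighbours
        gap = rng.random()                        # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], nb)])
    return synthetic
```

A small K keeps synthetic samples close to existing minority points, while a large K spreads them further and risks overlapping the majority class, which is why K is worth tuning.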

Conclusion

In scientific research, addressing imbalanced datasets is of paramount importance; such unbalanced sets also arise in graph theory. In these datasets, the minority class is the more significant one, even though it has far fewer instances than the majority class. When the majority class dominates, conventional classification techniques often achieve high overall accuracy but perform unreliably on the minority class. To address this issue, this work presents hybrid data-level methods that balance the data distribution and improve classifier performance. A new CNN-GRU-IDH model has been developed in this study to identify illnesses in an IoT-enabled, cloud-based setting. The model combines LZMA-based data compression, SMOTE-based Imbalanced Data Handling (IDH), IMOWPA for selecting the optimal K value and CNN-GRU-based classification. Simulations on benchmark datasets show that the CNN-GRU-IDH model outperforms current state-of-the-art techniques, reaching an accuracy of 87.66%, which suggests its potential as a real-time diagnostic tool for the healthcare industry. In future, outlier detection and clustering approaches could be incorporated to further improve the classification performance of the CNN-GRU-IDH model.

Footnotes

Author contributions

RA, UM, KS: Conceptualization; UM, RA, GGT: Methodology; UM, RA, KS, JKKA: Formal analysis & data curation; RA, UM, SKB, JKKA, SJM: Writing – original draft preparation; RA, SJM, SKB, GGT: Writing – review & editing; RA, SJM, UM, GGT: Supervision. All authors have read and approved the published version of the manuscript.

ORCID iDs

Seyed Jalaleddin Mousavirad

Ghanshyam G Tejani

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Choi, Park, Jun, et al. Deep learning-based stroke disease prediction system using real-time bio signals. Sensors 2021; 21: 4269.

2. Liu, Kurgan, et al. Attention convolutional neural network for accurate segmentation and quantification of lesions in ischemic stroke disease. Med Image Anal 2020; 65: 101791.

3. Islam, Hussain, Rahman, et al. Explainable artificial intelligence model for stroke prediction using EEG signal. Sensors 2022; 22: 9859.

4. Patil, Shastry, Ashokumar. Heart attack detection based on mask region based convolutional neural network instance segmentation and hybrid classification using machine learning techniques. Turkish J Computer Mathematics Educ 2021; 12: 2228–2244.

5. Schindler, Schinner, Altaf, et al. Prediction of stroke risk by detection of hemorrhage in carotid plaques: meta-analysis of individual patient data. Cardiovasc Imag 2020; 13: 395–406.

6. Park, Kwon, et al. AI-based stroke disease prediction system using real-time electromyography signals. Appl Sci 2020; 10: 6791.

7. Jindal, Agrawal, Khera, et al. Heart disease prediction using machine learning algorithms. In: IOP conference series: materials science and engineering 2021. IOP Publishing, 2021, pp.1–7.

8. Sarra, Dinar, Mohammed, et al. Enhanced heart disease prediction based on machine learning and χ2 statistical optimal feature selection model. Designs 2022; 6: 87.

9. Malla, Dubey, Kumar, et al. Exploring the human microbiome: the potential future role of next-generation sequencing in disease diagnosis and treatment. Front Immunol 2019; 9: 2868.

10. A novel nomogram to predict mortality in patients with stroke: a survival analysis based on the MIMIC-III clinical database. BMC Med Inform Decis Mak 2022; 22: 92.

11. Wong, Correa, et al. Prediction of clinical deterioration in hospitalized adult patients with hematologic malignancies using a neural network model. PLoS ONE 2016; 11: e0161401.

12. Blackwell, Keim-Malpass, Clark, et al. Early detection of in-patient deterioration: one prediction model does not fit all. Critical Care Explor 2020; 2: e0116.

13. Chawla, Bowyer, Hall, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16: 321–357.

14. Sowjanya, Mrudula. Effective treatment of imbalanced datasets in health care using modified SMOTE coupled with stacked deep learning algorithms. Appl Nanosci 2023; 13: 1829–1840.

15. Maulidevi, Surendro. SMOTE-LOF for noise identification in imbalanced data classification. J King Saud Univ – Computer Inform Sci 2022; 34: 3413–3423.

16. Zhao, Sun, Yang. Automatic recognition of surface defects of hot rolled strip steel based on deep parallel attention convolution neural network. Mater Lett 2023; 353: 135313.

17. Cho, Van Merriënboer, Gulcehre, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 2014 Jun 3.

18. Woźniak, Wieczorek, Siłka. BiLSTM deep neural network model for imbalanced medical data of IoT systems. Future Gener Comput Syst 2023; 141: 489–499.

19. Peng, Jin, Duan, et al. Active learning-assisted semi-supervised learning for fault detection and diagnostics with imbalanced dataset. IISE Transact 2023; 55: 672–686.

20. Yan, Feng, et al. Global contextual multiscale fusion networks for machine health state identification under noisy and imbalanced conditions. Reliab Eng Syst Saf 2023; 231: 108972.

21. Pan, Peng, Chen, et al. Power-law-based synthetic minority oversampling technique on imbalanced serum surface-enhanced Raman spectroscopy data for cancer screening. Adv Intell Syst 2023; 5: 2300006.

22. Alourani, Tariq, Tahir, et al. Patient mortality prediction and analysis of health cloud data using a deep neural network. Appl Sci 2023; 13: 2391.

23. Zhang, Chen, et al. Scoring aided federated learning on long-tailed data for wireless IoMT based healthcare system. IEEE J Biomed Health Inform 2024; 28: 3341–3348.

24. Shafi, Fatima, Afzal, et al. A comprehensive review of recent advances in artificial intelligence for dentistry e-health. Diagnostics 2023; 13: 2196.

25. Raza, Siddiqui, Munir, et al. Ensemble learning-based feature engineering to analyze maternal health during pregnancy and health risk prediction. PLoS ONE 2022; 17: e0276525.

26. Silveira, Sobrinho, Silva, et al. Exploring early prediction of chronic kidney disease using machine learning algorithms for small and imbalanced datasets. Appl Sci 2022; 12: 3673.

27. Qiu, Yin, et al. Design of a hybrid compression algorithm for high-fidelity synchro-waveform measurements. In: 2023 IEEE international conference on energy technologies for future grids (ETFG). Piscataway, NJ: IEEE, 2023, pp.1–6.

28. Xin, Zhang, Zhong, et al. Lateral spread prediction based on hybrid CNN-LSTM model for hot strip finishing mill. Mater Lett 2025; 378: 137594.

29. Wang, Yao, Chen. An imbalanced-data processing algorithm for the prediction of heart attack in stroke patients. IEEE Access 2021; 9: 25394–25404.

30. Flexible job shop scheduling optimization for green manufacturing based on improved multi-objective wolf pack algorithm. Appl Sci 2023; 13: 1–22.

A CNN-GRU framework for stroke–heart attack prediction using IMOWPA-tuned SMOTE and LZMA compression
