Research on massive ECG data in XGBoost

Abstract

A huge amount of ECG data is available for heart disease diagnosis, but it is difficult to handle. Recently, many researchers have focused on mining disease diagnosis data to uncover hidden patterns and their relevant features. Mining biomedical data is a prominent research area, and clustering techniques are emphasized in heart disease diagnosis. However, few studies deal with large heart disease datasets and classify them according to heart disease features. We propose an anomaly threshold method based on multiple classifiers, which is well suited to datasets containing abnormal data, and use the XGBoost algorithm as a sub-classifier to process massive ECG data. This research focuses on the heart disease classification problem. The dataset is first divided into two categories and then classified into more specific categories. Experimental results show that this method can improve classification accuracy. The experiments are conducted on a large number of instances of different heart diseases obtained from actual hospital cases and on two UCI datasets. We compared the SVM, C4.5, Naive Bayes, Logistic, RandomForest, and XGBoost algorithms and found that tree-based classifiers are the best fit for predicting arrhythmia. The method proposed in this paper is of great significance for processing and forecasting systems that handle large medical datasets, and it promotes the development of intelligent medical care.
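The coarse-to-fine idea in the abstract (first split the data into two categories, then classify into more specific categories) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names `NearestCentroid` and `TwoStageClassifier`, the labels "normal", "afib", and "pvc", and the toy two-feature data are hypothetical and do not come from the paper. A nearest-centroid classifier stands in for the sub-classifier so the sketch runs with the standard library only; a model with the same fit/predict interface, such as `xgboost.XGBClassifier`, could be substituted. The anomaly threshold step is not shown, since the abstract does not specify it.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Tiny stand-in sub-classifier (NOT XGBoost) with a fit/predict interface."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # One centroid (per-feature mean) per class label.
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        # Assign each row to the class whose centroid is closest.
        return [
            min(self.centroids_, key=lambda c: dist(row, self.centroids_[c]))
            for row in X
        ]


class TwoStageClassifier:
    """Stage 1: normal vs. abnormal. Stage 2: specific class for abnormal rows."""

    def __init__(self, make_model, normal_label="normal"):
        self.make_model = make_model      # factory returning a fresh sub-classifier
        self.normal_label = normal_label  # hypothetical label for the healthy class

    def fit(self, X, y):
        # Stage 1 is trained on labels collapsed to two categories.
        coarse = [lab if lab == self.normal_label else "abnormal" for lab in y]
        self.stage1_ = self.make_model().fit(X, coarse)
        # Stage 2 is trained only on the abnormal rows, with their fine labels.
        rows = [row for row, lab in zip(X, y) if lab != self.normal_label]
        labels = [lab for lab in y if lab != self.normal_label]
        self.stage2_ = self.make_model().fit(rows, labels)
        return self

    def predict(self, X):
        result = []
        for row, coarse in zip(X, self.stage1_.predict(X)):
            if coarse == self.normal_label:
                result.append(coarse)
            else:
                result.append(self.stage2_.predict([row])[0])
        return result


# Toy, made-up feature vectors; real ECG features would replace these.
X = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
     [5.0, 5.1], [4.9, 4.8], [5.1, 5.0],
     [5.0, -5.0], [4.8, -5.2], [5.2, -4.9]]
y = ["normal"] * 3 + ["afib"] * 3 + ["pvc"] * 3

model = TwoStageClassifier(NearestCentroid).fit(X, y)
print(model.predict([[0.1, 0.0], [4.7, 5.3], [5.3, -4.6]]))
```

Passing a factory rather than a fitted model keeps the two stages independent, so each stage can use a differently configured sub-classifier if needed.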

Keywords

ECG, heart disease diagnosis, heart disease, intelligent medical
