A Responsible AI approach for designing resilient classifier to handle incomplete data

Abstract

Missing values can greatly affect analyses and decision-making in many fields. In the context of Responsible Artificial Intelligence (AI), ensuring the robustness of machine learning models is essential because Responsible AI emphasizes reliability and interpretability in decision-making processes. However, traditional imputation and ensemble learning methods often fail to preserve critical relationships between independent and dependent variables, introducing bias or noise into the data and undermining the development of robust classification models. To address these challenges, we propose a novel classification approach that aligns with Responsible AI principles. Our Resilient Decision Tree classifier is specifically designed to handle incomplete datasets. We employ subspace classifiers that operate on different non-overlapping subsets of features without relying on imputation. By combining these subspace models into a weighted ensemble classifier, we enhance prediction accuracy on test datasets with missing values. Experimental results on real-life and synthetic datasets demonstrate that our methodology produces an effective ensemble classifier.
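The general idea of combining subspace models without imputation can be illustrated with a minimal sketch. It is not the authors' Resilient Decision Tree: a nearest-centroid learner stands in for the decision tree to keep the example short, and the class `CentroidModel`, the functions `fit_ensemble` and `predict_ensemble`, the accuracy-based weighting, and the toy data are all assumptions made for illustration. Each model trains only on rows that are complete within its own feature subset, and at prediction time only the models whose features are observed in the test row cast a weighted vote.

```python
from collections import defaultdict


def complete(row, feats):
    """True when every feature in the subspace is observed (not None)."""
    return all(row[f] is not None for f in feats)


class CentroidModel:
    """Stand-in base learner (nearest centroid) restricted to one feature subspace."""

    def __init__(self, feats):
        self.feats = feats
        self.centroids = {}
        self.weight = 0.0

    def fit(self, X, y):
        # Train only on rows that are complete within this subspace: no imputation.
        rows = [(x, label) for x, label in zip(X, y) if complete(x, self.feats)]
        groups = defaultdict(list)
        for x, label in rows:
            groups[label].append([x[f] for f in self.feats])
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*pts)]
            for label, pts in groups.items()
        }
        # Hypothetical weighting scheme: accuracy on the rows used for training.
        hits = sum(self.predict(x) == label for x, label in rows)
        self.weight = hits / len(rows) if rows else 0.0
        return self

    def predict(self, x):
        point = [x[f] for f in self.feats]
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, self.centroids[c])),
        )


def fit_ensemble(X, y, subspaces):
    """Fit one model per non-overlapping feature subset."""
    return [CentroidModel(feats).fit(X, y) for feats in subspaces]


def predict_ensemble(models, x):
    """Weighted vote over the models whose features are all present in x."""
    votes = defaultdict(float)
    for m in models:
        if m.centroids and complete(x, m.feats):
            votes[m.predict(x)] += m.weight
    return max(votes, key=votes.get) if votes else None


# Toy data (invented for this sketch); None marks a missing value.
X = [
    [1.0, 1.2, None, 0.9],
    [0.9, None, 1.1, 1.0],
    [1.1, 0.8, 1.0, None],
    [3.0, 3.1, None, 2.9],
    [2.9, None, 3.2, 3.0],
    [3.2, 3.0, 2.8, None],
]
y = ["a", "a", "a", "b", "b", "b"]
models = fit_ensemble(X, y, subspaces=[(0, 1), (2, 3)])

print(predict_ensemble(models, [None, 1.0, 3.0, 3.1]))  # only subspace (2, 3) votes
print(predict_ensemble(models, [1.0, 1.0, None, None]))  # only subspace (0, 1) votes
```

In this sketch a test row with no fully observed subspace yields `None`; a practical system would need a fallback for that case, such as predicting the majority class.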

Keywords

Responsible AI, supplemented data, subspace classifier, weighted ensemble classifier, incomplete data

