ScaffoldViz: Interactive 3D visualization of ensemble decision trees based on relationship between weak learners

Abstract

Ensemble learning, which combines multiple weak learners for enhanced performance, is widely used but suffers from low interpretability and explainability. Low interpretability hinders the identification of factors contributing to model inaccuracies, such as overfitting or underfitting, which are critical considerations in the operation of machine learning models. Additionally, this leads to challenges not only in operational aspects such as model maintenance and quality assurance but also in addressing societal needs such as fairness and privacy. This study focuses on the relationships among weak learners within an ensemble model and proposes a novel visualization method to enhance understanding of the model’s structure and learning process. We propose ScaffoldViz, a system that defines relationships among weak learners based on “common samples” in gradient boosting decision trees and visualizes these relationships as a three-dimensional graph structure, linking them progressively from lower levels. This system enables observation of ensemble models composed of multiple weak learners, as well as their transformations during the training and validation processes. We illustrate the operation of the system by visualizing ensemble models trained on synthetic datasets exhibiting typical distribution shifts, as well as on real-world open datasets. The results demonstrate that this approach makes the behavior and structure of ensemble models easier to understand. Moreover, it facilitates the identification of overfitting and underfitting and the detection of outliers within the dataset. User testing further demonstrated the effectiveness of this visualization method.
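
The abstract does not spell out how “common samples” are computed, so the following is only a minimal sketch under one assumed reading: two leaves in consecutive trees are related when some samples fall into both, and the edge weight is the number of shared samples. The function name `common_sample_edges`, the `min_shared` threshold, and the toy leaf assignments are all hypothetical and are not taken from the paper. In practice the leaf assignments might come from a library's leaf-index output (for example, LightGBM's `predict(..., pred_leaf=True)`).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical node identifier: (tree index, leaf index).
Node = Tuple[int, int]


def common_sample_edges(
    leaf_ids: List[List[int]], min_shared: int = 1
) -> Dict[Tuple[Node, Node], int]:
    """Count samples shared between leaves of consecutive trees.

    leaf_ids[t][i] is the leaf that sample i reaches in tree t.
    This is an assumed definition of "common samples", not the paper's.
    """
    edges: Counter = Counter()
    for t in range(len(leaf_ids) - 1):
        # Each sample contributes one unit to the edge joining the leaf
        # it reaches in tree t and the leaf it reaches in tree t + 1.
        for a, b in zip(leaf_ids[t], leaf_ids[t + 1]):
            edges[((t, a), (t + 1, b))] += 1
    # Keep only edges supported by at least min_shared samples.
    return {e: n for e, n in edges.items() if n >= min_shared}


# Made-up leaf assignments for 6 samples across 3 trees.
leaf_ids = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [0, 1, 1, 1, 1, 0],
]
edges = common_sample_edges(leaf_ids, min_shared=2)
for (src, dst), shared in sorted(edges.items()):
    print(src, "->", dst, "shared samples:", shared)
```

Each retained edge links a leaf in one tree to a leaf in the next, which gives a layered graph in which tree index could serve as the third axis of a 3D layout.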

Keywords

Visualization, machine learning, ensemble model, layered graph, MLOps

