Abstract
Ensemble learning, which combines multiple weak learners to improve performance, is widely used but suffers from low interpretability and explainability. Low interpretability makes it difficult to identify the factors behind model inaccuracies, such as overfitting or underfitting, which are critical considerations when operating machine learning models. It also creates challenges not only in operational aspects such as model maintenance and quality assurance but also in addressing societal needs such as fairness and privacy. This study focuses on the relationships among weak learners within an ensemble model and proposes a novel visualization method to enhance understanding of the model’s structure and learning process. We present ScaffoldViz, a system that defines relationships among weak learners based on “common samples” in gradient boosting decision trees and visualizes these relationships as a three-dimensional graph structure, linking them progressively from lower levels. The system enables observation of ensemble models composed of multiple weak learners, as well as their transformations during the training and validation processes. We illustrate its operation by visualizing an ensemble model trained on synthetic datasets exhibiting typical distribution shifts, as well as on real-world open datasets. The results show that this approach makes the behavior and structure of ensemble models easier to understand, and that it facilitates the identification of overfitting and underfitting and the detection of outliers within the dataset. User testing further demonstrated the effectiveness of this visualization method.
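The abstract does not spell out how “common samples” are computed, so the following is only an illustrative sketch under a stated assumption: two leaves in consecutive boosting rounds are treated as related when some training samples fall into both, and the number of such shared samples is used as the link weight. The function names, the `min_shared` threshold, and the toy leaf assignments are hypothetical and are not taken from ScaffoldViz.

```python
from collections import Counter


def common_sample_counts(leaf_ids_a, leaf_ids_b):
    """Count samples shared by each (leaf in tree A, leaf in tree B) pair.

    leaf_ids_a[i] and leaf_ids_b[i] are the leaves that sample i reaches
    in tree A and tree B respectively (assumed input format).
    """
    if len(leaf_ids_a) != len(leaf_ids_b):
        raise ValueError("both trees must be evaluated on the same samples")
    return Counter(zip(leaf_ids_a, leaf_ids_b))


def build_links(leaf_assignments, min_shared=1):
    """Link leaves of consecutive trees that share at least min_shared samples.

    Returns a list of ((tree, leaf), (tree + 1, leaf), shared_count) tuples,
    ordered from the earliest tree upward.
    """
    links = []
    for t in range(len(leaf_assignments) - 1):
        counts = common_sample_counts(leaf_assignments[t], leaf_assignments[t + 1])
        for (leaf_a, leaf_b), n in sorted(counts.items()):
            if n >= min_shared:
                links.append(((t, leaf_a), (t + 1, leaf_b), n))
    return links


# Hypothetical leaf assignments: 3 trees, 6 samples each.
leaf_assignments = [
    [0, 0, 1, 1, 1, 2],
    [0, 1, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
]
links = build_links(leaf_assignments)
strong_links = build_links(leaf_assignments, min_shared=2)
```

In a real gradient boosting library the per-sample leaf indices would come from the trained model rather than from hand-written lists; the resulting weighted links are the kind of structure a layered graph layout could then draw.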
