Abstract
The term “model trees” is commonly used for regression trees that contain some non-trivial model in their leaves. Popular implementations of model tree learners build trees with linear regression models in their leaves. They use reduction of variance as a heuristic for selecting tests during the tree construction process. In this article, we show that systems employing this heuristic may exhibit pathological behaviour in some quite simple cases. This behaviour is not visible in the predictive accuracy of the tree, but it reduces the tree's explanatory power. We propose an alternative heuristic that yields equally accurate but simpler trees with better explanatory power, at little or no additional computational cost. The resulting model tree induction algorithm is experimentally evaluated and compared with simpler and more complex approaches on a variety of synthetic and real-world data sets.
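As a rough illustration of the variance-reduction heuristic mentioned above, the following minimal sketch scores a candidate test `x <= threshold` by how much it lowers the size-weighted variance of the target values. It is a generic textbook formulation, not necessarily the exact criterion used by the implementations the abstract refers to; the function names `variance_reduction` and `best_split` are hypothetical and chosen only for this example.

```python
from statistics import pvariance


def variance_reduction(xs, ys, threshold):
    """Drop in target variance obtained by splitting on x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        # A split that leaves one side empty gives no reduction.
        return 0.0
    n = len(ys)
    # Variance of each side, weighted by the fraction of examples it holds.
    weighted = (len(left) / n) * pvariance(left) + (len(right) / n) * pvariance(right)
    return pvariance(ys) - weighted


def best_split(xs, ys):
    """Return the midpoint threshold with the largest variance reduction."""
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not thresholds:
        return None  # Attribute is constant, so there is nothing to split on.
    return max(thresholds, key=lambda t: variance_reduction(xs, ys, t))
```

Note that this score only measures the spread of the target values around their mean on each side of the split; it does not take into account how well a linear model would fit each side.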
