Abstract
Crash data for road safety analysis and modeling are growing steadily in size and completeness because of recent advances in information technologies. This increased availability of large data sets has renewed interest in data-driven nonparametric approaches as an alternative to traditional parametric models for crash risk prediction. This paper investigates how the relative performance of these two approaches changes as crash data grow. One popular technique from each approach is compared: negative binomial models for the parametric approach and kernel regression for the nonparametric approach. Two large crash data sets are used to investigate the performance of these two methods as a function of the amount of training data. A rigorous bootstrapping validation process shows that the two approaches have strikingly different patterns, especially in sensitivity to data size. The kernel regression method outperforms the model-based approach (negative binomial) in predictive performance, and that advantage increases noticeably as the data available for calibration grow. With the arrival of the big data era, and the added benefits of enabling automated road safety analysis and improved responsiveness to current safety issues, nonparametric techniques (especially modern machine learning approaches) can be counted as an important tool in road safety studies.
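The idea of tracking predictive error as a function of training-set size, averaged over bootstrap resamples, can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a one-dimensional Nadaraya-Watson kernel regression with a Gaussian kernel and a fixed bandwidth, uses synthetic Poisson counts in place of real crash data, and omits the negative binomial comparison entirely. All function names and parameter values are hypothetical.

```python
import math
import random


def nw_predict(x_train, y_train, x_new, bandwidth):
    """Nadaraya-Watson estimate: Gaussian-kernel-weighted mean of y_train."""
    weights = [math.exp(-0.5 * ((x_new - x) / bandwidth) ** 2) for x in x_train]
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed; fall back to the plain mean.
        return sum(y_train) / len(y_train)
    return sum(w * y for w, y in zip(weights, y_train)) / total


def mean_abs_error(x_train, y_train, x_test, y_test, bandwidth):
    """Mean absolute prediction error on a held-out test set."""
    errors = [abs(nw_predict(x_train, y_train, x, bandwidth) - y)
              for x, y in zip(x_test, y_test)]
    return sum(errors) / len(errors)


def bootstrap_learning_curve(x, y, x_test, y_test, sizes, n_boot, bandwidth, seed=0):
    """For each training size, average test error over bootstrap resamples."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        maes = []
        for _ in range(n_boot):
            # Draw n training observations with replacement.
            idx = [rng.randrange(len(x)) for _ in range(n)]
            maes.append(mean_abs_error([x[i] for i in idx], [y[i] for i in idx],
                                       x_test, y_test, bandwidth))
        curve[n] = sum(maes) / n_boot
    return curve


def make_synthetic_counts(n, seed):
    """Synthetic stand-in for crash data (an assumption, not real data):
    x is a scaled exposure in [0, 1], y is a Poisson count whose mean
    grows with x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.random()
        limit = math.exp(-math.exp(0.5 + 1.5 * x))
        # Knuth's multiplication method for sampling a Poisson count.
        k, p = 0, 1.0
        while True:
            k += 1
            p *= rng.random()
            if p <= limit:
                break
        xs.append(x)
        ys.append(float(k - 1))
    return xs, ys


if __name__ == "__main__":
    x, y = make_synthetic_counts(2000, seed=1)
    x_test, y_test = make_synthetic_counts(300, seed=2)
    curve = bootstrap_learning_curve(x, y, x_test, y_test,
                                     sizes=[25, 100, 400], n_boot=10, bandwidth=0.1)
    for n, mae in curve.items():
        print(n, round(mae, 3))
```

A fuller comparison along the lines of the paper would fit a negative binomial model on the same resamples and plot both error curves against training size; the bandwidth would normally be chosen by cross-validation rather than fixed.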
