Model-Based Versus Data-Driven Approach for Road Safety Analysis: Do More Data Help?

Abstract

Crash data for road safety analysis and modeling are growing steadily in size and completeness because of recent advances in information technology. This increased availability of large data sets has generated resurgent interest in applying data-driven nonparametric approaches as an alternative to traditional parametric models for crash risk prediction. This paper investigates how the relative performance of these two approaches changes as crash data grow. One popular technique from each approach is compared: negative binomial models for the parametric approach and kernel regression for the nonparametric counterpart. Two large crash data sets are used to investigate the performance of these two methods as a function of the amount of training data. A rigorous bootstrapping validation process shows that the two approaches exhibit strikingly different patterns, especially in sensitivity to data size. The kernel regression method outperforms the model-based approach (that is, negative binomial) in predictive performance, and that performance advantage increases noticeably as the data available for calibration grow. With the arrival of the big data era and the added benefits of enabling automated road safety analysis and improved responsiveness to current safety issues, nonparametric techniques (especially modern machine learning approaches) can be counted as an important tool in road safety studies.
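
As a rough illustration of the nonparametric side of this comparison, the sketch below implements one common form of kernel regression (the Nadaraya-Watson estimator) and tracks its held-out error as the training sample grows, using bootstrap resamples. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic Poisson data, the single covariate, the Gaussian kernel, the fixed bandwidth, the mean absolute error metric, and the function names `nw_predict` and `learning_curve`.

```python
import numpy as np


def nw_predict(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel (single covariate)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Scaled distances between every query point and every training point.
    u = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)
    # Guard against division by zero when all weights underflow.
    denom = np.maximum(weights.sum(axis=1), np.finfo(float).tiny)
    # Prediction is the kernel-weighted average of observed counts.
    return (weights @ y_train) / denom


def learning_curve(sizes, n_boot=20, bandwidth=0.1, seed=0):
    """Mean held-out absolute error for each training size (synthetic data)."""
    rng = np.random.default_rng(seed)
    # Hypothetical data: Poisson counts with a nonlinear mean in one covariate.
    x = rng.uniform(0.0, 1.0, 5000)
    y = rng.poisson(np.exp(0.5 + np.sin(2.0 * np.pi * x)))
    x_test, y_test = x[:1000], y[:1000]
    x_pool, y_pool = x[1000:], y[1000:]
    results = {}
    for n in sizes:
        errors = []
        for _ in range(n_boot):
            # Bootstrap: draw n training records with replacement.
            idx = rng.integers(0, len(x_pool), size=n)
            pred = nw_predict(x_pool[idx], y_pool[idx], x_test, bandwidth)
            errors.append(np.mean(np.abs(pred - y_test)))
        results[n] = float(np.mean(errors))
    return results


if __name__ == "__main__":
    for n, mae in learning_curve([50, 200, 1000, 4000]).items():
        print(f"n={n:5d}  mean absolute error={mae:.3f}")
```

A parametric counterpart (for example, a negative binomial regression fitted with a statistics package) could be run through the same loop to compare how the two error curves respond to training size. In practice the bandwidth would normally be chosen from the data, for instance by cross-validation, rather than fixed as it is here.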
