Abstract
Every year, crashes in rural towns take lives and alter the dynamics of entire communities. This research examines a broad range of factors that influence whether a crash results in severe or fatal injuries, specifically in and around the region of Texas known as the “Piney Woods.” In the past 10 years, the Piney Woods has accounted for over half of the rural population crashes in the state of Texas, which motivates a closer look at the factors behind severe and fatal crashes in this region. To better understand these contributing factors, we extracted data from the Texas Department of Transportation Crash Record Information System database and split them into a training data set and a test data set. Five machine learning techniques, namely binary logistic regression, k-nearest neighbors, naïve Bayes, random forest, and an artificial neural network, were then trained on the training data and evaluated on the unseen test data. The random forest model produced the most promising results, predicting nonsevere crashes with 99.5% accuracy. The results of this research give engineers and industry practitioners a greater understanding of the factors that influence crashes, particularly severe crashes, within the Piney Woods. The random forest model could be used with readily available input parameters to identify roadways and intersections that might yield severe crashes in the future.
