Abstract
Safety is a critical factor in evaluating autonomous vehicles (AVs), and real-world crash data provide valuable insights for assessing AV safety performance. While structured AV crash data have been widely used to analyze general crash patterns, unstructured crash narratives contain rich contextual information that remains underutilized. These narratives offer in-depth descriptions of crash circumstances, making them essential for understanding the causes of AV crashes. However, extracting meaningful insights from these narratives is challenging because of data scarcity and class imbalance in cause classification. This study therefore uses an improved bidirectional encoder representations from transformers (BERT) model to classify sentences related to crash causes and then performs fine-grained cause analysis using latent Dirichlet allocation (LDA), a topic modeling method. Text similarity between cause sentences and topic words is then computed to assign each sentence to a topic. To address data scarcity and class imbalance in cause classification, a mixup data augmentation strategy and focal loss, respectively, are integrated into the BERT model. Experimental results on real California Department of Motor Vehicles crash reports show a significant improvement in cause sentence classification performance compared with baseline methods. Specifically, accuracy, precision, recall, F1-score, and area under the curve increased by approximately 4.95%, 8.39%, 20.25%, 14.32%, and 10.16%, respectively. The topics of the cause sentences are summarized into three groups: operational scene, location, and driving status in AV crashes. The results indicate that crashes are most common in operational scenes such as “traffic yielding,” “waiting to turn,” and “pedestrian yielding”. For location-related factors, crashes frequently occur at “intersections” and “stop signs”. Notably, within the driving status category, “manual operation” is the most critical factor.
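The two remedies named in the abstract, focal loss for class imbalance and mixup for data scarcity, can be illustrated with a minimal sketch. It uses the standard textbook formulations of both techniques, not necessarily the exact variant used in the study. The function names, the default values `gamma=2.0` and `alpha=0.25`, the Beta parameter `0.2`, and the choice to mix plain feature vectors (rather than BERT embeddings or hidden states) are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single example (standard formulation).

    p is the predicted probability of the positive class and y is the
    true label (0 or 1). gamma and alpha are illustrative defaults, not
    values taken from the study.
    """
    eps = 1e-12
    # Probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Class-balancing weight for the true class.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0 - eps)
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified
    # examples, so hard (often minority-class) examples dominate training.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)


def mixup(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two examples and their one-hot labels.

    x_a and x_b are feature vectors of equal length (assumed here to
    stand in for sentence representations); y_a and y_b are one-hot
    label vectors.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y
```

With `gamma=0` the focal term disappears and the loss reduces to class-weighted cross-entropy, which is a convenient sanity check. In a BERT setting, mixup is commonly applied to sentence embeddings or hidden states rather than raw tokens, since token IDs cannot be interpolated directly.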
