Abstract
Accurate near-term prediction of passenger train delays is critical for effective railway management and for giving passengers reliable arrival times. In this work, a novel bi-level random forest approach is proposed to predict passenger train delays in the Netherlands. The primary level predicts whether a train's delay will increase, decrease, or remain unchanged within a specified time frame. The secondary level then estimates the actual delay (in minutes), given the delay category predicted at the primary level. For validation, the proposed model is compared with several alternative statistical and machine-learning approaches. The results show that the proposed model achieves the highest prediction accuracy among the approaches compared. Moreover, the proposed bi-level model is computationally cheap to construct, which makes it easy to apply in practice.
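To make the two-level structure concrete, the following is a minimal sketch using scikit-learn. It is not the authors' implementation. Several details are assumptions made only for illustration: training one regressor per delay category, the 0.5-minute tolerance that defines "unchanged", the hypothetical names `BiLevelForest`, `categorize`, and `make_synthetic_data`, and the synthetic data itself, which stands in for real railway records.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Category indices (assumed encoding): 0 = decrease, 1 = unchanged, 2 = increase.


def make_synthetic_data(n_samples=600, seed=0):
    """Generate hypothetical features and delays in minutes (not real data)."""
    rng = np.random.default_rng(seed)
    current_delay = rng.uniform(0.0, 10.0, size=n_samples)
    other = rng.normal(size=(n_samples, 3))
    # Hypothetical rule: the delay changes only when the first extra feature is large.
    change = np.where(np.abs(other[:, 0]) < 0.5, 0.0, 2.0 * other[:, 0])
    X = np.column_stack([current_delay, other])
    return X, current_delay, current_delay + change


def categorize(current_delay, future_delay, tolerance=0.5):
    """Map the change in delay to a category index (tolerance is an assumption)."""
    change = future_delay - current_delay
    return np.where(change > tolerance, 2, np.where(change < -tolerance, 0, 1))


class BiLevelForest:
    """Primary level: classify the delay category. Secondary level: regress the delay."""

    def __init__(self, n_estimators=25, random_state=0):
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.classifier = RandomForestClassifier(
            n_estimators=n_estimators, random_state=random_state
        )
        self.regressors = {}

    def fit(self, X, current_delay, future_delay):
        labels = categorize(current_delay, future_delay)
        self.classifier.fit(X, labels)
        # Assumed design: one regressor per category, trained on that category's rows.
        for category in np.unique(labels):
            mask = labels == category
            regressor = RandomForestRegressor(
                n_estimators=self.n_estimators, random_state=self.random_state
            )
            regressor.fit(X[mask], future_delay[mask])
            self.regressors[int(category)] = regressor
        return self

    def predict(self, X):
        # Route each row to the regressor that matches its predicted category.
        predicted = self.classifier.predict(X)
        estimate = np.zeros(len(X))
        for category, regressor in self.regressors.items():
            mask = predicted == category
            if mask.any():
                estimate[mask] = regressor.predict(X[mask])
        return predicted, estimate


X, current, future = make_synthetic_data()
model = BiLevelForest().fit(X, current, future)
categories, delays = model.predict(X[:5])
print(categories, delays)
```

In this sketch the secondary level sees only the category predicted by the primary level, so a misclassification at the primary level carries over to the delay estimate. Whether the original model conditions on the category in this way, or instead passes it as an input feature to a single regressor, is not stated in the abstract.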
