Investigating the Use of Machine Learning Methods in Direct Ridership Models for Bus Transit

Abstract

This paper develops and tests 13 direct ridership models (DRMs) for transit sketch planning in the Dallas–Fort Worth region. We explore both machine learning modeling approaches (e.g., ridge regression and random forest) and traditional statistical models (e.g., linear regression and multiplicative regression). We describe in detail the modeling workflows and the preprocessing of input data, including General Transit Feed Specification (GTFS), employment, sociodemographic, and ridership data. We also describe metrics to compare model performance; in our experiments, the ridge regression framework using a Yeo-Johnson power transformation led to the most accurate predictions, with an $R^{2}$ of 0.88. The sensitivity of the DRM to errors in the service-related predictor variables is within acceptable limits: the root mean squared error (RMSE) increases by less than 20% for a 25% error in any one of the input predictors. Our findings suggest that DRMs can be a powerful complement to the four-step planning process, providing an alternative that is easier to maintain and run and that may yield more accurate ridership estimates given the limitations of transit modeling in traditional regional models. To illustrate the benefits of DRMs, we describe the deployment of trained models in a web-based framework that allows practitioners to obtain ridership estimates by drawing prospective routes on a map and providing a small number of service attributes as input.
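As an illustration of two ideas named in the abstract, the following minimal sketch implements the Yeo-Johnson power transformation (with its inverse) and a simple check of how RMSE changes when a single predictor carries a relative error. It is not the authors' implementation: the value of `LAMBDA`, the function `toy_predict`, its coefficients, the predictor name `trips_per_day`, and all data values are hypothetical and chosen only to make the example runnable. In practice the transformation parameter is typically estimated from the data rather than fixed by hand.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson power transform of a single value for a given lambda."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def inverse_yeo_johnson(t, lam):
    """Map a transformed value back to the original scale."""
    if t >= 0:
        if abs(lam) < 1e-12:
            return math.expm1(t)
        return (lam * t + 1.0) ** (1.0 / lam) - 1.0
    if abs(lam - 2.0) < 1e-12:
        return -math.expm1(-t)
    return 1.0 - (1.0 - (2.0 - lam) * t) ** (1.0 / (2.0 - lam))


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_change_under_input_error(predict, rows, observed, feature, rel_error=0.25):
    """Relative RMSE change when one predictor is scaled by (1 + rel_error)."""
    base = rmse(observed, [predict(r) for r in rows])
    perturbed_rows = [{**r, feature: r[feature] * (1.0 + rel_error)} for r in rows]
    perturbed = rmse(observed, [predict(r) for r in perturbed_rows])
    return (perturbed - base) / base


# Assumed value for illustration only; not taken from the paper.
LAMBDA = 0.3


def toy_predict(row):
    """Hypothetical model: linear on the transformed scale, then inverted.

    The intercept and slope are made up; they are not fitted coefficients.
    """
    t = 1.0 + 0.08 * row["trips_per_day"]
    return inverse_yeo_johnson(t, LAMBDA)


if __name__ == "__main__":
    # Made-up stop-level data: service frequency and observed boardings.
    rows = [{"trips_per_day": 20.0}, {"trips_per_day": 45.0}, {"trips_per_day": 80.0}]
    observed = [8.0, 15.0, 55.0]
    change = rmse_change_under_input_error(toy_predict, rows, observed, "trips_per_day")
    print(f"Relative RMSE change for a 25% input error: {change:.1%}")
```

Because predictions are produced on the transformed scale and then mapped back with `inverse_yeo_johnson`, an error in a predictor propagates nonlinearly to the ridership estimate. This is one reason a sensitivity check of this kind is informative for a model fitted after a power transformation.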

Keywords

data and data science, machine learning (artificial intelligence), planning and analysis, demand estimation, ridership estimation, modeling, decision tools

