Abstract
New stations (such as metro stations) bring remarkable changes to local transportation and economic development. Understanding which factors strongly influence public transit ridership in the areas surrounding new stations is essential to their construction planning, for example, to estimating their likely ridership. In this study, built environment variables with high importance magnitude, which were considered applicable to estimating public transit ridership in other areas of the same category, are defined as transferable variables (TVs). A transferability analysis method of the built environment for ridership estimation was constructed by adopting partial least squares regression (PLSR) based on available data. Taking Wuhan, China as an example, and using metro and taxi data from one week each in January, April, and June, this study analyzed how the importance and transferability magnitude of the built environment variables for metro and taxi ridership changed and differed across categories of pedestrian catchment areas (PCAs) of metro stations. The performance of ridership estimation based on TVs was compared with that based on all the built environment variables. The main findings were as follows. (1) Most of the land use variables (about 85%) showed an important influence on metro and taxi ridership, while only about 18% of the other variables did. The importance magnitude of the built environment variables was mainly related to PCA categories and public transportation modes, and less related to time. (2) Highly important built environment variables also tended to be highly transferable. The transferability magnitude of the built environment variables for ridership was related to PCA categories and public transport modes. (3) Compared with using all the built environment variables, using TVs improved the relative accuracy of metro and taxi ridership estimation by around 20% and 18%, respectively.
Keywords
Highlights
Exploring the public transit ridership (metro and taxi) in different categories of pedestrian catchment areas of metro stations in Wuhan.
Understanding the importance and transferability magnitude of the built environment variables for the metro and taxi ridership estimation.
Comparing the ridership estimation results based on transferable variables and all the built environment variables.
The findings could be helpful in the decision-making process for constructing new metro stations.
Introduction
The built environment in the pedestrian catchment areas (PCAs) of metro, train, or bus stations plays an important role in attracting human activities (Yang et al., 2016), which in turn significantly influence the public transit ridership within these areas (Chakour and Eluru, 2016). Estimating public transit ridership is a critical task in planning new stations, whose construction is expensive and difficult to abandon or replace. New stations bring significant changes to the traffic situation and the development of the surrounding areas. However, no ridership data exist for such new areas. Therefore, based on the accessible data, this study attempts to use the built environment variables in the PCAs of existing stations to estimate the ridership in those of new stations. This study is significant for the scientific planning and operation of urban public transportation systems.
Many models have been proposed to determine the relationship between built environment variables and public transit ridership and to estimate ridership. (1) The four-step model, which considers trip generation, distribution, mode choice, and route assignment for modeling and estimating ridership, has been used by scholars worldwide (McNally, 2007). However, Marshall and Grady (2006) pointed out that such a model raises several potential problems, such as model accuracy, sensitivity to land use, institutional barriers, and cost of use. (2) The activity-based model assumes that travel is derived from the demand for personal activities (Shiftan and Suhrbier, 2002). Although this model can estimate ridership by generating time- and mode-specific trip matrices, it is expensive to implement and maintain (Bowman and Ben-Akiva, 2001). Zhao et al. (2014b) pointed out that transit providers usually cannot participate in the modeling and estimation, which restricts quick responses to the results. (3) Direct ridership models (DRMs), which overcome the disadvantages of the four-step and activity-based models, have recently gained significant attention (Kepaptsoglou et al., 2017). DRMs based on regression analysis are complementary approaches that estimate ridership as a function of the built environment and transit service features around stations (Cervero, 2006; Chu, 2004; Gutiérrez et al., 2011; Kuby et al., 2004). Compared with the above two models, DRMs respond faster, cost less, and are simpler to apply, validate, and develop (Cardozo et al., 2012; Walters and Cervero, 2003). Additionally, DRM results are easier to analyze and explain, which may enable a more precise understanding of the impact of built environment variables on public transit ridership (Gutiérrez et al., 2011). In DRMs, ridership and the built environment are generally taken as the dependent and independent variables, respectively.
The impact of the built environment variables on ridership at different times (weekdays, weekends, peak, and non-peak periods) is determined through a particular model and measured by the corresponding variable coefficients. The magnitudes of the coefficients reflect the impact magnitude of the independent variables on the dependent variables, and their signs indicate whether the correlation between the two is positive or negative.
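As a minimal illustration of how a DRM coefficient is read, the sketch below fits ordinary least squares to synthetic station-level ridership with two hypothetical built environment variables (a POI count with a positive effect and a distance to the city center with a negative effect). The function name `fit_drm_ols`, the data, and the true coefficients are invented for illustration and are not taken from this study.

```python
import numpy as np

def fit_drm_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; return (b0, b)."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

rng = np.random.default_rng(0)
n = 200
poi = rng.uniform(0, 100, n)          # hypothetical POI count in each catchment area
dist_km = rng.uniform(0, 20, n)       # hypothetical distance to the city center (km)
X = np.column_stack([poi, dist_km])
y = 500 + 12.0 * poi - 30.0 * dist_km + rng.normal(0, 10, n)  # synthetic daily ridership

b0, b = fit_drm_ols(X, y)
for name, coef in zip(["poi", "dist_km"], b):
    sign = "positive" if coef > 0 else "negative"
    print(f"{name}: coefficient {coef:.2f} ({sign} association)")
```

The signs recover the direction of each association; the magnitudes, however, depend on the units in which each variable is measured.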
Using the accessible data, this study took Wuhan, China as an example to understand in more detail how the built environment affects the travel patterns of urban residents. In this study, built environment variables with high importance magnitude, which were considered suitable for ridership estimation in other areas of the same category, were defined as transferable variables (TVs). Transferability magnitude indicated the length of time during which built environment variables were transferable, that is, maintained high importance magnitude for the transit ridership variables: the longer a variable remained transferable, the higher its transferability magnitude. Partial least squares regression (PLSR) was adopted to explore the importance magnitude of each built environment variable, represented by its variable importance in projection (VIP) value, for metro and taxi ridership in different categories of PCAs of metro stations. This study further analyzed how transferability magnitude changed with time and evaluated estimation performance with TVs. It helps optimize the regional industrial layout and traffic planning to meet the changing travel needs of urban residents, and provides a scientific basis, grounded in available data, for constructing and developing metro stations of the same category.
The remainder of this paper is organized as follows. Section “Literature review” reviews the existing literature. Section “Study area and data” describes the study area and dataset. Section “Methods” describes the study process. Section “Result and discussion” analyzes the importance and transferability magnitude of the built environment for metro and taxi ridership and evaluates ridership estimation. Finally, section “Conclusion” summarizes the study and discusses its limitations and future scope.
Literature review
Modeling and estimating public transit ridership is essential for analyzing the viability of station projects and the development of urban areas (Zhao et al., 2013). There are three main categories of DRMs: traditional, spatial (Guo and Huang, 2020), and machine learning models. In the traditional category, ordinary least squares (OLS) regression is considered the basic approach (An et al., 2019; Kim et al., 2016; Sohn and Shim, 2010; Zhao et al., 2013). Other models in this category include Poisson regression (Chu, 2004), negative binomial regression (Thompson et al., 2012), stepwise regression (Currie et al., 2011; Li et al., 2020), and PLSR (Chen et al., 2022; Zhao et al., 2014a). Furthermore, some studies have estimated ridership with multiplicative models (Choi et al., 2012; Zhao et al., 2014b). These models are estimated as linear models after a logarithmic transformation and assume that ridership is related to the product of the explanatory factors (Kepaptsoglou et al., 2017).
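The logarithmic transformation behind multiplicative DRMs can be written out as follows. The notation is generic and introduced here for illustration (ridership $Y_i$ of station $i$, explanatory factors $X_{ik}$, parameters $a$ and $\beta_k$, error $\varepsilon_i$), not the notation of any cited study:

```latex
Y_i = a \prod_{k=1}^{K} X_{ik}^{\beta_k} \, e^{\varepsilon_i}
\quad\Longrightarrow\quad
\ln Y_i = \ln a + \sum_{k=1}^{K} \beta_k \ln X_{ik} + \varepsilon_i
```

Each $\beta_k$ is then an elasticity: a 1% increase in $X_{ik}$ is associated with approximately a $\beta_k$% change in $Y_i$. The form requires $Y_i$ and every $X_{ik}$ to be strictly positive.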
Many scholars adopt spatial models to take spatial effects into consideration. Geographically weighted regression (GWR) has been applied in empirical studies of many economically developed, densely populated areas, such as Sydney (Blainey and Mulley, 2013), New York City (Qian and Ukkusuri, 2015), Seoul (Sung et al., 2014), and Beijing (Zhu et al., 2019). To overcome the limitations of GWR, improved versions such as multi-scale geographically weighted regression (MGWR) (Fotheringham et al., 2017) and geographically and temporally weighted regression (GTWR) (Fotheringham et al., 2015) have been proposed and adopted. For example, Lyu et al. (2020) explored the multi-scale spatial relationship between public bicycle ridership and the built environment in Nanjing, and Ma et al. (2018) applied GTWR to investigate the spatiotemporal influence of the built environment on hourly public transit ridership in Beijing. Other spatial models, such as distance-decay weighted regression (Gutiérrez et al., 2011) and network kriging regression (Zhang and Wang, 2014), have also been implemented.
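The core idea of GWR, fitting a separate weighted least squares regression at every location with weights that decay with distance, can be sketched in NumPy as below. This is a minimal sketch under stated assumptions: a fixed Gaussian kernel, a hand-picked bandwidth (real GWR software selects it, for example by AICc), and synthetic data whose slope grows from west to east; the function name `gwr_local_coefficients` is hypothetical.

```python
import numpy as np

def gwr_local_coefficients(coords, X, y, bandwidth):
    """Return an (n, p+1) array of local coefficients, one row per observation.

    Uses a fixed Gaussian kernel; bandwidth selection is omitted.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)  # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights
        XtW = design.T * w                               # X^T W without forming diag(w)
        betas[i] = np.linalg.solve(XtW @ design, XtW @ y)
    return betas

rng = np.random.default_rng(1)
n = 300
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.uniform(0, 1, n)
slope = 1.0 + coords[:, 0]            # effect of x grows from west (1) to east (11)
y = 2.0 + slope * x + rng.normal(0, 0.05, n)

betas = gwr_local_coefficients(coords, x[:, None], y, bandwidth=1.0)
west, east = coords[:, 0] < 2, coords[:, 0] > 8
print(betas[west, 1].mean(), betas[east, 1].mean())  # local slopes differ by region
```

A single global OLS fit would return one average slope; the local coefficients instead reveal that the effect of `x` varies across space.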
Although linear and log-linear regression methods have been the most prevalent models in this research area, some scholars have begun to argue that the assumption of a linear or log-linear relationship between the built environment and ridership should be abandoned (Ding et al., 2019; Gan et al., 2020). Consequently, machine learning models have recently gained popularity (Yan et al., 2020). Ding et al. (2019) employed gradient boosting decision trees to investigate the non-linear influence of the built environment on average weekday passenger boardings at metro stations. Gan et al. (2020) adopted a gradient boosting regression model to explore the relationship between the built environment and metro ridership at the station-to-station level and compared it with a traditional multiplicative model.
The variation in public transit ridership in station PCAs is associated with station characteristics and the surrounding environment (Chen et al., 2019; Gutiérrez et al., 2011; Li et al., 2019, 2020). Station attributes mainly comprise the distance to the city center (Ding et al., 2019) and connection characteristics within the transit network, such as whether a station is a terminal or transfer station and its closeness and betweenness centrality (Cardozo et al., 2012; Sohn and Shim, 2010). Land use in the surrounding environment reflects different human activities and hence generates different levels of public transit ridership (Chakraborty and Mishra, 2013). The proportions of residential, commercial, business, government, and industrial area or floor area are the most frequently chosen land use variables (Tu et al., 2018), along with land use mix (Ding et al., 2019; Lee et al., 2013). Points of interest (POIs) can reveal more detailed land use characteristics, so their impact on transit ridership has received increasing attention (An et al., 2019; Chen et al., 2019; Li et al., 2019). Demographic and socio-economic information is used to understand residents' daily travel: population density represents people's activity demand, and socio-economic factors such as employment and income are also associated with city-wide travel (Jun et al., 2015; Qian and Ukkusuri, 2015; Tu et al., 2018). External road connectivity characteristics, such as road length, intersection density, and road density (Jun et al., 2015), and intermodal connection factors, such as the number of bus stops and bus lines (Ding et al., 2019; Lee et al., 2013; Zhao et al., 2013), are used to reflect station accessibility.
Several research gaps in modeling and estimating public transit ridership remain: (1) How does the importance magnitude of the built environment variables for public transit ridership, such as metro and taxi ridership, change? (2) Which built environment variables are transferable? Judging transferability magnitude involves determining the key factors influencing ridership. (3) What is the effect of using TVs to estimate ridership? Quantifying the improvement in accuracy will help clarify the role played by TVs and thereby provide new perspectives on ridership estimation.
To fill these gaps, this study made three improvements. First, stations were divided into categories according to their surrounding land use, instead of treating all of them as one category. Stations lie in various functional areas of the city, so grouping them into one category may mask differences in how the same built environment variable influences transit ridership at different locations. Second, unlike most studies that focus on one transportation mode, this study examined both metro and taxi ridership, allowing a more comprehensive analysis of the relationship between the built environment and transit ridership. Finally, the importance magnitude of the built environment variables was analyzed. Previous studies usually used model coefficients to reflect the relationship between built environment and ridership variables. However, raw coefficients cannot accurately reveal the relative importance of the built environment variables, because the variables are measured in different units. In this study, PLSR was selected to explore the impact of the built environment variables on transit ridership because of its relatively good handling of multicollinearity, model interpretability, and computational cost (Chen et al., 2022). PLSR remains effective when the number of samples is similar to or smaller than the number of variables (Qiao et al., 2018), which suits this study because classifying the stations leaves relatively few samples in each category. Moreover, PLSR provides the VIP value to reflect variable importance magnitude, and a commonly used VIP threshold of 1 distinguishes variables with high importance magnitude (important) from those with low importance magnitude (unimportant) (Ong et al., 2021).
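The unit problem with raw coefficients can be shown in a few lines. In the sketch below (synthetic data; the variables `road_m` and `stops` and the function names are hypothetical), re-expressing road length in kilometres instead of metres multiplies its OLS coefficient by 1000 without any change in the data, whereas coefficients fitted on z-scored variables are identical in both unit systems.

```python
import numpy as np

def ols_slopes(X, y):
    """OLS slopes (intercept dropped) of y on the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0][1:]

def standardized_slopes(X, y):
    """OLS slopes after z-scoring X and y, i.e. unit-free coefficients."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    return ols_slopes(Xz, yz)

rng = np.random.default_rng(2)
n = 100
road_m = rng.uniform(1000, 5000, n)           # hypothetical road length, metres
stops = rng.integers(0, 20, n).astype(float)  # hypothetical bus stop count
y = 0.05 * road_m + 20.0 * stops + rng.normal(0, 20, n)

X_metres = np.column_stack([road_m, stops])
X_km = np.column_stack([road_m / 1000.0, stops])  # identical data, road length in km

raw_m, raw_km = ols_slopes(X_metres, y), ols_slopes(X_km, y)
std_m, std_km = standardized_slopes(X_metres, y), standardized_slopes(X_km, y)
print("raw slopes (m): ", raw_m)
print("raw slopes (km):", raw_km)         # road-length slope is 1000 times larger
print("standardized:   ", std_m, std_km)  # identical in both unit systems
```

Importance measures computed on standardized predictors, such as VIP values from a PLSR fitted on scaled variables, share this unit independence.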
Study area and data
Study area
Wuhan is the capital city of Hubei Province and the transportation hub of central China. In 2017, the Wuhan metro system carried a total of 927 million passengers. The average daily ridership reached 2.54 million, which accounted for 23.5% of the city’s public transportation ridership. By 2020, Wuhan had 10 metro lines (Lines 1–8, Line 11, and Line Yangluo) with a total operating length of 360 km, ranking first in central China. The distribution of the Wuhan metro system is illustrated in Figure 1. The basis for classifying metro stations into different categories (work, residential, etc.) (Figure 1) is discussed in section “Categories of PCAs”. Most metro stations are located within the Third Ring Road of Wuhan, which bounds the main urban zone. To support the construction of key development regions and alleviate urban traffic congestion, Wuhan is currently expanding its metro system vigorously, which increases the demand for scientific metro planning.

The Wuhan metro system.
Data description
Land planning, metro smart card data (SCD), taxi GPS trajectory, POI, road, population distribution, and other basic city data were used in this study. Land planning data, obtained from the city-wide detailed regulatory plan, were used to classify PCAs; this statutory plan provides specific guidance for urban construction in Wuhan. The 2020 land planning data of Wuhan list 27 types of functional land, such as education and scientific research land. Metro SCD and taxi GPS trajectory data were used to calculate the flows of residents entering and leaving the PCAs by metro or taxi, which served as the public transit ridership variables. The metro SCD recorded metro trips during three weeks: 15–21 January, 16–22 April, and 4–10 June 2018. The taxi GPS trajectory data recorded taxi operations during the same periods. Road, POI, population distribution, and other basic city data, such as bus stations, were used to describe the built environment variables in PCAs. The 2018 POIs used in this study were collected by a web crawler through the AMap APIs.
Methods
The research flow is illustrated in Figure 2. First, data were preprocessed to obtain the metro and taxi ridership and built environment variables. Second, PCAs of metro stations were divided into categories based on their main land planning types. Third, PLSR was adopted to model the relationship between the built environment and public transit ridership (metro and taxi) in each category of PCAs. Finally, the changes in the importance and transferability magnitude of the built environment variables for ridership were analyzed, after which the ridership estimates obtained with TVs and with all the built environment variables were compared.

Research flow.
Categories of PCAs
Selecting the PCA range is essential when studying how the surrounding environment of metro stations affects public transit ridership. The radius of PCAs is generally 300–900 m (Gan et al., 2020; Gutiérrez et al., 2011; Jun et al., 2015; Li et al., 2020). This study took approximately 10 min of walking as the threshold and therefore chose 600 m as the PCA radius. However, neighboring PCAs overlapped when the straight-line distance between their metro stations was less than 1200 m (twice the radius). Therefore, Thiessen polygons of the stations were intersected with the original circular PCAs to eliminate the overlap.
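Intersecting a 600 m buffer with a station's Thiessen polygon means that a location belongs to a station's PCA exactly when it lies within 600 m of that station and that station is its nearest one. The point-based sketch below applies that rule to hypothetical coordinates in metres; a GIS workflow would intersect the polygons directly, and the function name `assign_to_pca` is hypothetical.

```python
import numpy as np

PCA_RADIUS_M = 600.0

def assign_to_pca(points, stations, radius=PCA_RADIUS_M):
    """Label each point with the index of the PCA it falls in, or -1 if none.

    A point belongs to station i's PCA when it lies within `radius` of i
    (the circular buffer) and i is its nearest station (i's Thiessen polygon).
    """
    d = np.linalg.norm(points[:, None, :] - stations[None, :, :], axis=2)  # (n_points, n_stations)
    nearest = d.argmin(axis=1)
    within = d[np.arange(len(points)), nearest] <= radius
    return np.where(within, nearest, -1)

stations = np.array([[0.0, 0.0], [800.0, 0.0]])  # 800 m apart, so the 600 m buffers overlap
points = np.array([
    [350.0, 0.0],   # inside both buffers, nearer station 0
    [450.0, 0.0],   # inside both buffers, nearer station 1
    [0.0, 700.0],   # more than 600 m from every station
])
print(assign_to_pca(points, stations))  # → [ 0  1 -1]
```

Points in the overlap zone are split along the perpendicular bisector between the two stations, which is exactly the Thiessen boundary.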
PCAs were classified according to The Athens Charter, which was developed at the Fourth Congress of the Congrès Internationaux d’Architecture Moderne (CIAM IV) in 1933. It is considered a programmatic document of modern urban planning and an important reference for the planning and development of many cities (Gold, 1998). The Athens Charter identifies four basic urban functions: residence, work, recreation, and transportation, according to which the PCAs in this study were classified (Table 1). For example, PCAs mainly comprising commercial service, administrative office, and industrial land were classified as “work”. Because the other categories contained too few PCAs, this study chose residential and work PCAs as the research objects.
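A classification of this kind can be sketched as assigning each PCA to the function whose land types cover the largest total area. Both the land-type-to-function mapping and the largest-area rule below are assumptions for illustration; they do not reproduce the study's Table 1, and `classify_pca` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical mapping from land-planning types to the four Athens Charter functions
LAND_TO_FUNCTION = {
    "residential": "residence",
    "commercial service": "work",
    "administrative office": "work",
    "industrial": "work",
    "park and green space": "recreation",
    "transport facility": "transportation",
}

def classify_pca(land_areas):
    """Return the function whose land types cover the largest total area of a PCA."""
    totals = defaultdict(float)
    for land_type, area in land_areas.items():
        function = LAND_TO_FUNCTION.get(land_type)
        if function is not None:  # land types outside the mapping are ignored
            totals[function] += area
    return max(totals, key=totals.get) if totals else None

pca = {"residential": 30.0, "commercial service": 20.0, "industrial": 15.0,
       "park and green space": 5.0}  # hypothetical areas in hectares
print(classify_pca(pca))  # → work (20 + 15 = 35 ha vs 30 ha of residence)
```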
Categories of PCAs.
Built environment and public transit ridership variables
Public transit ridership and the built environment variables were set as dependent variables Y (Table 2) and independent variables X, respectively (Table 3).
Public transit ridership variables.
Built environment variables.
Taking Y1 as an example,
These four ridership variables were obtained from the metro SCD and taxi GPS trajectory data. For the metro ridership variables, the SCD records were split by metro station, and the numbers of passengers boarding and alighting at each station in each time period were counted. For the taxi ridership variables of each PCA, the taxi GPS trajectory data were cleaned, the records were split by taxi ID, and the numbers of pick-ups and drop-offs were counted.
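The counting step can be sketched in plain Python. The record layouts are assumptions: metro taps as `(station, hour, kind)` tuples, and taxi GPS points as `(taxi_id, timestamp, occupied, pca, hour)` tuples with a binary occupancy flag. The rule that an empty-to-occupied switch marks a pick-up and an occupied-to-empty switch a drop-off is a common convention for taxi GPS data, assumed here rather than taken from the study.

```python
from collections import Counter
from itertools import groupby

def count_metro(taps):
    """taps: (station, hour, kind) tuples, kind 'in' = boarding, 'out' = alighting."""
    boarding, alighting = Counter(), Counter()
    for station, hour, kind in taps:
        target = boarding if kind == "in" else alighting
        target[(station, hour)] += 1
    return boarding, alighting

def count_taxi(points):
    """points: (taxi_id, timestamp, occupied, pca, hour) tuples.

    Assumed rule: an empty -> occupied switch between consecutive points of the
    same taxi is a pick-up, occupied -> empty is a drop-off; the event is
    attributed to the PCA and hour of the first point after the switch.
    """
    pickups, dropoffs = Counter(), Counter()
    ordered = sorted(points, key=lambda p: (p[0], p[1]))  # by taxi, then time
    for _, track in groupby(ordered, key=lambda p: p[0]):
        previous = None
        for _, _, occupied, pca, hour in track:
            if previous is not None and occupied != previous:
                (pickups if occupied else dropoffs)[(pca, hour)] += 1
            previous = occupied
    return pickups, dropoffs

taps = [("S1", 7, "in"), ("S1", 7, "in"), ("S2", 8, "out")]
gps = [
    ("T1", 0, 0, "A", 8), ("T1", 1, 1, "A", 8),  # pick-up in PCA A during hour 8
    ("T1", 2, 1, "B", 8), ("T1", 3, 0, "B", 9),  # drop-off in PCA B during hour 9
    ("T2", 0, 1, "B", 9), ("T2", 1, 0, "A", 9),  # drop-off in PCA A during hour 9
]
boarding, alighting = count_metro(taps)
pickups, dropoffs = count_taxi(gps)
print(boarding, alighting)
print(pickups, dropoffs)
```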
Table 3 summarized the built environment variables used in this study. Taking station A as an example, X1 represented the complexity of the road network, the formula for which was:
where
X2 was the mean shortest path length of road calculated by:
where
X6, the degree of a metro station, denoted the number of stations directly connected to A. X7, the betweenness centrality of A, measured how many shortest paths between pairs of other stations in the metro system pass through A: the larger X7, the more such shortest paths pass through A. The corresponding formula was as follows:
where s and u denote any two metro stations other than A.
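Degree and betweenness can be computed directly from the station adjacency list. The sketch below assumes an unweighted, undirected metro graph and uses the standard definition of betweenness: for every unordered pair (s, u) of other stations, it adds the fraction of shortest s–u paths that pass through A. The study's exact normalization is not stated here, so the values are left unnormalized. The network and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Hop distances and numbers of shortest paths from `source` in an unweighted graph."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:               # first time w is reached
                dist[w], sigma[w] = dist[v] + 1, 0
                queue.append(w)
            if dist[w] == dist[v] + 1:      # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma

def degree_and_betweenness(adj):
    """Degree (cf. X6) and unnormalized betweenness centrality (cf. X7) of every station."""
    info = {v: bfs_counts(adj, v) for v in adj}
    degree = {v: len(adj[v]) for v in adj}
    betweenness = dict.fromkeys(adj, 0.0)
    for s, u in combinations(adj, 2):       # every unordered pair of stations once
        dist_s, sigma_s = info[s]
        if u not in dist_s:
            continue                        # s and u are not connected
        for a in adj:
            if a in (s, u) or a not in dist_s:
                continue
            dist_a, sigma_a = info[a]
            # a lies on a shortest s-u path iff d(s, a) + d(a, u) == d(s, u)
            if u in dist_a and dist_s[a] + dist_a[u] == dist_s[u]:
                betweenness[a] += sigma_s[a] * sigma_a[u] / sigma_s[u]
    return degree, betweenness

# Hypothetical network: line A-B-C-D with a one-station branch B-E
metro = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D"], "D": ["C"], "E": ["B"]}
degree, betweenness = degree_and_betweenness(metro)
print(degree)       # B has degree 3
print(betweenness)  # B: 5.0, C: 3.0, end stations: 0.0
```

The pair-by-pair loop is cubic in the number of stations, which is adequate for a metro network of a few hundred stations; Brandes' algorithm is the usual choice for larger graphs.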
Regression modeling of the built environment and public transit ridership variables
PLSR is a regression method that combines features of multiple linear regression, canonical correlation analysis, and principal component analysis. It establishes a regression model between the
where X is the independent variable and denotes the
The built environment and metro and taxi ridership variables were modeled by using an R package developed by Mevik and Wehrens (2007). Taking Y1 in the t period of residential PCAs as an example, the formula was as follows:
where
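The study fitted its models with the R package of Mevik and Wehrens (2007). The NumPy sketch below is not that package. It is a compact single-response PLSR using the NIPALS algorithm, included to show the mechanics: extract a weight vector, compute scores, deflate X and y, and repeat. Here predictors are only centered; scaling them to unit variance first (for example, via the `scale` argument of the R pls package) is a common choice. The function name `pls1_nipals` is hypothetical.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit a single-response PLS regression with the NIPALS algorithm.

    Returns (B, x_mean, y_mean, W, T, q); predictions are
    (X_new - x_mean) @ B + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centered, then deflated, data
    W, P, T, q = [], [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # unit-length weight vector
        t = Xr @ w                           # X-scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X-loadings
        c = yr @ t / tt                      # y-loading
        Xr = Xr - np.outer(t, p)             # deflate X
        yr = yr - c * t                      # deflate y
        W.append(w); P.append(p); T.append(t); q.append(c)
    W, P, T, q = np.column_stack(W), np.column_stack(P), np.column_stack(T), np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)      # coefficients for centered X
    return B, x_mean, y_mean, W, T, q

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))                 # hypothetical built environment variables
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.1, 50)
B, x_mean, y_mean, W, T, q = pls1_nipals(X, y, n_components=2)
y_hat = (X - x_mean) @ B + y_mean
print(np.round(B, 3))
```

With fewer components than variables, PLSR regresses on a few latent scores, which is what keeps it stable under multicollinearity; with as many components as (full-rank) variables, it reproduces ordinary least squares.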
The importance magnitude of built environment variables
VIP value was chosen to denote the importance magnitude of the built environment variables. It is an indicator of feature selection in PLSR and can be used to measure the relative importance and contribution of the independent variables to the dependent variables (Mehmood et al., 2012; Mukherjee et al., 2015). The formula for calculating the VIP of variable
where
The threshold of VIP was usually set to 1, and an independent variable with a VIP value >1 was considered important, otherwise it was unimportant (Mehmood et al., 2012; Mukherjee et al., 2015; Suo et al., 2018). Therefore, when the VIP value of a built environment variable was >1, the variable was considered transferable with an important impact on the ridership. A larger VIP value denoted higher importance magnitude.
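The VIP of variable j can be computed from the X-weights W, X-scores T, and y-loadings q of a fitted single-response PLS model, using the standard definition: each component's contribution to explained y-variance is shared among the variables in proportion to their squared normalized weights. The arrays below are random stand-ins for a fitted model, and `vip_scores` is a hypothetical name. Because every normalized weight vector has unit length, the squared VIP values always sum to the number of variables, so their mean is exactly 1; this is the usual rationale for the threshold of 1.

```python
import numpy as np

def vip_scores(W, T, q):
    """Variable importance in projection for a single-response PLS model.

    W: (p, A) X-weights, T: (n, A) X-scores, q: (A,) y-loadings.
    """
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)    # y-variance explained by each component
    w_unit = W / np.linalg.norm(W, axis=0)   # give every weight vector unit length
    return np.sqrt(p * (w_unit ** 2 @ ss) / ss.sum())

# Hypothetical arrays standing in for the output of a fitted PLS model
rng = np.random.default_rng(5)
W = rng.normal(size=(6, 3))
T = rng.normal(size=(40, 3))
q = rng.normal(size=3)

vip = vip_scores(W, T, q)
print(np.round(vip, 2))
print("mean squared VIP:", np.mean(vip ** 2))  # 1 up to floating-point rounding
print("important:", np.flatnonzero(vip > 1))   # indices of variables above the threshold
```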
Public transit ridership estimation model and evaluation
TVs and all the built environment variables were used to estimate the metro and taxi ridership and their results were compared. Considering the estimation of Y1 in the
where
To compare the performance of the two approaches, 5-fold cross-validation was employed for metro and taxi ridership estimation in each category of PCAs. The mean absolute error (MAE) and its percentage form, MAE%, were selected as the evaluation indices. The formula of MAE% is as follows:
where
The calculation of MAE was given as:
where
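The evaluation can be sketched as follows. Since the results describe MAE% as the relative improvement in accuracy from using TVs, the sketch assumes MAE% = 100 × (MAE_A − MAE_T) / MAE_A, where MAE_A and MAE_T are the errors with all variables and with TVs. Ordinary least squares stands in for PLSR to keep the example short, and the data, the TV column indices, and all function names are hypothetical.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between observed and estimated ridership."""
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))

def mae_reduction_pct(mae_all, mae_tv):
    """Assumed form of MAE%: relative MAE reduction when TVs replace all variables."""
    return 100.0 * (mae_all - mae_tv) / mae_all

def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split it into k disjoint, nearly equal test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def ols_fit_predict(X_train, y_train, X_test):
    """Stand-in estimator (OLS with intercept); the study itself uses PLSR."""
    A = np.column_stack([np.ones(len(y_train)), X_train])
    coef = np.linalg.lstsq(A, y_train, rcond=None)[0]
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def cv_mae(X, y, fit_predict, k=5, seed=0):
    """Average test-fold MAE over k-fold cross-validation."""
    errors = []
    for test in kfold_indices(len(y), k, seed):
        train = np.setdiff1d(np.arange(len(y)), test)
        errors.append(mae(y[test], fit_predict(X[train], y[train], X[test])))
    return float(np.mean(errors))

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 6))              # hypothetical: 6 candidate variables
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 1.0, 40)
tv_columns = [0, 1]                       # hypothetical TVs
mae_all = cv_mae(X, y, ols_fit_predict)
mae_tv = cv_mae(X[:, tv_columns], y, ols_fit_predict)
print(mae_all, mae_tv, mae_reduction_pct(mae_all, mae_tv))
```

Using the same seed for both feature sets keeps the folds identical, so the two MAE values are compared on exactly the same train/test splits.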
Result and discussion
Analyzing the importance magnitude of the built environment variables on regression modeling of the metro and taxi ridership
This section discusses the importance magnitude of the built environment variables for metro and taxi ridership from two aspects: temporal variation and ranking.
Regarding temporal variation, the importance magnitude of the built environment variables for metro and taxi ridership changed over time. Figure 3 shows the hourly variation in the VIP values of X12 (financial insurance service POIs) and X14 (corporate enterprise POIs) for the metro and taxi ridership of work PCAs on Mondays in January, April, and June, and Figure 4 shows the same for Saturdays. The gray regions in these figures indicate VIP values <1. For the metro ridership, as shown in Figure 3(a), the average VIP values of X12 and X14 for Y1 (metro boarding) on Mondays were <1 during the morning peak (6:00–9:00) and reached their daily maximum (>1.1) during the evening peak (17:00–19:00). For Y2 (metro alighting), the pattern was reversed: their importance magnitude was highest in the morning peak, especially during 7:00–9:00 (VIP values of X12 >1.2 and X14 >1.1), and dropped below 1 after 17:00. Periods of relatively high VIP values (>1.1) thus coincided with periods of high metro ridership, mirroring the morning and evening peaks. On Saturdays, X12 and X14 followed the same trend but with lower VIP values. Compared with the metro ridership, the VIP values for the taxi ridership differed in two main ways, as shown in Figure 3(b) and Figure 4(b). First, the VIP values of both X12 and X14 were >1 in almost all time periods, indicating that X12 and X14 had a lasting important influence on the taxi ridership. Second, the VIP values showed no morning or evening peak pattern. Although, unlike for the metro ridership, the trend was not very consistent across months, the importance magnitude tended to decrease from morning to evening.

Changes in the VIP values of X12 and X14 for the metro and taxi ridership on Mondays in work PCAs.

Changes in the VIP values of X12 and X14 for the metro and taxi ridership on Saturdays in work PCAs.
Two conclusions can also be drawn from the rankings. Tables 4 and 5 show the daily average importance magnitude of each built environment variable in work PCAs for Y1 and Y2, respectively; the magnitudes for the other cases are given in Appendix A. Because the values are rounded to two decimal places, some variables with different ranks share the same displayed value.
On average, one or two (about 18%) of the variables in the non-land use categories (external connectivity, intermodal connection, population, and station; X1–X8) were important (mean VIP values >1) for metro and taxi ridership. In contrast, eight or nine (about 85%) of the land use variables showed an important influence on ridership. In general, land use variables influenced metro and taxi ridership more strongly than non-land use variables.
Among all the built environment variables in this study, those with the highest importance magnitude for metro and taxi ridership did not depend on time (weeks or months) but did depend on PCA categories and transportation modes. In residential PCAs, the most important variables overall for the metro and taxi ridership were X17 (accommodation services POIs) and X15 (serviced apartment POIs), respectively. Among the non-land use variables, only X8 (distance to the city center) and X5 (population size) had VIP values >1, showing an important influence on the metro and taxi ridership, respectively. In work PCAs, X10 (tourist attraction POIs) and X17 had the greatest impact on the metro ridership, and X18 (scientific and education service POIs) on the taxi ridership. Among the non-land use variables in work PCAs, X5 was the most important factor for both metro and taxi ridership.
Rank of mean VIP values of the built environment variables for Y1 in work PCAs.
Rank of mean VIP values of the built environment variables for Y2 in work PCAs.
Analysis of TVs for the metro and taxi ridership
This section discusses the transferability magnitude of the built environment variables. Built environment variables with VIP values >1 were considered to have an important influence on ridership, so they were regarded as transferable, that is, applicable to estimating ridership in other PCAs of the same category. Variables that maintained an important influence (VIP values >1) for longer periods had higher transferability magnitude.
Tables 6 and 7 show the number of hours during which each built environment variable was transferable in residential and work PCAs, respectively, on weekdays (Monday to Friday; 80 hours) and weekends (Saturday and Sunday; 32 hours). In work PCAs, across time periods and traffic modes, about 85% of the land use variables and almost 19% of the non-land use variables were transferable at least 70% of the time (weekdays and weekends combined); in residential PCAs, the corresponding shares were about 72% and 12%. Hence, most land use variables were highly transferable and served as key factors affecting metro and taxi ridership, whereas most non-land use variables showed low transferability magnitude. As the previous section showed, although the mean VIP values of most non-land use variables were <1, indicating a slight overall influence, these variables had an important impact on ridership in some periods. For example, during 6:00–7:00 on Mondays in residential PCAs, the VIP values of X3 (number of bus stations) and X4 (number of bus lines) for Y1 were >1 (Figure 5).
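Counting transferable hours reduces to thresholding an hourly VIP matrix. Both 80 weekday hours over 5 days and 32 weekend hours over 2 days correspond to 16 hourly slots per day, presumably the service hours, and the sketch assumes that layout. The VIP values are hypothetical and the function names are illustrative.

```python
import numpy as np

def transferable_hours(vip, threshold=1.0):
    """vip: (n_variables, n_hours) hourly VIP values -> hours with VIP > threshold, per variable."""
    return (vip > threshold).sum(axis=1)

def share_highly_transferable(vip, min_fraction=0.7, threshold=1.0):
    """Share of variables that are transferable in at least `min_fraction` of all hours."""
    hours = transferable_hours(vip, threshold)
    return float(np.mean(hours >= min_fraction * vip.shape[1]))

WEEKDAY_HOURS = 5 * 16                  # 80 hourly slots, assuming 16 slots per day
vip = np.full((3, WEEKDAY_HOURS), 0.8)  # hypothetical VIP values for 3 variables
vip[0, :] = 1.2                         # important in every hour
vip[1, :60] = 1.2                       # important in 60 of 80 hours (75%)
vip[2, :40] = 1.2                       # important in 40 of 80 hours (50%)
print(transferable_hours(vip))          # → [80 60 40]
print(share_highly_transferable(vip))   # 2 of the 3 variables meet the 70% criterion
```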
The total hours when the built environment variables were transferable in residential PCAs.
The total hours when the built environment variables were transferable in work PCAs.

Change in the importance magnitude of non-land use variables for Y1 on Mondays in residential PCAs.
Transferability magnitude of the built environment variables was jointly affected by PCA categories and traffic modes. Table 8 compares the total hours during which the built environment variables were transferable for the metro and taxi ridership on weekdays and weekends across the three months. In residential PCAs, the transferable duration for the metro ridership was markedly shorter than that for the taxi ridership: the average difference was about 187 hours on weekdays and 87 hours on weekends, both nearly 28% of the average duration for the metro ridership in the corresponding period. This suggests that, for the taxi ridership, VIP values remained >1 more persistently and stably, whereas the VIP values for the metro ridership were sensitive to the morning and evening peaks, as seen for X12 and X14 in the previous section. In work PCAs, the transferable durations for the metro and taxi ridership were similar, with differences of about 3% of the mean duration for the metro ridership on weekdays and 6% on weekends. Furthermore, the total transferable hours for both metro and taxi ridership were greater in work PCAs than in residential PCAs, especially for the metro ridership, indicating that the built environment variables in work PCAs maintained a key influence on ridership for longer periods.
The total hours when the built environment variables were transferable for the metro and taxi ridership.
Comparative analysis of the estimation results of the metro and taxi ridership in PCAs
The metro and taxi ridership were estimated with TVs and with all the built environment variables, respectively, using 5-fold cross-validation, and the results were compared. This section presents the estimation of the average daily ridership. As an example, Table 9 shows the estimation models of one fold for the average daily metro and taxi ridership on Monday and Saturday in January in residential PCAs, obtained using Equation (9).
Estimation models of the average daily metro and taxi ridership on Monday and Saturday in January in residential PCAs.
Table 10 showed the average performance of the metro and taxi ridership estimations on weekdays and weekends in work and residential PCAs, where the value column represented the actual values, and the MAE_A and MAE_T columns showed the MAE of the results estimated by all the built environment variables and TVs, respectively. Besides, the MAE% column, calculated by Equation (10), revealed the relative improvement in the accuracy of the ridership estimation.
Average performances of the metro and taxi ridership estimation in residential and work PCAs.
The improvement in estimation accuracy depended on transportation modes and PCA categories. Regarding transportation modes, in both residential and work PCAs, the average accuracy improvement from using TVs on weekdays and weekends was greater for the metro ridership than for the taxi ridership. As shown in Table 10, the average improvement for the metro ridership (Y1 and Y2) in January, April, and June was 19.73%, 21.02%, and 21.64%, respectively, while that for the taxi ridership (Y3 and Y4) was 18.33%, 19.83%, and 17.88%. Regarding PCA categories, the improvement for both ridership types was larger in residential PCAs: the average improvements for the metro and taxi ridership were 22.65% and 21.30% in residential PCAs, compared with 18.95% and 16.06% in work PCAs. Moreover, within each PCA category, the MAE% of the metro or taxi ridership estimation was similar across months, suggesting that the accuracy improvement was not strongly related to the month (and hence season) of the data. Overall, the ridership estimation error based on TVs was smaller than that based on all the built environment variables, with average reductions of about 20% and 18% for the metro and taxi ridership, respectively.
Conclusion
This study adopted PLSR to explore the importance magnitude of each built environment variable for the metro and taxi ridership in residential and work PCAs, and how their transferability magnitude changed over time, and then evaluated the estimation performance with TVs. The main conclusions drawn from this study were as follows. First, the importance magnitude of the built environment variables varied over time. Its variation tendencies were mainly affected by PCA category and transportation mode, and were generally similar across seasons. Most land use variables (about 85%) had a key impact on the metro and taxi ridership, while only around 18% of non-land use variables had an important influence. In residential PCAs, the most important built environment variables for the metro and taxi ridership were X17 (accommodation service POIs) and X15 (serviced apartment POIs), respectively. In work PCAs, they were X10 (tourist attraction POIs) and X17 for the metro ridership, and X18 (scientific and education service POIs) for the taxi ridership. Notably, in residential PCAs, X8 (distance to the city center) and X5 (population size) had an important influence on the metro and taxi ridership, respectively, although the overall VIP values of non-land use variables were small. Additionally, among the non-land use variables in work PCAs, X5 was the most important factor for both the metro and taxi ridership.
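The VIP values that express importance magnitude can be computed from a fitted PLSR model with the standard formula VIP_j = sqrt(p · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a), where p is the number of predictors, w_ja is the X-weight of variable j on component a, and SS_a is the Y variance explained by component a. The dependency-free sketch below assumes the weights and SS_a values have already been extracted from a PLSR fit. The helper names and the VIP > 1 cut-off (a common rule of thumb) are assumptions, not necessarily the exact criterion used in this study.

```python
import math


def vip_scores(weights, ss):
    """VIP score for each predictor of a fitted PLS model.

    weights: p rows x A columns of X-weights w_ja (one column per component).
    ss:      A values, SS_a = q_a**2 * (t_a . t_a), the Y sum of squares
             explained by component a.
    """
    p, n_comp = len(weights), len(ss)
    # Euclidean norm of each weight vector w_a (column a).
    norms = [math.sqrt(sum(row[a] ** 2 for row in weights)) for a in range(n_comp)]
    total = sum(ss)
    scores = []
    for row in weights:
        s = sum(ss[a] * (row[a] / norms[a]) ** 2 for a in range(n_comp))
        scores.append(math.sqrt(p * s / total))
    return scores


def select_important(names, scores, threshold=1.0):
    """Hypothetical helper: keep variables whose VIP exceeds the threshold."""
    return [n for n, v in zip(names, scores) if v > threshold]


# Toy weights for three hypothetical predictors and two components.
vip = vip_scores([[1, 0], [0, 1], [1, 1]], [3, 1])
print([round(v, 3) for v in vip])              # [1.061, 0.612, 1.225]
print(select_important(["A", "B", "C"], vip))  # ['A', 'C']
```

Because the squared VIP scores always sum to p, their mean square is 1, which is why 1 is the usual reference value for "above-average" importance.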
Second, for more than half of the time, nearly 80% of land use variables were highly transferable, while around 85% of non-land use variables showed low transferability magnitude. Highly important variables were generally also highly transferable. In residential PCAs, the total number of hours during which the built environment variables were transferable was greater for the taxi ridership than for the metro ridership, whereas the two were similar in work PCAs. Additionally, the built environment variables were transferable for more hours in work PCAs than in residential PCAs for both the metro and taxi ridership, especially for the metro ridership.
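The counts behind this conclusion (the share of hours in which a variable was transferable, and the total number of transferable hours) amount to simple tallies over hourly indicators. The sketch below assumes a hypothetical dict of per-hour booleans, one list per variable, produced by whatever transferability criterion is applied; it does not implement that criterion itself, and the function names are illustrative.

```python
def transferable_share(flags):
    """Fraction of hours in which each variable was flagged as transferable."""
    return {var: sum(hours) / len(hours) for var, hours in flags.items()}


def total_transferable_hours(flags):
    """Total number of variable-hours flagged as transferable."""
    return sum(sum(hours) for hours in flags.values())


def share_mostly_transferable(flags, group, min_share=0.5):
    """Share of variables in `group` transferable in more than `min_share` of hours."""
    shares = transferable_share(flags)
    members = [v for v in group if v in shares]
    return sum(shares[v] > min_share for v in members) / len(members)


# Hypothetical 4-hour indicators, not data from this study.
flags = {
    "X15": [True, True, True, False],
    "X10": [True, False, False, False],
    "X5": [True, True, False, False],
}
print(transferable_share(flags))         # {'X15': 0.75, 'X10': 0.25, 'X5': 0.5}
print(total_transferable_hours(flags))   # 6
print(share_mostly_transferable(flags, ["X15", "X10"]))  # 0.5
```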
Third, compared with using all the built environment variables, estimating with TVs improved accuracy by about 20% for the metro ridership and 18% for the taxi ridership, even though the ridership varied across categories of PCAs.
Based on the analysis of this study, the following policy implications can be drawn:
(1) The VIP variation tendencies of X12 (financial insurance service POIs) and X14 (corporate enterprise POIs) for the metro ridership showed clear morning and evening peak-hour characteristics not only on weekdays but also on weekends. This implied that in such PCAs, the metro management department should also monitor ridership changes during weekend peak hours and implement ridership control measures in time.
(2) Land use variables were the main factors influencing changes in public transit ridership, which suggested that non-land use variables in PCAs, such as bus stops, need not receive much attention.
(3) Predicting public transit ridership based on TVs achieved more accurate results than predicting it based on all variables. To reduce estimation error, planners could use the method proposed in this study to estimate ridership when planning new metro stations.
This study focused on the metro and taxi ridership in Wuhan, China. However, owing to data limitations, low-carbon transportation modes such as shared bikes and electric vehicles were not considered. With the boom of the sharing economy, such modes will significantly influence traffic and economic development. Therefore, it is important to include them in future analyses, which will help to understand the interaction between the built environment and public transit ridership more comprehensively. Additionally, the importance and transferability magnitude of the built environment for public transit ridership should be explored in other cities, where data are available, to verify the validity of these findings.
Appendix A
Rank of mean VIP values of the built environment variables for Y4 in residential PCAs.
| Month | Day | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 | Rank 6 | Rank 7 | Rank 8 | Rank 9 | Rank 10 | Rank 11 | Rank 12 | Rank 13 | Rank 14 | Rank 15 | Rank 16 | Rank 17 | Rank 18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jan | Mon | X15(1.38) | X13(1.27) | X14(1.24) | X16(1.24) | X12(1.23) | X11(1.21) | X9(1.19) | X17(1.17) | X18(1.17) | X5(1.13) | X10(0.90) | X4(0.81) | X8(0.68) | X6(0.68) | X7(0.59) | X1(0.27) | X3(0.25) | X2(0.15) |
| | Tue | X15(1.38) | X13(1.27) | X12(1.23) | X16(1.23) | X14(1.22) | X11(1.22) | X9(1.19) | X18(1.17) | X17(1.16) | X5(1.14) | X10(0.89) | X4(0.83) | X6(0.68) | X8(0.66) | X7(0.62) | X1(0.27) | X3(0.26) | X2(0.13) |
| | Wed | X15(1.35) | X12(1.24) | X13(1.24) | X14(1.24) | X16(1.21) | X11(1.20) | X18(1.17) | X9(1.17) | X17(1.15) | X5(1.11) | X10(0.91) | X4(0.81) | X6(0.72) | X8(0.66) | X7(0.64) | X3(0.31) | X2(0.30) | X1(0.29) |
| | Thu | X15(1.37) | X12(1.27) | X14(1.26) | X13(1.24) | X16(1.22) | X11(1.20) | X17(1.17) | X18(1.17) | X9(1.16) | X5(1.15) | X10(0.88) | X4(0.84) | X8(0.66) | X6(0.66) | X7(0.59) | X3(0.27) | X1(0.27) | X2(0.15) |
| | Fri | X15(1.36) | X13(1.25) | X12(1.24) | X14(1.23) | X16(1.22) | X9(1.18) | X11(1.17) | X18(1.17) | X5(1.16) | X17(1.14) | X10(0.89) | X4(0.84) | X6(0.70) | X8(0.69) | X7(0.63) | X3(0.31) | X1(0.28) | X2(0.21) |
| | Sat | X15(1.34) | X16(1.27) | X12(1.24) | X11(1.22) | X14(1.21) | X17(1.21) | X13(1.21) | X18(1.21) | X9(1.17) | X5(1.12) | X10(0.92) | X4(0.85) | X7(0.65) | X6(0.63) | X8(0.61) | X3(0.27) | X1(0.27) | X2(0.18) |
| | Sun | X15(1.36) | X16(1.26) | X13(1.24) | X11(1.23) | X18(1.21) | X9(1.20) | X17(1.19) | X14(1.18) | X12(1.16) | X5(1.14) | X10(0.91) | X4(0.84) | X7(0.68) | X8(0.62) | X6(0.62) | X3(0.29) | X1(0.27) | X2(0.24) |
| Apr | Mon | X15(1.34) | X12(1.25) | X14(1.24) | X13(1.22) | X16(1.20) | X11(1.18) | X17(1.17) | X18(1.16) | X9(1.16) | X5(1.12) | X10(0.90) | X4(0.84) | X6(0.68) | X8(0.66) | X7(0.62) | X1(0.34) | X3(0.33) | X2(0.30) |
| | Tue | X15(1.35) | X12(1.27) | X14(1.25) | X13(1.23) | X16(1.22) | X11(1.19) | X18(1.18) | X17(1.17) | X9(1.15) | X5(1.13) | X10(0.91) | X4(0.82) | X6(0.67) | X8(0.64) | X7(0.63) | X3(0.32) | X1(0.28) | X2(0.28) |
| | Wed | X15(1.37) | X13(1.25) | X12(1.25) | X16(1.24) | X14(1.23) | X18(1.18) | X9(1.18) | X11(1.17) | X17(1.15) | X5(1.13) | X10(0.93) | X4(0.81) | X6(0.72) | X8(0.66) | X7(0.64) | X1(0.28) | X3(0.27) | X2(0.19) |
| | Thu | X15(1.34) | X12(1.30) | X14(1.27) | X13(1.23) | X17(1.21) | X16(1.21) | X18(1.18) | X11(1.17) | X9(1.13) | X5(1.12) | X10(0.89) | X4(0.84) | X6(0.68) | X7(0.63) | X8(0.63) | X3(0.29) | X2(0.29) | X1(0.28) |
| | Fri | X15(1.34) | X12(1.24) | X13(1.24) | X14(1.22) | X16(1.21) | X18(1.17) | X9(1.16) | X11(1.16) | X17(1.15) | X5(1.13) | X10(0.91) | X4(0.86) | X6(0.70) | X7(0.66) | X8(0.66) | X2(0.32) | X1(0.29) | X3(0.29) |
| | Sat | X15(1.33) | X14(1.26) | X12(1.25) | X11(1.23) | X16(1.22) | X13(1.20) | X18(1.19) | X9(1.16) | X17(1.15) | X5(1.13) | X10(0.90) | X4(0.85) | X6(0.73) | X7(0.65) | X8(0.64) | X3(0.28) | X1(0.28) | X2(0.25) |
| | Sun | X15(1.32) | X16(1.22) | X14(1.21) | X11(1.20) | X13(1.19) | X12(1.19) | X18(1.18) | X17(1.17) | X9(1.16) | X5(1.15) | X10(0.88) | X4(0.87) | X7(0.69) | X8(0.63) | X6(0.61) | X2(0.46) | X3(0.39) | X1(0.32) |
| Jun | Mon | X15(1.38) | X13(1.27) | X16(1.22) | X12(1.22) | X14(1.21) | X18(1.19) | X11(1.18) | X9(1.17) | X17(1.17) | X5(1.15) | X10(0.90) | X4(0.83) | X8(0.69) | X6(0.68) | X7(0.61) | X1(0.30) | X3(0.27) | X2(0.16) |
| | Tue | X15(1.37) | X13(1.26) | X12(1.24) | X16(1.23) | X14(1.23) | X18(1.17) | X11(1.16) | X9(1.16) | X17(1.16) | X5(1.16) | X10(0.90) | X4(0.84) | X6(0.70) | X8(0.66) | X7(0.62) | X1(0.31) | X3(0.29) | X2(0.20) |
| | Wed | X15(1.34) | X14(1.23) | X13(1.22) | X12(1.20) | X16(1.20) | X11(1.19) | X9(1.17) | X18(1.17) | X5(1.15) | X17(1.12) | X10(0.91) | X4(0.85) | X8(0.70) | X6(0.67) | X7(0.64) | X2(0.38) | X3(0.33) | X1(0.32) |
| | Thu | X15(1.35) | X12(1.28) | X14(1.25) | X16(1.24) | X13(1.23) | X18(1.19) | X17(1.18) | X5(1.17) | X11(1.16) | X9(1.14) | X10(0.91) | X4(0.84) | X6(0.68) | X7(0.66) | X8(0.60) | X1(0.30) | X3(0.25) | X2(0.21) |
| | Fri | X15(1.33) | X14(1.21) | X12(1.21) | X13(1.20) | X16(1.18) | X11(1.15) | X18(1.15) | X5(1.15) | X9(1.14) | X17(1.08) | X10(0.89) | X4(0.89) | X6(0.81) | X8(0.79) | X7(0.70) | X2(0.42) | X3(0.33) | X1(0.28) |
| | Sat | X15(1.34) | X16(1.27) | X14(1.23) | X12(1.23) | X11(1.21) | X18(1.20) | X17(1.20) | X13(1.20) | X5(1.16) | X9(1.16) | X10(0.93) | X4(0.82) | X7(0.64) | X6(0.64) | X8(0.56) | X1(0.30) | X3(0.28) | X2(0.27) |
| | Sun | X15(1.34) | X12(1.25) | X16(1.24) | X14(1.22) | X18(1.22) | X17(1.20) | X11(1.20) | X13(1.19) | X5(1.17) | X9(1.13) | X10(0.90) | X4(0.86) | X7(0.65) | X6(0.63) | X8(0.59) | X1(0.28) | X2(0.26) | X3(0.26) |
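To connect this appendix with the shares reported in the conclusions, the sketch below parses one row of the table (January, Monday) and counts how many variables exceed a mean VIP of 1. Two assumptions are labeled in the code: that X9 to X18 form the land use group (only some of them are named in the text) and that VIP > 1 marks an important variable. A single day is only an illustration; the study-wide figures of about 85% and 18% aggregate over all times, modes, and PCA categories.

```python
import re

# Jan/Mon row of the table above (Y4, residential PCAs), copied verbatim.
ROW = ("X15(1.38) X13(1.27) X14(1.24) X16(1.24) X12(1.23) X11(1.21) "
       "X9(1.19) X17(1.17) X18(1.17) X5(1.13) X10(0.90) X4(0.81) "
       "X8(0.68) X6(0.68) X7(0.59) X1(0.27) X3(0.25) X2(0.15)")

# ASSUMPTION: X9-X18 are the land use (POI) variables; X1-X8 are the others.
LAND_USE = {f"X{i}" for i in range(9, 19)}


def parse_row(row):
    """Map each variable name to its mean VIP, e.g. 'X15(1.38)' -> 1.38."""
    pairs = re.findall(r"(X\d+)\((\d+\.\d+)\)", row)
    return {name: float(value) for name, value in pairs}


def important_share(vip, group, threshold=1.0):
    """Share of the variables in `group` whose VIP exceeds `threshold` (assumed cut-off)."""
    members = [v for v in vip if v in group]
    return sum(vip[v] > threshold for v in members) / len(members)


vip = parse_row(ROW)
others = set(vip) - LAND_USE
print(f"{important_share(vip, LAND_USE):.3f}")  # 0.900 (9 of 10)
print(f"{important_share(vip, others):.3f}")    # 0.125 (1 of 8)
```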
Acknowledgements
The authors are grateful to the two anonymous reviewers for their comments.
Authorship contribution statement
Zhixiang Fang: conceptualization, supervision, methodology, writing – review and editing. Lupan Zhang: data curation, methodology, formal analysis, writing – original draft. Meng Zheng: investigation, validation, writing – review and editing.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research was supported in part by the National Natural Science Foundation of China (Grant 41771473).
