Sage Journals: Discover world-class research

Abstract

As is well known, estimating the proportion of passenger route choice is of great significance in almost every aspect of an urban rail transit control system, including passenger allocation, fare clearing and flow control strategies. Existing studies estimate route choice from travel time but usually ignore how it varies across periods of the day. Therefore, this paper proposes a novel method for estimating the proportion of passenger route choice in different periods. Firstly, by introducing the normalized value of passenger flow and the standard coefficient of peak passenger flow, the train operation time is divided into peak and flat periods. Secondly, the travel time distribution of each route is obtained by estimating the expected value and standard deviation of passenger travel time in each period. The Naïve Bayes algorithm is then employed to identify the proportion of passenger route choices. Finally, the proposed algorithm is applied to Hangzhou Metro. The results show that segmented estimation reduces the error by more than 60% compared with the whole-day experiment, which indicates the superiority of the method.

Keywords

Urban rail transit system, Naïve Bayes algorithm, proportion of route choice, traffic control, parameter estimation

Introduction

The networked operation of urban rail transit provides passengers with diversified route choices between origin-destination (OD) pairs. However, the “one ticket transfer” mode only requires passengers to swipe their ticket cards when they enter and exit stations. Although this is convenient for passengers, it makes it difficult to confirm the specific routes they take. Knowing passengers’ travel routes is the basis for calculating the passenger flow of urban rail transit lines and evaluating the operation level of urban rail transit. It also guides the operation management department in improving the operation organization of urban rail transit and formulating operation control strategies reasonably and scientifically.

Scholars have made a range of achievements on passenger route choice. Early studies were mainly based on the multi-route probability distribution method,^1–6 in which the generalized travel cost of each route is obtained by establishing impedance functions including travel time, transfer time and other factors, and the probability of route selection is then calculated by the logit model and its derived forms. Recently, with the emergence and wide application of automatic fare collection (AFC) systems, information such as passengers’ entry stations and entry times and exit stations and exit times can be easily obtained, which provides researchers with new perspectives on passenger route choice estimation. Zhou et al.⁷ analyzed the travel time elements of urban rail transit passengers in detail. They pointed out that the travel time of passengers on each path obeys a normal distribution, and obtained the path selection ratio from AFC data. Hong et al.⁸ proposed an urban rail transit passenger flow distribution model in which clustering was used to group similar travel times in AFC data and assign them to the same path, which provided a new idea for the study of passenger route selection. Cheng et al.⁹ filtered out unreasonable AFC records and analyzed the travel time characteristics of non-transfer and transfer passengers. They verified that passengers’ travel time obeys a log-normal distribution and established a model of urban rail transit passenger travel path selection. Zhang et al.¹⁰ compared the correlation of travel time among multiple paths according to AFC data, and established a route choice model based on the Gaussian mixture model. Ni et al.¹¹ studied the passenger path choice problem of a multi-track rail transit network, and fitted the parameters of the established path selection model according to AFC data.
All of the above studies take an aggregate perspective rather than that of individual passengers, and thus ignore the differences among passengers.

Wang et al.¹² used AFC data to identify passengers’ travel information and infer the travel chain of individual passengers. The results were corrected using the physical relationship between lines. Hu et al.¹³ calculated the travel time parameters of a single passenger with AFC data, and matched them against train schedules to determine the passenger’s route. Wu et al.¹⁴ mined AFC data in Beijing and adopted a similarity measurement method to match individual travel times with travel routes. The proportion of travel route choice was obtained by aggregating individual data. Sun et al.¹⁵ restored multiple travel chains of passengers according to AFC data and train schedules. The probability of each travel chain was calculated according to departure time, and the chain with the highest probability was selected as the passenger’s route.

However, previous studies mainly focus on the total travel time of passengers, and rarely take into account how the travel time distribution changes across periods. Some studies¹⁶ have shown that the route choice characteristics of passengers vary throughout the day, so results that ignore this variation will be biased. Therefore, this paper proposes an approach to estimate the proportion of passenger route choice for urban rail transit control systems with the consideration of different periods. The train operation time is divided into stages with a granularity of τ, and the normalized value of passenger flow corresponding to each stage is then calculated. By comparing this value with the standard coefficient of peak passenger flow, each stage is classified as belonging to the peak or the flat period. Studying the proportion of passenger route choice in multiple periods can provide a more scientific and accurate basis for urban rail transit operational control systems. In particular, the contributions and innovations of this paper can be highlighted in the following aspects:

It provides a new perspective for studying the estimation of the proportion of passengers’ travel route choices. By analyzing the characteristics of passengers’ route choice at different times, this paper points out the importance of studying passengers’ travel routes in multiple periods and divides train operation time into two periods according to the intensity of inbound passenger flow.

In order to avoid spending plenty of time and energy on field research into passenger travel time, this paper analyzes the composition and representation of passengers’ travel time in detail and obtains the probability distribution parameters of single-route OD pairs from actual AFC data. Then, we estimate the probability distribution parameters of the travel time of each route between multi-route OD pairs through electronic map API technology and a random forest model.

We use the Naïve Bayes algorithm to estimate the probability that passengers travel along each route, and assign passengers to the route with the highest probability.

The rest of this paper is organized as follows: The second section provides a discussion of the intensities of inbound passenger flow, and divides the travel time of passengers into two periods: peak and flat. In Section 3, the expected value, variance as well as the distribution of passenger travel time for each route in each period are obtained. In addition, the Naïve Bayes algorithm is used to estimate the proportion of passenger route choice. The fourth section takes Hangzhou Metro in China as an example for analysis, and makes relevant comparative tests to verify the effectiveness of the proposed method. The fifth section presents the conclusion and future research plan.

Passenger travel period division

Significant variations exist in the passenger flow over the course of a day, and the passengers’ travel characteristics under different passenger flow intensities have the following differences:

Different route choice preferences. Through a questionnaire, this paper investigates whether passengers will choose another route that takes more time because they are not satisfied with the congestion and ride comfort of the original route. The answers are classified as usually, occasionally and hardly. Figure 1 shows that passengers in the peak period are more sensitive to time, and most of them will hardly ever switch to a route that takes more time; they pay less attention to factors such as train congestion and ride comfort. In contrast, the proportion of people who usually change their route increases significantly during the flat period compared with the peak period, which reflects passengers’ different route choice preferences.

Different walking speeds. Relevant research¹⁷ shows that passengers’ walking speed is strongly affected by passenger flow density in the urban rail transit system. The greater the passenger flow density, the more slowly passengers walk, as shown in Figure 2. The passenger flow density in the peak period is greater than in the flat period, so passengers take longer to get in and out of a station and to transfer in the peak period.

Different waiting time on the platform. During the peak passenger flow period, overload delay occurs at some stations. Overload delay refers to the situation in which some passengers have to wait on the platform for a following train because of limited train capacity, which extends their waiting time.

Figure 1.

Questionnaire results.

Figure 2.

“Passenger-walking-speed” and “passenger-density” curve.

In this section, the train operation time is divided into peak and flat periods. In order to determine the periods more scientifically and reasonably, the passenger flow intensities of stations are analyzed first. Figure 3 shows the distribution of inbound passenger flow at different types of stations on weekdays. Stations A, B, C, and D are, respectively, of the residence-and-work hybrid type, transportation hub type, residential type, and comprehensive commercial type.

Figure 3.

Distribution of inbound passenger flow at different stations on weekdays.

For station A, the inbound passenger flow curve shows a bimodal pattern, and the peak value at night is slightly lower than that in the morning. The passenger flow at other time is relatively flat. For transportation hub stations, there is no obvious peak in the curve, but the passenger flow intensity of the whole train operation time is at a high level. For residential stations, the inbound passenger flow curve presents a single-peak pattern. The peak is in the morning, and the intensity of passenger flow is relatively low and stable in other periods. For the comprehensive commercial station, the whole day passenger flow state is similar to the transportation hub type, but the passenger flow intensity is weaker than that of the latter.

The passenger flow on weekends differs significantly from that on weekdays for the same station. Taking station A as an example, the passenger flow shows two obvious peaks on weekdays, in the morning and in the evening, whereas the peak of the weekend passenger flow curve appears in the middle of the day and is not pronounced compared with other times. In addition, the overall passenger flow intensity on weekends is significantly lower than on weekdays, as can be observed in Figure 4.

Figure 4.

Distribution of inbound passenger flow in different days.

From the above analysis, it can be seen that the peak period of passenger flow varies greatly among different types of stations and different types of days, and cannot be generalized. This paper identifies the peak and flat periods according to the following principles:

(1) The intensity of inbound passenger flow has a greater influence on the passengers’ arrival time and waiting time compared with outbound passenger flow.¹⁸ Therefore, this paper identifies periods only based on the intensity of inbound passenger flow.

(2) Using τ as the time granularity, the train operation time is divided into several stages, denoted as $[t_{1}, t_{2}, \dots, t_{n}]$ , and the inbound passenger flow corresponding to each stage is $[v_{1}, v_{2}, \dots, v_{n}]$ . The maximum passenger flow is denoted as the peak passenger flow $v_{\max}$ . The passenger flow of each stage is normalized by $v_{\max}$ and recorded as the normalized value of passenger flow $[α_{1}, α_{2}, \dots, α_{n}]$ . The standard coefficient of peak passenger flow $β$ is introduced to determine whether $t_{i}$ belongs to the peak period. If $α_{i}$ is greater than or equal to $β$ , $t_{i}$ belongs to the peak period; otherwise, it belongs to the flat period. The formula is expressed as follows:

α_{i} = \frac{v_{i}}{v_{\max}}, \quad t_{i} \in \begin{cases} peak, & α_{i} \geq β \\ flat, & α_{i} < β \end{cases}, \quad i \in \{1, 2, \dots, n\}

(1)

$peak :$ peak period;

$flat :$ flat period.

(3) Some passengers enter a station during the peak or flat period and leave during the other period. To simplify the analysis, passengers are classified according to their station entry time in this paper.
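The period rule of Formula (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function name `classify_periods` and the sample counts are hypothetical, and the default β = 0.5 is the value used later in the case analysis.

```python
def classify_periods(flows, beta=0.5):
    """Label each stage 'peak' or 'flat' following Formula (1).

    flows: inbound passenger counts v_1..v_n, one per stage of length tau.
    beta:  standard coefficient of peak passenger flow.
    """
    v_max = max(flows)
    if v_max <= 0:
        raise ValueError("at least one stage must have positive flow")
    labels = []
    for v in flows:
        alpha = v / v_max  # normalized value of passenger flow
        labels.append("peak" if alpha >= beta else "flat")
    return labels


# Hypothetical counts for six consecutive stages.
print(classify_periods([120, 480, 900, 450, 200, 610]))
```

A stage whose normalized flow equals β exactly is treated as peak, matching the "greater than or equal to" condition in principle (2).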

Estimation of passenger route choice proportion in multi-period

Travel time parameters

It is assumed that the travel time of passengers in urban rail transit consists of the following parts: entry and exit station walking time, waiting time at entry station, on-train time, transfer walking time and waiting time at a transfer station. It can be expressed as follows:

T_{od}^{k} = T_{ex}^{k} + T_{wait}^{k} + T_{train}^{k} + T_{twalk}^{k} + T_{twait}^{k} + T_{out}^{k}

(2)

$T_{od}^{k}$ : the travel time of passenger from station $o$ to station $d$ along route $k$ ;

$T_{ex}^{k}$ : the walking time from the entry AFC gate to the platform when the travel route is $k$ ;

$T_{wait}^{k}$ : the waiting time at the entry station $o$ when the travel route is $k$ ;

$T_{train}^{k}$ : the on-train time when the travel route is $k$ ;

$T_{twalk}^{k}$ : the walking time at the transfer station when the travel route is $k$ ;

$T_{twait}^{k}$ : the waiting time at the transfer station when the travel route is $k$ ;

$T_{out}^{k}$ : the walking time from the platform to the exit AFC gate when the travel route is $k$ .

Previous studies^7,9 show that passengers’ walking times $T_{ex}^{k}, T_{twalk}^{k}$ and $T_{out}^{k}$ approximately obey a normal distribution, that is,

T_{type}^{k} \sim N (μ_{type}^{k}, {(σ_{type}^{k})}^{2}), \quad type \in \{ex, out, twalk\}

(3)

$μ_{type}^{k}$ : the expected value of walking time for entering station, exiting station, and transfer;

${(σ_{type}^{k})}^{2} :$ the variance of walking time for entering station, exiting station, and transfer.

Assuming that the trains run according to the train schedule strictly, passengers’ on-train time can be obtained from the timetable, which is regarded as a constant in this paper and expressed as:

T_{train}^{k} = t_{train}^{k}

(4)

$t_{train}^{k}$ : the time of passengers on the train.

The waiting time on the platform is related to the train departure interval. Ingvardson et al.¹⁹ propose that there are two distinct types of passenger behavior. In the first, passengers arrive at the platform randomly and evenly, and the waiting time can be represented by a uniform distribution. In the second, passengers try to minimize their waiting time and arrive when the train is about to leave, which can be represented by a Beta distribution. The probability distribution of passengers’ waiting time can be expressed as follows:

f (x, a, b) = ς \frac{Γ (a + b)}{Γ (a) \cdot Γ (b)} \cdot x^{a - 1} \cdot {(1 - x)}^{b - 1} + (1 - ς)

(5)

$ς$ : the proportion of passengers whose waiting time follows the Beta distribution;

$a, b$ : the shape parameters of the Beta distribution;

$Γ (*)$ : the gamma function, $Γ (*) = \int_{0}^{\infty} x^{* - 1} e^{- x} dx$ .
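As a minimal sketch, the mixture density of Formula (5) can be evaluated with the standard library alone. Two points are assumptions of this sketch rather than statements of the paper: `x` is read as the waiting time expressed as a fraction of the departure interval, so that it lies in (0, 1), and the names `beta_pdf`, `waiting_time_pdf` and `share_beta` (standing for $ς$) are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution on the open interval (0, 1)."""
    coeff = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coeff * x ** (a - 1) * (1 - x) ** (b - 1)


def waiting_time_pdf(x, a, b, share_beta):
    """Formula (5): mixture of a Beta density and a uniform density on (0, 1).

    x:          waiting time as a fraction of the departure interval (assumed).
    share_beta: proportion of passengers whose waiting time is Beta distributed.
    """
    if not 0 < x < 1:
        return 0.0
    return share_beta * beta_pdf(x, a, b) + (1 - share_beta)


# Hypothetical shape parameters and mixing proportion.
print(waiting_time_pdf(0.5, a=2, b=2, share_beta=0.4))
```

Because both components are proper densities on (0, 1), the mixture integrates to one for any mixing proportion between 0 and 1.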

It is assumed that there is no overload delay during the flat period and that some passengers wait for the second train during the peak period. The expected value and variance of passengers’ waiting time can then be expressed as:

μ_{type}^{k} = \left(\frac{1 - ς}{2} + \frac{ς a}{a + b}\right) \cdot (1 + 2 θ) \cdot H, \quad type \in \{wait, twait\}

(6)

{(σ_{type}^{k})}^{2} = \left(\frac{1 - ς}{12} + \frac{ς a b}{{(a + b)}^{2} (a + b + 1)}\right) \cdot (1 - 3 θ + 3 θ^{2}) \cdot H^{2}, \quad type \in \{wait, twait\}

(7)

$θ$ : the proportion of passengers who wait for the second train, $θ \in [0, 1)$ ;

$μ_{type}^{k}$ : the expected value of passengers’ waiting time;

$(σ_{type}^{k})^{2}$ : the variance of passengers’ waiting time;

$H$ : the train departure interval.
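Formulas (6) and (7) translate directly into code. The sketch below is illustrative only; the function name `waiting_time_moments` and the parameter names `share_beta` (for $ς$) and `headway` (for $H$) are hypothetical, and the numerical values are made up.

```python
def waiting_time_moments(a, b, share_beta, theta, headway):
    """Expected value and variance of waiting time per Formulas (6) and (7).

    a, b:       shape parameters of the Beta distribution.
    share_beta: proportion of passengers following the Beta distribution.
    theta:      proportion of passengers who wait for the second train.
    headway:    train departure interval H, in seconds.
    """
    mean_unit = (1 - share_beta) / 2 + share_beta * a / (a + b)
    var_unit = (1 - share_beta) / 12 + share_beta * a * b / ((a + b) ** 2 * (a + b + 1))
    mean = mean_unit * (1 + 2 * theta) * headway
    variance = var_unit * (1 - 3 * theta + 3 * theta ** 2) * headway ** 2
    return mean, variance


# Hypothetical flat-period case: uniform arrivals only, nobody left behind.
print(waiting_time_moments(a=2, b=2, share_beta=0.0, theta=0.0, headway=300))
```

With `share_beta = 0` and `theta = 0` the result reduces to the familiar uniform case: a mean of half the departure interval and a variance of $H^{2}/12$.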

Thus, the expected value and variance of passengers’ travel time can be expressed as:

μ_{od}^{k} = μ_{ex}^{k} + μ_{wait}^{k} + t_{train}^{k} + μ_{twalk}^{k} + μ_{twait}^{k} + μ_{out}^{k}

(8)

(σ_{od}^{k})^{2} = (σ_{ex}^{k})^{2} + (σ_{wait}^{k})^{2} + (σ_{twalk}^{k})^{2} + (σ_{twait}^{k})^{2} + (σ_{out}^{k})^{2}

(9)
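The aggregation in Formulas (8) and (9) can be sketched as follows. Adding the component variances, as Formula (9) does, is exact when the components are uncorrelated; the sketch takes that for granted. The function name and the component values are hypothetical.

```python
def route_travel_time_moments(walk_and_wait, on_train_time):
    """Combine component moments following Formulas (8) and (9).

    walk_and_wait: (mean, variance) pairs for the entry walk, entry wait,
                   transfer walk, transfer wait and exit walk, in seconds.
    on_train_time: timetable running time in seconds, treated as a constant.
    """
    mean = on_train_time + sum(m for m, _ in walk_and_wait)
    variance = sum(v for _, v in walk_and_wait)  # a constant adds no variance
    return mean, variance


# Hypothetical non-transfer route: transfer components are zero.
components = [(60, 100), (150, 7500), (0, 0), (0, 0), (45, 81)]
print(route_travel_time_moments(components, on_train_time=1200))
```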

Passenger travel time estimation

Due to the difference in stations’ layout and heterogeneity of passengers, the parameters such as entry walking time and transfer time are greatly different. Therefore, this paper adopts AFC data to infer passenger travel time. According to the number of feasible routes, the OD pairs can be classified as single-route ODs and multi-route ODs. Actually, feasible routes do not refer to all routes in the topological structure of two stations, but the routes that passengers may choose when traveling. If there is only one feasible route between an OD pair, it belongs to single-route ODs; otherwise, it belongs to multi-route ODs. As shown in Figure 5, although there are two routes from A to C in physical structure, passengers hardly choose route ④. So there is only one feasible route ① between AC and it belongs to single-route ODs. There are two feasible routes between AB (② and ③), so it belongs to the multi-route ODs.

Figure 5.

Single-route ODs and multi-route ODs.

By analyzing the probability distribution and parameters of single-route ODs’ travel time, the relevant rules of route travel time are obtained. Based on these, the travel time of each route between multi-route ODs can be estimated. The main steps are as follows:

Step 1: Search for the feasible routes between OD pairs.

An electronic map API is a free application programming interface provided by map operators, which can quickly and accurately perform route searching, GPS positioning and other functions. This paper uses the electronic map API to obtain the feasible routes between each OD pair, and adjusts the results according to the following rules: (1) Remove travel routes that include other modes of transportation. (2) Because the results given by the map are based on the latest road network structure, if the current road network differs from the road network in the research period, the results are adjusted according to historical information.

Step 2: Analyze the probability distribution and related parameters of single-route ODs’ travel time. The basic AFC data structure is shown in Table 1.

Table 1.

The basic data structures of AFC data.

UserID	Entry station	Entry time	Entry device no.	Exit station	Exit time	Exit device no.	Pay type
1	27	07:10:14	1354	72	07:56:07	3353	2
2	9	07:10:14	200	36	07:42:13	1198	2

The total travel time of passengers for a certain OD pair can be obtained from the entry and exit swipe times. The AFC data of a single-route OD pair can be fitted with a normal distribution to obtain the expected value and variance, as shown in Table 2.

Table 2.

Single-route travel time parameters.

No.	Entry station	Exit station	Period type	Expected value/s	Variance/s
1a	Xiang Hu	Bin Kang	peak period	371.26709	73.78325
1b	Xiang Hu	Bin Kang	flat period	365.45224	76.67625
2a	Xiang Hu	Xi Xing	peak period	500.27961	77.76836

Step 3: Analyze the expected value of travel time. When planning travel routes, the electronic map not only gives multiple feasible routes between OD pairs, but also estimates the travel time of each route. Comparing these estimates with the data in Table 2, we find that the results of the electronic map are approximately equal to the expected value in the flat period. Thus, we can consider:

μ_{od}^{k flat} = I_{od}^{k}

(10)

The expected value in the peak period is larger than the time estimated by the electronic map. Correlation analysis shows that the difference between them is related to the departure interval of trains at the entry station and basically follows the uniform distribution on $(0, H_{o})$ . Thus, it is considered that:

μ_{od}^{k peak} = I_{od}^{k} + ε

(11)

$I_{od}^{k}$ : the travel time of route $k$ estimated by electronic map;

$ε$ : $ε \sim U (0, H_{o})$ , the difference between the expected value of travel time in the peak period and the time estimated by the electronic map; $H_{o}$ is the departure interval of the train at the entry station.

Step 4: Analyze the variance of travel time. Analysis of the parameters of single-route ODs shows that the standard deviation of travel time is related to the departure interval of trains at the entry station, the number of passenger transfers, the passenger density between the OD pair, and the number of stations on the route. In this paper, a random forest model is selected to fit the standard deviation of travel time, which can be expressed as follows:

σ_{od}^{k} = f (H_{o}, N_{trans}, N_{men}, N_{sta})

(12)

$N_{trans}$ : the number of passenger transfers in the travel;

$N_{men}$ : interval passenger density between OD stations;

$N_{sta}$ : the number of stations on route k.

Step 5: According to formulas (10)–(12), the expected value and variance of travel time for each route between multi-route ODs can be estimated, and the probability distribution of travel time of all routes can be obtained.
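Step 2 above can be sketched with the Python standard library. This is an illustration under assumptions, not the processing pipeline of this paper: the swipe times are taken to be same-day `HH:MM:SS` strings as in Table 1, the sample standard deviation is used as the spread estimate, and the helper names are hypothetical.

```python
from datetime import datetime
from statistics import mean, stdev


def travel_seconds(entry_time, exit_time):
    """Travel time in seconds between two same-day 'HH:MM:SS' swipe times."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(exit_time, fmt) - datetime.strptime(entry_time, fmt)
    return delta.total_seconds()


def fit_normal(travel_times):
    """Sample mean and sample standard deviation of a list of travel times."""
    return mean(travel_times), stdev(travel_times)


# The two records of Table 1 (they belong to different OD pairs).
print(travel_seconds("07:10:14", "07:56:07"))
print(travel_seconds("07:10:14", "07:42:13"))

# Hypothetical travel times of one single-route OD pair, in seconds.
print(fit_normal([360, 370, 380]))
```

In practice the records would first be grouped by OD pair and by period (peak or flat, according to entry time) before fitting, so that each row of Table 2 comes from one group.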

Estimation of the proportion of passenger route choice

The Naïve Bayes algorithm is one of the most widely used classification algorithms. For the given training set, the joint probability distribution from input to output is trained based on the assumption of the independence between features. Then, based on the learned model, the output with the maximum posterior probability is calculated for an input. We propose an approach to estimate the proportion of route choice by introducing the Naïve Bayes algorithm. Taking the travel time as the attribute and the travel time of each passenger as the sample, we calculate the posterior probability of this sample belonging to each route and classify it into the route with the highest probability. The steps are described as follows:

(1) Suppose the number of routes between an OD pair is n: $R_{od}^{1}, R_{od}^{2}, \dots, R_{od}^{n}$ . Given the travel time of a passenger in a certain period $T_{od}^{period}$ as a sample, the probability of it belonging to each route can be calculated by the Naïve Bayes classifier, that is,

P (R_{od}^{k} | T_{od}^{period}) = \frac{P (T_{od}^{period} | R_{od}^{k}) P (R_{od}^{k})}{P (T_{od}^{period})}; (k = 1, 2, \dots n)

(13)

(a): $P (R_{od}^{k})$ is the prior probability. That is the probability of the passenger choosing route $k$ in travel. As the route selection scheme of passengers is unknown, it is temporarily assumed that the prior probability for each route is equal, namely:

P (R_{od}^{1}) = P (R_{od}^{2}) = \dots = P (R_{od}^{n}) = \frac{1}{n}

(14)

(b): $P (T_{od}^{period} | R_{od}^{k})$ refers to the probability that the travel time is $T_{od}^{period}$ when choosing the route $k$ . Through the above analysis, we can find that it obeys normal distribution, namely,

P (T_{od}^{period} | R_{od}^{k}) = \frac{1}{\sqrt{2 π} (σ_{od}^{k period})} \exp (- \frac{{(T_{od}^{period} - μ_{od}^{k period})}^{2}}{2 {(σ_{od}^{k period})}^{2}})

(15)

$μ_{od}^{k period}$ , $σ_{od}^{k period}$ can be obtained according to the method proposed in Section 3.2.

(2) Classify the sample $T_{od}^{period}$ . Calculate the posterior probability $P (R_{od}^{k} | T_{od}^{period}), (k = 1, 2, \dots, n)$ . If and only if $P (R_{od}^{i} | T_{od}^{period}) \geq P (R_{od}^{k} | T_{od}^{period})$ holds for every $k \in \{1, 2, \dots, n\}$ , the category of the sample $T_{od}^{period}$ is $R_{od}^{i}$ and the sample is labeled as $λ (R_{od}^{i})$ , that is,

λ (R_{od}^{i}) = \arg\max_{k} P (R_{od}^{k} | T_{od}^{period}), (k = 1, 2, \dots, n)

(16)

(3) Label all samples and count the number of samples on each route $(λ_{1}, λ_{2}, \dots, λ_{n})$ . The literature²⁰ points out that, when the prior probability is an “improper prior,” it can be updated according to the output. The equal prior probabilities assumed above may differ from the actual situation and lead to large errors. Therefore, following the basic principle of the Expectation-Maximization algorithm, labeling the samples under the assumed parameters is regarded as the E-step, and updating the prior probabilities according to Formula (17) is regarded as the M-step. The final posterior probability is calculated with the updated prior probability, and the samples are labeled again.

P (R_{od}^{k}) = \frac{λ_{k}}{λ_{1} + λ_{2} + \dots + λ_{n}}

(17)

(4) When the difference in the travel time of multiple routes is not significant, not all passengers will choose the route with the least travel time; some will choose a route according to their own preference. Therefore, when several posterior probabilities are approximately equal, taking only the route with the maximum posterior probability as the passenger travel route will differ considerably from the actual situation. Thus, after calculating the final posterior probabilities, this paper sorts them in descending order and confirms the passenger travel route as follows: if there are m feasible routes that meet the condition $P (R_{od}^{1} | T_{od}^{period}) - P (R_{od}^{m} | T_{od}^{period}) < 0.2 P (R_{od}^{m} | T_{od}^{period})$ , one of them is chosen randomly as the travel route. If not, the route corresponding to the maximum probability is taken as the travel route. This is expressed by Formulas (18) and (19).

P (R_{od}^{1} | T_{od}^{period}) > P (R_{od}^{2} | T_{od}^{period}) > \dots > P (R_{od}^{m} | T_{od}^{period}) > \dots > P (R_{od}^{n} | T_{od}^{period})

(18)

λ (R_{od}^{i}) = \begin{cases} P (R_{od}^{i} | T_{od}^{period}), i \in \{1, 2, \dots, m\}, & \exists m \in \{1, 2, \dots, n\}: P (R_{od}^{1} | T_{od}^{period}) - P (R_{od}^{m} | T_{od}^{period}) < 0.2 P (R_{od}^{m} | T_{od}^{period}) \\ P (R_{od}^{1} | T_{od}^{period}), & P (R_{od}^{1} | T_{od}^{period}) - P (R_{od}^{2} | T_{od}^{period}) \geq 0.2 P (R_{od}^{2} | T_{od}^{period}) \end{cases}

(19)

(5) Through the proposed method, we can get the corresponding routes of all samples and the number of samples on each route. Finally, we realize the estimation of the proportion of passenger route choice.
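Steps (1) to (5) above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in this paper: the prior is updated once, which is how step (3) is read here; the route parameters are passed as (mean, standard deviation) pairs; and the function names, the seed and the sample travel times are hypothetical.

```python
import math
import random


def normal_pdf(t, mu, sigma):
    """Formula (15): normal density of travel time t on one route."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)


def posteriors(t, params, priors):
    """Formula (13): posterior probability of each route for travel time t."""
    joint = [normal_pdf(t, mu, sigma) * p for (mu, sigma), p in zip(params, priors)]
    total = sum(joint)
    if total == 0.0:  # t is far from every route; fall back to the priors
        return list(priors)
    return [j / total for j in joint]


def pick_route(post, rng, tolerance=0.2):
    """Formulas (16), (18), (19): route index for one sample."""
    best = max(post)
    # Routes whose posterior is within the relative tolerance of the largest one.
    close = [k for k, p in enumerate(post) if best - p < tolerance * p]
    if len(close) > 1:
        return rng.choice(close)
    return post.index(best)


def estimate_proportions(times, params, tolerance=0.2, seed=0):
    """Steps (1) to (5): equal priors, one prior update, final labelling."""
    rng = random.Random(seed)
    n = len(params)
    priors = [1.0 / n] * n  # Formula (14)
    first = []
    for t in times:
        post = posteriors(t, params, priors)
        first.append(post.index(max(post)))  # Formula (16)
    priors = [first.count(k) / len(times) for k in range(n)]  # Formula (17)
    final = [pick_route(posteriors(t, params, priors), rng, tolerance) for t in times]
    return [final.count(k) / len(times) for k in range(n)]


# Peak-period parameters of Table 3, read as (mean, standard deviation) in seconds.
peak_params = [(2106.52, 117.83), (1876.83, 122.12)]
# Hypothetical travel times in seconds.
print(estimate_proportions([2300, 2300, 2300, 1700], peak_params))
```

Passing a seeded `random.Random` keeps the random choice among near-equal routes reproducible, which makes repeated experiments comparable.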

Case analysis

This paper takes Hangzhou Metro as an example to verify the proposed method empirically. As of January 2019, Hangzhou Metro had three lines: Line 1, Line 2, and Line 4. Note that Line 1 contained two branch lines, the Xiashajiangbin branch line and the Linping branch line. The AFC card swiping data of passengers on 25 January 2019 are adopted for analysis. We choose a single-route OD pair (from Xianghu Station to Jinjiang Station) to show the route travel time distribution in multiple periods. In order to analyze the travel time distribution of routes accurately, abnormal travel records are excluded from the study. Abnormal travel records are OD travel records whose travel time is longer than that of 95% of records or more than twice the estimated time of the route returned by the electronic map API. Figure 6 shows that the probability density of passengers’ travel time has obvious normal distribution characteristics:

Figure 6.

Probability distribution of passenger travel time from Xianghu station to Jinjiang Station.
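The exclusion of abnormal records described above can be sketched as follows. The exact percentile definition is not given in the text, so the nearest-rank cut-off used here is an assumption, as are the function name `drop_abnormal`, its parameter names and the sample values.

```python
def drop_abnormal(travel_times, map_estimate, keep_percent=95, factor=2.0):
    """Drop records above the empirical cut-off or above factor * map_estimate.

    travel_times: travel times of one OD pair, in seconds.
    map_estimate: route travel time returned by the electronic map, in seconds.
    keep_percent: share of the shortest records kept (nearest-rank rule, assumed).
    """
    ordered = sorted(travel_times)
    # Nearest rank: smallest position covering keep_percent of the sample.
    rank = max(1, (keep_percent * len(ordered) + 99) // 100)
    cutoff = ordered[rank - 1]
    return [t for t in travel_times if t <= cutoff and t <= factor * map_estimate]


# Hypothetical sample: 20 records, one of which is unusually long.
times = list(range(400, 419)) + [1500]
print(len(drop_abnormal(times, map_estimate=380)))
```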

Take the multi-route OD pair (from Jiangling Road to East Railway Station) as an example to estimate the proportion of passenger route choice. We use Amap API to acquire the routes between stations. Two feasible routes are obtained in this example, as shown in Figure 7. Route 2 transfers at the Jinjiang Station.

Figure 7.

The routes from Jiangling Road to East Railway Station.

The inbound passenger flow of Jiangling Road is visualized in Figure 8. Two peaks in the passenger flow curve can be easily seen, in the morning and in the evening. Thus, we can consider Jiangling Road station a residence-and-work hybrid type station, where most passengers in the peak period are commuters.

Figure 8.

Peak and flat period identification result.

According to the passenger flow intensity of Jiangling Road, the train operation time is divided into stages with a granularity of 15 min and the normalized value of passenger flow $α_{i}$ in each stage is obtained. Comparing $α_{i}$ with the standard coefficient of peak passenger flow $β$ ( $β = 0.5$ ), we find that the peak periods are from 7:00 to 9:00 and from 17:00 to 19:00. The rest of the train operation time is the flat period.

The parameters of the travel time are shown in Table 3. It can be seen that although passengers do not need to transfer on route 1, its expected value is larger than that of route 2 in both periods. In addition, the expected values of both routes are larger in the peak period than in the flat period.

Table 3.

“Jiangling Road → East Railway Station” travel time parameters.

	Expected value in peak period	Standard deviation in peak period	Expected value in flat period	Standard deviation in flat period
Route 1	2106.52 s	117.83 s	1860.34 s	139.32 s
Route 2	1876.83 s	122.12 s	1680.18 s	133.09 s

A simulation experiment is designed to verify the proposed method. The simulation scenario is as follows: 1000 passengers are randomly generated, including 400 in the peak period and 600 in the flat period. The ratio of passengers choosing routes 1 and 2 is $λ_{1} : λ_{2}$ . The travel times of passengers who choose routes 1 and 2 during the peak period obey $T_{od}^{1 peak} \sim N (2106.52, 117.83)$ and $T_{od}^{2 peak} \sim N (1876.83, 122.12)$ respectively. The travel times during the flat period obey $T_{od}^{1 flat} \sim N (1860.34, 139.32)$ and $T_{od}^{2 flat} \sim N (1680.18, 133.09)$ respectively (the second parameter of each distribution is the standard deviation listed in Table 3). Based on the approach provided in this paper, the number of passengers on each route is obtained, as shown in Tables 4 and 5. Table 4 defines the notation of the results. When the predicted route is consistent with the actual route, the sample is marked as true and represented by TR1 (True Route1) or TR2 (True Route2). Otherwise, it is marked as false and represented by FR2 (False Route2) or FR1 (False Route1).

Table 4.

Jiangling Road to East Railway Station route choice result.

Actual results	Predicted Route 1	Predicted Route 2
Route 1	TR1 (True Route1)	FR2 (False Route2)
Route 2	FR1 (False Route1)	TR2 (True Route2)

Table 5.

Jiangling Road to East Railway Station route choice results.

No.	$λ_{1} : λ_{2}$	Type	TR1	FR2	TR2	FR1	Error (%)	Error reduction (%)
1	8:2	Peak period	312	8	74	6	2.2	65.1
		Flat period	477	3	115	5	2.2
		Whole-day period	779	21	158	42	6.3
2	7:3	Peak period	273	7	115	5	2.4	67.1
		Flat period	412	8	176	4	2.4
		Whole-day period	673	27	254	46	7.3
3	6:4	Peak period	234	6	157	3	2.4	68.4
		Flat period	352	8	233	7	2.4
		Whole-day period	578	22	346	54	7.6
4	5:5	Peak period	195	5	194	6	2.1	68.2
		Flat period	294	6	296	4	2.1
		Whole-day period	479	21	455	45	6.6

The “Whole-day period” refers to the comparison experiment in which passenger travel routes are estimated without dividing the day into periods. The experiment with $λ_{1} : λ_{2} = 7 : 3$ is used as an example to explain the data in Table 5. In the peak period, the numbers of FR2 and FR1 samples are 7 and 5 respectively, so 12 samples are misclassified in total. Likewise, 12 samples are misclassified in the flat period (8 FR2 and 4 FR1). The error rate is calculated according to Formula (20).

$$E = \frac{FR2 + FR1}{TR1 + FR2 + TR2 + FR1}$$

(20)

The denominator is the number of all samples and the numerator is the number of samples marked as false in the experiment. In the segmented experiment, 24 samples are misclassified out of 1000, so by Formula (20) the error is 24/1000 = 2.4%. The error reduction is then obtained with Formula (21).

$$ER = \frac{E_{whole\text{-}day} - E_{segment}}{E_{whole\text{-}day}}$$

(21)
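
As a worked check of Formulas (20) and (21), the following Python sketch recomputes the error and the error reduction for the 7:3 experiment from the counts in Table 5. The function names are hypothetical; pooling the peak and flat counts into one segmented error follows the description in the text.

```python
# Counts (TR1, FR2, TR2, FR1) for the 7:3 experiment, copied from Table 5.
PEAK = (273, 7, 115, 5)
FLAT = (412, 8, 176, 4)
WHOLE_DAY = (673, 27, 254, 46)


def error_rate(*experiments):
    """Formula (20): misclassified samples over all samples, pooled."""
    wrong = sum(fr2 + fr1 for _, fr2, _, fr1 in experiments)
    total = sum(sum(counts) for counts in experiments)
    return wrong / total


def error_reduction(e_whole_day, e_segment):
    """Formula (21): relative drop in error from whole-day to segmented."""
    return (e_whole_day - e_segment) / e_whole_day


e_segment = error_rate(PEAK, FLAT)    # 24 misclassified out of 1000
e_whole_day = error_rate(WHOLE_DAY)   # 73 misclassified out of 1000
print(f"segmented error: {e_segment:.1%}")
print(f"whole-day error: {e_whole_day:.1%}")
print(f"error reduction: {error_reduction(e_whole_day, e_segment):.1%}")
```

The printed values are 2.4%, 7.3%, and 67.1%, matching the second row of Table 5.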

The result in Table 5 shows that the error of the segmented experiment is reduced by more than 60% compared with the whole-day experiment, which verifies the effectiveness of the method. Figure 9 shows the result of passenger travel route estimation.

Figure 9.

Passenger travel route identification result: (a) the segmented experiment and (b) the whole-day experiment.

Note that a bimodal distribution can be clearly seen in Figure 9(a). Figure 9(a) also shows that the travel time distribution of route 2 in the peak period is similar to that of route 1 in the flat period: both are concentrated between 1700 and 2000 s. In Figure 9(b), however, there is no significant difference in travel time distribution between the two periods.

The relevant information and distribution characteristics of the misclassified samples can be observed in Table 6 and Figure 10.

Table 6.

Information about the misclassified samples.

	Mistake type	Sample number	Expected value/s	Variance/s
Peak period	route 1-route 2	7	1910.28	58.52
Peak period	route 2-route 1	5	2018.74	30.17
Flat period	route 1-route 2	8	1667.57	45.14
Flat period	route 2-route 1	4	1741.52	68.02
Whole-day period	route 1-route 2	27	1877.46	86.37
Whole-day period	route 2-route 1	46	1874.46	94.10

Figure 10.

The misclassified samples distribution: (a) the segmented experiment and (b) the whole-day experiment.

From Table 6 and Figure 10, the following characteristics can be observed:

Figure 10 clearly shows that the number of misclassified samples in the segmented experiment is much smaller than in the whole-day experiment: 24 samples are misclassified in the segmented experiment, compared with 73 in the whole-day experiment.

In the segmented experiment, the expected travel time for route 1 is larger than that for route 2 in both periods (as shown in Table 3). However, the misclassified route 1 samples have a smaller mean travel time than the misclassified route 2 samples, which is caused by the randomness of the experiment.

In the whole-day experiment, 27 route 1 samples are misclassified as route 2, and 46 route 2 samples are misclassified as route 1. Their expected travel times are 1877.46 and 1874.46 s respectively, which are close to the expected travel times of route 1 in the flat period (1860.34 s) and route 2 in the peak period (1876.83 s). These two groups of samples are therefore easily confused in classification.
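
How close these values are can be quantified with a small calculation. The Python sketch below is an illustration rather than part of the proposed method. It measures how far each mean of the whole-day misclassified samples (Table 6) lies from the center of route 1 in the flat period and of route 2 in the peak period, in units of standard deviation. It assumes the second parameter of each travel time distribution is a standard deviation, and the helper `z_score` is a hypothetical name.

```python
# (expected value, standard deviation) in seconds, from the parameter table.
ROUTE_1_FLAT = (1860.34, 139.32)
ROUTE_2_PEAK = (1876.83, 122.12)

# Mean travel time of the whole-day misclassified samples, from Table 6.
MISCLASSIFIED_MEANS = {"route 1-route 2": 1877.46, "route 2-route 1": 1874.46}


def z_score(t, mean, std):
    """Distance of t from a distribution's mean in standard deviations."""
    return (t - mean) / std


for mistake, t in MISCLASSIFIED_MEANS.items():
    z_flat = z_score(t, *ROUTE_1_FLAT)
    z_peak = z_score(t, *ROUTE_2_PEAK)
    print(f"{mistake}: {z_flat:+.2f} sd from route 1 (flat), "
          f"{z_peak:+.2f} sd from route 2 (peak)")
```

All four distances are below 0.15 standard deviations, so without the period information both groups sit near the center of two different route distributions at once.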

Conclusion

In this paper, an estimation method for the proportion of route choice in multi-period urban rail transit control systems is proposed. To divide the train operation time into different periods, this paper formulates two indicators: the normalized value of passenger flow and the standard coefficient of peak passenger flow. It then analyzes the probability distribution of travel times and identifies route choice through the Naïve Bayes algorithm. The simulation results show that the proposed method can effectively reduce the error caused by different routes having similar travel times in different periods. However, when the travel times of several routes for a given OD pair are almost equal, the accuracy of passenger travel route identification still needs to be improved. In the future, we will consider combining passenger travel time with train schedules to optimize the approach.

Research Data

sj-docx-1-mac-10.1177_00202940221089501 – Supplemental material for Estimation and simulation for route choice proportion in multi-period urban rail transit control systems

Supplemental material, sj-docx-1-mac-10.1177_00202940221089501 for Estimation and simulation for route choice proportion in multi-period urban rail transit control systems by Yiyuan Yang, Ranran Sun, Peiyue Wang, Yang Yi and Guangyu Zhu in Measurement and Control

This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China [No.61872037, No.61833002, No.62173167, No.62132003].

ORCID iDs

Yiyuan Yang

Guangyu Zhu
