Discover the road sequences of bus lines using bus stop information and historical bus locations

Abstract

Discovering the road sequences traveled by the bus lines is of great importance for public transport management, such as bus journey time prediction and multimodal travel route recommendation, as the road sequence provides important information on roadway characteristics such as the number of intersections and traffic signals, road type, and number of lanes. This article develops methods to discover the covered road sequence for a given bus line, using bus stop information as well as historical bus locations (i.e. locations where buses had appeared in history). To solve the problem, we first construct a high-quality global positioning system (GPS) trajectory and then employ a novel map-matching algorithm to the resultant dense trajectory to obtain the road sequence of the bus line. We focus on constructing high-quality trajectory with dense GPS points by (1) forming an initial bus stop trajectory using ordered bus stop coordinates, (2) identifying sufficient suitable historical bus locations, and (3) inserting the identified bus locations into proper positions of the bus stop trajectory. Our proposed method is evaluated on real-world bus line data involving more than 400 bus services in Singapore.

Keywords

Bus line, road sequence, sparse trajectory, historical bus locations, map matching

Introduction

An efficient and easy-to-use public transportation system is an important element of sustainable cities, as it can contribute to a considerable reduction in traffic congestion and to lower carbon emissions from vehicles.¹ A key enabler of a successful public transportation system is the provision of accurate travel time information so that travelers can plan their journeys reliably. This is especially critical for bus services, which typically account for the majority of ridership among all public transport journeys.²

Unfortunately, since bus travel time is affected by multiple factors such as the riding time on road segments, the stopping time at bus stops, and the delays at intersections, accurate prediction of the travel time for bus journeys remains a challenging problem. This problem can be effectively addressed by providing sufficient features to capture roadway characteristics (e.g. distance, number of bus stops, traffic signals) and traffic conditions (e.g. congestion) on the road segments covered by the bus line. However, the road characteristic information is not immediately available, as the road sequence of bus lines is typically not publicly available. Therefore, there is a need to discover the sequence of road segments covered by the bus lines.

An intuitive method to find the road sequence of a bus line is to treat the sequence of bus stops as an object moving trajectory (called bus stop (BS) trajectory) and map the BS trajectory to road network using map-matching method. However, the points of BS trajectory are sparse, because the distance between consecutive bus stops is typically large, ranging from 200 to 500 m. Thus, it is possible that there could exist multiple road paths connecting consecutive bus stops. As such, existing map-matching algorithms typically cannot generate good results for such sparse BS trajectories.

In this article, we investigate the problem of discovering the road sequence covered by a bus line using bus stop information as well as historical bus locations, that is, locations where buses had appeared in history (bus locations for short). Because there usually exist many possible travel paths between two consecutive points of a sparse trajectory (e.g. points as far as 300 m apart), it is challenging to determine which path was actually traveled. Most existing works try to improve map-matching accuracy for a given trajectory without attempting to augment the input trajectory. In this article, we explore a new possibility for further improving map-matching results by producing a dense trajectory, which is itself a challenging task. We develop efficient methods to construct dense hybrid trajectories based on two datasets. The first dataset, that is, the bus stop data, has fixed global positioning system (GPS) coordinates and a known order of the bus stops, but the resultant BS trajectory is too sparse for existing map-matching algorithms to produce the correct sequence of road segments of the bus line. The second dataset has a large volume of historical bus location records, where each record contains the bus service ID (e.g. bus 179). However, the records do not contain the bus ID, meaning that it is not known which bus locations belong to the same bus trajectory, because there are usually multiple buses in operation simultaneously along the bus line/service. In other words, the second dataset contains historical bus locations that are certainly located on the bus line but are out of order.

To address our problem, we first form a sparse BS trajectory using the ordered GPS coordinates of bus stops; we then strategically select a set of bus locations (where buses had appeared in history) and insert them into the BS trajectory to obtain a dense hybrid trajectory. We develop novel techniques to (1) identify sufficient suitable bus locations from the large set of disordered historical bus locations, and (2) determine the proper positions in the BS trajectory at which to insert the identified bus locations. With the obtained dense trajectory, the map-matching algorithm proposed in Koller et al.³ is employed to find the sequence of road segments that are covered by the corresponding bus line. We evaluated our method on real-world bus line data involving more than 400 bus services in Singapore, and the results show that the road sequences recovered by the proposed method are largely consistent with those shown by Baidu Map.

The remainder of the article is organized as follows. The “Related works” section discusses the related works. The “Preliminary” section introduces important definitions and preliminary knowledge on Hidden Markov Model (HMM) for map matching. The “Data representation and problem formulation” section discusses the used datasets, the problem description as well as the proposed method. In the “Case study” section, a case study is presented to demonstrate and verify the proposed approach. The performance of the proposed approach is evaluated in the “Experimental results and analysis” section. The “Conclusion and further work” section concludes the article and discusses future work.

Related works

Since we are not aware of any reported works that address the same problem considered in this article, in this section, we review related works that can be categorized into the following two categories.

Map-matching algorithms

A large number of map-matching algorithms have been reported to map a trajectory (a sequence of GPS points) to road networks. In the early stage, the closeness (between trajectory points and road network nodes from GPS data)⁴ and the shape of the road segments were used as the important features to map a trajectory point to a road segment.⁵ However, urban areas typically have dense road networks, and the closest road segment may not always be the correct one to match with;⁶ thus, the connectivity of road segments is also considered to reduce map-matching errors.⁷ An algorithm that utilizes both the geometry and the connectivity of the road segments is typically known as a topological map-matching algorithm. Algorithms that consider the connectivity of the road network usually model constraints on speed, travel time, or shortest path distance to filter out outliers and obtain better results.

However, most of the above methods do not work well when matching on the intersections. This is because an intersection typically has multiple adjacent road segments; thus, it is difficult to determine the correct road segment to match with simply based on the closeness and topology of the road network. To address this challenge, many probabilistic methods have been proposed, which require the error region estimation of GPS observations and jointly consider multiple influencing factors (e.g. multicasting,⁸ heading, connectivity, and closeness).⁹ Recently, more advanced methods have been proposed based on Kalman filter,¹⁰ HMM,¹¹ HMM model with precomputation,¹² fuzzy logic,¹³ and multiple hypotheses.¹⁴ In addition, the above-mentioned methods were also used in a hybrid manner.¹⁵

Map matching for sparse trajectories

The methods discussed in the previous section typically cannot produce good results when the trajectories are sparse (e.g. one point per 1–2 min). Many issues have not been well addressed, such as improving the accuracy of map matching for sparse GPS trajectories, developing efficient online algorithms, and reducing matching errors at junctions. A few works have been reported that aim to address these issues, including the HMM-based methods (e.g. off-line map matching,¹⁶ real-time map matching¹⁷), the conditional random field (CRF) model,¹⁸ the path inference filter (PIF),¹⁹ and the ST (spatial–temporal) matching method.²⁰,²¹ However, these approaches typically rely on many auxiliary features to assist the matching procedure, such as driving behavior, length of the path, the number of traffic signals along the path, and average travel time. Most of these features are difficult to obtain in general or could deviate considerably from the actual situation. In addition, the required prior knowledge of the trajectory point distribution over the road network is also a problem. These prerequisites limit the generality of the above methods, because most GPS datasets contain only the simplest information such as latitude, longitude, and time stamp. In fact, in the problem considered in this article, the trajectory points do not even have time stamp information; thus, the above-mentioned methods cannot be directly applied.

In this article, we do not intend to improve the map-matching procedure itself by designing a more sophisticated algorithm. Instead, we try to improve the quality of the trajectory, which can be treated as a preprocessing step and combined with an existing map-matching algorithm. We demonstrate the performance of our proposed method by combining it with an existing HMM-based method. Note that there exist several research efforts to assign GTFS (General Transit Feed Specification) bus stops to road networks.²² They either rely on the shortest path model²³ or maintain a set of candidate paths connecting successive geographic points.²⁴ The difference between the above works and ours is that we focus on developing an efficient method to discover the full sequence of road segments covered by a bus line, instead of only the road segments corresponding to the bus stops of the bus line.

In particular, the most similar work was reported in Vuurstaek et al.²⁵ The differences between Vuurstaek et al.²⁵ and our work are as follows: (1) first, our work and Vuurstaek et al.²⁵ rely on different datasets; and (2) our work and Vuurstaek et al.²⁵ focus on different tasks with different challenges. Vuurstaek et al.²⁵ aim to match the bus stops to a road network, which faces the challenges of inaccuracy of GPS locations, while our work aims to find a connected road sequence that is covered by a given bus line (denoted as a trajectory of GPS points). The challenge in our problem is mainly caused by the sparse GPS points in the trajectory such that there may exist multiple feasible road paths between two consecutive points. As such, our work and Vuurstaek et al.²⁵ focus on different specific tasks and propose different frameworks to solve their problems.

Preliminary

Definitions and notations

Definition 1

Route distance: The total sum of the distance of all consecutive point pairs along the bus route.

There are many types of route shapes of bus line segments such as straight line, folding line, and curve shape. In particular, we partition the line shape into two categories, that is, one-way road and two-way road, as shown in Figure 1.

Figure 1.

The actual bus route category.

Definition 2

Road segment curvature: Let two points A and B be the end points of directed road segments (from point A to point B); we use notation $\bar{AB}$ to represent the road segment. The distance of route segment $\bar{AB}$ is denoted as $| \bar{AB} |$ . On the other hand, the Euclidean distance of point A to point B is denoted as $∥ \bar{AB} ∥$ . The curvature $C_{(\bar{AB})}$ of the road segment $\bar{AB}$ is defined as

C_{(\bar{AB})} = \frac{| \bar{AB} |}{∥ \bar{AB} ∥}

(1)

Since it always satisfies that $| \bar{AB} | \geq ∥ \bar{AB} ∥$ , thus, we have $C \geq 1$ . Road curvature reflects the straightness of the road network. Larger curvature value corresponds to curved and complex road segment as illustrated in Figure 2. When $C = 1$ , road segment $\bar{AB}$ is a strictly straight line.

Figure 2.

Demonstration of the road segment curvature: (a) on curved road and (b) on folding line road.

Definition 3

Near-straight segment: We define road segment $\bar{AB}$ as a near-straight segment if it satisfies $1 \leq C_{(\bar{AB})} \leq 1 + δ$ . Otherwise, it is called a curved segment. The threshold $δ$ is a parameter with a small value.
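Definitions 1 to 3 can be illustrated with a short sketch. It assumes, purely for illustration, that a road segment is given as a polyline of planar coordinates in meters; the function names are hypothetical, and the default threshold of 0.05 is the value of $δ$ listed in Table 1.

```python
import math


def route_distance(points):
    """Route distance (Definition 1): sum of distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def curvature(points):
    """Curvature C (equation (1)): route distance divided by the end-to-end distance."""
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("end points coincide; curvature is undefined")
    return route_distance(points) / chord


def is_near_straight(points, delta=0.05):
    """Definition 3: near-straight if 1 <= C <= 1 + delta, otherwise curved."""
    return curvature(points) <= 1.0 + delta


# Illustrative polylines (planar coordinates in meters, not real data).
straight = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
folded = [(0.0, 0.0), (100.0, 100.0), (200.0, 0.0)]

print(curvature(straight), is_near_straight(straight))
print(curvature(folded), is_near_straight(folded))
```

The straight polyline has curvature exactly 1, while the folded one has curvature $\sqrt{2}$ and is therefore classified as curved for any small $δ$.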

Definition 4

Deviation degree: As shown in Figure 3, there is a point O located near the road segment $\bar{AB}$ . The distance of point O to the route segment $| AB |$ is denoted by $h_{(O ⊥ AB)}$ . As mentioned above, the Euclidean distance of point A to point B is denoted as $∥ \bar{AB} ∥$ . Then we define the deviation of a point O relative to road segment $\bar{AB}$ as the ratio of $h_{(O ⊥ AB)}$ to $∥ \bar{AB} ∥$ , calculated by

D_{(O, \bar{AB})} = \frac{h_{(O ⊥ AB)}}{∥ \bar{AB} ∥}

(2)

Figure 3.

Deviation degree: (a) on straight road and (b) on curved road.

Deviation degree D introduced in Figure 3 is the closeness degree of point O to the road segment. Smaller deviation degree indicates a close relation between the point and the route segment, and vice versa.
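As a minimal sketch of equation (2), the deviation degree can be computed from planar coordinates as follows. The sketch assumes that $h_{(O ⊥ AB)}$ is the perpendicular distance from O to the straight line through A and B and that coordinates are in meters; the function names are hypothetical.

```python
import math


def perpendicular_distance(o, a, b):
    """Distance h from point o to the straight line through a and b."""
    chord = math.dist(a, b)
    if chord == 0:
        raise ValueError("a and b coincide; the line is undefined")
    # Magnitude of the 2D cross product equals chord length times h.
    cross = (b[0] - a[0]) * (o[1] - a[1]) - (b[1] - a[1]) * (o[0] - a[0])
    return abs(cross) / chord


def deviation_degree(o, a, b):
    """Deviation degree D (equation (2)): h divided by the Euclidean distance |AB|."""
    return perpendicular_distance(o, a, b) / math.dist(a, b)


# A point 5 m off a 100 m segment has deviation degree 0.05.
print(deviation_degree((50.0, 5.0), (0.0, 0.0), (100.0, 0.0)))
```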

HMM for map matching

Map matching solves the problem of how to match a set of ordered GPS coordinates to the road networks. Map-matching algorithm typically takes a sequence of vehicle/human trajectory as input and obtains the sequence of road segments traveled by the vehicle/human of the trajectory. Due to the existence of errors caused by a natural random noise source, simply matching each trajectory point to its nearest road will lead to considerable matching error.

The HMM map-matching algorithm¹⁷ utilized in this article reduces the noise through a series of filters, subject to the connectivity constraints of the road network. It then calculates the matching probabilities of the possible paths and selects the best candidate travel path. The main ideas are illustrated in Figure 4.

Figure 4.

The six possible travel routes of the trip.

For the GPS point $P1$ at time $t1$ , there are three road segments that are close to point $P1$ . In other words, there are three candidate road segments, which are denoted as $R1$ , $R2$ , and $R3$ , respectively. The three road segments are represented by three black solid nodes. For point $P2$ at time $t2$ , there are two candidate road segments nearby, as shown in the figure. For point $P3$ at time $t3$ , there is only one potential road segment nearby. From time $t1$ to $t3$ , there are therefore $3 \times 2 \times 1 = 6$ possible travel routes of the input trajectory: $R1_{t1} \to R1_{t2} \to R1_{t3}$ , $R1_{t1} \to R2_{t2} \to R1_{t3}$ , $R2_{t1} \to R1_{t2} \to R1_{t3}$ , $R2_{t1} \to R2_{t2} \to R1_{t3}$ , $R3_{t1} \to R1_{t2} \to R1_{t3}$ , and $R3_{t1} \to R2_{t2} \to R1_{t3}$ . The HMM algorithm considers all the states from the first point to the end point of the GPS trajectory and then finds the route with the highest probability (see Reham et al.¹⁷ for the details of the algorithm).
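The selection of the most probable route can be sketched with a toy Viterbi search over the candidates of the example above. This is only an illustration of the general HMM idea, not the algorithm of the cited work: all emission and transition values below are made-up, unnormalized scores, and the names are hypothetical.

```python
import itertools
import math

# Candidate road segments for P1, P2 and P3, as in the example above.
CANDIDATES = [["R1", "R2", "R3"], ["R1", "R2"], ["R1"]]

# Made-up emission scores: how well each candidate explains the GPS point.
EMISSION = [
    {"R1": 0.5, "R2": 0.3, "R3": 0.2},
    {"R1": 0.4, "R2": 0.6},
    {"R1": 1.0},
]

# Made-up transition scores, one table per step, keyed by (previous, current).
TRANSITION = [
    {("R1", "R1"): 0.7, ("R1", "R2"): 0.3,
     ("R2", "R1"): 0.4, ("R2", "R2"): 0.6,
     ("R3", "R1"): 0.2, ("R3", "R2"): 0.8},
    {("R1", "R1"): 0.5, ("R2", "R1"): 0.9},
]


def viterbi(candidates, emission, transition):
    """Return the highest-scoring candidate sequence and its log-score."""
    # best[r] = (log-score of the best path ending in r, that path)
    best = {r: (math.log(emission[0][r]), [r]) for r in candidates[0]}
    for t in range(1, len(candidates)):
        step = {}
        for r in candidates[t]:
            prev, (score, path) = max(
                best.items(),
                key=lambda item: item[1][0] + math.log(transition[t - 1][(item[0], r)]),
            )
            total = score + math.log(transition[t - 1][(prev, r)]) + math.log(emission[t][r])
            step[r] = (total, path + [r])
        best = step
    score, path = max(best.values(), key=lambda value: value[0])
    return path, score


all_routes = list(itertools.product(*CANDIDATES))  # the six possible routes
best_path, best_log_score = viterbi(CANDIDATES, EMISSION, TRANSITION)
print(len(all_routes), best_path, math.exp(best_log_score))
```

Dynamic programming avoids enumerating every route explicitly, which matters once the trajectory has many points and each point has several candidate segments.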

Data representation and problem formulation

Dataset description

As mentioned before, we rely on two datasets to construct dense hybrid BS trajectories, that is, bus stop data and historical bus location data.

Bus stop data: The bus stop data are given in the form of bus lines, which are sequences of ordered points where each point corresponds to a bus stop and contains the information of the ID of the bus stop, the GPS coordinates (i.e. latitude and longitude) of the bus stop, and the distance between consecutive bus stops. With the bus line data, we can easily construct a BS trajectory since the order of the bus stops is known. However, the obtained BS trajectory is very sparse as the distances between consecutive bus stops are large ranging from 200 to 500 m in Singapore. Because of this, there typically exist multiple road paths connecting the consecutive bus stops, which makes it a challenging problem for matching the sparse BS trajectory to road networks.

Historical bus location data: The second dataset has a large volume of historical bus location records. In particular, each record contains the GPS coordinates of the bus location, the bus service ID, the bus moving direction, and the time stamp of the record. However, the records do not contain the bus ID, meaning that it is not known which bus locations belong to the same bus trajectory, because there are usually multiple buses in operation simultaneously along the travel route of the bus service. In other words, the second dataset contains historical bus locations that are certainly located on the bus line but are out of order. In addition, the bus location records suffer from measurement errors, redundancy, outliers, and disorder. As such, there is a need to develop an efficient method to select suitable bus location records to insert into the BS trajectory.

Road network data: We use the road network of Singapore, which comprises 41,732 nodes and 98,539 road segments. The road network is utilized as an input to the map-matching algorithm together with the constructed high-quality hybrid BS trajectory.

Problem description

As discussed before, we can easily construct a BS trajectory from the bus line data since the GPS coordinates and the order of the bus stops are known. On the other hand, since the BS trajectory is sparse, we treat the BS trajectory as an initial trajectory to be refined. Then we develop novel methods to select a certain amount of bus locations and insert them into the initial BS trajectory to form high-quality trajectory, that is, hybrid dense BS trajectory.

Let $BSL = < S_{1}, S_{2}, \dots, S_{i}, \dots, S_{n} >$ be the initial trajectory of Bus Stop Locations (BSL) and let $BL = \{G_{1}, G_{2}, \dots, G_{j}, \dots, G_{m}\}$ represent the set of historical Bus Locations (BL), which are disordered. In this section, we present the proposed methods to construct a hybrid Dense BS Trajectory of Locations (DTL) as follows

DTL = < S_{1}, G_{p}, S_{2}, \dots, G_{q}, \dots, S_{i}, \dots, G_{r}, \dots, S_{n} >

(3)

where $G_{p}, G_{q}, G_{r}, \dots \in BL$ .

The major challenges for building the DTL include the following:

1. How to extract a certain amount of suitable bus locations from dataset BL in the presence of measurement errors, redundancy, and outliers.

2. How to insert the selected points into the correct positions of the hybrid dense trajectory DTL. We refer to this process as the positional interpolation procedure, as shown in Figure 5.

Figure 5.

An example of inserting historical bus locations to form dense BS trajectory.

We next present the proposed methods for addressing the above two tasks.

Main idea

Our proposed method jointly solves the two challenges. Based on Definitions 2 and 3, we first identify all the candidate paths of road segments that connect the same pair of consecutive bus stops and, by choosing a proper parameter $δ \in [0, + \infty)$ , partition them into two clusters, that is, a cluster of near-straight routes and a cluster of curved routes. For each cluster, we develop an interpolation strategy that is specific to that cluster.

We adapt the number of bus locations to be selected and inserted accordingly based on the curvature degree of the road segment. Larger curvature degree requires that more historical bus locations need to be selected and inserted into the BS trajectory, in order to improve the quality of map matching. For each pair of consecutive bus stops, we select a certain amount of historical bus locations and insert them into proper position of the BS trajectory between the two bus stops.

In particular, we first calculate the distance between the consecutive bus stops using the dataset of bus line information. Given the GPS coordinates (longitude and latitude) of two points, the spherical distance between them can be calculated. The spherical distance between the two consecutive GPS points can be regarded as the Euclidean distance. Due to device errors, the recorded GPS position can sometimes be very far from the actual location of the bus. However, it is difficult to identify all the outliers in a raw GPS trajectory. As a result, we do not try to obtain an accurately ordered GPS sequence through a single interpolation process. Instead, the interpolation process is repeated many times to obtain a high-quality trajectory.
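The spherical distance between two GPS points can be computed, for example, with the haversine formula. The sketch below is an assumption about how such a distance might be obtained, since the specific formula is not reproduced in this section; the function name, the Earth radius constant, and the sample coordinates are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two illustrative points 0.001 degrees of latitude apart (roughly 111 m).
print(haversine_m(1.300, 103.800, 1.301, 103.800))
```

At the scale of a few hundred meters between consecutive bus stops, the difference between this spherical distance and a planar Euclidean distance is negligible, which is why the two are treated interchangeably here.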

We conduct an “inaccurate interpolation, accurate culling” procedure, as shown in Figure 6, to preprocess the GPS points from the raw bus location dataset, that is, to select sufficient historical bus locations and insert them into the BS trajectory to improve the trajectory quality. First, we perform distance-based sampling to select a certain amount of historical bus locations and insert them into the interval between two consecutive points S and D of the BS trajectory, such that the distances between consecutive inserted bus locations are approximately equal. We first treat point S as the current point, then repeatedly find a successor point for the current point and make the newly inserted point the current point. The process repeats until point D can be regarded as the successor of the current point. Second, we examine the quality of the sampled bus locations and remove anomalous locations as well as locations that were inserted in an incorrect position (order). We repeat the process until a dense, ordered, high-quality BS trajectory is obtained. We next present a detailed description of the interpolation process.
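The successor search of the sampling step can be sketched as a greedy loop. The rule used below to pick a successor (a candidate roughly one step away from the current point and strictly closer to D) is an assumption made for illustration, as are the function name, the `step` and `tol` parameters, and the use of planar coordinates in meters.

```python
import math


def sample_between(s, d, pool, step, tol=0.5):
    """Greedy distance-based sampling from s towards d (illustrative rule only).

    A pool point qualifies as successor if its distance from the current
    point is within tol * step of step and it is strictly closer to d.
    """
    chain = [s]
    current = s
    remaining = list(pool)
    # Stop once d itself is close enough to serve as the successor.
    while math.dist(current, d) > step * (1 + tol):
        candidates = [
            p for p in remaining
            if abs(math.dist(current, p) - step) <= tol * step
            and math.dist(p, d) < math.dist(current, d)
        ]
        if not candidates:
            break  # no suitable bus location; leave the gap for a later pass
        successor = min(candidates, key=lambda p: abs(math.dist(current, p) - step))
        chain.append(successor)
        remaining.remove(successor)
        current = successor
    chain.append(d)
    return chain


# Illustrative data: two bus stops 300 m apart and five historical locations.
stop_s, stop_d = (0.0, 0.0), (300.0, 0.0)
locations = [(210.0, 2.0), (100.0, 1.0), (50.0, 0.0), (100.0, 80.0), (400.0, 0.0)]
print(sample_between(stop_s, stop_d, locations, step=100.0))
```

Because every accepted successor is strictly closer to D and is removed from the pool, the loop always terminates. Points accepted in error are expected to be removed by the subsequent culling step.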

Figure 6.

The entire map-matching scheme process.

Insertion procedure

Suppose that we are given a point S and we need to find a historical bus location to be the successor point of S. Let $λ$ be the road width, and let C and D denote the curvature (Definition 2) and the deviation degree (Definition 4), respectively. According to the value of the curvature, we dynamically determine the number of points that need to be inserted in each road path (connecting consecutive bus stops).

Case 1: The road segment $\bar{AB}$ is a near-straight segment (see Figure 2(a) and (b)). For a point O, we check the following properties:

1. $∥ AB ∥ = \max \{∥ AB ∥, ∥ OA ∥, ∥ OB ∥\}$ .

2. The deviation degree of point O relative to road segment $\bar{AB}$ satisfies $0 \leq D_{(O, \bar{AB})} < ξ$ , where $ξ \in (0, \frac{λ}{∥ AB ∥})$ .

If the above properties hold, then point O can be regarded as a historical bus location to be inserted into road segment $\bar{AB}$ .

Note that (1) property 1 ensures that the (geometric) projection of point O onto road segment $\bar{AB}$ lies between A and B, a restriction that is affordable because there are large volumes of historical bus locations, and (2) with the aid of equation (2), property 2 gives $0 \leq h_{(O ⊥ AB)} < λ$ . This means that the distance of point O to the straight line $| AB |$ is less than the road width, which can be used to eliminate the interference of anomaly points (bus locations with large localization errors as well as points on nearby roadways).
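The Case 1 test can be sketched as a single predicate. The sketch assumes planar coordinates in meters; the function name is hypothetical, and defaulting $ξ$ to the upper end of its interval is an illustrative choice rather than part of the method (Table 1 states that $ξ$ is generated randomly within the interval).

```python
import math


def accept_near_straight(o, a, b, road_width, xi=None):
    """Check properties 1 and 2 of Case 1 for a candidate point o."""
    ab, oa, ob = math.dist(a, b), math.dist(o, a), math.dist(o, b)
    if ab == 0:
        return False
    if xi is None:
        xi = road_width / ab  # upper end of the interval (0, road_width / |AB|)
    # Property 1: AB is the longest of the three distances.
    if ab < max(oa, ob):
        return False
    # Property 2: deviation degree D = h / |AB| is below xi.
    cross = (b[0] - a[0]) * (o[1] - a[1]) - (b[1] - a[1]) * (o[0] - a[0])
    h = abs(cross) / ab
    return h / ab < xi


a, b = (0.0, 0.0), (200.0, 0.0)
print(accept_near_straight((100.0, 4.0), a, b, road_width=10.0))   # within road width
print(accept_near_straight((100.0, 15.0), a, b, road_width=10.0))  # too far from the line
print(accept_near_straight((250.0, 0.0), a, b, road_width=10.0))   # beyond end point B
```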

Case 2: The road segment $\bar{AB}$ is a curved segment (see Figure 7). For a point O, we check the following properties:

1. $∥ AB ∥ = \max \{∥ AB ∥, ∥ OA ∥, ∥ OB ∥\}$ .

2. The distance of point O to the straight line $| AB |$ satisfies $0 \leq h_{(O ⊥ AB)} < α$ , where $α \in (0, \frac{∥ AB ∥}{2})$ .

Figure 7.

The candidate trajectory insertion point: (a) candidate bus locations to be selected on curved line and (b) candidate bus locations to be selected on folding line.

If the above properties hold, then point O can be regarded as a candidate historical bus location to be inserted into road segment $\bar{AB}$ , since it has high probability to be located on the correct BS trajectory with small GPS error.

In this case, (1) a (geometric) projection of a point O on road segment $\bar{AB}$ is on the line segment $\bar{AB}$ , and (2) the parameter $α$ is used to determine which points and how many points to be inserted.
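The effect of $α$ on how many candidates are retained can be seen in a small sketch of the Case 2 filter. It applies the two properties exactly as stated above to a pool of points in planar coordinates (meters); the function name and the sample points are hypothetical.

```python
import math


def curved_candidates(pool, a, b, alpha):
    """Return the points of pool that satisfy properties 1 and 2 of Case 2."""
    ab = math.dist(a, b)
    if not 0 < alpha < ab / 2:
        raise ValueError("alpha must lie in (0, |AB| / 2)")
    kept = []
    for o in pool:
        # Property 1: AB is the longest of the three distances.
        if ab < max(math.dist(o, a), math.dist(o, b)):
            continue
        # Property 2: distance from o to the straight line AB is below alpha.
        cross = (b[0] - a[0]) * (o[1] - a[1]) - (b[1] - a[1]) * (o[0] - a[0])
        if abs(cross) / ab < alpha:
            kept.append(o)
    return kept


a, b = (0.0, 0.0), (200.0, 0.0)
pool = [(50.0, 30.0), (100.0, 60.0), (150.0, 90.0), (260.0, 10.0)]
print(curved_candidates(pool, a, b, alpha=40.0))
print(curved_candidates(pool, a, b, alpha=95.0))
```

A larger $α$ admits points farther from the chord, so more candidates survive on strongly curved segments, at the cost of admitting more points that must later be culled.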

As shown in Figure 7, the circle radius $r = α$ , the points within the circle are the candidate points (historical bus locations). Most of the points identified using this method are potential candidates to road segment $\bar{AB}$ . However, there exist a few points that also are located in the circles but actually not valid candidates for inserting into road segment $\bar{AB}$ , such as the points located in the red triangle of Figure 7(b). These points usually cannot be correctly inserted into BS trajectory in proper order; thus, they will result in mistakes in the subsequent map-matching algorithms. We next introduce how to remove those anomaly points from the identified candidate set.

Removing anomaly points

Suppose that the above interpolation process outputs a refined dense BS trajectory of a bus line: $RPL = < G_{1}, G_{2}, G_{3}, \dots, G_{i}, G_{j}, G_{k}, \dots, G_{n} >$ . We denote the route distance (Definition 1) between points $G_{i}$ and $G_{j}$ as $| \bar{d_{i} d_{j}} |$ , and the straight-line (Euclidean) distance between them as $∥ \bar{d_{i} d_{j}} ∥$ . It is easy to verify that $| \bar{d_{i} d_{k}} | = \max \{| \bar{d_{i} d_{j}} |, | \bar{d_{j} d_{k}} |, | \bar{d_{i} d_{k}} |\}$ if the points in the BS trajectory are in the correct order. When the trajectory is dense and the distances between consecutive points are small, the above equation can be approximated by the following equation

∥ d_{i} d_{k} ∥ = \max \{∥ d_{i} d_{j} ∥, ∥ d_{j} d_{k} ∥, ∥ d_{i} d_{k} ∥\}

(4)

The above equation (i.e. equation (4)) can be used to remove the anomaly points and the points that are inserted to incorrect positions. By repeatedly inserting historical bus locations and removing anomaly points, a refined high-quality BS trajectory with dense GPS points can be obtained.
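One way to apply equation (4) is to scan consecutive triples and drop the middle point whenever the equality is violated. The sketch below makes that choice as an assumption, since the text does not specify which point of a violating triple is removed; the `protected` argument (intended for bus stops, which should always remain in the trajectory) and the function name are also hypothetical. Coordinates are planar, in meters.

```python
import math


def cull(trajectory, protected=frozenset()):
    """Repeatedly remove middle points of triples that violate equation (4)."""
    points = list(trajectory)
    changed = True
    while changed:
        changed = False
        for idx in range(1, len(points) - 1):
            g_i, g_j, g_k = points[idx - 1], points[idx], points[idx + 1]
            if g_j in protected:
                continue  # never remove a protected point such as a bus stop
            # Equation (4) holds when |G_i G_k| is the largest of the three distances.
            if math.dist(g_i, g_k) < max(math.dist(g_i, g_j), math.dist(g_j, g_k)):
                del points[idx]
                changed = True
                break  # rescan from the start after each removal
    return points


# An outlier far from the route is removed.
with_outlier = [(0.0, 0.0), (100.0, 0.0), (150.0, 400.0), (200.0, 0.0), (300.0, 0.0)]
print(cull(with_outlier))

# A point inserted out of order is removed while the bus stop at (200, 0) is kept.
out_of_order = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (120.0, 3.0), (300.0, 0.0)]
print(cull(out_of_order, protected={(200.0, 0.0)}))
```

Each removal shortens the list, so the loop terminates. Alternating this culling step with further insertion passes corresponds to the repeated insert-and-remove cycle described above.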

Case study

We take bus service 241 in Singapore as an example to demonstrate the map-matching result. With the sequence of ordered bus stops as the original BS trajectory, we use the “inaccurate interpolation, accurate culling” strategy to insert a certain amount of historical bus locations between the BS trajectory points. The resultant refined high-quality BS trajectory is shown in Figure 8.

Figure 8.

The obtained high-quality BS trajectory which is fed into map-matching algorithm.

With the obtained dense BS trajectory, an HMM-based map-matching algorithm, as described in the “HMM for map matching” section, is performed to find the best road segment path that matches the BS trajectory on the road network. Figure 9 shows the GPS point sequence obtained from the map-matching algorithm, while Figure 10 shows the resultant road sequence traveled by bus service 241.

Figure 9.

The GPS point sequence obtained from the map-matching algorithm.

Figure 10.

The corresponding sequence of road segments covered by the bus line 241 obtained from map matching.

All bus lines are matched to the digital map, and the entire bus line network is shown in Figure 11, where different colors are used to indicate different travel routes.²⁶

Figure 11.

The demonstration of the discovered road sequences of all the bus lines considered.

Experimental results and analysis

We evaluated our proposed method using real-world bus line data involving more than 400 services in Singapore. The results are verified by comparing with Baidu Map through visualization, as shown in Figures 9 and 10. In addition, we use the following method to verify the results of all bus lines.

Suppose there are n bus stops on a bus route, and the total length of the route is L. Let RPL be the output of the map-matching algorithm

RPL = < S_{1}, G_{1}, G_{2}, \dots, G_{p}, S_{2}, G_{p + 1}, G_{p + 2}, \dots, G_{q}, \dots, S_{i}, \dots, G_{r}, \dots, S_{n} >

(5)

where $S_{i} (i \in [1, n])$ is a bus stop while $G_{p}, G_{q}, G_{r}, \dots$ are other GPS points in the GPS sequence obtained from the map-matching algorithm.

We partition RPL into $n - 1$ road segments based on the bus stop points. Let $l_{i}$ be the length of the ith road segment along the bus line, that is, the distance between the ith and the $(i + 1)$th bus stops; then $L = l_{1} + l_{2} + \dots + l_{i} + \dots + l_{n - 1}$ . In the same way, the bus route after the map-matching process is divided into $n - 1$ road segments. The distance of road segment i after map matching is denoted as $\bar{l_{i}}$ . The $l_{i}$ can be computed based on the bus line dataset, while $\bar{l_{i}}$ is the sum of all Euclidean distances of the adjacent point pairs on road segment i. The Mean Absolute Relative Error (MARE) is used as the evaluation metric for map-matching accuracy, which is calculated as follows

\mathrm{MARE} = \frac{1}{n - j} \sum_{i = 1}^{n - j} \left| \frac{l_{i, i + 1, \dots, i + j - 1} - \bar{l_{i, i + 1, \dots, i + j - 1}}}{l_{i, i + 1, \dots, i + j - 1}} \right| \times 100\%

(6)

where $l_{i, i + 1, \dots, i + j - 1} = \sum_{k = i}^{i + j - 1} l_{k}$ is the distance of a route path consisting of j consecutive road segments (i.e. connecting $j + 1$ consecutive bus stops), $\bar{l_{i, i + 1, \dots, i + j - 1}} = \sum_{k = i}^{i + j - 1} \bar{l_{k}}$ is the corresponding distance after map matching, and j is a hyperparameter that controls the length of the route paths. The effect of j is evaluated below.

Figure 12 demonstrates the MARE with varying parameter j on four randomly selected bus lines. The case $j = 1$ corresponds to $\mathrm{MARE} = \frac{1}{n - 1} \sum_{i = 1}^{n - 1} \left| \frac{l_{i} - \bar{l_{i}}}{l_{i}} \right| \times 100\%$ . This equation gives a reasonable indicator of the relative error of each bus line segment (the segment between consecutive bus stops), which can effectively measure and validate the map-matching results. However, because the road segment distances provided in the bus line dataset are given with a precision of only 100 m, the results for $j = 1$ are not reliable.
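Equation (6) can be sketched as a sliding-window computation. Following the summation limits of equation (6), each window is taken here to span j consecutive segments, so that $j = 1$ gives the per-segment error and $j = n - 1$ gives the error of the entire route; this reading, the function name, and the sample lengths are assumptions made for illustration.

```python
def mare(reference, matched, j):
    """MARE in percent over all windows of j consecutive segments.

    reference[i] and matched[i] are the lengths of the i-th segment (between
    consecutive bus stops) from the bus line dataset and from map matching.
    """
    if len(reference) != len(matched):
        raise ValueError("both sequences must have n - 1 entries")
    segments = len(reference)  # n - 1 segments for n bus stops
    if not 1 <= j <= segments:
        raise ValueError("j must lie in [1, n - 1]")
    errors = []
    for i in range(segments - j + 1):  # n - j windows in total
        ref = sum(reference[i:i + j])
        est = sum(matched[i:i + j])
        errors.append(abs(ref - est) / ref)
    return 100.0 * sum(errors) / len(errors)


# Illustrative segment lengths in meters for a line with four bus stops.
dataset_lengths = [300.0, 400.0, 500.0]
matched_lengths = [330.0, 400.0, 450.0]
for window in (1, 2, 3):
    print(window, round(mare(dataset_lengths, matched_lengths, window), 3))
```

In this toy example, the over- and underestimates of individual segments partly cancel within longer windows, so the MARE shrinks as j grows.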

Figure 12.

The obtained MAREs with varying parameter j on four randomly selected bus lines.

As such, we should use $j > 1$ , which reduces the relative impact of the precision error in the distances between consecutive bus stops. The figure also shows that when $j = 2, 3, 4$ , the impact of occasional errors on the validation results is significantly reduced. The MARE of the same bus line tends to decrease as j increases, because a larger j increases the denominator of each term in the MARE. When $j = n - 1$ , the MARE degenerates to the relative error of the entire route, where $\mathrm{MARE} = \left| \frac{L - \bar{L}}{L} \right| \times 100\%$ .

In order to mitigate the impact of the precision errors of the distances between consecutive bus stops, and to effectively measure the details of map matching, we choose $j = 2, 3, 4$ as evaluation parameters for evaluating the matching performance (see Table 1). The obtained results on over 400 bus lines are shown in Figure 13.

Table 1.

Simulation parameters, where $λ$ is road width and $∥ AB ∥$ is the Euclidean distance between A and B.

Parameters	Values
j	2, 3, 4
Total number of bus services	413
$δ$	0.05
$ξ$	Randomly generated $ξ \in (0, \frac{λ}{∥ AB ∥})$
$α$	Randomly generated $α \in (0, \frac{∥ AB ∥}{2})$

Figure 13.

The results of MAREs on more than 400 bus lines.

Figure 13 shows the MARE values obtained from more than 400 bus lines. For ease of observation, the bus lines are indexed in increasing order of their MARE values. The figure shows a consistent trend: a larger j leads to smaller MARE values. When $j = 2$, the results are slightly worse than in the other two scenarios; however, this setting reflects the fine-grained results, that is, the map-matching results on short bus line segments. Even when $j = 2$, almost 50% of bus lines are map-matched with a matching accuracy (100% minus the MARE) above 90%, while over 80% of bus lines obtain a matching accuracy above 80%. Most of the map-matched bus lines obtained by our proposed method are consistent with those obtained by Baidu Map, as shown in Figure 14.

Figure 14.

The results on bus 3: (a) result by Baidu Map and (b) result by our method.

However, some bus lines traverse very complex road segment sequences, such as the cases shown in the green box in Figure 15. It is very challenging to obtain the correct road sequence automatically in such scenarios. One way to address this issue is to manually insert a certain number of historical bus locations. We leave the problem of handling these challenging cases automatically to future work.

Figure 15.

The results on bus 36: (a) result by Baidu Map and (b) result by our method.

Conclusion and further work

This article investigated the problem of discovering the road segment sequence traveled by any given bus line, relying on bus stop information as well as historical bus location records. We used the sequence of bus stop locations as an initial sparse trajectory and developed novel methods to insert a certain number of historical bus locations to form a dense trajectory. With the resulting high-quality trajectory, the sequence of road segments traveled by a given bus line can be efficiently recovered. The article also provides a reasonable evaluation metric and verification scheme for the map-matching result in the absence of ground truth. The performance has been evaluated using real-world data involving more than 400 bus lines in Singapore.

Footnotes

Handling Editor: Syed Hassan Ahmed

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research project is funded by the Science and Technology Development Strategy Research Program of Tianjin under Grant No. 15ZLZLZF00070.

ORCID iD

Ying Zhou
