
Abstract

Real-time urban traffic monitoring is crucial for effective smart city management. Despite the increasing number of sensors collecting large-scale datasets in real time, challenges such as privacy concerns, high capital and maintenance costs, and limited coverage persist, impeding precise network traffic monitoring. General Transit Feed Specification (GTFS) Realtime data, an emerging real-time data source generated by public transit, exhibits high potential to monitor traffic given its public accessibility, low cost, and lack of privacy concerns. This study developed a new methodology leveraging GTFS Realtime data for citywide network sensing. Specifically, the proposed methodology uncovers the typical travel patterns of buses by isolating their operational events, involving boarding and alighting passengers at bus stops. Two algorithms, the segment-trip extraction algorithm and the segment speed estimation algorithm, were developed to implement the proposed methodology. The validation process used Bluetooth data collected in Gainesville, Florida, as the ground truth, while Google Traffic data served as a benchmark for comparison. Results indicate that the space mean speed estimated from GTFS Realtime data can better capture link speed trends and variations, similar to those observed in Bluetooth data. Furthermore, bus travel times derived from GTFS Realtime data demonstrated relatively high correlations with Bluetooth data and low prediction errors compared with estimates based on Google Traffic data. The proposed methodology and findings of this study can be directly used to complement and improve existing real-time traffic monitoring technologies.

Keywords

urban traffic monitoring, GTFS Realtime data, public transit, space-mean speed, smart city management

Real-time network monitoring plays a critical role in the development of smart cities. It encompasses the continuous monitoring of various types of traffic information, including speed, travel time, flow, and density. Real-time traffic information provides fundamental inputs for various intelligent transportation systems (ITS) applications, including route planning, traffic control, and incident detection and management. For instance, accurate real-time speed and travel time information can help commuters make informed decisions, such as optimizing their travel routes and minimizing travel time ( 1 ). Moreover, precise real-time traffic flow information enables transportation councils to dynamically optimize traffic signals, thus reducing total travel times and enhancing urban mobility ( 2 ). Additionally, real-time traffic information is invaluable to transportation councils, facilitating the detection of recurrent and non-recurrent incidents, evaluating the severity of traffic impacts ( 3 ), and enabling the timely activation of suitable traffic management strategies ( 4 ).

With the recent advances in sensors and communications technology, an enormous amount of traffic data is continuously captured from various devices each second, paving the way for a promising future in real-time network monitoring. Traffic data can generally be collected from two groups of devices: on-road fixed detectors and mobile data sources. On-road fixed detectors, such as loop detectors, Bluetooth, and video cameras, provide accurate and comprehensive traffic information at specific fixed locations over time. However, their high capital and maintenance costs restrict their deployment to only a subset of links within a network, thereby limiting the spatial coverage of network monitoring ( 5 ). On the other hand, mobile data sources, such as floating car data, connected vehicle (CV) data, and location-based service (LBS) data, offer full network coverage but suffer from some major limitations, such as high cost, privacy concerns, and potential corporate confidentiality issues. While certain traffic councils may have access to certain mobile data sources, these limitations remain key obstacles that hinder the full market penetration and real-time accessibility of these data sources ( 6 ). Moreover, the challenges of appropriate data fusion and handling multi-source data with varying resolutions persist. As a result, precise real-time traffic monitoring remains a challenging and complex task.

Recently, Google has introduced General Transit Feed Specification (GTFS) Realtime data, a feed specification that allows public transit councils to provide real-time updates about their fleets to application developers ( 7 ). The dynamic GTFS Realtime data offer frequent and standardized updates ( 8 ), including the real-time locations, travel speeds, and stop times of buses. Similar to CVs, these buses circulate with other normal traffic within the network, effectively functioning as floating vehicles with trajectories reflecting real-time traffic conditions. Unlike CVs, buses usually follow fixed routes, which may result in incomplete spatial coverage. However, the strategic planning of their routes typically aims to maximize passenger catchments, resulting in a relatively extensive and representative spatial coverage of the network. Most importantly, GTFS Realtime data are publicly available in real time at no cost, with no privacy concerns or confidentiality issues. Therefore, we hypothesize that the GTFS Realtime data can be a complementary dataset for real-time urban traffic monitoring.

Nevertheless, using GTFS Realtime data for urban real-time traffic monitoring still faces several challenges. Firstly, the unique operational events of buses, including boarding/alighting with the potential acceleration/deceleration behavior adjacent to bus stops, introduce multiple observations near bus stops that differ from the conditions of normal traffic. Thus, it is necessary to filter out these irrelevant operational events of buses. Secondly, buses typically operate at lower speed compared with regular traffic, reflecting a distinct speed pattern in their movement. Therefore, it is critical to establish the connection between the travel patterns of buses and normal traffic.

This study aims to assess the potential of GTFS Realtime data and develop a new methodology that leverages GTFS Realtime data for real-time network sensing. We propose a method to monitor real-time urban traffic by estimating bus speed, which is capable of automatically identifying bus-specific operational patterns by analyzing and tracking variations in speed. In particular, buffer zones are set up around bus stops to distinguish between bus stopping (for passenger boarding and alighting) and bus skipping (i.e., the bus passes a stop without halting). For each detected bus stopping event, the travel speeds within the bus stop buffer zone are imputed to uncover the general traffic patterns.

To validate the proposed method, data were extracted and processed from various sources, including GTFS Realtime, Bluetooth, and Google Traffic. This analysis focused on different road hierarchies in Gainesville, Florida, spanning two weeks in October 2023. While Bluetooth data served as on-road fixed detectors and provided accurate travel speed measurements, their spatial coverage was limited. Therefore, a subset of segments with installed Bluetooth devices was chosen for validation purposes. The results show that GTFS Realtime data can effectively capture link speed variations, exhibiting similarities to the distributions of the observations made using Bluetooth data. The contributions of this work are threefold. Firstly, to the best of our knowledge, this is the first instance of using GTFS Realtime data for real-time urban traffic monitoring. Secondly, we proposed a novel methodology to estimate real-time traffic speed by combining GTFS Realtime, GTFS Static, and road network data. We explicitly addressed inaccuracies in traffic state estimation caused by bus passenger boarding and alighting activities. Thirdly, we demonstrated GTFS Realtime data’s capability to accurately capture link speed variations, offering a comparative analysis with Bluetooth detection devices. The proposed methodology and findings of this work can complement existing approaches and enhance real-time traffic monitoring across a broader network.

The remainder of the paper is structured as follows. First, we present a literature review of previous studies and available data sources on real-time urban traffic condition monitoring. Next, we describe the data, including GTFS Static and GTFS Realtime data. We then present the proposed method for bus speed estimation and provide the corresponding algorithm. Afterward, we describe a case study using the GTFS Realtime data in the city of Gainesville, Florida, followed by the validation. Finally, we conclude the paper by summarizing findings, identifying limitations, and suggesting future work.

Literature Review

Real-time urban traffic monitoring denotes the continuous collection and analysis of traffic data for obtaining real-time road condition information on urban roads to optimize the safety of urban traffic ( 9 ). As discussed in the introduction, traffic monitoring datasets fall into two major categories: on-road fixed detector data and mobile data sources. Typically, on-road fixed detector data sources include the data collected from devices such as inductive loop detectors, traffic cameras, Bluetooth sensors, microwave radars, ultrasonic sensors, and acoustic sensors, while the mobile data sources include floating car data, CV data, LBS data, public transit data, and data from other sources.

On-Road Fixed Detectors

Traditional traffic monitoring relies on on-road fixed detectors, which are devices capable of sensing changes in movement, weight, pressure, and other variables in the surrounding environment ( 10 – 12 ). Real-time urban traffic information is summarized as common macroscopic traffic quantities, including traffic flow, speed, density, and vehicular count by types.

Inductive Loop Detectors: An inductive loop detector operates by installing a wire coil under the road surface to detect electromagnetic changes when vehicles pass over it. This allows the system to analyze traffic flow, defined as the number of vehicles passing per unit of time, and occupancy, representing the fraction of total time that a vehicle occupies a loop ( 13 , 14 ). Inductive loop detectors can stably and continuously acquire traffic data when operating normally, without raising privacy concerns related to travelers. However, inductive loop detectors come with certain drawbacks. First, the large deployment and maintenance costs limit their spatial coverage. For example, ( 15 ) reported that the cost of a single-loop detector ranges between $900 and $2,000, depending on its type. Second, missing data as a result of aging facilities or extreme weather conditions could compromise the continuity of data collection for real-time traffic monitoring ( 16 ). Thus, more advanced algorithms are needed for data imputation. Third, during congestion with frequent stop-and-go traffic, when the same vehicles remain stationary over loop detectors for an extended period, the detected traffic data can be erroneous, making it unsuitable for direct estimation of real-time traffic metrics ( 17 ). Furthermore, certain traffic parameters, such as real-time speed and vehicle classification data, cannot be directly measured by single-loop detectors. Addressing this limitation would require upgrading to dual-loop detectors ( 18 ), incurring a substantial cost.

Traffic Cameras: Traffic cameras have become one of the most widely used devices for stationary traffic data collection, capturing videos with rich traffic information. However, processing these videos for analysis poses challenges, including the removal of background interference ( 19 ). Various vehicle identification methods, such as cascade classifiers and convolutional neural networks, are applied to enhance detection accuracy. Despite providing accurate results at fixed points, this traditional method has limited coverage, and expanding coverage areas incurs a high total cost. Additionally, privacy concerns arise as vehicle information is often obtained by matching license plates ( 10 ). Furthermore, processing real-time traffic information from imagery data is relatively complex ( 20 ).

Bluetooth Sensors: Bluetooth sensors detect anonymous Bluetooth signals from passing vehicles between two intersections where the devices are placed. These sensors can identify the unique 48-bit Bluetooth MAC addresses of enabled devices, such as vehicle navigation systems, mobile phones, and headsets, within the traffic stream. Real-time traffic information, including segment travel time, speed, and vehicle counts, is obtained by matching subsequent detections from Bluetooth devices along the road through rigorous filtering processes ( 21 ). Despite serving as a valuable data source for traffic monitoring, the data collected by Bluetooth devices may exhibit biases. For example, simultaneous recording of multiple Bluetooth devices can result in numerous similar paired data counts, making it difficult to determine the accuracy of the recorded data. In addition, given the high maintenance and deployment costs, Bluetooth devices have low spatial coverage.

Microwave Radars: Microwave radars for traffic applications commonly operate on X-band (10 GHz) or K-band (24 GHz), with fixed antennas ( 22 ). These off-road detectors are employed for collecting traffic information, such as traffic flow, speed, and occupancy ( 30 ). However, microwave radars cannot operate continuously, resulting in the data collected having a low temporal continuity. Additionally, false alarms generated by the device itself can lead to interruptions in data collection ( 22 , 23 , 30 ).

Ultrasonic/Acoustic Sensors: Similar to microwave radars, ultrasonic sensors collect traffic data by emitting sound waves toward the target vehicles from the road shoulder and calculating the interval between the sending and receiving time ( 24 ). However, these sensors are susceptible to interference from ice or other debris ( 30 ). Acoustic sensors, which capture the acoustic features of vehicles approaching or moving away, face limited adoption because of the challenge of distinguishing vehicle acoustic signals from background noise ( 30 ). This difficulty often results in a significant amount of missing data.

Mobile Data Sources

Mobile data sources encompass transportation data collected through mobile devices and technologies, primarily relying on GPS, to acquire real-time information. These data are processed to comprehend traffic patterns, forecast congestion, and monitor urban traffic conditions. Common data sources include floating car data, CV data, LBS data, data from other councils, and public transit data generated through probes.

Floating Car Data: The prevalence of GPS receivers in vehicles and cell phones has made floating cars a common and widely used source for traffic data collection. This approach captures various types of traffic information, including vehicle position, speed, travel direction, and travel time ( 25 , 26 ). Floating cars, in contrast to on-road fixed detectors, can traverse most road segments in a city ( 6 ), providing extensive spatial coverage and measuring travel speed and time of each road to derive the full traffic map. In addition, collecting traffic data using floating cars is often more cost-effective than deploying on-road fixed detectors. However, the data accuracy depends heavily on the penetration rate of floating cars ( 27 , 28 ) and the signal quality ( 29 ). Since floating cars represent discrete points in the traffic flow, insufficient data points can make it challenging to estimate traffic flow information such as traffic density ( 30 ). Additionally, floating car data may collect travelers’ personal information, such as IDs, which can raise privacy concerns ( 31 ).

Connected Vehicle (CV) Data: The integration of CVs as a source of probe vehicle data has emerged as a new research trend, primarily because of their relatively low cost compared with on-road fixed sensing infrastructures ( 32 ) and their broader scope and temporal coverage compared with floating car data ( 33 ). Existing literature has demonstrated that CV-based probe vehicle data can be used effectively to estimate travel speed, travel time ( 34 – 37 ), traffic volumes ( 38 , 39 ), queue lengths ( 40 , 41 ), and other traffic information ( 33 , 42 ). CV data provide a higher resolution compared with floating car data ( 33 ), allowing for more precise monitoring and evaluation of traffic flow conditions ( 43 ). They enable the characterization of each sampled vehicle directly with higher accuracy, eliminating the need to divide road segments into grids ( 44 , 45 ) when monitoring real-time traffic. However, the use of CV data raises privacy concerns among travelers and relevant authorities.

Location-Based Service (LBS) data: LBS data, obtained from third-party apps on smartphones using GPS technology or location sensing capabilities, provides valuable real-time traffic-related information. This includes traffic flow analysis ( 46 ), congestion and incident detection ( 47 ), route optimization ( 48 ), and traffic management ( 47 ). With the widespread adoption of GPS, LBS data becomes easily accessible and is a valuable source for real-time traffic monitoring. However, the collection of LBS data, which involves tracking and recording individuals’ movements, raises significant privacy concerns. Moreover, not all users have access to location-based services, which can lead to the uneven temporospatial distribution of data accuracy.

Public Transit as Probes: In previous studies related to using transit buses as probes, some research used bus cellular data or GPS data to monitor urban traffic conditions in real time. For instance, ( 49 ) employed bus cellular data to develop an intelligent traffic management system capable of real-time monitoring and predicting urban traffic conditions. ( 50 ) concentrated on forecasting real-time passenger travel demand for the enhancement of urban mobility by leveraging bus cellular data. ( 51 ) identified urban traffic real-time congestion points based on bus cellular data. ( 52 ) leveraged cellular data from public transport passengers’ smartphones to monitor real-time urban traffic conditions. Despite the benefits of using bus cellular data for monitoring urban traffic, challenges related to privacy protection and data processing persist. First, if the collected location information is not properly anonymized, it could lead to potential privacy issues. Second, the noise and inaccuracies in the data necessitate sophisticated data processing methods, increasing the complexity and cost of research.

In recent years, GTFS has emerged as a dataset characterized by a standardized format for public transportation schedules and associated geographic information, comprising Static and Realtime parts. GTFS Static data provide scheduled information about transit services, such as schedules, stops, and routes, which are useful for planning services. GTFS Realtime data offer live updates as transit buses traverse routes, offering information such as vehicle positions, speed, and service alerts, which are crucial for day-to-day operations and real-time monitoring of traffic conditions. Therefore, GTFS data empower developers to create comprehensive transit applications that provide both planned and real-time information to road users. Notably, GTFS data are easily accessible and completely open to the public without any privacy concerns. Researchers have used GTFS data to understand real-time bus operations. For example, GTFS Realtime data have been used to predict systemic and stochastic bus delays on road segments ( 8 ), and to assess on-time performance and route speed metrics for a public transit system ( 53 ).

Additionally, traditional methods that use buses to monitor real-time urban traffic rely mainly on average speed, without considering the impact of bus skipping or passenger boarding and alighting activities. For instance, ( 54 , 55 ) updated historical speed by calculating the average bus speed within an estimation interval to predict and monitor urban traffic speed; ( 56 ) evaluated real-time traffic conditions by leveraging vehicle-based sensor data, with mean speed as the performance metric; ( 26 ) measured real-time traffic speed and vehicle travel time by using floating car data based on cellular phones. However, passenger boarding and alighting activities during bus operations can produce a significant number of zero-speed data points, so it is not appropriate to merely calculate the average speed of these points. Such issues necessitate the application of data imputation techniques.

Google Traffic data: Google Traffic data serve as a valuable dataset provided by Google, capturing real-time urban traffic conditions through crowd-sourced road congestion data collected from smartphones with the Google Maps Application. This Application returns the traffic time metric (i.e., “duration_in_traffic”) of any given road section by using a dedicated application programming interface (API) ( 53 ). Despite its utility, the underlying model powering the Google Traffic system remains undisclosed in detail, making it still a black box ( 53 ). Moreover, there is a limit on the number of free API requests allowed during a 24-h period.

Summary

In summary, Table 1 provides a comprehensive comparison of the strengths and weaknesses of various urban real-time traffic monitoring datasets, considering key quality metrics such as cost effectiveness, accuracy, temporal contiguity, preprocessing simplicity, spatial coverage, accessibility, privacy protection, and penetration rate. The number of “*” in Table 1 indicates the performance level of each grading variable. For example, intrusive detectors exhibit a high cost with minimal privacy concerns. Note that the classification levels in the table are qualitative analyses, offering insights into the strengths and weaknesses of various datasets from different perspectives.

Table 1.

Comparison of the Pros and Cons of Datasets Used for Urban Traffic Monitoring

Quality Metrics	On-Road Fixed Detectors			Mobile Data Sources
Quality Metrics	Loop detectors	Traffic camera	Traffic sensors	Floating car data	CV data	LBS data	Google traffic data	GTFS data
Cost effectiveness	*	*	*	*	*	*	*	***
Accuracy	*	**	*	**	***	*	*	***
Temporal contiguity	**	*	**	**	**	*	***	**
Preprocessing simplicity	*	*	*	*	*	*	***	***
Spatial coverage	*	*	*	***	***	**	***	**
Accessibility	*	*	*	*	*	*	**	***
Privacy protection	**	*	**	*	*	*	**	***
Penetration rate	NA	NA	NA	*	*	**	***	**

Note: CV = connected vehicle; LBS = location-based service; GTFS = General Transit Feed Specification; NA = not available. The number of “*” represents the performance of the grading variables; Cost effectiveness: the overall cost effectiveness of collecting data; Accuracy: the degree to which different data sources precisely reflect the real-world situation; Temporal contiguity: time continuity of the different data sources; Preprocessing simplicity: the straightforwardness of processing the raw data; Spatial coverage: the geographical spatial coverage of different data sources; Accessibility: the ease of data acquisition; Privacy protection: the protection of travelers’ private information across different data sources; Penetration rate: the market share of data sources in real-time urban traffic monitoring.

Data

GTFS data encompass two main formats: GTFS Static and GTFS Realtime. The architecture of GTFS data is illustrated in Figure 1. The GTFS Static format defines fixed schedules and geographic information for public transport services, while GTFS Realtime data provide dynamic updates, including real-time information on vehicle locations, and instantaneous speed at different timestamps.

Figure 1.

Comprehensive overview of General Transit Feed Specification (GTFS) data architecture.

GTFS Static Data

The GTFS Static dataset is organized into required and optional files. The core dataset includes six requisite text files: “agency.txt,” “routes.txt,” “trips.txt,” “stop_times.txt,” “stops.txt,” and “calendar.txt,” which encompass critical details on public transit services. Supplemental files, such as “shapes.txt” and “calendar_dates.txt,” exist to broaden the scope of GTFS Static information. Analyzing these GTFS Static feed files reveals comprehensive insights into the bus networks, including stops and timetables. This dataset is crucial in complementing GTFS Realtime data to facilitate the observation and analysis of traffic conditions in real time.

GTFS Realtime Data

GTFS Realtime data, an extension of the GTFS Static data, serve as a valuable feed specification that enables public transportation councils to provide real-time updates to application developers. The GTFS Realtime specification requires vendors to publish certain data feeds as JSON or XML files, typically accessible through public APIs provided by each vendor. GTFS Realtime primarily comprises three essential components: 1) trip update, which delivers real-time updates for specific trips, including estimated arrival and departure times at stops, enabling passengers to understand the real-time operational status of specific vehicles, such as their punctuality and any potential delays; 2) service alert, which provides information on changes to public transit services, including route adjustments, delays, cancellations, and emergency situations, giving passengers timely notifications about alterations to services; and 3) vehicle position, which offers real-time information on the current positions of buses, including their geographic coordinates (latitude and longitude), direction, and current status (e.g., in transit, at a stop).

Although GTFS Realtime data feeds vary slightly across providers, vehicle position updates are usually consistent. These updates are often recorded in the getvehicles fields, which provide the speed and location of every transit vehicle currently reporting in the system. The attributes of getvehicles are listed in Table 2. GTFS Realtime trip update data were collected from the BusTime Developer application programming interface (API) at intervals of 15 s, which is a common update frequency for US transit systems ( 57 ). A data sample is as follows:

{
  'bustime-response': {
    'vehicle': [
      {
        'vid': '2112',
        'tmstmp': '20230717 05:52:40',
        'lat': '29.64615666666667',
        'lon': '-82.32275999999999',
        'hdg': '352',
        'pid': 64,
        'rt': '2',
        'des': 'Rosa Parks/Downtown',
        'pdist': 29151,
        'dly': False,
        'spd': 0,
        'tatripid': '13972',
        'origtatripno': '797689',
        'tablockid': '10021',
        'zone': '',
        'mode': 0,
        'psgld': 'EMPTY',
        'stst': 19800,
        'stsd': '2023-10-13'
      }
    ]
  }
}

Table 2.

Description of Attribute Fields in getvehicles

Attribute	Description	Type
vid	Alphanumeric string representing the vehicle ID	String
tmstmp	Date and local time of the last positional update of the vehicle	Date
lat	Latitude position of the vehicle	Float
lon	Longitude position of the vehicle	Float
hdg	Heading of vehicle (degree)	Integer
pid	Pattern ID of trip currently being executed	Number
rt	Route that is currently being executed by the vehicle	Integer
des	Destination of the trip being executed by the vehicle	String
pdist	Linear distance in feet that the vehicle has traveled	Integer
dly	Whether the vehicle is delayed	Bool
spd	Vehicle speed (mph)	Integer
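
As an illustration of how a record with the fields in Table 2 might be prepared for analysis, the following minimal Python sketch converts one raw vehicle record into typed values in SI units. The function name `parse_vehicle` and the output keys are hypothetical, and treating `tatripid` as the trip identifier is an assumption, since that field appears in the data sample but is not described in Table 2.

```python
from datetime import datetime

MPH_TO_MPS = 0.44704  # 1 mph = 0.44704 m/s (exact)
FEET_TO_M = 0.3048    # 1 ft = 0.3048 m (exact)


def parse_vehicle(rec):
    """Convert one raw getvehicles record into typed, SI-unit fields.

    Output key names are illustrative and not part of any API.
    """
    return {
        "vid": rec["vid"],
        # Assumption: 'tatripid' identifies the scheduled trip.
        "trip_id": rec["tatripid"],
        "route": rec["rt"],
        # 'tmstmp' is local time formatted as 'YYYYMMDD HH:MM:SS'.
        "time": datetime.strptime(rec["tmstmp"], "%Y%m%d %H:%M:%S"),
        # Coordinates arrive as strings in the sample, so cast them.
        "lat": float(rec["lat"]),
        "lon": float(rec["lon"]),
        "speed_mps": rec["spd"] * MPH_TO_MPS,
        "pattern_dist_m": rec["pdist"] * FEET_TO_M,
    }


# Subset of the data sample shown above.
sample = {
    "vid": "2112",
    "tmstmp": "20230717 05:52:40",
    "lat": "29.64615666666667",
    "lon": "-82.32275999999999",
    "rt": "2",
    "pdist": 29151,
    "spd": 0,
    "tatripid": "13972",
}

parsed = parse_vehicle(sample)
```

Converting `pdist` and `spd` to meters and meters per second early keeps later distance and speed calculations in a single unit system.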

Methodology

GTFS Realtime data present significant advantages as a promising data source for real-time urban traffic monitoring. Firstly, buses that report GTFS Realtime data circulate with normal traffic within the network, offering trajectories that can capture real-time normal traffic information. Secondly, the strategic planning of bus routes ensures extensive and representative spatial coverage. Most importantly, the public and free accessibility of GTFS Realtime data makes them highly suitable for real-time urban traffic sensing.

The proposed method, outlined in Figure 2, aims to monitor real-time urban traffic by estimating bus speed using the GTFS Realtime data. The data preprocessing involves three key steps: 1) data collection, which gathers GTFS Static and Realtime data, along with road information data; 2) static data processing, which maps bus routes and stops based on the GTFS Static data, establishes route buffers, projects all bus stops onto bus routes, and defines stop buffers for detecting bus operational events; and 3) GTFS Realtime data processing, which removes the data points outside the road segment buffers and filters data within stop buffers to identify boarding/alighting events.

Figure 2.

Conceptual diagram of the proposed methodology.
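
To make the stop-buffer step more concrete, the sketch below classifies a bus's behavior at one stop as stopping or skipping. It is only an illustration under stated assumptions: positions are expressed as linear distances along the route (as `pdist` would allow), and a bus is treated as stopping if any observation inside the buffer has a near-zero speed. The decision rule, the threshold `speed_eps_mps`, and the function name `classify_stop_event` are hypothetical and are not taken from the proposed methodology.

```python
def classify_stop_event(points, stop_pos_m, stop_radius_m, speed_eps_mps=0.5):
    """Classify one trip's behavior at one stop buffer.

    points: list of (position_m, speed_mps) observations for a single trip,
            with position measured along the route.
    Returns 'stopping', 'skipping', or 'unobserved'.
    """
    # Keep only speeds observed inside the stop buffer.
    speeds_in_buffer = [
        speed
        for position, speed in points
        if abs(position - stop_pos_m) <= stop_radius_m
    ]
    if not speeds_in_buffer:
        # With sparse updates a bus can cross the buffer between records.
        return "unobserved"
    # Assumed rule: a near-zero speed inside the buffer indicates a halt.
    if min(speeds_in_buffer) <= speed_eps_mps:
        return "stopping"
    return "skipping"


halted = classify_stop_event([(0, 10.0), (95, 0.0), (200, 9.0)], 100, 30)
passed = classify_stop_event([(0, 10.0), (95, 8.0), (200, 9.0)], 100, 30)
missed = classify_stop_event([(0, 10.0), (200, 9.0)], 100, 30)
```

The 'unobserved' outcome is kept separate because, with updates every 15 s, a short buffer may contain no record at all, and such cases cannot be labeled from speed alone.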

On completing the data preprocessing, single bus trips on each road segment are identified based on the method presented in the Segment-Trip Extraction subsection. The bus travel time for each trip on the specific segment k during time interval $T_{q}$ is then estimated using the method detailed in the Travel Time Estimation subsection. Subsequently, during the specified time interval $T_{q}$ , the space mean speed for a road segment k, which is used for urban traffic monitoring, is evaluated by dividing the total distance traveled by the total time spent. In practice, for real-time monitoring, these intervals should be set as short as possible. The notations for all parameters used in this paper are summarized in Table 3.

Table 3.

A List of Symbols and Notations

Notation	Description
$Δ t_{1, k}$	Type 1 travel time in segment $k$
$Δ t_{2, k}$	Type 2 travel time in segment $k$
$Δ t_{3, k}$	Type 3 travel time in segment $k$
$Δ t_{k}$	Total travel time in segment $k$
$N$	Number of subsegments neither close to intersections nor within stop buffers
$M$	Number of stops where bus halts
$P$	Number of stops where bus skips
$t_{s}^{i}, t_{e}^{i}$	Timestamp of first/last data point in subsegment $i$
$j, l$	Stop buffer $j$ (where bus skips) and $l$ (where bus stops)
$v_{s - 1}^{l}, v_{e + 1}^{l}$	Instantaneous speed of the last/first data point before/after buffer $l$
$t_{s - 1}^{j}, t_{s - 1}^{l}$	Timestamp of the last data point before stop buffer $j, l$
$t_{e + 1}^{j}, t_{e + 1}^{l}$	Timestamp of the first data point after stop buffer $j, l$
$t_{s}^{k}, t_{e}^{k}$	Timestamp of the first/last data point in segment $k$
$t_{e}^{k - 1}, t_{s}^{k + 1}$	Timestamp of the last/first data point in segment $k - 1 / k + 1$
$x_{s - 1}^{l}, x_{e + 1}^{l}$	Locations of the last/first data point before/after stop buffer $l$
$x_{s}^{k}, x_{e}^{k}$	Locations of the first/last data point in segment $k$
$x_{o}^{k}, x_{o}^{k + 1}$	Locations of the geographical starting point in segment $k / k + 1$
$x_{e}^{k - 1}, x_{s}^{k + 1}$	Locations of the last/first data point in segment $k - 1 / k + 1$
$Ω$	The set of all intersections within study area
$S$	The set of all road segments within study area
$s_{k}$	The k-th route segment
$ω_{k}$	The k-th intersection
$ξ$	Trip ID
$η$	Trip date
$T_{k}$	The set of all single trips in $s_{k}$ on $η$
T	The set of all trips
D	The set of all GTFS Realtime data records
$D_{m}$	The m-th GTFS Realtime data record in $ξ$
$b_{m}$	Bus location corresponding to $D_{m}$ in $ξ$
$r_{seg}$	Route buffer radius
$T$	The set of all time intervals
$T_{q}$	The q-th time interval
${\bar{v}}_{k}^{q}$	Space mean speed in $s_{k}$ during $T_{q}$
$δ_{k}^{q}$	Total travel time in $s_{k}$ during $T_{q}$
$ψ_{k}^{q}$	Total number of trips in $s_{k}$ during $T_{q}$
$τ_{a}$	Single trip in $s_{k}$
$Δ t_{k}^{a}$	Travel time in $s_{k}$ for trip $τ_{a}$
$t$	Timestamp of the first point of trip $τ_{a}$ in $s_{k}$
$u_{k}$	The set of space mean speed in $s_{k}$ for $T$

Note: GTFS = General Transit Feed Specification.

Segment-Trip Extraction

Since the space mean speed for a road segment is taken as the state variable for real-time urban traffic monitoring, it is crucial to extract single trips within each road segment and time interval. Algorithm 1 is the segment-trip extraction algorithm. The complete set of all of the road segments within the study area, denoted by $S$ , is first determined by leveraging this algorithm. Specifically, $S$ is set to be an empty set initially. Then, for each intersection $ω_{k}$ in the intersection location set $Ω$ , $s_{k}$ is defined as the road segment extending from $ω_{k}$ to its subsequent neighbor $ω_{k + 1}$ and is added to the segment set $S$ .

Algorithm 1 Segment-Trip Extraction Algorithm
input: GTFS Realtime dataset $D$ , intersection locations $Ω$ , segment buffer radius $r_{seg}$ , the set of all trips $T$
$S \leftarrow \emptyset$
for $ω_{k}$ in $Ω$ do
    $s_{k} \leftarrow$ road segment between $ω_{k}$ and $ω_{k + 1}$
    $S \leftarrow S \cup \{s_{k}\}$
end for
$n \leftarrow$ number of road segments in $S$
$T \leftarrow \{T_{1}, T_{2}, \dots, T_{n}\}$ where $T_{1}, T_{2}, \dots, T_{n}$ are $\emptyset$
for $s_{k}$ in $S$ do
    for $D_{m}$ in $D$ do
        $ξ \leftarrow$ trip ID of $D_{m}$
        $η \leftarrow$ date of $D_{m}$
        $b_{m} \leftarrow$ bus location of $D_{m}$
        $d \leftarrow$ distance between $s_{k}$ and $b_{m}$
        if $d < r_{seg}$ then
            $T_{k} \leftarrow T_{k} \cup \{(ξ, η)\}$
        end if
    end for
end for
output: $T$ , $S$

Once the road segment set $S$ is established, the subsequent objective is to identify the set of single trips, $T_{k}$ , within each segment $s_{k}$ . A single trip is uniquely identified by a combination of its trip ID $ξ$ and the date $η$ on which it occurred, as the same $ξ$ can appear on different dates. Therefore, for each GTFS Realtime record $D_{m}$ , $ξ$ , date $η$ , and location $b_{m}$ are extracted. Following this, the distance $d$ between the road segment and bus location $b_{m}$ is obtained. If the distance $d$ to a road segment $s_{k}$ is less than the route buffer radius $r_{seg}$ , then the trip with that specific $ξ$ and $η$ is considered to be within segment $s_{k}$ , and the combination of $ξ$ and $η$ is subsequently added to the set $T_{k}$ . The sets for the road segments and associated trips on each road segment are then used for travel time estimation.
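The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the intersections are given in order along a single corridor, that all locations are planar coordinates in meters, and that each record is a dictionary with the hypothetical keys `trip_id`, `date`, and `location`. The helper names are also assumptions.

```python
import math
from collections import defaultdict


def point_to_segment_distance(p, a, b):
    """Planar distance from point p to the line segment a-b (coordinates in meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def extract_segment_trips(records, intersections, r_seg):
    """Return (segments, trips); trips[k] is the set of (trip_id, date) pairs seen in segment k."""
    # Segment k runs from intersection k to intersection k + 1.
    segments = list(zip(intersections[:-1], intersections[1:]))
    trips = defaultdict(set)
    for k, (start, end) in enumerate(segments):
        for rec in records:
            d = point_to_segment_distance(rec["location"], start, end)
            if d < r_seg:
                # A single trip is identified by its trip ID together with its date.
                trips[k].add((rec["trip_id"], rec["date"]))
    return segments, dict(trips)


# Illustrative data only (not from the study).
intersections = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 0.0)]
records = [
    {"trip_id": "A", "date": "2023-10-09", "location": (250.0, 5.0)},
    {"trip_id": "A", "date": "2023-10-09", "location": (1500.0, -3.0)},
    {"trip_id": "A", "date": "2023-10-10", "location": (400.0, 2.0)},
    {"trip_id": "B", "date": "2023-10-09", "location": (600.0, 80.0)},  # outside the buffer
]
segments, trips = extract_segment_trips(records, intersections, r_seg=20.0)
```

Storing the pairs in a set means that repeated records of the same trip within a segment are counted once, which matches the set union in Algorithm 1.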

Travel Time Estimation

The space mean speed along a road segment $k$ is estimated by dividing the total distance traveled by the total travel time spent. The total travel time within segment $k$ comprises three distinct types of travel time, as depicted in Figure 3: 1) travel time type 1, $Δ t_{1, k}$ , the travel time for passing through subsegments that are neither near intersections nor within the stop buffers; 2) travel time type 2, $Δ t_{2, k}$ , the travel time for passing through the stop buffers; and 3) travel time type 3, $Δ t_{3, k}$ , the travel time for passing through subsegments adjacent to the intersections.

Figure 3.

Segment travel time estimation.

The three types of travel time can be obtained using Equations 1 to 3. The total travel time of a trip on a segment is determined by Equation 4. For Type 1 travel time, when a bus traverses subsegments that are neither near intersections nor within stop buffers, it typically does not exhibit abnormal speed patterns caused by operational events. Therefore, Type 1 travel time needs no imputation and is obtained directly as the sum of the time differences between the first and last data points in each subsegment.

For Type 2 travel time, when a bus approaches a bus stop, it either skips the stop or halts for passengers boarding/alighting. In the latter case, bus speeds are influenced by these operational events, including the necessary acceleration and deceleration for stopping. Consequently, a method is proposed to automatically identify bus-stopping actions and mitigate the impact of these operational events. First, a buffer zone with a 20-m radius is set around each bus stop within the segment. This specific radius is determined by taking into account several factors, including the distances required for bus acceleration and deceleration, the space needed for safely merging into regular traffic flows, and the length of the bus ( 58 ). Subsequently, for each stop buffer within the road segment, timestamps and instantaneous speeds are collected from GTFS Realtime data at two specific points: one immediately before entering the buffer zone and the other immediately after exiting it. Then, two time durations are computed: the first duration represents the actual time taken by the bus to traverse the buffer, obtained directly as the difference between the two aforementioned timestamps; the second duration is the imputed time required to cross the stop buffer, derived by dividing the distance between these two points by the average of the instantaneous speeds recorded at these points. The process of passengers boarding and alighting the bus is assumed to require 15 s to complete ( 59 ). Therefore, if the imputed duration is at least 15 s less than the actual time difference between these two points, the bus is inferred to have dwelled at the stop. In such instances, the imputed time duration is added to Type 2 travel time. Otherwise, the bus is considered to have skipped the stop, and the actual time difference is added to Type 2 travel time.

To obtain Type 3 travel time, additional points encompassing the last point of the preceding segment $k - 1$ , and the first point of the subsequent segment $k + 1$ are used. The estimation consisted of summing two weighted time differences: 1) the time difference between the last point of segment $k - 1$ and the first point of segment $k$ , weighted by the ratio of the distance from the start of segment $k$ to its first point to the distance between these two points; and 2) the time difference between the last point of segment $k$ and the first point of segment $k + 1$ , weighted by the ratio of the distance from the last point of segment $k$ to its endpoint to the distance between these two points.

$$Δ t_{1, k} = \sum_{i = 1}^{N} (t_{e}^{i} - t_{s}^{i}) \tag{1}$$

where $N$ represents the number of subsegments that are neither close to intersections nor within the stop buffers, and $t_{s}^{i}$ and $t_{e}^{i}$ denote the timestamps of starting and ending points, respectively, of subsegment $i (\in k)$ .

$$Δ t_{2, k} = \sum_{j = 1}^{P} (t_{e + 1}^{j} - t_{s - 1}^{j}) + \sum_{l = 1}^{M} \frac{2 (x_{e + 1}^{l} - x_{s - 1}^{l})}{v_{e + 1}^{l} + v_{s - 1}^{l}} \tag{2}$$

where

$P$ is the number of stops that the bus skips,

$M$ is the number of stops where the bus halts,

$t_{s - 1}^{j}$ , $t_{s - 1}^{l}$ represent the timestamps of the last points before stop buffers $j$ and $l$ , respectively,

$t_{e + 1}^{j}$ , $t_{e + 1}^{l}$ denote the timestamps of the first points after stop buffers $j$ and $l$ , respectively, and

$x_{s - 1}^{l}$ , $v_{s - 1}^{l}$ , $x_{e + 1}^{l}$ , $v_{e + 1}^{l}$ denote the locations and instantaneous speeds at the last point before stop buffer $l$ and the first point after stop buffer $l$ , respectively.
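The dwell rule and Equation 2 can be illustrated with a short Python sketch. It is a simplified reading of the text, not the authors' code: locations are assumed to be distances along the route in meters, speeds are in m/s, the function and argument names are hypothetical, and the fallback for a zero mean speed is an assumption added here because the text does not cover that case.

```python
def stop_buffer_travel_time(t_before, t_after, x_before, x_after,
                            v_before, v_after, dwell_threshold=15.0):
    """Return (travel_time_s, dwelled) for one stop buffer.

    t_* are timestamps (s), x_* are distances along the route (m), and v_* are
    instantaneous speeds (m/s) at the last point before and the first point
    after the buffer.
    """
    actual = t_after - t_before
    mean_speed = (v_before + v_after) / 2.0
    if mean_speed <= 0.0:
        # Assumption of this sketch: without a usable speed, keep the observed time.
        return actual, False
    imputed = (x_after - x_before) / mean_speed
    # The bus is inferred to have dwelled when the imputed crossing time is at
    # least `dwell_threshold` seconds shorter than the observed time.
    if actual - imputed >= dwell_threshold:
        return imputed, True
    return actual, False


def type2_travel_time(buffers):
    """Sum the per-buffer contributions, as in Equation 2."""
    return sum(stop_buffer_travel_time(**b)[0] for b in buffers)


# Illustrative values only: the bus halts at the first stop and skips the second.
buffers = [
    dict(t_before=0.0, t_after=30.0, x_before=100.0, x_after=160.0,
         v_before=10.0, v_after=10.0),
    dict(t_before=100.0, t_after=105.0, x_before=500.0, x_after=560.0,
         v_before=12.0, v_after=12.0),
]
total_type2 = type2_travel_time(buffers)
```

In the first buffer the observed 30 s exceeds the imputed 6 s by more than 15 s, so the imputed value is used; in the second the two durations agree, so the observed 5 s is kept.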

$$Δ t_{3, k} = \frac{(t_{s}^{k} - t_{e}^{k - 1}) (x_{s}^{k} - x_{o}^{k})}{x_{s}^{k} - x_{e}^{k - 1}} + \frac{(t_{s}^{k + 1} - t_{e}^{k}) (x_{o}^{k + 1} - x_{e}^{k})}{x_{s}^{k + 1} - x_{e}^{k}} \tag{3}$$

where

$t_{s}^{k}$ , $t_{e}^{k - 1}$ , $t_{e}^{k}$ , $t_{s}^{k + 1}$ represent the timestamps of the first point in segment $k$ , the last point in segment $k - 1$ , the last point in segment $k$ , and the first point in segment $k + 1$ , respectively, and

$x_{s}^{k}$ , $x_{o}^{k}$ , $x_{e}^{k - 1}$ , $x_{e}^{k}$ , $x_{o}^{k + 1}$ , $x_{s}^{k + 1}$ are the locations of the first point in segment $k$ , the starting location of segment $k$ , the last point in segment $k - 1$ , the last point in segment $k$ , the starting location of segment $k + 1$ (i.e., the ending location of segment $k$ ), and the first point of segment $k + 1$ , respectively.
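Equation 3 amounts to a linear interpolation across each segment boundary. The sketch below is one hedged way to write it in Python; the argument names are hypothetical, locations are assumed to be distances along the route in meters, and the two points that straddle each boundary are assumed to have different locations so that the denominators are nonzero.

```python
def type3_travel_time(t_e_prev, x_e_prev, t_s, x_s, x_o,
                      t_e, x_e, t_s_next, x_s_next, x_o_next):
    """Travel time near the two intersections bounding segment k (Equation 3).

    *_prev refers to the last point of segment k - 1, *_next to the first point
    of segment k + 1; x_o and x_o_next are the start locations of segments k
    and k + 1.
    """
    # Share of the inter-point time spent between the start of segment k
    # and its first data point.
    entry = (t_s - t_e_prev) * (x_s - x_o) / (x_s - x_e_prev)
    # Share of the inter-point time spent between the last data point of
    # segment k and the end of the segment.
    exit_ = (t_s_next - t_e) * (x_o_next - x_e) / (x_s_next - x_e)
    return entry + exit_


# Illustrative values only: segment k spans 1,000 m to 2,000 m along the route.
delta_t3 = type3_travel_time(
    t_e_prev=0.0, x_e_prev=940.0, t_s=15.0, x_s=1060.0, x_o=1000.0,
    t_e=100.0, x_e=1950.0, t_s_next=115.0, x_s_next=2100.0, x_o_next=2000.0,
)
```

Here half of the first 15-s gap (60 m of 120 m) and one-third of the second (50 m of 150 m) fall inside segment $k$, giving 7.5 s + 5.0 s.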

$$Δ t_{k} = Δ t_{1, k} + Δ t_{2, k} + Δ t_{3, k} \tag{4}$$

where $Δ t_{k}$ is the total travel time of a trip on segment $k$ .
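As a quick check of how Equations 1 to 4 combine, consider a purely hypothetical trip on a segment of 1,997 m, the length listed for Segment 1 in Table 4. The three durations below are illustrative values chosen for the arithmetic, not results from the study.

```latex
\begin{aligned}
\Delta t_{k} &= \Delta t_{1,k} + \Delta t_{2,k} + \Delta t_{3,k}
              = 95\,\mathrm{s} + 12\,\mathrm{s} + 8\,\mathrm{s} = 115\,\mathrm{s}, \\
v &= \frac{1997\,\mathrm{m}}{115\,\mathrm{s}} \approx 17.4\,\mathrm{m/s} \approx 38.8\,\mathrm{mph}.
\end{aligned}
```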

Space Mean Speed Estimation

After estimating the travel time for each trip across every road segment, the subsequent step involves estimating the space mean speed for each segment in different time intervals. The segment speed estimation algorithm is outlined in Algorithm 2. For each specified time interval $T_{q}$ and road segment $s_{k}$ , the length of the road segment is determined and the associated set of trips $T_{k}$ is selected. Subsequently, the total travel time, which accounts for the cumulative travel times of all trips on road segment $s_{k}$ during the time interval $T_{q}$ , is recorded as $δ_{k}^{q}$ . Additionally, $ψ_{k}^{q}$ is recorded as the total number of trips on segment $s_{k}$ during the time interval $T_{q}$ . The space mean speed, denoted as ${\bar{v}}_{k}^{q}$ , is then estimated as the product of $ψ_{k}^{q}$ and the length of $s_{k}$ divided by the total travel time $δ_{k}^{q}$ . These segment space mean speeds can then be used for monitoring urban traffic. However, it is worth noting that in megacities with extensive dedicated bus lanes, the proposed method may not be directly applicable to real-time traffic monitoring because bus flow is isolated from the normal traffic flow.

Algorithm 2 Segment Space Mean Speed Estimation Algorithm
input: road segments $S$ , the set of all trips $T$ , time intervals $T$
$u \leftarrow \emptyset$
for $s_{k}$ in $S$ do
    $length \leftarrow$ length of $s_{k}$
    for $T_{q}$ in $T$ do
        $δ_{k}^{q} \leftarrow 0$
        $ψ_{k}^{q} \leftarrow 0$
        for $τ_{a}$ in $T_{k}$ do
            $Δ t_{k}^{a} \leftarrow$ estimated travel time of trip $τ_{a}$
            $t \leftarrow$ the timestamp of the first point of the trip $τ_{a}$ on segment $s_{k}$
            if $t \in T_{q}$ then
                $δ_{k}^{q} \leftarrow δ_{k}^{q} + Δ t_{k}^{a}$
                $ψ_{k}^{q} \leftarrow ψ_{k}^{q} + 1$
            end if
        end for
        ${\bar{v}}_{k}^{q} \leftarrow ψ_{k}^{q} \cdot length / δ_{k}^{q}$
        $u \leftarrow u \cup \{{\bar{v}}_{k}^{q}\}$
    end for
end for
output: $u$
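A compact Python sketch of Algorithm 2 follows. It is an illustration under stated assumptions rather than the authors' code: each trip is assumed to be a hypothetical `(entry_timestamp, travel_time_s)` pair, intervals are half-open `(start, end)` pairs in the same time units, and, unlike the listing, intervals with no trips are skipped to avoid a division by zero.

```python
def space_mean_speeds(segment_lengths, trips, intervals):
    """Return {(segment, interval_index): speed in m/s}.

    segment_lengths maps a segment key to its length in meters; trips maps the
    same key to a list of (entry_timestamp, travel_time_s) pairs.
    """
    speeds = {}
    for k, length in segment_lengths.items():
        for q, (start, end) in enumerate(intervals):
            total_time = 0.0  # corresponds to delta_k^q
            n_trips = 0       # corresponds to psi_k^q
            for entry_time, travel_time in trips.get(k, []):
                # A trip is assigned to the interval containing its first point.
                if start <= entry_time < end:
                    total_time += travel_time
                    n_trips += 1
            # Departure from the listing: skip empty intervals instead of dividing by zero.
            if n_trips > 0 and total_time > 0.0:
                speeds[(k, q)] = n_trips * length / total_time
    return speeds


# Illustrative values only: one 1,997-m segment and two 30-min intervals (in seconds).
segment_lengths = {1: 1997.0}
intervals = [(0.0, 1800.0), (1800.0, 3600.0)]
trips = {1: [(100.0, 100.0), (900.0, 150.0), (2000.0, 200.0)]}
speeds = space_mean_speeds(segment_lengths, trips, intervals)
```

Because total distance is divided by total time, the result is the harmonic mean of the individual trip speeds, which is what distinguishes a space mean speed from a simple average of speeds.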

Case Study

Study Area

The city of Gainesville, a midsize city with a population of about 140,000, was chosen to assess the potential of GTFS Realtime data in the context of real-time urban traffic monitoring. As depicted in Figure 4, the city has a well-established public transportation system with 38 bus routes with two directions in operation. These bus routes are strategically planned with roughly uniform coverage across the important links of the city’s network, making it a suitable case study. Python 3.11.4 and GeoPandas library 0.12.2 were employed to analyze and visualize GTFS Realtime data.

Figure 4.

Bus spatial coverage in research area.

Data Acquisition

GTFS data were used as the primary data source for the analysis and traffic condition monitoring in this paper. The GTFS Static data, containing the bus schedule data for the fall of 2023, were obtained directly from the Gainesville Department of Transportation. Additionally, GTFS Realtime data for all bus routes depicted in Figure 4 were collected from BusTime Developer API at 15-s intervals. The dataset used for validation covered various road hierarchies over a two-week period in October 2023. The GTFS Realtime dataset encompassed extensive observation hours, ranging from 6 a.m. to 11 p.m., capturing the full range of traffic states throughout the day. Initially, the raw dataset comprised 39 bus routes, each with two directions, totaling 1,193,613 data points. Following the preprocessing step (described in the Methodology section), 1,061,207 data points remained, accounting for 88.9% of the original dataset.

The other two data sources used in this paper were Bluetooth data and Google Traffic data. The city of Gainesville deployed over 100 Bluetooth devices at the entry and exit points of road segments. These devices used Bluetooth technology to measure travel time and estimate vehicle speed by tracking signal propagation time between devices at different locations. Additionally, they collected comprehensive traffic data such as vehicle counts, types, travel time, and estimated speed, which were crucial for traffic flow analysis and management decision making. This data source effectively captured vehicle movement along specific road segments and provided a means to validate the accuracy of GTFS Realtime data in estimating space mean speed. However, it is worth noting that approximately one-third of the Bluetooth devices were inactive. This situation underscores the robustness of GTFS Realtime data, which can serve as a complementary data source to address the limitations of Bluetooth data availability.

Google Traffic data were derived from the Google Traffic model and collected through the Google Maps Directions API. The model used crowd-sourced road congestion data collected from smartphones with the Google Maps app installed ( 53 ). Through this approach, real-time information on specific road segments traversed by smartphones was provided, enabling analysis of urban traffic patterns and facilitating route optimization. Although the details of the underlying model powering the service were not fully disclosed, the API documentation specified that “the returned duration_in_traffic should be the best estimate of travel time given what was known about both historical traffic conditions and live traffic” and “live traffic becomes more important the closer the departure_time is to now.” Therefore, the departure_time parameter was set to “now,” and the returned duration_in_traffic values were recorded and used to estimate the space mean speed in road segments.
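The request and the speed calculation described above might look roughly like the following Python sketch. It only builds the request URL and parses a response dictionary, so no network call is made. The endpoint and the `departure_time`, `duration_in_traffic`, and `distance` fields follow the public Directions API as commonly documented and should be checked against the current documentation; the coordinates, the API key placeholder, the helper names, and the sample response are illustrative assumptions, not data from the study.

```python
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"


def build_directions_url(origin, destination, api_key):
    """Build a Directions API request URL with departure_time set to "now"."""
    params = {
        "origin": f"{origin[0]},{origin[1]}",
        "destination": f"{destination[0]},{destination[1]}",
        "departure_time": "now",  # asks for an estimate weighted toward live traffic
        "key": api_key,
    }
    return f"{DIRECTIONS_URL}?{urlencode(params)}"


def segment_speed_from_response(response):
    """Space mean speed (m/s) implied by the first leg of the first route."""
    leg = response["routes"][0]["legs"][0]
    distance_m = leg["distance"]["value"]               # meters
    duration_s = leg["duration_in_traffic"]["value"]    # seconds
    return distance_m / duration_s


# Hypothetical endpoints of a validation segment and a made-up response.
url = build_directions_url((29.6516, -82.3248), (29.6436, -82.3549), "YOUR_API_KEY")
sample_response = {
    "routes": [{"legs": [{
        "distance": {"value": 1997},
        "duration_in_traffic": {"value": 200},
    }]}]
}
speed = segment_speed_from_response(sample_response)
```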

It should be noted that the Bluetooth and Google Traffic datasets provided the operating speed and travel times for regular traffic, while GTFS Realtime data described the traffic information of buses. Given the typical speed difference between regular traffic and buses, the space mean speed derived from GTFS Realtime data needed to be scaled to align with those derived from the Bluetooth data and Google Traffic data. This adjustment enables a standardized comparison between the space mean speed of buses and that of general traffic, thereby enabling a coherent analysis of urban transit dynamics. Additionally, it is essential to ensure consistency in the validation time periods and road segments across the three data sources.

Hypothesis Testing for Speed Variation Pattern

This section presents the results of hypothesis testing for speed variation patterns to validate the accuracy of the space mean speed derived from GTFS Realtime data. Bluetooth data, being direct travel time measurements of randomly sampled vehicles, were considered as ground truth. In contrast, Google Traffic data were outputs of an unknown black-box algorithm ( 53 ), and the space mean speeds derived from them were used as a benchmark for comparison. Given the high capital and maintenance costs, Bluetooth devices are typically installed only on a limited number of links within the network. For the purposes of validation, the analysis specifically focused on those road segments for which data were concurrently available from all three aforementioned sources.

As illustrated in Figure 5, five road segments positioned at different locations within the research area were selected as validation segments. These road segments exhibited different road hierarchies and speed limits, as outlined in Table 4, thereby enabling thorough validation of the applicability of the proposed method.

Figure 5.

Selected validation segments in the research area.

Table 4.

Characteristics of Road Segments in Validation Area

Validation segment	Length (m)	Speed limit (mph)	Bus route	Road hierarchy	Data collection period
Segment 1	1,997	45	1I, 38I, 150I	Arterials	10.09–10.13
Segment 2	1,673	40	20I	Collectors	10.09–10.13
Segment 3	1,987	45	1O, 38O, 150O	Arterials	10.25–10.29
Segment 4	2,190	35	3O	Local roads	10.25–10.29
Segment 5	1,606	45	43O	Collectors	10.25–10.29

Note: In the “Bus route” field, “I” denotes bus routes heading toward downtown; “O” denotes the opposite direction.

To ensure the coherence of the analysis, the space mean speed for each road segment was estimated based on the Bluetooth data and Google Traffic data extracted from the same time period as the GTFS Realtime data. The space mean speed was estimated by setting different time intervals, primarily based on the amount of available data and the level of accuracy required. Specifically, in validation segments 1 to 3, where there were more high-frequency bus lines, a 30-min time interval was selected. Conversely, for segments 4 and 5, with fewer bus lines, a 1-h time interval was chosen.

The space mean speed variations based on the three data sources across time for the five road segments are illustrated in Figure 6. The variation patterns of the GTFS Realtime data closely resembled those of the Bluetooth data. However, it is crucial to acknowledge that Bluetooth and Google Traffic data primarily captured travel patterns from regular traffic, while GTFS Realtime data recorded travel information from buses. As a result, a speed gap between these data types was evident.

Figure 6.

Space mean speed validation compared with Bluetooth data and Google Traffic.

To assess whether the speed variations of buses and regular traffic were significantly different from each other, Kolmogorov–Smirnov (KS) tests were conducted. Given the inherent speed differences between buses and regular traffic, the speed value of each data point was divided by the sum of the speeds to obtain a normalized space mean speed $v_{k, i}^{m'}$ in Equation 5:

$$v_{k, i}^{m'} = \frac{v_{k, i}^{m}}{\sum_{i = 1}^{n} v_{k, i}^{m}} \tag{5}$$

where

$m$ is the dummy variable, indicating the index of the data type, such as GTFS Realtime data, Bluetooth data and Google Traffic data,

$k$ is the dummy variable, indicating the index of the road segment, $\forall k \in [1, 5]$ ,

$i$ is the index representing the $i$-th data point,

$n$ is the total number of data points for data type $m$ on segment $k$ , and

$v_{k, i}^{m}$ is the space mean speed of data point $i$ on segment $k$ of data type $m$ .

The null and alternative hypotheses for the KS tests were defined as follows:

$H_{0}$ : The normalized space mean speed obtained from Bluetooth data and GTFS Realtime data/Google Traffic data follow the same distribution;

$H_{A}$ : The normalized space mean speed obtained from Bluetooth data and GTFS Realtime data/Google Traffic data do not follow the same distribution.
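The normalization in Equation 5 and the two-sample KS statistic can be sketched in plain Python as follows. This is a minimal illustration with made-up speeds; it computes only the test statistic (the largest gap between the two empirical distribution functions), whereas a library routine such as `scipy.stats.ks_2samp` would also return the p-value. The function names are assumptions.

```python
def normalize_speeds(speeds):
    """Divide each speed by the sum of all speeds in the series (Equation 5)."""
    total = sum(speeds)
    return [v / total for v in speeds]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n_a, n_b = len(a), len(b)
    i = j = 0
    d_max = 0.0
    while i < n_a and j < n_b:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs at x.
        while i < n_a and a[i] <= x:
            i += 1
        while j < n_b and b[j] <= x:
            j += 1
        d_max = max(d_max, abs(i / n_a - j / n_b))
    return d_max


# Illustrative series only: the bus speeds are the car speeds scaled by 0.75,
# so the normalized series coincide and the statistic is zero.
bluetooth = [40.0, 42.0, 38.0, 40.0]
gtfs = [30.0, 31.5, 28.5, 30.0]
d_scaled = ks_statistic(normalize_speeds(bluetooth), normalize_speeds(gtfs))
d_shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

The scaled example shows why the normalization is useful: a constant speed ratio between buses and regular traffic disappears after dividing by the sum, so the test compares only the shape of the variation.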

Table 5 presents the values of test statistics of the KS tests, followed by the p-value in parentheses. With a significance level of α = 0.05, the majority of the results for BL–GTFS were not statistically significant, indicating that the space mean speed estimated from GTFS Realtime data exhibits similar temporal variation patterns to those derived from Bluetooth data. Nevertheless, results from multiple cases in the BL–GT comparison showed statistical significance, indicating disparities in the temporal distribution patterns between the two datasets. These findings suggest that GTFS Realtime data are more proficient in capturing real-time speed variations compared with Google Traffic data.

Table 5.

Kolmogorov–Smirnov Test Results for All Validation Segments

Date	BL–GTFS	BL–GT
Segment 1
10/09/2023	0.273 (0.173)	0.515 (0.000)
10/10/2023	0.121 (0.973)	0.333 (0.050)
10/11/2023	0.152 (0.851)	0.364 (0.025)
10/12/2023	0.091 (0.999)	0.242 (0.290)
10/13/2023	0.333 (0.051)	0.303 (0.097)
Segment 2
10/09/2023	0.273 (0.173)	0.394 (0.011)
10/10/2023	0.182 (0.654)	0.286 (0.635)
10/11/2023	0.121 (0.973)	0.364 (0.025)
10/12/2023	0.143 (0.999)	0.286 (0.635)
10/13/2023	0.121 (0.973)	0.385 (0.300)
Segment 3
10/25/2023	0.121 (0.973)	0.273 (0.173)
10/26/2023	0.273 (0.173)	0.308 (0.588)
10/27/2023	0.182 (0.654)	0.242 (0.290)
10/28/2023	0.357 (0.343)	0.455 (0.001)
10/29/2023	0.214 (0.921)	0.273 (0.173)
Segment 4
10/25/2023	0.143 (0.999)	0.357 (0.343)
10/26/2023	0.214 (0.921)	0.214 (0.921)
10/27/2023	0.308 (0.588)	0.286 (0.635)
10/28/2023	0.152 (0.851)	0.286 (0.635)
10/29/2023	0.308 (0.588)	0.214 (0.921)
Segment 5
10/25/2023	0.214 (0.921)	0.308 (0.588)
10/26/2023	0.385 (0.300)	0.154 (0.999)
10/27/2023	0.143 (0.999)	0.231 (0.898)
10/28/2023	0.308 (0.588)	0.308 (0.588)
10/29/2023	0.154 (0.999)	0.385 (0.300)

Note: GTFS = General Transit Feed Specification Realtime; BL = Bluetooth; GT = Google Traffic.

Correlation and Error Analysis

Previous studies demonstrated a linear relationship between the average travel time of the automobile (ATT) and the average travel time of bus (BTT) ( 53 , 60 ). The linear model, expressed in Equation 6, can be used to estimate ATT based on BTT.

$$ATT = a + b \cdot BTT \tag{6}$$

where $a$ and $b$ are the calibrated model parameters.

To further validate the effectiveness of GTFS Realtime data in real-time urban traffic monitoring, 80% of the data were used as the training dataset to calibrate the Bluetooth-based travel time against GTFS Realtime–based travel time relationship (i.e., BL–GTFS), as well as the Bluetooth-based travel time against Google Traffic–based travel time relationship (i.e., BL–GT). Following model fitting, the remaining 20% of the data were employed for validation purposes. Root mean squared error (RMSE) and mean absolute percentage error (MAPE), defined in Equations 7 and 8, were the two chosen metrics measuring the accuracy of the models by comparing the predictions to the actual observations.

$$RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} \left( (t_{i, k}^{m})' - t_{i, k}^{m} \right)^{2}} \tag{7}$$

$$MAPE = \frac{1}{n} \sum_{i = 1}^{n} \frac{\left| (t_{i, k}^{m})' - t_{i, k}^{m} \right|}{\left| t_{i, k}^{m} \right|} \tag{8}$$

where

$m$ is the dummy variable, indicating the index of data type, such as GTFS Realtime data, Bluetooth data and Google Traffic data,

$i$ is the index of each data point,

$n$ is the total number of data points for data type $m$ on segment $k$ ,

$t_{i, k}^{m}$ is the travel time of data point $i$ on segment $k$ of data type $m$ , and

$(t_{i, k}^{m})'$ is the travel time of regular traffic predicted from the bus travel time using Equation 6.
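The calibration and validation steps around Equations 6 to 8 can be sketched with ordinary least squares in plain Python. Several details are assumptions of this illustration rather than statements about the study: the split is taken to be a seeded random shuffle, MAPE is reported in percent (Equation 8 times 100, matching the table units), the function names are hypothetical, and the data are synthetic.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(pred, obs):
    """Root mean squared error (Equation 7)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mape(pred, obs):
    """Mean absolute percentage error (Equation 8), expressed in percent."""
    return 100.0 * sum(abs(p - o) / abs(o) for p, o in zip(pred, obs)) / len(obs)


def calibrate_and_validate(btt, att, train_share=0.8, seed=0):
    """Fit ATT = a + b * BTT on a training share and score it on the remainder."""
    idx = list(range(len(btt)))
    random.Random(seed).shuffle(idx)  # assumed random split
    cut = int(train_share * len(idx))
    train, test = idx[:cut], idx[cut:]
    a, b = fit_line([btt[i] for i in train], [att[i] for i in train])
    pred = [a + b * btt[i] for i in test]
    obs = [att[i] for i in test]
    return a, b, rmse(pred, obs), mape(pred, obs)


# Synthetic, exactly linear travel times (s), so the fit should recover a = 20, b = 0.7.
btt = [120.0 + 10.0 * i for i in range(10)]
att = [20.0 + 0.7 * t for t in btt]
a, b, err_rmse, err_mape = calibrate_and_validate(btt, att)
```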

Table 6 shows the calibration and validation results. In general, linear relationships were observed with acceptable correlation coefficients, ranging from 0.384 to 0.658. Comparing the MAPE and RMSE metrics for BL–GTFS and BL–GT shows that the errors for BL–GT exceeded those for BL–GTFS in every case except the MAPE for Segment 5, indicating that the GTFS Realtime–based method generally outperformed the Google Traffic–based method. This comparison underscores the capability of GTFS Realtime data in capturing temporal variation, supporting its suitability as a dataset for real-time urban traffic network monitoring.

Table 6.

Results of Model Calibration and Validation

	Correlation Coefficient (R)		RMSE		MAPE (%)
Segment	BL–GTFS	BL–GT	BL–GTFS	BL–GT	BL–GTFS	BL–GT
Segment 1	0.658	0.611	17.224	33.228	8.242	12.268
Segment 2	0.384	0.389	21.985	22.344	11.823	14.796
Segment 3	0.588	0.549	20.705	32.397	10.081	15.269
Segment 4	0.413	0.405	33.228	35.784	12.364	17.113
Segment 5	0.481	0.487	22.669	29.853	15.569	14.696

Note: RMSE = root mean square error; MAPE = mean absolute percentage error; GTFS = General Transit Feed Specification Realtime; BL = Bluetooth; GT = Google Traffic.

Discussion and Conclusion

Large-scale deployment of sensors across transportation networks for real-time network traffic monitoring poses significant challenges, such as privacy concerns, high capital and maintenance costs, and limited coverage. This study introduced a novel methodology that uses GTFS Realtime data, an emerging real-time data source generated by public transit, for citywide network sensing and real-time monitoring of urban traffic conditions. Specifically, the proposed method uncovered the typical travel patterns of buses by isolating their operational events, such as boarding and alighting passengers at bus stops. Subsequently, two algorithms, the segment-trip extraction algorithm and the space mean speed estimation algorithm, were developed to implement the proposed methodology for space mean speed estimation.

A case study was conducted to assess the potential and effectiveness of GTFS Realtime data in capturing variations in link speed. Validation was carried out using Bluetooth data as the ground truth to verify the accuracy of the GTFS Realtime data in urban traffic network sensing, while Google Traffic data served as the benchmark for comparison. KS tests were employed to assess whether the speed variations of buses and regular traffic significantly differed from each other. Results indicated that GTFS Realtime data exhibited an overall similar variation pattern and consistent differences compared with Bluetooth data. Additionally, GTFS Realtime outputs were validated through regression analysis, correlation computation, and the validation set approach. The findings consistently demonstrated that GTFS Realtime data closely aligned with Bluetooth data, establishing them as a more reliable and practical traffic data source compared with Google Traffic.

GTFS stands out as a crucial asset in the evolving landscape of public transportation datasets. Its widespread adoption and standardization make it an invaluable resource for various stakeholders including departments of transportation (DOTs) and city councils. For entities such as DOTs and city councils, leveraging GTFS data offers extensive benefits in enhancing operational efficiency, service planning, and public engagement. By democratizing access to transit data, these organizations can embrace a more data-driven approach to decision making, directly affecting urban mobility and sustainability initiatives. Another notable advantage of GTFS Realtime data is their compatibility with other data sources. For example, combining GTFS with GPS data can offer real-time transit tracking, enriching passenger information systems ( 61 ). Additionally, integration with various transportation-related datasets empowers more streamlined route planning and service adjustments ( 62 ). Thus, GTFS Realtime data are a promising data source with great potential for research and practical applications.

While this study has demonstrated the potential of GTFS Realtime data for urban traffic monitoring, it is essential to acknowledge several limitations. First, the temporal resolution of GTFS Realtime data can result in data gaps during periods when buses are not in operation on specific routes. For instance, if no buses are operating on a particular road during a certain time period, the GTFS Realtime data for that period will be missing. This limitation may affect the accuracy and continuity of traffic monitoring. Second, in addition to the Bluetooth and Google Traffic data, other emerging data sources, such as connected vehicle (CV) data, should be used to provide additional validation for the proposed approach. Third, other important urban macroscopic traffic quantities, such as traffic flow and traffic density, should be estimated to comprehensively assess the suitability and feasibility of GTFS Realtime data for traffic monitoring applications.

Footnotes

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: Shangkun Jiang, Yuran Sun, Wai Wong, Xilei Zhao; data collection: Shangkun Jiang; analysis and interpretation of results: Shangkun Jiang, Wai Wong, Yiming Xu; draft manuscript preparation: Shangkun Jiang, Yuran Sun, Wai Wong, Yiming Xu, Xilei Zhao. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the U.S. Department of Transportation (U.S. DOT) through the Tier 1 University Transportation Center, the Center for Equitable Transit-Oriented Communities (CETOC), under Grant 69A3552348337.

Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of U.S. DOT.

References

Fan

Guthrie

Levinson

Waiting Time Perceptions at Transit Stops and Stations: Effects of Basic Amenities, Gender, and Security. Transportation Research Part A: Policy and Practice, Vol. 88, 2016, pp. 251–264.

Wang

Yang

Liang

Liu

A Review of the Self-Adaptive Traffic Signal Control System Based on Future Traffic Environment. Journal of Advanced Transportation, Vol. 27, 2018, pp. 1–12.

Wong

S. C.

Evaluation of the Impact of Traffic Incidents Using GPS Data. Proceedings of the Institution of Civil Engineers-Transport, Vol. 169, No. 3, 2016, pp. 148–162.

Farradyne

P. B.

Traffic Incident Management Handbook. Prepared for Federal Highway Administration, Office of Travel Management, Washington, D.C., 2000.

Wong

S. C.

Unbiased Estimation Methods of Nonlinear Transport Models Based on Linearly Projected Data. Transportation Science, Vol. 53, No. 3, 2019, pp. 665–682.

Wong

S. C.

Liu

H. X.

Bootstrap Standard Error Estimations of Nonlinear Transport Models Based on Linearly Projected Data. Transportmetrica A: Transport Science Vol. 15, No. 2, 2019, pp. 602–630.

Prommaharaj

Phithakkitnukoon

Demissie

M. G.

Kattan

Ratti

Visualizing Public Transit System Operation with GTFS Data: A Case Study of Calgary, Canada. Heliyon. Vol. 6, No. 4, 2020, pp. 1–16.

Aemmer

Ranjbari

MacKenzie

Measurement and Classification of Transit Delays Using GTFS-RT Data. Public Transport, Vol. 14, No. 2, 2022, pp. 263–285.

Shi

Abdel-Aty

Big Data Applications in Real-Time Traffic Operation and Safety Monitoring and Improvement on Urban Expressways. Transportation Research Part C: Emerging Technologies, Vol. 58, 2015, pp. 380–394.

10.

Martin

P. T.

Feng

Wang

Detector Technology Evaluation. Mountain-Plains Consortium, Fargo, ND, 2003.

11.

Jiang

Jin

C. J.

Microscopic Simulations of Traffic Congestion in Runyang Bridge: Comparisons Between Two Cases. In Sixth International Conference on Traffic Engineering and Transportation System (ICTETS 2022), SPIE, Guangzhou, China, Vol. 12591, 2023, pp. 865–870.

12.

Ashwini

B. P.

Sumathi

Data Sources for Urban Traffic Prediction: A Review on Classification, Comparison and Technologies. In 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), IEEE, 2020, pp. 628–635.

13.

Ali

S. S.

George

Vanajakshi

Venkatraman

A Multiple Inductive Loop Vehicle Detection System for Heterogeneous and Lane-Less Traffic. IEEE Transactions on Instrumentation and Measurement, Vol. 61, No. 5, 2011, pp. 1353–1360.

14.

Weijermars

W. A.

Van Berkum

E. C.

Detection of Invalid Loop Detector Data in Urban Areas. Transportation Research Record: Journal of the Transportation Research Board, 2006. 1945: 82–88.

15.

Sen

Maurya

Raman

Mehta

Kalyanaraman

Vankadhara

Roy

Sharma

Kyun Queue: A Sensor Network System to Monitor Road Traffic Queues. In Proceedings of the 10th ACM Conference on Embedded Network Sensor Systems, 2012, pp. 127–140.

16.

Wang

Nihan

N. L.

Freeway traffic speed estimation with single-loop outputs. Transportation Research Record: Journal of the Transportation Research Board, 2000. 1727: 120–126.

17.

Bachmann

Roorda

M. J.

Abdulhai

Moshiri

Fusing a Bluetooth Traffic Monitoring System with Loop Detector Data for Improved Freeway Traffic Speed Estimation. Journal of Intelligent Transportation Systems, Vol. 17, No. 2, 2013, pp. 152–164.

18. Wang, Nihan N. L. Can Single-Loop Detectors Do the Work of Dual-Loop Detectors? Journal of Transportation Engineering, Vol. 129, No. 2, 2003, pp. 169–176.

19. Robert. Video-Based Traffic Monitoring at Day and Night Vehicle Features Detection Tracking. In 2009 12th International IEEE Conference on Intelligent Transportation Systems, IEEE, 2009, pp. 1–6.

20. Khazukov, Shepelev, Karpeta, Shabiev, Slobodin, Charbadze, Alferova. Real-Time Monitoring of Traffic Parameters. Journal of Big Data, Vol. 7, 2020, pp. 1–20.

21. Kondyli, St. George, Elefteriadou. Comparison of Travel Time Measurement Methods Along Freeway and Arterial Facilities. Transportation Letters, Vol. 10, No. 4, 2018, pp. 215–228.

22. Fang, Meng, Zhang, Wang. A Low-Cost Vehicle Detection and Classification System Based on Unmodulated Continuous-Wave Radar. In 2007 IEEE Intelligent Transportation Systems Conference, Seattle, Washington, IEEE, 2007, pp. 715–720.

23. Zhuang, Wang. Edge-Based Traffic Flow Data Collection Method Using Onboard Monocular Camera. Journal of Transportation Engineering, Part A: Systems, Vol. 146, No. 9, 2020, p. 04020096.

24. Kianpisheh, Mustaffa, Limtrairut, Keikhosrokiani. Smart Parking System (SPS) Architecture Using Ultrasonic Detector. International Journal of Software Engineering and Its Applications, Vol. 6, No. 3, 2012, pp. 55–58.

25. Bar-Gera. Evaluation of a Cellular Phone-Based System for Measurements of Traffic Speeds and Travel Times: A Case Study from Israel. Transportation Research Part C: Emerging Technologies, Vol. 15, No. 6, 2007, pp. 380–391.

26. Herring, Hofleitner, Abbeel, Bayen. Estimating Arterial Traffic Conditions Using Sparse Probe Data. In 13th International IEEE Conference on Intelligent Transportation Systems, IEEE, 2010, pp. 929–936.

27. Wong, Shen, Zhao, Liu H. X. On the Estimation of Connected Vehicle Penetration Rate Based on Single-Source Connected Vehicle Data. Transportation Research Part B: Methodological, Vol. 126, 2019, pp. 169–191.

28. Jia, Wong S. C., Wong. Uncertainty Estimation of Connected Vehicle Penetration Rate. Transportation Science, Vol. 57, No. 5, 2023, pp. 1160–1176.

29. Brockfeld, Wagner, Lorkowski, Mieth. Benefits and Limits of Recent Floating Car Data Technology–An Evaluation Study. CDROM-WCTR2007, 2007, p. C2-830.

30. Zhuang, Wang. Edge-Based Traffic Flow Data Collection Method Using Onboard Monocular Camera. Journal of Transportation Engineering, Part A: Systems, Vol. 146, No. 9, 2020, p. 04020096.

31. Jain N. K., Saini R. K., Mittal. A Review on Traffic Monitoring System Techniques. In Soft Computing: Theories and Applications: Proceedings of SoCTA 2017, Bundelkhand University, Jhansi, Uttar Pradesh, India, 2019, pp. 569–577.

32. Zhao, Wong, Zheng, Liu H. X. Maximum Likelihood Estimation of Probe Vehicle Penetration Rates and Queue Length Distributions from Probe Vehicle Data. IEEE Transactions on Intelligent Transportation Systems, Vol. 23, No. 7, 2021, pp. 7628–7636.

33. Guo, Ban X. J. Urban Traffic Signal Control with Connected and Automated Vehicles: A Survey. Transportation Research Part C: Emerging Technologies, Vol. 101, 2019, pp. 313–334.

34. Hofleitner, Herring, Abbeel, Bayen. Learning the Dynamics of Arterial Traffic from Probe Data Using a Dynamic Bayesian Network. IEEE Transactions on Intelligent Transportation Systems, Vol. 13, No. 4, 2012, pp. 1679–1693.

35. Ramezani, Geroliminis. On the Estimation of Arterial Route Travel Time Distribution with Markov Chains. Transportation Research Part B: Methodological, Vol. 46, No. 10, 2012, pp. 1576–1590.

36. Zheng, Van Zuylen. Urban Link Travel Time Estimation Based on Sparse Probe Vehicle Data. Transportation Research Part C: Emerging Technologies, Vol. 31, 2013, pp. 145–157.

37. Jenelius, Koutsopoulos H. N. Urban Network Travel Time Prediction Based on a Probabilistic Principal Component Analysis Model of Probe Data. IEEE Transactions on Intelligent Transportation Systems, Vol. 19, No. 2, 2017, pp. 436–445.

38. Zheng, Liu H. X. Estimating Traffic Volumes for Signalized Intersections Using Connected Vehicle Data. Transportation Research Part C: Emerging Technologies, Vol. 79, 2017, pp. 347–362.

39. Zhao, Zheng, Wong, Wang, Meng, Liu H. X. Various Methods for Queue Length and Traffic Volume Estimation Using Probe Vehicle Trajectories. Transportation Research Part C: Emerging Technologies, Vol. 107, 2019, pp. 70–91.

40. Comert, Cetin. Queue Length Estimation from Probe Vehicle Location and the Impacts of Sample Size. European Journal of Operational Research, Vol. 197, No. 1, 2009, pp. 196–202.

41. Zhao, Zheng, Wong, Wang, Meng, Liu H. X. Estimation of Queue Lengths, Probe Vehicle Penetration Rates, and Traffic Volumes at Signalized Intersections Using Probe Vehicle Trajectories. Transportation Research Record: Journal of the Transportation Research Board, 2019. 2673: 660–670.

42. Wang, Work D. B., Sowers. Multiple Model Particle Filter for Traffic Estimation and Incident Detection. IEEE Transactions on Intelligent Transportation Systems, Vol. 17, No. 12, 2016, pp. 3461–3470.

43. Yang, Jiang, Liao, Wang. Evaluation of Car-Following Model for Inland Vessel-Following Behavior. Ocean Engineering, Vol. 284, 2023, p. 115196.

44. Xie, van Lint, Verbraeck. A Generic Data Assimilation Framework for Vehicle Trajectory Reconstruction on Signalized Urban Arterials Using Particle Filters. Transportation Research Part C: Emerging Technologies, Vol. 92, 2018, pp. 364–391.

45. Jiang, Chen X. M., Ouyang. Traffic State and Emission Estimation for Urban Expressways Based on Heterogeneous Data. Transportation Research Part D: Transport and Environment, Vol. 53, 2017, pp. 440–453.

46. Wang, Chen, Min, Chen. Urban DAS Data Processing and Its Preliminary Application to City Traffic Monitoring. Sensors, Vol. 22, No. 24, 2022, p. 9976.

47. Sarrab, Pulparambil, Awadalla. Development of an IoT Based Real-Time Traffic Monitoring System for City Governance. Global Transitions, Vol. 2, 2020, pp. 230–245.

48. Clemente R. D., González M. C. Understanding Vehicular Routing Behavior with Location-Based Service Data. EPJ Data Science, Vol. 10, No. 1, 2021, pp. 1–7.

49. Zhou, Zheng. How Long to Wait? Predicting Bus Arrival Time with Mobile Phone Based Participatory Sensing. In Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, Association for Computing Machinery, New York, NY, 2012, pp. 379–392.

50. Demissie M. G., Phithakkitnukoon, Sukhvibul, Antunes, Gomes, Bento. Inferring Passenger Travel Demand to Improve Urban Mobility in Developing Countries Using Cell Phone Data: A Case Study of Senegal. IEEE Transactions on Intelligent Transportation Systems, Vol. 17, No. 9, 2016, pp. 2466–2478.

51. Carli, Dotoli, Epicoco, Angelico, Vinciullo. Automated Evaluation of Urban Traffic Congestion Using Bus as a Probe. In 2015 IEEE International Conference on Automation Science and Engineering (CASE), Gothenburg, Sweden, IEEE, 2015, pp. 967–972.

52. Zhou, Jiang. Urban Traffic Monitoring with the Help of Bus Riders. In 2015 IEEE 35th International Conference on Distributed Computing Systems, Columbus, OH, IEEE, 2015, pp. 21–30.

53. Wessel, Widener M. J. Discovering the Space–Time Dimensions of Schedule Padding and Delay from GTFS and Real-Time Transit Data. Journal of Geographical Systems, Vol. 19, 2017, pp. 93–107.

54. Petersen N. C., Rodrigues, Pereira F. C. Multi-Output Bus Travel Time Prediction with Convolutional LSTM Neural Network. Expert Systems with Applications, Vol. 120, 2019, pp. 426–435.

55. Lin, Long. Real-Time Estimation of Urban Street Segment Travel Time Using Buses as Speed Probes. Transportation Research Record: Journal of the Transportation Research Board, 2009. 2129: 81–89.

56. Shu, Huang H. Y., Luo P. E., M. Y. Performance Evaluation of Vehicle-Based Mobile Sensor Networks for Traffic Monitoring. IEEE Transactions on Vehicular Technology, Vol. 58, No. 4, 2008, pp. 1647–1653.

57. Liu, Porr, Miller H. J. Realizable Accessibility: Evaluating the Reliability of Public Transit Accessibility Using High-Resolution Real-Time Data. Journal of Geographical Systems, Vol. 25, No. 3, 2023, pp. 429–451.

58. Kirchner, Schubert, Haas C. T. Characterisation of Real-World Bus Acceleration and Deceleration Signals. Journal of Signal and Information Processing, Vol. 2014, 2014, pp. 8–13.

59. Sun, Tirachini, Axhausen K. W., Erath, Lee D. H. Models of Bus Boarding and Alighting Dynamics. Transportation Research Part A: Policy and Practice, Vol. 69, 2014, pp. 447–460.

60. Chakroborty, Kikuchi. Using Bus Travel Time Data to Estimate Travel Times on Urban Corridors. Transportation Research Record: Journal of the Transportation Research Board, 2004. 1870: 18–25.

61. Antrim, Barbeau S. J. The Many Uses of GTFS Data–Opening the Door to Transit and Multimodal Applications. In Location-Aware Information Systems Laboratory at the University of South Florida, University of South Florida, Tampa, FL, Vol. 4, 2013.

62. Kujala, Weckström, Darst R. K., Mladenović M. N., Saramäki. A Collection of Public Transport Network Data Sets for 25 Cities. Scientific Data, Vol. 5, No. 1, 2018, pp. 1–4.

Real-Time Urban Traffic Monitoring Using Transit Buses as Probes
