
Abstract

Driving cycles, which represent a set of driving conditions, are central to existing emission estimation models for evaluating vehicle performance, fuel efficiency, and emissions. These models match driving cycles to a link's average speed in order to calculate the distribution of operating modes, such as braking, idling, and cruising. Although existing emission estimation models such as the Motor Vehicle Emission Simulator (MOVES) are powerful tools, their reliance on predefined driving cycles can be limiting, because these cycles often do not represent regional driving conditions accurately, which makes the models less effective for city-wide analyses. To address this problem, this paper proposes a modular neural network-based framework that estimates operating mode distributions directly from macroscopic variables such as speed, flow, and link infrastructure attributes, bypassing the driving cycle development phase. The proposed method is validated using a well-calibrated microsimulation model of Brookline, Massachusetts, United States. The results indicate that the proposed framework outperforms the operating mode distribution calculated by MOVES from its default driving cycles, providing a closer match to the actual operating mode distribution derived from the trajectory data of the microsimulation. Specifically, the proposed model achieves an average RMSE of 0.04 in predicting the operating mode distribution, compared with 0.08 for MOVES. The average error in emission estimation across pollutants is 8.57% for the proposed method, compared with 32.86% for MOVES. For CO₂ in particular, the proposed method has an error of just 4%, compared with 35% for MOVES. Because it provides rapid and accurate emission estimates from easily accessible inputs, the proposed model can be used for real-time emissions monitoring.

Keywords

traffic emissions; operating mode distribution; EPA MOVES; modular neural networks; traffic simulation

In 2021, the transportation sector in the U.S. accounted for 67% of the country’s total petroleum consumption, with light-duty vehicles responsible for 63% of this usage ( 1 ). The extensive use of petroleum contributes significantly to greenhouse gas (GHG) emissions, with the transportation sector responsible for 29% of total U.S. GHG emissions in 2022, making it the largest contributor of direct emissions ( 2 ). Given the adverse environmental, social, and economic impacts of transportation-related emissions, researchers and practitioners have been intensively working to quantify these emissions.

Vehicle emissions are influenced by various factors, including driving style, traffic congestion, traffic control devices, vehicle performance, fuel quality, and ambient operating conditions ( 3 – 5 ). Therefore, existing emission models incorporate a range of variables to better reflect the impact of traffic conditions on emissions ( 6 , 7 ). The current emission modeling system comprises several models, which can be broadly categorized into two types: fuel-based and travel-based ( 8 ). Fuel-based models directly use fuel consumption data, which are available from tax records, to estimate GHG emissions based on emission factors expressed in grams per unit of fuel consumed. A notable example of a fuel-based model is the Computer Programme to Calculate Emissions from Road Transport (COPERT), developed by the European Environment Agency (EEA) ( 9 ). Travel-based emission models, in contrast, combine region-specific emission factors with travel data to generate emission inventories. These models use emission factors expressed as emissions per unit of driving activity, which can be obtained through dynamometer tests or on-road emission testing.

In the U.S., two primary transportation emission models are currently in use: MOtor Vehicle Emission Simulator (MOVES), developed by the U.S. Environmental Protection Agency (EPA), and EMFAC, developed by the California Air Resources Board (CARB). These models estimate emissions using emission factors expressed as grams of emission per unit of driving activity, primarily based on traditional dynamometer tests of predefined driving cycles. As of July 29, 2024, MOVES4 is the latest version and is used by EPA for State Implementation Plans and transportation conformity analyses outside California ( 10 ).

Driving cycles, also known as driving schedules, are used to calculate emissions in emission models and to certify and test new vehicles and engines ( 11 ). A driving cycle is a series of vehicle speeds over time that reflects real-world driving and is used to assess vehicle performance metrics such as emissions and fuel economy. Driving cycles are categorized as modal or transient: modal driving cycles consist of phases of constant acceleration and constant speed, whereas transient driving cycles feature frequent, dynamic changes in velocity ( 12 ).

In the U.S., the National Renewable Energy Laboratory (NREL) has advanced driving cycle development with its DRIVE (Drive-Cycle Rapid Investigation, Visualization, and Evaluation) tool ( 13 , 14 ). This tool uses Global Positioning System (GPS) and controller area network data to create custom driving cycles based on real-world activity. NREL also offers tools such as DriveCAT and the Fleet DNA repository, which provide valuable insights for overcoming technical barriers and enhancing transportation technologies ( 15 – 19 ). Much research has been conducted to improve driving cycle accuracy. Zhang et al. analyzed start and idling activities to refine emission estimates in MOVES using the FleetDNA and CE-CERT databases, highlighting the need for fleet-specific data collection ( 20 , 21 ). Ivanič developed driving cycles for residential refuse trucks in New York using thirty-three parameters, such as the minimum, average, maximum, and standard deviation of speed and acceleration ( 22 ). Shi et al. employed a chase car method and specified twelve parameters, including average road power ( 23 ), while Kamble et al. used five velocity and acceleration parameters ( 24 ). Lai et al. included average road resistance among ten parameters for city-specific bus driving cycles ( 25 ). Other studies, such as those by Galgamuwa et al. ( 26 ), Badusha and Ghosh ( 27 ), and Nesamani and Subramanian ( 28 ), have explored various methods and parameters for driving cycle development, with some suggesting weighting factors for different parameters. Kondaru et al. introduced weighting factors for different parameters according to their importance to develop a real-world driving cycle ( 12 ). However, their study does not propose a proper methodology for determining the weighting factors.
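Many of the parameters listed above are simple statistics of a speed–time trace. As a minimal sketch, assuming a 1 Hz trace in mph and accelerations taken as first differences of consecutive samples, the following computes a handful of them. The choice of statistics, the dictionary keys, and the 1 mph idle threshold are illustrative assumptions, not the parameter set of any of the cited studies.

```python
from statistics import mean, pstdev


def cycle_parameters(speeds_mph, idle_threshold_mph=1.0):
    """Summary statistics of a 1 Hz speed trace (mph).

    Accelerations are first differences of consecutive samples (mph/s).
    """
    if len(speeds_mph) < 2:
        raise ValueError("need at least two speed samples")
    accels = [b - a for a, b in zip(speeds_mph, speeds_mph[1:])]
    running = [v for v in speeds_mph if v >= idle_threshold_mph]
    positive = [a for a in accels if a > 0]
    negative = [a for a in accels if a < 0]
    return {
        "mean_speed": mean(speeds_mph),
        "max_speed": max(speeds_mph),
        "std_speed": pstdev(speeds_mph),
        "mean_running_speed": mean(running) if running else 0.0,
        "max_accel": max(accels),
        "max_decel": min(accels),  # most negative acceleration
        "mean_pos_accel": mean(positive) if positive else 0.0,
        "mean_neg_accel": mean(negative) if negative else 0.0,
        "idle_fraction": sum(v < idle_threshold_mph for v in speeds_mph) / len(speeds_mph),
    }


# Usage: a short stop-and-go trace.
params = cycle_parameters([0, 0, 5, 10, 10, 5, 0])
```

In a weighted scheme such as the one proposed by Kondaru et al., each of these statistics would then be compared between a candidate cycle and the full dataset, with the weights deciding how much each discrepancy counts.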

Worldwide, several driving cycles have been established, including the Japanese Cycle (JC08) ( 29 ), the Federal Test Procedure (FTP-75) ( 30 ), the New European Driving Cycle (NEDC) ( 31 ), the Worldwide harmonized Light duty driving Test Cycle (WLTC) ( 13 ), the CARB unified (LA92) cycle ( 32 ), and cycles for cities such as Athens, Melbourne, and Beijing ( 33 ). NEDC includes urban and high-speed motorway sub-cycles ( 12 ). WLTC, developed by the United Nations Economic Commission for Europe, is the most recent of these and aims to closely resemble real-world driving worldwide, although it does not account for regional variations. Country-specific or regional driving cycles can therefore provide more accurate vehicle performance predictions for particular areas ( 12 , 13 ). A common approach to developing driving cycles involves selecting the microtrips that best represent the recorded speed–time traces; microtrips are segments of data bounded by idle periods. The LA01 cycle uses a Monte Carlo simulation approach and Markov process theory to describe actual driving processes, creating cycles that match target Speed-Acceleration-Frequency Distributions ( 34 ). The quality of a developed driving cycle depends on the selected performance measures and the development method. Most methodologies employ random or quasi-random methods for selecting microtrips ( 35 ).
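Splitting a speed trace into microtrips can be sketched as follows. This assumes a 1 Hz trace in mph, treats speeds below 1 mph as idle (an assumed threshold), and follows one common convention in which each microtrip starts with its idle period and ends just before the next idle period begins. `split_microtrips` is a hypothetical helper, not part of any cited tool.

```python
def split_microtrips(speeds_mph, idle_threshold_mph=1.0):
    """Split a 1 Hz speed trace into microtrips bounded by idle periods.

    Each microtrip starts with its (possibly empty) idle period and ends just
    before the next idle period begins; idle-only fragments are dropped.
    """
    trips, current = [], []
    previous = None
    for v in speeds_mph:
        idle = v < idle_threshold_mph
        was_moving = previous is not None and previous >= idle_threshold_mph
        if idle and was_moving:
            trips.append(current)  # the vehicle just stopped: close this microtrip
            current = []
        current.append(v)
        previous = v
    trips.append(current)
    # Keep only segments that contain some driving.
    return [t for t in trips if any(v >= idle_threshold_mph for v in t)]


trips = split_microtrips([0, 0, 5, 10, 3, 0, 0, 4, 6, 0])
# trips == [[0, 0, 5, 10, 3], [0, 0, 4, 6]]
```

A random or quasi-random selection method would then draw from such a list of microtrips until the assembled cycle matches the target statistics.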

Developing representative driving cycles is inherently challenging because it requires capturing the diverse and dynamic nature of real-world driving behavior, which varies widely with geography, traffic conditions, driver habits, and vehicle types. The data used to develop driving cycles typically come from a small number of dedicated data-collection vehicles, so their penetration rate is low. Consequently, most local driving cycles are based on small samples collected from a few vehicles over short periods, making it difficult to represent all driving conditions accurately.

Studies suggest that standardized driving cycles often fail to distinguish between the distinct phases of urban roads, rural roads, and motorways, leading to inaccurate emission estimates ( 33 , 36 ). In addition, driving cycle development requires several parameters, and different studies use different numbers of parameters with different weights. Because there is no standardized method for weighting these parameters, developing accurate and representative driving cycles is even more complex. EPA's MOVES model uses forty-nine driving cycles to represent all driving conditions and vehicle types. At the network level, emissions are estimated from the link-averaged speed, which is used to select the driving cycles whose average speeds are closest to it. From the selected driving cycles, MOVES calculates speed and vehicle specific power (VSP), assigns operating modes based on speed and VSP ranges, and then computes the fraction of time spent in each mode. The operating mode fractions are then adjusted through interpolation to account for the difference between the link speed and the driving cycle speed. For a city-wide traffic network, this whole process can take a long time (e.g., several days) to compute emissions ( 37 ). Moreover, this approach does not account for link features such as speed limit, number of lanes, and traffic control. Consequently, using default driving cycles to estimate operating mode distributions can be misleading, and developing local driving cycles remains challenging.
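The interpolation step can be illustrated with a generic linear blend of the operating mode distributions of the two driving cycles whose average speeds bracket the link speed. This is a simplified sketch, not MOVES's internal implementation; the function name and the two cycle distributions in the usage line are hypothetical.

```python
from bisect import bisect_left


def interpolate_opmode_distribution(link_speed_mph, cycles):
    """Linearly blend the distributions of the two cycles bracketing a link speed.

    cycles: list of (average_speed_mph, {opmode_id: fraction}) sorted by speed.
    Speeds outside the covered range fall back to the nearest cycle.
    """
    speeds = [s for s, _ in cycles]
    if link_speed_mph <= speeds[0]:
        return dict(cycles[0][1])
    if link_speed_mph >= speeds[-1]:
        return dict(cycles[-1][1])
    i = bisect_left(speeds, link_speed_mph)  # speeds[i-1] < link speed <= speeds[i]
    (s_lo, d_lo), (s_hi, d_hi) = cycles[i - 1], cycles[i]
    w = (link_speed_mph - s_lo) / (s_hi - s_lo)
    modes = sorted(set(d_lo) | set(d_hi))
    return {m: (1 - w) * d_lo.get(m, 0.0) + w * d_hi.get(m, 0.0) for m in modes}


# Usage with two hypothetical cycle distributions.
cycles = [(10.0, {1: 0.4, 11: 0.6}), (30.0, {1: 0.1, 21: 0.9})]
blend = interpolate_opmode_distribution(20.0, cycles)
# blend == {1: 0.25, 11: 0.3, 21: 0.45} (up to floating-point rounding)
```

Because the blend depends only on average speed, two links with the same speed but different lane counts or traffic control receive the same distribution, which is the limitation noted above.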

Given the challenges and limitations of traditional driving cycles and of activity-based models such as MOVES, there is a need for simpler, more efficient models that rely on macroscopic traffic variables and network features. By developing models that relate operating mode distributions to easily accessible data, such as average speed, traffic volume, free flow speed, number of lanes, and intersection type, the emission estimation process can be simplified. Such models could provide more accurate and reliable emission estimates while being easier to implement and less data-intensive, making them valuable for researchers and policymakers. Li et al. were the first to estimate operating mode distributions directly from macroscopic variables, building models that relate operating mode distributions to average speed to facilitate emission estimation ( 6 ). They found that arterials and collectors have different operating mode distributions even at the same average speed. However, their model considers only average speed and ignores infrastructure-related features that affect driving patterns.

To fill this gap, this study develops a methodology to directly estimate the operating mode distributions of the links in a city-wide traffic network. The estimation uses a modular neural network (MNN) that leverages macroscopic traffic variables and link infrastructure features, providing a more efficient and potentially more accurate approach than traditional methods.

Currently, the typical application of MOVES for estimating emissions in an area involves using average link speeds to match default driving cycles, which are then used to calculate operating mode distributions and ultimately determine emissions.

The proposed approach instead uses easily accessible infrastructure and loop detector data to estimate the most appropriate operating mode distributions. The emission estimation processes of MOVES and of the proposed model are shown in Figure 1. MOVES Function-1 takes the average speed as input and selects the appropriate driving cycles from its default database. Function-2 processes the selected driving cycles to calculate operating mode distributions through VSP and speed bins. Function-3 processes the operating mode distribution and all other inputs to calculate the final emissions.
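A deliberately simplified view of Function-3 for a single link weights per-mode emission rates by the operating mode fractions and multiplies the result by the activity on the link. The sketch below assumes hypothetical emission rates in grams per vehicle-hour for each operating mode; actual MOVES runs apply many further adjustments (for example, for vehicle age, fuel, and meteorology) that are omitted here. The percentage-error helper illustrates how an estimate can be compared with a ground-truth value, in the spirit of the error figures reported in the abstract.

```python
def link_emissions_grams(opmode_fractions, rates_g_per_hr, vehicle_hours):
    """Emissions on one link: fraction-weighted emission rate times activity."""
    total = sum(opmode_fractions.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"operating mode fractions sum to {total}, not 1")
    weighted_rate = sum(f * rates_g_per_hr[m] for m, f in opmode_fractions.items())
    return weighted_rate * vehicle_hours


def percent_error(estimate, truth):
    """Absolute percentage error of an estimate relative to a ground-truth value."""
    return abs(estimate - truth) / truth * 100.0


# Usage with hypothetical rates (g per vehicle-hour) for three operating modes.
grams = link_emissions_grams(
    {0: 0.1, 1: 0.3, 11: 0.6}, {0: 100.0, 1: 50.0, 11: 200.0}, vehicle_hours=2.0
)
# grams is 290 (up to floating-point rounding)
```

Under this view, any error in the operating mode fractions propagates linearly into the emission estimate, which is why the accuracy of the distribution matters.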

Figure 1.

Calculating emissions using MOVES and the proposed model. The proposed model learns traffic dynamics from aggregated traffic and infrastructure data to directly infer accurate operating mode distributions in significantly less runtime, whereas MOVES uses infrastructure features not related to operating modes and bases its operating modes on drive cycles selected by speed.

To the best of our knowledge, this is the first study to use a machine learning model to estimate the distribution of all twenty-three MOVES operating modes for every link in a city-wide area. The proposed approach has the potential to support traffic emissions analysis for research and decision making at various levels. The contributions of this paper are twofold. First, it develops an MNN that directly estimates the operating mode distribution of traffic links from macroscopic traffic variables and link infrastructure features, providing a more efficient and accurate method than traditional techniques. Second, it presents a framework for estimating traffic emissions on a city-wide scale that operates within a reasonable runtime without compromising accuracy, thus addressing the limitations of existing methodologies.

In the rest of the paper, we present the proposed approach, followed by a case study that applies and validates the proposed model, and conclude with recommendations for future work.

Approach

The critical step in the traffic emission estimation process is to infer the appropriate operating mode distribution for the given area from typical traffic and network data. For this, we use an MNN that relates traffic and network characteristics to operating mode distributions. In addition to speed, the approach uses variables such as the number of lanes, speed limit, traffic volume, road class, and traffic control type. For the purpose of this study, however, we validate the approach using a microscopic traffic simulation model, which provides detailed trajectories (and thus driving cycles) from which emissions can be estimated accurately with MOVES to establish the ground truth.

The methodological framework is shown in Figure 2. It involves a traffic simulation model, input data, and an MNN. The traffic simulation model uses comprehensive traffic network data, such as loop detector data, Origin–Destination (OD) data, traffic signal plans, and road attributes including lanes, link length, speed limit, road class, and intersection traffic control. The simulation model generates trajectory data, which are processed to create the operating mode distributions. The operating mode distributions, aggregated traffic data, and infrastructure features are used as training data for the MNN. The simulation data are needed solely for the initial training of the model; once trained, the model requires only easily accessible aggregated loop detector data and network infrastructure data. The MNN includes shared layers followed by specialized modules for different speed categories: braking/idling, low speed, moderate speed, and high speed. The outputs of these modules are combined using a softmax function to produce the final operating mode distribution, which is passed to the corresponding MOVES module to estimate traffic emissions.

Figure 2.

Methodological framework of the proposed model.

Artificial neural networks (ANNs) are powerful tools for modeling complex, nonlinear systems and have demonstrated strong performance in diverse fields, including environmental monitoring, transportation analytics, and dynamic system modeling ( 38 , 39 ). Their flexibility enables them to capture high-dimensional patterns and learn representations from noisy or incomplete data ( 40 ). Building on this foundation, an MNN architecture is proposed to model the operating mode distribution using infrastructure, average speed, and volume data. MNN employs multiple ANNs as individual modules, each responsible for different aspects of the problem. The system operates by dividing the problem into smaller subproblems and assigning each subproblem to a different module. The results from these modules are then combined to produce the final output of the entire system ( 41 ). MNNs are particularly suited for problems with features common to several groups, as they allow for both shared learning and specialized processing ( 42 ).

In MOVES, vehicle operating modes are categorized based on speed and VSP bins. Together with braking and idling, the speed bins group the operating modes into four major categories: braking/idling, low speed, moderate speed, and high speed. Each category, defined by a common speed range, is handled by a separate module in the proposed architecture.

The detailed architecture of the MNN is shown in Figure 3. The architecture includes an input layer, two shared layers, four specialized modules, an integrator, and an output layer. The vector $x = [x_{1}, x_{2}, \dots, x_{n}]$ represents the input vector, $h_{m}^{k}$ represents the $k^{th}$ hidden feature in layer $m$, and ${\hat{y}}_{i}$ indicates the estimated fraction of operating mode $i$. The input layer passes the inputs to the shared layers. The raw inputs consist of six numeric and three categorical features; after the categorical features are one-hot encoded, the input vector contains thirteen features. The shared layers extract common features from the input data. The first is a fully connected hidden layer with 128 neurons that uses the Rectified Linear Unit (ReLU) activation function to capture nonlinear relationships. Its output is fed into a second fully connected layer with sixty-four neurons, which further refines these features using ReLU. These shared features serve as the foundation for the specialized modules.

Figure 3.

Modular neural network architecture with input nodes, hidden layers, and output layer.

Following the shared layers, the architecture employs four specialized modules to handle different operating modes. For the Braking (0) and Idling (1) modes, the data are processed through a fully connected layer with thirty-two neurons, followed by another fully connected layer with two neurons that outputs the probabilities for Bin 0 (B0) and Bin 1 (B1). These two modes are important for understanding vehicle behavior in urban traffic conditions where frequent stops and starts are common.

The low-speed module handles the prediction of operating modes 11 to 16, which correspond to low-speed conditions. In the low-speed module, the data are first processed by a fully connected layer with thirty-two neurons and then through another fully connected layer with six neurons, which predicts the probabilities for bins B11 to B16. Vehicles operating in B11–B16 operating modes are navigating through congested or low-speed areas.

The moderate-speed module predicts the probabilities of operating modes 21 to 30 by passing the hidden features from the shared layers through a fully connected layer of thirty-two neurons followed by a layer with nine neurons. These moderate speeds are characteristic of smoother traffic flow conditions, such as those found on arterial roads. The high-speed module, which handles operating modes 33 to 40, passes the hidden features through a fully connected layer with thirty-two neurons and then a layer with six neurons that provides the probabilities for B33 to B40. The outputs of the specialized modules are concatenated into a single prediction vector, which is then normalized using a softmax function to produce a probability distribution across all twenty-three operating modes.
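To make the data flow concrete, the sketch below implements the forward pass of the architecture described above in plain Python with untrained, randomly initialized weights. The layer sizes (13, 128, 64, then 32 followed by 2, 6, 9, and 6 outputs) follow the text; the ReLU activation inside each module's thirty-two-neuron layer, the weight initialization, and the names `MODULES`, `build_mnn`, and `forward` are assumptions for illustration. The authors used PyTorch, where the same structure would normally be written with `torch.nn.Linear` layers and trained as described in the Training section.

```python
import math
import random

# Operating mode IDs handled by each module (Table 1); the group names are illustrative.
MODULES = {
    "braking_idling": [0, 1],
    "low_speed": [11, 12, 13, 14, 15, 16],
    "moderate_speed": [21, 22, 23, 24, 25, 27, 28, 29, 30],
    "high_speed": [33, 35, 37, 38, 39, 40],
}


def dense(n_in, n_out, rng):
    """Fully connected layer with small random weights and zero biases."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def apply(layer, x, relu):
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def softmax(z):
    shift = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def build_mnn(n_features=13, seed=0):
    rng = random.Random(seed)
    shared = [dense(n_features, 128, rng), dense(128, 64, rng)]
    heads = {name: [dense(64, 32, rng), dense(32, len(modes), rng)]
             for name, modes in MODULES.items()}
    return shared, heads


def forward(model, x):
    shared, heads = model
    h = x
    for layer in shared:  # shared layers: 13 -> 128 -> 64, ReLU
        h = apply(layer, h, relu=True)
    logits, labels = [], []
    for name, modes in MODULES.items():  # each module: 64 -> 32 (ReLU) -> len(modes)
        hidden = apply(heads[name][0], h, relu=True)
        logits.extend(apply(heads[name][1], hidden, relu=False))
        labels.extend(modes)
    # Concatenate the 2 + 6 + 9 + 6 = 23 module outputs and normalize with softmax.
    return dict(zip(labels, softmax(logits)))


distribution = forward(build_mnn(), [0.5] * 13)
```

Applying a single softmax over the concatenated outputs, rather than one per module, lets the modules compete for probability mass, so the predicted fractions across all twenty-three modes sum to one.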

Case Study

To validate the proposed model, the paper uses Brookline, Massachusetts, as a case study. Brookline presents a diverse mix of urban and residential traffic patterns. Its road network includes a variety of street types, from major arterials to quiet residential roads, and the town features a blend of dense commercial areas, residential neighborhoods, and university campuses. This land use mix results in a wide range of traffic scenarios with varying vehicle speeds, which makes Brookline a good candidate for applying and validating the proposed model. Additionally, approximately 100 air quality sensors installed in Brookline as part of a larger project provide extensive environmental data that extend the scope of this research, enabling future studies to explore the interplay between traffic patterns and air quality. This infrastructure not only supports the immediate needs of this study but also lays the groundwork for subsequent investigations, further underscoring the suitability of Brookline as a model validation site.

Traffic Simulation

A comprehensive microsimulation model of the City of Brookline, Massachusetts, has been developed in TransModeler ( 43 ). The road geometry and lane information are obtained from OpenStreetMap (OSM) ( 44 ). Detailed information, such as lane width, the number of lanes at intersections, and lane connectivity, is acquired from Google Street View (GSV) ( 45 ). The study area is divided into twenty-seven traffic analysis zones using MassDOT transportation planning data. Origins and destinations were defined at each zone, as well as at the entry and exit points of roads that cross the study area boundary, for a total of 169 origins. Additionally, the traffic signal plans of fifty-seven intersections and fixed loop detector data were obtained from the City of Brookline.

The network comprises 1814 links, each of which may consist of multiple segments to accurately represent changes in road geometry. Figure 4 displays the traffic simulation network of Brookline in TransModeler, which includes almost every significant link. Red dots mark the locations of the traffic count sensors used for calibration. The simulation uses traffic input data from the morning peak hour (8 to 9 a.m.) to accurately reflect the city's traffic conditions during this critical period.

Figure 4.

Model Calibration

The calibration process involves adjusting the model parameters to align simulated traffic patterns with observed data, particularly in the context of OD matrix estimation. Because of its importance, OD matrix estimation has been extensively studied by various researchers. Osorio studied dynamic OD matrix calibration for large-scale networks using simulation-based optimization ( 46 ). Tympakianaki et al. proposed a robust simultaneous perturbation stochastic approximation algorithm for dynamic OD matrix estimation ( 47 , 48 ). Toledo et al. presented methods for calibration and validation of microscopic models ( 49 , 50 ). Antoniou et al. presented calibration models and approaches for offline and online dynamic traffic assignment systems ( 51 – 53 ).

The calibration process is illustrated in Figure 5. A historical OD matrix (available from previous studies and planning models) is fed into the model, and the simulated traffic counts are then compared against the actual traffic counts obtained from loop detectors placed throughout the network. If the discrepancy between the simulated and actual traffic counts exceeds a predefined error threshold, the OD flows are updated accordingly, and the process is repeated. This iterative process continues until all simulated counts match the actual loop detector counts within an acceptable error threshold.
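The loop in Figure 5 can be sketched as below. The paper does not specify the OD update rule, so this sketch assumes a simple multiplicative correction in which each OD flow is scaled by the average observed-to-simulated count ratio on the sensor links it traverses. `simulate` stands in for a full microsimulation run, and `usage` is a hypothetical OD-to-sensor incidence matrix. The published methods cited above, such as simulation-based optimization and SPSA, are considerably more sophisticated.

```python
def calibrate_od(od_flows, simulate, observed_counts, usage, tol=0.05, max_iter=100):
    """Iteratively rescale OD flows until simulated counts match observed counts.

    simulate(od_flows) -> simulated count per sensor (stand-in for a simulation run)
    observed_counts: positive observed count per sensor
    usage[i][j] is truthy if OD pair j traverses sensor link i (hypothetical input)
    tol is the maximum allowed relative error at any sensor.
    Returns the calibrated flows and the number of update iterations performed.
    """
    od = [float(f) for f in od_flows]
    for iteration in range(max_iter):
        simulated = simulate(od)
        errors = [abs(s - o) / o for s, o in zip(simulated, observed_counts)]
        if max(errors) <= tol:
            return od, iteration
        for j in range(len(od)):
            ratios = [observed_counts[i] / simulated[i]
                      for i in range(len(observed_counts))
                      if usage[i][j] and simulated[i] > 0]
            if ratios:  # OD pairs not seen by any sensor are left unchanged
                od[j] *= sum(ratios) / len(ratios)
    return od, max_iter


# Usage with a toy "simulator" in which each OD pair loads exactly one sensor.
flows, iters = calibrate_od([50, 50], lambda od: [od[0], od[1]], [100, 200],
                            usage=[[1, 0], [0, 1]])
# flows == [100.0, 200.0], iters == 1
```

In a real network, each call to `simulate` is a full microsimulation run, so the number of iterations, rather than the update arithmetic, dominates calibration time.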

Figure 5.

Traffic simulation model origin–destination (OD) flow calibration.

The calibration process ensures that the model accurately reflects the current traffic conditions. Figure 6 shows the observed and simulated traffic counts at the sensor locations before and after the calibration. Although some discrepancies between simulated and observed data still exist, the remaining differences are acceptable for this specific application.

Figure 6.

Observed versus simulated traffic counts before and after the calibration.

Trajectory Data

Microscopic traffic simulation models can output trajectory data for all vehicles over the entire simulation period at a high frequency (e.g., 1 Hz). Speed and acceleration are particularly important, because they are used to calculate VSP, a key input for determining vehicle operating modes at a detailed level. VSP is the power per unit of vehicle mass required to overcome forces such as rolling resistance, aerodynamic drag, and inertia. Using VSP together with speed, vehicle activity can be classified into operating modes, which are indicative of driving behavior and energy consumption.

Operating Mode Distribution

Table 1 presents the operating mode bins defined by MOVES, which categorizes vehicle operating modes based on VSP and speed. For instance, Bin 11 represents conditions where VSP is less than 0 and speed is between 1 mph and 25 mph, whereas Bin 35 represents conditions where VSP is between 6 and 12 and speed is 50 mph or greater. Braking (Bin 0) refers to conditions in which the instantaneous deceleration is 2 mph/s or greater, or the deceleration has exceeded 1 mph/s for three consecutive seconds.

Table 1.

EPA MOVES Operating Mode Bins

VSP (kW/tonne)	$1 \leq Speed < 25$ mph	$25 \leq Speed < 50$ mph	$Speed \geq 50$ mph
$VSP < 0$	Bin 11	Bin 21	Bin 33
$0 \leq VSP < 3$	Bin 12	Bin 22	Bin 33
$3 \leq VSP < 6$	Bin 13	Bin 23	Bin 33
$6 \leq VSP < 9$	Bin 14	Bin 24	Bin 35
$9 \leq VSP < 12$	Bin 15	Bin 25	Bin 35
$12 \leq VSP < 18$	Bin 16	Bin 27	Bin 37
$18 \leq VSP < 24$	Bin 16	Bin 28	Bin 38
$24 \leq VSP < 30$	Bin 16	Bin 29	Bin 39
$30 \leq VSP$	Bin 16	Bin 30	Bin 40
Braking	Bin 0	NA	NA
Idling	Bin 1	NA	NA

Note: NA = not available.
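Table 1 and the braking rule can be turned into a small lookup. The sketch below assumes VSP in kW/tonne and speed in mph, defines braking as a deceleration of at least 2 mph/s or a deceleration of more than 1 mph/s sustained for three consecutive seconds, treats speeds below 1 mph as idling, and checks braking before idling (an assumed precedence). The high-speed bins span several VSP rows: Bin 33 covers VSP below 6 and Bin 35 covers 6 to 12, consistent with the description of Bin 35 in the text. The function names are hypothetical.

```python
import math

# (exclusive upper VSP bound in kW/tonne, operating mode ID) for each speed band.
LOW_SPEED = [(0, 11), (3, 12), (6, 13), (9, 14), (12, 15), (math.inf, 16)]
MODERATE_SPEED = [(0, 21), (3, 22), (6, 23), (9, 24), (12, 25),
                  (18, 27), (24, 28), (30, 29), (math.inf, 30)]
HIGH_SPEED = [(6, 33), (12, 35), (18, 37), (24, 38), (30, 39), (math.inf, 40)]


def is_braking(recent_accels):
    """recent_accels: accelerations (mph/s) at t, t-1, t-2, most recent first."""
    if recent_accels[0] <= -2.0:
        return True
    return len(recent_accels) >= 3 and all(a < -1.0 for a in recent_accels[:3])


def op_mode(speed_mph, vsp, recent_accels):
    """Assign a MOVES operating mode ID to one vehicle-second."""
    if is_braking(recent_accels):  # braking checked first (assumed precedence)
        return 0
    if speed_mph < 1.0:
        return 1  # idling
    if speed_mph < 25.0:
        bins = LOW_SPEED
    elif speed_mph < 50.0:
        bins = MODERATE_SPEED
    else:
        bins = HIGH_SPEED
    for upper, mode in bins:
        if vsp < upper:
            return mode
    raise ValueError(f"VSP must be a finite number, got {vsp}")


# Usage: 30 mph with a VSP of 7 kW/tonne and no hard deceleration falls in Bin 24.
mode = op_mode(30.0, 7.0, [0.5, 0.2, 0.0])
```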

The instantaneous VSP at time $t$ is given by:

$$\mathrm{VSP}_{t} = \frac{c_{1}}{c_{2}} \frac{A v_{t}}{m} + \frac{c_{1}^{2}}{c_{2}} \frac{B v_{t}^{2}}{m} + \frac{c_{1}^{3}}{c_{2}} \frac{C v_{t}^{3}}{m} + c_{1}^{2} v_{t} a_{t} \tag{1}$$

where $A$ is the rolling resistance coefficient ($\mathrm{kW \cdot s / m}$); $B$ is the rotational resistance coefficient ($\mathrm{kW \cdot s^{2} / m^{2}}$); $C$ is the aerodynamic drag coefficient ($\mathrm{kW \cdot s^{3} / m^{3}}$); $m$ is the vehicle mass (lb); $v_{t}$ is the instantaneous velocity at time $t$ (mph); $a_{t}$ is the instantaneous acceleration at time $t$ (mph/s); $c_{1}$ is the conversion factor for speed; and $c_{2}$ is the conversion factor for vehicle weight.
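Equation 1 and its application to each second of a 1 Hz trajectory can be sketched as below. The numeric values of $c_{1}$ and $c_{2}$ are assumptions inferred from the stated units: $c_{1}$ converts mph to m/s and $c_{2}$ converts lb to metric tonnes, which yields VSP in kW/tonne. The backward-difference acceleration and the coefficient values in the usage line are likewise hypothetical. Road grade is omitted, as in Equation 1.

```python
C1_MPH_TO_MPS = 0.44704         # assumed c1: converts mph to m/s (exact factor)
C2_LB_TO_TONNE = 0.00045359237  # assumed c2: converts lb to metric tonnes (exact factor)


def vsp_kw_per_tonne(v_mph, a_mphps, A, B, C, mass_lb):
    """Equation 1: VSP on a flat road, in kW/tonne under the assumed c1 and c2."""
    c1, c2 = C1_MPH_TO_MPS, C2_LB_TO_TONNE
    return ((c1 / c2) * A * v_mph / mass_lb
            + (c1 ** 2 / c2) * B * v_mph ** 2 / mass_lb
            + (c1 ** 3 / c2) * C * v_mph ** 3 / mass_lb
            + c1 ** 2 * v_mph * a_mphps)


def vsp_series(speeds_mph, A, B, C, mass_lb):
    """VSP for each second of a 1 Hz trajectory.

    Acceleration at t is approximated by the backward difference v_t - v_{t-1}
    (mph/s); the first second is assigned zero acceleration.
    """
    accels = [0.0] + [b - a for a, b in zip(speeds_mph, speeds_mph[1:])]
    return [vsp_kw_per_tonne(v, a, A, B, C, mass_lb) for v, a in zip(speeds_mph, accels)]


# Usage with hypothetical coefficients for a light-duty vehicle.
trace_vsp = vsp_series([0.0, 10.0, 20.0, 20.0], A=0.156, B=0.002, C=0.0005, mass_lb=3300.0)
```

With all road-load coefficients set to zero, only the inertial term $c_{1}^{2} v_{t} a_{t}$ remains, which is a convenient sanity check on the conversion factors.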

The trajectory data are processed to calculate VSP for each time step, and operating modes are assigned based on the computed VSP values. The result is a detailed distribution of operating modes across all traffic segments.
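Once each vehicle-second has an operating mode, a link's distribution is the share of its vehicle-seconds spent in each mode. A minimal sketch, assuming 1 Hz records already tagged with a link identifier and an operating mode ID (the record format and function name are hypothetical):

```python
from collections import Counter, defaultdict

OPMODES = [0, 1, 11, 12, 13, 14, 15, 16, 21, 22, 23, 24, 25, 27, 28, 29, 30,
           33, 35, 37, 38, 39, 40]


def link_opmode_distributions(records):
    """records: iterable of (link_id, opmode_id) pairs, one per vehicle-second.

    Returns {link_id: {opmode_id: fraction of vehicle-seconds}} over all 23 modes.
    """
    valid = set(OPMODES)
    counts = defaultdict(Counter)
    for link_id, mode in records:
        if mode not in valid:
            raise ValueError(f"unknown operating mode {mode}")
        counts[link_id][mode] += 1
    result = {}
    for link_id, c in counts.items():
        total = sum(c.values())
        result[link_id] = {m: c[m] / total for m in OPMODES}
    return result


dist = link_opmode_distributions([("a", 1), ("a", 1), ("a", 11), ("a", 0), ("b", 35)])
```

Each per-link dictionary is a twenty-three-element target vector of the kind the MNN is trained to reproduce.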

Infrastructure Data

The infrastructure data provide a detailed representation of various parameters essential for traffic modeling and analysis. The road segments are characterized by multiple attributes, including the number of lanes, segment length, travel lanes, free flow speed, speed limit, road class, control type, and priority. These attributes collectively define the physical and regulatory environment of the segments, influencing vehicle behavior, route choices, and overall traffic dynamics.

The class attribute categorizes road segments into types including arterial, collector, access road, and local street. The control attribute defines the type of traffic control device at the intersection at the end of the segment. A value of 0 indicates that there is no control device and that the segment continues onto another link. Other control types include actuated, pretimed, and roundabout: actuated signals adjust their phases based on real-time traffic conditions, whereas pretimed signals follow a fixed schedule. Priority codes determine right-of-way between conflicting turning movements at intersections without explicit signals or signs.
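The one-hot encoding applied to categorical attributes such as these can be sketched as follows. The category lists come from the text above, with `"none"` standing in for control value 0; the numeric values in the usage line are hypothetical, and priority coding is omitted because its values are not listed, so this sketch does not reproduce the paper's exact thirteen-feature layout.

```python
ROAD_CLASSES = ["arterial", "collector", "access road", "local street"]
CONTROL_TYPES = ["none", "actuated", "pretimed", "roundabout"]  # "none" = control value 0


def one_hot(value, categories):
    """Return a 0/1 indicator vector with a single 1 at the position of value."""
    if value not in categories:
        raise ValueError(f"unexpected category {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode_segment(numeric_features, road_class, control):
    """Concatenate numeric features with one-hot encoded categorical attributes."""
    return (list(numeric_features)
            + one_hot(road_class, ROAD_CLASSES)
            + one_hot(control, CONTROL_TYPES))


x = encode_segment([2, 0.35, 25.0], "collector", "pretimed")
# x == [2, 0.35, 25.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```

One-hot encoding avoids imposing an artificial order on categories such as actuated versus pretimed control, which a single integer code would imply.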

Training

The simulation model was used to generate the dataset, which was then divided into training and testing subsets in an 80:20 ratio. The features were normalized using the StandardScaler from Python's scikit-learn library, which standardizes each feature to zero mean and unit variance ( 54 ). To facilitate training, the training and testing datasets were converted into PyTorch TensorDataset objects, and data loaders were used to process the data in mini-batches ( 55 ). Specifically, a batch size of thirty-two was used, meaning that thirty-two samples were processed before each update of the model parameters. The data loader shuffles the training data at each epoch, which promotes better generalization by preventing the model from learning the order of the samples.
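The preprocessing steps above can be mirrored in plain Python to show what each one does. This is a sketch under stated assumptions: the scaler is fitted on the training subset only (standard practice, not stated in the text), the population standard deviation is used as StandardScaler does, and `minibatches` imitates a shuffling data loader with a batch size of thirty-two. The actual pipeline used scikit-learn and PyTorch, and all function names here are hypothetical.

```python
import random
from statistics import mean, pstdev


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and split them 80:20 (by default) into train/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def fit_standardizer(rows):
    """Per-feature mean and population standard deviation (ddof=0, as in StandardScaler)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # constant features are left unscaled
    return means, stds


def standardize(rows, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def minibatches(items, batch_size=32, rng=None):
    """Yield shuffled mini-batches; call once per epoch for a fresh order."""
    rng = rng or random.Random()
    order = list(range(len(items)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]


# Usage on toy data: fit the scaler on the training rows only, then apply it to both subsets.
features = [[float(i), 2.0 * i + 1.0] for i in range(10)]
train_idx, test_idx = split_indices(len(features))
means, stds = fit_standardizer([features[i] for i in train_idx])
train_x = standardize([features[i] for i in train_idx], means, stds)
test_x = standardize([features[i] for i in test_idx], means, stds)
```

Splitting indices rather than rows keeps features and target distributions aligned, and fitting the scaler on the training subset alone keeps test-set statistics out of training.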

Model training used the mean squared error (MSE) as the loss function, a common choice for regression tasks. The model was optimized with the Adam optimizer ( 56 ), which is efficient and effective for training deep learning models, using a learning rate of 0.001, and was trained for 500 epochs. Training was conducted on a system with a 13th Gen Intel(R) Core(TM) i7-1355U processor (1.70 GHz) and 16.0 GB of RAM. Figure 7 shows the training and test loss curves of the proposed model, using MSE as the loss metric. Both curves decline sharply during the first 200 epochs, indicating effective learning. Beyond 200 epochs, the rate of decrease slows: the training loss continues to decrease steadily, whereas the test loss shows only minimal improvement.

Figure 7.

Training and test loss of the proposed model.

Evaluation

To demonstrate the performance of the proposed model, the study compares its estimates and those of the MOVES default methodology against a ground truth. The ground truth emissions are established by using a microscopic traffic simulation model to obtain detailed vehicle trajectories, which are processed to calculate the operating mode distributions that are then used to estimate emissions accurately with MOVES. In the MOVES default methodology, link speeds are fed to MOVES, and its default approach is used to select driving cycles and operating mode distributions.

The model evaluation was performed on a dataset of 2779 traffic segments, each represented by thirteen features. The computation was carried out on the same system used for training (13th Gen Intel(R) Core(TM) i7-1355U processor, 1.70 GHz, 16.0 GB of RAM). Executed with the PyTorch library, the evaluation completed in just 81.54 ms.

To evaluate model performance, we compute the root mean square error (RMSE) and the $R^{2}$ score, using Python's scikit-learn library, for the estimates of the proposed MNN model and of MOVES relative to the ground truth. The results show that the proposed model generally outperforms MOVES, particularly in bins with higher fractions of specific driving conditions.
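For reference, the two metrics can be written out in plain Python as below; the evaluation itself used scikit-learn. The handling of a constant target mirrors scikit-learn's default behavior, and the per-link fractions in the usage line are toy values, not results from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Negative values mean the predictions are worse than always predicting the
    mean of y_true. For a constant y_true, returns 1.0 for a perfect fit and
    0.0 otherwise, mirroring scikit-learn's default handling.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        return 1.0 if ss_res == 0 else 0.0
    return 1.0 - ss_res / ss_tot


# Usage: per-link fractions of one bin, true versus a reversed (poor) estimate.
true_fractions = [0.1, 0.2, 0.3, 0.4]
poor_estimate = [0.4, 0.3, 0.2, 0.1]
score = r2_score(true_fractions, poor_estimate)  # about -3, i.e., worse than the mean
```

The negative score in the usage line illustrates how an estimator can do worse than simply predicting the mean fraction for every link.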

Figure 8 compares the operating mode fractions estimated by the proposed model and by MOVES. Bins are selected from each of the four modules of the proposed model to show both the performance of the individual modules and the overall performance of the MNN. Each point in the plot represents the actual versus estimated operating mode fraction for a link. Figure 8a and b present the estimated versus actual fractions for Bin 0 and Bin 1 (braking/idling module), respectively. Figure 8c and d show the estimated versus actual fractions for Bin 11 (low-speed module) and Bin 22 (moderate-speed module). Similarly, Figure 8e and f present the estimated versus actual fractions for Bin 30 (moderate-speed module) and Bin 35 (high-speed module), respectively. The results show that the proposed model performs better for bins with higher fractions, as evidenced by the closer alignment of the estimated fractions with the actual data.

Figure 8.

Comparison of operating mode fractions obtained using the proposed model and MOVES against ground truth. (a) Estimated versus true operating mode fractions of Bin 0. (b) Estimated versus true operating mode fractions of Bin 1. (c) Estimated versus true operating mode fractions of Bin 11. (d) Estimated versus true operating mode fractions of Bin 22. (e) Estimated versus true operating mode fractions of Bin 30. (f) Estimated versus true operating mode fractions of Bin 35.

The best performance is observed for operating mode B1, which represents stop-and-go traffic typically encountered during urban peak hours. For the proposed model, B1 has the lowest RMSE (0.0447) and the highest $R^{2}$ score (0.9687), indicating an accurate fit to the actual data. The default MOVES process, although still reasonable in this bin, has a higher RMSE (0.1491) and a lower $R^{2}$ score (0.6516). This indicates that the proposed model captures the dynamics of stop-and-go traffic.

For bins such as B0, B11, and B22, the proposed model also performs better, with lower RMSE values and higher $R^{2}$ scores than MOVES. Notably, some bins, such as B13, B21, B24, and B30, exhibit negative $R^{2}$ scores for MOVES, which indicates that its predictions are worse than simply using the mean of the actual data ( 54 ). This demonstrates that MOVES, which relies on default driving cycles, fails to accurately represent the actual driving conditions for most of the operating modes. In contrast, the proposed model consistently performs better, although it is less accurate for bins with low fractions.
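The negative values follow directly from the definition $R^{2} = 1 - SS_{\text{res}}/SS_{\text{tot}}$. Predicting the mean of the observed fractions gives $SS_{\text{res}} = SS_{\text{tot}}$ and hence $R^{2} = 0$, so any prediction with a larger squared error than that yields a negative score. The toy example below uses made-up fractions. The constant offset only illustrates a systematic bias and does not reproduce MOVES output.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


actual = np.array([0.02, 0.05, 0.03, 0.06])  # hypothetical bin fractions on four links
mean_only = np.full_like(actual, actual.mean())
biased = actual + 0.05  # a systematic over-estimate of this bin on every link

print(r2(actual, mean_only))  # 0.0: no better than the mean
print(r2(actual, biased))     # approximately -9.0: worse than the mean
```

Here a bias of 0.05 on every link is large relative to the spread of the actual fractions, which is exactly the situation in which $R^{2}$ drops below zero even though the predicted values look plausible in isolation.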

Figure 9 shows the RMSE of the estimated fraction for each operating mode bin using the proposed model and MOVES. The green and orange bars represent the RMSE for the proposed model and MOVES, respectively. The proposed model has lower RMSE values than MOVES across nearly all bins. Operating modes associated with higher speeds, such as B39, are less common in urban areas and are therefore under-represented in the training dataset, which leads to a slightly higher RMSE for the proposed model than for MOVES in these bins. Augmenting the training dataset with data that increase the representation of such operating modes could improve model performance.

Figure 9.

Root mean square error in estimated operating mode fraction using the proposed model and MOVES.

The operating mode distributions obtained from the two approaches are passed to the relevant module of MOVES to estimate the corresponding emissions. Figure 10 provides a comparison of emissions of different pollutants using the detailed trajectories (used as ground truth), the proposed model, and MOVES. The pollutants analyzed include hydrocarbons (HC), carbon monoxide (CO), nitrogen oxides (NOx), nitric oxide (NO), carbon dioxide (CO₂), and particulate matter (PM2.5). Each subplot in Figure 10 shows the emissions for a specific pollutant, with the bars representing the actual emissions in blue, the emissions estimated by the proposed model in green, and the emissions estimated by MOVES in orange.
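Passing operating mode distributions to MOVES can be illustrated with a simplified running-emissions calculation. Each bin’s emission rate is weighted by the fraction of operating time spent in that bin, and the result is multiplied by the vehicle activity on the link, $E = \text{SHO} \sum_{i} f_{i} r_{i}$. This is a sketch under stated assumptions, not MOVES itself. Rates are in grams per vehicle-hour, activity is in source hours operating (SHO), and every number below is a hypothetical placeholder.

```python
def link_emissions(op_fractions, rates_g_per_hr, source_hours):
    """Emissions (g) for one link and one pollutant.

    op_fractions: {opMode: fraction of operating time}, summing to 1.
    rates_g_per_hr: {opMode: emission rate in g per vehicle-hour} (hypothetical values).
    source_hours: vehicle-hours of operation on the link.
    """
    if abs(sum(op_fractions.values()) - 1.0) > 1e-6:
        raise ValueError("operating mode fractions must sum to 1")
    weighted_rate = sum(f * rates_g_per_hr[m] for m, f in op_fractions.items())
    return source_hours * weighted_rate


# Hypothetical CO2 rates (g/hr) for three bins and two candidate distributions for one link.
rates = {1: 2000.0, 11: 4000.0, 22: 9000.0}
from_trajectories = {1: 0.3, 11: 0.3, 22: 0.4}
from_default_cycle = {1: 0.1, 11: 0.2, 22: 0.7}
hours = 25.0  # e.g., 300 vehicles spending 5 minutes each on the link

print(link_emissions(from_trajectories, rates, hours))
print(link_emissions(from_default_cycle, rates, hours))
```

Because both distributions use the same rates and activity, the entire difference between the two totals comes from the operating mode fractions. This is why errors in the estimated distribution carry directly into the emission estimates compared next.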

Figure 10.

Comparison of total emissions during the peak hour (8–9 a.m.) in Brookline, estimated using: (i) second-by-second speed profiles derived from simulated trajectory data mapped to each link and processed with the vehicle-specific power (VSP)-based MOVES approach (“actual”), (ii) the proposed model, and (iii) the EPA MOVES default drive cycle-based approach using link average speeds. Note: The “actual” emissions are not measured or taken directly from a microsimulation platform; rather, they are computed by post-processing simulated trajectory data to generate detailed speed profiles, which allows MOVES to avoid default drive cycles.

For HC, both the proposed model and MOVES overestimate emissions relative to the actual data, but the proposed model’s estimate is closer to the actual value. For CO, the proposed model overestimates emissions, whereas MOVES substantially underestimates them. For NOx and NO, both approaches overestimate emissions, with the proposed model providing the better estimate. For CO₂, both approaches overestimate ground truth emissions, but the proposed model’s estimate is more accurate than that of MOVES. Lastly, for PM2.5, the proposed model overestimates emissions, whereas MOVES underestimates them, with the proposed model providing the closer estimate to the actual value.

Figure 11 shows the percentage error in emission estimation for each pollutant under the two approaches. The green and orange bars represent the percentage error for the proposed model and MOVES, respectively. For all pollutants, the proposed model has a lower percentage error than MOVES, with particularly large differences for HC, NOx, and PM2.5. The results indicate that emission estimates based on default operating mode distributions may be inaccurate and, in some cases, underestimate important pollutants such as PM2.5.
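The text does not spell out the error formula. A common choice, assumed here, is the absolute percentage error relative to the trajectory-based (“actual”) total, averaged over pollutants. The function names are hypothetical, and the totals below are invented placeholders chosen only to exercise the code.

```python
def percentage_error(estimate, actual):
    """Absolute percentage error of an estimate relative to the actual value."""
    return abs(estimate - actual) / actual * 100.0


def mean_percentage_error(estimates, actuals):
    """Per-pollutant errors and their average across pollutants (dict keys)."""
    errors = {p: percentage_error(estimates[p], actuals[p]) for p in actuals}
    return errors, sum(errors.values()) / len(errors)


# Hypothetical peak-hour totals (kg); not the study's results.
actual = {"HC": 10.0, "CO": 200.0, "CO2": 50000.0}
model = {"HC": 11.0, "CO": 210.0, "CO2": 52000.0}

per_pollutant, avg = mean_percentage_error(model, actual)
print(per_pollutant, round(avg, 2))
```

Taking the absolute value treats over- and underestimation alike, which matters here because MOVES underestimates some pollutants (CO, PM2.5) and overestimates others. A signed error would let these cancel in the average.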

Figure 11.

Percentage error in pollutant estimation using the proposed model and MOVES.

Conclusion

This study addresses the difficulty of developing representative driving cycles that reflect real-world driving behavior. It introduces a modular neural network-based approach that estimates operating mode distributions for a city-wide urban network as a function of easily accessible traffic and network features, such as average speed, average volume, number of lanes, and traffic control. The study involved a comprehensive microsimulation of an urban traffic network using OD flow data and calibrated against sensor count data. This detailed traffic simulation framework underpinned the development of the NN-based model, which uses macroscopic traffic variables and link infrastructure features to learn operating mode distributions.

The results show that the proposed model improves on the traditional MOVES approach of approximating operating mode distributions from average speed alone. The RMSE and $R^{2}$ scores show better performance for the proposed model in almost all operating mode bins. Notably, Bin B1, which represents stop-and-go traffic under urban peak-hour conditions, showed the best performance. This underscores the capability of the proposed model to handle complex urban traffic patterns using readily available macroscopic traffic variables, traffic control devices, and infrastructure features. Moreover, MOVES performed poorly in several bins, such as B13, B21, B24, and B30, which indicates its limitations where default driving cycles fail to accurately represent local driving conditions. In contrast, the proposed model performed consistently better.

The emission estimation analysis further validates the proposed approach, which provided estimates closer to the actual emissions for HC, CO, NOx, NO, CO₂, and PM2.5. The proposed model achieves an average RMSE of 0.04 in predicting operating mode distributions, compared with 0.08 for MOVES. Furthermore, the average error in emission estimation across pollutants is 8.57% for the proposed method, lower than the 32.86% error for MOVES. Notably, for CO₂ estimation, the proposed method has an error of just 4%, compared with 35% for MOVES.

The potential integration of the proposed approach with emerging technologies presents a promising avenue for enhancing real-time traffic management systems. Edge computing could facilitate real-time inference at the network edge, enabling adjustments to traffic flow based on dynamic operating mode predictions. This could improve traffic management in urban settings, reducing congestion and optimizing route choices in real time. Furthermore, coupling the proposed approach with air quality prediction models could provide a more comprehensive tool for urban planners and environmental agencies. By predicting emission hotspots in real time, city authorities could implement more targeted air quality improvement measures, such as dynamic congestion pricing or emission-aware routing. The proposed approach can also be extended to interdisciplinary applications, particularly in smart city ecosystems. For instance, the precise estimation of operating modes at the link level allows for the implementation of emission-aware routing algorithms. Such applications could inform drivers or autonomous vehicle systems of the optimal routes that minimize emissions, thereby contributing to lower urban air pollution.

The proposed framework offers an improvement in computational efficiency over traditional emission estimation methods, such as the widely used MOVES framework. Traditional models, particularly when applied at a city-wide scale, often entail substantial computation times that can extend to several days, which limits their applicability in scenarios requiring real-time analysis and decision making. By leveraging aggregated traffic data from field sensors, the proposed model bypasses the lengthy computation phases. This efficiency makes the model viable even for real-time evaluation, with potential applications in dynamic traffic management and policy interventions. These advancements hold promise for scalable and responsive emission monitoring solutions that can adapt to the complex and variable demands of city-wide traffic systems, enhancing both environmental management and urban planning strategies.

Although the proposed model improves on the accuracy of the default MOVES procedure, this paper primarily serves as a proof of concept for the feasibility of an NN-based approach. Important future work includes testing the model in contrasting environments to evaluate its adaptability and to determine the data and effort required for retraining or fine-tuning. Future research will also focus on training the model with readily available real-world trajectory data, such as GPS traces from OpenStreetMap (OSM) and other open-source platforms, to reduce reliance on simulation and improve the model’s accuracy and generalizability. This will include refining the neural network architecture to better capture actual operating mode distributions. We will also investigate potential data gaps and develop strategies to address them, to ensure the model’s applicability to diverse urban and non-urban settings. The training and test loss curves suggest room for improvement in model performance, so future efforts will also explore parameter optimization and architectural enhancements to ensure robust and consistent results across diverse regions. In addition, future research will examine model robustness under varying data quality conditions, including sensor noise and missing values. Using trajectory data, we aim to conduct detailed sensitivity analyses and implement mitigation strategies such as data imputation and regularization. Finally, as the framework is expanded to multi-city and national-scale applications, where robust comparisons across arterial, collector, and local road types become statistically viable, future work will include analysis of operating mode distributions by roadway classification.

Footnotes

Acknowledgements

The authors are grateful for the support of Northeastern University. The Large Language Model ChatGPT 3.5/4o was used to improve grammar, phrasing, and clarity of the manuscript.

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: M. Usama, H. Koutsopoulos; data collection: L. Wang; analysis and interpretation of results: M. Usama, H. Koutsopoulos, Z. He; draft manuscript preparation: M. Usama, H. Koutsopoulos, Z. He. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Zhengbing He is a member of Transportation Research Record’s Editorial Board. Apart from this, the authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been supported by the Northeastern University Impact Engine Program.


ORCID iDs

Muhammad Usama

Haris N. Koutsopoulos

Zhengbing He

Lijiao Wang

Any opinions, findings, conclusions, or recommendations expressed in the paper are those of the authors and do not necessarily reflect the views of the funding agencies.

References

1. Davis, S. C., and Boundy, R. G. Transportation Energy Data Book: Edition 40. Oak Ridge National Laboratory, TN, 2022.

2. U.S. Environmental Protection Agency. Sources of Greenhouse Gas Emissions. https://www.epa.gov/ghgemissions/sources-greenhouse-gas-emissions. Accessed July 23, 2024.

3. Yang, Deng, and Huang. An Overview of Chassis Dynamometer in the Testing of Vehicle Emission. In MATEC Web of Conferences, Vol. 175, EDP Sciences, 2018, p. 02015.

4. Zhang and Jia. Estimating Carbon Dioxide Emissions of Freeway Traffic: A Spatiotemporal Cell-Based Model. IEEE Transactions on Intelligent Transportation Systems, Vol. 21, 2020, pp. 1976–1986.

5. Usama, Shen, and Zahoor. Towards an Energy Efficient Solution for Bike-Sharing Rebalancing Problems: A Battery Electric Vehicle Scenario. Energies, Vol. 12, No. 13, 2019, p. 2503.

6. Peng. Development of Operating Mode Distribution Models for Light-Duty Vehicles on Unrestricted Access Roadways. Transportation Research Record: Journal of the Transportation Research Board, 2020. 2674: 416–428.

7. Zahoor, Usama, Bao, Abbas, Shen, and Chen. LNG Bus Emissions Prediction Using Neural Network. Proc., 19th COTA International Conference of Transportation Professionals, CICTP 2019, Nanjing, China, American Society of Civil Engineers, Reston, VA, 2019, pp. 4156–4168.

8. Zhang, Zhao, D.-J., and Shen, J.-M. A Synthesis of Methodologies and Practices for Developing Driving Cycles. Energy Procedia, Vol. 16, 2012, pp. 1868–1873.

9. Ntziachristos, Gkatzoflias, Kouridis, and Samaras. COPERT: A European Road Transport Emission Inventory Model. Proc., Information Technologies in Environmental Engineering: Proceedings of the 4th International ICSC Symposium, Thessaloniki, Greece, Springer, Berlin, Heidelberg, May 28–29, 2009, pp. 491–504.

10. U.S. Environmental Protection Agency. MOVES4: Latest Version of Motor Vehicle Emission Simulator. https://www.epa.gov/moves/latest-version-motor-vehicle-emission-simulator-moves. Accessed July 23, 2024.

11. Giakoumis, E. G. Driving and Engine Cycles, Vol. 1. Springer, Cham, 2017.

12. Kondaru, M. K., Telikepalli, K. P., Thimmalapura, S. V., and Pandey, N. K. Generating a Real World Drive Cycle – A Statistical Approach. SAE Technical Paper 2018-01-0325. SAE International, 2018.

13. Tutuianu, Bonnel, Ciuffo, Haniu, Ichikawa, Marotta, Pavlovic, and Steven. Development of the Worldwide Harmonized Light Duty Test Cycle (WLTC) and a Possible Pathway for Its Introduction in the European Legislation. Transportation Research Part D: Transport and Environment, Vol. 40, 2015, pp. 61–75.

14. National Renewable Energy Laboratory. DRIVE: Drive-Cycle Rapid Investigation, Visualization, and Evaluation Analysis Tool. https://www.nrel.gov/transportation/drive. Accessed July 23, 2024.

15. Duran, Phillips, Perr-Sauer, Kelly, and Konan. Leveraging Big Data Analysis Techniques for U.S. Vocational Vehicle Drive Cycle Characterization, Segmentation, and Development. SAE Technical Paper 2018-01-1199. SAE International, 2018.

16. National Renewable Energy Laboratory. DriveCAT: Drive Cycle Analysis Tool. https://www.nrel.gov/transportation/drive-cycle-tool/. Accessed July 23, 2024.

17. Walkowicz, Duran, and Burton. Fleet DNA: Commercial Fleet Vehicle Operating Data. http://www.nrel.gov/fleetdna. Accessed July 23, 2024.

18. Kotz, A. J., and Kelly, K. J. MOVES Activity Updates Using Fleet DNA Data: Interim Report. National Renewable Energy Laboratory, Golden, CO, 2019.

19. Walkowicz, Duran, and Burton. Fleet DNA Project Data Summary Report. National Renewable Energy Laboratory, 2014. https://books.google.com/books?id=InFB0AEACAAJ.

20. Zhang, Ficenec, Kotz, Kelly, Sonntag, Fulper, Brakora, and Ballare. Heavy-Duty Vehicle Activity Updates for MOVES Using NREL Fleet DNA and CE-CERT Data. National Renewable Energy Laboratory, Golden, CO, 2022.

21. Zhang, Ficenec, Kotz, Kelly, Sonntag, Fulper, Brakora, and Ballare. Developing a Heavy-Duty Vehicle Activity Database to Estimate Start and Idle Emissions. Transportation Research Part D: Transport and Environment, Vol. 105, 2022, p. 103251.

22. Ivanic. Data Collection and Development of New York City Refuse Truck Duty Cycle. SAE Technical Paper 2007-01-4118. SAE International, 2007.

23. Shi, Zheng, and Wang. The Study of a New Method of Driving Cycles Construction. Procedia Engineering, Vol. 16, 2011, pp. 79–87.

24. Kamble, S. H., Mathew, T. V., and Sharma, G. K. Development of Real-World Driving Cycle: Case Study of Pune, India. Transportation Research Part D: Transport and Environment, Vol. 14, No. 2, 2009, pp. 132–140.

25. Lai, Song, Guo, and Chen. Development of City-Specific Driving Cycles for Transit Buses Based on VSP Distributions: Case of Beijing. Journal of Transportation Engineering, Vol. 139, No. 7, 2013, pp. 749–757.

26. Galgamuwa, Perera, and Bandara. A Representative Driving Cycle for the Southern Expressway Compared to Existing Driving Cycles. Transportation in Developing Economies, Vol. 2, 2016, pp. 1–8.

27. Badusha, A. A., and Ghosh. Driving Cycle for Buses in Delhi City. SAE Technical Paper 990036, 1999.

28. Nesamani, K. S., and Subramanian, K. P. Development of a Driving Cycle for Intra-City Buses in Chennai, India. Atmospheric Environment, Vol. 45, No. 31, 2011, pp. 5469–5476.

29. Zheng and Zhou. Real-World Driving Cycles and Energy Consumption Informed by Large-Sized Vehicle Trajectory Data. Journal of Cleaner Production, Vol. 223, 2019, pp. 564–574.

30. Trung, K. N., and T. T. Study of Emissions and Fuel Economy for Series-Parallel HEVs on FTP-75 and US-Highway-Cycle Driving Cycles. Proc., Advances in Engineering Research and Application: Proceedings of ICERA 2018, Springer, Cham, 2019, pp. 545–552.

31. Shim, B.-J., Park, K.-S., Koo, J.-M., and Jin, S.-H. Work and Speed Based Engine Operation Condition Analysis for New European Driving Cycle (NEDC). Journal of Mechanical Science and Technology, Vol. 28, 2014, pp. 755–761.

32. U.S. Environmental Protection Agency. LA92 “Unified” Dynamometer Driving Schedule. https://www.epa.gov/emission-standards-reference-guide/la92-unified-dynamometer-driving-schedule. Accessed July 23, 2024.

33. Gebisa, Gebresenbet, Gopal, and Nallamothu, R. B. Driving Cycles for Estimating Vehicle Emission Levels and Energy Consumption. Future Transportation, Vol. 1, No. 3, 2021, pp. 615–638.

34. Hung, W. T., Tong, H. Y., Lee, C. P., and Pao, L. Y. Development of a Practical Driving Cycle Construction Methodology: A Case Study in Hong Kong. Transportation Research Part D: Transport and Environment, Vol. 12, No. 2, 2007, pp. 115–128.

35. Quirama, L. F., Giraldo, Huertas, J. I., Tibaquirá, J. E., and Cordero-Moreno. Main Characteristic Parameters to Describe Driving Patterns and Construct Driving Cycles. Transportation Research Part D: Transport and Environment, Vol. 97, 2021, p. 102959.

36. Zhao, Sun, Wang, and Liu. Construction and Optimization of Representative Actual Driving Cycles Based on the Improved Autoencoder. Scientific Reports, Vol. 14, No. 1, 2024, p. 2351.

37. U.S. Environmental Protection Agency. Motor Vehicle Emission Simulator: MOVES4. Version 4.0.1, 2024. https://github.com/USEPA/EPA_MOVES_Model/blob/master/docs/TipsForFasterMOVESRuns.pdf.

38. Usama, Hart, and Wojcik. Physics-Informed Neural Networks (PINNs)-Based Traffic State Estimation: An Application to Traffic Network. Algorithms, Vol. 15, No. 12, 2022, p. 447.

39. Usama. Application of Machine Learning Techniques for Traffic State Estimation, Pattern Recognition, and Crash Detection. PhD dissertation. The University of Alabama in Huntsville, 2023.

40. Ibrar, M. A., Usama, and Salman, A. M. A Machine Learning Model for Detecting and Quantifying Tropical Cyclone Related Disturbance and Recovery in Estuaries. Scientific Reports, Vol. 15, 2025, p. 5230. https://doi.org/10.1038/s41598-025-89196-6.

41. Shukla, Tiwari, and Kala. Modular Neural Networks. In Towards Hybrid and Adaptive Computing: A Perspective (A. Shukla, Tiwari, and Kala, eds.), Springer, Berlin, Heidelberg, 2010, pp. 307–335.

42. Murre, J. M. J. Learning and Categorization in Modular Neural Networks. Psychology Press, New York, 2014.

43. Yang and Slavin. High Fidelity, Wide Area Traffic Simulation Model. Caliper Corporation, Boston, MA, 2002.

44. OpenStreetMap Contributors. Planet Dump. 2017. https://planet.openstreetmap.org/.

45. Google. Google Street View Image API. https://developers.google.com/maps/documentation/streetview/. Accessed July 23, 2024.

46. Osorio. Dynamic Origin-Destination Matrix Calibration for Large-Scale Network Simulators. Transportation Research Part C: Emerging Technologies, Vol. 98, 2019, pp. 186–206.

47. Tympakianaki, Koutsopoulos, H. N., and Jenelius. Robust SPSA Algorithms for Dynamic OD Matrix Estimation. Procedia Computer Science, Vol. 130, 2018, pp. 57–64.

48. Tympakianaki, Koutsopoulos, H. N., and Jenelius. C-SPSA: Cluster-Wise Simultaneous Perturbation Stochastic Approximation Algorithm and Its Application to Dynamic Origin–Destination Matrix Estimation. Transportation Research Part C: Emerging Technologies, Vol. 55, 2015, pp. 231–245.

49. Toledo, Koutsopoulos, H. N., Davol, Ben-Akiva, M. E., Burghout, Andréasson, Johansson, and Lundin. Calibration and Validation of Microscopic Traffic Simulation Tools: Stockholm Case Study. Transportation Research Record: Journal of the Transportation Research Board, 2003. 1831: 65–75.

50. Toledo, Ben-Akiva, M. E., Darda, Jha, and Koutsopoulos, H. N. Calibration of Microscopic Traffic Simulation Models with Aggregate Data. Transportation Research Record: Journal of the Transportation Research Board, 2004. 1876: 10–19.

51. Antoniou, Balakrishna, Koutsopoulos, H. N., and Ben-Akiva. Off-Line and On-Line Calibration of Dynamic Traffic Assignment Systems. IFAC Proceedings Volumes, Vol. 42, No. 15, 2009, pp. 104–111.

52. Antoniou, Balakrishna, Koutsopoulos, H. N., and Ben-Akiva. Calibration Methods for Simulation-Based Dynamic Traffic Assignment Systems. International Journal of Modelling and Simulation, Vol. 31, No. 3, 2011, pp. 227–233.

53. Antoniou, Ben-Akiva, and Koutsopoulos, H. N. Nonlinear Kalman Filtering Algorithms for On-Line Calibration of Dynamic Traffic Assignment Models. IEEE Transactions on Intelligent Transportation Systems, Vol. 8, No. 4, 2007, pp. 661–670.

54. Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, et al. Scikit-Learn: Machine Learning in Python. Journal of Machine Learning Research, Vol. 12, 2011, pp. 2825–2830.

55. Paszke, Gross, Massa, Lerer, Bradbury, Chanan, Killeen, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems, Vol. 32, 2019, pp. 8024–8035.

56. Kingma, D. P. Adam: A Method for Stochastic Optimization. arXiv Preprint arXiv:1412.6980, 2014.

Estimating City-wide Operating Mode Distribution of Light-Duty Vehicles: A Neural Network-based Approach
