Abstract
Driving cycles are a set of driving conditions and are crucial for the existing emission estimation model to evaluate vehicle performance, fuel efficiency, and emissions, by matching them with average speed to calculate the operating modes, such as braking, idling, and cruising. Although existing emission estimation models, such as the Motor Vehicle Emission Simulator (MOVES), are powerful tools, their reliance on predefined driving cycles can be limiting, as these cycles often do not accurately represent regional driving conditions, making the models less effective for city-wide analyses. To solve this problem, this paper proposes a modular neural network-based framework to estimate operating mode distributions bypassing the driving cycle development phase, utilizing macroscopic variables such as speed, flow, and link infrastructure attributes. The proposed method is validated using a well-calibrated microsimulation model of Brookline MA, the United States. The results indicate that the proposed framework outperforms the operating mode distribution calculated by MOVES based on default driving cycles, providing a closer match to the actual operating mode distribution derived from trajectory data obtained from the microsimulation. Specifically, the proposed model achieves an average RMSE of 0.04 in predicting operating mode distribution, compared with 0.08 for MOVES. The average error in emission estimation across pollutants is 8.57% for the proposed method, lower than the 32.86% error for MOVES. In particular, for the estimation of CO2, the proposed method has an error of just 4%, compared with 35% for MOVES. The proposed model can be utilized for real-time emissions monitoring by providing rapid and accurate emissions estimates with easily accessible inputs.
Keywords
In 2021, the transportation sector in the U.S. accounted for 67% of the country’s total petroleum consumption, with light-duty vehicles responsible for 63% of this usage ( 1 ). The extensive use of petroleum contributes significantly to greenhouse gas (GHG) emissions, with the transportation sector responsible for 29% of total U.S. GHG emissions in 2022, making it the largest contributor of direct emissions ( 2 ). Given the adverse environmental, social, and economic impacts of transportation-related emissions, researchers and practitioners have been intensively working to quantify these emissions.
Vehicle emissions are influenced by various factors, including driving style, traffic congestion, traffic control devices, vehicle performance, fuel quality, and ambient operating conditions ( 3 – 5 ). Therefore, existing emission models have incorporated a range of different variables to better reflect the impact of traffic conditions on emissions ( 6 , 7 ). The current emission modeling system comprises several models developed to estimate traffic emissions. These models can be broadly categorized into two types: fuel-based and travel-based ( 8 ). Fuel-based models directly use fuel consumption data, which is available from tax records, to estimate GHG based on emission factors expressed in grams per unit of fuel consumed. A notable example of the fuel-based models is the Computer Programme to Calculate Emissions from Road Transport (COPERT), developed by the European Environment Agency (EEA) ( 9 ), whereas travel-based emission models combine emission factors for specific regions with travel data to generate emission inventories. These models use emission factors expressed in emissions per unit of driving activity, which can be obtained through dynamometer tests or on-road emission testing.
In the U.S., two primary transportation emission models are currently in use: MOtor Vehicle Emission Simulator (MOVES), developed by the U.S. Environmental Protection Agency (EPA), and EMFAC, developed by the California Air Resources Board (CARB). These models estimate emissions using emission factors expressed as grams of emission per unit of driving activity, primarily based on traditional dynamometer tests of predefined driving cycles. As of July 29, 2024, MOVES4 is the latest version and is used by EPA for State Implementation Plans and transportation conformity analyses outside California ( 10 ).
Driving cycles, also known as driving schedules, are used in emission models for calculation of emissions, certification, and testing of new vehicles and engines ( 11 ). A driving cycle includes data points representing vehicle velocity at various times, reflecting real-world driving scenarios to assess vehicle performance metrics such as emissions and fuel economy. Driving cycles are categorized into modal and transient types: modal driving cycles involve constant acceleration and speed phases, whereas transient driving cycles feature frequent and dynamic changes in velocity ( 12 ).
In the U.S., the National Renewable Energy Laboratory (NREL) has advanced driving cycle development with its DRIVE (Drive-Cycle Rapid Investigation, Visualization, and Evaluation) tool ( 13 , 14 ). This tool utilizes Global Positioning System (GPS) and controller area network data to create custom driving cycles based on real-world activity. NREL also offers tools like DriveCAT and the Fleet DNA repository, which provide valuable insights for overcoming technical barriers and enhancing transportation technologies ( 15 – 19 ). Much research has been conducted to improve driving cycle accuracy. Zhang et al. analyzed start and idling activities to refine emission estimates in MOVES using the FleetDNA and CE-CERT databases, highlighting the need for specific data collection by fleet type ( 20 , 21 ). Ivanič developed driving cycles for residential refuse trucks in New York using thirty-three parameters, such as minimum, average, maximum, and standard deviation of speed and acceleration, and so forth ( 22 ). Shi et al. employed a chase car method and specified twelve parameters, including average road power ( 23 ), while Kamble et al. used five velocity and acceleration parameters ( 24 ). Lai et al. included average road resistance among ten parameters for city-specific bus driving cycles ( 25 ). Other studies, such as those by Galgamuwa et al. ( 26 ), Badusha and Ghosh ( 27 ), and Nesamani and Subramanian ( 28 ), have explored various methods and parameters for driving cycle development, with some suggesting weighted factors for different parameters. Kondaru et al. introduced weighing factors for different parameters according to their importance to develop real-world driving cycle ( 12 ). However, their study does not propose a proper methodology for determining the weighting factors.
Worldwide, several driving cycles have been established, including the Japanese Cycle (JC08) ( 29 ), Federal Test Procedure (FTP-75) ( 30 ), New European Driving Cycle (NEDC) ( 31 ), and Worldwide harmonized Light duty driving Test Cycle (WLTC) ( 13 ), CARB unified (LA92) cycle ( 32 ), and cycles for cities such as Athens, Melbourne, and Beijing ( 33 ). NEDC includes the urban drive and high-speed motorway drive sub-cycles ( 12 ). WLTC, developed by the United Nations Economic Commission for Europe, is the latest and aims to closely resemble real-world driving scenarios worldwide, though it does not account for regional variations. Country-specific or regional driving cycles can provide more accurate vehicle performance predictions in certain areas ( 12 , 13 ). A common approach to develop driving cycles involves selecting microtrips that best represent speed–time data traces. Microtrips are segments of data bounded by idle modes. The LA01 cycle uses a Monte Carlo simulation approach and Markov process theory to describe actual driving processes, creating cycles that match target Speed-Acceleration-Frequency Distributions ( 34 ). The quality of developed driving cycles depends on the selection of performance measures and the development method used. Most methodologies employ random or quasi-random methods for selecting microtrips ( 35 ).
Developing representative driving cycles is inherently challenging because of the need to capture the diverse and dynamic nature of real-world driving behaviors. These behaviors vary widely according to geography, traffic conditions, driver habits, and vehicle types. The penetration rate of those data is usually low because of the small number of dedicated vehicles used for data collection. Most developed local driving cycles suffer from small sample sizes collected from a few vehicles over short periods, making it difficult to represent all driving conditions accurately.
Studies suggest that standardized driving cycles often fail to distinguish between separate phases of urban roads, rural roads, and motorways, leading to inaccuracies in emissions estimation ( 33 , 36 ). Additionally, drive cycle development requires several parameters and different studies use varying numbers of parameters with different weights, complicating the process further. There is no standardized method for weighing these parameters, adding another layer of complexity to developing accurate and representative driving cycles. EPA’s MOVES model uses forty-nine drive cycles to represent all driving conditions and vehicle types. At the network level, emissions are estimated based on the link-averaged speed, which is critical in choosing the driving cycle that matches the closest average speed. Based on selected driving cycles, MOVES calculates speed and vehicle specific power (VSP), assigns operating modes based on speed and VSP ranges, and then computes the fraction of time spent in each mode. The operating mode fractions are adjusted to account for the difference between the link’s speed and the driving cycle speed through interpolation. This whole process may take longer running times (e.g., several days) to compute emissions in a traffic network on a city-wide scale ( 37 ). However, this approach does not account for link features such as speed limit, lanes and traffic control, and so forth. Consequently, using default driving cycles to estimate the operating mode distributions can be misleading, and developing local driving cycles remains challenging.
Given the challenges and limitations associated with traditional driving cycles and an activity-based model such as MOVES, there is a need for simpler and efficient models that rely on macroscopic traffic variables and network features. By developing models that relate operating mode distributions to easily accessible data, such as average speed, traffic volume, free flow speed, number of lanes, and intersection types, the emission estimation process can be simplified. Such models would provide more accurate and reliable emissions estimates while being easier to implement and less data-intensive, making them highly valuable for researchers and policymakers. Li et al. were the first to estimate operating mode distributions directly using macroscopic variables by building models between operating mode distributions and average speed to facilitate emission estimation ( 6 ). They found that arterials and collectors have different operating mode distributions even at the same average speed. However, their model only considers average speed and ignores infrastructure-related features that affect driving patterns.
To fill the gap, this study develops a methodology to directly estimate the city-wide operating mode distributions of traffic links in a traffic network. This estimation, using a modular neural network (MNN), leverages macroscopic traffic variables and link infrastructure features, providing a more efficient and potentially more accurate approach compared with traditional methods.
Currently, the typical application of MOVES for estimating emissions in an area involves using average link speeds to match default driving cycles, which are then used to calculate operating mode distributions and ultimately determine emissions.
The proposed approach suggests the use of easily accessible infrastructure and loop detector data to estimate the most appropriate operating mode distributions. The emission estimation process using MOVES and the proposed model are shown in Figure 1. MOVES Function-1 takes the average speed as input and selects the appropriate driving cycles from its default database. Function-2 processes the selected driving cycles to calculate operating mode distributions through VSP and speed bins. MOVES Function-3 processes the operating mode distribution and all other inputs and calculates the final emissions.

Calculating emissions using MOVES and the proposed model. The proposed model learns traffic dynamics from aggregated traffic and infrastructure data to directly infer accurate operating mode distributions in significantly less runtime, whereas MOVES uses infrastructure features not related to operating modes and bases its operating modes on drive cycles selected by speed.
To the best of our knowledge, this is the first study to use a machine learning model to estimate the distribution of all twenty-three operating modes in MOVES for every link in a city-wide area. The proposed approach has the potential to support traffic emissions analysis for research and decision making at various levels. The contributions of this paper are multifold. First, it develops an MNN designed to directly estimate the operating mode distribution of traffic links leveraging macroscopic traffic variables and link infrastructure features, providing an efficient and accurate method compared with traditional techniques. Second, it presents a framework for estimating traffic emissions on a city-wide scale. This framework is designed to operate efficiently within a reasonable runtime without compromising accuracy, thus addressing the limitations of existing methodologies.
In the rest of the paper, we present the proposed approach, followed by a case study to apply and validate the proposed model followed by conclusions with future work recommendations.
Approach
The critical step in the traffic emissions estimation process is to infer the appropriate distribution of the operating modes in the given area from the typical traffic and network data. For this, we use an MNN to relate traffic characteristics. The approach, in addition to speed, uses other variables such as number of lanes, speed limit, traffic volume, road class, traffic control type, and so forth. However, and for the purpose of this study, to validate the approach we use a microscopic traffic simulation model to obtain detailed trajectories (and thus driving cycles) that allow us to estimate emissions accurately with MOVES and establish the ground truth.
The methodological framework is shown in Figure 2. The framework for estimating operating mode distribution involves a traffic simulation model, input data, and an MNN. The traffic simulation model utilizes comprehensive traffic network data such as loop detector data, Origin–Destination (OD) data, traffic signal plans, and various road attributes like lanes, link length, speed limit, road class, and intersection traffic control. The simulation model generates trajectory data, which are processed to create the operating mode distributions. The operating mode distribution, aggregated traffic data, and infrastructure features are used as training data for MNN. The simulation data are needed solely for the initial training of the model. Once trained, the model requires only the easily accessible aggregated loop detector and network infrastructure data for subsequent use. The MNN network includes a shared layer followed by specialized layers for different speed categories: braking/idling, low speed, moderate speed, and high speed. The outputs from these layers are combined using a softmax function to produce the final operating mode distribution, which is passed to the corresponding MOVES module to estimate the traffic emissions.

Methodological framework of the proposed model.
Artificial neural networks (ANNs) are powerful tools for modeling complex, nonlinear systems and have demonstrated strong performance in diverse fields, including environmental monitoring, transportation analytics, and dynamic system modeling ( 38 , 39 ). Their flexibility enables them to capture high-dimensional patterns and learn representations from noisy or incomplete data ( 40 ). Building on this foundation, an MNN architecture is proposed to model the operating mode distribution using infrastructure, average speed, and volume data. MNN employs multiple ANNs as individual modules, each responsible for different aspects of the problem. The system operates by dividing the problem into smaller subproblems and assigning each subproblem to a different module. The results from these modules are then combined to produce the final output of the entire system ( 41 ). MNNs are particularly suited for problems with features common to several groups, as they allow for both shared learning and specialized processing ( 42 ).
In the context of MOVES, vehicle operating modes are categorized based on speed and VSP bins. Speed bins group operating modes into major categories such as idling, low speed, moderate speed, and high speed. Each major category, defined by a common speed, is handled by a separate module within the proposed architecture.
The detailed architecture of MNN is shown in Figure 3. The architecture includes an input layer, two shared layers, four specialized modules, an integrator, and an output layer. The vector

Modular neural network architecture with input nodes, hidden layers, and output layer.
Following the shared layers, the architecture employs four specialized modules to handle different operating modes. For the Braking (0) and Idling (1) modes, the data are processed through a fully connected layer with thirty-two neurons, followed by another fully connected layer with two neurons that outputs the probabilities for Bin 0 (B0) and Bin 1 (B1). These two modes are important for understanding vehicle behavior in urban traffic conditions where frequent stops and starts are common.
The low-speed module handles the prediction of operating modes 11 to 16, which correspond to low-speed conditions. In the low-speed module, the data are first processed by a fully connected layer with thirty-two neurons and then through another fully connected layer with six neurons, which predicts the probabilities for bins B11 to B16. Vehicles operating in B11–B16 operating modes are navigating through congested or low-speed areas.
The moderate-speed module is responsible for predicting the probabilities of operating modes 21 to 30, by processing the hidden features from the shared layer through a fully connected layer of thirty-two neurons followed by a layer with nine neurons. These moderate speeds are characteristic of smoother traffic flow conditions, such as those found on arterial roads. The high-speed module focuses on high-speed operating modes 33 to 40, which passes hidden features through a fully connected layer with thirty-two neurons and then a layer with six neurons that provides the probabilities for B33 to B40. The outputs from the specialized modules are integrated into a single prediction vector, which is then normalized using a softmax function to produce a probability distribution across all operating modes.
Case Study
To validate the proposed model, the paper uses Brookline, Massachusetts, as a case study. Brookline presents a diverse mix of urban and residential traffic patterns. Its road network includes a variety of street types, from major arteries to quiet residential roads. The town features a blend of dense commercial areas, residential neighborhoods, and university campuses. This land use mix results in a wide range of traffic scenarios with varying vehicle speeds. Brookline’s geographical and traffic characteristics make it a good candidate to apply and validate the proposed model. Additionally, the integration of approximately 100 air quality sensors within Brookline, as part of a larger project, provides extensive environmental data that enhance the scope of this research, enabling future studies to explore the interplay between traffic patterns and air quality. This infrastructure not only supports the immediate needs of this study but also lays the groundwork for subsequent investigations, thereby underscoring the suitability of Brookline as a model validation site.
Traffic Simulation
A comprehensive microsimulation model has been developed in TransModeler ( 43 ) for the City of Brookline, Massachusetts. The road geometry and lane information are obtained from OpenStreetMap (OSM) ( 44 ). Detailed information, such as lane width, the number of lanes at intersections, and lane connectivity are acquired from Google Street View (GSV) ( 45 ). The study area is divided into twenty-seven traffic analysis zones using MassDOT transportation planning data. Origin and destination were defined at each zone, as well as entry and exit points of the various roads that cross the boundaries for a total of 169 origins. Additionally, traffic signal plans of fifty-seven intersections and fixed loop detector data were obtained from the City of Brookline.
The network comprises 1814 links, each of which may consist of multiple segments to represent changes in road geometry accurately. Figure 4 displays the traffic simulation network of Brookline in TransModeler, highlighting the inclusion of almost every significant link. Red dots mark the locations where traffic count sensors are installed for calibration purposes. The simulation utilizes traffic input data from the morning peak hour (8 to 9 a.m.) to reflect the city’s traffic conditions during a critical period accurately.

Traffic microsimulation model of Brookline, MA, in TransModeler, with red dots indicating traffic count sensor locations. Map data ©2025 Google.
Model Calibration
The calibration process involves adjusting the model parameters to align simulated traffic patterns with observed data, particularly in the context of OD matrix estimation. Because of its importance, OD matrix estimation has been extensively studied by various researchers. Osorio studied dynamic OD matrix calibration for large-scale networks using simulation-based optimization ( 46 ). Tympakianaki et al. proposed a robust simultaneous perturbation stochastic approximation algorithm for dynamic OD matrix estimation ( 47 , 48 ). Toledo et al. presented methods for calibration and validation of microscopic models ( 49 , 50 ). Antoniou et al. presented calibration models and approaches for offline and online dynamic traffic assignment systems ( 51 – 53 ).
The calibration process is illustrated in Figure 5. A historical OD matrix (available from previous studies and planning models) is fed into the model, and the simulated traffic counts are then compared against the actual traffic counts obtained from loop detectors placed throughout the network. If the discrepancy between the simulated and actual traffic counts exceeds a predefined error threshold, the OD flows are updated accordingly, and the process is repeated. This iterative process continues until all simulated counts match the actual loop detector counts within an acceptable error threshold.

Traffic simulation model origin–destination (OD) flow calibration.
The calibration process ensures that the model accurately reflects the current traffic conditions. Figure 6 shows the observed and simulated traffic counts at the sensor locations before and after the calibration. Although some discrepancies between simulated and observed data still exist, the remaining differences are acceptable for this specific application.

Observed versus simulated traffic counts before and after the calibration.
Trajectory Data
Microscopic traffic simulation models output trajectory data for all vehicles during the entire simulation period at a high frequency (e.g., 1 Hz). Speed and acceleration are particularly important variables used to calculate VSP, a key input for determining the operating modes of vehicles at a detailed level. VSP is a measure of the power required by a vehicle to overcome various forces such as rolling resistance, aerodynamic drag, and inertia. By calculating VSP, vehicles can be classified into different operating modes, which are indicative of their driving behavior and energy consumption.
Operating Mode Distribution
Table 1 presents the operating mode bins as defined by the MOVES model, which categorizes vehicle operating modes based on VSP and speed. For instance, Bin 11 represents conditions where VSP is less than 0 and speed is between 1 mph and 25 mph, whereas Bin 35 represents conditions where VSP is between 6 and 12, and speed is greater than 50 mph. The braking Bin 0 refers to the condition when instantaneous deceleration is less than 2 mph/s, or the deceleration of continuous three-second data is less than 1 mph/s.
EPA MOVES Operating Mode Bins
Note: NA = not available.
The instantaneous VSP at time
where
The trajectory data are processed to calculate VSP for each time step, and operating modes are assigned based on the computed VSP values. The result is a detailed distribution of operating modes across all traffic segments.
Infrastructure Data
The infrastructure data provide a detailed representation of various parameters essential for traffic modeling and analysis. The road segments are characterized by multiple attributes, including the number of lanes, segment length, travel lanes, free flow speed, speed limit, road class, control type, and priority. These attributes collectively define the physical and regulatory environment of the segments, influencing vehicle behavior, route choices, and overall traffic dynamics.
The classification of roads, indicated by the class attribute, categorizes the segments into various types, including arterial, collector, access road, and local street. The control attribute defines the type of traffic control device present at the intersection of the segment’s end. A value of 0 indicates that there is no control device, and the segment continues onto another link. Other control types include actuated, pretimed, and roundabout, each representing different traffic signal operations or roundabout presence. Actuated signals adjust their phases based on real-time traffic conditions, whereas pretimed signals follow a fixed schedule. Priority codes are used to determine right-of-way between conflicting turning movements at intersections without explicit signals or signs.
Training
The simulation model was used to generate a training dataset. The resulting dataset was subsequently divided into training and testing subsets in an 80:20 ratio. The features within the data were normalized using the StandardScaler from the sklearn library in python, which standardizes features with zero mean and unit variance ( 54 ). To facilitate the training process, the training and testing datasets were converted into TensorDataset objects, and data loaders were employed to handle the data in mini-batches ( 55 ). Specifically, a batch size of thirty-two was utilized, meaning that thirty-two samples were processed before the model parameters were updated. The data loader ensures that during each epoch the training data are shuffled, thereby promoting better generalization by preventing the model from learning the order of the samples.
The model training used the mean square error (MSE) for the loss function, which is a common metric for regression tasks. An Adam optimizer ( 56 ) was employed for model optimization, with a learning rate set to 0.001. Adam optimizer is efficient and effective in training deep learning models. The model was trained for 500 epochs. The training of the proposed model was conducted on a system with a 13th Gen Intel(R) Core(TM) i7-1355U processor (1.70 GHz) and 16.0 GB of RAM. Figure 7 shows the training and test loss curves for the proposed model using MSE as a loss metric. Both curves exhibit a sharp decline in loss during the initial 200 epochs, indicating effective learning. However, beyond 200 epochs, the rate of decrease becomes less pronounced, with the training loss continuing to decrease steadily while the test loss shows only minimal improvement.

Training and test loss of the proposed model.
Evaluation
The study compares emissions estimated using two distinct approaches to demonstrate the performance of the proposed model. First, ground truth emissions are established by using a microscopic traffic simulation model to obtain detailed vehicle trajectories. These trajectories are then processed to calculate the operating mode distributions, which are subsequently used to estimate emissions accurately using MOVES. Second, MOVES default methodology is utilized to estimate emissions, where link speeds are fed to MOVES and MOVES’s default approach is used to select driving cycles and operating mode distributions.
The model evaluation was performed on a dataset consisting of 2779 traffic segments, each represented by thirteen features. The computation was carried out on a system equipped with a 13th Gen Intel(R) Core(TM) i7-1355U processor (1.70 GHz) and 16.0 GB of RAM. The execution, conducted using the PyTorch library, completed in just 81.54 ms.
To evaluate the model performance, we use the root mean square error (RMSE) and
Figure 8 shows the comparison of operating mode fractions using the proposed model and MOVES. In the figure, bins are selected from each of the four main modules of the proposed model to show their individual training and overall performance in the MNN. Each point in the plot represents the actual versus predicted operating mode fractions for a link. Figure 8a and b present the estimated operating mode fraction against the actual fraction for Bin 0 and Bin 1 (braking/idling module), respectively. Figure 8c and d show the estimated versus actual fractions for Bin 11 (low-speed module) and Bin 22 (moderate-speed module). Similarly, Figure 8e and f present the estimated versus actual fractions for Bin 30 and Bin 35 (high-speed module), respectively. The results show that bins with higher fractions result in better performance of the proposed model, as evidenced by the closer alignment of the estimated fractions with the actual data.

Comparison of operating model fractions obtained using the proposed model and MOVES against ground truth. (a) Estimated versus true operating mode fractions of Bin 0. (b) Estimated versus true operating mode fractions of Bin 1. (c) Estimated versus true operating mode fractions of Bin 11. (d) Estimated versus true operating mode fractions of Bin 22. (e) Estimated versus true operating mode fractions of Bin 30. (f) Estimated versus true operating mode fractions of Bin 35.
The best performance is observed for operating mode B1, which represents stop-and-go traffic typically encountered during urban peak hours. It has the lowest RMSE (0.0447) and the highest
For bins such as B0, B11, and B22, the proposed model also shows better performance with lower RMSE values and higher
Figure 9 illustrates the RMSE in the estimated fraction for each operating mode bin using the proposed model and MOVES. The green and orange bars represent the RMSE for the proposed model and MOVES, respectively. The results show that the proposed model consistently exhibits lower RMSE values across nearly all bins compared with MOVES. Operating modes associated with higher speeds, such as B39, are less common in urban areas. Consequently, these modes are under-represented in the training dataset, leading to a slightly higher RMSE when compared with MOVES. Augmenting the training dataset with additional data that increase the representation of such operating modes could improve the model performance.

Root mean square error in estimated operating mode fraction using the proposed model and MOVES.
The operating mode distributions obtained from the two approaches are passed to the relevant module of MOVES to estimate the corresponding emissions. Figure 10 provides a comparison of emissions of different pollutants using the detailed trajectories (used as ground truth), the proposed model, and MOVES. The pollutants analyzed include hydrocarbons (HC), carbon monoxide (CO), nitrogen oxides (NOx), nitric oxide (NO), carbon dioxide (CO2), and particulate matter (PM2.5). Each subplot in Figure 10 shows the emissions for a specific pollutant, with the bars representing the actual emissions in blue, the emissions estimated by the proposed model in green, and the emissions estimated by MOVES in orange.

Comparison of total emissions during the peak hour (8–9 a.m.) in Brookline, estimated using: (i) second-by-second speed profiles derived from simulated trajectory data mapped to each link and processed with vehicle specific power-based MOVES approach (“actual”), (ii) the proposed model, and (iii) EPA MOVES default drive cycle-based approach using link average speeds. Note: The “actual” emissions are not measured or taken directly from a microsimulation platform; rather, they are computed by post-processing simulated trajectory data to generate detailed speed profiles, allowing MOVES to avoid default drive cycles.
For HC, the proposed model and MOVES both overestimate emissions compared with the actual data, but the proposed model’s estimate is closer to the actual value. In the case of CO, the proposed model overestimates emissions, whereas MOVES significantly underestimates them. For NOx and NO, both models overestimate emissions, with the proposed model providing a better estimate than MOVES. For
Figure 11 shows the percentage emission estimation error for each pollutant, for the two approaches. The green and orange bars represent the percentage error for the proposed model and MOVES, respectively. The results show that for all pollutants, the proposed model exhibits a lower percentage error compared with MOVES. For instance, the proposed model shows significantly lower errors for pollutants such as HC, NOx, and PM2.5. MOVES, on the other hand, consistently has higher percentage errors across the pollutants. The results indicate that emissions estimates based on the default operating modes may be erroneous, and in some cases underestimate important pollutants such as PM2.5.

Percentage error in pollutant estimation using the proposed model and MOVES.
Conclusion
The study addresses the problem of developing representative driving cycles that reflect real-world driving behaviors. The study introduces a modular neural network-based approach that aims to estimate the operating mode distributions for a city-wide urban network as a function of easily accessible traffic and network features such as average speed, average volume, number of lanes, traffic control, and so forth. The study involved a comprehensive microsimulation of an urban traffic network using OD flow data, calibrated against sensor count data. The detailed traffic simulation framework underpinned the development of the NN-based model, which uses macroscopic traffic variables and link infrastructure features to learn the operating mode distributions.
The results from the proposed model show improvements over the traditional MOVES approach to approximate operating mode distribution from just average speed. For instance, the RMSE and
The analysis of emissions estimation validates the performance of the proposed approach, which provided closer estimates to the actual emissions across pollutants like HC, CO, NO, NO2, CO2, and PM2.5. The proposed model achieves an average RMSE of 0.04 in predicting operating mode distributions, compared with 0.08 for MOVES. Furthermore, the average error in emission estimation across pollutants is 8.57% for the proposed method, lower than the 32.86% error for MOVES. Notably, for CO2 estimation, the proposed method has an error of just 4%, compared with 35% for MOVES.
The potential integration of the proposed approach with emerging technologies presents a promising avenue for enhancing real-time traffic management systems. Edge computing could facilitate real-time inference at the network edge, enabling adjustments to traffic flow based on dynamic operating mode predictions. This could improve traffic management in urban settings, reducing congestion and optimizing route choices in real time. Furthermore, coupling the proposed approach with air quality prediction models could provide a more comprehensive tool for urban planners and environmental agencies. By predicting emission hotspots in real time, city authorities could implement more targeted air quality improvement measures, such as dynamic congestion pricing or emission-aware routing. The proposed approach can also be extended to interdisciplinary applications, particularly in smart city ecosystems. For instance, the precise estimation of operating modes at the link level allows for the implementation of emission-aware routing algorithms. Such applications could inform drivers or autonomous vehicle systems of the optimal routes that minimize emissions, thereby contributing to lower urban air pollution.
The proposed framework offers an improvement in computational efficiency over traditional emission estimation methods, such as the widely used MOVES framework. Traditional models, particularly when applied at a city-wide scale, often entail substantial computation times that can extend to several days, which limits their applicability in scenarios requiring real-time analysis and decision making. By leveraging aggregated traffic data from field sensors, the proposed model bypasses the lengthy computation phases. This efficiency makes the model viable even for real-time evaluation, with potential applications in dynamic traffic management and policy interventions. These advancements hold promise for scalable and responsive emission monitoring solutions that can adapt to the complex and variable demands of city-wide traffic systems, enhancing both environmental management and urban planning strategies.
Although the proposed model demonstrates improvements in accuracy over the default MOVES procedure, this paper primarily serves as a proof of concept for the feasibility of a NN-based approach. Important future work will include testing the model in contrasting environments to evaluate its adaptability and determine the data and effort thresholds required for retraining or fine-tuning. Additionally, future research will focus on training the model using readily available real-world trajectory data, such as GPS traces from OSM and other open-source platforms, to reduce reliance on simulation and improve the model’s accuracy and generalizability. This will include refining the neural network architecture to better capture actual operating mode distributions. We will also investigate potential data gaps and develop strategies to address them, ensuring the model’s applicability to diverse urban and non-urban settings. The test and training loss curves suggest that there is room for improvement in model performance, and future efforts will also explore parameter optimization and architectural enhancements to ensure robust and consistent results across diverse regions. Also, future research will explore model robustness under varying data quality conditions, including sensor noise and missing values. By leveraging trajectory data, we aim to conduct detailed sensitivity analyses and implement mitigation strategies such as data imputation and regularization techniques. Future work will also include roadway classification-level analysis of operating mode distributions as the framework is expanded to multi-city and national-scale applications, where robust comparisons across arterial, collector, and local road types become statistically viable.
Footnotes
Acknowledgements
The authors are grateful for the support of Northeastern University. The Large Language Model ChatGPT 3.5/4o was used to improve grammar, phrasing, and clarity of the manuscript.
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: M. Usama, H. Koutsopoulos; data collection: L. Wang; analysis and interpretation of results: M. Usama, H. Koutsopoulos, Z. He; draft manuscript preparation: M. Usama, Koutsopoulos, Z. He. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Zhengbing He is a member of Transportation Research Record’s Editorial Board. Apart from this, the authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
Any opinions, findings, conclusions, or recommendations expressed in the paper are those of the authors and do not necessarily reflect the views of the funding agencies.
