Abstract
Accounting for the impact of drivers' subjective heterogeneity on lane-change (LC) decision-making behavior is essential for identifying drivers' LC intentions and predicting risks more accurately, thereby reducing traffic accidents caused by LC at freeway exit ramps. Based on the Next Generation Simulation dataset, this study considered drivers' risk preference heterogeneity and proposed a research framework that integrated three modules: a risk preference quantification module based on cumulative prospect theory (CPT), a LC intention recognition module combining a temporal convolutional network, long short-term memory, and self-attention, and a real-time LC risk prediction module based on the Light Gradient Boosting Machine (LightGBM). This approach combined LC intention recognition with real-time risk prediction, improving prediction efficiency and identifying the key features influencing LC risk. Results showed that CPT could intuitively describe drivers' risk preferences, transforming previous qualitative descriptions into quantitative analysis. Considering drivers' short-term risk preferences improved the model's intention recognition performance, with the best results obtained with a 2-s time window, where each index reached 95%. The LightGBM real-time risk prediction, triggered by LC intention recognition, achieved prediction performance of more than 93% for all indices and showed high sensitivity to risky LC behaviors, with prediction accuracy reaching 97%. Feature importance analysis suggested that, for left lane changes, drivers needed to prioritize vehicle speed control. In contrast, for right lane changes, the driver's risk preference was the most critical feature. The proposed framework can offer insights for vehicle safety warnings, support autonomous and assisted driving development, and provide theoretical references for traffic education and management.
The increasing demand for freeway travel has brought tremendous pressure to freeway traffic safety management. The National Highway Traffic Safety Administration reported that the number of motor vehicle traffic fatalities in 2023 was 40,901, estimating that there would be 39,345 traffic fatalities in 2024 ( 1 ), and traffic accidents caused by unsafe lane-changing (LC) operations account for 11% of all accidents ( 2 ). Therefore, LC behavior has been a significant focus in freeway traffic accident risk research, since LC behavior is often more complex and riskier compared with car-following behavior ( 3 ). The exit ramp usually serves as a connecting section between main roads and other roads ( 4 ). There are many vehicle LC interactions near the exit ramp area, and traffic accident rates here are much higher than in other locations ( 5 ). This is because driving patterns around exit ramps differ significantly: Vehicles traveling straight tend to maintain high speeds, whereas those intending to exit the freeway may make sudden lane changes to the outermost lane through actions such as emergency braking. This leads to numerous LC interactions between vehicles. Therefore, advance recognition of drivers’ LC intentions and prediction of LC risks near the exit ramp area have become a primary focus in research on traffic accident risks in these zones.
Previous studies on LC behavior have mainly been conducted based on driving simulator experiments or actual vehicle experiments ( 6 , 7 ). In the complex traffic environment near the exit ramp area, it is not easy to ensure the safety of drivers in actual vehicle experiments, whereas the human and material costs required for the driving simulator experiment increase with the growth of data volume. When a large amount of data with high resolution is needed, both methods face limitations. The Next Generation Simulation (NGSIM) trajectory dataset, provided by the Federal Highway Administration (FHWA), is a comprehensive, openly accessible, accurate, and high-resolution sample dataset that supports future traffic simulation research ( 8 ). Based on NGSIM, researchers have obtained large amounts of high-precision vehicle trajectory data containing microscopic driving information through data mining and image recognition technologies. Studies have used these data on drivers’ LC intentions and risks through machine learning algorithms and deep learning models ( 9 ). This study provides new insights into vehicle safety warnings in complex environments. It has the potential to significantly improve traffic safety and promote the development of autonomous and advanced driver assistance systems.
Past research on LC behavior has often been based on the erroneous assumption that drivers are fully rational, capable of obtaining all theoretical information and making optimal decisions ( 10 ). In reality, drivers exhibit bounded rationality in their LC behavior ( 11 ); drivers’ LC behavior is influenced by personal emotions, driving style, and other factors, leading to risk preference behavior decisions. Numerous studies have shown that identifying risk preferences and providing personalized driving warning information to drivers with different risk preferences help reduce dangerous driving operations and improve traffic safety ( 12 ). It is beneficial and necessary to consider drivers’ bounded rationality and the heterogeneity of risk preferences when studying LC risk.
Therefore, this study constructs a real-time LC risk prediction framework that starts with analyzing drivers’ risk preferences under bounded rationality and uses the LC intention recognition as a trigger for risk prediction. The framework includes three modules: a risk preference quantification module, a LC intention recognition module, and a real-time LC risk prediction module. The main contributions of this paper are twofold: (1) quantitatively analyzing drivers’ bounded rational risk preferences and using this heterogeneity to recognize LC intentions, and (2) developing a real-time LC risk prediction based on LC intention recognition, proposing new methods for risk quantification, and identifying the key factors influencing LC risk.
Literature Review
Risk Preferences Under Bounded Rationality
The theory of bounded rationality was first proposed by Simon ( 13 ) to explain how individuals make decisions in complex and uncertain environments considering individual differences, cognitive limitations, risk preferences, and limited reasoning abilities. As the study of bounded rationality continues to deepen, many emerging theories have been developed, such as regret theory, satisfaction principle, and cumulative prospect theory (CPT). These theories provide new perspectives for explaining individual decision-making behavior and have promoted the application of bounded rationality in transportation research.
Previous research on bounded rationality in transportation has primarily focused on path selection ( 14 ) and travel decision-making ( 15 ). Mahmassani and Jou ( 16 ) argued that decision-makers do not always pursue optimal decisions but rather seek relative ones that meet their needs. They conducted studies on travel path prediction and dynamic choice of departure times, exploring key influencing factors in the decision-making process ( 17 ). They compared decision outcomes under different theories and found that decision-makers’ risk preferences during the decision-making process were similar to the preferences for gains and losses described in CPT ( 18 ). This discovery provided new perspectives and methods for behavioral decision research. LC behavior, as a classic research area in behavioral decision-making, often exhibits significant characteristics of bounded rationality ( 19 ), which can significantly affect driving safety. However, existing research on behavioral decision-making rarely considers the bounded rationality of LC behavior, leading to discrepancies between research findings and actual decision-making behavior. To reduce the impact of these discrepancies on decision prediction, it is crucial to consider the bounded rationality characteristics of LC behavior.
Previous research on the specific manifestations and impacts of driver subjective heterogeneity in LC behavior has primarily focused on analyzing driving styles ( 20 ), cognitive characteristics, and driving experience ( 21 ). These studies often use various questionnaires, scales, or driving simulation experiments to investigate the subjective effects of these factors on drivers. Although some progress has been made, there is still a lack of specific consideration of the characteristics of bounded rationality in LC behavior. Risk preference, a vital feature of bounded rationality, is generally defined as the tendency to engage in risky activities ( 22 ). Since LC behavior is a decision-making process with inherent risk, drivers’ risk preferences will influence their LC decisions. Therefore, incorporating the concept of bounded rationality’s risk preferences into the study of LC behavior offers highly relevant application conditions and significant potential for future research.
LC Intention Recognition
Identifying LC intentions has always been an important topic in road safety research. The core idea is to extract indicators from vehicle trajectory data that can represent the driver’s potential LC behavior and then train machine learning or deep learning models on these indicators to identify the driver’s LC intentions. With the rapid development of computer technology, the models and algorithms for identifying LC intentions have been continuously updated. Common ones include support vector machines (SVM), the hidden Markov model (HMM), the Gaussian mixture model (GMM), gradient boosted decision trees (GBDT), and long short-term memory (LSTM).
Kumar et al. ( 23 ) extracted the vehicle’s lateral position and steering angle parameters and used SVM for multi-class probability analysis, finding that the model could accurately predict driving behavior with an average lead time of 1.3 s. Hou et al. ( 24 ) studied LC behavior using features such as the vehicle’s speed and the relative speed and distance to vehicles in the target lane based on models such as AdaBoost. Liu and Pentland ( 25 ) analyzed trajectory data before the driver performed maneuvers to identify specific driving intentions such as going straight, turning, or LC. They validated their successful research paradigm through driving simulation experiments and HMM. Li et al. ( 26 ) used a hybrid model based on an ensemble Bayesian network and GMM, with input features including the driver’s personality traits, the vehicle’s lateral position, and turn signals to identify LC intentions. Gu et al. ( 27 ) creatively proposed a GBDT-based method for extracting intermediate layer features to identify LC intentions and demonstrated the model’s effectiveness through simulation experiments. Do et al. ( 28 ) proposed a multiple model-based adaptive estimator to infer the LC intentions of surrounding vehicles. Zhang et al. ( 7 ) developed a driver LC intention recognition model using Stacking ensemble learning and found that the LC intention recognition model achieved higher accuracy than traditional algorithms.
An increasing number of studies have begun to apply methods such as deep learning to further improve the predictive accuracy of LC intention recognition models. Huang et al. ( 29 ) considered the complexity of LC behavior, using deep neural networks (DNN) to capture the driver’s operational behavior during the LC process. They analyzed the impact of these behaviors on LC prediction and pointed out the shortcomings of DNN in handling time-series data. Xie et al. ( 30 ) combined LSTM with a deep belief network to study the influencing features of LC behavior deeply and conducted LC decision prediction based on the feature analysis results. Han et al. ( 31 ) established a neural network with embedded LSTM layers to predict the driver’s LC intentions and validated the proposed model using natural driving data. Zhang et al. ( 32 ) used phase space reconstruction and recurrence plot techniques to convert time-series feature parameters into images and introduced the state-of-the-art Swin Transformer algorithm to develop a novel driver LC intention recognition model. Liu et al. ( 33 ) proposed a novel LC labeling algorithm that combined vehicle dynamics and eye tracking and found that the Transformer model outperformed gated recurrent units, LSTM, and convolutional neural network + LSTM models in recognizing LC intentions before the maneuver.
With regard to LC intention recognition, machine learning and deep learning models have addressed the shortcomings of previous methods. However, since the driver’s subjective factors highly influence LC behavior, it is essential to systematically consider both driver heterogeneity and environmental impacts based on past research. Introducing relevant feature parameters can achieve more accurate intention recognition. Additionally, most existing intention recognition studies only analyze a specific moment before or during the LC process, failing to capture the entire LC intention process of the driver effectively.
LC Risk Prediction
Quantifying risk is a crucial prerequisite for risk prediction, and the principle of risk quantification based on traffic conflicts has been one of the most used methods. Park et al. ( 34 ) improved the stopping sight distance (SSD) and proposed an evaluation method combining risk severity and risk likelihood, successfully using vehicle trajectory data to predict LC risk effectively. Huang et al. ( 35 ) proposed a risk assessment model based on field theory to quantify the dynamic driving risk during the LC process. Chen et al. ( 36 ) considered the impact of observational heterogeneity and analyzed the causes of accidents for different LC behaviors. They quantified LC risk based on the Lane-Change Risk Index (LCRI) principle and performed a comparative analysis using the K-means algorithm and logit model. The results showed significant impact differences related to the varying gap distances between vehicles.
With regard to the construction of risk prediction models, Chen et al. ( 37 ) analyzed the unobserved heterogeneity factors influencing LC risk, showing that factors related to vehicle distance and speed differences significantly affect LC risk. Shangguan et al. ( 38 ) constructed an intention recognition module based on the LSTM model and a risk prediction module based on the LightGBM algorithm, closely integrating the two modules to achieve a quantitative analysis of risk levels based on LC intentions. Additionally, researchers analyzed the factors that influence LC risk. Mahajan et al. ( 39 ) proposed a comprehensive set of methods to assess the collision risk of lane keeping (LK) and LC operations. They studied the relationship between LC risk and traffic flow conditions. The results indicated a significant correlation between speed reduction and LC risk. Zhang et al. ( 40 ) used the LightGBM based on Shapley additive explanation to predict LC risk for cautious, normal, and aggressive drivers and to analyze their risk factors.
Summary Note
In summary, in the field of transportation, research on decision-making behavior under bounded rationality is currently focused mainly on aspects such as route choice and travel mode, with relatively few studies considering LC behavior, which has a greater impact on road traffic safety. Compared with route choice and travel mode, the decision-making process for LC behavior is more complex and diverse. Drivers’ bounded rationality and risk preferences significantly influence LC intentions and risks. Machine learning and deep learning models can initially identify and predict driving intentions by exploring potential patterns in the data. However, since LC behavior is highly influenced by drivers’ subjective factors, it is necessary to systematically integrate drivers’ heterogeneity and environmental influences by introducing relevant feature parameters to achieve higher accuracy in intention recognition. With regard to LC risk prediction, existing research on LC behavior risks is often based on traffic conflicts and driving intentions under the assumption of complete rationality, utilizing vehicle kinematics and LC trajectories to predict risks while seldom considering the heterogeneity of drivers in interacting vehicles. Therefore, this study aims to leverage high-precision vehicle trajectory data to accurately identify drivers’ LC intentions near exit ramps under bounded rationality and risk preference psychology using machine learning and deep learning models. The LC intention recognition moment will then be used as the starting point for real-time LC risk prediction research.
Methodology
This study’s methodological framework is shown in Figure 1, which primarily includes three modules: risk preference quantification, LC intention recognition, and real-time LC risk prediction.

General methodology framework of this study.
Data Preparation
The NGSIM Trajectory Dataset
This study utilized data from the I-80 segment of the NGSIM dataset as the research subject. This is an open traffic dataset provided by the US FHWA. The I-80 data recorded the trajectories of vehicles on the freeway. The data were collected using video cameras and LiDAR devices at a sampling interval of 0.1 s per frame (10 Hz). Information, including vehicle position, speed, and acceleration, was recorded based on video image detection technology, covering various traffic conditions and driving behaviors ( 8 ). Additionally, this study calculated the vehicle’s lateral speed and acceleration to better describe the driver’s lateral driving behavior characteristics during the LC process.
Data Denoising and Reconstruction
To ensure the accuracy and validity of the data, this study denoised and reconstructed the data using the discrete wavelet transform. The removed components predominantly consist of outliers and observational errors characterized by significant deviations from neighboring values or physically implausible magnitudes. The discrete wavelet transform is given by (41):
where DWT (m, n) is the discrete wavelet transform,
For lateral speed and acceleration calculations, we adopted the same methodology used for longitudinal speed and acceleration in the NGSIM dataset, deriving them through the first-order and second-order derivatives of lateral positional data, respectively. According to Figure 2, comparing speed and acceleration before and after wavelet denoising (taking vehicle no. 1,234 as an example), the two types of outliers in the original data have essentially disappeared after denoising. There are no large data fluctuations, and the curves have become smoother and more continuous, demonstrating better data continuity and more closely reflecting the actual speed and acceleration changes.

Comparison of speed and acceleration before and after denoising. (a) Speed. (b) Acceleration.
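The denoising and differentiation steps described above can be illustrated with a minimal sketch. The wavelet family, decomposition depth, and thresholding rule are not specified in the text, so the Haar wavelet, three levels, and a soft universal threshold used here are assumptions, as are the function names `denoise` and `derivative`.

```python
import math
import statistics


def haar_dwt(signal):
    """One level of the Haar DWT; len(signal) must be even."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def denoise(signal, levels=3):
    """Haar wavelet denoising with a soft universal threshold (assumed choices)."""
    n = len(signal)
    padded = list(signal)
    while len(padded) % (2 ** levels):  # pad by repeating the last value
        padded.append(padded[-1])
    approx, details = padded, []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Noise scale estimated from the finest detail coefficients (MAD / 0.6745).
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(len(padded)))
    for d in reversed(details):
        approx = haar_idwt(approx, soft_threshold(d, thr))
    return approx[:n]


def derivative(values, dt=0.1):
    """Finite-difference derivative (central inside, one-sided at the ends)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out
```

Applying `derivative` once to the denoised lateral position gives lateral speed, and applying it again gives lateral acceleration, with `dt=0.1` matching the 0.1-s frame interval.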
Data Extraction
This study focuses on the single LC behavior of small vehicles near the exit ramp area. The research on the LC process is concentrated on the LC intention phase and the LC execution phase. The study specifically includes the following four steps:
Step 1: Identification of different driving behavior vehicles
As shown in Figure 3, by using vehicle ID and lane ID, the changes in lane ID for vehicles in adjacent time frames can be identified to determine whether a vehicle has performed a LC. The time point and type of successful LC, including left lane change (LCL), right lane change (LCR), and LK, are marked. On the I-80 section, which includes seven lanes from inside to outside, the lane ID decreases for LCL and increases for LCR.

Illustration of LC scenarios.
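A minimal sketch of Step 1 follows, assuming each vehicle's record is a time-ordered list of (frame ID, lane ID) pairs. The function names and the "multiple" label for vehicles with more than one lane change are illustrative, not taken from the study.

```python
def label_lane_changes(frames):
    """Return (frame_id, label) for every frame where the lane ID changes.

    frames: time-ordered list of (frame_id, lane_id) for one vehicle.
    Lane IDs increase from the inside lane to the outside lane, so a
    decreasing ID is a left lane change and an increasing ID a right one.
    """
    events = []
    for (_, prev_lane), (frame_id, lane) in zip(frames, frames[1:]):
        if lane < prev_lane:
            events.append((frame_id, "LCL"))
        elif lane > prev_lane:
            events.append((frame_id, "LCR"))
    return events


def classify_vehicle(frames):
    """Label a vehicle as LK, LCL, LCR, or 'multiple' (hypothetical label)."""
    events = label_lane_changes(frames)
    if not events:
        return "LK"
    if len(events) == 1:
        return events[0][1]
    return "multiple"
```

Because the study focuses on single LC behavior, vehicles labeled "multiple" in this sketch would be set aside.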
Step 2: Extraction of LC vehicle trajectory data
After identifying the LC time points, the start and end points of the LC are determined by searching before and after the LC time point. Using 0.1 s as the time step, the first point with continuous change in the horizontal coordinate is traced backward as the LC starting point, and the last point with continuous change in the horizontal coordinate is traced forward as the LC end point. The time window for extracting LC intention data starts 5 s before the LC starting point (defined as the moment when the lane ID changes) and continues until the lane change is completed. This window is designed to fully capture the driver’s decision-making process and intention dynamics both before and during the LC behavior.
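The tracing procedure in Step 2 can be sketched as below. The sketch assumes that "continuous change" means the lateral position keeps moving in the LC direction by more than a small tolerance `eps` per frame, and that the 5-s lead is counted back from the traced start point. Both the tolerance value and the function names are assumptions.

```python
def lc_bounds(lateral, cross_idx, eps=0.01):
    """Trace the LC start and end indices around the lane-ID change frame.

    lateral:   lateral positions sampled every 0.1 s
    cross_idx: index of the frame where the lane ID changes (must be >= 1)
    eps:       minimum per-frame lateral movement counted as "changing"
    """
    direction = 1 if lateral[cross_idx] >= lateral[cross_idx - 1] else -1

    def moving(i):
        # True if the step from frame i-1 to frame i continues the LC motion.
        return (lateral[i] - lateral[i - 1]) * direction > eps

    start = cross_idx
    while start > 0 and moving(start):
        start -= 1
    end = cross_idx
    while end < len(lateral) - 1 and moving(end + 1):
        end += 1
    return start, end


def intention_window(lateral, cross_idx, lead_frames=50):
    """Window from 5 s (50 frames) before the LC start to the LC end."""
    start, end = lc_bounds(lateral, cross_idx)
    return max(start - lead_frames, 0), end
```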
Step 3: Extraction of LK vehicle trajectory data
Vehicles whose lane ID does not change throughout their journey are defined as LK vehicles. According to relevant literature, the time to complete a single LC on the freeway is approximately 8 s ( 42 ). To maintain consistent sample lengths, the trajectory data for vehicles that do not change lanes from the data collection start point to 8 s later are extracted as the LK vehicle trajectory data.
Step 4: Adjustment for sample imbalance
There is a sample imbalance, since LC behavior occurs less frequently than LK behavior. Comprehensive sampling is necessary to prevent this imbalance from affecting subsequent research. For majority-class under-sampling, we applied a random under-sampling approach while retaining representative samples to ensure that key behavioral patterns were not lost. For minority-class over-sampling, we used SMOTE (Synthetic Minority Over-sampling Technique), which generates new samples through interpolation in the feature space rather than simple duplication, thus preserving data diversity and consistency.
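The interpolation idea behind SMOTE, together with random under-sampling, can be sketched in a few lines. This is a simplified illustration rather than the implementation used in the study (a library such as imbalanced-learn would normally be used); the function names, the neighbor count `k`, and the seeded random generator are assumptions.

```python
import random


def random_undersample(samples, n_keep, rng):
    """Keep n_keep majority-class samples chosen at random."""
    return rng.sample(samples, n_keep)


def smote(minority, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples by interpolation.

    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority neighbours in feature space.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, it stays within the region of feature space spanned by that class.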
Feature Extraction
Extracting effective LC features is a prerequisite for LC intention recognition and risk prediction. Direct and specific feature parameters should be prioritized. Therefore, the LC features extracted in this study include the target vehicle, surrounding vehicles (the specific definitions are shown in Figure 4), and their interaction features. The specific feature variables are shown in Table 1, with unit conversions already performed.
Initial Extracted Feature Parameters
Note: LV-OL means the lead vehicle in the original lane; FV-OL means the following vehicle in the original lane; LV-TL means the lead vehicle in the target lane; FV-TL means the following vehicle in the target lane.

Definition of surrounding vehicles.
Risk Preference Quantification Module
Quantitative Methods of Risk Preference
LC behavior is a typical risk decision-making behavior. To better quantify drivers’ risk preferences, this study introduces CPT ( 43 ), combining the decision-making concepts of bounded rationality with the risk decision-making preferences under LC conditions.
CPT divides the decision-making process into two stages: editing and evaluation. It mainly includes the selection of reference points, the construction of value functions, and probability weighting functions, involving research concepts such as reference dependence, value judgment, loss aversion, and probability distortion ( 44 ). The specific process is shown in Figure 5.

Decision process under cumulative prospect theory.
CPT describes risk preferences in the decision-making process as a process of judging gains and losses, asserting that judgments of gains and losses are based on relative changes from a reference point rather than absolute changes ( 45 ). Based on this reference point, the value function is divided into gain and loss regions, presenting an S-shape that is concave in the gain region and convex in the loss region, with diminishing sensitivity in both. This shape reflects decision-makers’ risk-averse psychology in gain decisions and risk-seeking preference in loss decisions. The loss branch is also steeper than the gain branch, a phenomenon known as “loss aversion” ( 46 ). The expression of the value function is given by:
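$$
v(x) =
\begin{cases}
x^{\alpha}, & x \ge 0 \\
-\lambda (-x)^{\beta}, & x < 0
\end{cases}
\tag{2}
$$
where $x$ is the gain ($x \ge 0$) or loss ($x < 0$) relative to the reference point, $\alpha$ and $\beta$ are the sensitivity coefficients for gains and losses, and $\lambda$ is the loss aversion coefficient.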
In addition, there is an imbalance in the decision-making weight distribution of event probabilities. Low-probability events often receive higher weights, whereas high-probability events are underestimated. This phenomenon is known as probability distortion. Probability distortion is an essential manifestation of the decision-maker’s bounded rationality. Therefore, CPT sets a probability weighting function to reflect this phenomenon.
The expressions of the probability weighting function are given by:
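$$
w^{+}(p) = \frac{p^{\gamma}}{\left[ p^{\gamma} + (1 - p)^{\gamma} \right]^{1/\gamma}} \tag{3}
$$
$$
w^{-}(p) = \frac{p^{\delta}}{\left[ p^{\delta} + (1 - p)^{\delta} \right]^{1/\delta}} \tag{4}
$$
where $w^{+}(p)$ and $w^{-}(p)$ are the probability weighting functions for gains and losses, $p$ is the objective probability of the event, and $\gamma$ and $\delta$ are the probability distortion coefficients for gains and losses.
$$
\pi_i^{+} = w^{+}(p_i + \cdots + p_n) - w^{+}(p_{i+1} + \cdots + p_n), \quad 0 \le i \le n \tag{5}
$$
$$
\pi_j^{-} = w^{-}(p_{-m} + \cdots + p_j) - w^{-}(p_{-m} + \cdots + p_{j-1}), \quad -m \le j < 0 \tag{6}
$$
where $\pi_i^{+}$ and $\pi_j^{-}$ are the cumulative decision weights for gains and losses, with outcomes ranked in increasing order $x_{-m} \le \cdots \le x_0 \le \cdots \le x_n$.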
The results mapped from a series of utilities
This allows the decision outcomes to be calculated for gains
Quantification of Risk Preference for LC Based on CPT
Editing Phase
CPT incorporates the decision-maker’s dependence on the reference point into decision analysis, providing a quantitative evaluation of the relative value of gains or losses. The mean, median, and mode of relevant decision characteristics are commonly referenced in past research. For LC behavior, the safe following distance and the speed relationship between the LC vehicle and surrounding vehicles are the main characteristic factors affecting the safety of LC behavior. Therefore, this paper uses SSD as the safe following distance for vehicles, and considering the impact of speed, it uses the average safe following distance and the average following speed of vehicles as dual reference points for value judgment in CPT. The calculation of SSD is given by:
where
CPT reflects the utility and preferences of drivers in different risk decisions by comparing the value function with the reference point value ( 47 ). To better quantify drivers’ LC risk preferences, this study considers the temporal nature of NGSIM trajectory data and proposes a method combining cumulative prospect and short-term cumulative prospect. The cumulative prospect value assesses the driver’s perception of overall driving risk and is used to quantify risk levels. In contrast, the short-term cumulative prospect value is calculated using a sliding time window to capture short-term risk perception, which can more accurately reflect the driver’s risk preferences at specific moments. The mean, 25th, 50th, and 75th percentiles of the reference point characteristics are used as evaluation metrics. The values of
The probability weighting function reflects the inaccuracy of drivers’ subjective judgments of gains or losses. This study uses the average safe following distance and average following speed as reference points. Since both exhibit synergy during the LC process, they share the same probability weighting function. The specific calculations are shown in Equations 3 and 4.
As the decision-making behavior of drivers is a processual behavior, the probability weighting dynamically changes as the process develops. Therefore, it is necessary to calculate the cumulative decision weights. The specific calculations are shown in Equations 5 and 6.
Evaluation Phase
For the parameters of the value function and the probability weighting function, Kahneman and Tversky ( 48 ) developed a method for assigning values to the CE paradigm based on the American driving population in their earlier research. They provided the parameter values as α = β = 0.88, λ = 2.25, and later further concluded γ = 0.61, δ = 0.69. Zeng ( 49 ) conducted the same CE paradigm experiment on Chinese subjects and found α = 1.21, β = 1.02, γ = 0.55, δ = 0.49, and
Based on the above parameter values, the decision-gain prospect
The above calculation process can produce the cumulative prospect value and short-term cumulative prospect value for different driving behaviors. These two values complement each other to describe the driver’s risk preferences. The cumulative prospect value reflects the overall risk preference characteristics of the driver, covering the entire behavior process. The short-term cumulative prospect value reflects the driver’s preference for immediate risk.
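The calculation chain from value function to cumulative prospect value can be sketched as follows. The sketch assumes the standard Tversky-Kahneman functional forms with the parameter values quoted above (α = β = 0.88, λ = 2.25, γ = 0.61, δ = 0.69). The short-term variant additionally assumes that the samples inside each sliding window are treated as equally likely outcomes, which the text does not state; all function names are illustrative.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """CPT value of an outcome x measured relative to the reference point."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta


def weight(p, c):
    """Inverse-S probability weighting with distortion coefficient c."""
    p = min(max(p, 0.0), 1.0)  # guard against floating-point drift
    return p ** c / (p ** c + (1.0 - p) ** c) ** (1.0 / c)


def cumulative_prospect(outcomes, probs, gamma=0.61, delta=0.69):
    """Cumulative prospect value of a discrete set of outcomes."""
    pairs = sorted(zip(outcomes, probs))
    gains = [(x, p) for x, p in pairs if x >= 0]
    losses = [(x, p) for x, p in pairs if x < 0]
    total = 0.0
    tail = 0.0
    for x, p in reversed(gains):  # accumulate from the best gain downward
        total += (weight(tail + p, gamma) - weight(tail, gamma)) * value(x)
        tail += p
    tail = 0.0
    for x, p in losses:  # accumulate from the worst loss upward
        total += (weight(tail + p, delta) - weight(tail, delta)) * value(x)
        tail += p
    return total


def short_term_prospects(deviations, window):
    """Prospect value over each sliding window (uniform probabilities assumed)."""
    out = []
    for end in range(window, len(deviations) + 1):
        chunk = deviations[end - window:end]
        out.append(cumulative_prospect(chunk, [1.0 / window] * window))
    return out
```

With these parameters a 50/50 gamble between equal-sized gain and loss has a negative prospect value, which is the loss-aversion behavior the module relies on.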
LC Intention Recognition Module
Feature Selection and Data Preprocessing
The selection of features for LC intention recognition is a crucial aspect of the intention recognition module. In this study, in addition to the basic characteristics of the target vehicle and the interaction features between the target vehicle and surrounding vehicles, the temporal nature of LC behavior and the heterogeneity of drivers’ short-term risk preferences are also considered to improve the accuracy of intention recognition. The specific feature selection is shown in Table 2.
Feature Selection for LC Intention Recognition
Note: TV = target vehicle; LV-OL = the lead vehicle in the original lane; LV-TL = the lead vehicle in the target lane; FV-TL = the following vehicle in the target lane; LK = lane keeping; LCL = left lane change; LCR = right lane change.
In addition, before building the model, the extracted LC sample data must be preprocessed to ensure its validity and accuracy. This study uses Python and related libraries to achieve this purpose. The preprocessing steps include:
Step 1: Filtering. Eliminate outliers in the time-series data by setting a window size and standard deviation threshold, replacing outliers with the median.
Step 2: Null value filling. Use forward and backward fill methods to perform linear interpolation for missing values in the data.
Step 3: Data cleaning. Remove samples that still contain null values after filling and duplicate samples.
Step 4: Label encoding and feature scaling. Perform one-hot encoding on the labels and apply min–max normalization to the features, scaling the feature values to the range of [−1, 1].
Step 5: Dataset splitting. Split the preprocessed data into training, validation, and test sets in a 6:2:2 ratio. Resample the samples using SMOTE.
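The filtering, filling, scaling, and encoding steps above can be sketched with the standard library alone. Window size, threshold, and function names are assumptions, and the sketch fills gaps with plain forward and backward fill. In practice the scaling bounds would typically be taken from the training set only.

```python
import statistics

CLASSES = ["LK", "LCL", "LCR"]


def replace_outliers(series, window=11, n_sigma=3.0):
    """Replace points far from the local median with that median."""
    half = window // 2
    cleaned = list(series)
    for i, v in enumerate(series):
        neigh = series[max(i - half, 0):i + half + 1]
        med = statistics.median(neigh)
        sd = statistics.pstdev(neigh)
        if sd > 0 and abs(v - med) > n_sigma * sd:
            cleaned[i] = med
    return cleaned


def fill_missing(values):
    """Forward fill, then backward fill any leading gaps (None = missing)."""
    out = list(values)
    for i in range(1, len(out)):
        if out[i] is None:
            out[i] = out[i - 1]
    for i in range(len(out) - 2, -1, -1):
        if out[i] is None:
            out[i] = out[i + 1]
    return out


def scale_minus1_1(values):
    """Min-max normalization to the range [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def one_hot(label):
    """One-hot encode an LK / LCL / LCR label."""
    return [1 if label == c else 0 for c in CLASSES]
```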
LC Intention Recognition Based on Temporal Convolutional Network–LSTM–Self-Attention
To better capture drivers’ LC intentions from trajectory data, this study considers the time-series characteristics of trajectory data and constructs a deep learning model combining a temporal convolutional network (TCN) and LSTM for LC intention recognition. Additionally, self-attention is utilized to weigh the input features. The structure of the model is shown in Figure 6.

Lane-changing intention recognition model structure.
The TCN model layer includes structures such as causal convolutions, dilated convolutions, and residual connections, which can effectively handle time-series information and have good scalability. The LSTM model layer can process the long-term dependencies of trajectory data by filtering feature data through its gate structures. Combining these two models allows for the simultaneous learning of local and global time dependencies and passes the output to the attention layer. The self-attention layer assigns weights to each time step of the LSTM output, indicating its importance to the final output. The flatten and dropout layers adapt to changes in data shape and prevent overfitting. The dense layer serves as the output layer, using Softmax as the activation function to output the recognition results for each category. The model uses accuracy, precision, recall, and F1 score as evaluation metrics, and plots the confusion matrix and receiver operating characteristic curve. The specific steps are as follows:
Step 1: Using sliding time windows to process time-series data
Segment the sample data with sliding time windows so that the characteristic information at each moment before the LC is considered, and utilize the memory effect to improve the prediction accuracy of the LC model (Figure 7). The sliding time window generally ranges from 1 to 5 s ( 50 ). The model establishes four different time windows, continuously inputting time-series data within a sliding window of a fixed length and updating the prediction results at 0.1-s intervals. This approach enables real-time LC intention recognition and risk prediction.

Sliding time window (e.g., 1 s).
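A minimal sketch of the windowing step, assuming a 0.1-s frame interval and a stride of one frame so that a new prediction can be issued every 0.1 s. The function name and arguments are illustrative.

```python
def sliding_windows(sequence, window_s, dt=0.1, step=1):
    """Cut a per-frame sequence into overlapping fixed-length windows.

    window_s: window length in seconds (e.g. 1.0 or 2.0)
    dt:       frame interval in seconds
    step:     stride in frames between consecutive windows
    """
    length = round(window_s / dt)
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]
```

For example, a 3-s trajectory (30 frames) cut with a 2-s window yields 11 windows of 20 frames each, the last one ending at the most recent frame.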
Step 2: Dataset splitting
Based on the train_test_split method, perform a time-series split on the dataset by setting the shuffle parameter to shuffle=False to ensure that the sequence of the time series is maintained. Split the input data into training, validation, and test sets in a 6:2:2 ratio. Additionally, set the stratify parameter to stratify=y, so that the data is split according to the label distribution, ensuring that the data quantity of each category remains relatively balanced.
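An order-preserving 6:2:2 split can be sketched without any library. Note that scikit-learn's train_test_split raises a ValueError when stratify is supplied together with shuffle=False, so this sketch keeps the chronological order and only reports the label counts of each subset for inspection; the function names are illustrative.

```python
from collections import Counter


def chronological_split(samples, labels, ratios=(0.6, 0.2, 0.2)):
    """Split samples into train/validation/test sets without shuffling."""
    n = len(samples)
    i = round(n * ratios[0])
    j = i + round(n * ratios[1])
    train = (samples[:i], labels[:i])
    val = (samples[i:j], labels[i:j])
    test = (samples[j:], labels[j:])
    return train, val, test


def class_counts(labels):
    """Label distribution of one subset, for checking class balance."""
    return dict(Counter(labels))
```

If the label counts differ strongly between subsets, balance has to be restored separately, for instance by resampling the training subset.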
Step 3: Hyperparameter optimization based on Bayesian optimization
Establish a probabilistic model using Bayesian methods to efficiently search the hyperparameter space with fewer experimental iterations, enhancing the performance of the neural network model in time-series tasks. The hyperparameter space definition includes the number of filters, convolution kernel size, dropout ratio, LSTM units, and learning rate, with preset value ranges for each hyperparameter. Use multi-class cross-entropy as the loss function and perform a hyperparameter search by minimizing the validation set loss. The Tree-Structured Parzen Estimator algorithm is used to try different hyperparameter combinations, and early stopping and model checkpoints are introduced to improve training efficiency and avoid overfitting.
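For reference, the selection rule of the Tree-Structured Parzen Estimator can be written compactly. The notation is an assumption introduced here: x denotes a hyperparameter combination, y the validation loss, y* a loss threshold chosen as a quantile γ of the observed losses, and ℓ and g the densities fitted to the better and worse trials, respectively.

```latex
p(x \mid y) =
\begin{cases}
\ell(x), & y < y^{*} \\
g(x),    & y \ge y^{*}
\end{cases},
\qquad
\gamma = p(y < y^{*}),
\qquad
\mathrm{EI}_{y^{*}}(x) \propto \left( \gamma + \frac{g(x)}{\ell(x)}\,(1 - \gamma) \right)^{-1}
```

Under this formulation, the expected improvement is largest for candidates with a high ratio ℓ(x)/g(x), so each iteration proposes hyperparameters that are likely under the density of good trials and unlikely under the density of poor ones.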
Step 4: Intention recognition based on TCN–LSTM–self-attention
Based on the results of hyperparameter optimization, the model is trained using the optimal hyperparameter combination. Utilizing the TCN–LSTM–self-attention model structure, the input feature data are processed to recognize the driver’s LC intentions efficiently, and the model outputs the recognition results.
Specifically, the model maintains causality through causal convolutions in the TCN layer and captures longer-time dependencies using dilated convolutions while avoiding increased computational complexity. Residual connections effectively address overfitting and gradient explosion issues. The LSTM layer controls data flow through its gating mechanism. The self-attention mechanism then computes the correlations between features at each time step in the time series, generating weighting coefficients to dynamically adjust the output, thereby capturing more important behavioral patterns. Finally, the output is processed through a Softmax activation function to classify LC intentions. For detailed information on the model structure, equations, and formulas, please refer to Appendix A.
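Two of the mechanisms named above, the causal dilated convolution and the attention weighting over time steps, can be sketched in a few lines of NumPy. This is an illustration only: it omits learned projections (queries, keys, and values are all taken to be the input itself), the LSTM, and training, and it is not the model specified in Appendix A.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before t = 0."""
    pad = (len(kernel) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left padding keeps the filter causal
    return np.array([
        sum(kernel[k] * xp[pad + t - k * dilation] for k in range(len(kernel)))
        for t in range(len(x))
    ])

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h):
    """Scaled dot-product self-attention over time steps; h has shape (T, D)."""
    scores = h @ h.T / np.sqrt(h.shape[1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ h, weights

x = np.arange(8.0)
y = causal_dilated_conv1d(x, np.array([1.0, 1.0]), dilation=2)
print(y)  # [ 0.  1.  2.  4.  6.  8. 10. 12.]

h = np.random.default_rng(0).normal(size=(5, 4))
context, weights = self_attention(h)
print(weights.sum(axis=1))
```

Because the convolution only reads x[t] and earlier samples, changing a future value leaves earlier outputs unchanged, which is the causality property referred to in the text.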
Ablation Study and Feature Importance Analysis
In addition, this study conducted an ablation experiment to better understand the model composition and the impact of feature factors on intention recognition. By successively removing specific layers and then training and evaluating the model, the recognition effects of different model layer combinations were compared to assess the contribution of each layer. Then, by calculating the marginal contribution of each input feature to the model output, the machine learning model was interpreted from global and local perspectives to analyze the importance of different features.
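One standard way to compute the marginal contribution of each feature is the Shapley value, which averages a feature's contribution over all subsets of the remaining features. The sketch below computes exact Shapley values for a toy value function; the feature names and scores are hypothetical, and the passage above does not state which attribution library the study used.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values for a set function value(frozenset_of_features) -> float."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution
        phi[f] = total
    return phi

def toy_score(subset):
    """Hypothetical model score: additive terms plus one interaction."""
    base = {"lat_speed": 0.3, "risk_pref": 0.2, "rel_speed": 0.1}
    score = sum(base[f] for f in subset)
    if {"lat_speed", "risk_pref"} <= subset:
        score += 0.1  # interaction shared equally between the two features
    return score

phi = shapley_values(["lat_speed", "risk_pref", "rel_speed"], toy_score)
print(phi)
```

The values sum to the difference between the score with all features and the score with none, which is what allows them to be read as a decomposition of a single prediction (local view) or averaged over samples (global view).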
Real-Time Prediction Module of LC Risk
LC Risk Quantification
Although traditional post-event analysis methods can describe risk outcomes, they are limited in providing proactive warnings. To enhance the proactivity and foresight of LC risk management, this study proposes a novel method for quantifying LC risk that incorporates drivers’ risk preferences. By identifying drivers’ short-term risk preferences and integrating them with LC intention recognition results, this method enables early intervention decisions before risks materialize, thereby significantly strengthening proactive prevention capabilities.
Building on the LCRI proposed by Park, this study introduces two evaluation indicators: real-time risk exposure level (RREL) and real-time risk severity level (RRSL). These indicators, combined with the heterogeneity of drivers’ risk preferences, allow for a multidimensional and dynamic quantification of LC risks. Specifically, RREL measures the real-time exposure level of LC risk accounting for individual risk preference heterogeneity, whereas RRSL quantifies the real-time severity of the risk during the LC process.
Finally, this study employs fault tree analysis to integrate the risks of various LC interaction events. It ultimately calculates the LCRI_CPT, which incorporates drivers’ risk preferences, as the core metric for LC risk quantification (see Figure 8). This method not only enables comprehensive quantification of LC risk but also offers real-time performance, interpretability, and operability, thereby providing a solid data foundation for intelligent driving safety interventions.

Fault tree analysis structure.
To use the risk quantification results for model prediction, it is necessary to classify the LCRI_RP calculation results into different levels. This study uses the K-means clustering method optimized by the silhouette coefficient to achieve this. The LCRI_RP values are sequentially divided into four levels—safe, low risk, medium risk, and high risk—as shown in Table 3.
Clustering and Threshold Division Results
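The clustering step can be sketched as follows, under the assumption that "optimized by the silhouette coefficient" means choosing the number of clusters with the highest silhouette score. The risk index values are synthetic, and the thresholds printed here are unrelated to those in Table 3.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic one-dimensional risk index values around four made-up centres.
values = np.concatenate(
    [rng.normal(c, 0.02, 200) for c in (0.1, 0.35, 0.6, 0.9)]
).reshape(-1, 1)

# Pick the cluster count with the highest silhouette coefficient.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(values)
    scores[k] = silhouette_score(values, labels)
best_k = max(scores, key=scores.get)

model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(values)
centres = np.sort(model.cluster_centers_.ravel())
# In one dimension, K-means boundaries lie midway between adjacent centres.
thresholds = (centres[:-1] + centres[1:]) / 2

# Relabel clusters so that level 0 has the lowest centre (e.g., "safe").
order = np.argsort(model.cluster_centers_.ravel())
rank = np.empty_like(order)
rank[order] = np.arange(best_k)
levels = rank[model.labels_]

print(best_k, thresholds)
```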
Real-Time LC Risk Prediction Based on LightGBM with Intent Recognition
LightGBM is an optimized version of the GBDT algorithm ( 51 ), featuring higher training efficiency and better model generalization capabilities while addressing the memory issues GBDT faces when handling large-scale data. Its main advantages include:
(1) Histogram algorithm: Discretizes feature values to reduce memory consumption and computational complexity, speeding up data processing.
(2) Leaf-wise growth strategy: Splits based on maximum split gain to reduce unnecessary calculations and improve model performance, with depth constraints to prevent overfitting.
(3) Direct support for categorical features: Allows direct input of categorical data without extra processing, saving computational resources and time and improving speed and accuracy.
Therefore, this study constructs a LightGBM-based real-time LC risk prediction model using Python and related libraries. The LC intention recognition serves as the trigger point for risk prediction, effectively combining LC intention recognition and LC risk prediction. The specific steps are as follows:
Step 1: Feature selection for real-time prediction of LC risk
LC risk is mainly influenced by the interaction between the target vehicle (TV) and surrounding vehicles, as well as the driver. Unlike LC intention recognition, LC risk prediction needs to consider the uncertainty risk of interaction with the following vehicle in the original lane (FV-OL) and TV ( 52 ). Therefore, feature selection should include interaction features with FV-OL and TV and those used for LC intention recognition, and the classified risk levels should be used as prediction labels.
Step 2: Data processing and model optimization based on temporal cross-validation
Since real-time LC risk prediction is based on LC intention recognition, the preprocessing steps for time-series data are handled during the intention recognition stage. Temporal cross-validation should be used to re-split the training, validation, and test sets chronologically to avoid data leakage. Temporal cross-validation should also be used to validate the model at different time windows, assess its stability and performance, enhance model generalization, and determine the optimal model parameters. In general, when there is a sufficient amount of data, fivefold cross-validation usually yields better results. Therefore, under the premise of splitting the dataset into a training set, validation set, and test set in a 6:2:2 ratio, this study uses the TimeSeriesSplit class from the scikit-learn library to perform cross-validation, setting the parameter n_splits to 5.
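The behavior of fivefold temporal cross-validation can be checked with a short sketch. The 60 ordered toy samples are an assumption; the point is that every validation fold lies strictly after its training fold, which is what prevents leakage from the future.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(60).reshape(-1, 1)  # 60 ordered toy samples

tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(X))

for i, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train 0..{train_idx.max()}, validate {val_idx.min()}..{val_idx.max()}")
```

Each successive fold extends the training block forward in time, so later folds train on more history while still validating only on unseen, later samples.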
Step 3: Real-time risk prediction of different LC behaviors
Near the exit ramp areas, the risks associated with different LC behaviors vary. Therefore, this study conducts real-time risk prediction for both LCL and LCR behaviors. Using the optimal model obtained from temporal cross-validation, the real-time LC risk level for each sample is predicted with a 0.1-s time step after LC intention is recognized, obtaining the probability distribution of each sample belonging to different risk levels. The risk level with the highest probability is chosen as the final prediction result. Accuracy, precision, recall, and F1 score are used as evaluation metrics, and a confusion matrix is plotted.
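The decision rule and evaluation metrics in Step 3 can be sketched as follows. The probability rows and true labels are made-up values for six samples, and the helper names are assumptions.

```python
import numpy as np

LEVELS = ["safe", "low", "medium", "high"]

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true levels, columns are predicted levels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    """Precision, recall, and F1 per class; undefined ratios are set to 0."""
    tp = np.diag(cm).astype(float)
    predicted, actual = cm.sum(axis=0), cm.sum(axis=1)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Hypothetical predicted probability distributions over the four risk levels.
proba = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.5, 0.3, 0.1],
    [0.0, 0.2, 0.5, 0.3],
    [0.0, 0.1, 0.2, 0.7],
    [0.1, 0.3, 0.4, 0.2],
])
y_true = np.array([0, 1, 2, 2, 3, 1])

y_pred = proba.argmax(axis=1)  # level with the highest probability
cm = confusion_matrix(y_true, y_pred, len(LEVELS))
accuracy = np.trace(cm) / cm.sum()
precision, recall, f1 = per_class_metrics(cm)
print([LEVELS[i] for i in y_pred], round(accuracy, 3))
```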
Step 4: Analysis of influencing factors for real-time risk prediction of different LC behaviors
The LightGBM model’s plot_importance function visualizes the importance of features for real-time risk prediction of different LC behaviors, intuitively showing the impact of each feature on LC safety and risk warning.
Results and Discussion
LC Intention Recognition and Feature Importance Analysis Results
Optimal Time Window Recognition Results Analysis
This study examined the model’s recognition performance under time window lengths ranging from 1 to 4 s to determine the optimal length for recognizing LC intentions. The results in Table 4 and Figures 9 and 10 show that the model recognizes drivers’ LC intentions effectively across all time windows. In particular, with a 2-s time window, the model’s overall evaluation and all evaluation metrics are optimal, reaching 95%. This finding aligns with the conclusions of previous studies ( 53 ). Beyond 2 s, however, the model’s performance gradually declines as the time window increases, which suggests that shorter time-series lengths can reduce data interference and improve recognition accuracy. For behavior classification, the model excels in recognizing LK behavior, with the highest evaluation metrics across all time windows, exceeding 95% at 2 s and 88% at 4 s. This could be because LK behavior, which involves maintaining the current lane, is more predictable and less influenced by external factors than LCL and LCR behaviors ( 54 ). The recognition performance for LCR is slightly better than for LCL, indicating that LCR has more distinctive LC features.
Recognition Results of the Model Under Different Time Windows
Note: LK = lane keeping; LCL = left lane change; LCR = right lane change.

Different time windows under intention recognition confusion matrix results. (a) Time window 1 s. (b) Time window 2 s. (c) Time window 3 s. (d) Time window 4 s.

Receiver operating characteristic curve results of intention recognition under different time windows. (a) Time window 1 s. (b) Time window 2 s. (c) Time window 3 s. (d) Time window 4 s.
Advance Identification of LC Intention
After determining the optimal time window for the model, the performance differences in LC intention recognition at various times before the LC point can be analyzed. This evaluation assesses the model’s ability to recognize LC intentions in advance. Table 5 shows that as the time before the LC point increases, the accuracy, precision, recall, and F1 score of the model’s recognition generally decline, indicating that recognizing LC intentions in advance becomes progressively more difficult. However, within 1–2 s before the LC operation, all metrics remain relatively good. This may indicate that the model can meet the accuracy and timeliness requirements for LC intention recognition 1–2 s before the operation ( 33 ). This will provide an effective decision-making buffer for advanced driver assistance systems or autonomous vehicles, helping to issue early warnings or make strategic adjustments in advance. The decline is most pronounced at 4 s in advance, where all metrics drop to around 50%. This may be because, at longer advance times, the driver has not yet prepared for the upcoming driving behavior, or the driver’s intentions have changed because of environmental influences during this period, leading to increased recognition difficulty and decreased accuracy. Among different LC behaviors, LK shows the best recognition performance.
Evaluation of LC Intention Early Recognition Performance Under the Optimal Time Window
Note: LC = lane change; LK = lane keeping; LCL = left lane change; LCR = right lane change.
Model Layer Contribution and Feature Importance Analysis
The results of the ablation experiment shown in Table 6 highlight the model’s robustness. Among the TCN, LSTM, and attention layers, the LSTM layer emerges as the most significant contributor, followed by the TCN layer, and the attention layer contributes the least. This underscores the model’s ability to handle time dependency in time-series data, a crucial factor in recognizing LC intentions. The LSTM, with its effective learning of long-term sequence mappings, plays a pivotal role in this process ( 55 ).
Contribution of Different Model Layers to LC Intention Recognition
Note: LC = lane change; TCN = temporal convolutional network; LSTM = long short-term memory.
The feature importance results in Figure 11 highlight the significance of the influencing factors for LK intention recognition. The vehicle’s current lateral movement state, LK risk preference, and relative distance and speed to the lead vehicle in the original lane (LV-OL) are identified as vital factors. This underscores the importance of the conditions of the original lane and the vehicle’s state for LK. For LCL intention recognition, the risk preference for LCL, the relative speed between the following vehicle in the target lane (FV-TL) and TV, the TV acceleration, and the relative speed between the LV-OL and TV are considered critical influencing factors ( 56 ). These factors directly relate to whether the driver will change lanes to the left, highlighting the environmental features of LCL behavior and the vehicle’s dynamic characteristics. For LCR intention recognition, the risk preference for LCR, the TV lateral acceleration, and the relative speed between the FV-TL and TV are deemed important influencing factors. The LCR risk preference directly reflects the potential risk level of the driver executing a LCR maneuver. The change in the TV lateral acceleration indicates whether the vehicle tends to initiate a rightward movement. The relative speed between the FV-TL and TV is directly related to the safety and feasibility of the LC.

Feature importance for LK, LCL, and LCR intention recognition. (a) LK. (b) LCL. (c) LCR.
Comparison of Model Recognition Performance
To verify the recognition performance of the TCN–LSTM–self-attention LC intention recognition model based on short-term risk preferences constructed in this study, a comparative experiment was conducted with and without the influence of short-term risk preference factors. As shown in Table 7, the recognition performance of the TCN–LSTM–self-attention model that considers short-term risk preferences is superior to the model that does not consider these factors, with an improvement of 5%–6% in recognition accuracy. This indicates that short-term risk preference factors indeed affect drivers’ LC intentions.
Comparison of Model Recognition Effectiveness
Note: LK = lane keeping; LCL = left lane change; LCR = right lane change.
Real-Time Prediction of LC Risk and Analysis of Influencing Factors
Risk Quantification Result
The risk quantification results in Figure 12 not only reveal the distribution of risk levels but also highlight the potential safety hazard near the exit ramp area. The high proportion of low-risk situations, exceeding 40%, indicates specific safety hazards. The relatively high proportion of medium-risk situations, which can easily transition to either low or high risk, indirectly or directly affects vehicle operational efficiency and traffic safety. High-risk situations are the direct cause of frequent accidents. Therefore, the results underscore the need for special attention to preventing and warning of medium and high risks near the exit ramp areas.

Proportion of different risk levels.
Figure 13 shows that for different LC behaviors, the risk level proportions for both LCL and LCR are generally similar, with a low risk being predominant. The difference is that the proportion of medium and high risks for LCL is higher than that for LCR. This may be because, near the exit ramp areas, LC behavior is frequent, and the risks associated with LCR are often caused by drivers leaving the freeway. In contrast, the higher risk for LCL indicates that changing lanes from the slow to the fast lane results in greater speed differences, leading to higher LC risks ( 57 ).

Stacked bar chart of the risk level proportions for different LC types.
Real-Time Risk Prediction Results for Different LC Behaviors
The real-time risk prediction results in Table 8 and Figure 14 for LCL and LCR behaviors show that the overall model prediction accuracy exceeds 90%. The high-risk category is predicted especially well, with accuracy, precision, recall, and F1 score close to 97%, consistent with the confusion matrix results. High-risk behaviors often have more distinct features and patterns, such as sudden changes in speed or lateral position, which makes them easier for the model to identify and predict.
Real-Time LC Risk Prediction Results for Different LC Behaviors
Note: LC = lane change; LCL = left lane change; LCR = right lane change.

Confusion matrix results for real-time risk prediction of different lane-changing behaviors. (a) Left lane change. (b) Right lane change.
The real-time risk predictions for different LC behaviors exhibit specific differences, with LCR predictions performing better than LCL, particularly in medium- to high-risk states. This difference may stem from the differing purposes and characteristics of the two LC behaviors. LCR often involves leaving the freeway, requiring the driver to change lanes frequently from the fast lane to the slow lane at high speeds, which is riskier and has more distinct features, making it easier to identify. LCL typically does not involve exiting the freeway, offers a broader field of view, has relatively lower risk, has less distinct features, and is more challenging to predict. The confusion matrix results show that the true positive rates for both LCL and LCR are high, with no instances of predicting safe levels as high risk or vice versa. However, the prediction accuracy between low and medium risk is lower, with some misclassification, likely because of the similarity or overlap of behavior characteristics and the imbalance in sample proportions, with a higher proportion of low-risk samples.
Overall, the model successfully demonstrates strong performance in predicting driver LC risks, particularly in identifying high-risk scenarios. Given the differences between LCL and LCR, future research is clearly needed to further investigate the challenges of predicting LCL risks. This ongoing development is crucial to improving prediction accuracy and comprehensiveness.
Comparison of Model Real-Time Prediction Results
To validate the model’s effectiveness, it was compared with three other standard models: XGBoost, GBDT, and SVM. Figure 15 shows that LightGBM performs best among the four models. It achieves the highest evaluation results for nearly all levels, with its performance for the high-risk level being particularly outstanding: the accuracy, precision, recall, and F1 score all exceed 96%.

Comparison of real-time LC risk prediction results for different models. (a) Accuracy result. (b) Precision result. (c) Recall result. (d) F1 score result.
Analysis of Influencing Factors of Real-Time Risk Prediction for Different LC Behaviors
The analysis results of influencing factors for real-time risk prediction of different LC behaviors, as depicted in Figure 16, underscore the pivotal role of TV speed in LCL real-time risk prediction ( 58 ). Among the interaction features, those with vehicles in the target lane have a higher impact on LCL risk than those in the original lane, and the importance scores of longitudinal interaction features are higher than those of lateral interaction features. The relative speed between the lead vehicle in the target lane (LV-TL) and TV is the most important interaction feature, reflecting whether the driver has enough free driving space after LC. The longitudinal distance between the FV-TL and TV is related to the risk of rear-end collision. Among driver heterogeneity characteristics, the risk preference for LCL is the most critical feature, mainly affecting medium- and high-risk level predictions, indicating that the driver’s risk preference plays a crucial role in LC decisions and risk assessment.

Analysis of influencing factors for real-time risk prediction of LCL and LCR. (a) LCL. (b) LCR.
For LCR real-time risk prediction, the TV lateral speed is an important feature with a significant impact on risk prediction ( 40 ). Interaction features with the target lane also significantly affect LCR risk prediction, with the FV-TL having a greater influence than the LV-TL. This reflects that LCR usually serves the purpose of exiting the freeway. Among driver heterogeneity characteristics, the risk preference for LCR is the most important feature, indicating that the driver’s risk preference directly affects LC risk when making an LCR near the exit ramp areas.
Additionally, it should be noted that although existing studies have indicated that the SSD as a vehicle safety following distance is not significant in car-following and low-speed (LV) operations, this paper still introduces SSD as the reference point in the CPT framework to quantify the differences in drivers’ risk preferences when engaging in driving behaviors such as LC or car following. We argue that drivers’ subjective acceptance and psychological expectations toward SSD exhibit significant individual variability, which reflects the heterogeneity of individual risk preferences that this study aims to capture. Therefore, our model allows for inconsistencies or deviations between driver behavior and the SSD, either across different drivers or within the same driver under different scenarios, which aligns with the diversity observed in real-world driving behavior. Moreover, in the subsequent risk prediction framework, SSD is not used as a standalone risk assessment indicator. Instead, it is integrated with other key metrics, such as the LCRI, RREL, and RRSL, to form a multidimensional LC safety evaluation system. This comprehensive framework enables a more holistic assessment of driving risk, thereby enhancing the model’s generalization capability and practical applicability across diverse driving behaviors.
Conclusions
Based on the NGSIM dataset, this study introduces a novel approach that considers the impact of driver risk preference heterogeneity. It proposes a research framework that integrates three modules: a risk preference quantification module, an LC intention recognition module that combines TCN–LSTM–self-attention, and a LightGBM real-time LC risk prediction module. Based on this study, the following comments are offered:
CPT can intuitively describe the driver’s risk preference characteristics throughout the driving process and in the short term. It transforms the previous qualitative methods of describing driver risk preference heterogeneity into quantitative analysis, providing new research elements for driver LC intention and risk assessment.
Considering drivers’ short-term risk preferences can improve model performance. Ablation test results show that the LSTM layer contributes the most to the model’s recognition of time dependence in time-series data. The model achieves optimal recognition results with a time window of 2 s, with all indicators reaching 95%. The feature importance analysis shows that for LK behavior, the most critical factor is the vehicle’s lateral motion state; for LCL behavior, the most vital factor is the driver’s risk preference for LCL; and for LCR behavior, the interaction features between the vehicle and the target lane are significant influencing factors.
Considering drivers’ risk preferences, the risk quantification method can more accurately quantify LC risks. The LightGBM real-time risk prediction based on LC intention recognition outperforms models such as XGBoost, GBDT, and SVM, with all indicators above 93%, and it is more sensitive to high-risk LC behaviors, with a prediction accuracy of 97%. Prediction results for different LC behaviors show that the prediction performance for LCR is slightly higher than for LCL, indicating that LCR behavior near the exit ramp areas has more significant risk characteristics. The feature importance analysis shows that drivers must pay more attention to controlling their speed for LCL. For LCR, the driver’s risk preference for LCR is the most essential feature.
The main contribution of this study lies in proposing a framework that includes driver risk preferences for LC intention recognition and conducts real-time LC risk prediction based on intention recognition. This not only effectively combines LC intention recognition and real-time risk prediction, improving the efficiency of risk prediction, but also provides insights for vehicle driving safety warning research in complex environments and promotes the development of autonomous and assisted driving systems. Additionally, the results of the importance analysis of intention recognition and LC risk influencing factors can provide theoretical references for driver education and traffic managers. They can provide autonomous driving systems with the ability to simulate human driving behavior styles, especially in mixed traffic or transitional stages of autonomous driving (such as level 3 automation), where the autonomous system needs to predict the potential behaviors of human drivers to make safer decisions. Therefore, incorporating human risk preferences into intention recognition models can enhance the autonomous driving system’s understanding and adaptability to the behavior of surrounding human-driven vehicles. This study can support human–machine collaborative driving and optimization of game strategies.
This study has some limitations, providing exciting opportunities for future research. First, the NGSIM data are mainly vehicle trajectory data extracted through video image technology, concentrated in the afternoon to evening periods. LC behavior is influenced by various factors such as road environment, road geometry, and weather conditions, and these influences vary at different times of the day. Future research should consider a more comprehensive dataset. Second, considering the complexity of traffic flow near the exit ramp area in real-world scenarios, it is not easy to collect real-time data on drivers’ LC behavior in this region. Therefore, this study has not yet used empirical data for external validation of the model. In the future, we plan to collaborate closely with relevant departments to collect real-world data to further validate the model’s effectiveness. Finally, more artificial intelligence algorithms should be employed to improve the prediction accuracy.
Supplemental Material
Supplemental material, sj-docx-1-trr-10.1177_03611981251384960 for Real-Time Risk Lane-Change Intention Recognition Prediction at Freeway Exit Ramps by Yanqun Yang, Xinli Wu, Wei Lin, Yongjian Huang, Said M. Easa and Hosam Abdelgawad in Transportation Research Record
Footnotes
Acknowledgements
We want to express our gratitude to the editor, reviewers, and all those who have supported us in this research. We also thank Dr. Ibrahim El-Dimeery for his valuable comments and guidance during the manuscript revision.
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: Xinli Wu, Wei Lin, Yongjian Huang, Said M. Easa; supervision: Yanqun Yang, Wei Lin, Said M. Easa; data curation: Yongjian Huang, Yanqun Yang, Xinli Wu; methodology: Xinli Wu, Wei Lin, Yongjian Huang; formal analysis: Xinli Wu, Yongjian Huang; draft manuscript preparation: Xinli Wu, Wei Lin, Yongjian Huang; reviewing and editing the manuscript: Wei Lin, Hosam Abdelgawad, Said M. Easa. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
