Abstract
In this paper, a novel algorithm is proposed for the motion planning and path following of automated cars that incorporates a collision avoidance strategy. The approach combines optimal reinforcement learning (RL) with a new risk assessment method. For this purpose, a probabilistic function-based collision avoidance strategy is developed, and the proposed RL approach learns the probability distributions of the adjacent and leading vehicles. Subsequently, the nonlinear model predictive control (NMPC) algorithm approximates the optimal steering input and the required yaw moment to follow the safest and shortest path through the optimal RL-based probabilistic risk function framework. Additionally, the travel speed of the ego vehicle is kept stable so that ride comfort is maintained for the vehicle occupants. The steering system dynamics are also incorporated to provide a more complete representation of the vehicle dynamics. Different driving scenarios are employed to evaluate the effectiveness of the proposed algorithm.
Introduction
Rapid advancement in vehicular technologies, in addition to the practical implementation of adaptive equipment, has pushed the concept of automated cars forward.1–3 Powerful motivations to employ automated cars include passenger safety, driving comfort, improved car performance, and efficiency in terms of time and infrastructure deployment.4,5 Statistics reveal that car drivers are the main cause of almost 90% of casualties in road accidents, which could potentially be avoided by employing automated cars.6,7 To accomplish this goal, automated cars should hold an adequate level of intelligence to demonstrate effective decision-making and environmental awareness in severe traffic scenarios and hazardous road conditions. This is essential when the driving task is wholly taken over by the car. Optimal path planning based on road conditions, potential obstacles, and traffic regulations is crucial for automated cars. Accordingly, developing a framework that equips cars with optimized route planning algorithms based on the potential road obstacles and available road space is still an active research field.
Path planning combined with the obstacle avoidance paradigm has been extensively investigated in the literature for non-holonomic robots.8,9 However, path planning based on obstacle avoidance for vehicles requires additional considerations such as road regulations, free maneuvering space, and the dynamic constraints related to the vehicle components and system states. Overall, these factors turn the path-planning and path-following problem into a hugely challenging task.10 It is also essential to identify a strategy for real-time path planning because of the risk of obstacles emerging on the road. Recently employed path-planning techniques for automated cars include artificial potential field (APF) methods,10 random search methods,11 and variants of optimal control such as nonlinear model predictive control (NMPC).2,4 Huang et al.12 employed an APF approach to assign distinct potential functions to possible obstacles and road barriers. Moreover, the obstacle-free areas were meshed and used as safe driving zones. As a result, the driving path was planned spatiotemporally. The APF approach can assign various potential functions to complex obstacles and road barriers and set the desired path accordingly. However, the APF approach does not necessarily capture the optimal vehicle dynamic response for following the desired path. Rasekhipour et al.13 developed a combined model predictive and APF algorithm for planning the optimal path, in which the objectives incorporated the obstacle-based potential functions together with the vehicle dynamics constraints.
The path-planning strategy typically accounts for each individual road barrier under operating conditions such as the shortest vehicle-obstacle distance and whether the obstacle is visible to the approaching vehicle. Among the commonly employed strategies, a single MPC is typically slow in handling two-dimensional collision-free maneuvering through an obstacle avoidance cost. Additionally, the optimal control problem treats different types of obstacles with identical functions and without regard to road regulations. A hybrid path-planning paradigm for automated cars in constrained environments was proposed by considering various constraints related to the path geometry, vehicle dynamics, and holonomicity.14 In this approach, a derivative-free global search algorithm was employed to derive the higher-order state information for state sampling. Yoon et al.15 proposed a recursive path-planning algorithm based on reducing the states of the search space and incorporating several factors such as the geometry of the road and the dynamics of the car. Additionally, the framework operates through two heuristic, constraint-wise node expansion approaches that correct the future path according to the available geometry and cornering space. Similar convex optimization-based attempts were undertaken to develop collision-free path-planning algorithms for automated cars by considering road geometry, vehicle states, and infrastructure constraints.16–18 Apart from the optimality of the path, it is also crucial to consider all the risks surrounding the ego vehicle and to guarantee that the intended direction is reliable and reasonably safe. Therefore, other techniques have been reported that address obstacle collision during maneuvering by assessing the risks of the vehicle approaching the surrounding obstacles.19–21 Kim et al.19 developed an algorithm based on potential risk assessment that evaluates the possible collision risk with obstacles in the driving situation and identifies the safest path to take. In addition, an integrated path-planning and path-following strategy was proposed in Chen et al.22 based on velocity prediction for the leading vehicle, employing a composite nonlinear feedback controller for path following and an input-output hidden Markov model (HMM) to estimate the velocity of the leading car. In the area of the dynamic field, variants of curve shapes such as Bezier,23 cubic,24 and quintic polynomials25 have proven effective for generating smooth trajectories for the vehicle during path planning. Lattice-based path-planning frameworks26,27 as well as multi-objective trajectory planning based on evolutionary algorithms28–30 have also been proposed as potential approaches. However, such strategies mainly incorporate the kinematic constraints but lack collision risk and obstacle avoidance considerations, which limits their effectiveness in practical applications. In Liang et al.,31 a local motion planning framework was conceived and connected with a cruise control algorithm using adaptive MPC. Another lateral MPC controller then performed the lateral path tracking for the global path. In Hang et al.,32 a tube-based MPC method was utilized to control autonomous electric vehicles with active safety considerations at driving limits by controlling four-wheel steering (4WS) and direct yaw-moment control (DYC). Because the sideslip angle, as a critical state, plays a significant role in achieving the desired tire forces for controlling the vehicle, Xia et al.33 improved the sideslip angle estimation by combining vehicle kinematic and dynamic states and feeding them to a fuzzy logic system. Other hybridized local and global path-planning methods, such as the visibility graph method, have also been introduced to control autonomous vehicles, combined with NMPC control algorithms for path tracking.34 Despite the particular merits of the reviewed approaches, path planning for situations where the existence of obstacles and barriers is unknown, or where no information exists on their dynamic states, remains a challenge for optimal path planning.
The reviewed literature indicates that optimal path planners and predictive models have been widely employed so far. However, these algorithms tend to lack dynamic interaction with the surrounding environment and therefore cannot plan the optimal path adaptively. Furthermore, expressing the varying growth of the risk function through probabilistic functions suggests greater reliability and generality. Moreover, reinforcement learning-based risk assessment is broadly considered a suitable solution for the collision-free path planning of automated cars. In this paper, the path planning of automated cars is explored with the following main contributions: (i) a novel optimal path-planner paradigm for avoiding obstacle collisions is proposed by employing the optimal reinforcement learning algorithm combined with probabilistic risk assessment, and (ii) the potential risks of car-obstacle collision, based on a growth function, are considered to be uniformly distributed but unequal in magnitude. Hence, the proposed algorithm combines the merits of unstructured obstacle avoidance control according to the nearest neighbor principle and the deep learning benefit of the RL algorithm with the optimality search of a nonlinear model predictive control (NMPC) paradigm. The optimal path is computed based on the approaching obstacles, road structures, and the dynamic response of the automated car in terms of the constrained inputs and states.
The structure of the present paper is laid out as follows. In Section II, the dynamics of an automated car are formulated. Section III presents the path-planning problem, prospective obstacles, and constraints, and the probabilistic risk assessment. Section IV describes the optimal RL-algorithm. In Section V, the path planning paradigm is investigated based on numerous simulations under various operating situations, and the results are discussed in further detail. Finally, Section VI concludes the paper.
Problem formulation
The dynamic response of a vehicle depends closely on the directional forces and resulting moments generated by the pneumatic tires. However, only the forces developed by the tires in the lateral direction have a substantial effect on the handling performance of the vehicle, whereas the longitudinal force components affect the handling dynamics negligibly. The longitudinal acceleration and the resultant force components of the car are therefore neglected. Nevertheless, the vehicle’s longitudinal speed must be adequate for producing the lateral forces in proportion to the slip angles, based on the well-known tire models. Furthermore, the roll effect of the lateral weight transfer during cornering is considered negligible owing to the adequately adjusted suspension setting. Therefore, a bicycle model with two degrees of freedom is applied to describe the main dynamic modes of the vehicle in the yaw plane of motion, owing to the symmetry between the right- and left-side tracks (Figure 1). Vehicle yaw stability implies that the yaw angle, the so-called heading angle, is essentially taken as the controlled parameter. Furthermore, maintaining the vehicle yaw velocity

Figure 1. Yaw-plane vehicle bicycle model, based on the symmetry between the right and left tracks.
where
where
where
The nonlinear cornering characteristics of the tire may be captured in terms of the uncertainty about the nominal tire cornering stiffness as follows:
where
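To make the two-degree-of-freedom description concrete, the following minimal Python sketch integrates a linear-tire bicycle model in lateral velocity and yaw rate, driven by a front steering angle and a direct yaw moment. It is a textbook form under a small-angle assumption and not necessarily the exact formulation of this paper; the function names, the dictionary keys, and all numerical parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np

# Hypothetical parameters (not taken from the paper's Table 1).
PARAMS = {
    "m": 1500.0,      # vehicle mass [kg]
    "i_z": 2500.0,    # yaw moment of inertia [kg m^2]
    "a": 1.2,         # CG to front axle [m]
    "b": 1.4,         # CG to rear axle [m]
    "c_f": 60000.0,   # nominal front axle cornering stiffness [N/rad]
    "c_r": 60000.0,   # nominal rear axle cornering stiffness [N/rad]
    "v_x": 30 / 3.6,  # constant longitudinal speed [m/s]
}


def bicycle_derivatives(state, delta, m_z, p):
    """Return [dv_y/dt, dr/dt] for a linear-tire yaw-plane bicycle model."""
    v_y, r = state
    v_x = p["v_x"]
    # Front and rear slip angles (small-angle approximation).
    alpha_f = delta - (v_y + p["a"] * r) / v_x
    alpha_r = -(v_y - p["b"] * r) / v_x
    # Lateral tire forces from the nominal cornering stiffnesses.
    f_yf = p["c_f"] * alpha_f
    f_yr = p["c_r"] * alpha_r
    dv_y = (f_yf + f_yr) / p["m"] - v_x * r
    dr = (p["a"] * f_yf - p["b"] * f_yr + m_z) / p["i_z"]
    return np.array([dv_y, dr])


def simulate(delta, m_z, p, t_end=5.0, dt=0.001):
    """Forward-Euler integration from rest for constant inputs."""
    state = np.zeros(2)
    for _ in range(int(t_end / dt)):
        state = state + dt * bicycle_derivatives(state, delta, m_z, p)
    return state


v_y_final, r_final = simulate(delta=0.02, m_z=0.0, p=PARAMS)
```

For a constant steering angle and zero yaw moment, the yaw rate in this sketch settles to the steady-state value `v_x * delta / (L + K * v_x**2)`, where `L = a + b` and `K = m * (b / c_f - a / c_r) / L` is the understeer gradient.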
We also incorporated the steering system dynamics in the present study to thoroughly explore the vehicle response. The steering system dynamics can drastically affect the vehicle’s dynamic response and its capacity to follow the path generated by the developed algorithm. Considering the steering system dynamics in Figure 2, the moment produced about the kingpin by the tire lateral force is obtained as:

Figure 2. EPS-based steering system model.
where
Note that this moment acts as an external load on the front wheel and the attached steering system. Hence, the equations of motion describing the steering wheel account for the rotations transferred through the kingpin:
Since the rotational acceleration acts relative to the absolute space and dynamically changing steering system, the expression
The equations of motion associated with the vehicle’s lateral dynamics, in terms of the vehicle lateral speed, yaw rate, and steering system states, can be restructured as follows:
Accordingly, the general state-space representation of the system dynamics is derivable as follows:
where
Furthermore, it can be stated that
Path-planning and probabilistic risk assessment
Path considerations and risk assessment
For the automated cars to follow the intended path, a modified model according to the Frenet–Serret differential geometry can be developed. Assuming
Since
Hereafter, the parameterized variable
Differentiating (20) with respect to the variable
Additionally, it is assumed that the curvature vector is
where
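As an illustration of the path geometry used in a Frenet-Serret description, the signed curvature of a sampled planar path can be estimated numerically. The sketch below uses the standard planar curvature formula with finite differences; the function name `path_curvature` and the sampled-array interface are assumptions made for this example and do not come from the paper.

```python
import numpy as np


def path_curvature(x, y):
    """Signed curvature of a sampled planar path via finite differences.

    Uses kappa = (x' y'' - y' x'') / (x'^2 + y'^2)^(3/2), which does not
    depend on how the path is parameterized. Values at the first and last
    few samples are less accurate because one-sided differences are used.
    """
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5


# Example: a quarter circle of radius 50 m, traversed counterclockwise.
theta = np.linspace(0.0, np.pi / 2, 500)
kappa = path_curvature(50.0 * np.cos(theta), 50.0 * np.sin(theta))
```

Away from the end points, the estimate for this circular arc is close to `1 / 50` per metre, positive for a left-hand turn.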
For any spline
where
where
where m and n are the indices related to the columns and rows of the matrix elements, respectively. Therefore, the probability function is expressed as:
where
Additionally, the following kinematic equalities can be described as:39
where

Figure 3. Sample collision risk probability function value variations in different orientations.
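The idea of an orientation-dependent collision risk can be illustrated with a simple stand-in. The sketch below assumes an anisotropic Gaussian-shaped risk around each obstacle, elongated along the obstacle heading, and combines several obstacles as if their risks were independent. This is not the probability function of this paper; the Gaussian shape, the independence assumption, the spread parameters `sigma_lon` and `sigma_lat`, and the function names are all hypothetical.

```python
import numpy as np


def collision_risk(px, py, obstacle, sigma_lon=8.0, sigma_lat=1.2):
    """Illustrative risk in [0, 1] at point (px, py) for one obstacle.

    obstacle = (x, y, heading). The offset to the obstacle is rotated into
    the obstacle frame so the risk decays slowly along the heading and
    quickly across it.
    """
    ox, oy, heading = obstacle
    dx, dy = px - ox, py - oy
    c, s = np.cos(heading), np.sin(heading)
    lon = c * dx + s * dy    # offset along the obstacle heading
    lat = -s * dx + c * dy   # offset across the obstacle heading
    return np.exp(-0.5 * ((lon / sigma_lon) ** 2 + (lat / sigma_lat) ** 2))


def combined_risk(px, py, obstacles):
    """Risk that at least one obstacle is hit, assuming independent risks."""
    no_hit = 1.0
    for ob in obstacles:
        no_hit = no_hit * (1.0 - collision_risk(px, py, ob))
    return 1.0 - no_hit


# Two leading vehicles, both heading along the global X axis.
leading = [(10.0, 0.0, 0.0), (30.0, 3.5, 0.0)]
risk_here = combined_risk(12.0, 1.0, leading)
```

In this stand-in, a point 5 m ahead of an obstacle carries more risk than a point 5 m to its side, which mimics the directional dependence described in the caption above.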
NMPC path-tracking
Constraints are placed on the vehicle side-slip angle together with the yaw rate to guarantee cornering stability. The constraint on the vehicle slip angle is intended to keep the vehicle away from the tire-road adhesion limits, because the slip angle depends heavily on the road condition and varies under different adhesion characteristics. Accordingly, the constraints imposed on the yaw-rate and slip-angle states of the vehicle are expressed as follows:
The vehicle’s total acceleration/deceleration performance is directly related to the tire-road adhesion characteristics. Such value is bounded by
Assuming that the vehicle longitudinal acceleration is negligible
From equation (35), it follows that the control input required to put the vehicle on the preplanned path is constrained for a specific operating condition. As a result, the steering angle, the direct yaw moment control (DYC), and their corresponding variations are constrained as follows:
where
where
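The constraint logic described in this subsection can be sketched in a few lines. The yaw-rate limit below follows from bounding the lateral acceleration, approximated as the product of forward speed and yaw rate, by the adhesion limit when the longitudinal acceleration is neglected. The input limiter applies a rate bound and then a magnitude bound. The function names and the numerical values are hypothetical, and an NMPC solver would normally enforce such bounds inside the optimization rather than by clipping.

```python
def yaw_rate_limit(mu, v_x, g=9.81):
    """Upper bound on |yaw rate| from a_y ~ v_x * r <= mu * g."""
    return mu * g / v_x


def clamp_input(u_prev, u_cmd, u_max, du_max):
    """Apply a per-step rate limit, then a magnitude limit, to one input."""
    step = max(-du_max, min(du_max, u_cmd - u_prev))
    return max(-u_max, min(u_max, u_prev + step))


# Example: adhesion coefficient 0.8 at 10 m/s, steering limited to
# 0.3 rad in magnitude and 0.05 rad per control step (illustrative values).
r_max = yaw_rate_limit(0.8, 10.0)
delta_next = clamp_input(u_prev=0.0, u_cmd=0.5, u_max=0.3, du_max=0.05)
```

The same `clamp_input` helper can be reused for the yaw moment by passing its own magnitude and rate limits.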
Reinforcement learning algorithm
The conventional reinforcement learning (RL) models are mostly explained through an agent that dynamically interacts with the environment. Such an interaction is implemented through an action and perception system. During each single exchange between the agent and the environment, the agent accommodates input
By taking into account the longer-term reward policy for the agent, the infinite horizon discounted model is applied. Besides, the subsequent rewards are geometrically discounted by a discount factor ranging between 0 and 1
Based on the existence and uniqueness of the optimal solution, the solution to the simultaneous equations is determined in terms of a recursive expression:39
where
Moreover, the action-value function
Hence, the associated optimal solution
where
The stated modification is used to implement the RL-based predictive decision-making that avoids obstacle collision according to the predictive model. Figure 4 illustrates the integrated algorithm for the path planning and path following strategies according to the preceding discussion, including the NMPC control algorithm and the control commands, in terms of the steering input and the DYC signal, applied to the vehicle dynamics model.

Figure 4. The flowchart related to the integrated path planning and path following algorithm.
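The recursive solution of the infinite-horizon discounted model discussed in this section can be illustrated with tabular value iteration. The sketch below is a generic Bellman recursion on a hand-made two-state lane-change problem; it is not the RL algorithm of this paper, and the states, actions, rewards, and the function name `value_iteration` are hypothetical.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Solve the discounted Bellman optimality recursion.

    P[a, s, s2] is the probability of moving from s to s2 under action a.
    R[s, a] is the expected immediate reward. Returns (V, greedy policy).
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum over s2 of P[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)


# States: 0 = lane blocked by a slower vehicle, 1 = lane clear.
# Actions: 0 = keep lane, 1 = change lane.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # keep lane: the state is unchanged
    [[0.0, 1.0], [1.0, 0.0]],  # change lane: switch to the other state
])
R = np.array([
    [-1.0, -0.2],  # blocked: staying is risky, changing has a small cost
    [0.0, -0.2],   # clear: staying is free, changing has a small cost
])
V, policy = value_iteration(P, R, gamma=0.9)
```

In this toy problem the greedy policy changes lane when blocked and keeps the lane when it is clear, and the discount factor controls how strongly future risk penalties weigh on the current decision.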
Results and discussion
To evaluate the performance of the proposed path-planning algorithm for automated cars, simulations are carried out for two different driving scenarios. These two scenarios demonstrate the feasibility of the proposed method under different operating conditions. The simulation parameters are summarized in Table 1. The results for the proposed controller are obtained from numerous simulation runs. In the present study, a road with a single lane in each direction is used to evaluate the proposed method; without loss of generality, the method can be extended to other road conditions and driving environments.
Table 1. Simulation parameters.
Scenario A
In the first scenario, the ego car travels at an average forward speed of 30 km/h while the two leading vehicles travel in the same lane at a constant speed of 25 km/h. The ego vehicle is therefore required to pass the leading vehicles safely. The space available for the ego vehicle to pass must be sufficient, which is typically a function of the traveling speed. Herein, the threshold is set to a small gap to verify whether the car can pass the leading vehicles and return to the main path successfully. This maneuver mimics a double lane change. The ego vehicle is represented with the risk functions shown in Figure 5. The two leading vehicles are represented by their collision risk functions in the global coordinate system. Additionally, the planned path can be seen to take the ego vehicle past the two in-line leading vehicles and safely back to the original lane without collision.

Figure 5. Sample collision risk probability function value variations in different orientations.
Because the target lane is clear after the first lane change, the vehicle is planned to complete the double lane change without changing its travel speed, so that the left lane is kept free for other vehicles attempting to pass. Additionally, after the critical pass of the leading vehicles, the lane ahead is free, so the vehicle can make the second lane change in a more gradual and smoother manner. Figure 6 shows the vehicle trajectory in the plane of motion and how the vehicle passes the leading vehicles as a function of the number of iterations of the RL-agent and environment interaction. The plot covers both the longitudinal and lateral trajectory variations and shows how the ego vehicle (blue) passes the leading vehicles (red) without collision while accounting for other dynamic obstacles in the environment (green car). Figure 7 illustrates the steering system input and the torque applied for yaw generation to achieve a smooth cornering velocity. These are the control inputs to the system, and they remain within reasonable ranges, below tire saturation and before the tire lateral force generation starts to drop. In response to the inputs applied to the ego vehicle to follow the planned path, the dynamic response of the car in terms of the lateral acceleration (in g), yaw-rate variations, and vehicle heading angle change along the intended trajectory is presented in Figure 8. Furthermore, the co-simulations of the model indicate that the path planning and the following of the proposed trajectory by the ego vehicle are performed satisfactorily.

Figure 6. Vehicle trajectory in the plane of motion and the planned strategy for the vehicle passing the leading vehicles as a function of the iteration numbers of the RL-agent interaction with the environment.

Figure 7. Dynamic variations related to: (a) steering system input and (b) applied torque for the yaw generation of the ego car.

Figure 8. Dynamic responses of the ego car in terms of: (a) lateral acceleration, (b) yaw-rate variations, and (c) vehicle heading.
Scenario B
This scenario is considerably more complex than the first, mainly because the ego vehicle is expected to perform two consecutive double-lane-change maneuvers. The leading vehicles are distributed across two lanes with different collision risk functions depending on their traveling speed. The relative positions and the collision risk functions of the leading vehicles in the plane of motion can be seen in Figure 9, which also shows the trajectory planned for the ego vehicle by the algorithm. Because the leading vehicles hold a constant speed lower than that of the ego vehicle and are distributed randomly, the optimal path to pass all the vehicles safely without collision risk is to perform two consecutive double-lane changes of different lengths, depending on the collision risk function, road condition, and geometric understanding of the environment. The ego vehicle moves to the left lane when that lane is clear and has sufficient space to accommodate the ego vehicle at constant speed. The vehicle keeps a constant speed to provide a smooth ride for the passengers. The vehicle is also intended to return to the original lane after each pass of the leading vehicles, to keep the left lane free for other cars traveling at higher speed. The measure of comfort for seated passengers inside vehicles, according to ISO 2631-1:1997, is associated with the total magnitude of exposure to weighted accelerations in all directions. The root mean square (RMS) of the accelerations can be used to quantify the magnitude of the weighted accelerations:
where
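The RMS-based comfort measure can be sketched as follows, assuming the acceleration signals have already been frequency-weighted. The per-axis RMS values are combined as a root sum of squares with axis multiplying factors; the factors are left as a parameter because their values depend on the measurement location and assessment type, and the default of 1.0 is an assumption of this example. The function names are hypothetical.

```python
import numpy as np


def rms(signal):
    """Root mean square of a sampled acceleration signal."""
    a = np.asarray(signal, dtype=float)
    return float(np.sqrt(np.mean(a**2)))


def total_weighted_rms(a_wx, a_wy, a_wz, k=(1.0, 1.0, 1.0)):
    """Combine per-axis weighted RMS accelerations into one total value.

    a_wx, a_wy, a_wz are frequency-weighted accelerations [m/s^2];
    k holds the axis multiplying factors (assumed 1.0 here).
    """
    kx, ky, kz = k
    return float(np.sqrt((kx * rms(a_wx)) ** 2
                         + (ky * rms(a_wy)) ** 2
                         + (kz * rms(a_wz)) ** 2))


# Synthetic example: a 5 Hz lateral oscillation of amplitude 2 m/s^2.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
a_lat = 2.0 * np.sin(2 * np.pi * 5 * t)
comfort_value = total_weighted_rms(np.zeros_like(t), a_lat, np.zeros_like(t))
```

Holding the travel speed constant drives the longitudinal contribution toward zero, so in such a measure the total value is dominated by the lateral and vertical terms.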

Figure 9. Path planning in the global coordinate system using the proposed algorithm: (a) collision-risk function and (b) yaw-plane of motion.
Figure 10 shows the vehicle trajectory in the plane of motion and how the vehicle passes the leading vehicles as a function of the number of iterations of the RL-agent interaction with the environment. The plot covers the lateral trajectory variations and shows how the ego vehicle (blue) passes the leading vehicles (red) without collision while accounting for other dynamic obstacles in the environment (green car). The number of agent-environment interactions also drives the self-tuning and deep learning by which the ego vehicle adapts to the driving environment and the road condition. Finally, Figure 11 shows the tracking performance of the target vehicle under the designed NMPC algorithm, following the path planned by the risk-assessment-based collision avoidance algorithm. The target car is able to follow the planned path over the entire simulation range, although slight deviations are observed, which are followed by rapid stabilization. Furthermore, the second double-lane-change maneuver is performed more consistently and smoothly than the first, which can be attributed to the improved learning of the algorithm after iterative interactions of the agent with the environment. Additionally, the variations of the tracking error along the X-coordinate are presented in Figure 11, along with the standard deviation of the tracking error. According to the obtained results, the maximum and mean values of the tracking error are 0.11 and 0.01 m, respectively, indicating the effectiveness and reliability of the proposed integrated path planning and following algorithm.

Figure 10. Vehicle trajectory in the plane of motion and the planned strategy for the vehicle passing the leading vehicles as a function of the iteration numbers of the RL-agent interaction with the environment.

Figure 11. (a) The planned path versus the actual path subsequent to applying NMPC in the global coordinate system and (b) tracking error variations alongside the traveling direction.
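Tracking-error statistics of the kind reported for this scenario can be computed as in the sketch below, assuming the planned and actual lateral positions are sampled at the same X stations. The function name and the sample values are hypothetical and are not the simulation data of this paper.

```python
import numpy as np


def tracking_error_stats(y_planned, y_actual):
    """Maximum, mean, and standard deviation of the absolute lateral error."""
    err = np.abs(np.asarray(y_actual, dtype=float)
                 - np.asarray(y_planned, dtype=float))
    return {
        "max": float(err.max()),
        "mean": float(err.mean()),
        "std": float(err.std()),
    }


# Made-up lateral positions [m] at four X stations along the road.
stats = tracking_error_stats([0.0, 0.0, 1.0, 1.0], [0.0, 0.1, 1.0, 0.9])
```

If the two paths are sampled at different stations, one of them would first need to be interpolated onto the stations of the other before the errors are taken.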
Conclusions
In this paper, a motion planning and path following algorithm was proposed that employs optimal reinforcement learning (RL) coupled with a novel risk assessment approach to avoid collision with leading and adjacent vehicles and obstacles during lane changes and critical maneuvers. The proposed RL approach was shown to be capable of learning the collision risk based on the probability distributions of the adjacent and leading vehicles and of identifying the safest and shortest paths during lane changes. Additionally, the travel speed of the ego vehicle was kept unchanged so that ride comfort is provided for the vehicle occupants by minimizing the contribution of the weighted longitudinal acceleration, as explored in equation (45). The dynamics of the steering system were also incorporated to provide an understanding of how they can affect the vehicle response to input variations. Different driving scenarios were employed to verify the effectiveness and performance of the proposed algorithm.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
