A new approach to controlling an active suspension system based on reinforcement learning

Abstract

Active suspension provides better vehicle control and safety on the road, with optimal driving comfort, compared to passive suspension. Achieving this requires a good control system that can adapt to any environment. This article uses a deep reinforcement learning method to develop an optimal neural network that meets the comfort requirements of the ISO 2631-5 standard. The algorithm trains the agent without any prior knowledge of the environment. Various simulations were performed, and the results were validated against the literature and the standard until an appropriate reward function was found. Simple and consistent road profiles were used while maintaining constant system parameters during training. The results show that suspension based on deep reinforcement learning reduces vehicle body acceleration and improves ride comfort without sacrificing suspension deflection and dynamic tire loading. The controller achieves an RMS acceleration of 0.228 with minimal overshoot of the sprung mass.

Keywords

Active suspension system, vehicle stability, vibration, artificial neural network, reinforcement learning

Introduction

Passenger safety and comfort, as well as the safety of the load carried by the vehicle, are some of the greatest concerns of researchers and manufacturers of heavy goods vehicles. This is evident from the amount of research published throughout this decade.¹

The shock absorber is the centerpiece of the suspension system: like a conductor, it harmonizes movements to maintain the maximum possible stability without sacrificing safety. There are three types of suspension systems: active, semi-active, and passive. Each offers benefits and drawbacks. For instance, passive suspension performs adequately only in a restricted frequency range, while semi-active suspension changes the damping coefficient by changing the viscosity of the shock absorber, making it efficient over a wide frequency band. However, this technique remains limited, hence the need for a suspension system with highly adjustable dynamic behavior: the active suspension. This system combines a spring, a shock absorber, and an actuator that exerts an adaptive counter-force to meet the requirements of stability and safety.

Actuator control is the most delicate part of an active suspension system. It requires a good servo system, typically built on commonly used controllers such as PID or LQR. Engineers are not limited to one of these types and are free to develop or merge control techniques to obtain a better custom solution. According to studies, there is no perfect solution, but good results can be achieved. However, as system complexity increases, external factors (such as the behavior of the road) can cause the regulator to lose performance.

Hence the turn toward artificial intelligence, which makes it possible to predict the behavior of the shock absorber and react according to its state, in a kind of imitation of the behavior of living beings. For example, Salem and Aly² showed that fuzzy logic, an approach used in artificial intelligence (AI), works better than PID in a model tested under two types of road conditions.

Neural networks can be combined with traditional controllers such as the proportional-integral-derivative (PID) controller or the linear-quadratic regulator (LQR). In this case, the neural network detects road roughness and improves the performance of the traditional controller by varying its parameters according to road conditions.^3,4

However, the neural network is rarely used as the controller itself, despite encouraging trials. In one study, which trained a neural network using an optimal classical controller, the neural network outperformed traditional controllers.⁵

Another method, reinforcement learning (a branch of machine learning), has recently been used in various fields such as economics, games, aviation (drone control), and the automotive field. This technique has gained momentum and achieved considerable success, as shown by studies such as the article⁶ that examined suspension control in vehicles and trains. The results are highly encouraging, surpassing conventional controllers and even intelligent controllers such as artificial neural networks (ANN) and fuzzy logic. The main idea of reinforcement learning is to build a suspension environment that interacts with the agent throughout the learning phase, with the objective of maximizing the reward function to achieve the best neural network performance. The results reported in the literature⁷ are better than those of the Linear Quadratic Gaussian (LQG) controller and show an improvement of 62% compared to the passive suspension.

This work is a continuation of the research carried out by Anis Hamza.

State of the art

Influence of the suspension on the human body

When driving a vehicle on the road, the wheels encounter a variety of obstacles with random and variable distributions, both spatially and temporally. This unevenness in the road can lead to vibrational movements. The intensity of these movements depends on the profile of the obstacle and the vehicle’s speed.

The wear of a suspension part can lead to the failure of the shock absorber. This can impair the handling, steering, or braking of a car and damage other parts of the vehicle. The noticeable effect is that the car begins to bounce, squat, or dive excessively. All these effects can make driving uncomfortable and dangerous, make the vehicle harder to control, and increase the risk of aquaplaning.

The solution for detecting damper malfunction is through specific diagnosis while exciting the damper and comparing the measured values with the predicted ones.⁸ With artificial intelligence, there is a new method of identifying and diagnosing shock absorbers. The principle consists of analyzing the squealing noise of the shock absorbers.⁹ This method of early fault detection can be a solution for the automotive industry.

Comfort is a physiological feeling of well-being associated with the properties of the driver’s environment in a moving vehicle.

When the whole body is subjected to prolonged vibrations, it has harmful effects on the organs of the human body such as lumbar pain, early degeneration of the spine, rapid heartbeat, pelvic osteoarthritis,¹⁰ visual disturbances, pain in the neck and shoulders, etc.^11,12 The driver’s sensitivity to vibrations inherent in the use of the vehicle depends on the frequency of road conditions.

As an example, studies have been conducted on the exposure of an agricultural tractor driver to suspension for different durations, which showed that 92% of the people studied suffered from health problems as a result of long periods of sitting in a vehicle.^13,14

According to ISO 2631-5,¹⁵ the standard methodology for assessing the exposure of individuals to vibration containing repeated shocks, the most dangerous vibrations for the human body are in the following frequency range [4…15 Hz]^16,17:

Between 4 and 8 Hz: the vibration of the whole body is significant.

Between 8 and 15 Hz: the vibrations are transmitted to the whole body through the spine.

Vibration can be rated according to ISO 2631-5, which measures weighted root-mean-square (RMS) acceleration, defined as follows¹⁵:

a_{w} = \sqrt{\frac{1}{T} \int_{0}^{T} a_{w}^{2} (t) dt} = RMS

(1)

RMS: Weighted RMS acceleration $(m / s^{2})$

T: Duration of the measurement (s)

$a_{w} (t)$ : Weighted acceleration at time $t$ $(m / s^{2})$
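As a minimal sketch of equation (1) in discrete form, assuming the weighted acceleration is available as uniformly sampled values, the RMS can be computed as the square root of the mean squared sample. The function name and the test signal below are illustrative assumptions and are not taken from the standard.

```python
import math


def rms_acceleration(samples):
    """Discrete form of equation (1) for uniformly sampled a_w(t) in m/s^2.

    With a uniform sampling step, (1/T) * integral(a_w^2 dt) reduces to
    the mean of the squared samples.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return math.sqrt(sum(a * a for a in samples) / len(samples))


# Illustrative signal (assumption): one full period of a sine wave with an
# amplitude of 1 m/s^2, sampled 1000 times.
n = 1000
signal = [math.sin(2 * math.pi * k / n) for k in range(n)]
rms = rms_acceleration(signal)  # close to 1/sqrt(2) for a unit sine
```

For a measured signal, the frequency weighting prescribed by the standard would have to be applied before this step; the sketch only covers the averaging.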

The ISO 2631 standard is devoted to the assessment of health risks and provides guidance on comfort. The health guidance diagram (Figure 1) shows the caution zone (in red), a health risk zone that depends on the duration of exposure to vibration.¹⁸

Figure 1.

Health guidance caution zone (HGCZ) from (ISO 2631-5).

Suspension system study

The suspension is a component that connects the vehicle with the wheels and ensures relative movement between them. The device acts as a vibration insulator to protect the vehicle and provide ride comfort while maintaining tire contact with the road so that the tires keep their grip.¹⁹ The suspension system generally consists of the following main elements:

Springs: Support the vehicle’s weight and allow up-and-down movement to absorb road shocks. Their role is to transform kinetic energy into potential energy and vice versa. There are several types of springs (pneumatic, coil, torsion bar, and leaf). Multi-leaf springs, for example, have a very limited travel of 50–75 mm and a high coefficient of friction, which dampens the oscillations of the suspension and lightens the task of the shock absorber.

Shock Absorbers: Control spring oscillations, helping to maintain vehicle control over bumps and in corners. The damper is designed to dissipate the kinetic energy produced by the various modes of excitation. Common types include single-tube, twin-tube, compensating-chamber, and gas-chamber shock absorbers.²⁰ Suspension vibration energy can also be recovered using regenerative shock absorbers, which convert it into electrical energy and thereby reduce vehicle fuel consumption. According to comparative studies, dampers with regenerative behavior are more reliable than others, specifically the hydroelectric damper.²¹

Tire: Its primary function is grip, but it also plays a role comparable to the shock absorber by deforming. It is an essential component in controlling the behavior of a vehicle. It transmits the longitudinal forces necessary for acceleration and braking and the lateral forces for turning.

More precisely, the suspension of heavy trucks has seen a significant evolution toward high performance. Today, the demand for more efficient shock absorbers meets safety and comfort requirements.

Reducing driver fatigue by isolating the vehicle’s components, as well as its loading from vibrations excited by bad road conditions, requires a reduction in the coefficient of friction and better control of suspension oscillation. This requires working on a new performance standard for shock absorbers.

A new generation of suspensions with a reduced coefficient of friction, such as air suspensions or parabolic leaf suspensions, has a very low damping coefficient and significant vertical travel of up to 230 mm. These features optimize vehicle control and performance.

Helical springs, also called coil springs, have supplanted leaf suspensions in passenger vehicles, offering optimal performance adapted to the vehicle. They can also be used in low-tonnage trucks. Pneumatic suspensions (bellows filled with a compressible fluid, here air) are used for heavy vehicles, where they replace leaf springs. These air bellows can work at a constant volume or air mass (injection of air into the bellows by a compressor) or at a static load^22,23 (Figure 2).

Figure 2.

Mechanical and pneumatic suspension: (a) the high coefficient of friction of these springs limits the suspension travel to approximately 50–75 mm and (b and c) the travel of these suspensions can go up to 230 mm.²⁴

We can classify the suspension into three different types:

Passive suspension: This type of suspension does not ensure vehicle stability.²⁵ The dynamic behavior of the system changes as a result of variations in spring stiffness and damping coefficient.^26,27

Semi-active suspension: The main idea behind semi-active control is to change the characteristics of energy dissipation devices in real time, with minimal energy input. A semi-active suspension system operates by modifying the damping coefficient, which requires only a small energy source. For example, Pierce showed that changing the piston orifice diameter is sufficient. Another type is the semi-active magnetorheological (MR) damper, which uses a magnetic fluid that interacts with the magnetic field produced by a coil to change the oil flow and brake the piston movement.²⁸

Active suspension: An active suspension system is a passive shock absorber equipped with an actuator. The role of the actuator is to transmit a calculated force (using information collected from sensors attached to the vehicle) to suppress the vibrations of the vehicle, ensuring greater comfort and safety for the driver (Figure 3). Unlike passive and semi-active suspension systems, active suspension provides greater flexibility to react to unpredictable forces caused by road roughness and vehicle load, even while driving. In theory, this control freedom provides better driving comfort and ideal road holding. However, to use this technology effectively in a real car, an intelligent control system is needed. Active suspension also remains a complicated and expensive solution, which explains why it is used only in a few high-end car models or truck ranges.^29–31

Figure 3.

The electromagnetic active suspension system.

Several studies have compared the different types of vehicle suspension systems in terms of complexity, efficiency, maintenance, and lifespan. Their findings are summarized in Table 1.

Table 1.

Comparative study of suspension systems.³²

Parameters	Passive suspensions	Semi-active suspensions	Active suspension (hydraulic/pneumatic)	Active suspension (electromagnetic)
Structure	Very clear	Complicated	Very complicated	Light
Weight/volume	Very low	Low	High	Highest
Cost	Very low	Low	Highest	High
Ride comfort	Bad	Medium	Perfect	Best
Handling	Bad	Medium	Good	Perfect
Reliability	Highest	High	Medium	High
Dynamic	Passive	Passive	Medium	Good

Several studies have shown that the electromagnetic actuator is more efficient than the hydraulic actuator, owing to its simplicity of manufacture and its dynamic behavior, whereas the hydraulic system is limited by its structure and complexity. The high cost of manufacturing and maintaining the hydraulic system also reduces its overall efficiency.

The linear electromagnetic motor (Figure 3) may be the right choice as an actuator for an active suspension system. Studies have shown that the finite force density of electromagnetic systems can be as high as 663 kN/m³, compared to hydraulic and pneumatic systems. In addition, the ability to regenerate energy through the transfer of linear motion directly into electrical energy reduces overall electrical consumption. The linear suspension movement can also be a stored energy source, which leads to reduced overall consumption. The cylindrical shape and the absence of attraction force, and the active force generated in real-time, offer potential to the active suspension. All these advantages improve performance in terms of comfort, safety, and total control of the vehicle.³³

Active suspension sits at the top of the pyramid of suspension techniques. This adaptive system can adjust to various changes, such as vehicle loading or different driving phases like acceleration, braking, and turning. Sensors measure the inclination and acceleration of the wheels, as well as the anti-skid and steering wheel angle, among other parameters. All this information is analyzed by a computer, which controls the supply of the cylinder, enabling the system to compensate in real time for the body’s movements. This technology can anticipate the roughness of the road and other potentially dangerous situations, which explains the manufacturers’ focus on it.

Active suspension control system

The actuator is a crucial component of an active suspension system. It acts as a regulator, applying a force between the sprung and unsprung mass, and ensuring the system’s dynamics according to state variables. This results in a better driving quality, with excellent vehicle maneuverability and improved wheel contact with the road.

The active suspension system requires sensors to measure physical parameters such as vertical displacement, speed, and acceleration. These measurements provide important information for the system’s operation, including vehicle comfort, suspension travel, and tire condition estimation since it’s not possible to measure tire compression directly.^34,35

The actuator requires a servo system to reach and maintain the desired setpoint quickly. The objective of this study is to reduce the oscillation of the sprung mass so that its acceleration tends toward zero. Various algorithms and control techniques are available, which can be classified into three categories, as shown in Figure 4:

Linear controllers

Non-linear controllers

Controllers based on learning technique.

Figure 4.

Control techniques for a suspension system.³⁷

Control algorithms include neural networks, fuzzy logic, iterative control, vector control, scalar control, etc. However, every control technique has advantages and limitations. Most suffer from problems such as:

Sensitivity to parameter variations and external disturbances.

Weak dynamic responses

Configuration complexity

Each control technique has at least one of these problems, and no technique has yet solved all of them simultaneously while providing high-precision control.³⁶ Table 2 compares the control techniques according to their advantages and disadvantages.

Table 2.

Evaluation of control techniques for an active suspension system.^38,39

Method	Advantages	Disadvantages
1. Linear controllers	Design implicit; Ease of implementation.	Accurate linear structural model; Full state feedback requirement; Mainly used to control linear structures.
LQR	Can deal with various data sources and results.	In some cases, it cannot control stability errors.
LQG	No full feedback requirement.
PID	Can overcome steady state error.	Does not resist stress and noise; unable to process input data and results at the same time.
$H_{\infty}$	Good performance when the system has multiple variables.	Design complexity; Requires a well-designed model.
2. Non-linear controllers	No specific structural model requirement; Ease of implementation; Good performance when the system has multiple variables.	Design complexity; Requires a well-designed model.
SMC	Less sensitivity to model perturbations and uncertainties.	System instability due to the effect of excessive noise.
MPC	Predicts the future behavior of states; Processes multiple inputs and outputs simultaneously; Can overcome noise and disturbances.	Slow tracking.
3. Learning-based controllers	No specific structural model requirement; Improved robustness and reliability; Improved control performance; Used to control linear/non-linear structures.	Complexity of design.
ANN	A good method of learning and adaptation in the case of distributed parallel.	Requires enough data for training; Poor system stability.
Fuzzy Logic	Offers an efficient solution to a complex model.	System control and analysis problem; long parameter adjustment; reliability issue; approximation error.

In this context, to find an optimal solution for any internal or external variation of a suspension system, the science of decision-making opens the horizon to reinforcement learning, which can optimize trajectories, plan movements, or establish routes dynamically.

The control algorithm gives a value to the power amplifier, and the amplifier translates this value into bidirectional electrical power to the electromagnetic motor. The system uses the compression force to harvest energy and store it. Thus, the amplifier functions as a generator and provides power to extend or contract the motor to ensure the vehicle’s comfort and safety.³⁴

Reinforcement learning

Machine learning, and more precisely reinforcement learning (RL), attracts researchers for its ability to solve complex problems. In RL, agents imitate the human learning process to achieve a designated goal and are trained through a mechanism of reward and punishment. The agent perceives the current state of the environment and performs actions; it is rewarded for good moves and punished for bad ones. In doing so, the agent tries to minimize bad moves and maximize good ones.^40–42

Reinforcement learning uses several algorithms that share a single objective: the agent must find the policy that maximizes the sum of the rewards over time. These learning algorithms fall into two classes, model-based and model-free, which can also be combined. The choice of algorithm depends on the objective of the problem. Figure 5 shows the various algorithms.

Figure 5.

Taxonomy model of deep reinforcement learning (DRL).⁴³

According to Poole and Mackworth,⁴⁴ model-based learning is much more efficient in terms of experience and neural network maturity. In model-free learning, by contrast, the agent is confronted with new experiences that include a certain inaccuracy and imprecision of the state, which is an advantage for improving the policy throughout the learning phase. Several approaches propose to combine these two techniques.⁴⁵

Another classification of deep reinforcement learning is based on the type of stimulation used to optimize the agent’s reaction. A positive model uses favorable stimulation of the system, while a negative model uses an undesirable stimulus to distract the agent from a specific action.

Our suspension system is based on the interaction between the agent and the environment. The realization of an optimal strategy requires precise parameterization and an environment based on mathematical formulas.⁴⁶ The application of deep reinforcement learning (DRL) to active suspension control addresses several challenges, such as safety and ride comfort. Given the complexity of the suspension model (non-controllable internal and external parameters), model-free algorithms are a solution for the non-linearities of the suspension system. They allow the agent to learn actively in real time, without pre-training, and to remain functional in an unknown environment.

Choice of algorithm

The agent’s decision-making function (control strategy) represents a mapping of situations to actions. Given the importance of choosing an algorithm, performance studies of these DRL algorithms are carried out. For example, drone flight control studies include hovering, landing, random waypoints, and target tracking. All DRL algorithms have their pros and cons.

In a 2019 study by William Koch, learning algorithms such as Proximal Policy Optimization (PPO), Q-learning, Deep Q-Network (DQN), and Deep Deterministic Policy Gradient (DDPG) were used to ensure stable and fluid autonomous navigation of drones. The study analyzed the angular velocity response when tracking a target velocity $Ω^{*}$. The results were compared with a PID controller (Figure 6).⁴⁷

Figure 6.

The best reinforcement learning (RL) agent response compared to PID, with a target angular velocity of $Ω^{*} = [2.20, - 8.14, - 1.81]$ rad/s.⁴⁷

According to a comparative study of the behavior of algorithms on the stability of drones, the TRPO and DDPG algorithms exhibit extreme oscillations. We therefore discard these algorithms for our suspension system, because they generated instability on both the roll and yaw axes of the drone during flight. According to this study, the PPO algorithm is more precise and faster, and produces the smoothest navigation with the minimum error. This is why Song et al.⁴⁸ trained the drone using the PPO algorithm for AlphaPilot and Airsim drone racing tracks, given its excellent performance and simple implementation.^49–51 To stabilize our suspension system, we therefore use the Proximal Policy Optimization (PPO) algorithm.

Numerical modeling

Researchers often use two different approaches to solve control problems, such as ANN⁵² and fuzzy logic.⁵³ ANN uses interconnected neurons that learn and adjust their behavior based on expected input and output, while fuzzy logic uses fuzzy sets and decision rules to provide outputs. The researchers chose ANN as a method for stabilizing a vehicle because it is better suited for modeling nonlinear systems like vehicle suspension. The dynamic characteristics of suspension are very complex and difficult to model with simple mathematical equations. ANN is capable of learning these characteristics and adjusting its behavior accordingly.

However, for our study, we opted for reinforcement learning instead of ANN because the latter requires a large amount of training data, which can be difficult to collect for real-world driving scenarios. Reinforcement learning does not rely on a labeled dataset: an agent interacts with its environment and learns how to make decisions by maximizing a reward. This method is more flexible and can adapt to unforeseen situations in the environment, making it more robust for stabilizing a vehicle. Additionally, reinforcement learning directly optimizes the desired reward, in this case the stability of the car, which can lead to better overall performance.

Our model consists of a sprung mass and an unsprung mass, $m_{s}$ and $m_{us}$ respectively, supported by two shock absorbers and two springs. This model is similar to the passive model but includes an actuator between the sprung and unsprung masses, as shown in Figure 7. The active damper generates forces on demand from a control strategy. The simplicity of this model facilitates analysis and optimization of the calculation.

Figure 7.

Active suspension model.

Mathematical formula of an active damper

In our model with two degrees of freedom (Figure 7), we use the general theorems of mechanics based on the fundamental principle of dynamics. The suspension system considers the vertical movement of the body $x_{s}$ and that of the wheel $x_{us}$ along the road presented by $x_{r}$ . We can present the dynamics of our suspension system by the following differential equations⁵⁴:

Sprung mass:

m_{s} {\overset{\cdot\cdot}{x}}_{s} = - b_{s} ({\overset{\cdot}{x}}_{s} - {\overset{\cdot}{x}}_{us}) - k_{s} (x_{s} - x_{us}) + F_{c}

(2)

Unsprung mass:

\begin{matrix} m_{us} {\overset{\cdot\cdot}{x}}_{us} = b_{s} ({\overset{\cdot}{x}}_{s} - {\overset{\cdot}{x}}_{us}) + k_{s} (x_{s} - x_{us}) \\ + b_{us} ({\overset{\cdot}{x}}_{r} - {\overset{\cdot}{x}}_{us}) + k_{us} (x_{r} - x_{us}) - F_{c} \end{matrix}

(3)

Equations (2) and (3) present the mathematical model of the vehicle suspension system, where:

$m_{s}$ : Sprung mass, which represents the mass of the vehicle body and is supported by the suspension.

$x_{s}$ : Displacement of the sprung mass.

${\overset{\cdot}{x}}_{s}$ : Velocity of the sprung mass.

$b_{s}$ : Damping coefficient of the suspension.

$k_{s}$ : Spring constant of the suspension.

$m_{us}$ : Unsprung mass, which represents the mass of the wheel and tire, which are not supported by the suspension.

$x_{us}$ : Displacement of the unsprung mass.

${\overset{\cdot}{x}}_{us}$ : Velocity of the unsprung mass.

$b_{us}$ : Damping coefficient of the tire, acting between the unsprung mass and the road.

$k_{us}$ : Stiffness of the tire, acting between the unsprung mass and the road.

$x_{r}$ : Vertical displacement of the road profile.

${\overset{\cdot}{x}}_{r}$ : Vertical velocity of the road profile.

$F_{c}$ : Control force applied by the actuator between the sprung and unsprung masses.

From equations (2) and (3), the following state-space equations can be formulated:

\overset{\cdot}{x} = Ax + Bu + d

(4)

y = Cx + Du

(5)

\begin{matrix} A = [\begin{matrix} 0 & 1 & 0 & 0 \\ - \frac{k_{s}}{m_{s}} & - \frac{b_{s}}{m_{s}} & \frac{k_{s}}{m_{s}} & \frac{b_{s}}{m_{s}} \\ 0 & 0 & 0 & 1 \\ \frac{k_{s}}{m_{us}} & \frac{b_{s}}{m_{us}} & - \frac{k_{s} + k_{us}}{m_{us}} & - \frac{b_{s} + b_{us}}{m_{us}} \end{matrix}] \end{matrix}

(6)

\begin{matrix} B = [\begin{matrix} 0 \\ \frac{1}{m_{s}} \\ 0 \\ - \frac{1}{m_{us}} \end{matrix}] \end{matrix}

(7)

\begin{matrix} d = [\begin{matrix} 0 \\ 0 \\ 0 \\ \frac{b_{us}}{m_{us}} {\overset{\cdot}{x}}_{r} + \frac{k_{us}}{m_{us}} x_{r} \end{matrix}] \end{matrix}

(8)

x = [\begin{matrix} x_{s} \\ {\overset{\cdot}{x}}_{s} \\ x_{us} \\ {\overset{\cdot}{x}}_{us} \end{matrix}]

(9)

y = [\begin{matrix} x_{s} - x_{us} \\ {\overset{\cdot\cdot}{x}}_{s} \end{matrix}], \quad u = F_{c}

(10)

\begin{matrix} C = [\begin{matrix} 1 & 0 & - 1 & 0 \\ - \frac{k_{s}}{m_{s}} & - \frac{b_{s}}{m_{s}} & \frac{k_{s}}{m_{s}} & \frac{b_{s}}{m_{s}} \end{matrix}], D = [\begin{matrix} 0 \\ \frac{1}{m_{s}} \end{matrix}] \end{matrix}

(11)

Equations (4) and (5) describe the state-space representation of the vehicle’s suspension system, where:

$A$ : State matrix.

$B$ : Input matrix.

$C$ : Output matrix.

$D$ : Feedthrough matrix.

$u$ : Input variable representing the actuator force.

$y$ : Output variable representing the suspension deflection $x_{s} - x_{us}$ and the acceleration of the sprung mass.

$d$ : Disturbance variable due to changes in road profile.

The actuator acts as a regulator between the sprung and unsprung masses, minimizing the acceleration of the sprung mass and eliminating the effect of wheel travel. Note that if the actuator is deactivated, it applies zero force and the system returns to the behavior of a passive damper. The actuator force varies within $[-8000, 8000]$ N.
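The model above can be sketched numerically as follows. This is only an illustration under stated assumptions: the parameter values are hypothetical placeholders (the values used in the study are not given in this section), the integration scheme is a simple forward-Euler step chosen for brevity, and the function names are ours. The input `u` is the actuator force, saturated at ±8000 N as stated above; with no controller the force is zero and the system behaves as a passive suspension.

```python
def quarter_car_derivative(x, u, x_r, x_r_dot, p):
    """Right-hand side of equation (4) for x = [x_s, x_s_dot, x_us, x_us_dot]."""
    x_s, v_s, x_us, v_us = x
    # Spring and damper force of the suspension acting on the sprung mass.
    f_susp = -p["k_s"] * (x_s - x_us) - p["b_s"] * (v_s - v_us)
    # Tire force acting on the unsprung mass, driven by the road profile.
    f_tire = p["k_us"] * (x_r - x_us) + p["b_us"] * (x_r_dot - v_us)
    a_s = (f_susp + u) / p["m_s"]               # sprung mass, equation (2)
    a_us = (-f_susp + f_tire - u) / p["m_us"]   # unsprung mass, equation (3)
    return [v_s, a_s, v_us, a_us]


def simulate(road, dt, steps, p, controller=None, u_max=8000.0):
    """Forward-Euler integration; road(t) returns (x_r, x_r_dot)."""
    x = [0.0, 0.0, 0.0, 0.0]
    history = []
    for k in range(steps):
        t = k * dt
        x_r, x_r_dot = road(t)
        u = controller(x) if controller else 0.0  # zero force: passive case
        u = max(-u_max, min(u_max, u))            # actuator saturation
        dx = quarter_car_derivative(x, u, x_r, x_r_dot, p)
        # Record the outputs of equation (10): deflection and body acceleration.
        history.append((t, x[0] - x[2], dx[1]))
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x, history


# Hypothetical placeholder parameters (SI units), not those of the study.
params = {"m_s": 290.0, "m_us": 59.0, "k_s": 16000.0,
          "k_us": 190000.0, "b_s": 1000.0, "b_us": 0.0}

# Passive response to a constant 5 cm road elevation over 10 s.
final_state, history = simulate(lambda t: (0.05, 0.0), dt=1e-3,
                                steps=10000, p=params)
```

In this passive run both masses settle at the road height, which is a quick sanity check of the signs in equations (2) and (3). A learned policy would be passed in through the `controller` argument.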

Numerical simulation with reinforcement learning

Our contribution in this work is to combine the PPO algorithm with active suspension. The global process is as follows: the road information is generated according to ISO 8608 and fed into the suspension system model. Meanwhile, the control performance index at the current time $t$ is calculated and fed into the actor network of the PPO algorithm as a state value. Then, the corresponding action value is selected as an output according to the probability density function in the policy network. A series of trajectories $τ_{i} = (s_{i}, a_{i}, r_{i}, s_{i + 1})$ is stored in the memory space by repeatedly interacting with the environment. The value network is updated by importance sampling, and the control strategy is continuously optimized according to the obtained reward value until the control performance improves and convergence is achieved. The optimal neural network is then saved so that it can be used in a real system. The flowchart of the active suspension control structure based on the PPO algorithm is shown in Figure 8.

Figure 8.

Reinforcement learning model.

During the reinforcement learning phase, a rectangular shape was used for the road to simplify the problem and reduce the complexity of the environment. This created a more controllable and reproducible environment for learning, making it easier to evaluate the model’s performance. However, this shape may not represent all real driving situations, and the model’s performance may be limited when facing unexpected or unknown situations. Nonetheless, our model will behave according to imposed rules and will be validated in advance using this learning method with a reproducible road profile. To increase reliability, we can diversify the learning environments so that the model can generalize to different situations encountered during learning.

r (x) = \begin{cases} 1 & \text{if } x \in [0, L] \\ 0 & \text{otherwise} \end{cases}

(12)

where $x$ represents the longitudinal position on the road, $L$ represents the total length of the road, and $r (x)$ represents the height of the road at position $x$ . The $r (x)$ function is defined as being equal to 1 if $x$ is between 0 and $L$ , and $0$ otherwise.
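A minimal sketch of equation (12) is given below. The `height` argument and the conversion from time to longitudinal position at a constant vehicle speed are assumptions added for illustration; equation (12) itself uses a unit height.

```python
def road_height(x, length, height=1.0):
    """Rectangular road profile of equation (12).

    Returns `height` for x in [0, length] and 0 elsewhere. Equation (12)
    uses a unit height; the `height` argument is an added assumption.
    """
    return height if 0.0 <= x <= length else 0.0


def sample_profile(length, speed, dt, steps, height=1.0):
    """Road height under the wheel at each time step, at constant speed."""
    return [road_height(speed * k * dt, length, height) for k in range(steps)]


# Illustrative values: 1 m long obstacle, vehicle at 10 m/s, 40 ms time step.
profile = sample_profile(length=1.0, speed=10.0, dt=0.04, steps=4)
```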

Active suspension control based on the proximal policy optimization algorithm

The PPO algorithm belongs to the policy gradient (PG) family. The basic idea is to update the policy to maximize the probability of actions that provide the greatest future reward. It does this by running the agent in the environment and collecting the state changes produced by the agent’s actions. Collections of these interactions are called trajectories. Once one or more trajectories are captured, the algorithm examines each step, checks whether the chosen action yielded a positive or negative reward, and updates the policy accordingly. The environment represents the physical behavior of our suspension system: its state contains the accelerations and displacements of the sprung and unsprung masses, and its action is the value of the force to be exerted.

Trajectories are sampled through step-by-step interaction with the environment. To perform a single step, an agent selects an action and passes it to the environment.
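
The step-by-step sampling described above can be sketched as a simple loop. This is only an illustration: `toy_step` and `toy_policy` are hypothetical stand-ins for the suspension environment and the actor network, not the models used in this work.

```python
def collect_trajectory(step_fn, policy_fn, initial_state, n_steps):
    """Roll the policy forward and store (s, a, r, s_next) tuples."""
    trajectory = []
    state = initial_state
    for _ in range(n_steps):
        action = policy_fn(state)                    # agent selects an action
        next_state, reward = step_fn(state, action)  # environment applies it
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory

# Hypothetical toy environment: a scalar state that the action shifts directly.
def toy_step(state, action):
    next_state = state + action
    return next_state, -abs(next_state)  # reward penalizes distance from zero

# Hypothetical toy policy: proportional push toward zero.
def toy_policy(state):
    return -0.5 * state

traj = collect_trajectory(toy_step, toy_policy, initial_state=1.0, n_steps=3)
```

Each stored tuple has the form $(s_{i}, a_{i}, r_{i}, s_{i + 1})$, which is the information the update step needs.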

Agent Update: Policy update: The basis of the PG algorithm is the formula for updating the weights of a network (Formula 13).

The gradient is the positive or negative direction of the weights in which the policy change will make actions more likely in a given state.

θ_{t + 1} = θ_{t} + α A (a | s) \frac{\nabla π_{θ_{t}} (a | s)}{π_{θ_{t}} (a | s)}

(13)

$θ_{t + 1}$ : New weights of a network

$θ_{t}$ : Current weights

$α$ : Learning rate

$\nabla π_{θ_{t}} (a | s)$ : Gradient of the current network

$A (a | s)$ : Advantage function (adjustment of direction and amount of weight )

$π_{θ_{t}} (a | s)$ : Policy output, importance sampling (how much weight to give a particular update)
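
To make Equation (13) concrete, here is a minimal sketch of one update step. It assumes a softmax policy over three discrete actions whose logits are the weights $θ$; this is a simplification chosen for readability, since the suspension controller outputs a continuous force. For a softmax policy, $\nabla π / π = \nabla \log π$ equals the one-hot vector of the chosen action minus the action probabilities.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def pg_step(theta, action, advantage, lr):
    """One update of Equation (13) for a softmax policy with logits theta."""
    pi = softmax(theta)
    grad_log_pi = -pi            # new array: starts as -pi for every action
    grad_log_pi[action] += 1.0   # becomes onehot(action) - pi
    return theta + lr * advantage * grad_log_pi

theta = np.zeros(3)  # uniform initial policy (illustrative)
theta_new = pg_step(theta, action=1, advantage=2.0, lr=0.1)
p_before = softmax(theta)[1]
p_after = softmax(theta_new)[1]
```

With a positive advantage, the probability of the chosen action increases; a negative advantage would decrease it.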

Visualization of the neural network update: The neural network is updated once the trajectory is completed (step 1). All values (log probabilities, value estimates, and rewards) are recorded. After the end of the trajectory, the rewards are discounted and the advantages are computed (step 2), where advantage = discounted return − expected return. In step 3, the loss of each step is calculated. Finally, in step 4, we average all these losses and update the network by gradient descent (see Figure 9).

Figure 9.

Neural network update diagram.
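
Step 2 of the update can be sketched as follows. The reward and value numbers are invented for the example, and a discount factor of 0.5 is used only to keep the arithmetic easy to follow; it is not the value used in training.

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """Backward pass: G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

rewards = np.array([1.0, 1.0, 1.0])  # illustrative rewards
values = np.array([2.0, 1.5, 0.5])   # hypothetical critic estimates (expected returns)
returns = discounted_returns(rewards, gamma=0.5)
advantages = returns - values         # advantage = discounted return - expected return
```

A positive advantage means the action did better than the critic expected, so its probability should be raised.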

Algorithm PPO: OpenAI proposed PPO to address the sensitivity of policy gradient methods to the step size (learning rate). If the step size is too large, the policy diverges, and if it is too small, the calculation time will be very long. PPO introduces the probability ratio and clips it to prevent large updates from occurring, which makes the policy gradient less sensitive.^55,56

L^{CLIP} (θ) = {\hat{E}}_{t} [min (γ_{t} (θ) {\hat{A}}_{t}, clip (γ_{t} (θ), 1 - ε, 1 + ε) {\hat{A}}_{t})]

(14)

L: Loss function.

${\hat{E}}_{t}$ : Empirical average over a finite batch of samples.

${\hat{A}}_{t}$ : Advantage function, which is the difference between $Q^{π_{θ}}$ and $V^{π_{θ}}$

$ε$ : Constant, usually equals 0.2.

The first term inside the min is $L^{CPI}$ :

L^{CPI} (θ) = {\hat{E}}_{t} [\frac{π_{θ} (a_{t} | s_{t})}{π_{θ old} (a_{t} | s_{t})} {\hat{A}}_{t}] = {\hat{E}}_{t} [γ_{t} (θ) {\hat{A}}_{t}]

(15)

γ_{t} (θ) = \frac{π_{θ} (a_{t} | s_{t})}{π_{θ old} (a_{t} | s_{t})}

(16)

Moreover, the second term $clip (γ_{t} (θ), 1 - ε, 1 + ε)$ , modifies the surrogate objective by clipping the probability ratio.

{\hat{A}}_{t}^{π_{θ}} (a_{t} | s_{t}) = Q^{π_{θ}} (a_{t} | s_{t}) - V^{π_{θ}} (s_{t})

(17)
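
Equations (14) to (16) can be illustrated with a small numerical sketch. The function name `ppo_clip_objective` and the sample probabilities are assumptions made for the example; only the formula itself comes from the text.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective of Equation (14), averaged over the batch."""
    ratio = np.exp(logp_new - logp_old)   # probability ratio, Equation (16)
    unclipped = ratio * advantages        # L^CPI term, Equation (15)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))

# Two illustrative samples: the new policy raises the first action's
# probability (0.5 -> 0.75) and lowers the second one's (0.5 -> 0.25).
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([0.75, 0.25]))
adv = np.array([1.0, -1.0])
objective = ppo_clip_objective(logp_new, logp_old, adv)
```

Here the ratios are 1.5 and 0.5. The first is clipped to 1.2, so the gain from the positive advantage is capped; for the second, the minimum selects the clipped term −0.8 rather than −0.5, giving an objective of 0.2.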

Reward function

The reward function plays a crucial role throughout the learning process as it indicates the quality of the action undertaken by the agent after transitioning to the next state, with quality varying between positive or negative values. The reward is then transferred to the neural network via back-propagation, and the optimization method adjusts the neural network parameters to minimize errors. During the reinforcement learning phase, the agent receives a reward value from the environment.

Therefore, the choice of the reward function is a key element in the design of any control system, including active vehicle suspension systems. The reward function defines the goals of the control system and evaluates the system’s performance in terms of these goals. In Table 3, we have compiled and compared the reward functions used in several studies to stabilize an active suspension system. We also evaluated the pros and cons of each reward function in terms of the performance of the active suspension system. This comparison helped us choose the most suitable reward function for achieving an effective and efficient active suspension system.

Table 3.

Benchmarking reward functions for performance optimization of an active suspension system.

Study	Reward function	Benefits	Disadvantages
Fares and Younes⁵⁴	$r_{t} = - k (\| x_{s} - x_{us} \|)$	Simple	Convergence difficulties due to small and close numerical values
Fares and Younes⁵⁴	$r_{t} = - k (\| {\overset{\cdot}{x}}_{s} \|)$	Worked better	Failed to eliminate the suspended mass steady state error
Fares and Younes⁵⁴	$r_{t} = - k_{1} ({\overset{\cdot}{x}}_{s})^{2} - k_{2} (\| u \|)$	Induces the actor to produce zero force when the speed of the suspended mass is zero	Does not account for the state of the road
Han and Liang⁵⁷	$\begin{matrix} z_{r} \to φ, \\ R = \sum_{t = 0}^{T} φ^{T} A φ, \\ A = [\begin{matrix} x \\ y \\ z \end{matrix}], \end{matrix}$ .	Dynamically adjusting the suspensions performance weight matrix $ϕ$ based on these conditions, all based on passive suspension performance measurements. Where A represents the weight matrix of each performance indicator, and the values of x, y, and z are dynamically adjusted according to the different routes	Using traditional policy over policy leads to low sampling efficiency and sampled data can only be used for one policy update
Li et al.⁵⁸	$r_{t} = \begin{cases} - c (t) - α \cdot E (t) & \text{if } a (t) \text{ is admissible} \\ - G & \text{otherwise} \end{cases}$	Hit-and-run sampling technique for sampling safe actions to preserve exploration efficiency; provides good adaptability to new situations. $c (t)$ is the standard objective cost, $E (t)$ is the overhead measuring the amount of effort needed to keep the system within constraints, $α$ is a weight on this latter cost, $G$ is the large penalty assigned to an exploratory action, and $a (t)$ is the exploratory action	Poor control of suspension stresses, especially in non-linear systems
Ming et al.⁵⁹	$r_{t} = - (k_{1} y_{1}^{2} + k_{2} y_{2}^{2} + k_{3} y_{3}^{2})$	The reward is applied to the control of a semi-active suspension system, providing better adaptability. Here $y_{1}$ is the vehicle body acceleration, $y_{2}$ is the suspension dynamic deflection, $y_{3}$ is the vehicle body displacement, and $k_{1}, k_{2}, k_{3}$ are the weights	Tested only on a semi-active suspension; the weights used are not optimal

The reward of our suspension system is designed around four objectives that must be met simultaneously. The first is to minimize the acceleration of the suspended mass $(| {\overset{\cdot\cdot}{x}}_{s} |)$ . The second is to make the suspended mass follow the road position $(| x_{s} - x_{r} |)$ to prepare for the next situation. The third is to ensure maximum grip between the wheel and the road by keeping the unsprung mass close to the road position $(| x_{us} - x_{r} |)$ . The fourth is to optimize the force exerted by the actuator $(| F_{c} |)$ to minimize unsprung mass vibration.

Figure 10 shows the four objectives of the work. The closer these values are to zero, the better the stability of the suspended mass and the wheel adhesion. Our contribution is to develop a formula that seeks a compromise between all these constraints. Several reward formulas are tested and studied in the results section. Equation (18) presents the general reward function.

R = - α | x_{s} - x_{r} | - β {(x_{us} - x_{r})}^{2} - γ {({\overset{\cdot\cdot}{x}}_{s})}^{2} - δ {(F_{c})}^{2}

(18)

Where:

$α, β, γ, δ$ : weights to be tuned in order to find the most suitable neural network.
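
A minimal sketch of Equation (18) is given below. The function name and argument names are assumptions; the weights in the usage example are those of reward function (13) in Table 8, and the state values are invented for illustration.

```python
def reward(x_s, x_us, x_r, acc_s, force, alpha, beta, gamma, delta):
    """General reward of Equation (18); all four terms are penalties."""
    return -(alpha * abs(x_s - x_r)       # sprung mass vs. road position
             + beta * (x_us - x_r) ** 2   # unsprung mass vs. road position
             + gamma * acc_s ** 2         # sprung mass acceleration
             + delta * force ** 2)        # actuator force

# Weights taken from row (13) of Table 8; the state values are illustrative only.
r = reward(x_s=0.02, x_us=0.01, x_r=0.0, acc_s=0.5, force=100.0,
           alpha=10.0, beta=2.0, gamma=10.0, delta=1e-5)
r_rest = reward(0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 2.0, 10.0, 1e-5)
```

The reward is never positive and reaches zero only when the masses sit on the road position with no acceleration and no actuator force, which is why tuning the weights amounts to choosing a trade-off between the four objectives.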

Figure 10.

Convergence of objective parameters throughout learning.

Result and discussion

In this article, we used Google Colab Pro to build a quarter-car active suspension system model, and the suspension parameters are shown in Table 4. The actor network and the critic network of the PPO algorithm were eight- and ten-layered neural networks, respectively. The learning rates of the actor network $α_{A}$ and the critic network $α_{B}$ were set to $5 e - 5$ and $1 e - 4$ , respectively. The discount factor $γ$ was set to $0.98$ . The clipping parameter $ε$ was set to $0.2$ , and the GAE $λ$ parameter was set to $0.9$ . The number of time steps was fixed at $1 e 7$ . The specific hyperparameters were defined as shown in Table 5.

Table 4.

Parameters for the quarter active suspension.

Settings
Symbol	Description	Values
$m_{s}$	Sprung mass	$300 kg$
$k_{s}$	Stiffness of the car body	$40,000 N / m$
$m_{us}$	Unsprung mass	$30 kg$
$C$	Suspension damping	$1385 N \cdot s / m$
$k_{us}$	Stiffness of the tire	$22,000 N / m$
$U$	Control force	$[- 8000, 8000] N$
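
To show how the parameters of Table 4 enter a simulation, the following sketch integrates a standard two-mass quarter-car model with a fourth-order Runge-Kutta scheme. The equations of motion are the textbook form and are an assumption here, as are the function names, the step size, and the 0.08 m step input; the control force is set to zero, so the run corresponds to a passive suspension.

```python
import numpy as np

M_S, M_US = 300.0, 30.0          # sprung / unsprung mass (kg), Table 4
K_S, K_US = 40_000.0, 22_000.0   # suspension / tire stiffness (N/m), Table 4
C_S = 1385.0                     # suspension damping (N·s/m), Table 4

def derivatives(state, x_r, u):
    """State is [x_s, v_s, x_us, v_us]; x_r is road height, u the control force."""
    x_s, v_s, x_us, v_us = state
    f_susp = K_S * (x_s - x_us) + C_S * (v_s - v_us)  # spring + damper force
    a_s = (-f_susp + u) / M_S
    a_us = (f_susp - K_US * (x_us - x_r) - u) / M_US
    return np.array([v_s, a_s, v_us, a_us])

def rk4_step(state, x_r, u, dt):
    """One classical Runge-Kutta step with road height and force held constant."""
    k1 = derivatives(state, x_r, u)
    k2 = derivatives(state + 0.5 * dt * k1, x_r, u)
    k3 = derivatives(state + 0.5 * dt * k2, x_r, u)
    k4 = derivatives(state + dt * k3, x_r, u)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate_passive(x_r=0.08, dt=1e-3, t_end=20.0):
    """Response to a road step of height x_r with zero control force."""
    state = np.zeros(4)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, x_r, u=0.0, dt=dt)
    return state

final_state = simulate_passive()
```

An active controller would replace `u=0.0` with the force chosen by the agent, limited to the range given in Table 4.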

Table 5.

Hyperparameters for the proximal policy optimization structure.

Settings
Symbol	Description	Values
$α_{A}$	Actor learning rate	$5 e - 5$
$α_{B}$	Critic learning rate	$1 e - 4$
$γ$	Discount factor	$0.98$
$ε$	Clip parameter	$0.2$
$λ$	GAE parameter	$0.9$
	Time steps	$1 e 7$

The results obtained from the intelligent controller with a reward function (equation (18)) after the training phase are compared with the passive suspension. The results show that we obtained a reduction of 53.24% and 35.60%, respectively, in acceleration and displacement compared to the passive suspension. The reduction results of the chosen reward function are shown in Table 6 and Figure 11.

Table 6.

Reduced overshoot values for stepped road entry.

Parameters	Passive	Active	Reduction $(\frac{p a s s i v e - a c t i v e}{p a s s i v e}) * 100 %$
Acceleration (suspended mass)	$7.97 (m / s^{2})$	$3.72 (m / s^{2})$	53.24%
Displacement (suspended mass)	$1.28 (cm)$	$0.82 (cm)$	35.60%
Displacement (unsprung mass)	$1.19 (cm)$	Yes	–

Figure 11.

Sprung mass displacement on rectangular road profile.

Figure 12 presents the simulation results of the acceleration of the sprung mass compared to the passive suspension. The proposed method significantly reduces the acceleration of the vehicle body, leading to more stability.

Figure 12.

Sprung mass acceleration on rectangular road profile.

However, evaluating comfort solely by the acceleration limit method does not capture the behavior of the suspension throughout the journey. Therefore, according to the ISO 2631 standard,¹⁵ one can use the Root Mean Square (RMS) acceleration method, which calculates the average acceleration over a certain period of time.⁶⁰ The comfort reference values are shown in Table 7.

Table 7.

System settings.

Acceleration RMS value $(m / s^{2})$	Comfort reaction
$\leq 0.315$	Not uncomfortable
$0.315$–$0.63$	A little uncomfortable
$0.5$–$1$	Fairly uncomfortable
$0.8$–$1.6$	Uncomfortable
$1.25$–$2.5$	Very uncomfortable
$\geq 2$	Extremely uncomfortable
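
The RMS evaluation and the lookup in Table 7 can be sketched as follows. This is a simplified illustration: it computes the plain RMS of a sampled acceleration signal without the frequency weighting that the standard prescribes, and the names `rms`, `comfort_labels`, and `COMFORT_BANDS` are assumptions. Because the bands in Table 7 overlap, the lookup returns every band that contains the value.

```python
import numpy as np

# (lower bound, upper bound, reaction) in m/s^2, copied from Table 7.
COMFORT_BANDS = [
    (0.0, 0.315, "Not uncomfortable"),
    (0.315, 0.63, "A little uncomfortable"),
    (0.5, 1.0, "Fairly uncomfortable"),
    (0.8, 1.6, "Uncomfortable"),
    (1.25, 2.5, "Very uncomfortable"),
    (2.0, float("inf"), "Extremely uncomfortable"),
]

def rms(signal):
    """Root mean square of a sampled signal (no frequency weighting)."""
    signal = np.asarray(signal, dtype=float)
    return float(np.sqrt(np.mean(signal ** 2)))

def comfort_labels(rms_value):
    """All comfort reactions whose band contains the RMS value."""
    return [label for low, high, label in COMFORT_BANDS if low <= rms_value <= high]

example_rms = rms([1.0, -1.0, 1.0, -1.0])  # illustrative signal
active_labels = comfort_labels(0.228)       # RMS value reported in the abstract
passive_labels = comfort_labels(1.172)      # passive value reported in the conclusion
```

With these bands, 0.228 $m / s^{2}$ falls in "Not uncomfortable" and 1.172 $m / s^{2}$ in "Uncomfortable".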

Several reward functions were tested during the neural network optimization phase. To evaluate the performance of each model, the hyperparameters for the proximal policy optimization algorithm were set as shown in Table 5. After each test, the root-mean-square (RMS) values were calculated and evaluated according to ISO 2631-5. The results obtained showed that the reward function played a guiding role throughout the learning phase. As described later, our objective was to work on several aspects, such as the stability of the suspended mass and the dynamics of wheel movement, to ensure better grip with the road.

In the reward formulas studied, we gradually introduced the following constraints:

The distance between the road condition and the unsprung mass $(X_{us} - X_{r})$

The distance between the state of the road and the suspended mass $(X_{s} - X_{r})$

The acceleration of the suspended mass $({\overset{\cdot\cdot}{X}}_{s})$ .

The objective is to minimize acceleration while maintaining the wheel’s grip on the road and keeping up with the suspended mass.

To optimize the results, we adjusted the weights of each constraint so that the agent maximizes its reward, which generates stability of the suspension system with optimal values.

After several tests, we noticed that the agent controlled our suspension system according to these reward criteria. The displacement of the unsprung mass exhibited a uniform, logarithm-like behavior, with an RMS acceleration that can reach 0.180 $m / s^{2}$ . This prompted us to modify the reward formula so that it follows a logarithmic form by incorporating the exponential function. The result is encouraging and can be improved later, with an RMS of 0.228 $m / s^{2}$ and a minimal overshoot of the sprung and unsprung masses.

Including the actuator force U as a criterion in the reward formula aims to minimize wheel deflection. The result moves toward this target, with a more stable wheel, an RMS of the order of 0.250 $m / s^{2}$ , and an overshoot of the suspended mass of 0.04 cm.

Several articles have used reinforcement learning, for example, that of Fares and Younes,⁵⁴ who used the actor-critic algorithm, but the results obtained with the PPO algorithm are better.

Comparison between the results:

Actor-critic algorithm: road profile used during the learning phase (square signal of amplitude $0.02 m$ with a period of $1.5 s$ ), average acceleration of the suspended mass $4 m / s^{2}$

PPO algorithm used by our model: road profile used during the learning phase (square signal of amplitude $0.08 m$ with a period of $1.5 s$ ), acceleration of the suspended mass does not exceed $3.8 m / s^{2}$ .

We conclude that the PPO algorithm with the reward function⁴ listed in Table 8 is more efficient than the actor-critic algorithm.

Table 8.

Analysis of RMS values according to reward functions.

	Reward function	RMS ${\overset{\cdot\cdot}{X}}_{s}$	$\max (X_{s}) - \max (X_{r})$ $(cm)$	$\min (X_{s}) - \min (X_{r})$ $(cm)$
(1)	$- 10 \| X_{s} - X_{r} \| - 10 {\| X_{us} - X_{r} \|}^{2}$	1.173	$0.49 ↑$	$- 0.49 ↓$
(2)	$- 10 {\| X_{s} - X_{r} \|}^{2} - 10 {\| X_{us} - X_{r} \|}^{2}$	1.020	$0.15 ↑$	$- 0.22 ↓$
(3)	$- 10 {\| X_{s} - X_{r} \|}^{2} - 10 {\| X_{us} - X_{r} \|}^{2} - \| {\overset{\cdot\cdot}{X}}_{s} \|$	0.922	$0.10 ↑$	$- 0.20 ↓$
(4)	$- 10 {\| X_{s} - X_{r} \|}^{2} - 10 \| X_{us} - X_{r} \| - \| {\overset{\cdot\cdot}{X}}_{s} \|$	0.930	$0.05 ↑$	$- 0.06 ↓$
(5)	$- 10 {\| X_{s} - X_{r} \|}^{2} - 100 {\| X_{us} - X_{r} \|}^{2} - {\| {\overset{\cdot\cdot}{X}}_{s} \|}^{2}$	0.759	$0.13 ↑$	$- 0.09 ↓$
(6)	$- 10 {\| X_{s} - X_{r} \|}^{2} - 100 \| X_{us} - X_{r} \| - 100 {\| {\overset{\cdot\cdot}{X}}_{s} \|}^{2}$	0.180	$- 0.53 ↓$	$0.13 ↑$
(7)	$- 10 {\| X_{s} - X_{r} \|}^{2} - 100 \| X_{us} - X_{r} \| - 10 {\| {\overset{\cdot\cdot}{X}}_{s} \|}^{2}$	0.804	$0.01 ↑$	$- 0.01 ↓$
(8)	$- {\| X_{s} - X_{r} \|}^{2} - {\| X_{us} - X_{r} \|}^{2} - {\| {\overset{\cdot\cdot}{X}}_{s} \|}^{2}$	0.212	$- 0.01 ↓$	$0.12 ↑$
(9)	$- \| X_{s} - X_{r} \| - {\| X_{us} - X_{r} \|}^{2} - {\| {\overset{\cdot\cdot}{X}}_{s} \|}^{2}$	0.225	$0.02 ↑$	$- 0.01 ↓$
(10)	$- \| (X_{r} + (X_{s}^{t - 1} - X_{r}^{t}) e^{\frac{- 1.0}{0.5}}) - X_{x}^{t} \| - \| X_{us}^{t} - X_{r}^{t} \| - {\| {\overset{\cdot\cdot}{X}}_{s}^{t} \|}^{2}$	0.228	$0.06 ↑$	$- 0.03 ↑$
(11)	$- 10 \| (X_{r} + (X_{s}^{t - 1} - X_{r}^{t}) e^{\frac{- 1.0}{0.5}}) - X_{x}^{t} \| - {\| X_{us}^{t} - X_{r}^{t} \|}^{2} - {\| {\overset{\cdot\cdot}{X}}_{s}^{t} \|}^{2}$	0.483	$0.08 ↓$	$- 0.05 ↑$
(12)	$- 10 \| (X_{r} + (X_{s}^{t - 1} - X_{r}^{t}) e^{\frac{- 1.0}{\| X_{s}^{t} \|}}) - X_{x}^{t} \| - \| X_{us}^{t} - X_{r}^{t} \| - {\| {\overset{\cdot\cdot}{X}}_{s}^{t} \|}^{2}$	0.228	$0.08 ↑$	$- 0.01 ↓$
(13)	$- 10 \| X_{s} - X_{r} \| - 2 {\| X_{us} - X_{r} \|}^{2} - 10 {\| {\overset{\cdot\cdot}{X}}_{s} \|}^{2} - 0.00001 U^{2}$	0.250	$0.04 ↑$	$- 0.01 ↓$
(14)	Passive	1.172	$0.48 ↑$	$- 0.483 ↓$

Simulation of different road levels

To validate the control performance and robustness of the PPO-based active suspension system, road profiles were created in accordance with ISO 8608,^61,62 and the road characteristics were described using the special $G_{q} (n)$ index data. The road roughness can be classified into eight classes, as shown in Table 9. Figure 13 depicts the road power spectrum, which includes the upper and lower limits as well as the average values of the road spectrum for each road class. The formula for the spectral density of road power is given by $G_{q} (n)$ :

G_{q} (n) = G_{q} (n_{0}) {(\frac{n}{n_{0}})}^{- w}

(19)

Table 9.

Road classification according to ISO 8608 [1] with road unevenness coefficient $G_{q} (n_{0}) (10^{- 6} m^{3} / cycles)$ , $n_{0} = 0.1 cycles / m$ , and $w = 2$ .

Road level	Range	Geometric mean
A	<32	16
B	32–128	64
C	128–512	256
D	512–2048	1024
E	2048–8192	4096
F	8192–32,768	16,384
G	32,768–131,072	65,536
H	>131,072	262,144

Figure 13.

The various PSDs of different classes utilized to stimulate the model in this paper.⁶³

where:

$G_{q} (n_{0})$ : Road unevenness coefficient (the road power spectral density value at the reference spatial frequency).

$n$ : Spatial frequency (reciprocal of the wavelength).

$n_{0}$ : Reference spatial frequency (generally selected as $0.1 m^{- 1}$ ).

$w$ : Frequency index.
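
Equation (19) can be turned into a road height signal by summing sinusoids with random phases, a common way of synthesizing a profile from a power spectral density. The sketch below is one such construction under stated assumptions: the spatial frequency range (`n_min`, `n_max`), the number of bands, the seed, and the function names are choices made for the example and are not taken from this article.

```python
import numpy as np

def road_psd(n, g_n0, n0=0.1, w=2.0):
    """Equation (19): displacement PSD at spatial frequency n (cycles/m)."""
    return g_n0 * (n / n0) ** (-w)

def generate_road(length, dx, g_n0, n_min=0.011, n_max=2.83, n_bands=200, seed=0):
    """Superpose cosines whose amplitudes follow the PSD (assumed method)."""
    rng = np.random.default_rng(seed)
    x = np.arange(0.0, length, dx)
    dn = (n_max - n_min) / n_bands
    n = n_min + dn * (np.arange(n_bands) + 0.5)       # band centre frequencies
    amplitude = np.sqrt(2.0 * road_psd(n, g_n0) * dn)  # amplitude per band
    phase = rng.uniform(0.0, 2.0 * np.pi, n_bands)     # random phases
    waves = amplitude[:, None] * np.cos(
        2.0 * np.pi * n[:, None] * x[None, :] + phase[:, None])
    return x, waves.sum(axis=0)

# Class D road: geometric mean 1024e-6 m^3 from Table 9; 100 m sampled every 0.1 m.
x, height = generate_road(length=100.0, dx=0.1, g_n0=1024e-6)
```

For a time-domain simulation, the position is mapped to time through $x = v t$, where $v$ is the vehicle speed.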

Throughout the simulation, the same conditions apply. Specifically, for a vehicle speed of 20 m/s, the result is shown in Figure 14.

Figure 14.

Movement of the suspended mass on different types of road (A, B, C, D, E, and F): (a) class A, (b) class B, (c) class C, (d) class D, (e) class E, and (f) class F.

Results of the acceleration of the suspended mass

After simulating the passive and active suspensions under class D road conditions, the results showed a significant decrease in acceleration, which could reach up to 62.5%, as depicted in Figure 15. This improvement effectively enhances ride comfort and stability. Moreover, the dynamic stability characteristics of the active suspension with delay have been improved. Therefore, we can conclude that the proposed reinforcement controller is capable of meeting the design and simulation requirements.

Figure 15.

Sprung mass acceleration on class D road profile.

To further support our findings, it would be valuable to conduct physical experiments to validate the results obtained from the simulations. Furthermore, future work can focus on testing the proposed controller on other road conditions to investigate its performance in different scenarios.

Evaluation

Several simulations were conducted to evaluate and compare the results obtained. The simulations were conducted in the OpenAI Gym environment, and the same set of hyperparameters was used throughout all phases of the simulations to ensure consistency. To conduct these simulations, we utilized the cloud simulator “Google Colab Pro,” which is based on Jupyter Notebook and offers a high-performance computing environment with 32 GB of RAM and either a Tesla T4 GPU or NVIDIA Tesla P100.

Figure 16 shows the convergence process of the algorithm, which appeared to stabilize at around 0.4e7 time steps. The simulation process lasted approximately 6 h and required more than 30 trials to find optimal convergence. Despite the long time it took to find optimal convergence, the results obtained from the simulations were reliable and justified the time and effort spent conducting them.

Figure 16.

Total reward during a number of time steps.

Jin et al.^64–66 studied the active suspension of In-Wheel-Drive Electric Vehicles (IWMD-EV). To achieve better driving comfort and reduce the force applied to the motor bearing in the wheel, a robust $H_{\infty}$ dynamic output feedback controller^64,66 and a $μ$-synthesis methodology⁶⁵ were derived so that the closed-loop system is asymptotically stable and simultaneously satisfies performance constraints such as road holding, suspension travel, dynamic load applied to the bearings, and actuator limitation. Finally, the simulation results demonstrated that the proposed controller offered better suspension performance despite actuator faults and time delay.

If we now compare our solution, the RL controller, with the work of Jin et al., we find that the results at the RMS level of the acceleration of the body are very close, approximately $0.1 m / s^{2}$ . Active suspension is a technology that adjusts the suspension system of a vehicle in real-time to optimize the ride quality, handling, and stability of the vehicle. In the context of in-wheel-drive electric vehicles (IWMD-EVs), active suspension can play an important role in improving the overall performance of the vehicle. There are two main approaches to implementing active suspension in IWMD-EVs: using a dedicated active suspension system or using a reinforcement learning controller.

A dedicated active suspension system typically consists of a set of sensors, actuators, and a control unit. The sensors measure the vehicle’s motion and the road conditions, while the actuators adjust the suspension components (such as dampers, springs, and anti-roll bars) to optimize the ride quality and handling. On the other hand, a reinforcement learning controller uses machine learning algorithms to learn the optimal suspension settings for a given driving scenario. The controller receives feedback from sensors on the vehicle’s motion and makes adjustments to the suspension settings to optimize the ride quality and handling.

Both approaches have their advantages and disadvantages. A dedicated active suspension system provides more precise control over the suspension settings and can respond more quickly to changes in driving conditions. However, it requires more complex hardware and software, which can increase the cost and weight of the vehicle. A reinforcement learning controller, on the other hand, can adapt to different driving scenarios and learn from experience, which can lead to better overall performance. However, it requires a significant amount of computing power and can be difficult to train and optimize.

In summary, both approaches have their strengths and weaknesses, and the choice between them depends on the specific requirements of the IWMD-EV and the preferences of the designer.

We compared the results and performance of our reinforcement learning-based controller for active suspension with recent works by Hamza and Ben Yahia¹ and Swethamarai and Lakshmi,⁶⁷ which propose Artificial Neural Network (ANN), Proportional-Integral-Derivative (PID), Fractional Order PID (FOPID), and Adaptive Fuzzy tuned Fractional Order PID (AFFOPID) controllers. The comparison demonstrates the effectiveness and efficiency of the method proposed in this paper. The results are listed in Table 10.

Table 10.

Level of comfort of the different types of regulators.

Controller	Acceleration RMS (m/s²)	Comfort level

Very bad suspension behavior (a great risk to the health of the driver, and poor adhesion of the vehicle with the road)

The suspension control with PID minimizes the acceleration of the suspended mass, the results vary between fair and average

The suspension control with FOPID to exceed the medium to above average

The suspension control with AFFOPID surpasses from average to good

The control with the neural network belongs to the family of good suspension controllers that ensure good driving comfort

The controller by reinforcement goes from good to excellent comfort and more driving stability

The reinforcement learning controller significantly outperforms the ANN and AFFOPID controllers, as shown by the improvement in RMS values (from $0.342$ for ANN to $0.180$ for RL), leading to a considerable reduction in the vibrations experienced by the driver. Additionally, our controller achieves these comfort results without integrating the driver seat suspension. On the other hand, the quarter-car active suspension model is assumed to be rigid. Therefore, among the controllers compared here, the reinforcement learning controller proposed in this paper offers the best comfort and is a cost-effective option for controlling active suspension, especially for heavy-duty trucks.

Conclusion and outlook

In this study, we utilized the Proximal Policy Optimization (PPO) algorithm in deep reinforcement learning to overcome the drawbacks of conventional suspension methods. We conducted studies and tests to optimize the neural network on the reward function using 13 different reward functions in both uniform and square road conditions. The results showed that reinforcement learning provided better comfort and improved driving stability and vehicle safety, with near-optimal results. Additionally, the results highlighted the importance of the reward function as a guide throughout the learning phase. A clear reward function leads to optimal neural network results that meet all the intended goals.

We compared our results with the ISO 2631-5 standard to evaluate the degree of comfort of our solution. The Root Mean Square (RMS) of the acceleration of the suspended mass was reduced to 0.118 $m / s^{2}$ compared to the passive suspension’s RMS of 1.172 $m / s^{2}$ . This represents a reduction in RMS of about 90% relative to the passive suspension.

These results encourage further research on reward function optimization and exploration of other algorithms such as Asynchronous Advantage Actor Critic (A3C), Deterministic Policy Gradient (DPG), and Deep Deterministic Policy Gradient (DDPG). Furthermore, continuous learning under more complex perturbations could enhance reinforcement learning. We should also consider optimizing the number of epochs and the learning rate to provide further neural network optimization.

Footnotes

Handling Editor: Chenhui Liang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Issam Dridi

Anis Hamza

References

Hamza

Ben Yahia

. Heavy trucks with intelligent control of active suspension based on artiﬁcial neural networks. Proc IMechE, Part I: J Systems and Control Engineering 2021; 235: 952–969.

Salem

Aly

. Fuzzy control of a quarter-car suspension system. Int J Comput Inf Eng 2009; 3: 1276–1281.

Qin

Xiang

Wang

, et al. Road excitation classiﬁcation for semi-active suspension system based on system response. J Vib Control 2018; 24: 2732–2748.

Zhao

Dong

Qin

, et al. Adaptive neural networks control for camera stabilization with active suspension system. Adv Mech Eng 2015; 7: 1687814015599926.

Konoiko

Kadhem

Saiful

, et al. Deep learning framework for controlling an active suspension system. J Vib Control 2019; 25: 2316–2329.

Papadimitrakis

Alexandridis

. Active vehicle suspension control using road preview model predictive control and radial basis function networks. Appl Soft Comput 2022; 120: 108646.

Chen

Lin

Chang

. An actor-critic reinforcement learning control approach for discrete-time linear system with uncertainty. In: Proceedings of the 2018 international automatic control conference (CACS), Taoyuan, Taiwan, 2018, pp.1–5. New York: IEEE.

Ragot

Maquin

Adrot

, et al. Détection de dysfonctionnement d’un systéme amortisseur de véhicule automobile. In: 5ème Congrès International Pluridisciplinaire Qualité et Sûreté de Fonctionnement, Qualita 2003, Nancy, France, 2003, p.3.

Huang

, et al. Novel method for identifying and diagnosing electric vehicle shock absorber squeak noise based on a DNN. Mech Syst Signal Process 2019; 124: 439–458.

10.

Axmacher

Lindberg

. Coxarthrosis in farmers. Clin Orthop Relat Res 1993; 287: 82–86.

11.

INRS. Vibrations, plein le dos. Edition INRS, ED 864. 2001.

12.

Ning

Sun

, et al. An innovative two-layer multiple-DOF seat suspension for vehicle whole body vibration control. IEEE ASME Trans Mechatron 2018; 23: 1787–1799.

13.

Bovenzi

Betta

. Low-back disorders in agricultural tractor drivers exposed to whole-body vibration and postural stress. Appl Ergon 1994; 25: 231–241.

14.

Bovenzi

Pinto

Stacchini

. Low back pain in port machinery operators. J Sound Vib 2002; 253: 3–20.

15.

BS ISO 2631-5:2018. Mechanical vibration and shock—evaluation of human exposure to whole-body vibration. International Standard. ISO, 2018.

16.

Kuznetsov

Mammadov

Sultan

, et al. Optimization of a quarter-car suspension model coupled with the driver biomechanical effects. J Sound Vib 2011; 330: 2937–2946.

17.

De la Hoz-Torres

Aguilar-Aguilera

Martínez-Aires

, et al. A comparison of ISO 2631-5:2004 and ISO 2631-5:2018 standards for whole-body vibrations exposure: a case study. In: Arezes

Santos Baptista

Barroso

(eds) Occupational and environmental safety and health. Studies in systems, decision and control. Cham: Springer, 2019, pp.711–719.

18.

Chumchan

Tontiwattanakul

. Health risk and ride comfort assessment by ISO2631 of an ambulance. In: 2019 5th international conference on engineering, applied sciences and technology (ICEAST), Luang Prabang, Laos, 2 July 2019, pp.1–4. New York: IEEE.

19.

Ghazaly

Moaaz

. The future development and analysis of vehicle active suspension system. IOSR J Mech Civil Eng 2014; 11: 19–25.

20.

Goodarzi

Khajepour

. Vehicle suspension system technology and design. Synth Lect Adv Automot Technol 2017; 1: i–77.

21.

Zheng

Wang

Gao

, et al. Parameter optimisation of power regeneration on the hydraulic electric regenerative shock absorber system. Shock and Vibration 2019; (2019).

22.

Bouvin

. Vers une version alternative la suspension CRONE Hydractive. Doctoral Dissertation, Bordeaux, 2019.

23.

Khemoudj

. Dveloppement d’une mthode de pesage embarqu pour poids lourd. Doctoral Dissertation, Valenciennes, 2010.

24.

Gabriel.com. Amortisseurs pour poids lourds, semi-remorques et autobus, https://gabriel.com/wp-content/uploads/2011/06/Gabriel_Section_Fraincaise_FRE.pdf (2011).

25.

Riduan

Tamaldin

Sudrajat

, et al. Review on active suspension system. SHS Web Conf 2018; 49: 02008.

26.

Sharp

Hassan

. The relative performance capabilities of passive, active and semi-active car suspension systems. Proc IMechE, Part D: Transport Engineering 1986; 200: 219–228.

27.

Inoue

Yamaguchi

Kondo

. Damping force generation system and vehicle suspension system constructed by including the same. Google Patents. United States patent US 7,722,056, 2010.

28.

Soliman

Kaldas

. Semi-active suspension systems from research to mass-market: a review. J Low Freq Noise Vib Act Control 2019; 40: 146134841987639.

29.

Shaﬁe

Bello

Khan

. Active vehicle suspension control using electro hydraulic actuator on rough road terrain. J Adv Res Appl Mech 2015; 9: 15–30.

30.

Martins

Esteves

da Silva

, et al. Electromagnetic hybrid active-passive vehicle suspension system. In: Proceedings of the IEEE 49th vehicular technology conference, VTC 1999, Houston, TX, USA, 16–20 May 1999.

31.

Heidarian

Wang

. Review on seat suspension system technology development. Appl Sci 2019; 9: 2834.

32.

Jiregna

Sirata

. A review of the vehicle suspension system. J Mech Energy Eng 2020; 4: 109–114.

33.

Gysen

van der Sande

Paulides

, et al. Efﬁciency of a regenerative direct-drive electromagnetic active suspension. IEEE Trans Veh Technol 2014; 60: 1384–1393.

34.

Kuber

. Modelling simulation and control of an active suspension system. Int J Mech Eng Technol 2014; 5: 66–75.

35.

Ghrle

Schindler

Wagner

, et al. Road proﬁle estimation and preview control for low-bandwidth active suspension systems. IEEE ASME Trans Mechatron 2014; 20: 2299–2310.

36.

Abdul Ali

Abdul Razak

Hayima

. A review on the AC servo motor control systems. ELEKTRIKA J Electr Eng 2020; 19: 22–39.

37.

Amin

Aijun

Shamshirband

. A review of quadrotor UAV: control methodologies and performance evaluation. Int J Autom Control 2016; 10: 87–103.

38.

Adeli

. Control methodologies for vibration control of smart civil and mechanical structures. Expert Syst 2018; 35: e12354.

39.

Roy

Islam

Sadman

, et al. A review on comparative remarks, performance evaluation and improvement strategies of quadrotor controllers. Technologies 2021; 9: 37.

40.

Sutton

Barto

. Reinforcement learning: an introduction. IEEE Trans Neural Netw 1998; 9: 1054.

41. Nguyen, Nahavandi. System design perspective for human-level agents using deep reinforcement learning: a survey. IEEE Access 2017; 5: 27091–27102.

42. Ding, Huang, Yuan, et al. Introduction to reinforcement learning. In: Dong, Ding, Zhang (eds) Deep reinforcement learning: fundamentals, research and applications. Singapore: Springer, 2020, pp.47–123.

43. Azar, Koubaa, Mohamed, et al. Drone deep reinforcement learning: a review. Electronics 2021; 10: 999.

44. Poole, Mackworth. Artificial intelligence: foundations of computational agents. New York, NY: Cambridge University Press, 2010.

45. François-Lavet, Henderson, Islam, et al. An introduction to deep reinforcement learning. Found Trends Mach Learn 2018; 11: 219–354.

46. Huang, Yang, Wang, et al. Deep reinforcement learning for UAV navigation through massive MIMO technique. IEEE Trans Veh Technol 2019; 69: 1117–1121.

47. Koch. Flight controller synthesis via deep reinforcement learning. arXiv:1909.06493, 2019.

48. Song, Steinweg, Kaufmann, et al. Autonomous drone racing with deep reinforcement learning. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS), Detroit, MI, USA, 1–5 October 2021, pp.1205–1212. New York: IEEE.

49. Path planning for UAV ground target tracking via deep reinforcement learning. IEEE Access 2020; 8: 29064–29074.

50. Koch, Mancuso, West, et al. Reinforcement learning for UAV attitude control. ACM Trans Cyber Phys Syst 2019; 3: 1–21.

51. Bøhn, Coates, Moe, et al. Deep reinforcement learning attitude control of fixed-wing UAVs using proximal policy optimization. In: Proceedings of the 2019 international conference on unmanned aircraft systems, ICUAS 2019, Atlanta, GA, USA, 2019, pp.523–533. New York: IEEE.

52. Hamza, Ben Yahia. Artificial neural networks controller of active suspension for ambulance based on ISO standards. Proc IMechE, Part D: J Automobile Engineering 2023; 237: 34–47.

53. Deng, Gong. Double-channel event-triggered adaptive optimal control of active suspension systems. Nonlinear Dyn 2022; 108: 3435–3448.

54. Fares, Younes. Online reinforcement learning-based control of an active suspension system using the actor critic approach. Appl Sci 2020; 10: 8060.

55. Schulman, Wolski, Dhariwal, et al. Proximal policy optimization algorithms. arXiv:1707.06347v2 [cs.LG], 2017.

56. Hsu, Mendler-Dünner, Hardt. Revisiting design choices in proximal policy optimization. arXiv:2009.10897v1 [cs.LG], 2020.

57. Han, Liang. Reinforcement-learning-based vibration control for a vehicle semi-active suspension system via the PPO approach. Appl Sci 2022; 12: 3078.

58. Chu, Kalabić. Dynamics-enabled safe deep reinforcement learning: case study on active suspension control. In: 2019 IEEE conference on control technology and applications (CCTA), Hong Kong, China, 2019, pp.585–591. New York: IEEE.

59. Ming, Yibin, Xuewen, et al. Semi-active suspension control based on deep reinforcement learning. IEEE Access 2020; 8: 9978–9986.

60. Kumar, Pal, Sethi. Objective evaluation of ride quality of road vehicles. SAE technical paper 990055, 1999.

61. ISO 8608:1995. Mechanical vibration. Road surface profiles. Reporting of measured data. Geneva: International Organization for Standardization, 1995.

62. Múčka. Simulated road profiles according to ISO 8608 in vibration analysis. J Test Eval 2017; 46: 405–418.

63. Dridi, Hamza, Ben Yahia. Control of an active suspension system based on long short-term memory (LSTM) learning. Adv Mech Eng 2023; 15: 16878132231156789.

64. Jin, Wang, et al. Improving vibration performance of electric vehicles based on in-wheel motor-active suspension system via robust finite frequency control. IEEE Trans Intell Transp Syst 2023; 24: 1631–1643.

65. Jin, Wang, Yan, et al. Robust vibration control for active suspension system of in-wheel-motor-driven electric vehicle via µ-synthesis methodology. J Dyn Syst Meas Control 2022; 144: 051007.

66. Jin, Wang, Yang. Development of robust guaranteed cost mixed control system for active suspension of in-wheel-drive electric vehicles. Math Probl Eng 2022; 2022: 4628539.

67. Swethamarai, Lakshmi. Adaptive-fuzzy fractional order PID controller-based active suspension for vibration control. IETE J Res 2022; 68: 3487–3502.