Abstract
To address the poor economic efficiency and short driving range of electric commercial vehicles, a hybrid system that uses fuel cells as a range extender was developed in this work. In addition, a method for multi-power-source energy management was proposed using model predictive control (MPC) as a framework. In the state of charge maintenance interval, a quadratic utility function was used to calculate the output power of the fuel cell and battery. The unknown parameters in the quadratic utility function were solved using MPC. Speed prediction was performed using long short-term memory and particle swarm optimization. The demanded power sequence within the prediction horizon was calculated based on the predicted speed. The dynamic programming algorithm was used to allocate the demanded power sequence within the prediction horizon, and the unknown parameters in the utility function were deduced inversely. The simulation results show that the proposed energy management strategy (EMS) is superior to conventional EMS in improving component durability and vehicle economy.
Keywords
Introduction
The conventional automobile consumes fossil fuels and causes a lot of greenhouse gas emissions, so the development of new energies is of great significance to the sustainable development of the automotive industry (Xu and Shen, 2021). Electric vehicles, hybrid electric vehicles, and fuel cell electric vehicles are continuously being designed and developed (Yang et al., 2022). Compared to passenger vehicles, commercial vehicles have a longer running time and must pay more attention to refueling and charging time. Therefore, electric technology is struggling to meet the actual needs of commercial vehicles (Liu et al., 2021). Although hybrid vehicles do not have to take driving range into account, there is still the problem of pollutant emissions. The future development of the internal combustion engine will also meet with greater resistance due to political guidelines. Fuel cells play an important role in transportation due to their advantages such as zero emissions, high energy density, high efficiency, and fast refueling (He et al., 2021; Lemphers et al., 2022; Qi et al., 2022; Sorlei et al., 2021). Fuel cell hybrid technology has transitioned from the laboratory R&D phase to the industrial application phase. In China, fuel cell technology has entered the industrialization era: its system-level performance is on par with international advanced standards, while significant strides are being made in the localized production of key materials. At the stack level, the 150 kW G-100 fuel cell, developed by the U.S.-based company Siltrax, set a new world record in third-party testing conducted by TÜV Rheinland. It achieved a volumetric power density of 9.77 kW/L and a gravimetric power density of 9.7 kW/kg. At the system level, leveraging advanced lightweight design and system integration technologies is expected to help meet the power density targets of 6 kW/kg (for stacks) and 2 kW/kg (for systems). 
This advancement would reduce the weight of fuel cell systems by 60% compared to traditional configurations. Enhancing the efficiency of fuel cell systems is a critical enabler for their commercial adoption. Per the 2025 technology R&D objectives, the efficiency of fuel cell systems will consistently exceed 60%. Equally notable is the progress in durability. Following one to two decades of dedicated technological development, the service life of fuel cells has surpassed 20,000 h, representing a major breakthrough in reliability. Even in environments as cold as −30°C, cold start-up time has been cut to under 15 min, and dynamic response time has been shortened to 1 s.
However, fuel cells have poor dynamic characteristics and limited durability. To ensure the normal operation of the vehicle, additional power sources such as supercapacitors and batteries are required (Ammari et al., 2021; Sun et al., 2022). Research shows that using fuel cells in conjunction with batteries/supercapacitors can reduce costs and fuel consumption and improve vehicle economy (Al Sumarmad et al., 2022; Li et al., 2021a, 2021b). In a fuel cell electric hybrid architecture, energy management strategies (EMSs) play an important role in reducing fuel cell degradation and improving vehicle economy (Sun et al., 2023).
Energy management strategies are mainly divided into three classes: rule-based (RB) strategies, optimization-based strategies, and learning-based (LB) strategies (Gu et al., 2023). The RB EMS mainly uses an “if-then” structure to convert experience or heuristic knowledge into a set of deterministic rules or fuzzy rules. Therefore, RB EMSs can be mainly divided into two categories: EMSs based on deterministic rules and EMSs based on fuzzy rules.
EMSs based on deterministic rules mainly include the thermostat control strategy (TCS) and the power following (PF) strategy. TCS originates from temperature control (Dizqah et al., 2014; Ferahtia et al., 2021; Wang et al., 2021; Yang et al., 2021; Ye and Li, 2020). It usually starts the fuel cell when the battery/supercapacitor state-of-charge (SOC) falls below a predetermined value and shuts down the fuel cell when the SOC rises above another predetermined value. PF adjusts the output power of the fuel cell and battery according to the power demand of the vehicle. Unlike the logical control method of TCS, PF uses explicit mathematical formulas to map the required power to the outputs of the fuel cell/battery/supercapacitor, so that these outputs continuously adapt to changes in required power. The advantage of EMSs based on deterministic rules is their simple logic: the algorithm is implemented through “if-then” rules or table lookup. The disadvantage is that the rules are subjective and may not guarantee optimal fuel economy and efficient energy utilization under various driving conditions (Erdinc et al., 2009).
EMSs based on fuzzy rules, i.e., fuzzy logic control (FLC), operate in three stages: fuzzification, inference, and defuzzification. In the fuzzification stage, the control input consists of numerical values (Ates et al., 2010; Kamel et al., 2021; Rezk et al., 2021). Based on this, the membership functions and weight coefficients of the input fuzzy variables are defined to solve the power allocation ratio, and a fuzzy rule base is formulated in “if-then” form. The advantage of FLC is that it is robust and adaptable to uncertainty, noise, and interference. The disadvantages are that the internal control logic of FLC is complex and hard to observe, parameter adjustment is difficult, and the rules are formulated subjectively, so an optimal control effect is hard to guarantee.
EMSs based on optimization are mainly divided into two types: global optimization and real-time optimization (Guo et al., 2021; Hu et al., 2022; Li et al., 2014; Rezaei et al., 2017). EMSs based on global optimization are divided into dynamic optimization algorithms and static optimization algorithms. Dynamic optimization algorithms mainly include dynamic programming (DP) and Pontryagin's minimum principle (PMP). DP is a backward solution strategy for analyzing complex problems based on the Bellman optimality principle. Its essence is divide and conquer combined with the elimination of redundant computation. The calculation of DP is discrete, so the dynamic system must also be discretized in the time domain and value domain. The advantage of DP is that a recurring sub-problem is solved only once, and its result is reused directly whenever the same sub-problem occurs again. The disadvantages of DP are that it relies on the mathematical model of the system, takes a long time to compute, requires a large amount of data storage, and yields results whose accuracy depends strongly on the discretization resolution. If DP is applied to EMS, the complete driving cycle must be known in advance. However, the driving cycle is unknown during actual driving, which prevents DP from being applied directly on the vehicle. Nevertheless, DP's excellent performance in global dynamic optimization makes it an ideal benchmark for evaluating other control strategies. PMP is another optimization method for solving constrained global dynamic optimization problems. It is essentially an extension of the variational method. PMP transforms the global optimization problem into a local Hamiltonian minimization problem. In theory, PMP can be used for real-time control, and its computational effort is lower than that of DP. PMP, like DP, relies on backward solution processes and dynamic system models.
Although the robustness and performance of PMP are weaker than those of DP, PMP has a smaller computational workload and memory footprint. As the number of state variables increases, the dimensionality of the search space and the computational workload of DP grow exponentially. This defect is called the curse of dimensionality in DP, whereas in PMP only the number of nonlinear differential equations increases with the number of state variables.
Optimization algorithms based on real-time optimization mainly include the equivalent consumption minimization strategy (ECMS) and model predictive control (MPC) (Wang et al., 2022; Zhang et al., 2022). The main idea of ECMS is to convert the global optimization problem into a local optimization problem by minimizing the equivalent hydrogen consumption (the sum of the fuel cell hydrogen consumption and the battery/supercapacitor power consumption converted into hydrogen consumption). In terms of PMP, the equivalence factor that converts electrical energy into equivalent hydrogen consumption plays the role of the co-state. ECMS is an approximate implementation of PMP and can be used as a real-time optimized EMS. The control performance of hybrid electric vehicles strongly depends on the correct estimation of the equivalence factor. The optimal equivalence factor is related to the driving cycle, battery SOC limits, and current direction. The equivalence factor is the key to ECMS and has been studied extensively. MPC is a widely used control strategy in industry. Unlike DP or PMP, MPC relies on a prediction algorithm to forecast the behavior over a future time period. Within the MPC framework, it is still necessary to find the optimal solution in each time period using DP, PMP, or other solvers. MPC has the advantage that it does not need the complete driving cycle; it uses real-time prediction and executes only the first second of each time period, which greatly reduces the accumulated error caused by inaccurate prediction (Guo et al., 2024a, 2024b).
EMSs based on LB include reinforcement learning (RL) and deep reinforcement learning (DRL) (Han et al., 2019; Li et al., 2018, 2021a, 2021b; Zhou et al., 2021).
RL includes Q-learning and Dyna, etc. Q-learning is a model-free RL method. The algorithm does not rely on an accurate physical model of the vehicle, but requires more experience and has poor real-time performance. Dyna can not only learn from the state transition probability model and enhance the planning process, but also interact with the environment to learn. Dyna's value function is updated through learning and planning, and the state transition probability changes with the predicted driving condition information. However, as the number of calculations increases, the real-time performance decreases.
DRL mainly includes Actor-Critic (AC), deep Q-network (DQN), the policy gradient algorithm (PG), and deep deterministic policy gradient (DDPG). The emergence of DQN removes the limitation on the number of states in Q-learning. In Q-learning, due to the “curse of dimensionality,” the number of states and actions cannot be very large. This problem is solved by using a neural network to replace the Q-table in Q-learning. The PG algorithm is a policy-based algorithm that uses proximity operators as approximate gradients to perform gradient descent. The proximity operator is an extension of the gradient. When the function is smooth, the proximity operator is the gradient. In essence, PG combines the Monte Carlo method with a neural network. Unlike Q-learning and DQN, PG does not output Q-values. Instead, it directly outputs the probability of all actions that can be taken in the current state within a continuous interval. DDPG is a deep deterministic policy gradient algorithm proposed to solve continuous action control problems. It also incorporates the DQN structure to improve the stability and convergence of Actor-Critic. The input of the DQN network is state information, and the output is the value of each action. At the same time, DQN is a value-based rather than a policy-based method, so the DQN algorithm can be used to solve problems with continuous state spaces and discrete actions. LB algorithms can continuously learn and adjust the EMS through real-time interaction between vehicle agents and the environment. However, due to driving safety considerations and limited hardware computing capabilities, these algorithms are still in the simulation stage.
In this paper, a new EMS is proposed to improve the durability and economy of the fuel cell commercial vehicles. The main contributions of this paper are summarized as follows:
(1) A TCS is proposed to solve the power management issue of the fuel cell and maintain the battery SOC. The proposed strategy can also be adapted to different driving cycles.
(2) A quadratic utility function is used to measure the benefits of power batteries and fuel cells in the energy management. This strategy can effectively improve the overall vehicle economy while reducing the power fluctuation of the fuel cell and the number of start-stop cycles.
(3) To solve the parameters in the quadratic utility function, an algorithm based on the MPC framework is proposed. The long short-term memory (LSTM)-particle swarm optimization (PSO) algorithm is used for speed prediction, and DP is applied to calculate the fuel cell output power.
System structure and modeling
Fuel cell model
The fuel cell efficiency model established in this paper is as follows:
The mass of the remaining hydrogen can be calculated from the pressure change in the high-pressure hydrogen storage bottle. Using the ideal gas equation of state, corrected by the compressibility factor, the mass of the remaining hydrogen in the hydrogen storage bottle can be calculated as follows:
The hydrogen compressibility factor Z can be determined via the Helmholtz equation and fitted by a polynomial.
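As a rough illustration of this calculation, the sketch below applies the compressibility-corrected gas law p·V = Z·n·R·T. It is not necessarily the exact formulation used in this work: the function names are hypothetical, and Z is passed in as a number rather than evaluated from the fitted polynomial, whose coefficients are not given here.

```python
R_GAS = 8.314462618   # universal gas constant, J/(mol*K)
M_H2 = 2.016e-3       # molar mass of hydrogen, kg/mol


def hydrogen_mass(pressure_pa, volume_m3, temperature_k, z):
    """Hydrogen mass (kg) in a tank from p*V = Z*n*R*T.

    z is the compressibility factor at (pressure, temperature); supplying
    it directly is an assumption of this sketch.
    """
    if z <= 0 or temperature_k <= 0:
        raise ValueError("z and temperature must be positive")
    moles = pressure_pa * volume_m3 / (z * R_GAS * temperature_k)
    return moles * M_H2


def hydrogen_consumed(p_start, p_end, volume_m3, temperature_k, z_start, z_end):
    """Hydrogen used (kg) between two pressure readings at the same temperature."""
    return (hydrogen_mass(p_start, volume_m3, temperature_k, z_start)
            - hydrogen_mass(p_end, volume_m3, temperature_k, z_end))
```

Because Z exceeds 1 for hydrogen at high pressure, ignoring it would overestimate the remaining mass, which is why the correction matters for a high-pressure storage bottle.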
Battery model
The main function of the battery is to absorb the braking energy of the vehicle and reduce the fluctuation of the fuel cell output current. The battery is composed of an ideal voltage source connected in series with internal resistance. The battery model is as follows:
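A common textbook form of such an internal-resistance (Rint) model is sketched below, offered as an assumption and not as the exact equations of this work. The function names, the sign convention (positive power means discharge), and the ampere-hour counting used for the SOC update are all choices made for this illustration.

```python
import math


def battery_current(p_batt_w, v_oc, r_int):
    """Battery current (A) for a requested terminal power; positive = discharge.

    Solves P = (V_oc - I*R) * I for the smaller root, which is the
    physically meaningful operating point. Requires r_int > 0.
    """
    disc = v_oc ** 2 - 4.0 * r_int * p_batt_w
    if disc < 0:
        raise ValueError("requested power exceeds what this model can deliver")
    return (v_oc - math.sqrt(disc)) / (2.0 * r_int)


def soc_step(soc, p_batt_w, v_oc, r_int, capacity_ah, dt_s=1.0):
    """Advance SOC (0..1) by one time step using ampere-hour counting."""
    current = battery_current(p_batt_w, v_oc, r_int)
    return soc - current * dt_s / (3600.0 * capacity_ah)
```

With this sign convention, regenerative braking corresponds to negative power, which gives a negative current and therefore an increasing SOC.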
Motor model
The motor directly determines the power of the entire vehicle. The torque response time and speed response time of the drive motor determine the speed of the vehicle's response to the pedal signal. The stall torque and stall time reflect the starting status of the vehicle. The peak torque and peak torque duration determine the maximum vehicle speed. Speed control accuracy and torque control accuracy determine vehicle speed control accuracy.
During the operation of the vehicle, the motor control unit calculates the required voltage and current based on the signals transmitted by the vehicle control unit and obtains electrical energy from the bus. The mechanical power of the motor is as follows:
Vehicle energy consumption model
The vehicle resistance mainly includes rolling resistance, wind resistance, slope resistance, and acceleration resistance. Only when the vehicle driving force and resistance are balanced can the vehicle maintain the current speed. The driving force is mainly provided by the motor. When the driving force is greater than the resistance, the vehicle obtains acceleration performance. The vehicle resistance and driving force are shown as follows:
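The four resistance terms named above can be combined in the standard longitudinal dynamics form, sketched below. All parameter values (rolling coefficient, drag coefficient, frontal area, rotating mass factor, driveline efficiency) are placeholders chosen for illustration and are not the parameters of the vehicle studied here.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def road_load_force(v, a, mass, grade_rad=0.0, c_rr=0.01, rho=1.2,
                    cd=0.6, area=5.0, delta=1.1):
    """Total resistance force (N) at speed v (m/s) and acceleration a (m/s^2)."""
    rolling = mass * G * c_rr * math.cos(grade_rad)   # rolling resistance
    aero = 0.5 * rho * cd * area * v ** 2             # wind resistance
    slope = mass * G * math.sin(grade_rad)            # slope resistance
    accel = delta * mass * a                          # acceleration resistance
    return rolling + aero + slope + accel


def demand_power(v, a, mass, eta=0.9, **kwargs):
    """Electrical demand power (W); negative values mean recoverable braking power."""
    p_wheel = road_load_force(v, a, mass, **kwargs) * v
    # Driveline losses raise the power drawn and reduce the power recovered.
    return p_wheel / eta if p_wheel >= 0 else p_wheel * eta
```

Applying `demand_power` to each sample of a speed trace gives the demand power sequence that an EMS then has to split between the fuel cell and the battery.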
The battery model, fuel cell model, motor model, energy consumption model, and driver model were built through Matlab / Simulink. The complete fuel cell vehicle model is shown in Figure 1.

Vehicle model.
Model validation
The polarization curve of the fuel cell system obtained through fuel cell bench tests is shown in Figure 2. The fuel cell bench test is illustrated in Figure 3.

Polarization curve of fuel cell.

Fuel cell performance test bench.
In the electric-electric hybrid mode of the FCEV, part of the electrical energy generated by the fuel cell flows into the power battery, and then is supplied to the entire vehicle from the power battery. The energy inflow and outflow from the power battery undergo multiple losses, making it difficult to accurately calculate the vehicle's energy flow. The main purpose of vehicle model validation is to verify the energy consumption model, driver model, as well as the vehicle's dynamic performance and economic efficiency. Since the fuel cell model has obtained accurate data through bench tests, the vehicle model can be validated in pure electric mode. The results of the constant speed cruise economy experiment are shown in Tables 1 and 2.
No-load constant speed cruise test and simulation results.
Full load constant speed cruise test and simulation results.
Analysis based on Tables 1 and 2 shows that there is an error between the vehicle economy simulation results and the experimental results, with the actual test's electricity consumption per 100 km being slightly higher than that of the simulation. The main reasons for the error are as follows:
(1) Compared with the real vehicle, some mechanical or electrical losses are not fully considered in the simulation model, resulting in higher electricity consumption per 100 km in the test results than in the simulation.
(2) During the bench test, manual operation of the accelerator and brake pedals is required, making it difficult to maintain a stable vehicle speed, whereas the simulation can operate at a constant speed.
Data analysis indicates that the maximum error in electricity consumption per 100 km is 3%. Under different vehicle speeds and load conditions, the error in electricity consumption per 100 km between the vehicle bench tests and simulation tests is within the acceptable range.
While conducting economy tests and simulations on the vehicle, dynamic performance tests are also required. The test plan is as follows:
(1) Slope test: under full load, the vehicle travels at 10 km/h for 5 min on a 20° slope.
(2) Slope test: under full load, the vehicle travels at 20 km/h for 5 min on a 20° slope.
According to the test and simulation results, both the experiment and simulation meet the above dynamic performance requirements. Comprehensive analysis of the data from bench tests and simulation tests shows that the built vehicle model is basically consistent with the real vehicle.
The real vehicle test data of the FCEV are presented in Table 3. The real road test of the FCEV is shown in Figure 4. During the entire test process, the operating parameters and parameter change trends of each component were acquired and recorded using a CAN analyzer. The initial on-board hydrogen storage was 6.88 kg, the average vehicle speed was 23.62 km/h, and the initial SOC of the power battery was 39%. At the end of the operation, the on-board hydrogen storage was 5.69 kg, and the SOC of the power battery was 73%. The total hydrogen consumption during the entire process was 1.19 kg, with an actual measured hydrogen consumption of approximately 1.68 kg per 100 km. The hydrogen consumption per 100 km in the simulation test was 1.56 kg. The error in hydrogen consumption per 100 km between the simulation and the test was 0.12 kg, with an error percentage of 6.97%. Since the error is within the acceptable range, the accuracy of the vehicle hybrid model can be effectively guaranteed.

Vehicle road test.
Hybrid test data.
SOC: state-of-charge.
Working mode division based on thermostat control strategy
In this work, the battery is used as the main energy source and the fuel cell as the range extender. To meet the long-term driving requirements of the vehicle, the SOC of the battery must be kept within a reasonable range. Therefore, this paper uses TCS to maintain the battery SOC between 40 and 80%. It is implemented through a finite state machine (FSM), in which six states (State 0–5) are defined based on the remaining hydrogen storage capacity, SOC, and driving/braking status, with specific control rules and triggers, as shown in Figure 5.

Finite state machine process.
State 0: There is no hydrogen in the hydrogen storage bottle and the demand current is 0. Neither the fuel cell nor the battery participates in power supply.
State 1: There is no hydrogen in the hydrogen storage bottle, and the battery alone meets the vehicle driving and braking needs.
State 2: There is hydrogen in the hydrogen storage bottle and the battery SOC is >70%. The battery alone meets the vehicle's driving and braking needs.
State 3: There is hydrogen in the hydrogen storage bottle, the battery SOC satisfies 20% < SOC ≤ 70%, and the vehicle is in driving mode. The battery and fuel cell work together to meet the energy needs of the vehicle.
State 4: There is hydrogen in the hydrogen storage bottle, the battery SOC satisfies 20% < SOC ≤ 70%, and the vehicle is in braking mode. To avoid frequent starts and stops of the fuel cell, the fuel cell does not stop, and its output power remains at the value of the previous second.
State 5: There is hydrogen in the hydrogen storage bottle, the battery SOC is ≤ 20%, and the vehicle is in driving mode. To prevent the battery from staying in a low SOC state for a long time, the fuel cell outputs at maximum power. It provides electric energy to the entire vehicle, and the excess electric energy is transferred to the battery to quickly recover the battery's SOC.
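The state selection described above can be sketched as a small decision function. The enum names are hypothetical, the 70% and 20% thresholds are taken from the state definitions, and two points are assumptions of this sketch: State 4 is treated as the non-driving (braking) case, and State 5 is selected at low SOC regardless of driving status.

```python
from enum import IntEnum


class Mode(IntEnum):
    IDLE_NO_H2 = 0             # State 0: no hydrogen, no demand
    BATTERY_ONLY_NO_H2 = 1     # State 1: no hydrogen, battery alone
    BATTERY_ONLY_HIGH_SOC = 2  # State 2: hydrogen available, SOC > 70%
    HYBRID_DRIVE = 3           # State 3: fuel cell and battery share the load
    HOLD_FC_POWER = 4          # State 4: fuel cell holds its previous output
    FC_MAX_POWER = 5           # State 5: fuel cell at maximum power


def select_mode(h2_available, soc, demand_current, driving,
                soc_high=0.70, soc_low=0.20):
    """Return the operating state for the current sample (soc in 0..1)."""
    if not h2_available:
        return Mode.IDLE_NO_H2 if demand_current == 0 else Mode.BATTERY_ONLY_NO_H2
    if soc > soc_high:
        return Mode.BATTERY_ONLY_HIGH_SOC
    if soc > soc_low:
        # Assumption: State 4 applies when the vehicle is not driving.
        return Mode.HYBRID_DRIVE if driving else Mode.HOLD_FC_POWER
    # Assumption: low SOC always requests maximum fuel cell power.
    return Mode.FC_MAX_POWER
```

Keeping the thresholds as keyword arguments makes it easy to test other SOC bands without touching the decision logic.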
Strategy design based on utility function
The output mode of the fuel cell and battery in each state was determined through the FSM above. However, the output power of the fuel cell in States 3 and 4 has not yet been determined. In this section, the output pattern of the fuel cell in States 3 and 4 is determined by the utility function. Frequent starts and stops accelerate the degradation of fuel cell lifespan. To avoid them, the battery SOC must be maintained between 40 and 80%. If the battery SOC is close to 80%, the fuel cell output power must be reduced to delay reaching the upper SOC limit and prevent the fuel cell from shutting down. If the battery SOC is close to 40%, the fuel cell output power must be increased to prevent the battery from staying in a low SOC state for a long time. For the fuel cell, the goals are durability and fuel economy. The lower the rate of change of fuel cell output power, the lower the number of starts and stops. At the same time, the EMS should keep the fuel cell output power close to its highest-efficiency point. For the battery, the goal is a longer service life. If the change in battery output power is small and the power stays close to its historical average, the battery lifespan declines more slowly.
In this paper, a utility function is used to measure the benefits of power batteries and fuel cells in the energy management. The utility function adopted here is quadratic. Other forms of utility function also appear in the literature, such as linear functions and logarithmic barrier functions. Since the concavity of the quadratic function guarantees the existence and uniqueness of the solution, a quadratic utility function is used in this paper to model the preferences of the participants, i.e., the fuel cell and the battery.
The closer the output current of the battery is to the average current, the lower the current fluctuation of the battery. The battery utility function considering durability is as follows:
The fuel cell has a maximum efficiency point, and its economy is optimal when the fuel cell output power is close to the high efficiency point. In States 3 and 4, the fuel cell will neither start nor stop, so the main factor affecting the performance degradation of the fuel cell is the variable load condition. The smaller the change in fuel cell output power, the lower the performance degradation and the greater the durability. Therefore, the utility function of the fuel cell considering economy and durability is as follows:
When the utility function is maximized, the economy and durability of the fuel cell and the durability of the battery are maximized. The utility function maximization form of fuel cells and power batteries is as follows:
The solution to the dual-objective function is not unique, and ultimately it is difficult to determine the output power of the fuel cell and power battery. To simplify the controller calculation process and obtain a unique solution at the same time, the dual-objective function maximization problem is transformed into a single-objective function minimization problem in this paper. The new objective function is as follows:
The EMS must also meet the following conditions: the sum of the weight coefficients is 1, the sum of the fuel cell output power and the power battery output power is equal to the vehicle demand power Preq, the fuel cell and battery output power need to be within their respective maximum ranges, the constraints are as follows:
This paper uses Karush–Kuhn–Tucker conditions to solve the optimization problem of Equation (19). The final form of the objective function is as follows:
The weight coefficient in formula (25) has not yet been determined. In order to determine the weight coefficient, let:
Once k2 and k3 are determined, the real-time output power of the fuel cell and power battery is also determined. How to determine k2 and k3 is the core issue of the EMS proposed in this paper.
MPC is used in this paper to determine k2 and k3. Applying MPC requires solving three problems: (1) how to construct a value function; (2) how to build a prediction model; and (3) how to solve the optimal problem within the prediction horizon. Once the specific values of k2 and k3 are obtained, the specific power values of the fuel cell and power battery follow directly. Therefore, k2 and k3 are used as control variables. The control variables are as follows:
Taking the power battery SOC as a state variable, the state variables are as follows:
Within the prediction horizon, the control system can be described as a nonlinear and time discrete system:
At time k, the optimal control sequence [u(k), u(k + 1), …, u(k + N)] is obtained through the optimization algorithm. The value function is shown as follows:
The optimization problem within each prediction horizon length tp can be converted into a nonlinear optimization problem, which must be solved again for each horizon as the time window rolls forward. Commonly used solution methods include PMP, DP, and sequential quadratic programming (SQP). The value function is addressed first, and the prediction model second. The predictions used in this paper are shown in the next section.
Strategy design based on model predictive control
With the increasing complexity of control problems and the pursuit of better system performance, the development of new control strategies is becoming increasingly important. MPC is one of the advanced control technologies suitable for industrial applications. The basic idea of MPC comprises three elements: (1) predicting future states or inputs based on the current states or inputs of the controlled system; (2) rolling optimization over a finite future time domain; and (3) feedback and predictive correction. The MPC process is shown in Figure 6.

MPC process. MPC: model predictive control.
The three problems that MPC must address (constructing the value function, building the prediction model, and solving the optimization problem within each prediction horizon tp as the time window rolls forward) are handled by the algorithm logic shown in Figure 7.

MPC algorithm logic. MPC: model predictive control.
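The rolling-horizon idea can be sketched generically as follows. The predictor and solver are passed in as callables. `persistence_predictor` is a deliberately trivial stand-in that repeats the last observed speed. It is not the prediction model of this work, and all names are hypothetical.

```python
from typing import Callable, List, Sequence


def persistence_predictor(history: Sequence[float], horizon: int) -> List[float]:
    """Stand-in predictor: hold the last observed speed over the horizon."""
    return [history[-1]] * horizon


def receding_horizon(speeds: Sequence[float],
                     predict: Callable[[Sequence[float], int], List[float]],
                     solve: Callable[[List[float]], List[float]],
                     horizon: int = 5) -> List[float]:
    """Apply only the first action of each horizon-long plan, then re-plan."""
    applied = []
    for k in range(1, len(speeds) + 1):
        history = speeds[:k]                   # measurements available at step k
        forecast = predict(history, horizon)   # predicted speeds over the horizon
        plan = solve(forecast)                 # control sequence for the horizon
        applied.append(plan[0])                # execute the first step only
    return applied
```

Because the plan is recomputed at every step from fresh measurements, prediction errors late in the horizon never reach the actuators directly, which is the feedback-correction property described above.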
Speed prediction based on long short-term memory-particle swarm optimization and dynamic programming process
The back-propagation neural network (BPNN) is widely used for time series problems, but it suffers from vanishing or exploding gradients when errors are backpropagated over long sequences. Therefore, LSTM is used in this paper to deal with time series prediction problems.
In this work, the parameters of the LSTM are optimized by PSO to achieve the best prediction results. The horizon length tp is selected as 5 s. The optimization parameters are the number of network layers, the number of neurons in each layer, the number of fully connected layers, and the number of neurons in the fully connected layer. The nonoptimized LSTM speed prediction is shown in Figure 8, the optimized LSTM speed prediction is shown in Figure 9, and the evaluation indicators are shown in Table 4. The nonoptimized root mean square error (RMSE) is 3.41 and the optimized RMSE is 2.87. It can be seen that the PSO-optimized LSTM outperforms the nonoptimized LSTM in speed prediction.

Speed prediction based on long short-term memory.

Speed prediction based on LSTM-PSO. LSTM: long short-term memory; PSO: particle swarm optimization.
Optimized LSTM prediction error results.
LSTM: long short-term memory.
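The PSO search over LSTM hyperparameters mentioned above follows the usual particle update rule, sketched below in a minimal form. The inertia and acceleration coefficients are common textbook defaults, not values from this work. In practice the objective would train an LSTM with the candidate hyperparameters (rounded to integers) and return its validation RMSE; here any callable can be supplied.

```python
import random


def pso(objective, bounds, n_particles=30, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) within box bounds; returns (best_pos, best_val)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Each objective evaluation would correspond to one full training run, so the particle count and iteration budget directly set the cost of the hyperparameter search.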
Once the speed sequence within the prediction horizon is obtained, the corresponding demand power sequence can be calculated. The demand power must then be allocated to obtain the best control effect. In this work, DP is used to allocate the demand power within the prediction horizon. As the prediction horizon length tp increases, the prediction results become increasingly inaccurate, and because the vehicle speed is subject to large uncertainty, the speed over the whole trip cannot be predicted directly. As a global optimization algorithm, DP requires the time-velocity sequence to be known. Therefore, DP cannot be used directly to solve the energy management optimization problem in an actual controller, but the time-velocity sequence can be treated as known within consecutive, independent prediction horizon segments. In this paper, the complete time-velocity sequence is not available. However, within the MPC framework, a deterministic time-velocity sequence of length tp is obtained for each prediction horizon, and DP is used to solve the optimization problem globally within that horizon. The optimal control sequence within each prediction horizon is obtained by rolling the time window, and only the action for the first second of each horizon tp is applied to control the fuel cell and battery.
The solution process of DP is as follows:
(1) Load the time-velocity sequence within the prediction horizon length and calculate the time-demand-power sequence.
(2) Initialize the parameters and determine the output power boundary conditions and SOC boundary conditions of the fuel cell and power battery.
(3) Form the SOC grid by discretizing the SOC values, and evaluate the value function at each SOC grid point.
(4) Find the optimal output power of the fuel cell and battery, and infer k2 and k3 from the fuel cell output power.
(5) Output the fuel cell and power battery power for the first second of the prediction horizon.
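The DP steps above can be sketched as follows, under strong simplifying assumptions that are not taken from this work: the vehicle parameters (mass, drag area, rolling resistance), the battery capacity, the loss-free SOC model, the discrete fuel cell power levels, and the cost (fuel cell energy plus a quadratic terminal penalty on SOC deviation with weight `w_soc`) are all hypothetical. The cost used in this work also accounts for component degradation, and the inference of k2 and k3 depends on the utility function defined earlier, so both are omitted here.

```python
INFEASIBLE = 1e18  # large finite cost for actions that violate the SOC limits


def demand_power_kw(speeds, mass=4500.0, cd_a=3.5, c_rr=0.01, dt=1.0):
    """Demand power (kW) per step from longitudinal dynamics; parameters are assumed."""
    rho, g = 1.2, 9.81
    powers = []
    for v0, v1 in zip(speeds, speeds[1:]):
        v = 0.5 * (v0 + v1)        # mean speed over the step (m/s)
        accel = (v1 - v0) / dt
        force = mass * accel + mass * g * c_rr + 0.5 * rho * cd_a * v * v
        powers.append(force * v / 1000.0)
    return powers


def interp(grid, values, x):
    """Linear interpolation on an evenly spaced grid, clamped at both ends."""
    step = grid[1] - grid[0]
    pos = min(max((x - grid[0]) / step, 0.0), len(grid) - 1.0)
    i = min(int(pos), len(grid) - 2)
    frac = pos - i
    return values[i] * (1.0 - frac) + values[i + 1] * frac


def solve_horizon_dp(p_dem, soc0, fc_levels, e_bat_kwh=50.0, soc_ref=0.6,
                     soc_lim=(0.3, 0.8), soc_span=0.005, n_grid=101,
                     dt=1.0, w_soc=1e9):
    """Backward DP on a local SOC grid, then a forward pass starting at soc0.

    Returns the fuel cell power plan (kW) and the resulting SOC trajectory.
    """
    grid = [soc0 - soc_span + 2.0 * soc_span * i / (n_grid - 1) for i in range(n_grid)]

    def step(soc, p, u):
        # The battery covers the remainder p - u; SOC by energy counting, no losses.
        return soc - (p - u) * dt / (3600.0 * e_bat_kwh)

    def best_action(soc, p, next_cost):
        best = (INFEASIBLE, fc_levels[0])
        for u in fc_levels:
            s_next = step(soc, p, u)
            if soc_lim[0] <= s_next <= soc_lim[1]:
                cost = u * dt + interp(grid, next_cost, s_next)
                best = min(best, (cost, u))
        return best

    # Backward recursion: build the cost-to-go table for every step.
    tables = [[w_soc * (s - soc_ref) ** 2 for s in grid]]  # terminal cost
    for p in reversed(p_dem):
        tables.append([best_action(s, p, tables[-1])[0] for s in grid])
    tables.reverse()  # tables[k] is now the cost-to-go at step k

    # Forward pass from the actual initial SOC.
    plan, soc_traj, soc = [], [soc0], soc0
    for k, p in enumerate(p_dem):
        _, u = best_action(soc, p, tables[k + 1])
        soc = step(soc, p, u)
        plan.append(u)
        soc_traj.append(soc)
    return plan, soc_traj
```

With a predicted speed sequence `speeds` of six samples, `solve_horizon_dp(demand_power_kw(speeds), soc0, fc_levels)` returns a five-step plan, of which only `plan[0]` would be applied before the window moves forward. The SOC grid is kept local to the current SOC because the SOC changes very little within a 5 s horizon; a grid spanning the full SOC range would need far more points for the same resolution.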
Results and discussion
The time-velocity sequence within the prediction horizon is obtained through LSTM-PSO, the optimal control matrix within the prediction horizon is solved through DP, and the optimal k2 and k3 are then derived to achieve the best real-time control effect. The fuel cell output and the power battery output current, solved according to the MPC algorithm, are shown in Figures 10–13. The figures show that the fuel cell maintains a stable output power despite large fluctuations in demand power; the strongly fluctuating part of the demand power is mainly borne by the battery. When the demand power is large, the fuel cell output power is large; when the demand power is small, the fuel cell output power is small. When the demand power is zero, the fuel cell does not stop but maintains its current power output, which reduces fuel cell start-stop and idling events while keeping the SOC of the power battery at an appropriate level.

Fuel cell and battery curves under WLTC driving cycle. WLTC: worldwide harmonized light vehicles test cycle.

Fuel cell and battery curves under NEDC driving cycle. NEDC: new European driving cycle.

Fuel cell and battery curves under UDDS driving cycle. UDDS: urban dynamometer driving schedule.

Fuel cell and battery curves under JC08 driving cycle.
The MPC algorithm is used to solve the utility function parameters. The performance of MPC under single cycle conditions is shown in Table 5. Since the demand power fluctuation of the worldwide harmonized light vehicles test cycle (WLTC) is the largest among the four typical driving cycles, battery and fuel cell performance degradation under the WLTC is larger than under the other three cycles. The demand power fluctuation of the new European driving cycle (NEDC) is small, so battery and fuel cell performance degradation under it is minimal.
Performance of different EMSs under different cycle conditions.
EMS: energy management strategy; MPC: model predictive control; WLTC: worldwide harmonized light vehicles test cycle; NEDC: new European driving cycle; UDDS: urban dynamometer driving schedule.
To show the characteristics of the different algorithms, we simulate and compare the different EMSs in three respects: fuel cell degradation, battery degradation, and driving range. The results of the different solution methods are shown in Table 6. As can be seen from Table 6, the MPC algorithm is generally better than the MOABC (multi-objective artificial bee colony) and DDPG strategies in prolonging component life and improving driving range. The performance of MOABC is the worst among the three methods, mainly because MPC is a real-time dynamic optimization algorithm, which can ensure the optimal control effect within each prediction horizon, while MOABC is a global static optimization algorithm. The parameters solved by MOABC cannot guarantee real-time optimality and only yield good control effects overall. The overall performance of DDPG is better than that of MOABC but slightly worse than that of MPC. The main reason is that DDPG is used in this paper as a global static optimization, so its performance is not as good as that of real-time optimization, although DDPG can continuously update the agent through parameter learning. In addition, the number of training iterations in this paper is limited, so DDPG may not have found the optimal solution and its full potential is not realized. DP offers significant advantages in finding globally optimal solutions. Therefore, the driving range calculated by the DP strategy is generally regarded as the optimal solution and serves as a benchmark for the other strategies. However, because DP focuses only on maximizing driving range and does not consider component life protection, its fuel cell and battery performance degradation is worse than that of the other algorithms.
Performance of different EMS under different cycle conditions.
EMS: energy management strategy; DP: dynamic programming; MPC: model predictive control; DDPG: deep deterministic policy gradient; MOABC: multi-objective artificial bee colony; WLTC: worldwide harmonized light vehicles test cycle; NEDC: new European driving cycle; UDDS: urban dynamometer driving schedule.
Conclusions
In this paper, the energy management of fuel cell commercial vehicles is investigated, and the fuel cell model, battery model, motor model, and vehicle energy consumption model are established. The vehicle and component models reflect the energy flow and the efficiency of the transfer process, as well as the dynamic response characteristics of the components. A new energy management strategy (EMS) is then designed. The strategy uses MPC as a framework and divides the fuel cell working range according to the battery SOC. A quadratic utility function is used to calculate the output power of the fuel cell and battery. Speed prediction is carried out through LSTM-PSO, and the demand power sequence within the prediction horizon is calculated from the predicted speed. The DP algorithm is used to allocate the demand power sequence within the prediction horizon, and the unknown parameters in the utility function are deduced inversely. The simulation results show that the proposed EMS is superior to conventional EMSs in improving component durability and vehicle economy.
Compared with the MOABC and DDPG strategies, the proposed strategy reduces battery degradation by 27.3% and 3.4%, respectively, and improves fuel cell durability by 7.5% and 1.7%, respectively, under the WLTC driving cycle. The proposed strategy also significantly reduces the number of fuel cell start-stop events, which extends the service life of the fuel cell.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was sponsored by QingLan Project of Jiangsu Higher Education Institutions (JSQL2022, JSQL2023). The authors would like to thank the National Key R&D Plan of China (NO. 2016YFD0700402) for the support given to this research. This work was also supported by the Changzhou University Higher Vocational Education Research Institute project (NO. CDGZ2023012).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
The data used to support the findings of this study are available from the corresponding authors upon request.
