Abstract
Approximate dynamic programming is an effective optimal control method. This article studies a data-driven approximate dynamic programming method and extends it to a nonlinear multi-input multi-output form. Using the data from a unique 4JB1-T Weifu accumulator pump system (WAPS) engine, the developed approximate dynamic programming controller is trained to achieve the optimal trade-off emission control between nitrogen oxides and particulate matter. A convergence proof of the method is given. A second-order training algorithm is introduced to improve robustness and convergence. The control objective is for the WAPS engine to pass the China State-IV emission test under the New European Drive Cycle. The bench test shows excellent transient control performance and a significant improvement in emissions. This article presents a new approach for engine control and calibration and adds to the existing literature on the data-driven nonlinear multi-input multi-output trade-off emission control of the WAPS engine.
Keywords
Introduction
The 4JB1-T Weifu accumulator pump system (WAPS) engine is required to pass the China State-IV emission test. However, controlling its nitrogen oxides (NOx) and particulate matter (PM) emissions is difficult, because reducing one of these two emissions tends to increase the other. At present, there are mainly three kinds of technologies for this emission control.
Some scholars addressed this problem mainly by adding after-treatment equipment. For example, in 1996, Summers et al. 1 adopted a cerium fuel-borne catalyst/filter/exhaust gas recirculation (EGR) system for the simultaneous control of the engine PM and NOx emissions. In 2010, Rathore et al. 2 applied activated carbon fibers functionalized with ammonia for this control. In 2014, Feng et al. 3 used a particulate oxidation catalyst to improve the control of the NOx and PM emissions. Today, the particulate filter has become a standard device for reducing the PM emission of a diesel engine. 4
Recently, fuel blends combined with a certain control strategy and clean energy technologies have become popular, such as the diesel–alcohol ether blends,5–7 the diesel–hydrogen blends, 8 the diesel–natural gas blends, 9 and the diesel–oxygenates blends. 10 For example, in 2012, Lin et al. 11 reduced the NOx and PM emissions by adding water-containing butanol to the fuel of a diesel engine. In 2016, Kumar et al. 12 used some advanced bio-fuels to overcome the smoke and NOx trade-off. In 2017, Kumar et al. 10 proposed a multi-response optimization to screen suitable diesel-oxygenate blends for achieving a simultaneous reduction of smoke and NOx.
Other scholars mainly utilized optimal control strategies. For example, in 2013, Tschanz et al. 13 used feedback control for the optimal trade-off between the PM and NOx emissions. In 2014, Nikzadfar and Shamekhi 14 used the neural network method to control the soot and PM emissions of a common rail diesel engine. In 2015, Fang used a response surface method to optimize the engine emission experimentally; the NOx and soot emissions were reduced by 79% and 50% at low load, and by 72% and 27% at high load. 15 In 2016, Deb et al. 16 applied an artificial intelligence method, namely a neural network with a fuzzy logic–based topology optimization, to control the emissions (including the soot/NOx trade-off) of a single-cylinder engine. In 2016, Liu and Song 17 used a post-injection strategy to regulate the exhaust and PM emissions of a high-speed direct injection engine. In 2016, Divekar et al. 18 carried out an empirical investigation and parametric analysis to assess the impact of the EGR in attaining an ultra-low NOx emission while minimizing the smoke. In 2016, Liu et al. 19 proposed a novel λ-based EGR modulation method, which eliminated the NOx overshoot without greatly increasing the soot. In 2017, Hu et al. 20 obtained the optimal engine design parameters with a multi-objective genetic algorithm that deals with the trade-off between NOx and soot.
Generally speaking, the in-cylinder control strategy should be preferred, because the after-treatment technology needs additional costly hardware and the fuel blends are not always available. For the WAPS engine, in-cylinder control requires an integrated optimization of the rail pressure, EGR rate, and fuel injection. The optimal controller should then make the WAPS engine pass the China State-IV emission test for NOx and PM under the New European Drive Cycle (NEDC); the limits for hydrocarbons (HC), carbon monoxide (CO), and carbon dioxide (CO2) have already been met. This is a typical optimal control problem of a constrained, nonlinear, coupled, multi-input multi-output (MIMO), time-varying system. Therefore, although many control strategies are listed above, this article tries a new method that is well suited to this control problem.
Today, approximate dynamic programming (ADP) has overcome the curse of dimensionality of dynamic programming (DP), 21 which makes the ADP an effective optimal control method. The method is especially suited to control problems that can be formulated as a cost minimization or maximization problem. 22 Thus, this article studies a data-driven ADP method that does not need a model of the controlled object. The method is then extended to a nonlinear MIMO form. Using the data from a unique testing 4JB1-T WAPS engine, the developed ADP controller is trained to achieve the optimal trade-off emission control between the NOx and PM. A convergence proof of the method is given. A second-order training algorithm is introduced to improve robustness and convergence. The control objective is to make the WAPS engine pass the China State-IV emission test under the NEDC. The bench test shows excellent transient control performance and a significant improvement in emissions. This article presents a new approach for engine control and calibration and adds to the existing literature on the data-driven nonlinear MIMO trade-off emission control of the WAPS engine.
The novel WAPS injector
The structure and principle of the WAPS injector
Modern engines adopt the high-pressure common rail injector to meet increasingly strict emission and fuel economy requirements. However, the high-pressure common rail injector is difficult and costly to manufacture in China. For this reason, in 2015, the Wuxi Weifu High-Technology Corporation presented the WAPS injector. Its structure and principle are shown in Figure 1. The WAPS injector (1) replaces the electronically controlled high-pressure common rail injector with a VE pump and a conventional injector and (2) adds a control solenoid valve to control the oil flow into the mechanical distributor, which replaces the solenoid valve of the electronically controlled injector.

(a) The high-pressure common rail pump of the WAPS injector in the test bench and (b) the demonstration for the structure and principle of the WAPS injector.
In this way, a function similar to that of the Bosch high-pressure common rail injector is achieved at low cost. This novel system can supply an injection pressure of up to 160 MPa.
The comparison with the counterpart of Bosch
Compared with the Bosch high-pressure common rail system of the CRSN2-16 type, the WAPS feeds oil through the distributor. Therefore, this injector cannot respond as fast as the electronically controlled Bosch injector. The adjustment of the injection angle and timing of the WAPS system is also limited by its mechanical structure, which makes its control flexibility and precision worse than those of the Bosch system. However, the WAPS injector can still achieve a function similar to that of the Bosch system, and it is much cheaper and easier to manufacture. In addition, the low-pressure fuel oil in the return pipeline of the WAPS injector has a relatively lower temperature and thus a better cooling effect. Unlike the Bosch injector, the WAPS injector does not return oil every time it injects, so it has a relatively lower return-oil energy consumption. The performance comparison is shown in Table 1.
The comparison between the WAPS injector and the Bosch high-pressure common rail system of the CRSN2-16 type.
The optimal control principle of the ADP
The DP and cost-to-go function–based control
The DP is based on Bellman’s principle of optimality. 23 Suppose that a discrete-time nonlinear time-varying dynamic system is given as 24
where
where
If using the known cost function
Then, the optimal control
Equation (4) is the principle of optimality for the system of equation (1): any strategy that minimizes J in the short term will also minimize the sum of
The ADP method
To overcome the curse of dimensionality, the ADP is introduced. It mainly contains three modules: critic, model, and action, each of which can be implemented with a neural network. Combining the critic and model networks into a new critic network yields a form called action-dependent heuristic dynamic programming (ADHDP). The critic network of the ADHDP implicitly includes a model network, 24 as shown in Figure 2.

Three modules in a typical ADP and a new critic network of the ADHDP.
Define a new future accumulated cost at time t as 25
In the new structure, the critic network approximates the estimate of the
where
Comparing equation (7) with equation (2) yields
The new critic network maps a state and action pair to the cost function value. Thus, the optimal Q function satisfies 26
The action network is trained after the critic network, and its training objective is
Once the optimal Q function is known, the optimal control policy
This is the theory by which the ADHDP attains the optimality condition of equation (4) while avoiding the curse of dimensionality.
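The Q-function recursion can be made concrete with a small tabular example. The sketch below is illustrative only: the three-state problem, its costs, its transitions, and the discount factor are hypothetical and unrelated to the engine, and it assumes the standard discounted form in which the Q value of a state-action pair equals the one-step utility plus the discounted minimum Q value at the successor state. The ADHDP replaces the table with the critic and action networks.

```python
# Toy deterministic problem (hypothetical): states 0..2, actions 0..1.
# NEXT[x][u] is the successor state, COST[x][u] the one-step utility.
NEXT = [[1, 2], [2, 0], [2, 2]]
COST = [[1.0, 4.0], [1.0, 0.5], [0.0, 0.0]]
GAMMA = 0.9  # discount factor (assumed value)


def q_iteration(n_sweeps=200):
    """Iterate Q(x,u) <- U(x,u) + GAMMA * min_u' Q(x',u') from Q = 0."""
    q = [[0.0, 0.0] for _ in NEXT]
    for _ in range(n_sweeps):
        q = [[COST[x][u] + GAMMA * min(q[NEXT[x][u]]) for u in range(2)]
             for x in range(len(NEXT))]
    return q


def greedy_action(q, x):
    """Optimal control at state x: the action with the smallest Q value."""
    return min(range(len(q[x])), key=lambda u: q[x][u])


q_star = q_iteration()
policy = [greedy_action(q_star, x) for x in range(len(NEXT))]
```

Once the table has converged, the control at each state is read off by minimizing over the actions only, with no look-ahead through a model; this is the role the trained action network plays in the ADHDP.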
The nonlinear MIMO ADHDP
Figure 3 shows the principle of the nonlinear MIMO ADHDP. The action network is extended to multi-output

The schematic diagram demonstrating the principle of the MIMO ADHDP. The solid lines represent the signal flow, while the dashed lines are the paths for the weight tuning.
The critic network
The symbols are defined in Figure 3. The prediction error of the critic network is defined in equation (6). The gradient vector of the critic network can be written as follows:22,23
Then, the weights of the critic network can be updated with the following second-order training algorithm 27
In this,
The action network
The symbols are defined in Figure 3. The prediction error of the action network is defined in equation (9). The gradient vector of the action network can be written as follows:27,28
Then, the weights of the action network can be updated with the following second-order training algorithm 27
In this,
The recursive Levenberg–Marquardt algorithm
The calculation of
The weight update of this algorithm is given as the recursive Levenberg–Marquardt formulations 27
where the forgetting factor is
so that the
In it,
where
The normalization
A normalization is needed for the MIMO ADHDP to confine the critic and action network weights into an appropriate range by 25
The convergence analysis of the nonlinear MIMO ADHDP
Analysis
If the output
Lemma
Consider the discrete-time nonlinear system of equation (1). Suppose that a positive invariant for the system is
Theorem
For a nonlinear MIMO system, its dynamic state is defined as
Then, the performance index
In that,
Proof
This proof adopts the Lyapunov stability criterion. The multiply result of
The index j in equation (26) enumerates the performance indexes. Thus, the maximal value n of j is finite. The value of n can also be 1, which is the case of a single control output.
For an optimal cycle
At the initial state of
Taking
Hence,
According to equation (7), the utility function from t to m is
Then, according to equations (7) and (27), the sum of utility function from
As for a very small time step in the dynamic system equation (1), the differential of
Then
Equations (28) and (32) show that the Lemma is satisfied. Thus, for not all
Experiment and results
The neural network model of the 4JB1-T engine
A neural network model of the 4JB1-T engine is needed as the controlled object that interacts with the ADHDP controller during the offline training. Data are collected from this engine over the NEDC with an existing controller, about 47,000 samples in each test. A time-lagged recurrent neural network is used to learn the engine model from the sample data with high precision.
The five inputs to the neural network–based model are the rail pressure, EGR rate, injection quantity, injection timing, and vehicle speed. The two outputs are the NOx and PM emissions. Validation results for the NOx and PM emissions of the neural network engine model indicate a good match to the real engine data, with the maximal relative error kept within 5%.
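As an illustration of the 5% validation criterion, the following minimal sketch computes the maximal relative error between a measured and a predicted series. The function name and the sample values are hypothetical and are not taken from the engine data; the small `eps` guard is an added assumption to avoid division by zero for near-zero measurements.

```python
def max_relative_error(measured, predicted, eps=1e-12):
    """Largest |predicted - measured| / |measured| over paired samples."""
    if len(measured) != len(predicted):
        raise ValueError("series must have equal length")
    return max(abs(p - m) / max(abs(m), eps)
               for m, p in zip(measured, predicted))


# Hypothetical NOx readings, not taken from the engine data.
nox_measured = [120.0, 200.0, 80.0]
nox_predicted = [123.0, 194.0, 82.0]

err = max_relative_error(nox_measured, nox_predicted)
model_ok = err <= 0.05  # acceptance threshold of 5%
```

A relative measure becomes unstable when the measured emission is close to zero, so in practice such a check may need a floor on the denominator or an absolute-error criterion at low readings.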
The utility function design
For this work, the local cost function can be defined as 34
where
Then, the optimal ADHDP controller is designed according to equation (7) by minimizing
where
The critic network design
The critic network is chosen as a 6-14-1 structure with six input neurons, fourteen hidden layer neurons, and one output neuron. This structure is selected based on experience and many trials. The detailed structure of the critic network is shown in the critic frame of Figure 3. The six inputs are the normalized values of the NOx emission, PM emission, rail pressure, EGR rate, injection quantity, and injection timing. The hidden layer uses the following sigmoidal function, and the output layer is linear
The action network design
The action network is chosen as a 2-9-4 structure with two input neurons, nine hidden layer neurons, and four output neurons. This structure is also selected based on experience and many trials. The detailed structure of the action network is shown in the action frame of Figure 3. The two inputs are the normalized values of the NOx and PM emissions. The four outputs are the rail pressure, EGR rate, injection quantity, and injection timing. Both the hidden layer and output layer use the sigmoidal function of equation (35).
The ADHDP controller parameters, shown in Table 2, are chosen with reference to Liu et al. 35 and by trial. The study of Liu et al. 35 shows that these parameters can achieve a relatively satisfactory control effect and stable convergence.
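The two network shapes can be sketched as plain forward passes. This is a minimal illustration under stated assumptions, not the authors' implementation: the bipolar sigmoid used here (equal to tanh(x/2)) is an assumed form of the sigmoidal function, bias terms are omitted, and the weights are small random numbers rather than trained values. All function and variable names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Bipolar sigmoid in (-1, 1); an assumed form, equal to tanh(x / 2)."""
    return math.tanh(x / 2.0)


def init_weights(n_in, n_out, rng):
    """Small random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def layer(weights, inputs, activation=None):
    """Weighted sum per neuron, optionally passed through an activation."""
    sums = [sum(w * v for w, v in zip(row, inputs)) for row in weights]
    return [activation(s) for s in sums] if activation else sums


rng = random.Random(0)
# Action network 2-9-4: sigmoidal hidden and output layers.
wa1, wa2 = init_weights(2, 9, rng), init_weights(9, 4, rng)
# Critic network 6-14-1: sigmoidal hidden layer, linear output.
wc1, wc2 = init_weights(6, 14, rng), init_weights(14, 1, rng)


def action_net(state):
    """Map normalized [NOx, PM] to four normalized control signals."""
    return layer(wa2, layer(wa1, state, sigmoid), sigmoid)


def critic_net(state, control):
    """Map the six-element state-action vector to a scalar cost estimate."""
    return layer(wc2, layer(wc1, state + control, sigmoid))[0]


state = [0.3, -0.2]            # hypothetical normalized NOx and PM
control = action_net(state)    # rail pressure, EGR rate, quantity, timing
q_value = critic_net(state, control)
```

The critic input is the concatenation of the two emission values and the four control signals, which is what makes the structure action dependent: the cost estimate is a function of the state and the action together.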
The design parameters of the nonlinear MIMO ADHDP controller.
MIMO ADHDP: multi-input multi-output–action-dependent heuristic dynamic programming
First, the critic network is trained for many cycles with 250 training epochs in each cycle. When its output cannot be decreased further, the critic network training is stopped. This training usually takes 3 h. Then, the action network is trained for many cycles with 100 epochs in each cycle, and the optimal control effect is observed. This procedure is repeated until a good control effect is achieved. At least 4700 data points from the sample data (47,000 in the data set) are needed for training the critic and action networks.
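The cycle-based schedule can be sketched as a loop that runs a fixed number of epochs per cycle and stops when the loss at the end of a cycle no longer decreases meaningfully. This is only an outline under assumptions: the stopping tolerance, the cycle limit, and the geometrically decaying stand-in for a real training epoch are all hypothetical, and the actual weight updates of the critic and action networks are not modeled.

```python
def train_in_cycles(run_epoch, epochs_per_cycle, tol=1e-4, max_cycles=100):
    """Run fixed-length cycles until the end-of-cycle loss stops improving.

    run_epoch is a caller-supplied function that performs one training
    epoch and returns the current loss. Returns (cycles_run, final_loss).
    """
    previous = float("inf")
    loss = previous
    for cycle in range(1, max_cycles + 1):
        for _ in range(epochs_per_cycle):
            loss = run_epoch()
        if previous - loss < tol:   # no meaningful decrease: stop
            return cycle, loss
        previous = loss
    return max_cycles, loss


def make_decaying_loss(start, rate):
    """Stand-in for a real training step: the loss decays geometrically."""
    state = {"loss": start}

    def run_epoch():
        state["loss"] *= rate
        return state["loss"]

    return run_epoch


# Critic first (250 epochs per cycle), then the action network (100 per cycle).
critic_cycles, critic_loss = train_in_cycles(make_decaying_loss(1.0, 0.98), 250)
action_cycles, action_loss = train_in_cycles(make_decaying_loss(1.0, 0.98), 100)
```

In a real run the outer procedure would be repeated, with the control effect inspected after each pass, until the result is judged acceptable.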
The experimental devices
The test bench is shown in Figure 4:
The testing engine: the Jiangling 4JB1-T is a four-stroke, four-cylinder, high-speed, mechanically supercharged, direct injection engine. The bore is 93 mm, the stroke is 102 mm, and the total displacement is 2.771 L; the engine is fitted with the common rail fuel pump of the WAPS injector. The maximum power is 68 kW (at 3600 r/min) and the maximum torque is 210 N·m (at 2100 r/min). The IMS engine control unit (ECU) is mounted to run the existing proportional, integral, and derivative (PID) controller and the correcting ADHDP controller.
The testing tools and their purposes are listed as follows: (1) an AVL4000 smoke sensor is used to measure the PM value, (2) a HORIBA emission analyzer is used to measure the NOx, HC, and CO values, (3) two Tektronix current clamps are used to measure the injection control current, (4) a Bosch rail pressure sensor is used to measure the rail pressure, (5) a MEAN WELL data-collecting board of the RSP-1000-27 type is used to collect the sensor data, (6) a speed sensor of the DG6 type is used to measure the engine speed, and (7) an Agilent oscilloscope of the DSO7054A type is used to observe the signal waveforms.
The debugging personal computer (PC) software is CANape from the German company Vector. This software is used to record data and produce debugging graphs.

The test bench of the WAPS engine for the running and emission data.
In addition, the MAHA rotating hub test bench is also needed. It is used to run the NEDC and collect the emission data under the optimal trade-off ADHDP control.
The experiment design
Analysis and experiment have shown that the emission performance of the WAPS engine is mainly determined by four parameters: the rail pressure, EGR rate, injection quantity, and injection timing. The controlled emission variables are mainly the NOx and PM values. Therefore, the control objective is to provide proper control signals of the rail pressure, EGR rate, injection quantity, and injection timing to achieve the optimal trade-off control of the NOx and PM emissions. In this way, the 4JB1-T engine will be upgraded from the China State-II to the State-IV emission regulation. The ADHDP design is adopted. It is one of the most widely used ADP methods, because it does not need a model of the controlled object. 24
The ADHDP controller is implemented as a MATLAB function; its design principle is shown in Figure 5. This MATLAB function is embedded, using a laptop, into the Simulink engine management system of the ECU developed by the IMS Company. The ADHDP controller acts as a corrector for the original ECU output variables of the PID controller, that is, the rail pressure, EGR rate, injection quantity, and injection timing. The laptop on the test bench also runs the dSPACE software. The dSPACE TargetLink software compiles the Simulink module into C code and loads it into the ECU to modify the control output, which achieves the optimal trade-off control effect.

The schematic diagram of the ADHDP controller design for the emission control of the WAPS engine.
The values of the NOx and PM emissions are measured under the starting and various running conditions on the MAHA rotating hub test bench. Then, a reasonable optimization target for the NOx and PM emissions at each stage can be determined from the NEDC data and the China State-IV emission regulation.
First, the ADHDP controller is trained offline with the data from the 4JB1-T WAPS engine on the test bench. Once the offline training has converged, the controller is used for online control. The NEDC is used for the emission evaluation (Figure 6).

The real-time vehicle speed of a successful drive cycle following the NEDC.
The complete engine running data of an NEDC are recorded. Then, the engine cycle data can be extracted from the NEDC data according to the engine speed and the revolution counting sensor mounted on the flywheel of the 4JB1-T engine.
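One possible way to split a recorded log into engine cycles is sketched below. It relies on the fact that a four-stroke engine completes one working cycle every two crankshaft revolutions. The log format (a cumulative revolution count paired with a reading) and all names are assumptions made for illustration; the sketch uses only the revolution count and does not reproduce the extraction procedure actually applied to the test data.

```python
from collections import defaultdict

REVS_PER_CYCLE = 2  # a four-stroke engine needs two revolutions per cycle


def split_into_engine_cycles(samples):
    """Group (revolution_count, value) samples by engine cycle index."""
    cycles = defaultdict(list)
    for rev_count, value in samples:
        cycles[rev_count // REVS_PER_CYCLE].append(value)
    return [cycles[k] for k in sorted(cycles)]


# Hypothetical log: cumulative revolution count and a NOx reading.
log = [(0, 10.0), (1, 11.0), (2, 12.0), (3, 12.5), (4, 13.0)]
per_cycle = split_into_engine_cycles(log)
```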
The experiment results
By repeatedly training the controller offline and modifying its optimization objectives, a successful control can be achieved after many trials. In a successful training, the rotating hub test shows the following control effect: the WAPS engine ultimately meets the emission demand. Figure 6 shows the measured vehicle speed, which strictly follows the NEDC used for emission evaluation. Figures 7 and 8 show that a very good tracking control effect of the NOx has been achieved. The real-time dynamic HC and CO values of a successful drive cycle are also shown in Figures 9 and 10.

A successful NOx emission control effect with the ADHDP controller.

The relationship between the vehicle speed and the NOx value of a successful emission control.

The relationship between the vehicle speed and the real-time HC value of a successful drive cycle.

The relationship between the vehicle speed and the real-time CO value of a successful drive cycle.
From Tables 3 and 4, we can also see that the optimized result of the PID controller is not sufficient: there is still a gap to the China State-III standard. However, the optimal trade-off emission effect is significantly improved with the ADHDP controller. Most indexes have been upgraded from the China State-II level to come close to or meet the China State-IV emission regulation. Table 4 and Figures 9 and 10 also show that the control of the HC, CO, and other emissions is not affected by the ADHDP controller. The control effect can be further improved with finer engine data or more training in the future.
The optimal trade-off emission effect of the WAPS engine with a PID controller and without the ADHDP optimization.
PID: proportional, integral, and derivative; ADHDP: action-dependent heuristic dynamic programming; CO: carbon monoxide; HC: hydrocarbons; NOx: nitrogen oxides; PM: particulate matter.
The optimal trade-off emission effect of the WAPS engine with the ADHDP optimization and a PID controller.
ADHDP: action-dependent heuristic dynamic programming; PID: proportional, integral, and derivative; CO: carbon monoxide; HC: hydrocarbons; NOx: nitrogen oxides; PM: particulate matter.
Remark
This work designs a MIMO ADHDP control method. In essence, the ADHDP controller explores the nonlinear coupled relationships between the NOx/PM emissions and the rail pressure, EGR rate, injection quantity, and injection timing in a high-dimensional state space. Based on this exploration, the ADHDP controls the trade-off between the NOx and PM emissions by correcting the rail pressure, EGR rate, injection quantity, and injection timing.
The PID control by the IMS Company is retained in the ECU. The ADHDP controller calculates its own outputs for the rail pressure, EGR rate, injection quantity, and injection timing. These outputs act as a corrector and are compared with the corresponding outputs of the PID controller. The ECU output is then revised to achieve the optimal control objective.
This application achieves the above control effect without considering its influence on the drivability and fuel economy. Even so, it remains a complicated control problem for a nonlinear MIMO coupled system. If the drivability and fuel economy are considered, it becomes a constrained and more complicated control problem, which will be studied in future work.
The control effect is compared with the PID method in Tables 3 and 4. The PID controller is not well suited to the nonlinear MIMO coupled system: the system usually has to be decoupled and each loop controlled separately. The coordination among the PID loops is poor, and the integrated optimization effect is not satisfactory. The ADHDP method overcomes this shortcoming. The ADHDP control can optimize the emissions for all load conditions and does not need a model of the controlled object.
In addition, at this preliminary research stage, the ADHDP controller is mainly trained on the NEDC so that the WAPS engine passes the China State-IV emission test. As neural networks have a generalization ability, the ADHDP controller also has a certain generalization ability, which still needs to be tested and improved. Nevertheless, the ADHDP controller shows a strong optimizing capability for the complicated nonlinear MIMO coupled system during the NEDC test.
Conclusion
This work upgrades the emission performance of the 4JB1-T WAPS engine from the China State-II to nearly the China State-IV regulation with the WAPS injector and the optimal ADHDP controller. The rotating hub test shows that the optimal ADHDP control design is an effective data-driven optimal method for a nonlinear MIMO coupled system, such as the engine emission control. The novel WAPS injector can also achieve a function similar to that of the Bosch high-pressure common rail system at low cost.
In summary, this article proposes a new data-driven approach for engine control and calibration. It also adds to the existing literature on the WAPS engine emission control. The presented controller may have the potential to outperform the existing controllers in three respects:
The ADHDP controller does not require a mathematical model of the controlled system, because it can automatically learn the inherent dynamics and nonlinearities of the engine from the real engine data. It is therefore a genuinely data-driven method.
The proposed controller combines the principles of DP, artificial neural networks, and feedback control, which makes it an optimal trade-off controller with a high degree of intelligence and performance.
This controller can also learn to improve its performance during the actual vehicle operations and will adapt to uncertain changes of the environment and vehicle conditions. This is an inherent feature of the neural network learning controller. Thus, this technique shows promise as an adaptive controller.
Although illustrated for the engine control, this ADHDP control framework is also applicable to general data-driven nonlinear MIMO systems.
Footnotes
Handling Editor: Haiping Du
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the NSFC projects of China under grant nos 61403250, 51779136, and 51509151, the bureau project of China under grant no. 2015HT056, and the Science Commission of Shanghai under grant no. 13510501600.
