
Abstract

Approximate dynamic programming is an effective optimal control method. This article studies a data-driven approximate dynamic programming method and extends it to a nonlinear multi-input multi-output form. Using data from a unique 4JB1-T Weifu accumulator pump system (WAPS) engine, the developed approximate dynamic programming controller is trained to achieve the optimal trade-off emission control between nitrogen oxides and particulate matter. A convergence proof of the method is given. A second-order training algorithm is introduced to improve robustness and convergence. The control objective is to make the WAPS engine pass the China State-IV emission test under the New European Drive Cycle. The bench test shows that an excellent transient control performance and a significant improvement in emissions have been achieved. This article presents a new approach to engine control and calibration. It also adds another dimension to the existing literature on data-driven nonlinear multi-input multi-output trade-off emission control of the WAPS engine.

Keywords

Approximate dynamic programming, WAPS engine, optimal trade-off emission control, nitrogen oxides, particulate matter

Introduction

The 4JB1-T Weifu accumulator pump system (WAPS) engine is required to pass the China State-IV emission test. However, it has difficulty controlling its nitrogen oxides (NOx) and particulate matter (PM) emissions, because these two emissions are in a hard trade-off relationship for the engine. At present, there are mainly three kinds of technologies for this emission control.

Some scholars solved this problem mainly by adding after-treatment equipment. For example, in 1996, Summers et al.¹ adopted a cerium fuel-borne catalyst/filter/exhaust gas recirculation (EGR) system for the simultaneous control of the engine PM and NOx emissions. In 2010, Rathore et al.² applied activated carbon fibers functionalized with ammonia for this control. In 2014, Feng et al.³ used a particulate oxidation catalyst to improve the control effect of the NOx and PM emissions. Today, the particulate filter has become a standard device to reduce the PM emission of a diesel engine.⁴

Recently, fuel blends combined with a certain control strategy, or clean energy technologies, have become popular, such as the diesel–alcohol ether blends,⁵–⁷ the diesel–hydrogen blends,⁸ the diesel–natural gas blends,⁹ and the diesel–oxygenates blends.¹⁰ For example, in 2012, Lin et al.¹¹ reduced the NOx and PM emissions by adding a water-containing butanol into a diesel engine. In 2016, Kumar et al.¹² used some advanced bio-fuels to overcome the smoke and NOx trade-off. In 2017, Kumar et al.¹⁰ proposed a multi-response optimization to screen suitable diesel-oxygenate blends for achieving a simultaneous reduction of smoke and NOx.

Other scholars mainly utilized the optimal control strategy. For example, in 2013, Tschanz et al.¹³ used a feedback control for the optimal trade-off between the PM and NOx emissions. In 2014, Nikzadfar and Shamekhi¹⁴ used the neural network method to control the soot and PM emissions of a common rail diesel engine. In 2015, Fang¹⁵ used a response surface method to experimentally optimize the engine emission. The NOx and soot emissions were reduced by 79% and 50% at the low load, and by 72% and 27% at the high load. In 2016, Deb et al.¹⁶ applied an artificial intelligence method, namely a neural network with a fuzzy logic–based topology optimization, to control the emission (including the trade-off soot/NOx) of a single cylinder engine. In 2016, Liu and Song¹⁷ used the post-injection strategy to regulate the exhaust and PM emissions of a high-speed direct injection engine. In 2016, Divekar et al.¹⁸ carried out an empirical investigation and parametric analysis to assess the impact of the EGR in attaining an ultra-low NOx emission while minimizing the smoke. In 2016, Liu et al.¹⁹ proposed a novel λ-based EGR modulation method, which eliminated the NOx overshoot without a large soot penalty. In 2017, Hu et al.²⁰ obtained the optimal engine design parameters with a multi-objective genetic algorithm that handles the trade-off between NOx and soot.

Generally speaking, the in-cylinder control strategy should be preferred, because the after-treatment technology needs additional costly hardware and the fuel blends are not always available. For the WAPS engine, the in-cylinder control requires an integrated optimization of the rail pressure, EGR rate, and fuel injection. The optimal controller should then make the WAPS engine pass the China State-IV emission test for the NOx and PM under the New European Drive Cycle (NEDC) (the hydrocarbons (HC), carbon monoxide (CO), and carbon dioxide (CO₂) emission targets have already been achieved). This is a typical optimal control problem of a constrained, nonlinear, coupled, multi-input multi-output (MIMO), time-varying system. Therefore, although many control strategies are listed above, this article tries a new method that is well suited to this control problem.

Today, approximate dynamic programming (ADP) has overcome the curse of dimensionality of dynamic programming (DP),²¹ which makes the ADP an effective optimal control method. The method is especially suitable for control cases that can be formulated as a cost minimization or maximization problem.²² Thus, this article studies a data-driven ADP method that does not need a model of the controlled object. The method is then extended to a nonlinear MIMO form. Using data from a unique 4JB1-T WAPS test engine, the developed ADP controller is trained to achieve the optimal trade-off emission control between the NOx and PM. A convergence proof of the method is given. A second-order training algorithm is introduced to improve robustness and convergence. The control objective is to make the WAPS engine pass the China State-IV emission test under the NEDC. The bench test shows that an excellent transient control performance and a significant improvement in emissions have been achieved. This article presents a new approach to engine control and calibration. It also adds another dimension to the existing literature on data-driven nonlinear MIMO trade-off emission control of the WAPS engine.

The novel WAPS injector

The structure and principle of the WAPS injector

Modern engines adopt the high-pressure common rail injector to meet the increasingly strict emission and fuel economy requirements. However, the high-pressure common rail injector is difficult and costly to manufacture in China. For this reason, in 2015, the Wuxi Weifu High-Technology Corporation presented the WAPS injector. Its structure and principle are shown in Figure 1. The WAPS injector (1) replaces the electronically controlled high-pressure common rail injector with a VE pump and a conventional injector and (2) adds a control solenoid valve to control the oil into the mechanical distributor, which replaces the solenoid valve of the electronically controlled injector.

Figure 1.

(a) The high-pressure common rail pump of the WAPS injector in the test bench and (b) the demonstration for the structure and principle of the WAPS injector.

In this way, a function similar to that of the Bosch high-pressure common rail injector is achieved at low cost. This novel system can supply an injection pressure of up to 160 MPa.

The comparison with the counterpart of Bosch

Compared with the Bosch high-pressure common rail system of the CRSN2-16 type, the WAPS feeds oil through the distributor. Therefore, this injector cannot respond as fast as the electronically controlled Bosch injector. The adjustment of the injection angle and timing of the WAPS system is also limited by its mechanical structure, which makes its control flexibility and precision worse than those of the Bosch system. However, the WAPS injector can still achieve a function similar to that of the Bosch injector, and it is much cheaper and easier to manufacture. In addition, the low-pressure fuel oil in the return pipeline of the WAPS injector has a relatively lower temperature and thus a better cooling effect. Unlike the Bosch injector, the WAPS injector also does not return oil whenever it injects. Thus, the WAPS injector has a relatively lower return-oil energy consumption. The performance comparison is shown in Table 1.

Table 1.

The comparison between the WAPS injector and the Bosch high-pressure common rail system of the CRSN2-16 type.

Items	WAPS	Bosch
Responding speed of injection	Relatively slow	Relatively fast
Adjustments of injection angle and timing	Limited, relatively less accurate	Flexible, precise control
Manufacture/cost	Easy/low	Not easy/high
Cooling effect of high-speed solenoid valve	Relatively good	Relatively poor
Return-oil energy consumption	Relatively low	Relatively high

The optimal control principle of the ADP

The DP and cost-to-go function–based control

The DP is based on Bellman’s principle of optimality.²³ Suppose that a discrete-time nonlinear time-varying dynamic system is given as²⁴

x (t + 1) = a [x (t)] + b [x (t)] u (t)

(1)

where $x \in R^{n}$ is the state vector of the system, $a, b$ are the variable coefficients, $u \in R^{m}$ is the control action, and t is the system time step. The cost function of this system is supposed as

J [x (t), t] = \sum_{i = t}^{\infty} α^{i - t} γ [x (i), u (i), i]

(2)

where $γ$ is the utility function and $α$ is the discount factor with $0 < α \leq 1$ . The objective is to choose a control sequence $u (i), i = t, t + 1, \dots$ , so that the J function (i.e. the cost) in equation (2) is minimized.

Given the optimal cost function $J^{*} [x (t + 1), t + 1]$ from time $t + 1$ on and an arbitrary control variable $u (t)$ at time t, the optimal cost function from time t on is

J^{*} [x (t), t] = \min_{u (t)} (γ [x (t), u (t), t] + α J^{*} [x (t + 1), t + 1])

(3)

Then, the optimal control $u^{*} (t)$ at time t is the one that achieves this minimum

u^{*} (t) = \arg \min_{u (t)} (γ [x (t), u (t), t] + α J^{*} [x (t + 1), t + 1])

(4)

Equation (4) is the principle of optimality for the system of equation (1): any strategy that minimizes J in the short term will also minimize the sum of $γ$ over all future times.²⁴ However, the DP suffers from the curse of dimensionality.²¹
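The backward recursion of equations (3) and (4) can be illustrated with a minimal sketch. Everything in it is a hypothetical toy setup chosen for illustration, not the engine problem: a scalar system $x (t + 1) = x (t) + u (t)$ on a small integer grid, a quadratic utility, and a finite horizon with zero terminal cost.

```python
# Backward dynamic programming (equations (3) and (4)) on a toy problem.
# Hypothetical setup: scalar system x(t+1) = x(t) + u(t) on an integer grid.
ALPHA = 0.95                      # discount factor, 0 < alpha <= 1
STATES = range(-3, 4)             # admissible states
CONTROLS = (-1, 0, 1)             # admissible control actions
HORIZON = 5                       # number of decision stages

def utility(x, u):
    """Utility gamma[x, u]: quadratic penalty on state and control (assumed)."""
    return x * x + 0.5 * u * u

def step(x, u):
    """Toy dynamics, clipped so that the state stays on the grid."""
    return max(-3, min(3, x + u))

# J[t][x] is the optimal cost-to-go; the terminal cost is zero.
J = [{x: 0.0 for x in STATES} for _ in range(HORIZON + 1)]
policy = [{} for _ in range(HORIZON)]
for t in reversed(range(HORIZON)):
    for x in STATES:
        # Equation (3): utility now plus discounted optimal cost afterwards.
        costs = {u: utility(x, u) + ALPHA * J[t + 1][step(x, u)] for u in CONTROLS}
        best_u = min(costs, key=costs.get)   # equation (4): arg min over u(t)
        policy[t][x] = best_u
        J[t][x] = costs[best_u]
```

The table `J` grows with the number of grid points in every state dimension, which is the curse of dimensionality that motivates the approximate method.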

The ADP method

To overcome the curse of dimensionality, the ADP is introduced. It mainly contains three modules: critic, model, and action. Each of them can be implemented with a neural network. Combining the critic and model networks into a new critic network gives the action-dependent heuristic dynamic programming (ADHDP) form. The critic network of the ADHDP implicitly includes a model network,²⁴ as shown in Figure 2.

Figure 2.

Three modules in a typical ADP and a new critic network of the ADHDP.

Define a new future accumulated cost at time t as²⁵

R (t) = γ (t + 1) + α γ (t + 2) + \dots

(5)

In the new structure, the critic network output approximates $J (t + 1)$ in equation (2), that is, $R (t)$ in equation (5). This is achieved by minimizing the following error measure over time²⁴

‖ E_{c} (t) ‖ = \sum_{t} \frac{1}{2} e_{c}^{2} (t) = \sum_{t} \frac{1}{2} {(α Q (t) - [Q (t - 1) - γ (t)])}^{2}

(6)

where $Q (t) = Q [x (t), u (t), t, w_{c} (t)]$ and $w_{c} (t)$ are the weights of the new critic network. When $E_{c} (t) = 0$ for all time t, equation (6) implies that

Q (t) = γ (t + 1) + α Q (t + 1) = γ (t + 1) + α [γ (t + 2) + α Q (t + 2)] = \dots = \sum_{i = t + 1}^{\infty} α^{i - t - 1} γ (i)

(7)

Comparing equation (7) with equation (2) yields $Q (t) = J [x (t + 1), t + 1]$ . Therefore, when the error measure in equation (6) is minimized, the neural network is trained so that its output approximates the cost of equation (2) for $i = t + 1$ , that is, the cost function of the next time step.²²
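The link between equations (6) and (7) can be checked numerically. The sketch below assumes a short, hypothetical utility sequence that is truncated after five steps; it builds $Q (t)$ from equation (7) and confirms that the critic error of equation (6) vanishes at every step.

```python
ALPHA = 0.95
# Hypothetical utility sequence gamma(1), ..., gamma(5); gamma[0] is unused.
gamma = [0.0, 0.8, 0.5, 0.3, 0.1, 0.05]
T = len(gamma) - 1

def q_value(t):
    """Q(t) from equation (7), truncated at the end of the sequence."""
    return sum(ALPHA ** (i - t - 1) * gamma[i] for i in range(t + 1, T + 1))

def critic_error(t):
    """e_c(t) = alpha * Q(t) - [Q(t - 1) - gamma(t)] from equation (6)."""
    return ALPHA * q_value(t) - (q_value(t - 1) - gamma[t])

# Zero (up to rounding) at every step when Q matches the discounted sum.
errors = [critic_error(t) for t in range(1, T + 1)]
```

A critic whose output deviates from the discounted sum produces a nonzero `critic_error`, which is the signal that drives its training.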

The new critic network maps a state and action pair to the cost function value. Thus, the optimal Q function satisfies²⁶

Q^{*} (x (t), u (t)) = γ [x (t), u (t), t] + α \min_{u (t + 1)} Q^{*} [x (t + 1), u (t + 1)]

(8)

The action network is trained after the critic network, and its training objective is

E_{a} (t) = \frac{1}{2} Q^{2} (t) = 0

(9)

Once the optimal Q function is known, the optimal control policy $u^{*} (t)$ can be yielded by²⁶

u^{*} (t) = \arg \min_{u (t)} Q^{*} (x (t), u (t))

(10)

This is the theory by which the ADHDP achieves the optimal control of equation (4) while avoiding the curse of dimensionality.

The nonlinear MIMO ADHDP

Figure 3 shows the principle of the nonlinear MIMO ADHDP. The action network is extended to the multi-output form $u (t) = [u_{1} (t), u_{2} (t), \dots, u_{m} (t)]$ , which is also fed to the critic network as input. The dashed lines are the paths for the weight tuning of the critic and action networks.

Figure 3.

The schematic diagram demonstrating the principle of the MIMO ADHDP. The solid lines represent the signal flow, while the dashed lines are the paths for the weight tuning.

The critic network

The symbols are defined in Figure 3. The prediction error of the critic network is defined in equation (6). The gradient vector of the critic network is as follows:²²,²³

$Δ w_{c_{i}}^{(2)} (t)$ (the hidden layer to the output layer)

Δ w_{c_{i}}^{(2)} (t) = \frac{\partial E_{c} (t)}{\partial w_{c_{i}}^{(2)} (t)} = \frac{\partial E_{c} (t)}{\partial Q (t)} \cdot \frac{\partial Q (t)}{\partial w_{c_{i}}^{(2)} (t)}

(11)

$Δ w_{c_{i j}}^{(1)} (t)$ (the input layer to the hidden layer)

Δ w_{c_{i j}}^{(1)} (t) = \frac{\partial E_{c} (t)}{\partial w_{c_{i j}}^{(1)} (t)} = \frac{\partial E_{c} (t)}{\partial Q (t)} \cdot \frac{\partial Q (t)}{\partial w_{c_{i j}}^{(1)} (t)}

(12)

Then, the weights of the critic network can be updated with the following second-order training algorithm²⁷

w_{c} (t + 1) = w_{c} (t) + P (t) \cdot Δ w_{c} (t) \cdot e_{c} (t)

(13)

Here, $P (t)$ is the Gauss–Newton Hessian matrix; $i = 1, 2, \dots, N_{h 1}$ indexes the hidden nodes of the critic network; $j = 1, 2, \dots, n + m$ indexes the input variables of the critic network.
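The chain rule of equations (11) and (12) can be sketched for a critic with one hidden layer. The sketch assumes the bipolar sigmoid hidden layer and linear output described later for the critic network, a 6-14-1 layout, random weights, and made-up values for $Q (t - 1)$ and $γ (t)$; the function names are hypothetical and NumPy is used only for the linear algebra.

```python
import numpy as np

def bipolar_sigmoid(x):
    """Hidden-layer activation: (1 - e^-x) / (1 + e^-x)."""
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

def critic_forward(w1, w2, z):
    """Return Q and the hidden activations for the critic input z = [x, u]."""
    h = bipolar_sigmoid(w1 @ z)
    return float(w2 @ h), h

def critic_gradients(w1, w2, z, q_prev, gamma_t, alpha=0.95):
    """Gradients of E_c = 0.5 * e_c^2 with respect to both weight layers."""
    q, h = critic_forward(w1, w2, z)
    e_c = alpha * q - (q_prev - gamma_t)          # equation (6)
    dE_dQ = alpha * e_c                           # dE_c/dQ(t)
    grad_w2 = dE_dQ * h                           # equation (11): dQ/dw2_i = h_i
    # Equation (12): dQ/dw1_ij = w2_i * 0.5 * (1 - h_i^2) * z_j
    grad_w1 = dE_dQ * np.outer(w2 * 0.5 * (1.0 - h ** 2), z)
    return grad_w1, grad_w2, e_c

rng = np.random.default_rng(0)
w1 = rng.uniform(-1, 1, size=(14, 6))   # input layer to hidden layer
w2 = rng.uniform(-1, 1, size=14)        # hidden layer to output layer
z = rng.uniform(-1, 1, size=6)          # normalized critic input (hypothetical)
grad_w1, grad_w2, e_c = critic_gradients(w1, w2, z, q_prev=0.2, gamma_t=0.1)
```

Comparing one entry of `grad_w1` against a central finite difference of the error is a quick way to confirm that the analytic gradient is coded correctly.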

The action network

The symbols are defined in Figure 3. The prediction error of the action network is defined in equation (9). The gradient vector of the action network is as follows:²⁷,²⁸

$Δ w_{a_{k i}}^{(2)} (t)$ (the hidden layer to the output layer)

Δ w_{a_{k i}}^{(2)} (t) = \frac{\partial E_{a} (t)}{\partial w_{a_{k i}}^{(2)} (t)} = \frac{\partial E_{a} (t)}{\partial Q (t)} \cdot \frac{\partial Q (t)}{\partial u_{k} (t)} \cdot \frac{\partial u_{k} (t)}{\partial w_{a_{k i}}^{(2)} (t)}

(14)

$Δ w_{a_{i j}}^{(1)} (t)$ (the input layer to the hidden layer)

Δ w_{a_{i j}}^{(1)} (t) = \frac{\partial E_{a} (t)}{\partial w_{a_{i j}}^{(1)} (t)} = \sum_{k = 1}^{m} [\frac{\partial E_{a} (t)}{\partial Q (t)} \cdot \frac{\partial Q (t)}{\partial u_{k} (t)} \cdot \frac{\partial u_{k} (t)}{\partial w_{a_{i j}}^{(1)} (t)}]

(15)

Then, the weights of the action network can be updated with the following second-order training algorithm²⁷

w_{a} (t + 1) = w_{a} (t) + P (t) \cdot Δ w_{a} (t) \cdot e_{a} (t)

(16)

Here, $P (t)$ is also the Gauss–Newton Hessian matrix; $i = 1, 2, \dots, N_{h 2}$ indexes the hidden nodes of the action network; $j = 1, 2, \dots, n$ indexes the input variables of the action network; $k = 1, 2, \dots, m$ indexes the control variables of the action network.
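Equations (14) and (15) pass the error $E_{a} (t) = \frac{1}{2} Q^{2} (t)$ back through the critic to the action network. The sketch below is one possible reading, assuming a 2-9-4 action network with bipolar sigmoid hidden and output layers, a 6-14-1 critic with a linear output, and random weights; all function names are hypothetical.

```python
import numpy as np

def sig(x):
    """Bipolar sigmoid; tanh(x / 2) equals (1 - e^-x) / (1 + e^-x)."""
    return np.tanh(0.5 * x)

def actor(wa1, wa2, x):
    g = sig(wa1 @ x)          # hidden layer of the action network
    u = sig(wa2 @ g)          # output layer: the m control signals
    return u, g

def critic(wc1, wc2, x, u):
    h = sig(wc1 @ np.concatenate([x, u]))
    return float(wc2 @ h), h  # linear output layer

def actor_gradients(wa1, wa2, wc1, wc2, x):
    """Gradients of E_a = 0.5 * Q^2 with respect to the action weights."""
    u, g = actor(wa1, wa2, x)
    q, h = critic(wc1, wc2, x, u)
    n = x.size
    # dQ/du_k through the critic: sum_i wc2_i * 0.5 * (1 - h_i^2) * wc1[i, n + k]
    dQ_du = (wc2 * 0.5 * (1.0 - h ** 2)) @ wc1[:, n:]
    delta_out = q * dQ_du * 0.5 * (1.0 - u ** 2)            # dE_a/dQ = Q
    grad_wa2 = np.outer(delta_out, g)                       # equation (14)
    delta_hid = (wa2.T @ delta_out) * 0.5 * (1.0 - g ** 2)  # sum over k
    grad_wa1 = np.outer(delta_hid, x)                       # equation (15)
    return grad_wa1, grad_wa2, q

rng = np.random.default_rng(0)
wa1 = rng.uniform(-1, 1, size=(9, 2))
wa2 = rng.uniform(-1, 1, size=(4, 9))
wc1 = rng.uniform(-1, 1, size=(14, 6))
wc2 = rng.uniform(-1, 1, size=14)
x = rng.uniform(-1, 1, size=2)          # hypothetical normalized NOx and PM
grad_wa1, grad_wa2, q = actor_gradients(wa1, wa2, wc1, wc2, x)
```

The sum over k in equation (15) appears as the product `wa2.T @ delta_out`, which collects the contribution of every control output to each hidden node.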

The recursive Levenberg–Marquardt algorithm

The weights $w_{c} (t)$ in equation (13) and $w_{a} (t)$ in equation (16) are calculated with the second-order training algorithm.²⁷ This method is considered because the original ADHDP training algorithm²⁵ converges less reliably. In particular, for the emission control in this article, the controller output must track the optimal target instead of converging to a fixed weight. The second-order training algorithm offers a more robust and stable performance.

The weight update of this algorithm is given as the recursive Levenberg–Marquardt formulations²⁷

w (t + 1) = w (t) + P (t) \cdot Δ w (t) \cdot e (t)

(17)

\begin{array}{l} P (t) = \frac{1}{α (t)} [P (t - 1) - P (t - 1) Ω (t) S^{- 1} (t) Ω^{T} (t) P (t - 1)] \\ \text{with } P (t) = \frac{1}{\text{trace} [P (t)]} P (t) \end{array}

(18)

S (t) = β (t) Λ (t) + Ω^{T} (t) P (t - 1) Ω (t)

(19)

where the forgetting factor is

β (t) = \bar{β} β (t - 1) + (1 - \bar{β})

(20)

so that the $P (t)$ converges faster from its initial value, and $β (t) \to 1$ as $t \to \infty$ will provide stability. The $\bar{β}$ is a scalar of $\bar{β} < 1$ which determines the rate of $β (t)$ convergence to 1.

Here, $trace [P (t)]$ is the trace of the matrix $P (t)$ , and $Ω (t)$ is an $N_{w} \times 2$ matrix whose first column contains $Δ w (t)$ and whose second column is an $N_{w} \times 1$ zero vector with one element set to 1, that is

Ω^{T} (t) = \begin{bmatrix} Δ w^{T} (t) \\ 0 \; \dots \; 1 \; \dots \; 0 \end{bmatrix}, \quad \text{with the 1 at position } t \mod (N_{w}) + 1, \quad \text{and} \quad Λ {(t)}^{- 1} = \begin{bmatrix} 1 & 0 \\ 0 & ρ \end{bmatrix}

(21)

where $ρ$ is a regulating term to one diagonal element of $P (t)$ , and $N_{w}$ is the number of the critic or action network weights.
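One step of the recursion in equations (17) to (21) can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes that the scalar written as $α (t)$ in equation (18) is the same forgetting factor $β (t)$ that appears in equations (19) and (20), which is the usual form of the recursive Levenberg–Marquardt update; the function name `rlm_step`, the value of `rho`, the initial $P$, and the random gradients are all hypothetical.

```python
import numpy as np

def rlm_step(w, P, grad, err, t, beta, rho=1e-2):
    """One recursive Levenberg-Marquardt update of the weight vector w."""
    n_w = w.size
    omega = np.zeros((n_w, 2))
    omega[:, 0] = grad                   # first column: Delta w(t)
    omega[t % n_w, 1] = 1.0              # 1 at position t mod(N_w) + 1 (1-based)
    lam = np.linalg.inv(np.array([[1.0, 0.0], [0.0, rho]]))    # Lambda(t)
    S = beta * lam + omega.T @ P @ omega                       # equation (19)
    # Equation (18), with beta used as the leading scalar (assumption).
    P_new = (P - P @ omega @ np.linalg.inv(S) @ omega.T @ P) / beta
    P_new = P_new / np.trace(P_new)      # trace normalization in equation (18)
    w_new = w + P_new @ grad * err       # equation (17)
    return w_new, P_new

rng = np.random.default_rng(1)
n_w = 5
w = rng.uniform(-1, 1, n_w)
P = np.eye(n_w) / n_w                    # hypothetical initial value
beta, beta_bar = 0.97, 0.99
for t in range(50):
    grad = rng.normal(size=n_w)          # stand-in for a network gradient
    err = rng.normal()                   # stand-in for e_c(t) or e_a(t)
    beta = beta_bar * beta + (1.0 - beta_bar)   # equation (20)
    w, P = rlm_step(w, P, grad, err, t, beta)
```

Only a $2 \times 2$ matrix $S (t)$ is inverted at each step, so the cost per update stays low even when the number of weights $N_{w}$ is large.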

The normalization

A normalization is needed for the MIMO ADHDP to confine the critic and action network weights into an appropriate range by²⁵

w_{c} (t + 1) = \frac{w_{c} (t) + Δ w_{c} (t)}{‖ w_{c} (t) + Δ w_{c} (t) ‖}

(22)

w_{a} (t + 1) = \frac{w_{a} (t) + Δ w_{a} (t)}{‖ w_{a} (t) + Δ w_{a} (t) ‖}

(23)
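The normalization of equations (22) and (23) can be sketched in a few lines. The source does not state which norm is meant, so the Euclidean norm is assumed here, and the weight values are made up.

```python
import math

def normalized_update(w, delta_w):
    """Apply w + delta_w, then rescale to unit Euclidean norm (assumed norm)."""
    stepped = [wi + di for wi, di in zip(w, delta_w)]
    norm = math.sqrt(sum(s * s for s in stepped))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero weight vector")
    return [s / norm for s in stepped]

# Hypothetical weights and increments.
w_next = normalized_update([0.6, -0.2, 0.9], [0.3, 0.1, -0.4])
```

After the update the weight vector has unit length, so repeated updates cannot drive the weights outside a bounded range.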

The convergence analysis of the nonlinear MIMO ADHDP

Analysis

If the output $Q (t)$ of the critic network is as close to zero as possible, then the sum of $γ (t + 1)$ and $α Q (t + 1)$ in equation (7) is minimized at this stage as well. This strategy indirectly enables the action network to produce an optimal control action (see equation (10)).²⁷ Thus, the convergence of $Q (t)$ is the most important property of the ADHDP. If $Q (t)$ converges to zero, the method is convergent and can achieve an optimal control.²⁹,³⁰

Lemma

Consider the discrete-time nonlinear system of equation (1). Suppose that $D \subseteq R^{n}$ is a positively invariant set of the system that contains the origin, and that the origin is the unique equilibrium in D. Let $V (x (t), t)$ be a Lyapunov function in D. The system is asymptotically stable in the region of attraction D if there exist class $K_{\infty}$ functions $κ_{1} (\cdot)$ , $κ_{2} (\cdot)$ , and $κ_{3} (\cdot)$ , such that³¹,³²

0 < κ_{1} (‖ x (t) ‖) \leq V (x (t), t) \leq κ_{2} (‖ x (t) ‖)

(24)

V (x (t + 1), t + 1) - V (x (t), t) \leq - κ_{3} (‖ x (t) ‖) < 0

(25)

Theorem

For a nonlinear MIMO system, its dynamic state is defined as $X = (X_{1}, X_{2}, \dots, X_{t}, \dots)$ , and every state vector $X_{t}$ has the same dimension n. The utility function at time t is defined in the following square-weighted sum form

\begin{matrix} γ [X (t), U (t), t] = \sum_{j = 1}^{n} \frac{1}{2} e_{j}^{T} (t) e_{j} (t) \\ = \sum_{j = 1}^{n} \frac{1}{2} {[\frac{x_{j} (t) - x_{j}^{*} (t)}{x_{min, j} (t)}]}^{T} [\frac{x_{j} (t) - x_{j}^{*} (t)}{x_{min, j} (t)}] \end{matrix}

(26)

Then, the performance index $Q (t)$ defined in equation (7) and approximated by the critic network through equation (6) is convergent.

Here, $e_{j} (t)$ is the normalized state error at time t, $x_{j}^{*} (t)$ is the optimal objective state at time t, $x_{max, j} (t)$ and $x_{min, j} (t)$ are, respectively, the maximal and minimal values of the state $x_{j} (t)$ over all time t, $x_{max, j} (t) \neq 0$ , and $x_{min, j} (t) \neq 0$ .

Proof

This proof adopts the Lyapunov stability criterion. The product $e_{j}^{T} (t) e_{j} (t)$ is a quadratic form, which satisfies the positive definite condition. In addition, according to equation (1), $x (t + 1)$ depends implicitly on $u (t)$ .

The index j in equation (26) runs over the n performance indices, so n is finite. The case $n = 1$ corresponds to a single control output.

For an optimal cycle $t = 0, 1, 2, \dots, m$ and $j = 1, 2, \dots, n$ , according to equations (7) and (26), it yields

\begin{matrix} Q (t) = \sum_{k = t + 1}^{m} α^{k - t - 1} γ [X (k), U (k), k] \\ = \sum_{k = t + 1}^{m} α^{k - t - 1} (\sum_{j = 1}^{n} \frac{1}{2} e_{j}^{T} (k) e_{j} (k)) \\ = \sum_{k = t + 1}^{m} α^{k - t - 1} (\sum_{j = 1}^{n} \frac{1}{2} {‖ \frac{x_{j} (k) - x_{j}^{*} (k)}{x_{min, j} (k)} ‖}^{2}) \end{matrix}

(27)

At the initial state of $t = 0$ , the control system has $X (t) = 0$ . Then, the action network output has $U (t) = 0$ , and the utility function has $γ [X (t), U (t), t] = 0$ , $Q (t) = 0$ . Thus, the initial equilibrium state can be satisfied.

Taking $Q (t)$ as the Lyapunov function, it can be seen that

\begin{array}{l} 0 < \sum_{k = t + 1}^{m} α^{k - t - 1} (\sum_{j = 1}^{n} \frac{1}{2} {‖ \frac{x_{j} (k) - x_{j}^{*} (k)}{x_{\max, j} (k)} ‖}^{2}) \leq Q (t) \\ \leq \sum_{k = t + 1}^{m} α^{k - t - 1} (\sum_{j = 1}^{n} \frac{1}{2} {‖ \frac{x_{\max, j} (k) - x_{\min, j} (k)}{x_{\min, j} (k)} ‖}^{2}) \end{array}

(28)

Hence, $Q (t)$ is positive definite, strictly increasing, and radially unbounded.

According to equation (7), the cost from t to m satisfies

Q (t) = γ (t + 1) + α Q (t + 1)

(29)

Then, according to equations (7) and (27), the discounted sum of the utility function after time $t + 1$ up to m is

α γ [X (t + 2), U (t + 2), t + 2] + α^{2} γ [X (t + 3), U (t + 3), t + 3] + α^{3} γ [X (t + 4), U (t + 4), t + 4] + \dots = α Q (t + 1)

(30)

For a very small time step in the dynamic system of equation (1), the derivative of $Q (t)$ can be replaced with its difference between two adjacent time steps. Thus, $\overset{\cdot}{Q} (t)$ can be replaced with $Δ Q (t)$ in this proof. Then, combining equations (29) and (30) yields

\begin{matrix} \overset{\cdot}{Q} (t) ≅ Δ Q (t) = \frac{α Q (t + 1) - Q (t)}{(t + 1) - t} \\ = - γ [X (t + 1), U (t + 1), t + 1] \\ = - (\sum_{j = 1}^{n} \frac{1}{2} {[\frac{x_{j} (t + 1) - x_{j}^{*} (t + 1)}{x_{min, j} (t + 1)}]}^{T} [\frac{x_{j} (t + 1) - x_{j}^{*} (t + 1)}{x_{min, j} (t + 1)}]) \\ = - \sum_{j = 1}^{n} \frac{1}{2} {‖ \frac{x_{j} (t + 1) - x_{j}^{*} (t + 1)}{x_{min, j} (t + 1)} ‖}^{2} \end{matrix}

(31)

Then

\overset{\cdot}{Q} (t) \leq - \sum_{j = 1}^{n} \frac{1}{2} {‖ \frac{x_{j} (t + 1) - x_{j}^{*} (t + 1)}{x_{max, j} (t + 1)} ‖}^{2} < 0

(32)

Equations (28) and (32) show that the conditions of the Lemma are satisfied. Thus, provided that not all $x_{j} (t + 1) = 0$ , the nonlinear MIMO ADHDP whose utility function is defined in the square-weighted sum form is asymptotically stable in the Lyapunov sense and is convergent.³¹–³³

Experiment and results

The neural network model of the 4JB1-T engine

A neural network model of the 4JB1-T engine is needed as the controlled object that interacts with the ADHDP controller during the offline training. Data are collected from this engine over the NEDC, with about 47,000 samples in each test, while an existing controller is running. A time-lagged recurrent neural network is used to learn the engine model from the sample data with high precision.

The five inputs to the neural network–based model are the rail pressure, EGR rate, injection quantity, injection timing, and vehicle speed. The two outputs are the NOx and PM emissions. Validation results for the NOx and PM emissions of the neural network engine model indicate a good match to the real engine data. The maximal relative error is kept within 5%.

The utility function design

For this work, the local cost function can be defined as³⁴

γ [x (t), u (t), t] = \frac{1}{2} {[NOx (t) - NOx^{*} (t)]}^{2} + \frac{1}{2} {[PM (t) - PM^{*} (t)]}^{2}

(33)

where $NOx (t)$ and $PM (t)$ are the measured real-time values of the NOx and PM emissions of the WAPS engine, and $NOx^{*} (t)$ and $PM^{*} (t)$ are the optimal objective values of the NOx and PM emissions at each time step.

Then, the optimal ADHDP controller is designed according to equation (7) by minimizing

Q (t) = \min_{u \in ℜ} \sum_{i = t}^{N - 1} α^{i - t} γ [x (i), u (i), i], 0 < α < 1

(34)

where $ℜ$ represents the constraints of u imposed by meeting the speed and load demands of the NEDC, and N is the finite horizon time length of this drive cycle. As proved in section “The convergence analysis of the nonlinear MIMO ADHDP,” the utility function defined in this way will make the MIMO ADHDP controller convergent. Furthermore, this utility function makes the actual emission values $NOx (t)$ and $PM (t)$ track their optimal objective values $NOx^{*} (t)$ and $PM^{*} (t)$ .
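The utility of equation (33) and the discounted sum inside equation (34) can be evaluated directly for a given trajectory. The sketch below uses made-up normalized emission values and targets, evaluates the cost of one fixed trajectory, and leaves out the minimization over u.

```python
def utility(nox, pm, nox_target, pm_target):
    """Equation (33): half the squared tracking errors of NOx and PM."""
    return 0.5 * (nox - nox_target) ** 2 + 0.5 * (pm - pm_target) ** 2

def discounted_cost(trajectory, targets, t, alpha=0.95):
    """Sum over i = t .. N-1 of alpha^(i - t) * utility at step i."""
    n = len(trajectory)
    return sum(
        alpha ** (i - t) * utility(*trajectory[i], *targets[i])
        for i in range(t, n)
    )

# Hypothetical normalized (NOx, PM) measurements and targets for two steps.
trajectory = [(0.5, 0.2), (0.4, 0.1)]
targets = [(0.4, 0.1), (0.4, 0.1)]
cost_from_start = discounted_cost(trajectory, targets, t=0)
cost_from_second_step = discounted_cost(trajectory, targets, t=1)
```

The second step tracks its targets exactly, so it adds nothing to the cost, which shows how the utility rewards tracking of the objective values.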

The critic network design

The critic network is chosen as a 6-14-1 structure with six input neurons, fourteen hidden layer neurons and one output neuron. This structure is selected based on experience and many trials. The detailed structure of the critic network is seen in the critic frame of Figure 3. The six inputs are the normalized value of the NOx emission, PM emission, rail pressure, EGR rate, injection quantity, and injection timing. The hidden layer uses the following sigmoidal function, and the output layer is linear

y = \frac{1 - e^{- x}}{1 + e^{- x}}

(35)
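Equation (35) is a bipolar sigmoid with outputs between -1 and 1. A minimal sketch of the function and its derivative follows; the derivative expression $\frac{1}{2} (1 - y^{2})$ is the standard result for this function rather than something stated in the text, and the sample points are arbitrary.

```python
import math

def bipolar_sigmoid(x):
    """Equation (35): y = (1 - e^-x) / (1 + e^-x), with outputs in (-1, 1)."""
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))

def bipolar_sigmoid_derivative(x):
    """dy/dx expressed through the output: 0.5 * (1 - y^2)."""
    y = bipolar_sigmoid(x)
    return 0.5 * (1.0 - y * y)

samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [bipolar_sigmoid(x) for x in samples]
```

The same function can be written as `math.tanh(x / 2)`, which is often the more numerically convenient form for large inputs.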

The action network design

The structure of the action network is chosen as a 2-9-4 structure with two input neurons, nine hidden layer neurons, and four output neurons. This structure is also selected based on experience and many trials. The detailed structure of the action network is seen in the action frame of Figure 3. The two inputs are the normalized value of the NOx and PM emissions. The four outputs are the rail pressure, EGR rate, injection quantity, and injection timing. Both the hidden layer and output layer use the sigmoidal function of equation (35).

The ADHDP controller parameters are chosen referring to Liu et al.³⁵ and trials. The practice in the study of Liu et al.³⁵ shows that these parameters can achieve a relatively satisfying control effect and stable convergence, which are shown in Table 2.

Table 2.

The design parameters of the nonlinear MIMO ADHDP controller.

Items	The critic network	The action network
Learning rate	l_c(t) = 0.3, decreases 0.05 per iteration until 0.01	l_a(t) = 0.3, decreases 0.05 per iteration until 0.01
Desired training error	T_c = 0.05	T_a = 0.005
Maximum training cycles	100 times	500 times
Discount factor	α = 0.95
Network weights initialization	(–1, +1), random

MIMO ADHDP: multi-input multi-output–action-dependent heuristic dynamic programming
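The learning-rate entry in Table 2 can be read as a linear decay with a floor. The sketch below is one interpretation, assuming that the rate drops by 0.05 at every iteration and is held at 0.01 once the decay would fall below that value; the function name is hypothetical.

```python
def learning_rate(iteration, start=0.3, decay=0.05, floor=0.01):
    """Linear decay from `start` by `decay` per iteration, never below `floor`."""
    return max(floor, start - decay * iteration)

# Rates for the first eight iterations under this reading of Table 2.
schedule = [learning_rate(k) for k in range(8)]
```

Under this reading the floor is reached after six iterations, so most of the training runs at the small final rate.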

First, the critic network is trained for many cycles with 250 training epochs in each cycle. When its output cannot be further decreased, we stop the critic network training. This training usually needs 3 h. Then, the action network is trained for many cycles with 100 epochs in each cycle, and the optimal control effect is observed. This procedure is repeated until a good control effect is achieved. At least 4700 data points from the sample data (47 000 in the data set) are needed for the critic and action networks training.

The experimental devices

The test bench is shown in Figure 4:

The testing engine: the Jiangling 4JB1-T is a four-stroke, four-cylinder, high-speed, direct injection engine with a mechanical supercharger. The bore is 93 mm, the stroke is 102 mm, and the total displacement is 2.771 L. It is fitted with the common rail fuel pump of the WAPS injector. The maximum power is 68 kW (3600 r/min) and the maximum torque is 210 Nm (2100 r/min). The IMS engine control unit (ECU) is mounted to run the existing proportional, integral, and derivative (PID) controller and the correcting ADHDP controller.

The testing tools and their purpose are listed as follows: (1) an AVL4000 smoke sensor is used to measure the PM value, (2) a HORIBA emission analyzer is used to measure the NOx, HC, and CO values, (3) two current clamps of Tektronix are used to measure the injecting control current, (4) a rail pressure sensor of Bosch is used to measure the rail pressure, (5) a MEAN WELL data-collecting board of the RSP-1000-27 type is used to collect the sensor data, (6) a speed sensor of the DG6 type is used to test the engine speed, and (7) an Agilent oscilloscope of the DSO7054A type is used to observe the data wave.

The debugging personal computer (PC) software is the Vector-CANape of Germany Vector Company. This software is used to record data and produce a debugging graph.

Figure 4.

The test bench of the WAPS engine for the running and emission data.

In addition, the MAHA rotating hub test bench is also needed. It is used to test the NEDC and collect the emission data after the optimal trade-off ADHDP control result.

The experiment design

The analysis and experiment have demonstrated that the emission performance of the WAPS engine is mainly determined by four parameters: the rail pressure, EGR rate, injection quantity, and injection timing. The emission control variables are mainly the NOx and PM values. Therefore, the control objective is to provide proper control signals of the rail pressure, EGR rate, injection quantity, and injection timing to achieve the optimal trade-off control of the NOx and PM emissions. Then, the 4JB1-T engine will be upgraded from the China State-II to State-IV emission regulation. The ADHDP design is adopted. It is one of the most widely used methods in ADP, for it does not need the model of the controlled object.²⁴

The ADHDP controller is implemented as a MATLAB function. Its design principle is shown in Figure 5. This MATLAB function is embedded into the Simulink engine management system of the ECU developed by the IMS Company, using a laptop. The ADHDP controller acts as a corrector for the original ECU output variables of the PID controller, that is, the rail pressure, EGR rate, injection quantity, and injection timing. The laptop on the test bench is also equipped with the dSPACE software. The TargetLink software of dSPACE compiles the Simulink module into C code and loads it into the ECU to modify the control output, which achieves the optimal trade-off control effect.

Figure 5.

The schematic diagram of the ADHDP controller design for the emission control of the WAPS engine.

The values of the NOx and PM emissions are measured under the starting and various running conditions on the MAHA rotating hub test bench. Then, the reasonable optimization standard for the NOx and PM emissions at each stage can be found according to the NEDC data and the China State-IV emission regulation.

First, the ADHDP controller is trained offline with the data from the 4JB1-T WAPS engine on the test bench. Then, once the offline training has converged, the controller is used for online control. The NEDC is used for the emission evaluation (Figure 6).

Figure 6.

The real-time vehicle speed of a successful drive cycle following the NEDC.

The total engine running data of an NEDC are recorded. Then, the engine cycle data can be extracted from the NEDC data according to the engine speed and the revolution counting sensor mounted on the flywheel of the 4JB1-T engine.

The experiment results

By repeatedly training the controller offline and modifying its optimal objectives, a successful control can be achieved after many trials. In a successful training, the rotating hub test shows the following control effect: the WAPS engine ultimately meets the emission demand. Figure 6 shows the measured vehicle speed, which strictly follows the NEDC for the emission evaluation. Figures 7 and 8 show that a very good tracking control effect of the NOx has been achieved. The real-time dynamic HC and CO values of a successful drive cycle are also shown in Figures 9 and 10.

Figure 7.

A successful NOx emission control effect with the ADHDP controller.

Figure 8.

The relationship between the vehicle speed and the NOx value of a successful emission control.

Figure 9.

The relationship between the vehicle speed and the real-time HC value of a successful drive cycle.

Figure 10.

The relationship between the vehicle speed and the real-time CO value of a successful drive cycle.

Tables 3 and 4 show that the optimal result of the PID controller alone is not enough: there is still a gap to the China State-III standard. However, the optimal trade-off emission effect is significantly improved with the ADHDP controller. Most indexes have been upgraded from the China State-II level to close to or better than the China State-IV emission regulation. Table 4 and Figures 9 and 10 also show that the emission control of the HC, CO, and so on is not adversely affected by the ADHDP controller. The control effect can be further improved with finer engine data or more training in the future.

Table 3.

The optimal trade-off emission effect of the WAPS engine with a PID controller and without the ADHDP optimization.

Type-I test	CO (g/km)	HC (g/km)	NOx (g/km)	HC + NOx (g/km)	PM (g/km)
State-IV value	0.74		0.39	0.46	0.06
Measured value	1.533	0.635	1.354	1.989	0.0695
State-IV fuel consumption		8.50 (L/100 km)
Measured fuel consumption		9.70 (L/100 km)
Vehicle weight		1.81 (ton)

PID: proportional, integral, and derivative; ADHDP: action-dependent heuristic dynamic programming; CO: carbon monoxide; HC: hydrocarbons; NOx: nitrogen oxides; PM: particulate matter.

Table 4.

The optimal trade-off emission effect of the WAPS engine with the ADHDP optimization and a PID controller.

Type-I test	CO (g/km)	HC (g/km)	NOx (g/km)	HC + NOx (g/km)	PM (g/km)
State-IV value	0.74		0.39	0.46	0.06
Measured value	0.19	0.03	0.408	0.452	0.047
State-IV fuel consumption		8.50 (L/100 km)
Measured fuel consumption		8.77 (L/100 km)
Vehicle weight		1.81 (ton)

ADHDP: action-dependent heuristic dynamic programming; PID: proportional, integral, and derivative; CO: carbon monoxide; HC: hydrocarbons; NOx: nitrogen oxides; PM: particulate matter.
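The comparison against the regulation can be made explicit with a little arithmetic. The sketch below copies the State-IV limits and the measured values from Tables 3 and 4 and expresses each measured value as a fraction of its limit. The helper names `ratio_to_limit` and `within_limit` are illustrative, and HC alone is skipped because the tables list no separate State-IV HC limit.

```python
# State-IV limits and measured values (g/km), copied from Tables 3 and 4.
STATE_IV_LIMITS = {"CO": 0.74, "NOx": 0.39, "HC+NOx": 0.46, "PM": 0.06}
PID_ONLY = {"CO": 1.533, "NOx": 1.354, "HC+NOx": 1.989, "PM": 0.0695}
WITH_ADHDP = {"CO": 0.19, "NOx": 0.408, "HC+NOx": 0.452, "PM": 0.047}


def ratio_to_limit(measured, limits=STATE_IV_LIMITS):
    """Return measured/limit per species; a ratio <= 1 means within the limit."""
    return {k: measured[k] / limits[k] for k in limits}


def within_limit(measured, limits=STATE_IV_LIMITS):
    """Return True per species when the measured value does not exceed the limit."""
    return {k: measured[k] <= limits[k] for k in limits}


for name, data in (("PID only", PID_ONLY), ("PID + ADHDP", WITH_ADHDP)):
    ratios = ratio_to_limit(data)
    summary = ", ".join(f"{k}: {v:.2f}x" for k, v in ratios.items())
    print(f"{name}: {summary}")
```

With these table values, the PID-only case exceeds every listed limit, whereas the ADHDP case is within the CO, HC + NOx, and PM limits and about 5% over the NOx limit.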

Remark

This application designs a MIMO ADHDP control method. In essence, the ADHDP controller explores the nonlinear coupled relationships between the NOx/PM emissions and the rail pressure, EGR rate, injection quantity, and injection timing in a high-dimensional state space. Based on this exploration, it controls the trade-off between the NOx and PM emissions by correcting these four variables.

The PID control by the IMS Company is retained in the ECU. The ADHDP controller calculates its own outputs for the rail pressure, EGR rate, injection quantity, and injection timing. These outputs are compared with the corresponding outputs of the PID controller and act as a corrector. The ECU output is then revised accordingly to achieve the optimal control objective.
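One way to picture this corrector arrangement is sketched below. The article does not state the exact correction law, so the blending rule, the `gain` parameter, the actuator ranges, the units, and all numeric values here are assumptions made purely for illustration.

```python
# Hypothetical actuator ranges; units and bounds are illustrative assumptions.
ACTUATOR_LIMITS = {
    "rail_pressure": (20.0, 160.0),
    "egr_rate": (0.0, 0.5),
    "injection_quantity": (0.0, 60.0),
    "injection_timing": (-10.0, 20.0),
}


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def corrected_output(pid_out, adhdp_out, gain=1.0, limits=ACTUATOR_LIMITS):
    """Revise the PID outputs using the ADHDP outputs as a corrector.

    The correction is the difference between the ADHDP and PID commands,
    scaled by `gain` (0 keeps the PID output, 1 adopts the ADHDP output),
    and the result is clamped to the assumed actuator range.
    """
    revised = {}
    for name, pid_value in pid_out.items():
        correction = adhdp_out[name] - pid_value
        low, high = limits[name]
        revised[name] = clamp(pid_value + gain * correction, low, high)
    return revised


# Illustrative commands only, not measured data.
pid = {"rail_pressure": 100.0, "egr_rate": 0.20,
       "injection_quantity": 30.0, "injection_timing": 5.0}
adhdp = {"rail_pressure": 110.0, "egr_rate": 0.60,
         "injection_quantity": 28.0, "injection_timing": 3.0}
ecu_command = corrected_output(pid, adhdp, gain=0.5)
```

Keeping the PID output as the baseline means that setting `gain` to zero recovers the original ECU behavior, which is one plausible reason for retaining the PID controller alongside the learned one.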

This application achieves the above intelligent control effect without considering its influence on the drivability and fuel economy. Even so, it remains a complicated control problem for a nonlinear MIMO coupled system. If the drivability and fuel economy are considered, it becomes a constrained and still more complicated control problem, which will be researched in future work.

The control effect is compared with the PID method in Tables 3 and 4. The PID controller is not well suited to the nonlinear MIMO coupled system: the system usually has to be decoupled and each loop controlled separately. The coordination among the PID loops is poor, and the integrated optimization effect is unsatisfactory. The ADHDP method overcomes this shortcoming. It can optimize the emissions for all load conditions, and it does not need a model of the controlled object.

In addition, at this preliminary research stage, the ADHDP controller mainly aims at learning the NEDC so that the WAPS engine can pass the China State-IV emission test. As neural networks have a generalization ability, the ADHDP controller also has a certain generalization ability, which still needs to be tested and improved. Nevertheless, the ADHDP controller shows a strong optimizing capability for the complicated nonlinear MIMO coupled system during the NEDC test.

Conclusion

This work upgrades the emission performance of the 4JB1-T WAPS engine from the China State-II level to nearly the China State-IV regulation with the WAPS injector and the optimal ADHDP controller. The rotating hub test shows that the optimal ADHDP control design is an effective data-driven optimal method for nonlinear MIMO coupled systems such as the engine emission control. The novel WAPS injector can also achieve a function similar to that of the Bosch high-pressure common rail system at a low cost.

In summary, this article proposes a new data-driven approach for the engine control and calibration. It also adds another dimension to the existing literature on the WAPS engine emission control. The presented controller may have the potential to outperform the existing controllers in three aspects:

First, the ADHDP controller does not require a mathematical model of the controlled system, because it automatically learns the inherent dynamics and nonlinearities of the engine from the real engine data. It is therefore a genuinely data-driven method.

Second, the proposed controller combines the principles of the DP, artificial neural networks, and feedback control, which makes it an intelligent optimal trade-off controller.

Third, this controller can also learn to improve its performance during actual vehicle operation and adapt to uncertain changes in the environment and vehicle conditions. This is an inherent feature of a neural network learning controller, so the technique holds promise as an adaptive controller.

Although illustrated for the engine control, this ADHDP control framework is also applicable to general data-driven nonlinear MIMO systems.

Footnotes

Handling Editor: Haiping Du

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the NSFC projects of China under grant nos 61403250, 51779136, and 51509151, the bureau project of China under grant no. 2015HT056, and the Science Commission of Shanghai under grant no. 13510501600.

ORCID iD

Zhijian Huang

References

1. Summers, Houtte, Psaras. Simultaneous control of particulate and NOx emissions from diesel engines. Appl Catal B: Environ 1996; 10: 139–156.

2. Rathore, Srivastava, Agarwal, et al. Development of surface functionalized activated carbon fiber for control of NO and particulate matter. J Hazard Mater 2010; 173: 211–222.

3. Feng, et al. Experimental study on the nitrogen dioxide and particulate matter emissions from diesel engine retrofitted with particulate oxidation catalyst. Sci Total Environ 2014; 472: 56–62.

4. Chen, Wang. Air-fraction modeling for simultaneous diesel engine NOx and PM emissions control during active DPF regenerations. Appl Energ 2014; 122: 310–320.

5. Majumder, Chakraborti, Banerjee, et al. Experimental study on the role of ethanol on performance emission trade-off and tribological characteristics of a CI engine. Renew Energ 2016; 86: 972–984.

6. Rakopoulos, Kyritsis DC. Butanol or DEE blends with either straight vegetable oil or biodiesel excluding fossil fuel: comparative effects on diesel engine combustion attributes, cyclic variability and regulated emissions trade-off. Energy 2016; 115: 314–325.

7. Wei, Yao, Han, et al. Effects of methanol to diesel ratio and diesel injection timing on combustion, performance and emissions of a methanol port premixed diesel engine. Energy 2016; 95: 223–232.

8. Deb, Debbarma, Majumder, et al. Performance–emission optimization of a diesel-hydrogen dual fuel operation: a NSGA II coupled TOPSIS MADM approach. Energy 2016; 117: 281–290.

9. Roy, Das, Bose, et al. ANN metamodel assisted particle swarm optimization of the performance-emission trade-off characteristics of a single cylinder CRDI engine under CNG dual-fuel operation. J Nat Gas Sci Eng 2014; 21: 1156–1162.

10. Kumar, Saravanan, Sethuramasamyraja, et al. Screening oxygenates for favorable NOx/smoke trade-off in a DI diesel engine using multi response optimization. Fuel 2017; 199: 670–683.

11. Lin, Lee, et al. Reduction in emissions of nitrogen oxides, particulate matter, and polycyclic aromatic hydrocarbon by adding water-containing butanol into a diesel-fueled engine generator. Fuel 2012; 93: 364–372.

12. Kumar, Saravanan, Rana, et al. Use of some advanced biofuels for overcoming smoke/NOx trade-off in a light-duty DI diesel engine. Renew Energ 2016; 96: 687–699.

13. Tschanz, Amstutz, Onder, et al. Feedback control of particulate matter and nitrogen oxide emissions in diesel engines. Control Eng Pract 2013; 21: 1809–1820.

14. Nikzadfar, Shamekhi AH. Investigating the relative contribution of operational parameters on performance and emissions of a common-rail diesel engine using neural network. Fuel 2014; 125: 116–128.

15. Wei, Kittelson, Northrop WF. Optimization of reactivity-controlled compression ignition combustion fueled with diesel and hydrous ethanol using response surface methodology. Fuel 2015; 160: 446–457.

16. Deb, Majumder, et al. Application of artificial intelligence (AI) in characterization of the performance–emission profile of a single cylinder CI engine operating with hydrogen in dual fuel mode: an ANN approach with fuzzy-logic based topology optimization. Int J Hydrogen Energ 2016; 41: 14330–14350.

17. Liu, Song. Effect of post injection strategy on regulated exhaust emissions and particulate matter in a HSDI diesel engine. Fuel 2016; 185: 1–9.

18. Divekar, Chen, Tjong, et al. Energy efficiency impact of EGR on organizing clean combustion in diesel engines. Energ Convers Manage 2016; 112: 369–381.

19. Liu, Zhang, Zhao, et al. A novel lambda-based EGR (exhaust gas recirculation) modulation method for a turbocharged diesel engine under transient operation. Energy 2016; 96: 521–530.

20. Zhou, Yang. Comparison and combination of NLPQL and MOGA algorithms for a marine medium-speed diesel engine optimisation. Energ Convers Manage 2017; 133: 138–152.

21. Powell WB. Approximate dynamic programming: solving the curses of dimensionality. Hoboken, NJ: John Wiley & Sons, 2007.

22. Yang, Liu, et al. Online approximate solution of HJI equation for unknown constrained-input nonlinear continuous-time systems. Inform Sci 2016; 328: 435–454.

23. Bellman. Dynamic programming. Princeton, NJ: Princeton University Press, 1957.

24. Zhang, Luo, et al. An overview of research on adaptive dynamic programming. Acta Automat Sin 2013; 39: 303–311.

25. Wang YT. Online learning control by association and reinforcement. IEEE T Neural Networ 2001; 12: 264–276.

26. Song, Wei, Sun. Nearly finite-horizon optimal control for a class of nonaffine time-delay nonlinear systems based on adaptive dynamic programming. Neurocomputing 2015; 156: 166–175.

27. Govindhasamy, McLoone, Irwin, et al. Reinforcement learning for online control and optimisation. IEEE Contr Eng Book Ser 2005; 70: 293–326.

28. Wei, Liu, Lin. Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems. IEEE T Cybernetics 2016; 46: 840–853.

29. Wei, Liu, Lin, et al. Discrete-time optimal control via local policy iteration adaptive dynamic programming. IEEE T Cybernetics 2017; 47: 3367–3379.

30. Zhang, Sun, et al. Adaptive dynamic programming-based optimal control of unknown nonaffine nonlinear discrete-time systems with proof of convergence. Neurocomputing 2012; 91: 48–55.

31. Khalil HK. Nonlinear systems. 2nd ed. Upper Saddle River, NJ: Prentice-Hall, Inc., 2002, pp. 111–194.

32. Zhang, Ning, Zheng WX. Observer-based control for piecewise-affine systems with both input and output quantization. IEEE T Automat Contr 2017; 62: 5858–5868.

33. Al-Tamimi, Lewis, Abu-Khalaf. Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE T Syst Man Cy B 2008; 38: 943–949.

34. Liu, Javaherian, Kovalenko, et al. Adaptive critic learning techniques for engine torque and air–fuel ratio control. IEEE T Syst Man Cy B 2008; 38: 988–993.

35. Liu, Wang, Yang. An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs. Inform Sci 2013; 220: 331–342.

Approximate dynamic programming solution for the optimal nitrogen oxides/particulate matter trade-off control of a WAPS engine
