Sage Journals: Discover world-class research

Abstract

To reduce the modeling burden for control of spark-ignition engines, reinforcement learning (RL) has been applied to solve the dilute combustion limit problem. Q-learning was used to identify an optimal control policy to adjust the fuel injection quantity in each combustion cycle. A physics-based model was used to determine the relevant states of the system used for training the control policy in a data-efficient manner. The cost function was chosen such that high cycle-to-cycle variability (CCV) at the dilute limit was minimized while maintaining stoichiometric combustion as much as possible. Experimental results demonstrated a reduction of CCV after the training period with slightly lean combustion, contributing to a net increase in fuel conversion efficiency of 1.33%. To ensure stoichiometric combustion for three-way catalyst compatibility, a second feedback loop based on an exhaust oxygen sensor was incorporated into the fuel quantity controller using a slow proportional-integral (PI) controller. The closed-loop experiments showed that both feedback loops can cooperate effectively, maintaining stoichiometric combustion while reducing combustion CCV and increasing fuel conversion efficiency by 1.09%. Finally, a modified cost function was proposed to ensure stoichiometric combustion with a single controller. In addition, the learning period was shortened by half to evaluate the RL algorithm performance on limited training time. Experimental results showed that the modified cost function could achieve the desired CCV targets; however, the learning time was reduced by half and the fuel conversion efficiency increased by only 0.30%.

Keywords

Internal combustion engines, reinforcement learning, fuel efficiency, real-time learning

Introduction

Advances in computing technology, as well as cost reduction of sensors and data storage, have allowed the transportation sector to move away from simple rule-based static controllers to adaptive machine-learning-based strategies. Big data analytic tools have allowed the optimization of traffic patterns using real-time traffic feedback.¹ Computer vision and wireless communication have enabled a plethora of modern adaptive control strategies to optimize and guarantee a safe operation of connected and automated vehicles.^2,3 Hardware acceleration, sensing technology, and adaptive methods have enabled energy optimization at a variety of operating conditions for different powertrain architectures, ranging from hybrid powertrains⁴ to advanced combustion engines.^5–7 This study focuses on the latter, applying data-driven adaptive optimal control strategies to improve the efficiency of internal combustion engines.

Spark-ignition (SI) combustion engines dominate the current light-duty vehicle market, corresponding to 94% market share. In recent years, battery electric vehicles and plug-in hybrid electric vehicles have steadily gained market penetration. However, the U.S. Energy Information Administration forecasts that gasoline vehicles will remain the dominant vehicle type, comprising over 70% of the market through 2050.⁸ Medium- and heavy-duty vehicles will remain dependent on internal combustion engines for the foreseeable future due to their heavier duty cycles, with dilute SI engines offering the potential for efficiency and emissions improvement, particularly in the medium-duty sector.

Dilute combustion accomplished with exhaust gas recirculation (EGR) is a technologically proven and cost-effective way to reduce fuel consumption in a variety of engine platforms.^9,10 However, EGR affects combustion kinetics, reducing the combustion rate and making stable combustion more difficult to achieve. At the combustion stability limit, also called the dilute limit, the ignition becomes highly sensitive to the in-cylinder charge composition and sporadic misfires and partial burns occur, exacerbating cycle-to-cycle variability (CCV). These issues may become more pronounced for future medium-duty SI gaseous and low-carbon fuels with reduced flame speeds. Moreover, similar stability limits have been found in diesel combustion during cold starts,^11,12 diesel combustion in aerial powertrains,¹³ spark-assisted compression ignition,¹⁴ and gasoline compression ignition.¹⁵ Thus, a robust and adaptable control strategy for minimizing combustion CCV can have wide applications within the transportation sector to improve vehicle efficiencies.

Current production engines avoid regions of unstable combustion through extensive manual calibration efforts, utilizing tables to define acceptable operating conditions, leaving efficiency opportunities on the table. Active controls to stabilize combustion at the limits would enable further efficiency gains, but robust methods for achieving this have thus far proven elusive. Previous studies have determined that CCV at the dilute limit presents deterministic patterns,¹⁶ with cycle-to-cycle communication based on the composition of residual gases carried over from the previous cycle.^17,18 The feasibility of using the dynamics of dilute CCV for combustion stability control was demonstrated using a proportional feedback control to adjust the fuel injection quantity,¹⁹ providing a proof of concept that changing the quantity of fuel injected on a cycle-resolved basis can impact combustion stability. Model-based predictive control has also been attempted,²⁰ but was limited by the difficulty of achieving a control-oriented model that is both predictive and computationally inexpensive.²¹ Machine learning approaches, including artificial neural networks (ANNs), offer the potential to address these difficulties. Early attempts at using ANNs for combustion stability control were demonstrated for lean combustion²² and for EGR dilution,^23–25 using an ANN-based observer for state estimation and an ANN-based controller to determine next-cycle fueling. This approach was later improved by a reinforcement learning (RL)-based controller that used an ANN-based adaptive-critic structure.^26,27 All these applications, however, were trained offline using the model of Daw et al.²⁸ While they showed improvements in CCV, they also exhibited fuel enrichment, and it was not possible to determine how much of the improvement was due to next-cycle control actions. 
Even though the model has been improved²⁹ and recent model-based controllers have been designed,^30,31 offline training suffers from inaccuracies and uncertainties of the model. Additionally, the resulting control policy is restricted to the domain from where the data used to calibrate the model were collected, rendering a suboptimal solution.

The implementation of model-free control with online learning for internal combustion engines remains a challenging problem,³² but it offers the major advantage of providing globally optimal solutions for the problem of combustion CCV control. Modern engine control units (ECUs) are now capable of online calculation of advanced control strategies for cycle-to-cycle^33,34 and even in-cycle³⁵ combustion control. Online learning has proven useful to avoid the dilute limit³⁶ and to exploit the stochastic properties of CCV for optimized combustion.³⁷ Even though online learning was part of the control design in these studies, the control command itself was calculated using a simplified model of the SI engine. Recently, a model-free RL controller has been designed by Henry de Frahan et al. to adaptively adjust the fuel injection timing in diesel engines.³⁸ However, the controller was trained offline on a computer model rather than using an experimental engine in real time.

This study introduces a novel reinforcement learning strategy for optimal fuel quantity control, leveraging online model-free learning of combustion dynamics during real-time engine operation. Reinforcement learning was chosen as the preferred data-driven adaptive optimal control method based on its strong theoretical foundation and convergence guarantees.³⁹ The approach involved operating the engine under increased dilution levels beyond the dilute limit and utilizing a control-oriented model for data-efficient reinforcement learning. The proposed algorithm, tested under three scenarios, demonstrated significant improvements in fuel conversion efficiency and combustion stability, effectively extending the dilute limit. The results highlight the effectiveness of model-free learning in comparison to traditional model-based approaches, paving the way for future research to explore the method’s applicability across diverse engine operating conditions.

The remaining sections are organized as follows. The experimental setup is first introduced. The combustion conditions at the dilute limit and the definition of the objectives of the controller are later discussed. The control-oriented model used for selecting the system states and the cost function is then presented. The RL algorithm used for model-free learning of the optimal control policy follows. The experimental results for three different control designs and the comparison between open-loop and closed-loop performance are later presented. Finally, the paper presents conclusions.

Experimental setup

A single-cylinder version of a 2.3L Ford Ecoboost engine was used for experimental demonstration. The SI direct-injection experimental engine was equipped with an external, cooled, low-pressure EGR system. The airflow was kept constant using an Alicat mass flow controller. The cam timing was chosen to avoid positive or negative valve overlap. The engine speed was kept constant using an engine dynamometer. The spark advance was controlled by a slow proportional-integral (PI) controller targeting an optimal combustion phasing. A wide-band exhaust oxygen sensor was used to monitor the air-fuel equivalence ratio $λ$. The EGR fraction in the intake manifold was calculated using an intake oxygen sensor. Research-grade E10 gasoline known as RD5-87, which was designed to be representative of regular-grade market gasoline, was utilized. The start of injection was set to −280°aTDC to contribute to a homogeneous air-fuel mixture and avoid possible fuel stratification that could affect the CCV. In-cylinder pressure was measured using a Kistler 6125C flush-mounted piezoelectric pressure transducer. Exhaust gas analyzers were used to estimate the average combustion efficiency. Table 1 summarizes the combustion chamber geometry and the main settings used during all experiments.

Table 1.

Engine geometry and settings.

Parameter	Value
Displaced Volume	$565.25\ \mathrm{cm^{3}}$
Bore × Stroke	$87.5\ \mathrm{mm}$ × $94\ \mathrm{mm}$
Connecting Rod	$149\ \mathrm{mm}$
Compression Ratio	$10:1$
Intake Valve Opening (IVO)	$-359°$ aTDC
Intake Valve Closing (IVC)	$-100°$ aTDC
Exhaust Valve Opening (EVO)	$120°$ aTDC
Exhaust Valve Closing (EVC)	$359°$ aTDC
Start of Injection	$-280°$ aTDC
Engine Speed	$2000\ \mathrm{rpm}$
Fresh Airflow	$276.5\ \mathrm{g/s}$
Coolant & Oil Temperature	$90\ °\mathrm{C}$
EGR Cooler Temperature	$50\ °\mathrm{C}$
Intake Manifold Temperature	$40\ °\mathrm{C}$

An open LabVIEW-based ECU implemented on a National Instruments Powertrain Controls (formerly Drivven, Inc.) platform was used for low-level and high-level control of the engine actuators. In-cylinder pressure data processing needed for control design was done using an in-house LabVIEW-based Oak Ridge Combustion Analysis System (ORCAS) embedded in the ECU. In order to guarantee cycle-to-cycle control at an engine speed of 2000 rpm, equivalent to 60 ms per combustion cycle, the proposed learning algorithm was integrated into the ECU without communication overhead.⁴⁰ Finally, the control parameters of the ECU were modified through a host computer with a LabVIEW interface. Figure 1 shows the setup and instrumentation of the experimental engine test cell.

Figure 1.

Experimental setup with rapid prototyping ECU.

Dilute limit in spark-ignition engines

While EGR increases the overall fuel efficiency, it does so up to the stability limit. At high levels of EGR, ignition becomes highly sensitive to changes in cylinder composition. This results in a decrease of efficiency past the dilute limit due to either poor flame initiation (misfires)^41,42 or propagation (partial burns)⁴³ which increase the combustion CCV even if the spark advance is optimized.⁴⁴ Moreover, the fundamental characteristics of combustion CCV can change depending on engine speed, load, and phasing.^45–47 Since SI engines operate over a wide range of speeds and loads, complete characterization of the dilute limit over the entire operating regime requires a large amount of time and effort. To reduce the testing burden, the Advanced Combustion and Emission Control (ACEC) tech team of the United States Council for Automotive Research (USCAR) has chosen a relevant engine speed and load condition for different technology applications.⁴⁸ In particular, for a downsized boosted engine that seeks to operate at heavier loads, an engine speed of 2000 rpm and 20% load is suggested. For this engine, that load equates to 4 bar brake mean effective pressure (BMEP), or about 5 bar indicated mean effective pressure (IMEP).

Figure 2 shows different combustion metrics at 2000 rpm, 5 bar IMEP, 0% EGR fraction, and stoichiometric combustion, approximately recreating the operating condition suggested by the ACEC tech team. EGR fraction was then increased until efficiency losses due to high CCV were detected. Each marker corresponds to an average value over 2000 engine cycles. At each cycle k, the combustion features have been estimated from the in-cylinder pressure sensor as follows:

• Indicated mean effective pressure:

IMEP[k] = \frac{1}{V_{d}} \int_{IVO}^{EVC} P_{θ}[k] \frac{d V_{θ}}{d θ} \, d θ

(1)

Figure 2.

Averages over 2000 cycles of different combustion metrics as EGR increases toward the dilute limit.

• Gross heat release (neglecting crevices):

Q_{gross}[k] = \int_{IVC}^{EVO} \left( \frac{1}{γ - 1} V_{θ} \frac{d P_{θ}[k]}{d θ} + \frac{γ}{γ - 1} P_{θ}[k] \frac{d V_{θ}}{d θ} + \frac{d Q_{HT}[k]}{d θ} \right) d θ

(2)

• Combustion efficiency:

η_{c} [k] = \frac{Q_{gross} [k]}{M_{fuel} [k] Q_{LHV}}

(3)

• Fuel conversion efficiency:

η_{i} [k] = \frac{IMEP [k]}{M_{fuel} [k] Q_{LHV}}

(4)

• Combustion phasing (50% mass burned):

\int_{IVC}^{CA50[k]} \frac{d Q_{gross}[k]}{d θ} \, d θ = 0.5 \, Q_{gross}[k]

(5)

Here, $P_{θ}$ is the in-cylinder pressure as a function of crank angle $θ$, $V_{θ}$ is the in-cylinder volume as a function of crank angle, $V_{d}$ is the displacement volume, $γ = 1.3$ is the polytropic coefficient, $Q_{HT}$ is the convective heat transfer to the cylinder walls, $M_{fuel}$ is the in-cylinder fuel mass, and $Q_{LHV} = 41.61$ MJ/kg is the lower heating value of the RD5-87 fuel. In addition, the average spark advance, intake manifold pressure, and exhaust manifold pressure were included.
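As an illustration of equations (1) and (2), the following Python sketch evaluates the integrals numerically from a sampled pressure trace. It is not the ORCAS implementation: the slider-crank volume function, the trapezoidal rule, and all function names are assumptions made for this example, and the heat transfer term is passed in as a precomputed value rather than modeled.

```python
import numpy as np

# Geometry from Table 1, converted to SI units.
BORE = 87.5e-3        # m
STROKE = 94e-3        # m
CON_ROD = 149e-3      # m
COMP_RATIO = 10.0
GAMMA = 1.3           # polytropic coefficient used in the text

AREA = np.pi / 4 * BORE**2
V_D = AREA * STROKE                 # displaced volume, m^3
V_C = V_D / (COMP_RATIO - 1)        # clearance volume, m^3


def cylinder_volume(theta_deg):
    """Slider-crank volume in m^3 (assumed kinematics); 0 deg is firing TDC."""
    theta = np.deg2rad(theta_deg)
    a = STROKE / 2
    s = a * np.cos(theta) + np.sqrt(CON_ROD**2 - (a * np.sin(theta))**2)
    return V_C + AREA * (CON_ROD + a - s)


def _trapz(y, x):
    """Trapezoidal integral of y dx over the sampled points."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def imep(pressure, volume):
    """Equation (1): integral of P dV over the window, divided by V_d (Pa)."""
    return _trapz(pressure, volume) / V_D


def gross_heat_release(pressure, volume, q_ht=0.0):
    """Equation (2): apparent heat release plus wall heat transfer q_ht (J)."""
    v_dp = _trapz(volume, pressure)   # integral of V dP
    p_dv = _trapz(pressure, volume)   # integral of P dV
    return v_dp / (GAMMA - 1) + GAMMA * p_dv / (GAMMA - 1) + q_ht
```

A useful sanity check for such a routine is a motored polytropic trace with $P V^{γ}$ constant, for which the apparent heat release should vanish up to integration error.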

As the EGR fraction increases, the combustion kinetics change and the combustion duration elongates, retarding the combustion phasing. A proportional-integral (PI) controller was used to increase the spark advance to compensate for the longer combustion duration and maintain an optimal combustion phasing of CA50 = 8°aTDC.^49,50 Since the intake airflow was kept constant at 276.5 g/s, the increase in EGR fraction produced the increase in intake manifold pressure observed in Figure 2. The exhaust manifold pressure, however, remained fairly constant at different EGR levels, indicating that the observed increases in IMEP and fuel conversion efficiency are due to improved combustion rather than improved gas exchange. On the other hand, the combustion efficiency showed a minor increase with EGR up to the dilute limit. Combustion CCV was characterized by the coefficient of variation (CoV) of IMEP and gross heat release. The increases in IMEP and gross heat release are accompanied by an increase in CoV due to the increased sensitivity to in-cylinder composition. The industry CoV of IMEP limit is 3%, based on the levels of noise, vibration, and harshness experienced by the driver. For this operating condition, the dilute limit was reached at approximately 25% EGR fraction at optimal spark advance. Furthermore, the experimental data in Figure 2 showed that peak IMEP and peak fuel conversion efficiency also occur at the dilute limit of 25% EGR fraction.
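The CoV metric used above can be sketched in a few lines of Python. The function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np

COV_LIMIT_PERCENT = 3.0   # industry CoV of IMEP limit quoted in the text


def cov_percent(samples):
    """Coefficient of variation in percent: 100 * std / mean."""
    samples = np.asarray(samples, dtype=float)
    return 100.0 * samples.std(ddof=1) / samples.mean()


def within_stability_limit(imep_samples):
    """True when the CoV of IMEP is at or below the 3% limit."""
    return cov_percent(imep_samples) <= COV_LIMIT_PERCENT
```

In the experiments described here, the input would be the IMEP values of the 2000 cycles recorded at each operating point.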

When the dilution levels were increased past the limit, to 27.5% EGR fraction, a drastic increase in CCV driven by sporadic partial burns and misfires rapidly increased the CoV of IMEP and the CoV of $Q_{gross}$ . The low-energy combustion events reduced the average values of the IMEP, gross heat release, fuel conversion efficiency, and combustion efficiency. Even though the PI controller for the spark advance was tasked to maintain an average of CA50 = 8°aTDC, partial and slow burns contributed to a slightly retarded combustion phasing. The same high sensitivity of ignition with respect to the in-cylinder composition that generates sporadic misfires and partial burns, however, can be used to reduce the CCV and recover the efficiency gains at high EGR. Previous studies have shown experimentally that small perturbations in fuel quantity, ranging from 0.5% to 2% additional fuel, can effectively reduce the combustion CCV.^51,52 Therefore, we propose an injection quantity controller to alter the in-cylinder mass composition and stabilize the charge to avoid partial burns and misfires.

Model-based cost function

For the cost function in the RL algorithm, a control-oriented model was considered. The physics-based approach for lean combustion modeling presented by Daw et al.²⁸ was used. The coupling between combustion cycles is through the carryover of residual gas from the previous cycle to the next and can be described by the following discrete-time system:

\begin{bmatrix} M_{fuel} \\ M_{gas} \end{bmatrix}_{k+1} = X_{res}[k] \begin{bmatrix} 1 - η_{c}[k] & 0 \\ η_{c}[k] & 1 \end{bmatrix} \begin{bmatrix} M_{fuel} \\ M_{gas} \end{bmatrix}_{k} + \begin{bmatrix} m_{fuel, in}[k] \\ \frac{m_{air, in}}{1 - X_{EGR}} \end{bmatrix}

(6)

Q_{gross}[k] = η_{c}[k] M_{fuel}[k] Q_{LHV}

(7)

where $M_{gas}$ is the in-cylinder gas mass (air+burned gas), $X_{res}$ is the residual gas fraction, $X_{EGR}$ is the EGR fraction, $m_{air, in}$ is the mass of fresh air per combustion cycle, and $m_{fuel, in}$ is the fuel injection quantity to be controlled. The first term of equation (6) corresponds to the residual mass in the cylinder, while the second term corresponds to the fresh mass available for the next combustion cycle. The residual gas fraction was calculated assuming an isentropic exhaust process and zero valve overlap.⁵³

X_{res}[k] = \frac{V_{EVC}}{V_{EVO}} \left( \frac{P_{EVC}}{P_{EVO}[k]} \right)^{1/γ} .

(8)

The subscripts indicate that only the values at EVO and EVC are needed. However, in order to perform online calculations of the in-cylinder mass, it was assumed that $P_{EVC} = 1\ \mathrm{bar}$. This allowed the ORCAS to start the calculations at EVO and enabled the host computer to send the control command to the ECU before EVC. A previous study by the authors showed that this system is stable and fully controllable.³⁷ During steady-state and stoichiometric combustion, the fuel injection quantity is determined by the amount of fresh air and the stoichiometric air-fuel ratio $AFR_{s} = 14.2$ as follows:

m_{fuel, in}^{s} = \frac{m_{air, in}}{AFR_{s}} .

(9)

To avoid engineering units and maintain the variables at similar scales, consider the following normalization:

F_{k} = M_{fuel} [k] / m_{fuel, in}^{s}

(10)

G_{k} = M_{gas} [k] / m_{air, in}

(11)

u_{k} = m_{fuel, in} [k] / m_{fuel, in}^{s}

(12)

H_{k} = Q_{gross} [k] / (m_{fuel, in}^{s} Q_{LHV}) .

(13)

Finally, one can show that the normalized system obeys the following set of equations:

x_{k+1} = X_{res}[k] \begin{bmatrix} 1 - η_{c}[k] & 0 \\ \frac{η_{c}[k]}{AFR_{s}} & 1 \end{bmatrix} x_{k} + \begin{bmatrix} u_{k} \\ \frac{1}{1 - X_{EGR}} \end{bmatrix}

(14)

H_{k} = \begin{bmatrix} η_{c}[k] & 0 \end{bmatrix} x_{k}

(15)

where $x_{k} = [F_{k} \;\; G_{k}]^{T}$ denotes the normalized state. Figure 3 illustrates the block diagram for online calculation of cycle-to-cycle variables. The figure emphasizes that the model-based combustion cycle (gray shaded area) is not synchronized with the ECU cycle. For this reason, the control command $u_{k}$ needs to be calculated before the end of the ECU cycle k.

Figure 3.

Calculations needed for cycle-to-cycle analysis.
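The normalized state update of equations (14) and (15) amounts to one small matrix operation per cycle. The Python sketch below shows that step; the function names are hypothetical, and the per-cycle values of $η_{c}[k]$ and $X_{res}[k]$ are taken as inputs, since in the experiments they come from the pressure-based calculations.

```python
import numpy as np

AFR_S = 14.2      # stoichiometric air-fuel ratio from the text
X_EGR = 0.275     # EGR fraction used past the dilute limit


def normalized_step(x, u, eta_c, x_res, x_egr=X_EGR):
    """Equation (14): return x_{k+1} = [F_{k+1}, G_{k+1}]."""
    A = x_res * np.array([[1.0 - eta_c, 0.0],
                          [eta_c / AFR_S, 1.0]])
    b = np.array([u, 1.0 / (1.0 - x_egr)])
    return A @ np.asarray(x, dtype=float) + b


def normalized_heat_release(x, eta_c):
    """Equation (15): H_k = eta_c[k] * F_k."""
    return eta_c * x[0]


# Nominal cycle versus misfire (eta_c = 0), both starting from x = [1, 1.45].
x_nominal = normalized_step([1.0, 1.45], u=1.0, eta_c=0.96, x_res=0.05)
x_misfire = normalized_step([1.0, 1.45], u=1.0, eta_c=0.0, x_res=0.05)
```

After a misfire, the unburned fuel carried over in the residual raises $F_{k+1}$ above its nominal value, which is the cycle-to-cycle coupling the controller exploits.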

Equation (14) can be written as $x_{k + 1} = f (x_{k}, u_{k}, w_{k})$ , where $w_{k}$ is a stochastic variable representing the combustion CCV. Note that the stochastic variable $w_{k}$ is not explicitly written in equation (14). Rather, this variable causes the CCV observed in the experimentally calculated values of $η_{c} [k]$ and $X_{res} [k]$ . Also note that $u_{k} = 1$ corresponds to stoichiometric combustion, while $u_{k} > 1$ causes a rich mixture and $u_{k} < 1$ causes a lean mixture. At the operating condition selected for this study, the steady-state values of the residual gas fraction and combustion efficiency during nominal combustion are $X_{res}^{ss} \approx 5 %$ and $η_{c}^{ss} \approx 96 %$ . Therefore, the steady-state values at stoichiometric conditions of the normalized states can be approximated by solving the following algebraic equation:

\begin{bmatrix} F^{ss} \\ G^{ss} \end{bmatrix} \approx \begin{bmatrix} 0 & 0 \\ 0 & X_{res}^{ss} \end{bmatrix} \begin{bmatrix} F^{ss} \\ G^{ss} \end{bmatrix} + \begin{bmatrix} 1 \\ 1 / (1 - X_{EGR}) \end{bmatrix} .

(16)

Then, $F^{ss} \approx 1$ and $G^{ss} \approx 1.45$ at 27.5% EGR fraction. The normalized heat release then becomes $H^{ss} \approx η_{c}^{ss}$ .
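The steady-state values can be checked numerically. The sketch below solves the approximate system of equation (16) and, for comparison, the full matrix of equation (14) evaluated at the nominal values; variable names are chosen for this example only.

```python
import numpy as np

X_RES_SS, ETA_C_SS, AFR_S, X_EGR = 0.05, 0.96, 14.2, 0.275

b = np.array([1.0, 1.0 / (1.0 - X_EGR)])

# Approximate matrix of equation (16): the small fuel carryover terms are dropped.
A_approx = np.array([[0.0, 0.0],
                     [0.0, X_RES_SS]])
F_ss, G_ss = np.linalg.solve(np.eye(2) - A_approx, b)

# Full matrix of equation (14) at the nominal operating point, for comparison.
A_full = X_RES_SS * np.array([[1.0 - ETA_C_SS, 0.0],
                              [ETA_C_SS / AFR_S, 1.0]])
F_full, G_full = np.linalg.solve(np.eye(2) - A_full, b)

print(f"approximate: F = {F_ss:.3f}, G = {G_ss:.3f}")
print(f"full model:  F = {F_full:.3f}, G = {G_full:.3f}")
```

Both solutions agree to within about half a percent, which supports dropping the small terms in equation (16).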

The steady-state analysis only applies to nominal combustion cycles. During misfires, the combustion efficiency drops to zero (no combustion) increasing the combustion CCV. Therefore, $H_{k}$ can be used as a proxy variable to quantify and control CCV. Consider the following model-based cost function to be minimized by the RL controller at each combustion cycle:

g (x_{k}, u_{k}, w_{k}) = (u_{k} - 1)^{2} + ρ (H_{k} - η_{c}^{ss})^{2} .

(17)

The first term penalizes the deviations from stoichiometric combustion while the second term penalizes the deviations from nominal combustion without misfires or partial burns. The parameter $ρ$ was introduced to balance the trade-off between the two terms of the cost function. A previous study by the authors determined that a value of $ρ = 0.2$ can strike a good balance between the two components.³⁷ Such cost was calculated by ORCAS and readily available for the learning algorithm at each combustion cycle. The state $x_{k}$ , however, is not measurable by engine sensors. Therefore, cycle-to-cycle estimates were generated by the combustion model.
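A direct transcription of equation (17) into Python, with the function name chosen for this example, makes the relative weight of the two terms concrete.

```python
RHO = 0.2          # trade-off weight reported in the text
ETA_C_SS = 0.96    # nominal combustion efficiency at this operating point


def cycle_cost(u, H, rho=RHO, eta_c_ss=ETA_C_SS):
    """Equation (17): stoichiometry penalty plus weighted heat-release penalty."""
    return (u - 1.0) ** 2 + rho * (H - eta_c_ss) ** 2


cost_nominal = cycle_cost(u=1.0, H=0.96)     # stoichiometric, nominal burn
cost_enriched = cycle_cost(u=1.025, H=0.96)  # 2.5% extra fuel, nominal burn
cost_misfire = cycle_cost(u=1.0, H=0.0)      # stoichiometric, misfire
```

With these values, a single misfire is penalized far more heavily than a 2.5% fuel enrichment on a nominal cycle, so occasional enrichment that prevents a misfire lowers the total cost.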

Reinforcement learning algorithm

Consider the control problem of finding an optimal stationary policy $u_{k} = π (x_{k})$ such that the following infinite horizon total cost is minimized:

J (x_{0}) = E_{w} [\sum_{k = 0}^{\infty} α^{k} g (x_{k}, π (x_{k}), w_{k})]

(18)

where $α \in (0, 1)$ is a discounting factor included to guarantee the existence of a solution.³⁹ The exact solution to the problem is given by Bellman’s equation:

J(x) = \min_{u} E_{w} [g(x, u, w) + α J(f(x, u, w))] .

(19)

This is a challenging functional equation that could not be solved analytically for the problem at hand. RL was used to approximate the solution of the infinite horizon discounted cost problem. Q-learning was the method of choice for the following reasons: (1) simplicity in implementation, given that we wrote the algorithm within the native LabVIEW programming environment, (2) convergence, given that the update rule ensures that the Q-values eventually converge to the optimal value, and (3) off-policy learning, given that it can learn from data generated by a different policy. The third aspect is particularly important since one proposed method merges RL and $λ$ feedback to guarantee stoichiometric combustion. However, implementing Q-learning requires the discretization of the state and control spaces to a finite-state Markov decision process (MDP). Although there exist RL algorithms that can deal with continuous state and control spaces, such as the actor-critic methods that have been successfully applied to a wide range of problems, they have significantly higher complexity and their sensitivity to hyperparameters may lead to slow convergence, instability, or poor generalization.⁵⁴

Let $x_{k} \in X \subset R^{20 \times 20}$, where $X$ is the discretized state space. For each state, the intervals chosen were $0.96 \leq F_{k} \leq 1.18$ and $1.43 \leq G_{k} \leq 1.61$, each of which was divided into 20 equidistant points. Note that the steady-state values $F^{ss} \approx 1$ and $G^{ss} \approx 1.45$ previously calculated do not lie at the center of the intervals. This is necessary because $F_{k}$ and $G_{k}$ increase significantly after a misfire occurs due to the unburned air-fuel residual mixture. Let $u_{k} \in U \subset R^{15}$, where $U$ is the discretized control space. The interval chosen for the control space was $0.975 \leq u_{k} \leq 1.025$, centered at the stoichiometric value and allowing a maximum fuel decrease or increase of 2.5%. Let $p_{k + 1 | k} (u_{k}) = \Pr (x_{k + 1} | x_{k}, u_{k})$ be the transition probability of the MDP. Then the expected value in equation (19) can be simplified to the Q-factor:

Q (x_{k}, u_{k}) = \sum_{x_{k + 1} \in X} p_{k + 1 | k} (u_{k}) (g (x_{k}, u_{k}) + α J (x_{k + 1}))

(20)

Since the optimal cost satisfies $J(x) = \min_{u} Q(x, u)$, the optimal Q-factor solves the equation³⁹:

Q(x_{k}, u_{k}) = \sum_{x_{k+1} \in X} p_{k+1 | k}(u_{k}) \left( g(x_{k}, u_{k}) + α \min_{u \in U} Q(x_{k+1}, u) \right) .

(21)

Let $Q \in R^{20 \times 20 \times 15}$ be the Q-table with Q-factors $Q (x, u)$ . If the transition probability is known, then equation (21) can be solved for $Q$ using algebraic methods. Even though previous studies have focused on modeling and simulating the system described by equation (14), the transition probability is not explicitly known but rather the result of simulations. More importantly, the simple physics-based model does not capture all the complexity of the combustion process typically recreated by computational fluid dynamics simulations.⁵⁵
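The discretization described above can be sketched in Python as follows. The grids follow the intervals and point counts given in the text; the nearest-point mapping from a continuous state to a grid index is an assumption made for this example, as the text does not specify how states are binned.

```python
import numpy as np

F_GRID = np.linspace(0.96, 1.18, 20)     # normalized fuel mass
G_GRID = np.linspace(1.43, 1.61, 20)     # normalized gas mass
U_GRID = np.linspace(0.975, 1.025, 15)   # normalized fuel command

# Q-table with one entry per (F index, G index, action index).
Q_SHAPE = (F_GRID.size, G_GRID.size, U_GRID.size)


def nearest_index(grid, value):
    """Index of the closest grid point; values outside the interval saturate."""
    return int(np.argmin(np.abs(grid - value)))


def state_index(F, G):
    """Map a continuous state (F, G) to a pair of grid indices (assumed rule)."""
    return nearest_index(F_GRID, F), nearest_index(G_GRID, G)
```

With 15 control points, the middle entry `U_GRID[7]` is the stoichiometric command, and the steady-state point $(1, 1.45)$ maps to indices near the lower end of both state grids, leaving room for the post-misfire excursions.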

The Q-learning algorithm introduced by Watkins⁵⁶ was used to adaptively learn the Q-table without knowledge of the transition probability (model-free learning). The Q-factor update law used a constant learning rate $κ$ as follows:

Q_{k+1}(x_{k}, u_{k}) = (1 - κ) Q_{k}(x_{k}, u_{k}) + κ \left( g(x_{k}, u_{k}) + α \min_{u \in U} Q_{k}(x_{k+1}, u) \right)

(22)
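Equation (22) can be written as a short in-place update on a NumPy array. This is a sketch rather than the LabVIEW implementation; the function signature and the index convention (a state is a pair of grid indices, an action is a single index) are assumptions, and the default values of $α$ and $κ$ are the ones reported in the experimental results section.

```python
import numpy as np

ALPHA = 0.1   # discount factor
KAPPA = 0.3   # constant learning rate


def q_update(Q, s, a, cost, s_next, alpha=ALPHA, kappa=KAPPA):
    """Apply equation (22) in place and return the new Q-factor.

    Q      : array of shape (20, 20, 15)
    s      : (i, j) indices of the current state x_k
    a      : index of the applied action u_k
    cost   : cycle cost g(x_k, u_k)
    s_next : (i, j) indices of the next state x_{k+1}
    """
    target = cost + alpha * Q[s_next].min()
    Q[s + (a,)] = (1.0 - kappa) * Q[s + (a,)] + kappa * target
    return Q[s + (a,)]


Q = np.full((20, 20, 15), 1.5)
new_value = q_update(Q, s=(3, 2), a=7, cost=0.0, s_next=(3, 2))
```

Because the update minimizes over actions at the next state regardless of which action is applied next, the rule is off-policy, which is the property the text relies on when combining RL with $λ$ feedback.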

Algorithm 1 Q-learning for fuel quantity control

$n = 0$, $k = 0$, $Q_{0} \sim U(1, 2)$
while $n \leq N$ do
    Intake phase (IVO to IVC):
    if $k > 0$ then
        apply $u_{k-1}$ command
    else
        apply open-loop $u_{k-1} = 1$ command
    end if
    Combustion phase (IVC to EVO):
    calculate $H_{k}$ from in-cylinder pressure
    generate random variable $p \sim U(0, 1)$
    if $p < ε_{n}$ then
        pick randomly $u_{k} \sim U\{U\}$
    else
        optimize $u_{k} = \arg\min_{u \in U} Q_{k}(x_{k}, u)$
    end if
    Exhaust phase (EVO to EVC):
    calculate cycle cost $g(x_{k}, u_{k})$
    calculate state $x_{k+1}$ using $u_{k}$ and equation (14)
    update $Q_{k+1}(x_{k}, u_{k})$ using equation (22)
    move to next combustion cycle: $k = k + 1$
    if $k \equiv 0 \pmod{2000}$ then
        move to next episode: $n = n + 1$
    end if
end while

An $ε$-greedy approach was used to balance the exploration-exploitation trade-off and avoid sub-optimal solutions. Let $n \in \{1, 2, \dots, N\}$ be the episode number, where $N$ is the total number of episodes used in the Q-learning algorithm. Due to the low probability of misfire events at the dilute limit, each episode consisted of 2000 engine cycles in order to guarantee that the total cost includes misfire events at each episode. Let $Q_{0}$ be the initial condition for the Q-table, where each Q-factor is chosen randomly from the interval $(1, 2)$ according to a uniform distribution, in other words:

Q_{0} = [Q (x, u) ~ U (1, 2)], \forall x \in X, u \in U .

(23)

Such an interval was chosen to guarantee a large enough initial condition. For each episode, let $ε_{n} = 1 - n / N$ be the probability of choosing a random action over the optimal one. Finally, Algorithm 1 shows the Q-learning approach used for fuel quantity control in this study.
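The exploration schedule and the random initialization of equation (23) can be illustrated with the following Python sketch. The function names, the seed, and the use of NumPy's `Generator` are choices made for this example; the experimental implementation ran in LabVIEW on the ECU.

```python
import numpy as np


def epsilon(n, n_episodes):
    """Exploration probability for episode n: eps_n = 1 - n / N."""
    return 1.0 - n / n_episodes


def select_action(Q, s, eps, rng):
    """Epsilon-greedy choice of an action index for state indices s = (i, j)."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[-1]))   # explore: uniform over U
    return int(np.argmin(Q[s]))                 # exploit: minimize the Q-factor


rng = np.random.default_rng(0)                    # seed chosen arbitrarily
Q0 = rng.uniform(1.0, 2.0, size=(20, 20, 15))     # equation (23)

first_action = select_action(Q0, (3, 2), epsilon(0, 195), rng)    # always random
last_action = select_action(Q0, (3, 2), epsilon(195, 195), rng)   # always greedy
```

The schedule decays linearly, so the controller acts fully at random in the first episode and fully greedily once $n = N$.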

Experimental results

The Q-learning hyperparameters N, $α$, and $κ$ were calibrated using the computer model previously described. Simulations were done in Python, and the code can be found on Code Ocean at https://codeocean.com/capsule/8130780/tree/v1. The parameter N was constrained by the length of the experiment, while the parameters $α = 0.1$ and $κ = 0.3$ showed acceptable convergence behavior in the simulations. Note that simulations were used only to determine the best set of hyperparameters. During experimental testing, the RL algorithm started from the random initial condition $Q_{0}$, as defined by equation (23), and Algorithm 1 was fully implemented and executed in the ECU.

Two training scenarios were considered, pictured in Figure 4. In the first, the cylinder-pressure-based controller was given full authority over the control action, resulting in $m_{fuel, in} [k] = m_{fuel, in}^{s} u_{k}$. In the second scenario, the RL controller coordinated the fuel command with a PI controller based on the exhaust oxygen sensor. A slow PI controller was tuned to maintain, on average, stoichiometric combustion based on $λ$ feedback. Then, the amount of fuel injected per cycle was calculated as $m_{fuel, in} [k] = m_{fuel, in}^{s} u_{k} λ_{FB} [k]$, where $λ_{FB} [k]$ is the output of the PI controller driven by the $λ_{k}$ error.

Figure 4.

Feedback controllers considered during experiments.
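The two fuel command laws can be sketched in Python as shown below. The PI structure, its gains, and the sign convention (the correction grows when the measured $λ$ is lean) are assumptions made for illustration; the text only states that a slow PI controller acting on the $λ$ error was used.

```python
class SlowPI:
    """Hypothetical discrete PI loop producing the multiplicative term lambda_FB."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.integral = 0.0

    def update(self, lam_measured, lam_target=1.0):
        error = lam_measured - lam_target     # positive when the mixture is lean
        self.integral += error
        return 1.0 + self.kp * error + self.ki * self.integral


def fuel_command(m_fuel_s, u, lam_fb=1.0):
    """m_fuel_in[k] = m_fuel_s * u_k * lambda_FB[k]; lam_fb = 1 disables feedback."""
    return m_fuel_s * u * lam_fb


pi = SlowPI(kp=0.01, ki=0.001)                # illustrative gains only
no_correction = pi.update(lam_measured=1.0)   # stoichiometric: factor of 1
lean_correction = pi.update(lam_measured=1.02)  # lean: factor above 1
```

Small gains keep the $λ$ loop much slower than the cycle-to-cycle RL command, so the PI term trims the average fueling while $u_{k}$ handles individual cycles.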

RL controller without $λ$ feedback

The first model-free online learning experiment was conducted using the cost function in equation (17) without $λ$ feedback over $N = 195$ episodes. Given that each episode (2000 engine cycles) took a total of 2 min to finish, the total training time in the experimental engine was 6 h 30 min. Figure 5 shows the total cost per episode and the average values of several variables of interest during training. The total cost shows a steady decrease during training. The sudden drop in total cost observed before episode 50 was due to an adjustment of the EGR valve. Note that $E [λ] > 1$ in the early stages of learning. This generates an overly lean mixture which exacerbates variability. This can also be observed in the initial increase of the average EGR fraction. After the EGR valve was adjusted, the average EGR fraction remained fairly constant during the rest of the experiment. The relatively constant amount of burned gas in the cylinder can also be observed in the stationary combustion efficiency. If the EGR levels are too high, generating an excessive number of misfires and partial burns, the combustion efficiency is expected to drop significantly. As the learning phase evolves and the $ε$-greedy strategy favors exploitation over exploration, the average IMEP increases, raising the fuel conversion efficiency. Toward the end of the Q-learning, where most of the optimization was done, the average command $u_{k}$ briefly moves into rich conditions $(E [u] > 1)$ but ultimately settles for a lean condition $(E [u] < 1)$. The advantages of lean combustion for increasing efficiency have been documented,⁵⁷ and the RL strategy was able to learn it. Therefore, the controller favors lean combustion with an increased fuel conversion efficiency over a stoichiometric condition. Nonetheless, the CoV of IMEP was significantly reduced compared to the open-loop conditions.

Figure 5.

Training of the RL controller without $λ$ feedback.
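The shift from exploration to exploitation described above can be illustrated with a generic tabular Q-learning step written for cost minimization. This is a textbook-style sketch, not a transcription of algorithm 1. The table layout (one row per discretized state, one column per discretized control), the step size, and the discount factor are all assumptions made for illustration.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a control index: random with probability epsilon, else lowest Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))      # explore
    return min(range(len(q_row)), key=q_row.__getitem__)  # exploit


def q_update(q, s, a, cost, s_next, step=0.1, discount=0.9):
    """One Q-learning step for a cost-minimization problem.

    The target uses the minimum over next-state entries because the
    controller minimizes cost rather than maximizing reward.
    step and discount are hypothetical values.
    """
    target = cost + discount * min(q[s_next])
    q[s][a] += step * (target - q[s][a])
    return q[s][a]
```

With a large `epsilon`, most cycles probe random fuel commands; as `epsilon` shrinks, the controller increasingly issues the command with the lowest learned cost for the current state, which is the behavior the training curves reflect late in the experiment.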

Figure 6 shows the closed-loop response of the controller after the learning phase. Several partial burns were identified, but no misfire occurred during closed-loop operation. The control $u_{k}$ oscillated between lean values most of the time. Occasionally, the controller issued rich commands $(u_{k} > 1)$ to the injector. As previously mentioned, rich conditions help stabilize the combustion process to avoid misfires. Therefore, the controller operated at lean conditions to achieve high $Q_{k}$ , generating high fuel conversion efficiency, but occasionally issued rich commands to avoid misfires.

Figure 6.

Closed-loop response of RL without $λ$ feedback.

The average lean condition that the controller generated for combustion, although effective for increasing efficiency, is not compatible with a current three-way catalyst (TWC), which requires stoichiometric combustion to work effectively. To maintain TWC compatibility, consider using the exhaust oxygen sensor feedback controller $λ_{FB} [k]$ after the training phase. In such a case, $λ_{FB} [k]$ commands a fuel increase when $E [u] < 1$ and vice versa. Table 2 shows the comparison between the open-loop (OL) response before learning and the closed-loop (CL) response after learning. The second column shows the closed-loop average values without $λ$ feedback, while the fourth column $(CL + λ_{FB})$ shows the average response when the RL controller and $λ$ feedback were both enabled. Table 2 also shows the average change of the combustion variables when operated in closed-loop with and without $λ$ feedback. Although the original RL controller showed a 58% reduction in the total cost due to an increase of 1.22% in fuel conversion efficiency and a 50% reduction in CoV of IMEP, the $CL + λ_{FB}$ strategy showed superior performance. This was due to the near-stoichiometric operation achieved with the additional $λ$ feedback, which is less prone to partial burns than lean combustion. The reduction of partial burns further reduced the CoV of IMEP to the desired 3% target and increased the average fuel conversion efficiency by 1.33% compared to open-loop.

Table 2.

Performance of controller trained without $λ$ feedback.

	OL	CL	Change (%)	CL + $λ_{FB}$	Change (%)
Total Cost (-)	1.67	0.7	−58.0	0.51	−69.2
$E [u]$ (-)	1	0.99	−0.86	0.99	−0.52
$E [λ]$ (-)	1	1.01	0.93	1.01	0.33
$E [IMEP]$ (bar)	5.18	5.2	0.35	5.26	1.60
CoV IMEP (%)	8.55	4.23	−50.5	3.01	−64.8
$E [η_{i}]$ (%)	36.05	36.49	1.22	36.53	1.33
$E [η_{c}]$ (%)	94.54	94.55	0.01	94.41	−0.13
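The "Change (%)" columns are relative changes with respect to the open-loop average, which matters for the efficiency rows: a 1.22% change in $E [η_{i}]$ is a relative gain, corresponding to 0.44 percentage points. The short check below recomputes a few entries from the rounded averages in Table 2. The function name is illustrative, and values recomputed from rounded entries can differ slightly from the tabulated changes, which were presumably computed from unrounded data.

```python
def percent_change(open_loop: float, closed_loop: float) -> float:
    """Relative change of a closed-loop average with respect to open loop, in %."""
    return 100.0 * (closed_loop - open_loop) / open_loop


# Rounded averages copied from Table 2.
cov_ol, cov_cl, cov_cl_fb = 8.55, 4.23, 3.01
eta_ol, eta_cl, eta_cl_fb = 36.05, 36.49, 36.53

print(round(percent_change(cov_ol, cov_cl), 1))     # -50.5
print(round(percent_change(cov_ol, cov_cl_fb), 1))  # -64.8
print(round(percent_change(eta_ol, eta_cl), 2))     # 1.22
print(round(percent_change(eta_ol, eta_cl_fb), 2))  # 1.33
print(round(eta_cl - eta_ol, 2))                    # 0.44 (percentage points)
```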

The previous study by the authors solved the one-step-ahead control problem with the same cost function proposed here.³⁷ In that study, online learning was used to model the stochastic properties of CCV, but the solution was calculated using the model described by equation (14). An increase of 0.5% in fuel conversion efficiency was reported under rich conditions. The current result of a 1.33% increase in fuel conversion efficiency at near-stoichiometric conditions highlights the benefits of the model-free RL strategy. A graphical comparison of the improvements made by the RL controller is depicted in Figure 7, which shows the return maps of $Q_{gross}$ for the three different tests summarized in Table 2. The open-loop return map is centered at the nominal value, with “arms” extending toward $Q_{gross} = 0$ representing misfires. The density of the “arms” in the return map indicates the number of partial burns and misfires. Finally, the asymmetric shape is due to the combustion cycle-to-cycle dynamics. The return map during closed-loop without $λ$ feedback shows no misfires and a more symmetric shape. However, the last return map, where the $λ$ feedback was enabled, shows a truly symmetric and compact distribution, which correlates with the reduction of combustion CCV.

Figure 7.

Return maps of gross heat release at OL (left), CL without $λ$ feedback (center), and CL with $λ$ feedback (right).
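A return map of this kind plots each cycle's value against the next one. The sketch below builds the pairs and picks out the low-energy points that form the “arms”. The heat-release values are normalized, made-up numbers for illustration, and the 0.6 threshold is borrowed from the indicator function used elsewhere in this section only as a convenient cut-off.

```python
def return_map(series):
    """Pair each cycle's value with the next one: (Q[k], Q[k+1])."""
    return list(zip(series[:-1], series[1:]))


def low_energy_points(pairs, threshold):
    """Points with either coordinate below the threshold (the 'arms')."""
    return [p for p in pairs if min(p) < threshold]


# Hypothetical normalized gross heat release with one partial burn (0.35).
q_gross = [1.0, 0.98, 0.35, 1.08, 1.0]
pairs = return_map(q_gross)
print(low_energy_points(pairs, threshold=0.6))
# [(0.98, 0.35), (0.35, 1.08)]
```

Each weak cycle contributes two points, one on the vertical arm (as the successor of a normal cycle) and one on the horizontal arm (as the predecessor of the following cycle), so a map without arms indicates the absence of misfires and severe partial burns.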

RL controller with $λ$ feedback

Given the benefits of using the $λ$ feedback controller in addition to the RL controller, online learning was conducted with the feedback command $λ_{FB} [k]$ enabled at all times. In addition, a third term was added to the cost function defined in equation (17). Previous research trained spiking neural networks to control fuel quantity commands, extending the dilute limit by heavily penalizing the occurrence of misfires and partial burns.^58,59 Therefore, consider the following cost function:

$$g (x_{k}, u_{k}, w_{k}) = (u_{k} - 1)^{2} + ρ (H_{k} - η_{c}^{ss})^{2} + α I (H_{k} < 0.6) \tag{24}$$

where $I (\cdot)$ is the indicator function and $α = 0.05$ penalizes the occurrence of misfires and partial burns. The Q-learning strategy described by algorithm 1 was used to find the optimal $u_{k}$ . However, the actual fuel command at any point in the experimental engine was calculated as $m_{fuel, in} [k] = m_{fuel, in}^{s} u_{k} λ_{FB} [k]$ . In this case, a total of $N = 200$ episodes were used during the learning phase, taking a total of 6h40min. Figure 8 shows the average training results per episode. Note how $E [λ] = 1$ during the learning phase, implying that $E [u_{k} λ_{FB} [k]] = 1$ . Similarly, the EGR fraction and the combustion efficiency remained fairly constant, signaling the stationarity of the exhaust residual gases. The reduction of the total cost is driven by the increase in fuel conversion efficiency and the reduction of the CoV of IMEP. Toward the end of the learning phase, we observe that the RL controller alternates between rich and lean, ultimately settling for an average rich fuel command. However, this was counteracted by the $λ$ feedback, maintaining a stoichiometric condition.
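To make the role of each term in equation (24) concrete, the following sketch evaluates the stage cost term by term. Only $α = 0.05$ and the 0.6 threshold come from the text above. The weight $ρ$ and the steady-state value $η_{c}^{ss}$ are defined with equation (17) and are passed in as arguments here, and the numbers in the usage note are hypothetical.

```python
def stage_cost_24(u_k, h_k, rho, eta_c_ss, alpha=0.05, burn_threshold=0.6):
    """Stage cost of equation (24) for one combustion cycle."""
    fuel_term = (u_k - 1.0) ** 2                       # deviation from the nominal command
    tracking_term = rho * (h_k - eta_c_ss) ** 2        # deviation of H_k from its target
    penalty_term = alpha if h_k < burn_threshold else 0.0  # misfire / partial burn
    return fuel_term + tracking_term + penalty_term
```

For instance, with the assumed values `rho=1.0` and `eta_c_ss=0.95`, a nominal cycle `stage_cost_24(1.0, 0.95, 1.0, 0.95)` costs nothing, while any cycle with `h_k` below 0.6 pays the fixed penalty of 0.05 on top of the quadratic terms.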

Figure 8.

Training of the RL controller with $λ$ feedback.

Figure 9 shows the closed-loop response of the RL controller within the proposed coordinated control strategy after the learning phase. Once more, no misfires were observed during the experiment. This highlights the importance of the additional penalty term in the cost function. Therefore, the CoV of IMEP was reduced fairly close to the desired 3% limit. For this control strategy, the injection quantity command does not follow a simple pattern. For some cycles, the controller switches between stoichiometric and lean conditions. After some time, the average command switches to the rich region $(u_{k} > 1)$ . On average, however, the combined controller maintains the stoichiometric command.

Figure 9.

Closed-loop response of RL and $λ$ feedback.
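The CoV of IMEP referred to above is the standard deviation of IMEP over a window of cycles divided by its mean, expressed in percent. A minimal sketch follows. The choice of the population standard deviation and the sample IMEP values are assumptions made for illustration and are not taken from the experiments.

```python
import statistics


def cov_percent(imep_samples):
    """Coefficient of variation of IMEP in percent: 100 * std / mean."""
    mean = statistics.fmean(imep_samples)
    # Population standard deviation is an assumption; the sample form
    # (statistics.stdev) differs only marginally over thousands of cycles.
    return 100.0 * statistics.pstdev(imep_samples) / mean


# Hypothetical IMEP trace in bar; the 2.1 bar cycle mimics a partial burn.
steady = [5.3, 5.2, 5.3, 5.2, 5.4, 5.2]
with_partial_burn = [5.3, 5.2, 5.3, 2.1, 5.4, 5.2]
print(cov_percent(steady) < cov_percent(with_partial_burn))  # True
```

A single low-IMEP cycle inflates the CoV considerably, which is why removing misfires and partial burns is the main lever for approaching the 3% target.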

Table 3 shows the comparison between the open- and closed-loop systems after being trained with $λ$ feedback enabled. Here, a 60% reduction in the total cost was achieved due to a 51% reduction of the CoV of IMEP and a 1.09% increase in fuel conversion efficiency. Note how the desired $E [λ] = E [u] = 1$ was achieved with a small difference between open- and closed-loop. The increase in fuel conversion efficiency was not as high as with the previous controller because the previous strategy produced slightly leaner combustion, contributing to higher efficiency. The complexity of engine experiments and the stochasticity of the system make it difficult to guarantee the same open-loop condition for both experiments. Thus, the percentage change relative to the corresponding open-loop condition was deemed an acceptable metric to compare the different control strategies presented in this study.

Table 3.

Performance of controller trained with $λ$ feedback.

	OL	CL + $λ_{FB}$	Change (%)
Total Cost (-)	1.27	0.51	−60.0
$E [u]$ (-)	1	1	0.25
$E [λ]$ (-)	1	1	−0.21
$E [IMEP]$ (bar)	5.2	5.29	1.68
CoV IMEP (%)	6.47	3.17	−50.9
$E [η_{i}]$ (%)	36.35	36.75	1.09
$E [η_{c}]$ (%)	94.67	94.73	0.06

RL controller with modified cost function

Recall that the cost functions defined by equations (17) and (24) are based on the physical understanding of the cycle-to-cycle dynamics. Although they were designed to achieve the goals of stoichiometric combustion, higher fuel efficiency, and reduction of combustion CCV, they rely on the explicit variables $u_{k}$ and $H_{k}$ from the control-oriented model. The model-free approach of RL, however, allows for the inclusion of other, more suitable variables that implicitly depend on $u_{k}$ and $H_{k}$ , but for which the explicit relations are not necessarily known. Consider the following modified cost function to achieve the control objectives previously mentioned:

$$g (x_{k}, u_{k}) = (λ_{k} - 1)^{2} + ρ^{*} (η_{i} [k] - η_{i}^{*})^{2} + α I (H_{k} < 0.6) \tag{25}$$

The first term uses the exhaust oxygen sensor to penalize deviations from stoichiometric conditions and serves as the replacement of the PI controller previously used. The second term attempts to directly increase the fuel conversion efficiency to a maximum value of $η_{i}^{*} = 0.37$ , where the weighting factor used was $ρ^{*} = 0.05$ . Finally, the third term is similar to the previous cost function, where $α = 0.05$ and the indicator function $I (H_{k} < 0.6)$ heavily penalizes misfires and partial burns. Given that the $λ$ feedback is now embedded in the cost function, the fuel quantity issued was calculated only using the RL controller as $m_{fuel, in} [k] = m_{fuel, in}^{s} u_{k}$ . For this controller, the Q-learning algorithm was trained using $N = 100$ episodes during a period of 3h20min in the experimental engine. The goal of the limited number of episodes was to test the effectiveness of the proposed engine control strategy in a shorter training period.
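The modified cost in equation (25) can be evaluated directly from measured quantities, using the constants quoted above ( $ρ^{*} = 0.05$ , $η_{i}^{*} = 0.37$ , $α = 0.05$ ). The sketch below is illustrative only. The function and argument names are not from the source, and the efficiency is assumed to be expressed as a fraction so that it is commensurate with $η_{i}^{*} = 0.37$ .

```python
def stage_cost_25(lam_k, eta_i_k, h_k,
                  rho_star=0.05, eta_i_star=0.37,
                  alpha=0.05, burn_threshold=0.6):
    """Stage cost of equation (25) for one combustion cycle."""
    stoich_term = (lam_k - 1.0) ** 2                        # exhaust oxygen sensor deviation
    efficiency_term = rho_star * (eta_i_k - eta_i_star) ** 2  # shortfall from target efficiency
    penalty_term = alpha if h_k < burn_threshold else 0.0   # misfire / partial burn
    return stoich_term + efficiency_term + penalty_term


# A misfire-like cycle (no work, H_k = 0) at stoichiometric lambda:
print(round(stage_cost_25(1.0, 0.0, 0.0), 6))  # 0.056845
```

With these weights, the fixed penalty dominates the efficiency term, while a $λ$ deviation of 0.01 adds only $10^{-4}$ to the cost of a cycle.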

Figure 10 shows the experimental results over the training period using the modified cost function. The results display a reduction of the total cost due to (1) average stoichiometric conditions, (2) a moderate increase in fuel conversion efficiency, and (3) elimination of misfires and partial burns. Note that, toward the end of the learning phase, the controller consistently maintained rich conditions $(E [u] > 1)$ . However, the exhaust oxygen sensor indicated near-stoichiometric conditions at the end of learning. This discrepancy was probably caused by a miscalibration of the open-loop command $m_{fuel, in}^{s}$ at the beginning of the learning phase. Nevertheless, this small mistake shows the robustness of the controller to perturbations of the initial conditions and highlights the advantages of online learning and adaptation. Ultimately, the proposed controller with the modified cost function was able to approach the desired 3% CoV of IMEP with a net fuel efficiency gain.

Figure 10.

Training of RL controller with modified cost.

Figure 11 shows the response of the controller after the learning phase. Similar to the previous cases, no misfire was observed during 2000 cycles, contributing to the reduction of CoV of IMEP. The control command showed some distinct patterns during sustained periods of time, oscillating between four particular values. There were, however, moments where the controller was less predictable, probably leveraging the cycle-to-cycle dynamics to avoid partial burns and misfires.

Figure 11.

Closed-loop response of RL with modified cost.

Table 4 presents the comparison between open- and closed-loop results of the average combustion indicators. The design of the new cost function guarantees average stoichiometric combustion by maintaining the control command centered at one. A net increase in fuel conversion efficiency of 0.3% was observed. The modest increase in efficiency was probably the consequence of the shortened learning phase. Nonetheless, the cost function was reduced by 53.4% with a reduction of 34.4% in the CoV of IMEP near the desired CCV target.

Table 4.

Performance of controller trained with modified cost.

	OL	CL	Change (%)
Total Cost (-)	0.43	0.2	−53.40
$E [u]$ (-)	1	1	−0.09
$E [λ]$ (-)	1	1	−0.03
$E [IMEP]$ (bar)	5.24	5.25	0.20
CoV IMEP (%)	5.35	3.51	−34.39
$E [η_{i}]$ (%)	36.53	36.64	0.30
$E [η_{c}]$ (%)	95.17	94.96	−0.22

Conclusion and outlook

A reinforcement learning strategy was proposed to design an optimal fuel quantity control based on online model-free learning of the combustion dynamics by running the engine in real time. Full authority over the fuel injection control strategy was achieved using a LabVIEW-based open ECU on an experimental SI engine. The engine was operated at a constant speed and intake airflow according to the ACEC tech team guidelines. The dilution levels were increased to 27.5% EGR fraction, corresponding to a condition past the dilute limit. At each cycle, several combustion parameters were calculated using a simple cycle-to-cycle combustion dynamics model. The control-oriented model allowed for the identification of the system states needed for a data-efficient RL algorithm. The cycle cost was designed to extend the dilute combustion limit by (1) reducing combustion CCV, (2) improving fuel conversion efficiency, and (3) operating at stoichiometric conditions. An infinite-horizon discounted-cost problem was formulated and the solution was approximated by discretizing the state and control spaces to use Q-learning. An $ε$ -greedy decision process was used to encourage exploration and interactions with the engine. The algorithm was tested under three scenarios:

Full authority, training for 6h30min

Coordination + $λ$ feedback, training for 6h40min

Full authority + modified cost, training for 3h20min
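The training durations listed above follow from the episode counts reported in the experimental results (195, 200, and 100 episodes) and the 2 min needed per 2000-cycle episode. A short check, with an illustrative function name:

```python
def training_time(episodes, minutes_per_episode=2):
    """Return total training time as (hours, minutes)."""
    return divmod(episodes * minutes_per_episode, 60)


for label, episodes in [("Full authority", 195),
                        ("Coordination + lambda feedback", 200),
                        ("Full authority + modified cost", 100)]:
    hours, minutes = training_time(episodes)
    print(f"{label}: {hours}h{minutes:02d}min")
# Full authority: 6h30min
# Coordination + lambda feedback: 6h40min
# Full authority + modified cost: 3h20min
```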

In the first scenario, the controller was trained with full authority over the control space. During closed-loop, the improvement of fuel conversion efficiency was the highest observed, at 1.33% over the open-loop condition. However, this was due to a slightly leaner mixture, rendering it incompatible with standard TWCs. In the second scenario, the controller was trained in coordination with a $λ$ feedback controller to maintain a stoichiometric condition. This closed-loop controller was able to extend the dilute limit and improve the fuel conversion efficiency by 1.09% compared to the open-loop. Finally, the third scenario used a modified cost function to guarantee stoichiometric conditions without the need for a second $λ$ feedback controller. The experimental results showed that the controller achieved the desired goals. However, the fuel conversion efficiency improvement was limited to 0.3%, probably due to the reduced training time. Nonetheless, all the aforementioned controllers significantly reduced the CoV of IMEP, bringing it close to the limit imposed by industry standards. The results presented here were compared against previous studies where a model-based approach was used, highlighting the advantages of model-free learning for engine control. A significant challenge faced by RL in experimental testing on real-world systems is the issue of convergence time. In our case, engine operation was limited to at most 7 h. This time constraint implies that if RL requires an extended duration to converge to the global minimum, the resulting control strategy may fall short of optimality. Other than observing the convergence behavior of the episodic cost, we did not have a way to validate that the policy at the end of the training was optimal. Future research will focus on the application of the proposed model-free method under different engine operating conditions and longer training times to quantify the net gain over the entire engine map.

Footnotes

Authors’ note

This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory and used resources at the National Transportation Research Center, a DOE User Facility.

ORCID iDs

Bryan P Maldonado

Brian C Kaul

References

1. Alsrehin, Klaib, Magableh. Intelligent Transportation and control systems using data mining and machine learning techniques: a comprehensive study. IEEE Access 2019; 7: 49830–49857.

2. Guanetti, Kim, Borrelli. Control of connected and automated vehicles: State of the art and future challenges. Annu Rev Control 2018; 45: 18–40.

3. Liu, Zhong, et al. Computing Systems for autonomous driving: State of the art and challenges. IEEE Internet Things J 2021; 8(8): 6469–6486.

4. Zhu, Prucka. Transient hybrid electric vehicle powertrain control based on iterative dynamic programing. J Dyn Syst Meas Control 2022; 144(2): 021003-1-021003-11. DOI: 10.1115/1.4052230

5. Kosowatz. Rekindling the Spark. Mech Eng 2017; 139(11): 28–33.

6. Maldonado, Kaul, Szybist. Artificial Neural Networks for In-Cycle Prediction of Knock Events. SAE Technical Paper 2022-01-0478, 2022.

7. Kaul, Maldonado, Michlberger, et al. Analysis of Real-World Preignition Data Using Neural Networks. SAE Technical Paper 2023-01-1614, 2023.

8. US Energy Information Administration. Annual Energy Outlook 2021. Technical report, U.S. Department of Energy, 2021.

9. Abd-Alla. Using exhaust gas recirculation in internal combustion engines: a review. Energy Convers Manag 2002; 43(8): 1027–1042.

10. Alger, Gingrich, Roberts, Mangold. Cooled exhaust-gas recirculation for fuel economy and emissions improvement in gasoline engines. Int J Engine Res 2011; 12(3): 252–264.

11. Bieniek, Stefanopoulou, Hoard, et al. Retard to the limit: closed-loop COVIMEP control for aggressive exhaust heating. IFAC-PapersOnLine 2019; 52(5): 624–629.

12. Maldonado, Bieniek, Hoard, et al. Modelling and estimation of combustion variability for fast light-off of diesel aftertreatment. Int J Powertrains 2020; 9(1-2): 98–121.

13. Ahmed, Middleton, Stefanopoulou, Kim, Kweon CBM. Closed-loop diesel combustion control leveraging ignition assist. IEEE Control Syst Lett 2022; 6: 1628–1633.

14. Triantopoulos, Bohac, Sterniak, et al. Cycle-to-cycle variability in spark-assisted compression ignition engines near optimal mean combustion phasing. Int J Engine Res 2023; 24(2): 420–436.

15. Krishnamoorthi, Agarwal, et al. Combustion instabilities and control in compression ignition, low-temperature combustion, and gasoline compression ignition engines. In: Kalghatgi, Agarwal, Goyal (eds) Gasoline compression ignition technology: future prospects. Springer Singapore, 2022, pp.183–216, 7.

16. Kaul, Wagner, Green. Analysis of cyclic variability of heat release for High-EGR GDI Engine Operation with Observations on implications for Effective Control. SAE Int J Engines 2013; 6(1): 132–141.

17. Daw, Kennel, Finney, Connolly. Observing and modeling nonlinear dynamics in an internal combustion engine. Phys Rev E 1998; 57: 2811–2819.

18. Finney, Kaul, Daw, et al. Invited review: a review of deterministic effects in cyclic variability of internal combustion engines. Int J Engine Res 2015; 16(3): 366–378.

19. Daw, Green, Wagner. Controlling cyclic combustion variations in lean-fueled spark-ignition engines. AIP Conf Proc 2002; 622(1): 265–277.

20. Green, Wagner, Daw. Model based control of cyclic dispersion in lean spark ignition combustion. In: Proceedings of the 2002 Technical Meeting of the Central States Section of the Combustion Institute. pp.1–6.

21. Di Cairano, Kolmanovsky. Automotive applications of model predictive control. In: Raković, Levine (eds) Handbook of model predictive control. Cham: Springer International, 2019, pp.493–527.

22. Vance, Kaul, Jagannathan, Drallmeier. Output Feedback controller for operation of spark ignition engines at lean conditions using neural networks. IEEE Trans Control Syst Technol 2008; 16(2): 214–228.

23. Singh, Vance, Kaul, et al. Neural Network Control of Spark Ignition Engines with High EGR Levels. In: The 2006 IEEE International Joint Conference on Neural Network Proceedings. pp.4978–4985.

24. Vance, Singh, Kaul, Jagannathan, Drallmeier. Neural network controller development and implementation for spark ignition engines with high EGR levels. IEEE Trans Neural Netw 2007; 18(4): 1083–1100.

25. Vance, Kaul, Jagannathan, Drallmeier. Neuro emission controller for minimising cyclic dispersion in spark ignition engines with EGR levels. Int J Gen Syst 2009; 38(1): 45–72.

26. Shih, Kaul, Jagannathan, Drallmeier. Reinforcement learning-based dual-control methodology for complex nonlinear discrete-time systems with application to spark engine EGR operation. IEEE Trans Neural Netw 2008; 19(8): 1369–1388.

27. Shih, Kaul, Jagannathan, Drallmeier. Reinforcement-learning-based output-feedback control of nonstrict nonlinear discrete-time systems with application to engine emission control. IEEE Trans Syst Man Cybern B 2009; 39(5): 1162–1179.

28. Daw, Finney, Green, et al. A Simple Model for Cyclic Variations in a Spark-Ignition Engine. In: SAE Technical Paper 962086. pp.2297–2306.

29. Maldonado, Kaul. Control-Oriented Modeling of Cycle-to-Cycle Combustion Variability at the Misfire Limit in SI Engines. In: Proceedings of the ASME 2020 Dynamic Systems and Control Conference. p. V002T26A001.

30. Schuman, Young, Mitchell, et al. Low Size, Weight, and Power Neuromorphic Computing to Improve Combustion Engine Efficiency. In: 2020 11th International Green and Sustainable Computing Workshops (IGSC). pp.1–8.

31. Maldonado, Kaul, Schuman, Young, Mitchell. Next-cycle optimal fuel control for cycle-to-cycle variability reduction in EGR-diluted combustion. IEEE Control Syst Lett 2021; 5(6): 2204–2209.

32. Maldonado, Stefanopoulou, Kaul, et al. Chapter 8 - Artificial-intelligence-based prediction and control of combustion instabilities in spark-ignition engines. In: Badra, Pal, Pei (eds) Artificial intelligence and data driven optimization of internal combustion engines. Cambridge, MA: Elsevier, 2022, pp.185–212.

33. Maldonado, Zaseck, Kitagawa, Stefanopoulou. Closed-loop control of combustion initiation and combustion duration. IEEE Trans Control Syst Technol 2020; 28(3): 936–950.

34. Maldonado, Stefanopoulou. Cycle-to-cycle feedback for combustion control of spark advance at the misfire limit. J Eng Gas Turbine Power 2018; 140(10): 102812.

35. Jorques Moreno, Stenlåås, Tunestål. Indicated efficiency optimization by in-cycle closed-loop combustion control of diesel engines. Control Eng Pract 2022; 122: 105097-1-105097-16.

36. Maldonado, Kolmanovsky, Stefanopoulou. Learning reference governor for cycle-to-cycle combustion control with misfire avoidance in spark-ignition engines at high exhaust gas recirculation–diluted conditions. Int J Engine Res 2020; 21(10): 1819–1834.

37. Maldonado, Kaul, Schuman, Young, Mitchell. Next-cycle optimal dilute combustion control via online learning of cycle-to-cycle variability using kernel density estimators. IEEE Trans Control Syst Technol 2022; 30(6): 2433–2449.

38. Henry de Frahan, Wimer, Yellapantula, Grout. Deep reinforcement learning for dynamic control of fuel injection timing in multi-pulse compression ignition engines. Int J Engine Res 2022; 23(9): 1503–1521.

39. Bertsekas. Reinforcement learning and optimal control. Athena Scientific optimization and computation series. Nashua, NH: Athena Scientific, 2019.

40. Luo, Maldonado, Liu, et al. Portable In-cylinder pressure measurement and signal processing system for real-time combustion analysis and engine control. SAE Int J Adv Curr Pr Mobil 2020; 2(6): 3432–3441.

41. Aleiferis, Taylor, Whitelaw, et al. Cyclic Variations of Initial Flame Kernel Growth in a Honda VTEC-E Lean-Burn Spark-Ignition Engine. SAE Technical Paper 2000-01-1207. SAE International, pp.1340–1380.

42. Lian, Martz, Maldonado, et al. Prediction of flame burning velocity at early flame development time with high exhaust gas recirculation and spark advance. J Eng Gas Turbine Power 2017; 139(8): 082801–082801.

43. Cha, Kwon, Cho, Park. The effect of exhaust gas recirculation (EGR) on combustion stability, engine performance and exhaust emissions in a gasoline engine. KSME Int J 2001; 15(10): 1442–1450.

44. Maldonado, Stefanopoulou, Scarcelli, et al. Characteristics of Cycle-to-Cycle Combustion Variability at Partial-Burn Limited and Misfire Limited Spark Timing Under Highly Diluted Conditions. In: Proceedings of the ASME 2019 Internal Combustion Engine Division Fall Technical Conference. p. V001T03A018.

45. Maldonado, Stefanopoulou. Non-Equiprobable Statistical Analysis of Misfires and Partial Burns for Cycle-to-Cycle Control of Combustion Variability. In: Proceedings of the ASME 2018 Internal Combustion Engine Division Fall Technical Conference. ASME, p. V002T05A003.

46. Zhang, Shen. Chaos theory-based time series analysis of in-cylinder pressure and its application in combustion control of SI engines. J Therm Sci Technol 2020; 15(1): JTST0001–JTST0001.

47. Stiffler, Kaul, Drallmeier. Cyclic dynamics of misfires and partial burns in a dilute spark-ignition engine. Proc IMechE, Part D: J Automobile Engineering 2021; 235(2-3): 333–345.

48. The Advanced Combustion and Emission Control (ACEC) Tech Team. Advanced Combustion and Emission Control Roadmap. Technical report, U.S. DRIVE Partnership, 2018.

49. Pipitone. A comparison between combustion phase indicators for optimal spark timing. J Eng Gas Turbine Power 2008; 130(5): 052808-1-052808-11. DOI: 10.1115/1.2939012

50. Caton. Combustion phasing for maximum efficiency for conventional and high efficiency engines. Energy Convers Manag 2014; 77: 564–576.

51. Wallner, Sevik, Scarcelli, et al. Effects of Ignition and Injection Perturbation under Lean and Dilute GDI Engine Operation. In: JSAE/SAE 2015 International Powertrains, Fuels & Lubricants Meeting. SAE International, pp.1–10.

52. Jatana, Kaul. Determination of SI combustion sensitivity to fuel perturbations as a cyclic control input for highly dilute operation. SAE Int J Engines 2017; 10(3): 1011–1018.

53. Maldonado, Kaul. Evaluation of residual gas fraction estimation methods for cycle-to-cycle combustion variability analysis and modeling. Int J Engine Res 2022; 23(2): 198–213.

54. Konda, Tsitsiklis. Actor-critic algorithms. In: Solla, Leen, Müller (eds) Advances in neural information processing systems. Cambridge, MA: MIT Press, Vol. 12, 2001, pp.1008–1014.

55. Ameen, Mirzaeian, Millo, Som. Numerical prediction of cyclic variability in a spark ignition engine using a parallel large eddy simulation approach. J Energy Resour Technol 2018; 140(5): 052203.

56. Watkins. Learning from delayed rewards. PhD Thesis, King’s College, Cambridge, 1989.

57. Heywood (ed.). Internal combustion engine fundamentals. 2nd ed. New York: McGraw-Hill Education, 2018.

58. Schuman, Young, Maldonado, et al. Real-Time Evolution and Deployment of Neuromorphic Computing at the Edge. In: 12th International Green and Sustainable Computing Workshop (IGSC), Pullman, WA, USA, pp.1–8.

59. Maldonado, Kaul, Schuman, et al. Dilute Combustion Control Using Spiking Neural Networks. SAE Technical Paper 2021-01-0534. SAE International, pp.1–13.

Reinforcement learning applied to dilute combustion control for increased fuel efficiency
