
Abstract

A common strategy of extending the lifetime of an aging system is to reduce its workload below the normal operating level, a practice known as derating. While derating can slow the deterioration process, it often comes at the expense of reduced performance. Thus, derating involves a trade-off between performance and deterioration. Central to the optimal derating strategy is the relationship between deterioration and workload, also referred to as the pd-relationship. In practice, however, this relationship is rarely known a priori. We consider the workload optimization when the pd-relationship can be adaptively learned through sequential experimentation, or active learning. We show that the workload not only influences the performance and deterioration but also controls the speed of learning. The decision-maker must therefore account for the complex interplay between performance, deterioration, and information in real time. We formulate this problem as a partially observable Markov decision process and characterize the optimal policy. A key structural insight is that the optimal workload is always less than the myopic load. We further propose an efficient algorithm based on the fast Gauss transform to compute the optimal policies. The model is validated with vibration data and the performance of the optimal policy is compared against several heuristic policies.

Keywords

Active Bayesian Learning, Condition-Based Maintenance, Partially Observable Markov Decision Process (POMDP), Optimal Policies

1. Introduction

Almost all systems deteriorate with usage and experience a reduction in performance. The rate of deterioration often depends on the system’s workload, such as the production rate of a manufacturing system, the rotation speed of a wind turbine, or the loading of a power transformer. A high workload boosts the immediate performance but exerts more stress on the system and accelerates its deterioration, causing a reduction in its future performance. The setting of the workload should therefore balance the short-term gain in performance against the long-term loss due to deterioration. This important problem has received much attention in the reliability and maintenance literature, for example, Conrad and McClamroch (1987), Schweitzer and Seidmann (1991), Sloan and Shanthikumar (2000), Cassady and Kutanoglu (2005), and uit het Broek et al. (2020).

To understand the practical challenges of workload control, refer to the following discussions on a power system forum (MikeHolt.com, 2015):

In my workplace, we had a good discussion on how to extend the life of a transformer. A question was raised on the effect on the transformer lifespan between 100% loading and 90% or 60%. Many argued that the 60% and 90% loading would extend the life of the transformer, with the 60% adding more life-years to the transformer. A small number of us argued that 60% or 90% loading will have the same effect compared to 100% loading $\dots$

One response to this posting is:

But at 60% loading, the relative efficiency is going to drop, meaning you will likely spend way more on wasted energy over the life of that transformer than the value of the extra life you attain. At 90%, the difference in efficiency will not be that great, plus you will be able to expect longer life.

The discussion above is about the optimal strategy of derating, which is the practice of reducing the workload below the normal operating level in order to extend the remaining useful life of the system. As a system deteriorates, it will often enter a new regime characterized by faster-than-normal degradation, in which the normal operating load is no longer optimal. By operating the transformer at a reduced load in this regime, less heat will be produced within the transformer, thereby reducing the rate of insulation degradation (Faiz et al., 2015). A recent vision letter from Palo Alto Research Center states that “(t)he ability of a machine to not only report its future failures, but also adjust its own functions according to its health condition is integral” (PARC, 2022). Such capabilities are also the defining features of “self-aware” or “self-configure” machines (Lee et al., 2013).

An everyday example of derating is setting the phone into the low-power mode when the battery gets low, which extends the battery life at the expense of reduced performance. This strategy has recently garnered significant interest from electric vehicle makers, who can use derating to prolong the life of Lithium-ion batteries, which are among the most expensive components of electric vehicles (Barreras et al., 2018; Ruan et al., 2023; Sun et al., 2018). Derating is also a common strategy for operating wind turbines approaching the end of service life (Bech et al., 2018). The operators can reduce the mechanical stress on the turbine by adjusting the angle of the blades and thereby lowering the amount of power that the turbine can generate. Derating is an attractive strategy because it can be implemented at minimal operation cost, but it should be implemented carefully by considering the trade-off between reduced performance and extended lifetime.

A common assumption in the existing literature is that the relationship between workload and deterioration is known, for example, uit het Broek et al. (2020). We shall follow the terminology of uit het Broek et al. (2020) and call it the pd-relationship, which stands for the production-deterioration relationship. But in many situations, the decision-maker (DM) is uncertain about the pd-relationship, namely, uncertain about how the system would deteriorate under different loads. This is well exemplified by the discussion quoted above about power transformers; the professionals are uncertain as to how the load affects deterioration. Moreover, no historical data is available for estimating the degradation under workloads that have not been used before.

It is natural to experiment with different workloads while the system is operating and observe how the system deteriorates differently under various workloads. This would allow the DM to estimate the pd-relationship with greater accuracy as data accumulate. It is possible to implement this online learning strategy in systems that are equipped with sensors, so that the workload can be adaptively adjusted in response to new data.

Generally speaking, different workloads can provide different levels of information about the unknown pd-relationship. For example, operating the system at a load significantly below its normal operating load, such as 60%, often provides more information about the pd-relationship than a workload marginally below the standard, such as 90%. Thus, each workload can be seen as a distinct learning mode for the pd-relationship. Choosing among different workloads is thus effectively an active learning process in which the DM chooses how much information to acquire at a given time.

The goal of this article is to develop a sequential decision model that adaptively learns the optimal workload by sequentially experimenting with various workloads. In this model, we account for both the uncertainty about the pd-relationship and the ambiguity surrounding the actual degradation state. By changing the workload, the DM can control (i) performance, (ii) deterioration, and (iii) information. Previous studies have examined the joint control of performance and deterioration. This paper extends this literature by considering the control of information acquisition. The DM faces a complex interplay between performance, deterioration, and information.

We will consider partially observable systems, whose true degradation state cannot be directly observed. Maintenance practitioners are well acquainted with the false negatives and false positives associated with sensor data, as most sensors are subject to measurement errors. For example, a drop in temperature might cause the activation of a low-pressure alert suggesting a tire leakage when the tire is intact (false positive). Conversely, an increase in temperature might fail to trigger the warning system even when there is an actual tire leakage (false negative). A fully observable system can be analyzed as a special case of the partially observable system.

1.1. Relevant Literature

This article lies at the intersection of three streams of research in the maintenance decision-making literature: (1) dynamic load control, (2) parameter learning, and (3) partially observable deteriorating systems. We will review relevant studies in each stream.

1.2. Dynamic Load Control

In light of the principle “prevention is better than cure,” it is natural to consider reducing the workload of a machine to slow its deterioration. The idea of derating dates back at least to the work of Taylor (1906) on tool wear under different cutting speeds.

Conrad and McClamroch (1987) developed a diffusion-threshold process to model tool wear. In their model, the wear evolves according to a Brownian motion whose drift depends on the workload. The machine fails as soon as the wear exceeds a threshold and is replaced by a new one. The DM decides the workload setting at each time point as well as the replacement times. The authors studied the open-loop control policy and the one-step look-ahead feedback control policy. Our degradation model also involves a Brownian motion process with a varying drift term, but we use it to model the noisy observations from the sensor. The true degradation, which determines the performance and failure time, is not directly observable in our model. Rishel (1989) noted that the scalar Brownian motion can generate negative increments and thus is not suitable for modeling wear that only accumulates over time. Our model does not suffer from this limitation because we use Brownian motion to model the noisy observations rather than the actual wear. The latter is represented by the cumulative drift in our model and is always nondecreasing in time. Furthermore, we consider online learning of the model parameters, which are assumed to be known by Conrad and McClamroch (1987) and Rishel (1989).

The seminal work of Conrad and McClamroch (1987) inspired the optimization of production rate in manufacturing systems. For example, Schweitzer and Seidmann (1991) studied the optimization of processing rate in multi-machine manufacturing systems under the queueing network framework. There is a stream of research on the joint scheduling of production and maintenance, which also addresses the trade-off between productivity and deterioration (Batun and Maillart, 2012; Sloan and Shanthikumar, 2000, 2002). In this literature, the DM can choose which product to process at a machine subject to deterioration. The products differ in terms of the reward and impact on the state of the machine. However, a common assumption is that the state of the machine is fully observable and the parameters of the system are known, which differs from the setting of the present article.

Recently, uit het Broek et al. (2020) studied the important problem of condition-based production planning. By assuming that the relation between production rate and deterioration rate (i.e., the pd-relationship) is known, the authors characterized the optimal control policies for a deterministic system and showed that similar structures hold in stochastic systems through extensive numerical studies. Our work differs from uit het Broek et al. (2020) in two major aspects: first, we consider the uncertainty regarding the pd-relationship and perform active learning of this relationship in a Bayesian fashion. Second, we consider partially observable stochastic systems, whose true degradation level is not directly observable and needs to be inferred from monitoring data. Another difference is that we jointly optimize the maintenance and workload decisions by accounting for their interdependency, whereas uit het Broek et al. (2020) assumed that maintenance is conducted at fixed times. For systems with adjustable workload, failure risk can be lowered through a load reduction, which may postpone the maintenance. Therefore, the optimal timing of maintenance depends on the workload history.

1.3. Partially Observable Deteriorating Systems

There is a large body of literature on maintenance optimization for partially observable deteriorating systems. See de Jonge and Scarf (2020) for a recent review. Partially observable stochastic deterioration is commonly modeled using hidden Markov models, cf. Maillart (2006) and Kim and Makis (2013). This literature is also closely related to the economic design of Bayesian control charts (Makis, 2008; Panagiotidou and Tagaras, 2010; Tagaras and Nikolaidis, 2002; Tagaras and Nenes, 2007; Wang and Lee, 2015). Both frameworks assume that the monitoring data are generated from a latent stochastic process representing the evolution of the true state. However, most existing studies focus on how to respond to the natural deterioration of the system (e.g., through replacement) rather than how to actively alter the course of deterioration, as we do in this article.

In this article, we propose a different framework to model partially observable systems. Our model is inspired by the Brownian motion process widely used in the degradation modeling literature (Conrad and McClamroch, 1987; Elwany et al., 2011). The difference is that we use Brownian motion to model the observations instead of the true degradation state. The latter is treated as a latent process in our model and can only be inferred from the observations. We argue that the model with latent state is more realistic because sensor data may not reveal the true state of the machine.

When the workload is adjustable, the remaining useful life of the product can be controlled and the decision involves the prognostics of the machine’s health condition in a nonstationary environment. The nonstationary deterioration caused by a changing environment is studied by Kharoufeh et al. (2013) and Flory et al. (2014). But unlike these works, which consider deterioration in an exogenous environment, our work considers deterioration that can be endogenously controlled by the DM.

1.4. Online Parameter Learning

There is a fast-growing literature on maintenance optimization with Bayesian learning. The motivation is that the deterioration is often heterogeneous, with some units deteriorating faster than others due to unobserved idiosyncrasies. With online monitoring data, it is possible to learn the unique degradation pattern of each unit in real time and tailor the maintenance strategy toward an individual unit.

The online learning of partially observable systems with heterogeneity has been studied by Van Oosterom et al. (2017) and Abdul-Malak et al. (2019), where the heterogeneity is modeled in terms of discrete hidden states. Another related stream of studies considers learning the unknown parameters over a continuous space with respect to a conjugate prior. Elwany et al. (2011) considered a degradation system with unknown parameters. By assuming that the system will be replaced when the observed signal exceeds a pre-determined threshold, the authors characterized the optimal replacement policy. Chen et al. (2015) optimized the inspection interval in heterogeneous degradation systems modeled by an inverse Gaussian process. Drent et al. (2023) studied the optimal stopping of shock degradation systems with unknown parameters. Skandari and Shechter (2021) considered the optimal control of a fully observable Markov process with uncertain transition rates.

Unlike these studies, which decide whether to gather information in a period, we consider how much information to gather in the period. A distinguishing feature of our model is that the DM can change the mode of learning by choosing different workloads, with some loads yielding more information about the pd-relationship than others. In contrast, the existing literature on parameter learning (surveyed above) mostly considers a single workload, which corresponds to a single learning mode.

The problem of choosing among multiple alternative learning modes is generally referred to as active learning (Powell and Ryzhov, 2012; Ryzhov and Powell, 2011). Such problems arise in various applications including inventory management (Lariviere and Porteus, 1999; Lu et al., 2008) and demand learning (Farias and Van Roy, 2010; Harrison et al., 2012; Wang, 2021). A central dilemma in active learning is the trade-off between exploration and exploitation. In our setting, the deterioration of the machine gives rise to a dynamic process, in which the reward structure changes as the degradation accumulates. Tian et al. (2022) studied adaptive clinical trials in which the DM needs to track both the posterior belief and the system state, but in their application the system state is the number of patients in the system, which is fully observable. The complicating factor in our problem is that the DM cannot observe the true degradation level.

This article makes several contributions to the maintenance decision-making literature. First, we develop a dynamic load optimization model that can adaptively learn the pd-relationship. We show that the rate of learning can be adjusted by varying the workload. As such, the workload optimization is effectively an active learning problem, which has not been well studied in the maintenance literature. Second, we characterize the structure of the optimal policy. We prove that the optimal workload is no greater than the Bayesian myopic load. To the best of our knowledge, no optimal policy has been established for learning the pd-relationship in a noisy environment. Third, we exploit the structural properties of optimal policies to develop an efficient algorithm, which is based on a novel application of the fast Gauss transform.

2. Model Formulation

We will first describe a dynamic degradation model and then formulate the dynamic derating problem. We consider a scenario in which a system has been operating under a constant workload $u_{A}$ for an extended period of time. The load $u_{A}$ corresponds to the normal operating load by design. Evidence suggests that the system has entered a fast degradation regime, which usually happens near the end of the service life. For example, the sensors have detected persistently higher magnitude of vibration from a gearbox, or higher concentration of dissolved gas in a power transformer. To extend the remaining useful life of the system, the DM is considering reducing the workload to, for example, $0.6 u_{A}$ , which is 60% of the normal load. However, she is uncertain about the deterioration rate under this new load because the system has not been operated under this load. Although the DM can try new loads and observe the change of degradation, there is substantial uncertainty involved because the true degradation state cannot be observed directly.

2.1. Dynamic Degradation Model

We set the time origin $t = 0$ as the starting time of the derating decision process. Let $u_{t}$ be the workload in period $t$ , taking value from the set $U \subset R^{+}$ , discrete or continuous, with the minimum and maximum elements denoted by $\underline{u}$ and $\bar{u}$ , respectively. By definition, the load used in derating is lower than the normal operating load, thus we assume that $\bar{u} \leq u_{A}$ . The cumulative degradation up to period $n$ is denoted by the following equation:

$$η_{n} = \sum_{t = 1}^{n} d_{t}, \quad n = 1, \dots, T. \tag{1}$$

Here $d_{t}$ represents the amount of degradation that occurred during period $t$, which will be referred to as the degradation rate. The assumption is that the degradation level is shifted to zero at the beginning of the decision horizon.

Different workloads exert different stress on the system, which ultimately result in different degradation rates. The pd-relationship, namely, the relationship between the workload $u_{t}$ and the degradation rate $d_{t}$, is assumed to be

$$d_{t} = α + β ψ (u_{t}). \tag{2}$$

This model describes the pd-relationship in the fast degradation regime. The intercept $α$ can be interpreted as the baseline degradation rate unrelated to the workload. The slope $β$ measures the impact of workload on degradation. The basis function $ψ : U \to R^{+}$ is increasing and can be nonlinear. Examples include logarithm, exponential, or polynomial, which allow us to model the nonlinear relation between workload and the degradation rate.
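As a minimal sketch of equation (2), the following Python snippet evaluates the degradation rate under three increasing basis functions. The specific forms of $ψ$, the parameter values, and the loads are illustrative assumptions and are not taken from this article.

```python
import math

def degradation_rate(u, alpha, beta, psi):
    """Degradation rate d_t = alpha + beta * psi(u_t), as in equation (2)."""
    return alpha + beta * psi(u)

# Hypothetical basis functions; each is increasing in the workload u.
BASIS = {
    "log": lambda u: math.log(1.0 + u),
    "exp": lambda u: math.exp(u),
    "quadratic": lambda u: u ** 2,
}

alpha, beta = 0.1, 0.5   # illustrative values only
loads = [0.6, 0.9, 1.0]  # workloads as fractions of the normal load

rates = {
    name: [degradation_rate(u, alpha, beta, psi) for u in loads]
    for name, psi in BASIS.items()
}
for name, vals in rates.items():
    print(name, [round(v, 4) for v in vals])
```

Under each basis function the rate increases with the load, but the gap between 60% and 100% loading depends strongly on the shape of $ψ$.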

We assume that the function $ψ$ is known but the parameters $α$ and $β$ are unknown. We treat the unknown parameters as random variables in the Bayesian paradigm. The DM can collect additional information while the system is operating in order to refine her knowledge about the unknown parameters. As will be shown in Section 3, different loads can provide different amounts of information about the unknown parameters. Since a higher workload tends to accelerate the degradation, we assume that $β > \underline{β} > 0$ , where $\underline{β}$ is a known lower bound.

2.2. Condition Monitoring Data

Although the actual degradation level $η_{n}$ is not directly observable, the DM can observe the condition monitoring signal $x_{n}$ in real time, which depends on the degradation level as follows:

$$x_{n} = f \left( η_{0} + η_{n} + \sum_{t = 1}^{n} ϵ_{t} \right). \tag{3}$$

Here $η_{0}$ represents the initial degradation and $f : R \to R$ is a known transformation function that is strictly monotone, so that the inverse transform $f^{- 1}$ is well defined. Examples include the exponential and logarithm functions. The errors $ϵ_{t} \sim N (0, σ^{2})$ are iid Gaussian with known variance $σ^{2}$. The cumulative sum of Gaussian errors $\sum_{t = 1}^{n} ϵ_{t}$ leads to a random walk. Error terms of this form have been widely used to model condition monitoring data because they can capture the temporal correlation in the signal and are flexible enough to represent complex dynamics. Notable examples are Conrad and McClamroch (1987), Doksum and Høyland (1992), Whitmore and Schenkelberg (1997), and Elwany et al. (2011), among others.

Combining (1) to (3), we obtain

$$x_{n} = f \left( η_{0} + \sum_{t = 1}^{n} (α + β ψ (u_{t})) + \sum_{t = 1}^{n} ϵ_{t} \right). \tag{4}$$

Note that the condition monitoring signal $x_{n}$ depends on the unknown parameters $α, β$ as well as the workload history $\{u_{1}, \dots, u_{n}\}$. Thus, the signal is nonstationary in a dynamic workload environment.
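The signal model in equation (4) can be simulated with a few lines of Python. This is only a sketch under stated assumptions: the transformation is taken to be $f = \exp$, the basis function is the identity, and all parameter values, the function name `simulate_signal`, and the random seed are hypothetical.

```python
import math
import random

def simulate_signal(loads, alpha, beta, psi, sigma, eta0=0.0, f=math.exp, seed=0):
    """Simulate x_1..x_n from equation (4) for a given workload sequence.

    The running level is eta_0 plus the cumulative degradation plus the
    cumulative Gaussian error (a random walk); f maps it to the signal.
    """
    rng = random.Random(seed)
    level = eta0
    signal = []
    for u in loads:
        level += alpha + beta * psi(u) + rng.gauss(0.0, sigma)
        signal.append(f(level))
    return signal

# Illustrative run: five periods at full load, then five at 60% load.
loads = [1.0] * 5 + [0.6] * 5
x = simulate_signal(loads, alpha=0.1, beta=0.5, psi=lambda u: u, sigma=0.05)
print([round(v, 3) for v in x])
```

Because the drift in each period depends on the chosen load, the same code produces a signal whose growth rate changes whenever the workload changes, which is the nonstationarity noted above.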

2.3. Action, Reward, and Objective

Given the workload history ${\tilde{u}}_{n - 1} = \{u_{1}, \dots, u_{n - 1}\}$ and the condition monitoring data ${\tilde{x}}_{n - 1} = \{x_{1}, \dots, x_{n - 1}\}$ up to period $n - 1$ , the DM first decides whether to stop the process immediately at period $n$ or to continue the operation to the next period. If the decision is to stop, the DM collects a reward $R_{n}$ . If the decision is to continue, the DM chooses the workload $u_{n} \in U$ to be used in period $n$ . This generates an expected reward $r_{n} (u_{n}, η_{n})$ as well as an observation $x_{n}$ . The DM then updates the workload and monitoring histories to ${\tilde{u}}_{n} = \{{\tilde{u}}_{n - 1}, u_{n}\}$ and ${\tilde{x}}_{n} = \{{\tilde{x}}_{n - 1}, x_{n}\}$ , respectively, and proceeds to the next decision period.

The reward function $r_{n} (u_{n}, η_{n})$ is non-decreasing in $u_{n}$ and non-increasing in $η_{n}$ , which means that the workload boosts the reward but the deterioration hurts it. A simple example is the following piece-wise linear function:

$$r_{n} (u_{n}, η_{n}) = \begin{cases} r u_{n}, & η_{n} \leq K, \\ - c_{f}, & η_{n} > K. \end{cases} \tag{5}$$

Here $u_{n}$ represents the number of products manufactured per unit time, each yielding an average profit $r$. When the exponential degradation is below the failure threshold, that is, $η_{n} \leq K$, the system is operating and generating an expected profit $r u_{n}$ per period. Otherwise, the system fails and incurs a cost $c_{f}$ per period.

Another example is $r_{n} (u_{n}, η_{n}) = u_{n} (K - e^{η_{n}})$ . Here, the per-unit profit is $K - e^{η_{n}}$ , which is decreasing in the degradation level $η_{n}$ , because the system deterioration can increase the number of defective products. The degradation starts from a low level, at which the profit is positive, $K - e^{η_{n}} > 0$ . As the system deteriorates, $η_{n}$ increases and the profit decreases exponentially due to accelerating failure. Once $e^{η_{n}} > K$ , the system fails, in which case continuing the production becomes costly, so the profit becomes negative, $K - e^{η_{n}} < 0$ . Note that the failure cost is endogenous because it depends on the condition of the system as well as the stopping time; if the process is stopped later, the cost will be higher.
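The two example reward functions can be written down directly. The sketch below is illustrative only: the function names and the numerical values of $r$, $K$, and $c_{f}$ are assumptions chosen to show the sign change at failure.

```python
import math

def threshold_reward(u, eta, r, K, c_f):
    """Piecewise reward of equation (5): profit r*u while eta <= K, else cost c_f."""
    return r * u if eta <= K else -c_f

def exponential_reward(u, eta, K):
    """Second example: per-unit profit K - exp(eta), scaled by the workload u."""
    return u * (K - math.exp(eta))

# Illustrative values (assumed): r = 10, K = 2, c_f = 50.
print(threshold_reward(0.8, eta=1.0, r=10.0, K=2.0, c_f=50.0))  # operating
print(threshold_reward(0.8, eta=3.0, r=10.0, K=2.0, c_f=50.0))  # failed
print(exponential_reward(0.8, eta=0.0, K=2.0))   # profit still positive
print(exponential_reward(0.8, eta=1.0, K=2.0))   # exp(eta) > K, profit negative
```

Both functions are non-decreasing in the workload while the system is profitable and non-increasing in the degradation level, which is the only structure the model requires.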

Keep in mind that the reward function $r_{n}$ is general and is applicable to non-threshold cases as well. The only assumption we make is that the performance is decreasing in the degradation level. The DM’s objective is to maximize the expected total reward over $T$ periods. The interrelationships among the key variables are represented by the influence diagram in Figure 1. The reward is post-hoc since it may not be observable in real time. For example, certain defects in a product may not be noticed until the product is delivered to users.

Figure 1. The influence diagram of the decision problem.

2.4. Relation to Existing Models

The model proposed in this article unifies several important degradation models in the literature and generalizes them to the dynamic environment. One example is the Arrhenius model, which is an empirical model describing how the failure time depends on the environmental stress such as the temperature. The relation between the failure time $t_{f}$ and the fixed stress $u$ is deterministically represented by

$$t_{f} = a \exp (b / u),$$

where $a, b$ are constants. One way to understand this model is to consider a system with constant degradation rate $d = \exp (- b / u)$, which is higher when $u$ gets larger. The system fails when the linear degradation path $η_{n} = n d$ exceeds the threshold $a$, so the time-to-failure is given by $t_{f} = a / d = a \exp (b / u)$. Obviously, the Arrhenius model is a special case of our model with $ψ (u) = \exp (- b / u)$, $α = 0$, and $β = 1$.
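The correspondence with the Arrhenius model is easy to check numerically. The snippet below is a small sketch with assumed constants $a$ and $b$; it compares the closed-form failure time with the time at which a linear degradation path with rate $d = \exp(-b/u)$ reaches the threshold $a$.

```python
import math

def arrhenius_failure_time(a, b, u):
    """Closed-form Arrhenius time-to-failure t_f = a * exp(b / u)."""
    return a * math.exp(b / u)

def failure_time_from_rate(a, b, u):
    """Time for the linear path eta = t * d, with d = exp(-b / u), to reach a."""
    d = math.exp(-b / u)  # psi(u) with alpha = 0 and beta = 1
    return a / d

a, b = 5.0, 2.0  # illustrative constants (assumed)
for u in (0.6, 0.9, 1.0):
    print(u, arrhenius_failure_time(a, b, u), failure_time_from_rate(a, b, u))
```

The two columns agree for every stress level, and the failure time shortens as the stress $u$ increases.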

Another closely related model is the exponential degradation model with Brownian errors. Elwany et al. (2011) considered the following model (equation (2) in their paper):

$$S_{n} = ϕ + θ \exp \left( β \cdot t_{n} + ϵ (t_{n}) - t_{n} \cdot σ^{2} / 2 \right),$$

where $S_{n}$ is the condition monitoring signal, $ϕ$ is a constant, and $θ$ and $β$ are unknown parameters. The error term $ϵ (t_{n})$ follows a Brownian motion with zero mean and variance $σ^{2}$. This model describes the degradation under a constant workload. By fixing the workload at $u_{t} \equiv u$, our model serves as a discrete-time approximation to this model. To see this, let $x_{n} = S_{n} - ϕ$ and $η_{0} = \ln (θ)$, replace $α + β u$ with $β - σ^{2} / 2$, and let $f = \exp$ in (4). Our model then reduces to the exponential degradation model, with the Brownian motion approximated by a random walk.

2.5. Optimal Derating Problem

We assume that the degradation rate under $u_{A}$ is known and is given by the following equation:

$$d_{A} = α + β ψ (u_{A}), \quad u_{A} \geq \bar{u}. \tag{6}$$

The rationale behind this assumption is the following: by the time the DM recognizes the fast degradation regime, the system has already been operating in that regime for some time and has generated some degradation data under the constant load $u_{A}$, which may enable a good estimation of the corresponding degradation rate $d_{A}$. Although the DM may not know the exact value of $d_{A}$, it can be argued that, given the availability of historical data, the uncertainty around the degradation under $u_{A}$ should be lower than under other workloads that have never been tried before, for which no data is available at all. Under this assumption, $d_{A}$ and $ψ (u_{A})$ are known but $α$ and $β$ are unknown.

Subtracting (6) from (2), we obtain $d_{t} - d_{A} = β [ψ (u_{t}) - ψ (u_{A})]$. To simplify the notation, we introduce a function

$$g (u) := ψ (u) - ψ (u_{A}),$$

which is negative for all $u \in U$. Using this notation, the relationship between the degradation rate $d_{t}$ and workload $u_{t}$ can be written as follows:

$$d_{t} - d_{A} = β g (u_{t}), \tag{7}$$

from which we observe that $β$ is effectively the only unknown parameter. Another way to understand this result is to note from (6) that the random variables $α$ and $β$ are linearly dependent, so there is effectively only one unknown parameter to learn.

Since the degradation rate is almost never negative, we assume that its minimum value is positive. That is, $\underline{d} = d_{A} + β [ψ (\underline{u}) - ψ (u_{A})] > 0$ , which translates to an upper bound on the unknown parameter: $β < \bar{β} = d_{A} / (ψ (u_{A}) - ψ (\underline{u}))$ . Recall from Section 2.1 that $β$ also has a lower bound $\underline{β}$ . Therefore, we require $\underline{β} < β < \bar{β}$ in the learning process.
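To make the upper bound concrete, the following sketch computes $\bar{β}$ and checks the sign of the minimum degradation rate on either side of it. The identity basis function and the numerical values are assumptions for illustration only.

```python
def beta_upper_bound(d_A, psi, u_A, u_min):
    """Upper bound beta_bar = d_A / (psi(u_A) - psi(u_min))."""
    return d_A / (psi(u_A) - psi(u_min))

def min_degradation_rate(beta, d_A, psi, u_A, u_min):
    """Degradation rate at the lowest load: d_A + beta * (psi(u_min) - psi(u_A))."""
    return d_A + beta * (psi(u_min) - psi(u_A))

psi = lambda u: u                 # assumed linear basis function
d_A, u_A, u_min = 0.2, 1.0, 0.5   # illustrative values

beta_bar = beta_upper_bound(d_A, psi, u_A, u_min)
print(beta_bar)
print(min_degradation_rate(0.9 * beta_bar, d_A, psi, u_A, u_min))  # positive
print(min_degradation_rate(1.1 * beta_bar, d_A, psi, u_A, u_min))  # negative
```

Any $β$ at or above $\bar{β}$ would imply a non-positive degradation rate at the lowest admissible load, which is why the belief on $β$ is restricted to $(\underline{β}, \bar{β})$.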

From now on, we will focus on the derating decision problem, in which $u_{t} \leq u_{A}$ . We can write the cumulative degradation up to period $n$ as $η_{n} = \sum_{t = 1}^{n} d_{t} = n d_{A} + β \sum_{t = 1}^{n} g (u_{t})$ . Define $ξ_{0} := 0$ and

$$ξ_{n} := \sum_{t = 1}^{n} g (u_{t}), \quad n \geq 1.$$

The degradation accumulated up to period $n$ is then

$$η_{n} = n d_{A} + β ξ_{n}.$$

In this case, the workload $u_{n}$ generates an immediate reward

$$r_{n} (u_{n}, η_{n}) = r_{n} (u_{n}, n d_{A} + β ξ_{n}).$$

Note that the reward depends on the unknown parameter $β$, so reward maximization is coupled with parameter learning, which will be discussed in greater detail in the next section.
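The reparametrization can be verified numerically: summing the per-period rates from equation (2) gives the same cumulative degradation as $n d_{A} + β ξ_{n}$. The sketch below assumes a linear basis function and arbitrary parameter values; the function names are hypothetical.

```python
def eta_direct(loads, alpha, beta, psi):
    """Cumulative degradation as the sum of d_t = alpha + beta * psi(u_t)."""
    return sum(alpha + beta * psi(u) for u in loads)

def eta_reparam(loads, beta, d_A, psi, u_A):
    """Cumulative degradation as n * d_A + beta * xi_n, with g(u) = psi(u) - psi(u_A)."""
    xi_n = sum(psi(u) - psi(u_A) for u in loads)
    return len(loads) * d_A + beta * xi_n

psi = lambda u: u                   # assumed basis function
alpha, beta, u_A = 0.1, 0.5, 1.0    # illustrative values
d_A = alpha + beta * psi(u_A)       # equation (6)

loads = [0.9, 0.8, 0.6, 0.6, 0.7]
print(eta_direct(loads, alpha, beta, psi), eta_reparam(loads, beta, d_A, psi, u_A))
```

The second form needs only $d_{A}$, the load history summarized by $ξ_{n}$, and $β$, so the intercept $α$ never has to be estimated separately.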

3. Active Learning and Control of Deterioration

The main challenge of this decision problem comes from the presence of unknown parameters. In the derating problem, the unknown parameters reduce to $β$ , which can be learned in an online fashion based on the workload sequence ${\tilde{u}}_{n}$ and condition monitoring data ${\tilde{x}}_{n}$ . The DM forms a prior belief $p_{0} (β)$ at the beginning and maintains a posterior belief $p (β | {\tilde{u}}_{n}, {\tilde{x}}_{n})$ over time. Although online Bayesian learning has been studied in the maintenance literature, a distinguishing feature of our model is that the DM can adjust the quality of information to be gathered in each period by changing the workload. We will explain the details below.

3.1. Bayesian Active Learning

Given the workload history ${\tilde{u}}_{n} = \{u_{1}, \dots, u_{n}\}$ and the corresponding observation history ${\tilde{x}}_{n} = \{x_{1}, \dots, x_{n}\}$ , the posterior belief on $β$ is given by the following equation:

$$p (β | {\tilde{u}}_{n}, {\tilde{x}}_{n}) = \frac{p ({\tilde{x}}_{n} | {\tilde{u}}_{n}, β) p_{0} (β)}{\int_{\underline{β}}^{\bar{β}} p ({\tilde{x}}_{n} | {\tilde{u}}_{n}, β) p_{0} (β) d β}. \tag{8}$$

To evaluate this belief, we introduce a new variable

$$y_{n} ≜ f^{- 1} (x_{n}) - f^{- 1} (x_{n - 1}),$$

which measures the increments of the signal after the inverse transform $f^{- 1}$. By (1) to (3), we have $y_{n} = d_{n} + ϵ_{n}$, which, after combining with (7), becomes

$$y_{n} = d_{A} + β g (u_{n}) + ϵ_{n}. \tag{9}$$

This suggests that the sequence ${\tilde{y}}_{n} = \{y_{2}, \dots, y_{n}\}$ contains all the information about $β$ that is stored in ${\tilde{x}}_{n}$. Therefore, we can express the posterior belief in terms of ${\tilde{y}}_{n}$ as follows:

$$p (β | {\tilde{u}}_{n}, {\tilde{x}}_{n}) = \frac{p ({\tilde{y}}_{n} | {\tilde{u}}_{n}, β) p_{0} (β)}{\int_{\underline{β}}^{\bar{β}} p ({\tilde{y}}_{n} | {\tilde{u}}_{n}, β) p_{0} (β) d β}, \tag{10}$$

where

$$\begin{aligned} p ({\tilde{y}}_{n} | {\tilde{u}}_{n}, β) = {} & \left( \frac{τ}{2 π} \right)^{n / 2} \exp \Big\{ - \frac{1}{2} β^{2} τ \sum_{t = 1}^{n} g^{2} (u_{t}) \\ & + β τ \sum_{t = 1}^{n} (y_{t} - d_{A}) g (u_{t}) - \frac{τ}{2} \sum_{t = 1}^{n} (y_{t} - d_{A})^{2} \Big\} \end{aligned} \tag{11}$$

is the likelihood function corresponding to a normal random variable. This is because the error $ϵ_{n}$ in (9) is normal with precision $τ = 1 / σ^{2}$.

Before proceeding, it is helpful to introduce some notation. Let $S$ be the set of all probability density functions on $R$ . Define the Bayes operator $B : S \times R \times R \to S$ that updates a belief $p (β) \in S$ to

B (p (β), u, y) ≜ \frac{p (y | u, β) p (β)}{\int_{\underline{β}}^{\bar{β}} p (y | u, β^{'}) p (β^{'}) d β^{'}}

(12)

given a pair of action and observation $(u, y)$ . We will use $ϕ (β | τ_{n}, μ_{n})$ to denote a Gaussian belief over $β$ with precision $τ_{n}$ and mean $μ_{n}$ , and use

π (β | τ_{n}, μ_{n}) ≜ \frac{ϕ (β | τ_{n}, μ_{n}) \cdot I_{\{\underline{β} < β < \bar{β}\}}}{ρ (τ_{n}, μ_{n})}

(13)

to denote the corresponding truncated normal belief in the interval $(\underline{β}, \bar{β})$ . Here, $I_{\{\underline{β} < β < \bar{β}\}}$ is an indicator function and $ρ (τ_{n}, μ_{n}) = \int_{\underline{β}}^{\bar{β}} ϕ (β | τ_{n}, μ_{n}) d β$ is a normalization factor.
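As a numerical companion to (13), the following minimal sketch evaluates the Gaussian density $ϕ$ , the normalization factor $ρ$ , and the truncated normal belief $π$ using only the Python standard library. The function names and the arguments `beta_lo` and `beta_hi` (standing for $\underline{β}$ and $\bar{β}$ ) are illustrative choices, not notation from the article.

```python
import math

def gaussian_density(beta, tau_n, mu_n):
    """phi(beta | tau_n, mu_n): normal density with precision tau_n and mean mu_n."""
    return math.sqrt(tau_n / (2.0 * math.pi)) * math.exp(-0.5 * tau_n * (beta - mu_n) ** 2)

def std_normal_cdf(z):
    """Standard normal distribution function, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normalization_factor(tau_n, mu_n, beta_lo, beta_hi):
    """rho(tau_n, mu_n): Gaussian probability mass on (beta_lo, beta_hi)."""
    s = math.sqrt(tau_n)
    return std_normal_cdf((beta_hi - mu_n) * s) - std_normal_cdf((beta_lo - mu_n) * s)

def truncated_belief(beta, tau_n, mu_n, beta_lo, beta_hi):
    """pi(beta | tau_n, mu_n) as defined in (13); zero outside the interval."""
    if not (beta_lo < beta < beta_hi):
        return 0.0
    rho = normalization_factor(tau_n, mu_n, beta_lo, beta_hi)
    return gaussian_density(beta, tau_n, mu_n) / rho
```

Because $ρ$ reduces to a difference of two normal distribution functions, the truncated belief itself is cheap to evaluate; the numerical difficulty discussed later comes from integrals taken against it.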

To simplify the expressions, we use the normal conjugate prior. Specifically, we choose the prior $p_{0} (β)$ as the truncated normal distribution $π (β | τ_{0}, μ_{0})$ that satisfies the constraint $\underline{β} < β < \bar{β}$ . This prior also ensures all subsequent posteriors satisfy the constraint. Under this prior, the posterior belief can be fully characterized by the hyperparameters (i.e., mean and precision) and the belief update can be carried out by updating the hyperparameters. Specifically, given the action-observation pair $(u_{n + 1}, y_{n + 1})$ , the Bayes operator updates the belief $π (β | τ_{n}, μ_{n})$ to

π (β | τ_{n + 1}, μ_{n + 1}) = B (π (β | τ_{n}, μ_{n}), u_{n + 1}, y_{n + 1}),

(14)

which is also a truncated normal belief with hyperparameters

\begin{aligned} τ_{n + 1} & = z_{τ} (τ_{n}, u_{n + 1}) = τ_{n} + τ g^{2} (u_{n + 1}), \\ μ_{n + 1} & = z_{μ} (μ_{n}, τ_{n}, u_{n + 1}, y_{n + 1}) = \frac{τ_{n} μ_{n} + τ g (u_{n + 1}) (y_{n + 1} - d_{A})}{τ_{n} + τ g^{2} (u_{n + 1})} . \end{aligned}

(15)

Remark 1 (Active Learning)

The key to understanding active learning lies in equation (15). The precision of the posterior belief $τ_{n + 1}$ measures the level of confidence in Bayesian learning: a higher precision represents a more certain estimate. Note that the precision of the posterior belief depends on the workload decision $u_{n + 1}$ through $g^{2} (u_{n + 1})$ . A workload with a higher value of $g^{2} (u)$ is more efficient at learning the unknown parameter because it increases the precision of the posterior belief by a larger amount. For example, consider the special case of $ψ (u) = u$ , which leads to $g^{2} (u) = (u - u_{A})^{2}$ . By choosing a workload much lower than the normal operating load $u_{A}$ , we can increase $g^{2} (u)$ and hence learn the pd-relationship more efficiently. This suggests that different workloads can lead to different speeds of learning. Since $u$ is a decision variable, the DM can thus adjust the efficiency of learning. This is different from the existing maintenance literature with online learning (Chen et al., 2015; Drent et al., 2023; Elwany et al., 2011; Van Oosterom et al., 2017), which considers a constant workload and, as a result, includes only one learning mode.
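The hyperparameter update in (15) and the effect described in Remark 1 can be illustrated with a short sketch. It assumes the special case $ψ (u) = u$ with $g (u) = u - u_{A}$ (the sign convention is an assumption; only $g^{2}$ is stated in the remark), and it borrows $u_{A} = 400$ , $d_{A} = 0.00705$ , and $σ^{2} = 0.00964$ from the case study purely as illustrative inputs. The function names are hypothetical.

```python
def g_linear(u, u_A):
    """Assumed special case psi(u) = u, so that g(u) = u - u_A."""
    return u - u_A

def update_belief(tau_n, mu_n, u, y, tau, d_A, u_A):
    """One step of (15): map (tau_n, mu_n) and the pair (u, y) to (tau_{n+1}, mu_{n+1})."""
    g = g_linear(u, u_A)
    tau_next = tau_n + tau * g ** 2
    mu_next = (tau_n * mu_n + tau * g * (y - d_A)) / tau_next
    return tau_next, mu_next

# Illustrative comparison: the further u lies below u_A, the larger the precision gain.
tau = 1.0 / 0.00964
for u in (300.0, 200.0, 100.0):
    tau_next, _ = update_belief(1.0, 0.0, u, 0.0, tau, 0.00705, 400.0)
    print(f"u = {u:5.0f}  ->  posterior precision = {tau_next:.1f}")
```

The posterior precision depends on the chosen workload only and not on the observation, which is why the DM can plan the speed of learning in advance.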

3.2. Partially Observable Markov Decision Process (POMDP) Formulation

When selecting the workload, the DM needs to balance the immediate and future reward. A higher workload increases the immediate reward but accelerates the degradation and thus reduces the future reward. This task is further complicated by the unobservability of the degradation level $η_{n}$ and the uncertainty about how workload affects the degradation, which is represented by the unknown parameter $β$ . We formulate a POMDP to analyze this dynamic decision problem with Bayesian active learning.

Based on results from the previous section, we use the triplet $(τ_{n}, μ_{n}, ξ_{n})$ as the state variable. The hyperparameters $τ_{n}$ and $μ_{n}$ are sufficient to represent the posterior belief for the unknown parameter $β$ . Together with $ξ_{n}$ , they fully characterize the distribution of the degradation level $η_{n}$ . The value function, denoted by $V_{n} (τ_{n}, μ_{n}, ξ_{n})$ , represents the maximum expected reward from period $n$ onward starting from the state $(τ_{n}, μ_{n}, ξ_{n})$ . The value functions satisfy the following optimality equations:

\begin{aligned} V_{n} (τ_{n}, μ_{n}, ξ_{n}) = \max \Big\{ R, \ \sup_{u \in [\underline{u}, \bar{u}]} \Big\{ & u \int_{\underline{β}}^{\bar{β}} r_{n + 1} (u, (n + 1) d_{A} + β [ξ_{n} + g (u)]) \, π (β | τ_{n}, μ_{n}) \, d β \\ & + \int_{- \infty}^{\infty} V_{n + 1} \left(τ_{n} + τ g^{2} (u), \frac{τ_{n} μ_{n} + τ g (u) (y - d_{A})}{τ_{n} + τ g^{2} (u)}, ξ_{n} + g (u)\right) \\ & \times \left[\int_{\underline{β}}^{\bar{β}} p (y | β, u) \, π (β | τ_{n}, μ_{n}) \, d β\right] d y \Big\} \Big\}, \end{aligned}

(16)

for $n < T$ , and $V_{T} (τ_{T}, μ_{T}, ξ_{T}) \equiv 0$ .

The first term inside the maximum operator, $R$ , is the expected reward of stopping immediately. The second term corresponds to the reward of continuation, in which case a workload decision $u$ is selected through an optimization process. For each possible choice of $u$ , which ranges between the lower limit $\underline{u}$ and the upper limit $\bar{u}$ , we compute the expected value, which consists of the immediate expected reward in period $n + 1$ ,

u \int_{\underline{β}}^{\bar{β}} r_{n + 1} (u, (n + 1) d_{A} + β [ξ_{n} + g (u)]) \, π (β | τ_{n}, μ_{n}) \, d β,

and the expected value-to-go after period $n + 1$ ,

\begin{aligned} & \int_{- \infty}^{\infty} V_{n + 1} \left(τ_{n} + τ g^{2} (u), \frac{τ_{n} μ_{n} + τ g (u) (y - d_{A})}{τ_{n} + τ g^{2} (u)}, ξ_{n} + g (u)\right) \\ & \times \left[\int_{\underline{β}}^{\bar{β}} p (y | β, u) \, π (β | τ_{n}, μ_{n}) \, d β\right] d y . \end{aligned}

The expected value-to-go is calculated by considering all possible values of the next observation, denoted by $y$ . The integral $\int_{\underline{β}}^{\bar{β}} p (y | β, u) π (β | τ_{n}, μ_{n}) d β$ is the posterior predictive density of $y$ given the current posterior belief $(τ_{n}, μ_{n})$ . Here, $π (β | τ_{n}, μ_{n})$ is defined in (13) and $p (y | β, u)$ is a normal density in the form of (11).

Given the action $u$ and corresponding observation $y$ , the belief is updated to

\left(τ_{n} + τ g^{2} (u), \frac{τ_{n} μ_{n} + τ g (u) (y - d_{A})}{τ_{n} + τ g^{2} (u)}\right)

and the state $ξ_{n}$ becomes $ξ_{n} + g (u)$ . These are the arguments of $V_{n + 1}$ that appear in (16).

The workload that maximizes the reward of continuation is denoted by $u_{n + 1}^{*} (τ_{n}, μ_{n}, ξ_{n})$ and is referred to as the optimal workload. We also define the myopic workload as the option that maximizes the immediate reward given the current information:

u_{n + 1}^{M} (τ_{n}, μ_{n}, ξ_{n}) ≜ \sup \operatorname*{arg\,max}_{u \in U} \left\{ u \int_{\underline{β}}^{\bar{β}} r_{n + 1} (u, (n + 1) d_{A} + β [ξ_{n} + g (u)]) \, π (β | τ_{n}, μ_{n}) \, d β \right\},

(17)

where supremum is used because the maximum may be attained at multiple points.

4. Structure of Optimal Policies

An optimal policy can be found by solving the optimality equations iteratively through backward induction. This requires enumerating through all combinations of action $u$ and state $(τ_{n}, μ_{n}, ξ_{n})$ , which can be time-consuming. Further, for each action-state combination, we also need to evaluate the posterior predictive density by calculating the following integral:

\int_{\underline{β}}^{\bar{β}} p (y | β, u) π (β | τ_{n}, μ_{n}) d β,

which has no closed-form expression. So we have to resort to numerical integration, which further increases the computational requirement. Therefore, a direct computation of (16) can be prohibitive. To overcome this challenge, we will characterize the structures of the value function and optimal policies and exploit the structure to develop efficient solution algorithms.

Toward this end, we first reformulate the original optimality equations in the following proposition:

Proposition 1
Define $V_{n} (τ_{n}, μ_{n}, ξ_{n}) ≜ ρ (τ_{n}, μ_{n}) V_{n} (τ_{n}, μ_{n}, ξ_{n})$ . The following equations hold
$\begin{aligned} V_{n} (τ_{n}, μ_{n}, ξ_{n}) = \max \Big\{ ρ (τ_{n}, μ_{n}) R, \ \sup_{u \in [\underline{u}, \bar{u}]} \Big\{ & u \int_{\underline{β}}^{\bar{β}} r_{n + 1} (u, (n + 1) d_{A} + β [ξ_{n} + g (u)]) \, ϕ (β | τ_{n}, μ_{n}) \, d β \\ & + \int_{- \infty}^{\infty} V_{n + 1} \left(τ_{n} + τ g^{2} (u), \frac{τ_{n} μ_{n} + τ g (u) (y - d_{A})}{τ_{n} + τ g^{2} (u)}, ξ_{n} + g (u)\right) \\ & \times \left[\frac{1}{\sqrt{2 π (1 / τ + g^{2} (u) / τ_{n})}} \exp \left\{- \frac{(y - d_{A} - μ_{n} g (u))^{2}}{2 (1 / τ + g^{2} (u) / τ_{n})}\right\}\right] d y \Big\} \Big\} . \end{aligned}$
(18)

All proofs in this article are given in the Electronic Companion. Since (18) is a reformulation of the original optimality equation (16), it can be used to compute optimal policies. This reformulation brings two computational advantages. First, the posterior predictive density (the third line of (18)) is normal, which has a closed-form expression and hence requires no numerical integration. Second, the integral with respect to $y$ (the second and third lines) can be greatly simplified into a convolution between the value function and a Gaussian kernel, as given in the following theorem:
Theorem 1
The optimality equation (18) can be rewritten as follows:
$\begin{aligned} V_{n} (τ_{n}, μ_{n}, ξ_{n}) = \max \Big\{ ρ (τ_{n}, μ_{n}) R, \ \sup_{u \in [\underline{u}, \bar{u}]} \Big\{ & u \int_{\underline{β}}^{\bar{β}} r_{n + 1} (u, (n + 1) d_{A} + β [ξ_{n} + g (u)]) \, ϕ (β | τ_{n}, μ_{n}) \, d β \\ & + \int_{- \infty}^{\infty} V_{n + 1} (τ_{n} + τ g^{2} (u), μ, ξ_{n} + g (u)) \, \frac{\exp \left\{- \frac{(μ - μ_{n})^{2}}{2 v (τ_{n}, u)}\right\}}{\sqrt{2 π v (τ_{n}, u)}} \, d μ \Big\} \Big\}, \end{aligned}$
(19)
where
$v (τ_{n}, u) = \frac{1}{τ_{n}} - \frac{1}{τ_{n} + τ g^{2} (u)} .$
(20)

Given the state $(τ_{n}, μ_{n}, ξ_{n})$ and workload decision $u$ , the components $τ_{n + 1} = τ_{n} + τ g^{2} (u)$ and $ξ_{n + 1} = ξ_{n} + g (u)$ of the next state can be fully determined because neither of them depends on $y$ . It is important to note that the observation $y$ only influences the posterior mean $μ_{n + 1} = z_{μ} (μ_{n}, τ_{n}, u, y)$ according to (15). Therefore, for a given prior belief $(τ_{n}, μ_{n})$ and decision $u$ , there is a one-to-one correspondence between the observation $y$ and the posterior mean $μ_{n + 1}$ . This enables us to change the variable of integration from $y$ to $μ_{n + 1}$ , which is denoted by $μ$ in (19). This change of variable leads to a significant simplification. We observe from (20) that the variance of the Gaussian kernel $v (τ_{n}, u)$ depends only on the belief precision $τ_{n}$ and the decision $u$ . It represents the reduction of uncertainty about $β$ after taking the decision $u$ , in which the uncertainty is measured by the variance of the posterior distribution.
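The change of variable can be checked numerically. By (15), $μ_{n + 1}$ is linear in $y$ with slope $τ g (u) / (τ_{n} + τ g^{2} (u))$ , and by (18) the predictive variance of $y$ is $1 / τ + g^{2} (u) / τ_{n}$ , so the variance of $μ_{n + 1}$ is the squared slope times the predictive variance. The sketch below, with hypothetical function names, compares this quantity with the closed form (20).

```python
import math

def kernel_variance(tau_n, tau, g_u):
    """v(tau_n, u) from (20): reduction in posterior variance after acting with g(u) = g_u."""
    return 1.0 / tau_n - 1.0 / (tau_n + tau * g_u ** 2)

def posterior_mean_variance(tau_n, tau, g_u):
    """Variance of mu_{n+1} implied by (15) and the predictive density in (18)."""
    predictive_var = 1.0 / tau + g_u ** 2 / tau_n        # variance of y
    slope = tau * g_u / (tau_n + tau * g_u ** 2)         # d mu_{n+1} / d y
    return slope ** 2 * predictive_var

for tau_n, tau, g_u in [(1.0, 4.0, -3.0), (2.5, 0.7, 1.2), (10.0, 103.7, -100.0)]:
    assert math.isclose(kernel_variance(tau_n, tau, g_u),
                        posterior_mean_variance(tau_n, tau, g_u), rel_tol=1e-12)
```

The same algebra shows that the mean of $μ_{n + 1}$ equals $μ_{n}$ , which is why the Gaussian kernel in (19) is centered at the current posterior mean.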

Theorem 1 is important because it shows that the value-to-go function, which is the most computationally intensive function, can be recast as a convolution transform. The convolution transform of a function $f$ with respect to a kernel $g$ is defined as $(f * g) (x) = \int_{- \infty}^{\infty} f (μ) g (x - μ) d μ$ . By letting $f (μ) = V_{n + 1} (τ_{n} + τ g^{2} (u), μ, ξ_{n} + g (u))$ , $g (x - μ) = \exp \{- \frac{(μ - x)^{2}}{2 v (τ_{n}, u)}\} / \sqrt{2 π v (τ_{n}, u)}$ , and $x = μ_{n}$ , we obtain the value-to-go function in (19).

Note that an efficient algorithm exists for the convolution transform with a Gaussian kernel. As such, we can use this algorithm to accelerate the computation of the value function. The details will be presented in Section 5.2.
Proposition 2
$V_{n} (τ_{n}, μ_{n}, ξ_{n})$ is decreasing in $ξ_{n}$ for all $τ_{n}, μ_{n}$ , and $n$ .

To understand this proposition, recall that the cumulative degradation up to period $n$ is $η_{n} = n d_{A} + β ξ_{n}$ , which is linear and increasing in $ξ_{n}$ . So the above proposition suggests that a more deteriorated system has a lower expected reward-to-go. This monotonicity will be used to establish the threshold structure of the optimal stopping rule.
Proposition 3
$V_{n + 1} (τ_{n}, μ_{n}, ξ) \leq \int_{- \infty}^{\infty} V_{n + 1} (τ_{n} + τ g^{2} (u), μ, ξ) \, \frac{\exp \left\{- \frac{(μ - μ_{n})^{2}}{2 v (τ_{n}, u)}\right\}}{\sqrt{2 π v (τ_{n}, u)}} \, d μ,$
(21)
for all $τ_{n}, μ_{n}$ , and $ξ$ .

This proposition suggests that the convolution transform in (19) can increase the value function. This property will be used to establish the following main theorem:
Theorem 2 (Structure of Optimal Policy)

For any triple $(n, τ_{n}, μ_{n})$ , there exists a threshold $κ_{n} (τ_{n}, μ_{n})$ such that it is optimal to stop if $ξ_{n} > κ_{n} (τ_{n}, μ_{n})$ and to continue if $ξ_{n} \leq κ_{n} (τ_{n}, μ_{n})$ . If the DM continues, the optimal workload is never greater than the myopic workload, namely,

u_{n + 1}^{*} (τ_{n}, μ_{n}, ξ_{n}) \leq u_{n + 1}^{M} (τ_{n}, μ_{n}, ξ_{n}) .

This theorem characterizes the structure of optimal policies. The most important structural insight is that the optimal workload is always less than or equal to the myopic workload. An intuitive understanding is that a lower workload can not only slow the deterioration process and increase the remaining useful life, but is also more efficient at learning the unknown parameter. To the best of our knowledge, few structural results have been obtained with respect to the optimal adjustment of learning modes for deteriorating systems. Another useful insight from the theorem is that the optimal stopping rule has a threshold structure, which is a common observation in maintenance decision models. There is a belief-dependent threshold $κ_{n} (τ_{n}, μ_{n})$ such that it is optimal to stop when the cumulative transformed workload $ξ_{n}$ exceeds the threshold.

The structural properties can be used to reduce the computation. For example, when searching for the optimal workload with respect to the state $(τ_{n}, μ_{n}, ξ_{n})$ , we can first compute the myopic workload using (17) and then eliminate all workloads greater than the myopic workload from the computation. We will discuss the details in the next section.

5. Computation of Optimal Policies

In this section, we present an efficient value-iteration algorithm to compute the optimal policies. The algorithm is based on discretizing the joint space of $(τ_{n}, μ_{n}, ξ_{n}, u)$ by a grid and approximating the off-grid points by the nearest grid point. The proposed method for computing the value-to-go function is not limited to this particular problem. It can also be applied to other online learning problems that involve Gaussian beliefs.

Figure 2.

Illustration of the value iteration with fast Gauss transform.

5.1. State Space Characterization

The design of the grid requires a characterization of the joint space. We know that the action space is $U$ and the space for $μ_{n}$ is the real line $R$ , both of which will be discretized in the computation. But the space for $τ_{n}$ and $ξ_{n}$ reachable in the decision process depends on the historical actions $(u_{1}, \dots, u_{n})$ . We define the joint state space of $(τ_{n}, ξ_{n})$ at period $n$ as follows:

Ω_{n} ≜ \left\{ \left(τ_{0} + \sum_{t = 1}^{n} g^{2} (u_{t}), \sum_{t = 1}^{n} g (u_{t})\right) : u_{1}, \dots, u_{n} \in U \right\},

and will show that $Ω_{n}$ can be characterized by the forward induction of Minkowski sums. The Minkowski sum of sets $A$ and $B$ is formed by adding each element in $A$ to each element in $B$ , namely, $A \oplus B = \{a + b : a \in A, b \in B\}$ . Define vectors ${\vec{h}}_{0} = (τ_{0}, 0)$ and $\vec{h} (u_{t}) = (g^{2} (u_{t}), g (u_{t}))$ . Using this notation, the space $Ω_{n}$ can be expressed as $Ω_{n} = \{{\vec{h}}_{0} + \vec{h} (u_{1}) + \dots + \vec{h} (u_{n}) : u_{1}, \dots, u_{n} \in U\}$ . Let $\vec{H} (u) ≜ \{(g^{2} (u), g (u)) : u \in U\}$ . Then $Ω_{1} = {\vec{h}}_{0} \oplus \vec{H} (u)$ , and the subsequent sets can be calculated from the iteration $Ω_{n + 1} = Ω_{n} \oplus \vec{H} (u)$ .
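For a finite action set, the forward induction can be sketched in a few lines. The code below follows the definition of $Ω_{n}$ exactly as written above; the function names and the input `g_values` (the values $g (u)$ for the admissible workloads) are hypothetical.

```python
def minkowski_sum(A, B):
    """Return {a + b : a in A, b in B} for two sets of 2-tuples."""
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}

def reachable_sets(tau_0, g_values, n_periods):
    """Return [Omega_1, ..., Omega_n] for a finite action set.

    g_values lists g(u) for each admissible workload u (an assumed input).
    """
    H = {(g * g, g) for g in g_values}      # the set H(u) of increments (g^2(u), g(u))
    omega = {(tau_0, 0)}                    # the starting point h_0 = (tau_0, 0)
    history = []
    for _ in range(n_periods):
        omega = minkowski_sum(omega, H)     # Omega_{n+1} = Omega_n (+) H(u)
        history.append(omega)
    return history

sets = reachable_sets(1, [-1, -2], 3)
print([len(s) for s in sets])
```

Different action sequences often lead to the same point, so the reachable sets grow far more slowly than the number of sequences. Exact tuple matching as used here is only reliable for integer-valued $g (u)$ ; with floating-point values, the points would need to be rounded to the grid before merging.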

5.2. Computing the Value-to-Go Function

The most computationally intensive step of calculating $V_{n} (τ_{n}, μ_{n}, ξ_{n})$ is the evaluation of the following integral:

{\bar{V}}_{n + 1} (u, τ_{n}, μ_{n}, ξ_{n}) = \int_{- \infty}^{\infty} V_{n + 1} (τ_{n} + τ g^{2} (u), μ, ξ_{n} + g (u)) \, \frac{\exp \left\{- \frac{(μ - μ_{n})^{2}}{2 v (τ_{n}, u)}\right\}}{\sqrt{2 π v (τ_{n}, u)}} \, d μ,

(22)

over the four-dimensional space for $(u, τ_{n}, μ_{n}, ξ_{n})$ . The amount of computation can be prohibitive if we evaluate this integral directly through numerical integration. The key idea of our algorithm is to view ${\bar{V}}_{n + 1}$ as a function of $μ_{n}$ when the remaining arguments $(u, τ_{n}, ξ_{n})$ are fixed. From this perspective, ${\bar{V}}_{n + 1} (u, τ_{n}, \cdot, ξ_{n})$ is effectively a Gauss transform of the function $V_{n + 1} (τ_{n} + τ g^{2} (u), \cdot, ξ_{n} + g (u))$ . This important observation enables us to use the fast Gauss transform (FGT) to evaluate (22) (Greengard and Strain, 1991).

To implement the FGT, we first discretize the space of $μ_{n}$ and $μ_{n + 1}$ (which is denoted by $μ$ in (22)) into equally spaced points $\{t_{1}, \dots, t_{N^{'}}\}$ and $\{s_{1}, \dots, s_{N}\}$ , which are referred to as the target points and the source points, respectively. The step size of discretization is denoted by $Δ μ$ . A bird’s-eye view of the state-action space of $(u, τ_{n}, μ_{n})$ is shown in Figure 2.

The integral (22) can be approximated as follows:

{\bar{V}}_{n + 1} (u, τ_{n}, t_{i}, ξ_{n}) \approx \sum_{j = 1}^{N} Δ μ \frac{V_{n + 1} (τ_{n} + τ g^{2} (u), s_{j}, ξ_{n} + g (u))}{\sqrt{2 π v (τ_{n}, u)}} \exp \left\{- \frac{(t_{i} - s_{j})^{2}}{2 v (τ_{n}, u)}\right\},

(23)

where $t_{i}$ represents the current posterior mean $μ_{n}$ , and $s_{j}$ represents the updated posterior mean $μ_{n + 1}$ . The key observation from (23) is that ${\bar{V}}_{n + 1}$ can be approximated by the sum of $N$ Gaussian functions, each of which is evaluated at $N^{'}$ sample points. In addition, Theorem 3.1 of Moré and Wu (1997) suggests that ${\bar{V}}_{n + 1} (u, τ_{n}, μ_{n}, ξ_{n})$ is a smooth function of $μ_{n}$ , so the value between two sample points can be well approximated by interpolation.¹

The mathematical basis of the FGT is the Hermite expansion of the Gaussian function

\exp \{- (t - s)^{2}\} = \sum_{k = 0}^{\infty} \frac{s^{k}}{k!} h_{k} (t),

in which $h_{k} (t)$ are Hermite functions defined by

h_{k} (t) = (- 1)^{k} \left(\frac{d}{d t}\right)^{k} e^{- t^{2}} .

The Gaussian function with shift and scaling can be written as follows:

\exp \left\{- \frac{(t_{i} - s_{j})^{2}}{2 v (τ_{n}, u)}\right\} = \sum_{k = 0}^{\infty} \frac{1}{k!} \left(\frac{s_{j} - s_{J}}{\sqrt{2 v (τ_{n}, u)}}\right)^{k} h_{k} \left(\frac{t_{i} - s_{J}}{\sqrt{2 v (τ_{n}, u)}}\right),

(24)

where $s_{J}$ is a point near $s_{j}$ . The Hermite function on the right-hand side of (24) can be further expanded around another point $t_{I}$ (near $t_{i}$ ). Since $h_{k}^{'} (t) = - h_{k + 1} (t)$ , the Taylor expansion is in powers of $t_{I} - t_{i}$ :

\exp \left\{- \frac{(t_{i} - s_{j})^{2}}{2 v (τ_{n}, u)}\right\} = \sum_{ℓ = 0}^{\infty} \sum_{k = 0}^{\infty} \frac{1}{ℓ!} \frac{1}{k!} \left(\frac{s_{j} - s_{J}}{\sqrt{2 v (τ_{n}, u)}}\right)^{k} h_{k + ℓ} \left(\frac{t_{I} - s_{J}}{\sqrt{2 v (τ_{n}, u)}}\right) \left(\frac{t_{I} - t_{i}}{\sqrt{2 v (τ_{n}, u)}}\right)^{ℓ} .

(25)

Since the Hermite expansion converges very quickly, one can truncate the infinite sums at a small integer, $k_{max}$ , to approximate (25) with a high precision. For example, choosing $k_{max} = 8$ is sufficient to achieve a relative error of $10^{- 8}$ when $| (s_{j} - s_{J}) / \sqrt{2 v (τ_{n}, u)} | < 1 / 2$ and $| (t_{i} - t_{I}) / \sqrt{2 v (τ_{n}, u)} | < 1 / 2$ , namely, when $s_{j}$ is close to $s_{J}$ , and $t_{i}$ is close to $t_{I}$ .

Given $τ_{n}$ and $u$ , we partition the space of the posterior mean into multiple, non-overlapping intervals of length $\sqrt{2 v (τ_{n}, u)}$ and perform computation on each pair of a source interval (denoted by $J$ ) and a target interval (denoted by $I$ ). Suppose the source space is divided into $M$ intervals, where each interval contains $n$ points. Hence, $N = n M$ . Similarly, the target space is divided into $M^{'}$ intervals with $n^{'}$ points inside each interval, so $N^{'} = n^{'} M^{'}$ . Let $s_{J}$ and $t_{I}$ represent the centers of the source and target intervals, respectively. An illustration is provided in Figure 2. For each point inside the interval $I$ , for example $t_{i}$ , we substitute (25) into (23) and obtain

\begin{aligned} {\bar{V}}_{n + 1} (u, τ_{n}, t_{i}, ξ_{n}) \approx {} & \sum_{j = 1}^{N} \frac{Δ μ V_{n + 1} (τ_{n} + τ g^{2} (u), s_{j}, ξ_{n} + g (u))}{\sqrt{2 π v (τ_{n}, u)}} \\ & \times \Big\{ \sum_{ℓ = 0}^{k_{max}} \sum_{k = 0}^{k_{max}} \frac{1}{ℓ!} \frac{1}{k!} \Big(\frac{s_{j} - s_{J}}{\sqrt{2 v (τ_{n}, u)}}\Big)^{k} h_{k + ℓ} \Big(\frac{t_{I} - s_{J}}{\sqrt{2 v (τ_{n}, u)}}\Big) \Big(\frac{t_{I} - t_{i}}{\sqrt{2 v (τ_{n}, u)}}\Big)^{ℓ} \Big\} \\ = {} & \sum_{ℓ = 0}^{k_{max}} \Big\{ \sum_{J} \sum_{k = 0}^{k_{max}} \Big[ \frac{1}{k!} \sum_{s_{j} \in J} \frac{Δ μ V_{n + 1} (τ_{n} + τ g^{2} (u), s_{j}, ξ_{n} + g (u))}{\sqrt{2 π v (τ_{n}, u)}} \Big(\frac{s_{j} - s_{J}}{\sqrt{2 v (τ_{n}, u)}}\Big)^{k} \Big] \\ & \times h_{k + ℓ} \Big(\frac{t_{I} - s_{J}}{\sqrt{2 v (τ_{n}, u)}}\Big) \Big\} \frac{1}{ℓ!} \Big(\frac{t_{I} - t_{i}}{\sqrt{2 v (τ_{n}, u)}}\Big)^{ℓ} . \end{aligned}

(26)

Before computing (26), we first choose the truncation term $k_{max}$ (= 8 by default) and the numbers of intervals $M$ and $M^{'}$ for the source and target space, respectively. Then, we partition the source and target points into multiple intervals with length $\sqrt{2 v (τ_{n}, u)}$ . The value-to-go (26) is computed in three steps by starting from the inner layer and moving to the outer layers:

In the first step, we compute the term inside the brackets in the third line of (26). Let $s_{J}$ denote the center of the source interval $J = 1, \dots, M$ . We compute

A_{k, J} \leftarrow \frac{1}{k!} \sum_{s_{j} \in J} \frac{Δ μ V_{n + 1} (τ_{n} + τ g^{2} (u), s_{j}, ξ_{n} + g (u))}{\sqrt{2 π v (τ_{n}, u)}} \left(\frac{s_{j} - s_{J}}{\sqrt{2 v (τ_{n}, u)}}\right)^{k},

for $k = 0, \dots, k_{max}$ . The outputs $\{A_{0, J}, \dots, A_{k_{max}, J}\}$ summarize the relevant information from the source interval $J$ .

In the second step, we calculate the term inside the curly brackets in the last two lines of (26). Let $t_{I}$ denote the center of the target interval $I = 1, \dots, M^{'}$ . We calculate

B_{ℓ, I} \leftarrow \sum_{J} \sum_{k = 0}^{k_{max}} A_{k, J} h_{k + ℓ} \left(\frac{t_{I} - s_{J}}{\sqrt{2 v (τ_{n}, u)}}\right),

for $ℓ = 0, \dots, k_{max}$ . The outputs $\{B_{0, I}, \dots, B_{k_{max}, I}\}$ represent the relations between the target interval $I$ and different source intervals.

In the third step, we utilize the outputs of the second step to compute the final result. For each $t_{i}$ in the target interval $I$ , we compute

{\bar{V}}_{n + 1} (u, τ_{n}, t_{i}, ξ_{n}) \leftarrow \sum_{ℓ = 0}^{k_{max}} B_{ℓ, I} \frac{1}{ℓ!} \left(\frac{t_{I} - t_{i}}{\sqrt{2 v (τ_{n}, u)}}\right)^{ℓ} .

A direct computation of (23) requires $O (N N^{'})$ work, which is computationally expensive when $N$ and $N^{'}$ are large. With the fast Gauss transform, the computational work in Steps 1, 2, and 3 is $O (n)$ , $O (M M^{'})$ , and $O (n^{'})$ , respectively. Therefore, the total computational work is $O (n + n^{'} + M M^{'})$ , which is much lower than $O (N N^{'}) = O (n n^{'} M M^{'})$ . Furthermore, it is often unnecessary to evaluate all the $M M^{'}$ cross products in Step 2. This is because the Gaussian function $\exp \{- (t - s)^{2}\}$ decreases rapidly as $t$ moves away from $s$ , so the interactions among distant intervals are negligible.
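The three steps can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the article: the array `weights` stands for the coefficients $Δ μ V_{n + 1} (\cdot, s_{j}, \cdot) / \sqrt{2 π v}$ , sources and targets share one grid of intervals of length $\sqrt{2 v}$ , all interval pairs are evaluated in Step 2 (no far-field cutoff), and the function names are hypothetical. The Hermite functions are generated with the recurrence $h_{k + 1} (t) = 2 t h_{k} (t) - 2 k h_{k - 1} (t)$ .

```python
import math
import numpy as np

def hermite_functions(x, max_order):
    """Return h_0..h_max_order at x, where h_k(t) = (-1)^k d^k/dt^k exp(-t^2)."""
    x = np.asarray(x, dtype=float)
    h = np.empty((max_order + 1,) + x.shape)
    h[0] = np.exp(-x ** 2)
    if max_order >= 1:
        h[1] = 2.0 * x * h[0]
    for k in range(1, max_order):
        h[k + 1] = 2.0 * x * h[k] - 2.0 * k * h[k - 1]
    return h

def gauss_transform_direct(sources, weights, targets, v):
    """Direct O(N N') evaluation of sum_j weights[j] * exp(-(t_i - s_j)^2 / (2 v))."""
    diff = targets[:, None] - sources[None, :]
    return np.exp(-diff ** 2 / (2.0 * v)) @ weights

def gauss_transform_fast(sources, weights, targets, v, k_max=8):
    """Three-step evaluation following (26) with truncation order k_max."""
    scale = math.sqrt(2.0 * v)
    s, t = sources / scale, targets / scale           # scaled coordinates
    lo = min(s.min(), t.min())
    s_box = np.floor(s - lo).astype(int)              # interval index J of each source
    t_box = np.floor(t - lo).astype(int)              # interval index I of each target
    n_box = max(s_box.max(), t_box.max()) + 1
    centers = lo + np.arange(n_box) + 0.5             # shared centers s_J and t_I
    orders = np.arange(k_max + 1)
    fact = np.array([math.factorial(k) for k in orders], dtype=float)

    # Step 1: A[J, k] collects the scaled moments of the sources in interval J.
    A = np.zeros((n_box, k_max + 1))
    ds = (s - centers[s_box])[:, None]
    np.add.at(A, s_box, weights[:, None] * ds ** orders / fact)

    # Step 2: B[I, l] = sum_J sum_k A[J, k] * h_{k+l}(t_I - s_J).
    H = hermite_functions(centers[:, None] - centers[None, :], 2 * k_max)
    B = np.stack([np.einsum("kij,jk->i", H[l:l + k_max + 1], A)
                  for l in range(k_max + 1)], axis=1)

    # Step 3: Taylor series in (t_I - t_i) around the center of each target interval.
    dt = (centers[t_box] - t)[:, None]
    return np.sum(B[t_box] * dt ** orders / fact, axis=1)
```

Comparing `gauss_transform_fast` with `gauss_transform_direct` on a small grid is a convenient way to choose `k_max` for a given accuracy target before running the full value iteration.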

5.3. Reducing the State-Action Space

Theorem 2 can be used to further reduce the computation by eliminating suboptimal actions. Given the state variables $(τ_{n}, μ_{n}, ξ_{n})$ , we can easily compute the myopic workload $u_{n + 1}^{M} (τ_{n}, μ_{n}, ξ_{n})$ . Since Theorem 2 shows that the optimal workload cannot exceed the myopic workload, we can eliminate all the actions (workloads) above $u_{n + 1}^{M}$ from consideration. This enables us to prune a potentially large portion of the action space away from computation.

Note that the FGT algorithm evaluates ${\bar{V}}_{n + 1} (u, τ_{n}, μ_{n}, ξ_{n})$ by fixing $u, τ_{n}, ξ_{n}$ and looping over $μ_{n}$ . To incorporate state-action pruning into this algorithm, we will prune the state for a given action rather than pruning the action for a given state. Specifically, for a given value of $u$ , we will eliminate the values of $μ_{n}$ that violate the inequality in Theorem 2. Consider the triple $(u, τ_{n}, ξ_{n})$ . It follows from Theorem 2 that $u^{*} \leq u_{n + 1}^{M} (τ_{n}, μ_{n}, ξ_{n})$ . Therefore, if $u > u_{n + 1}^{M} (τ_{n}, μ, ξ_{n})$ , then the action $u$ cannot be the optimal action for the state $μ$ .
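The pruning rule amounts to a boolean mask over the grid of actions and posterior means for a fixed pair $(τ_{n}, ξ_{n})$ . The sketch below assumes that the myopic workloads have already been computed from (17) and stored in a hypothetical array `myopic_by_mu`, with one entry per grid value of $μ_{n}$ .

```python
import numpy as np

def feasible_mask(u_grid, myopic_by_mu):
    """mask[a, m] is True when action u_grid[a] survives pruning at mean index m.

    An action is kept only if it does not exceed the myopic workload
    for that posterior mean, as implied by Theorem 2.
    """
    u_grid = np.asarray(u_grid, dtype=float)
    myopic_by_mu = np.asarray(myopic_by_mu, dtype=float)
    return u_grid[:, None] <= myopic_by_mu[None, :]

# Hypothetical myopic workloads on a grid of four posterior means.
mask = feasible_mask([100, 200, 300], [300, 200, 100, 200])
print(mask.sum(axis=1))   # number of mean values retained for each action
```

For a given action, only the target points with a `True` entry in the corresponding row need to be passed to Step 3 of the FGT.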

6. Case Study

We first analyze a real-world degradation dataset from the PRONOSTIA experimental platform (Nectoux et al., 2012), and show that our degradation model provides a good fit. Then, we use the estimated model to conduct a simulation study to evaluate the performance of the optimal derating policies.

The PRONOSTIA experimental platform generates vibration monitoring data for a ball bearing over its full operating life, from new to failure. The platform can apply a radial force on the bearing to simulate a high workload to accelerate the degradation, which can lead to bearing failure within a few hours. This experiment is an accelerated degradation test (Hong and Ye, 2017). The platform consists of three modules: a rotating module, a load module, and a data acquisition module. The rotating module contains a ball bearing with 13 rolling elements. The load module uses a pneumatic jack to generate a force on the external ring of the bearing. In addition, the rotation speed and the torque applied to the bearing can also be controlled. The data acquisition module includes two vibration sensors installed on the external race of the bearing: one measures the acceleration in the vertical direction and the other in the horizontal direction. Each sensor takes a measurement every 10 seconds, and each measurement lasts 0.1 seconds with a sampling frequency of 25.6 kHz. Therefore, each sensor generates 2,560 data points every 10 seconds. The platform has generated six run-to-failure datasets. We will use the “Bearing 1.1” dataset in our analysis, which is generated under a constant load (4,000 N) and constant rotation speed (1,800 rpm).

Although the load and rotation speed are higher than the normal operating settings, data from this experiment can still be used for evaluating derating policies because the range of load variation $[\underline{u}, \bar{u}]$ is below the load used in the experiment.

6.1. Degradation Data Analysis

The analyses of vibration monitoring data often involve both the time and frequency domains. The ball-passing frequency is a critical feature for condition monitoring of rolling bearings. It is the rate at which a ball element passes a defect in the inner or outer race. The ball-passing frequency depends on the rotation speed RPM and the number of rolling elements in the bearing $z$ . In the experiment for Bearing 1.1 in the dataset, $RPM = 1{,}800$ and $z = 13$ , so the ball-passing frequency is $f_{0} = 0.5 \times z \times RPM / 60 = 195$ Hz.

We first perform a fast Fourier transform on the vibration signal in the horizontal direction and then aggregate the amplitudes from 10 equally spaced frequency intervals $(k f_{0} - 25, k f_{0} + 25)$ Hz, $k = 1, \dots, 10$ . These intervals are centered at the harmonic frequencies of the ball-passing frequency, allowing us to focus on the vibration of the bearing. Repeating the procedure for all recordings over time produces the degradation signal shown in Figure 3(c). The signal is updated every 10 seconds. Figure 3(b) shows that the spectrum of the signal evolves over time: as the bearing deteriorates, a greater number of high-frequency harmonics are excited.
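The feature extraction described above can be sketched as follows. The sketch assumes that "aggregate" means summing the FFT amplitudes inside each band, which the text does not specify; the function names and the argument `recording` (one 0.1-second snapshot as a one-dimensional array) are hypothetical.

```python
import numpy as np

def ball_passing_frequency(rpm, n_elements):
    """f_0 = 0.5 * z * RPM / 60, in Hz."""
    return 0.5 * n_elements * rpm / 60.0

def degradation_feature(recording, fs, f0, n_harmonics=10, half_width=25.0):
    """Sum the FFT amplitudes in the bands (k f0 - half_width, k f0 + half_width)."""
    amplitude = np.abs(np.fft.rfft(recording))
    freqs = np.fft.rfftfreq(len(recording), d=1.0 / fs)
    total = 0.0
    for k in range(1, n_harmonics + 1):
        in_band = (freqs > k * f0 - half_width) & (freqs < k * f0 + half_width)
        total += amplitude[in_band].sum()   # assumed aggregation rule: plain sum
    return total

f0 = ball_passing_frequency(1800, 13)          # 195 Hz for Bearing 1.1
fs = 25600.0                                   # sampling frequency in Hz
time = np.arange(2560) / fs                    # one 0.1-second recording
tone = np.sin(2.0 * np.pi * f0 * time)         # synthetic test signal, not real data
print(f0, degradation_feature(tone, fs, f0))
```

With 2,560 samples at 25.6 kHz, the frequency resolution is 10 Hz, so each 50 Hz band contains about five FFT bins.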

Figure 3.

(a) The raw horizontal-vibration signal of Bearing 1.1 in a PRONOSTIA accelerated run-to-failure test (rotation speed 1,800 rpm, radial force 4,000 N), (b) the low-frequency spectrogram of the raw signal calculated at every 0.1 seconds, (c) the degradation signal derived from the spectrogram, and (d) the log transform of the degradation signal.

Note that the degradation signal exhibits a nonlinear trend suggesting accelerating deterioration. It is also heteroscedastic: the variance increases with the degradation level. A common approach to stabilize the variance of a heteroscedastic time series is to apply a nonlinear transformation—such as log transform—to the original observation. After applying log transform to the degradation signal $x_{n}$ , the resulting signal $\log (x_{n})$ still exhibits heteroscedasticity. Thus, we choose the double log transform $y_{n} = \log (\log (x_{n}))$ , which leads to a homoscedastic time series $y_{n}$ shown in panel (d) of Figure 3. Note that this transformation also removes the nonlinear trend in panel (c).

Next, we validate and calibrate the degradation model. We will fit the model $y_{n} = y_{0} + n d_{A} + \sum_{t = 1}^{n} ϵ_{t}$ to the time series shown in Figure 3(d). We first verify that the data satisfy all three assumptions of the model. First, the model assumes independent increments. That is, the differenced time series $\{Δ y_{n} = y_{n} - y_{n - 1} = d_{A} + ϵ_{n}, n = 1, \dots, T\}$ should have no serial correlation. This is confirmed by inspecting the sample autocorrelation function. Second, $Δ y_{n}$ should follow a normal distribution, which is supported by the Shapiro-Wilk test ( $p$ -value 0.57). Third, the variance of $Δ y_{n}$ should remain constant over time, which is supported by the Breusch-Pagan test ( $p$ -value 0.41).

We use the radial force applied on the tested bearing to represent the testing load $u_{A} = 400$ . The average degradation rate under the testing load, $d_{A} = 0.00705$ , and the noise variance, $σ^{2} = 0.00964$ , are estimated from the data (the confidence intervals for both estimates are less than 0.00001 in width). Based on these estimates, we model the degradation signal shown in Figure 3(c) as follows:

\begin{aligned} x_{n} & = \exp (\exp (0.00705 n + \sum_{t = 1}^{n} ϵ_{t})), \\ ϵ_{t} & \sim N (0, 0.00964) . \end{aligned}
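A sample path of the fitted model can be generated with a few lines of NumPy. This is a minimal sketch under the fitted values $d_{A} = 0.00705$ and $σ^{2} = 0.00964$ ; the function name, the default horizon of 100 periods, and the seed are illustrative choices. A short horizon is used because the double exponential grows very quickly.

```python
import numpy as np

def simulate_degradation(n_periods=100, d_A=0.00705, sigma2=0.00964, seed=0):
    """Simulate y_n = d_A * n + sum of eps_t and the signal x_n = exp(exp(y_n))."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(sigma2), size=n_periods)   # eps_t ~ N(0, sigma2)
    y = d_A * np.arange(1, n_periods + 1) + np.cumsum(eps)   # transformed signal
    x = np.exp(np.exp(y))                                     # degradation signal
    return y, x

y, x = simulate_degradation()
increments = np.diff(np.concatenate(([0.0], y)))
print(increments.mean(), increments.var(ddof=1))   # sample estimates of d_A and sigma^2
```

Applying the double log transform to the simulated `x` recovers `y`, which mirrors the preprocessing used for the real signal.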

6.2. Comparing Control Policies

Next, we conduct simulation studies to evaluate the performance of optimal policies against the Bayesian myopic policy. The Bayesian myopic policy sets the workload as $u_{n + 1}^{M} (τ_{n}, μ_{n}, ξ_{n})$ , and thus focuses purely on exploitation. But it still uses new observations gathered passively to update the parameter estimate. In these experiments, we consider three levels of workload: $U = \{100, 200, 300\}$ corresponding to the “low,” “medium,” and “high” load settings. The planning horizon is $T = 100$ and the reward function is given by the following equation:

r (u, η) = \begin{cases} r u, & e^{η} \leq 10, \\ - 1, & e^{η} > 10. \end{cases}

The prior belief for the unknown parameter is $N (τ_{0}, μ_{0})$ .

Tables 1 and 2 compare the performances of optimal and myopic policies under different priors and reward rates. It can be observed from both tables that the relative gap between the optimal and myopic policies can be large, ranging from roughly $8 \%$ to $20.5 \%$ in all cases tested. Further, the gap tends to decrease in the precision of the prior. That is, the performance gain is higher when the parameter is more uncertain, which is expected because, as a passive learning policy, the myopic policy has a slower rate of learning. We also observe that the average failure times are longer under optimal policies. This is because the optimal workload never exceeds the myopic workload, so the system deteriorates more slowly under optimal policies. The optimal policy can therefore yield a significantly higher output while extending the useful life of a system, both of which have important practical value.

We also compared the computation times of the proposed algorithm, which is based on the fast Gauss transform, with the standard algorithm. The standard algorithm requires a computation time of 10.2–13.5 hours on a regular laptop computer with a 2.4 GHz Quad-Core Intel Core i5 processor. The proposed algorithm requires only 0.3–0.45 hours on the same computer.

Table 1.
Comparison between Bayesian optimal policy and myopic policies for $r = 0.003$ .

$τ_{0}$	Reward (myopic policy)	Reward (optimal policy)	Gap (%)	Failure time (myopic)	Failure time (optimal)
1	0.1773	0.2136	20.4573	30.2760	34.4300
2	0.1971	0.2254	14.3704	25.8970	29.6920
3	0.2035	0.2299	12.9875	28.0480	28.9760
4	0.2044	0.2309	12.9629	28.0070	29.6570
5	0.2034	0.2304	13.2394	27.5220	30.8190
6	0.2068	0.2335	12.9042	26.8840	33.5220
7	0.2102	0.2374	12.9006	26.9090	33.4810
8	0.2099	0.2370	12.8948	27.0630	33.6920
9	0.2088	0.2360	13.0284	26.7720	34.2320

Table 2.

Comparison between Bayesian optimal policy and myopic policies for $r = 0.001$ .

$τ_{0}$	Reward (myopic policy)	Reward (optimal policy)	Gap (%)	Failure time (myopic)	Failure time (optimal)
1	0.1663	0.1930	16.0472	21.2630	22.9320
2	0.1889	0.2079	10.0247	18.5450	19.9520
3	0.1941	0.2116	9.0237	17.2010	21.7180
4	0.1974	0.2134	8.1487	17.1100	26.2790
5	0.1995	0.2178	9.1978	16.9550	28.5060
6	0.2002	0.2169	8.3145	14.8040	29.7750
7	0.1999	0.2167	8.3915	14.7500	30.2090
8	0.2015	0.2189	8.6478	14.7050	30.2080
9	0.2001	0.2172	8.5485	14.5460	30.7020
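
The "Gap (%)" column appears to be the relative improvement of the optimal reward over the myopic reward. The sketch below recomputes it for the first rows of Table 1 under that assumed definition. The formula is inferred from the reported numbers rather than stated in the text, and the helper name `relative_gap` is hypothetical.

```python
def relative_gap(myopic: float, optimal: float) -> float:
    """Relative improvement of the optimal over the myopic reward, in percent."""
    return 100.0 * (optimal - myopic) / myopic


# (tau_0, myopic reward, optimal reward), copied from Table 1 (r = 0.003).
TABLE_1_ROWS = [
    (1, 0.1773, 0.2136),
    (2, 0.1971, 0.2254),
    (3, 0.2035, 0.2299),
]

for tau0, myopic, optimal in TABLE_1_ROWS:
    print(f"tau_0 = {tau0}: gap = {relative_gap(myopic, optimal):.2f}%")
```

The recomputed values agree with the tabulated gaps to within a few hundredths of a percentage point. The small differences are consistent with the rewards being rounded to four decimals in the tables.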

7. Concluding Remarks

Systems approaching the end of their service life are often derated to prolong their remaining useful life. While historical data can be used to estimate the system’s deterioration under the standard operating load, it does not provide the full information about deterioration rates under alternative workloads. Therefore, experimenting with various loads is essential to infer the pd-relationship as the system operates.

In this article, we developed a decision model that allows for adaptive learning and optimization of the workload. The model can capture certain nonlinear relationships between degradation and workload, and is validated empirically with vibration monitoring data. We showed that optimizing the workload under an unknown pd-relationship requires active learning, as distinct workloads correspond to different learning modes. As such, the decision-maker (DM) must account for the complex interplay among performance, deterioration, and information.

We formulated this problem as a POMDP in which the state consists of the hyper-parameters of the posterior belief, and we characterized the structure of the optimal policies. A key insight is that the optimal workload is always less than the myopic load, except in the last period, where the two coincide. This structural insight greatly simplifies the load optimization process: it suggests that it is optimal to adjust the myopic load to a lower value. To calculate the exact amount of adjustment, we reformulated the value-to-go function as a convolution transform with a Gaussian kernel and used an efficient algorithm, the fast Gauss transform, to evaluate this function. Simulation studies based on estimates from real data suggest that failure to learn the pd-relationship online can lead to significant performance loss, especially when the reward rate is high. Further, the optimal active learning policy can significantly outperform the myopic policy, which is a passive learning approach. We also find that adjusting the myopic load by a carefully chosen constant can yield good approximations to the optimal policies.
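
To make the computational point concrete, the sketch below evaluates a discrete Gauss transform, $G(y_j) = \sum_i q_i \exp(-(y_j - x_i)^2 / δ)$, by direct summation. This is only the baseline that the fast Gauss transform accelerates, not the fast algorithm itself, and it is not the article's implementation: the grid, the weights, and the bandwidth `delta` are placeholder assumptions, and the article's actual kernel and state variables are defined in earlier sections.

```python
import math


def gauss_transform(sources, weights, targets, delta):
    """Direct evaluation of G(y) = sum_i q_i * exp(-(y - x_i)^2 / delta).

    Costs O(N * M) for N sources and M targets; the fast Gauss transform
    approximates the same sums at a much lower cost.
    """
    return [
        sum(q * math.exp(-((y - x) ** 2) / delta) for x, q in zip(sources, weights))
        for y in targets
    ]


# Hypothetical grid standing in for samples of a value function over a scalar state.
grid = [i / 10 for i in range(-20, 21)]
weights = [1.0 for _ in grid]  # placeholder weights (assumed)

smoothed = gauss_transform(grid, weights, targets=[-1.0, 0.0, 1.0], delta=0.5)
print(smoothed)
```

Because every target is compared with every source, the direct sum becomes expensive when the transform has to be repeated for each period and each candidate load, which is where a fast evaluation scheme pays off.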

Supplemental Material

Supplemental material, sj-pdf-1-pao-10.1177_10591478241305339 for Learning to Balance the Performance and Deterioration of Aging Systems Through Derating by Jue Wang in Production and Operations Management

Footnotes

Acknowledgments

The author thanks the department editor (Panos Kouvelis), the anonymous senior editor, and three anonymous referees for constructive feedback.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant RGPIN-2019-05671.

ORCID iD

Jue Wang

How to cite this article

Wang J (2024) Learning to Balance the Performance and Deterioration of Aging Systems Through Derating. Production and Operations Management 34(7): 1743–1758.
