Abstract
A common strategy for extending the lifetime of an aging system is to reduce its workload below the normal operating level, a practice known as derating. While derating can slow the deterioration process, it often comes at the expense of reduced performance. Thus, derating involves a trade-off between performance and deterioration. Central to the optimal derating strategy is the relationship between deterioration and workload, also referred to as the pd-relationship. In practice, however, this relationship is rarely known a priori. We consider workload optimization when the pd-relationship can be adaptively learned through sequential experimentation, or active learning. We show that the workload not only influences the performance and deterioration but also controls the speed of learning. The decision-maker must therefore account for the complex interplay between performance, deterioration, and information in real time. We formulate this problem as a partially observable Markov decision process and characterize the optimal policy. A key structural insight is that the optimal workload never exceeds the myopic load. We further propose an efficient algorithm based on the fast Gauss transform to compute the optimal policies. The model is validated with vibration data, and the performance of the optimal policy is compared against several heuristic policies.
Introduction
Almost all systems deteriorate with usage and experience a reduction in performance. The rate of deterioration often depends on the system’s workload, such as the production rate of a manufacturing system, the rotation speed of a wind turbine, or the loading of a power transformer. A high workload boosts the immediate performance but exerts more stress on the system and accelerates its deterioration, reducing its future performance. The workload setting should therefore balance the short-term gain in performance against the long-term loss due to deterioration. This important problem has received much attention in the reliability and maintenance literature, for example, Conrad and McClamroch (1987), Schweitzer and Seidmann (1991), Sloan and Shanthikumar (2000), Cassady and Kutanoglu (2005), and uit het Broek et al. (2020).
To understand the practical challenges of workload control, consider the following discussion on a power system forum (MikeHolt.com, 2015): “In my workplace, we had a good discussion on how to extend the life of a transformer. A question was raised on the effect on the transformer lifespan between 100% loading and 90% or 60%. Many argued that the 60% and 90% loading would extend the life of the transformer, with the 60% adding more life-years to the transformer. A small number of us argued that 60% or 90% loading will have the same effect compared to 100% loading.”
One response to this posting is: “But at 60% loading, the relative efficiency is going to drop, meaning you will likely spend way more on wasted energy over the life of that transformer than the value of the extra life you attain. At 90%, the difference in efficiency will not be that great, plus you will be able to expect longer life.”
The discussion above is about the optimal strategy of derating, which is the practice of reducing the workload below the normal operating level in order to extend the remaining useful life of the system. As a system deteriorates, it will often enter a new regime characterized by faster-than-normal degradation, in which the normal operating load is no longer optimal. By operating the transformer at a reduced load in this regime, less heat will be produced within the transformer, thereby reducing the rate of insulation degradation (Faiz et al., 2015). A recent vision letter from Palo Alto Research Center states that “(t)he ability of a machine to not only report its future failures, but also adjust its own functions according to its health condition is integral” (PARC, 2022). Such capabilities are also the defining features of “self-aware” or “self-configure” machines (Lee et al., 2013).
An everyday example of derating is switching a phone to low-power mode when the battery gets low, which extends the battery life at the expense of reduced performance. This strategy has recently garnered significant interest from electric vehicle makers, who can use derating to prolong the life of lithium-ion batteries, which are among the most expensive components of electric vehicles (Barreras et al., 2018; Ruan et al., 2023; Sun et al., 2018). Derating is also a common strategy for operating wind turbines approaching the end of service life (Bech et al., 2018). The operators can reduce the mechanical stress on the turbine by adjusting the angle of the blades and thereby lowering the amount of power that the turbine can generate. Derating is an attractive strategy because it can be implemented at minimal operating cost, but it should be implemented carefully by considering the trade-off between reduced performance and extended lifetime.
A common assumption in the existing literature is that the relationship between workload and deterioration is known, for example, uit het Broek et al. (2020). We shall follow the terminology of uit het Broek et al. (2020) and call it the pd-relationship, which stands for the production-deterioration relationship. But in many situations, the decision-maker (DM) is uncertain about the pd-relationship, namely, uncertain about how the system would deteriorate under different loads. This is well exemplified by the discussion quoted above about power transformers; the professionals are uncertain as to how the load affects deterioration. Moreover, no historical data is available for estimating the degradation under workloads that have not been used before.
It is natural to experiment with different workloads while the system is operating and observe how the system deteriorates differently under various workloads. This would allow the DM to estimate the pd-relationship with greater accuracy as data accumulate. It is possible to implement this online learning strategy in systems that are equipped with sensors, so that the workload can be adaptively adjusted in response to new data.
Generally speaking, different workloads can provide different levels of information about the unknown pd-relationship. For example, operating the system at a load significantly below its normal operating load, such as 60%, often provides more information about the pd-relationship than a workload marginally below the standard, such as 90%. Each workload can thus be seen as a distinct learning mode for the pd-relationship. Choosing among different workloads is therefore effectively an active learning process in which the DM chooses how much information to acquire at a given time.
The goal of this article is to develop a sequential decision model that adaptively learns the optimal workload by sequentially experimenting with various workloads. In this model, we account for both the uncertainty about the pd-relationship and the ambiguity surrounding the actual degradation state. By changing the workload, the DM can control (i) performance, (ii) deterioration, and (iii) information. Previous studies have examined the joint control of performance and deterioration. This article extends this literature by considering the control of information acquisition. The DM faces a complex interplay between performance, deterioration, and information.
We will consider partially observable systems, whose true degradation state cannot be directly observed. Maintenance practitioners are well acquainted with the false negatives and false positives associated with sensor data, as most sensors are subject to measurement errors. For example, a drop in temperature might cause the activation of a low-pressure alert suggesting a tire leakage when the tire is intact (false positive). Conversely, an increase in temperature might fail to trigger the warning system even when there is an actual tire leakage (false negative). A fully observable system can be analyzed as a special case of the partially observable system.
Relevant Literature
This article lies at the intersection of three streams of research in the maintenance decision-making literature: (1) dynamic load control, (2) parameter learning, and (3) partially observable deteriorating systems. We will review relevant studies in each stream.
Dynamic Load Control
In light of the principle “prevention is better than cure,” it is natural to consider reducing the workload of a machine to slow its deterioration. The idea of derating dates back at least to the work of Taylor (1906) on tool wear under different cutting speeds.
Conrad and McClamroch (1987) developed a diffusion-threshold process to model the deterioration of tool wear. In their model, the wear evolves according to a Brownian motion whose drift depends on the workload. The machine fails as soon as the wear exceeds a threshold and is replaced by a new one. The DM decides the workload setting at each time point as well as the replacement times. The authors studied the open-loop control policy and the one-step look-ahead feedback control policy. Our degradation model also involves a Brownian motion process with a varying drift term, but we use it to model the noisy observations from the sensor. The true degradation, which determines the performance and failure time, is not directly observable in our model. Rishel (1989) noted that the scalar Brownian motion can generate negative increments and thus is not suitable for modeling wear that only accumulates over time. Our model does not suffer from this limitation because we use Brownian motion to model the noisy observations rather than the actual wear. The latter is represented by the cumulative drift in our model and is always nondecreasing in time. Furthermore, we consider online learning of the model parameters, which are assumed to be known by Conrad and McClamroch (1987) and Rishel (1989).
The seminal work of Conrad and McClamroch (1987) inspired the optimization of production rate in manufacturing systems. For example, Schweitzer and Seidmann (1991) studied the optimization of processing rate in multi-machine manufacturing systems under the queueing network framework. There is a stream of research on the joint scheduling of production and maintenance, which also addresses the trade-off between productivity and deterioration (Batun and Maillart, 2012; Sloan and Shanthikumar, 2000, 2002). In this literature, the DM can choose which product to process at a machine subject to deterioration. The products differ in terms of the reward and the impact on the state of the machine. However, a common assumption is that the state of the machine is fully observable and the parameters of the system are known, which differs from the setting of the present article.
Recently, uit het Broek et al. (2020) studied the important problem of condition-based production planning. By assuming that the relation between production rate and deterioration rate (i.e., the pd-relationship) is known, the authors characterized the optimal control policies for a deterministic system and showed that similar structures hold in stochastic systems through extensive numerical studies. Our work differs from uit het Broek et al. (2020) in two major aspects: first, we consider the uncertainty regarding the pd-relationship and perform active learning of this relationship in a Bayesian fashion. Second, we consider partially observable stochastic systems, whose true degradation level is not directly observable and needs to be inferred from monitoring data. Another difference is that we jointly optimize the maintenance and workload decisions by accounting for their interdependency, whereas uit het Broek et al. (2020) assumed that maintenance is conducted at fixed times. For systems with adjustable workload, failure risk can be lowered through a load reduction, which may postpone the maintenance. Therefore, the optimal timing of maintenance depends on the workload history.
Partially Observable Deteriorating Systems
There is a large body of literature on maintenance optimization for partially observable deteriorating systems. See de Jonge and Scarf (2020) for a recent review. Partially observable stochastic deterioration is commonly modeled using hidden Markov models, cf. Maillart (2006) and Kim and Makis (2013). This literature is also closely related to the economic design of Bayesian control charts (Makis, 2008; Panagiotidou and Tagaras, 2010; Tagaras and Nikolaidis, 2002; Tagaras and Nenes, 2007; Wang and Lee, 2015). Both frameworks assume that the monitoring data are generated from a latent stochastic process representing the evolution of the true state. However, most existing studies focus on how to respond to the natural deterioration of the system (e.g., through replacement) rather than how to actively alter the course of deterioration, as we do in this article.
In this article, we propose a different framework to model partially observable systems. Our model is inspired by the Brownian motion process widely used in the degradation modeling literature (Conrad and McClamroch, 1987; Elwany et al., 2011). The difference is that we use Brownian motion to model the observations instead of the true degradation state. The latter is treated as a latent process in our model and can only be inferred from the observations. We argue that the model with latent state is more realistic because sensor data may not reveal the true state of the machine.
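The distinction between the latent degradation and the observed signal can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `simulate_path`, the drift rates, and the noise level are hypothetical and are not taken from the article. The latent path is the cumulative drift and therefore never decreases, whereas the observed path adds accumulated Gaussian noise and may move in either direction.

```python
import random

def simulate_path(drift_rates, noise_sd, dt=1.0, seed=0):
    """Return (latent, observed) paths on a common time grid.

    latent[k]   : cumulative drift, i.e. the hidden degradation level
    observed[k] : latent[k] plus accumulated Brownian noise (sensor signal)
    """
    rng = random.Random(seed)
    latent, observed = [0.0], [0.0]
    noise = 0.0
    for rate in drift_rates:
        if rate < 0:
            raise ValueError("degradation rates must be nonnegative")
        # The hidden state only accumulates: it is the running sum of the drift.
        latent.append(latent[-1] + rate * dt)
        # A Brownian increment over a step of length dt has standard deviation noise_sd * sqrt(dt).
        noise += rng.gauss(0.0, noise_sd * dt ** 0.5)
        observed.append(latent[-1] + noise)
    return latent, observed

# Hypothetical rates: five periods at a high load, then five periods derated.
rates = [0.30] * 5 + [0.10] * 5
latent, observed = simulate_path(rates, noise_sd=0.5)
```

Comparing `latent` and `observed` element by element shows why the DM cannot read the degradation level directly off the sensor signal and must infer it.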
When the workload is adjustable, the remaining useful life of the product can be controlled, and the decision involves the prognostics of the machine’s health condition in a nonstationary environment. Nonstationary deterioration caused by a changing environment is studied by Kharoufeh et al. (2013) and Flory et al. (2014). Unlike these works, which consider deterioration in an exogenous environment, our work considers deterioration that can be endogenously controlled by the DM.
Online Parameter Learning
There is a fast-growing literature on maintenance optimization with Bayesian learning. The motivation is that the deterioration is often heterogeneous, with some units deteriorating faster than others due to unobserved idiosyncrasies. With online monitoring data, it is possible to learn the unique degradation pattern of each unit in real time and tailor the maintenance strategy toward an individual unit.
The online learning of partially observable systems with heterogeneity has been studied by Van Oosterom et al. (2017) and Abdul-Malak et al. (2019), where the heterogeneity is modeled in terms of discrete hidden states. Another related stream of studies considers learning the unknown parameters over a continuous space with respect to a conjugate prior. Elwany et al. (2011) considered a degradation system with unknown parameters. By assuming that the system will be replaced when the observed signal exceeds a pre-determined threshold, the authors characterized the optimal replacement policy. Chen et al. (2015) optimized the inspection interval in heterogeneous degradation systems modeled by an inverse Gaussian process. Drent et al. (2023) studied the optimal stopping of shock degradation systems with unknown parameters. Skandari and Shechter (2021) considered the optimal control of a fully observable Markov process with uncertain transition rates.
Unlike these studies, which decide whether to gather information in a period, we consider how much information to gather in the period. A distinguishing feature of our model is that the DM can change the mode of learning by choosing different workloads, with some loads yielding more information about the pd-relationship than others. In contrast, the existing literature on parameter learning (surveyed above) mostly considers a single workload, which corresponds to a single learning mode.
The problem of choosing among multiple alternative learning modes is generally referred to as active learning (Powell and Ryzhov, 2012; Ryzhov and Powell, 2011). Such problems arise in various applications including inventory management (Lariviere and Porteus, 1999; Lu et al., 2008) and demand learning (Farias and Van Roy, 2010; Harrison et al., 2012; Wang, 2021). A central dilemma in active learning is the trade-off between exploration and exploitation. In our setting, the deterioration of the machine gives rise to a dynamic process, in which the reward structure changes as the degradation accumulates. Tian et al. (2022) studied adaptive clinical trials in which the DM needs to track both the posterior belief and the system state, but in their application the system state is the number of patients in the system, which is fully observable. The complicating factor in our problem is that the DM cannot observe the true degradation level.
This article makes several contributions to the maintenance decision-making literature. First, we develop a dynamic load optimization model that can adaptively learn the pd-relationship. We show that the rate of learning can be adjusted by varying the workload. As such, the workload optimization is effectively an active learning problem, which has not been well studied in the maintenance literature. Second, we characterize the structure of the optimal policy. We prove that the optimal workload is never greater than the Bayesian myopic load. To the best of our knowledge, no optimal policy has been established for learning the pd-relationship in a noisy environment. Third, we exploit the structural properties of optimal policies to develop an efficient algorithm, which is based on a novel application of the fast Gauss transform.
Model Formulation
We will first describe a dynamic degradation model and then formulate the dynamic derating problem. We consider a scenario in which a system has been operating under a constant workload
Dynamic Degradation Model
We set the time origin
Different workloads exert different stress on the system, which ultimately result in different degradation rates. The pd-relationship, namely, the relationship between the workload
We assume that the function
Although the actual degradation level
Combining (1) to (3), we obtain
Given the workload history
The reward function
Another example is
Keep in mind that the reward function

The influence diagram of the decision problem.
The model proposed in this article unifies several important degradation models in the literature and generalizes them to the dynamic environment. One example is the Arrhenius model, which is an empirical model describing how the failure time depends on the environmental stress such as the temperature. The relation between the failure time
Another closely related model is the exponential degradation model with Brownian errors. Elwany et al. (2011) considered the following model (equation (2) in their paper):
Optimal Derating Problem
We assume that the degradation rate under
Substituting (6) into (2), we obtain
Since the degradation rate is almost never negative, we assume that its minimum value is positive. That is,
From now on, we will focus on the derating decision problem, in which
The main challenge of this decision problem comes from the presence of unknown parameters. In the derating problem, the unknown parameters reduce to
Bayesian Active Learning
Given the workload history
Before proceeding, it is helpful to introduce some notations. Let
To simplify the expressions, we use the normal conjugate prior. Specifically, we choose the prior
The key to understanding active learning lies in equation (15). The precision of the posterior belief
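The way a workload choice can control the speed of learning may be illustrated with a generic conjugate-normal update. This is a sketch under loudly stated assumptions, not the article's equation (15), which is not reproduced here: it assumes a single unknown parameter theta, an observed increment of the form y = theta * z + noise with known noise variance, and a hypothetical regressor `load_regressor` equal to the deviation of the load from the normal load. Under these assumptions the posterior precision grows by z squared divided by the noise variance, so loads further from the normal level are more informative.

```python
def update_belief(mean, precision, z, y, noise_var):
    """One conjugate-normal update for y = theta * z + noise, noise ~ N(0, noise_var).

    (mean, precision) describe the normal prior on theta; the return value
    describes the normal posterior after observing y.
    """
    new_precision = precision + z * z / noise_var
    new_mean = (precision * mean + z * y / noise_var) / new_precision
    return new_mean, new_precision

def load_regressor(load, normal_load=1.0):
    # Hypothetical choice: information comes from the deviation from the normal load.
    return normal_load - load

# Precision gained from a single observation at three candidate loads
# (the observation value 0.1 and noise variance 0.04 are made-up numbers).
for load in (1.0, 0.9, 0.6):
    _, post_precision = update_belief(0.0, 1.0, load_regressor(load), 0.1, 0.04)
    print(load, post_precision)
```

In this toy setup, operating at the normal load leaves the belief unchanged, while a 60% load adds far more precision than a 90% load, which mirrors the qualitative statement in the Introduction.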
Partially Observable Markov Decision Process (POMDP) Formulation
When selecting the workload, the DM needs to balance the immediate and future reward. A higher workload increases the immediate reward but accelerates the degradation and thus reduces the future reward. This task is further complicated by the unobservability of the degradation level
Based on results from the previous section, we use the triplet
The first term inside the maximum operator,
Given the action
The workload that maximizes the reward of continuation is denoted by
An optimal policy can be found by solving the optimality equations iteratively through backward induction. This requires enumerating through all combinations of action
Toward this end, we first reformulate the original optimality equations in the following proposition:
Define
All proofs in this article are given in the Electronic Companion. Since (18) is a reformulation of the original optimality equation (16), it can be used to compute optimal policies. This reformulation brings two computational advantages. First, the posterior predictive density (the third line of (18)) is normal, which has a closed-form expression and hence requires no numerical integration. Second, the integral with respect to
The optimality equation (18) can be rewritten as follows:
Given the state
Theorem 1 is important because it shows that the value-to-go function, which is the most computationally intensive function, can be recast as a convolution transform. The convolution transform of a function
Note that an efficient algorithm exists for the convolution transform with a Gaussian kernel. As such, we can utilize the efficient convolution algorithm to accelerate the computation of the value function. The details will be presented in Section 5.2.
To understand this proposition, recall that the cumulative degradation up to period
This proposition suggests that the convolution transform in (19) can increase the value function. This property will be used to establish the following main theorem:
For any triple
This theorem characterizes the structure of optimal policies. The most important structural insight is that the optimal workload is always less than or equal to the myopic workload. An intuitive understanding is that a lower workload can not only slow the deterioration process and increase the remaining useful life, but is also more efficient at learning the unknown parameter. To the best of our knowledge, few structural results have been obtained with respect to the optimal adjustment of learning modes for deteriorating systems. Another useful insight from the theorem is that the optimal stopping rule has a threshold structure, which is a common observation in maintenance decision models. There is a belief-dependent threshold
The structural properties can be used to reduce the computation. For example, when searching for the optimal workload with respect to the state
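The computational saving from the upper bound on the optimal workload can be sketched as a pruned grid search. The helper below is a hypothetical illustration: the name `best_load`, the integer percentage grid, and the toy objective are assumptions, and the real continuation value would come from the optimality equations. The only structural fact it uses is that loads above the myopic load need not be evaluated.

```python
def best_load(load_grid, myopic_load, continuation_value):
    """Grid search restricted to loads not exceeding the myopic load.

    Returns the best candidate and the number of grid points evaluated.
    """
    candidates = [x for x in load_grid if x <= myopic_load]
    if not candidates:
        raise ValueError("no grid point at or below the myopic load")
    return max(candidates, key=continuation_value), len(candidates)

# Toy example: loads in percent of the normal load, made-up concave objective.
grid = list(range(50, 101, 5))
load, evaluated = best_load(grid, myopic_load=80, continuation_value=lambda x: -(x - 70) ** 2)
```

Here only the grid points up to the myopic load are scored, so the number of value-function evaluations per state shrinks in proportion to how far the myopic load lies below the maximum load.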
Computation of Optimal Policies
In this section, we present an efficient value-iteration algorithm to compute the optimal policies. The algorithm is based on discretizing the joint space of

Illustration of the value iteration with fast Gauss transform.
The design of grid requires the characterization of the joint space. We know that the action space is
Computing the Value-to-Go Function
The most computationally intensive step of calculating
To implement the FGT, we first discretize the space of
The integral (22) can be approximated as follows:
The mathematical basis of the FGT is the Hermite expansion of the Gaussian function
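The role of the Hermite expansion can be sketched in one dimension. The code below is a simplified illustration, not the article's algorithm: it uses a single expansion center rather than a partition of the grid into boxes, and the names `gauss_direct`, `gauss_hermite`, the bandwidth `delta`, and the test data are all assumptions. It relies on the standard identity exp(-(u - v)^2) = sum over n of (v^n / n!) h_n(u), where h_n(u) = (-1)^n d^n/du^n exp(-u^2), which separates the sources from the targets so that the cost drops from the product of the two grid sizes to roughly their sum times the number of terms.

```python
import math

def hermite_functions(x, p):
    """h_n(x) = (-1)^n d^n/dx^n exp(-x^2) for n = 0..p-1, via h_{n+1} = 2x h_n - 2n h_{n-1}."""
    h = [math.exp(-x * x)]
    if p > 1:
        h.append(2.0 * x * h[0])
    for n in range(1, p - 1):
        h.append(2.0 * x * h[n] - 2.0 * n * h[n - 1])
    return h

def gauss_direct(sources, weights, targets, delta):
    """Direct evaluation of sum_i w_i exp(-(t - s_i)^2 / delta) at every target t."""
    return [sum(w * math.exp(-(t - s) ** 2 / delta) for s, w in zip(sources, weights))
            for t in targets]

def gauss_hermite(sources, weights, targets, delta, center, p):
    """Same sums via a p-term Hermite expansion about a single center."""
    scale = math.sqrt(delta)
    # Step 1: moments A_n = (1/n!) * sum_i w_i ((s_i - center)/sqrt(delta))^n.
    moments = [
        sum(w * ((s - center) / scale) ** n for s, w in zip(sources, weights)) / math.factorial(n)
        for n in range(p)
    ]
    # Step 2: evaluate sum_n A_n h_n((t - center)/sqrt(delta)) at each target.
    out = []
    for t in targets:
        h = hermite_functions((t - center) / scale, p)
        out.append(sum(a * hn for a, hn in zip(moments, h)))
    return out

# Made-up data: sources clustered in [0, 1] around the center 0.5.
sources = [i / 20 for i in range(21)]
weights = [1.0 + 0.5 * math.sin(i) for i in range(21)]
targets = [-1.0, 0.0, 0.3, 1.0, 2.0]
approx = gauss_hermite(sources, weights, targets, delta=1.0, center=0.5, p=20)
exact = gauss_direct(sources, weights, targets, delta=1.0)
```

The expansion converges quickly only when the sources are close to the center relative to the square root of `delta`, which is why a full fast Gauss transform works box by box instead of using one global center.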
Given
In the first step, we compute the term inside the brackets in the third line of (26). Let
In the second step, we calculate the term inside the curly brackets in the last two lines of (26). Let
In the third step, we utilize the outputs of the second step to compute the final result. For each
Theorem 2 can be used to further reduce the computation by eliminating suboptimal actions. Given the state variables
Note that the FGT algorithm evaluates
Case Study
We first analyze a real-world degradation dataset from the PRONOSTIA experimental platform (Nectoux et al., 2012), and show that our degradation model provides a good fit. Then, we use the estimated model to conduct a simulation study to evaluate the performance of the optimal derating policies.
The PRONOSTIA experimental platform generates vibration monitoring data for a ball bearing over its full operating life, from new to failure. The platform can apply a radial force on the bearing to simulate a high workload and accelerate the degradation, which can lead to bearing failure within a few hours. This experiment is an accelerated degradation test (Hong and Ye, 2017). The platform consists of three modules: a rotating module, a load module, and a data acquisition module. The rotating module contains a ball bearing with 13 rolling elements. The load module uses a pneumatic jack to generate a force on the external ring of the bearing. In addition, the rotation speed and the torque applied to the bearing can also be controlled. The data acquisition module includes two vibration sensors installed on the external race of the bearing: one measures the acceleration in the vertical direction and the other in the horizontal direction. Each sensor takes a measurement every 10 seconds, and each measurement lasts 0.1 seconds with a sampling frequency of 25.6 kHz. Therefore, each sensor generates 2,560 data points every 10 seconds. The platform has generated six run-to-failure datasets. We will use the “Bearing 1.1” dataset in our analysis, which is generated under a constant load (4,000 N) and constant rotation speed (1,800 rpm).
Although the load and rotation speed are higher than the normal operating settings, data from this experiment can still be used for evaluating derating policies because the range of load variation
Degradation Data Analysis
The analyses of vibration monitoring data often involve both the time and frequency domains. The ball-passing frequency is a critical feature for condition monitoring of rolling bearings. It is the rate at which a ball element passes a defect in the inner or outer race. The ball-passing frequency depends on the rotation speed RPM and the number of rolling elements in the bearing
We first perform fast Fourier transform on vibration signal in the horizontal direction and then aggregate the amplitudes from 10 equally spaced frequency intervals

(a) The raw horizontal-vibration signal of Bearing 1.1 in a PRONOSTIA accelerated run-to-failure test (rotation speed 1,800 rpm, radial force 4,000 N), (b) the low-frequency spectrogram of the raw signal calculated at every 0.1 seconds, (c) the degradation signal derived from the spectrogram, and (d) the log transform of the degradation signal.
Note that the degradation signal exhibits a nonlinear trend suggesting accelerating deterioration. It is also heteroscedastic: the variance increases with the degradation level. A common approach to stabilize the variance of a heteroscedastic time series is to apply a nonlinear transformation—such as log transform—to the original observation. After applying log transform to the degradation signal
Next, we validate and calibrate the degradation model. We will fit the model
We use the radial force applied on the tested bearing to represent the testing load
Next, we conduct simulation studies to evaluate the performance of optimal policies against the Bayesian myopic policy. Bayesian myopic policy sets the workload as
Tables 1 and 2 compare the performance of the optimal and myopic policies under different priors and reward rates. It can be observed from both tables that the relative gap between the optimal and myopic policies can be large, ranging from
Comparison between Bayesian optimal policy and myopic policies for
Comparison between Bayesian optimal policy and myopic policies for
Comparison between Bayesian optimal policy and myopic policies for
Systems approaching the end of their service life are often derated to prolong their remaining useful life. While historical data can be used to estimate the system’s deterioration under the standard operating load, it does not provide the full information about deterioration rates under alternative workloads. Therefore, experimenting with various loads is essential to infer the pd-relationship as the system operates.
In this article, we developed a decision model that allows for adaptive learning and optimization of workload. The model can capture certain nonlinear relations between degradation and workload, and is validated empirically with vibration monitoring data. We showed that optimizing the workload under an unknown pd-relationship requires active learning, as distinct workloads correspond to different learning modes. As such, the DM must account for the complex interplay among performance, deterioration, and information.
We formulated this problem as a POMDP, in which the state consists of the hyperparameters of the posterior belief, and characterized the structure of optimal policies. A key insight is that the optimal workload is always less than the myopic load, except in the last period where the two coincide. This structural insight greatly simplifies the load optimization process: it suggests that it is optimal to adjust the myopic load to a lower value. To calculate the exact amount of adjustment, we reformulated the value-to-go function as a convolution transform with a Gaussian kernel and used an efficient algorithm, the fast Gauss transform, to evaluate this function. Simulation studies based on estimates from real data suggest that failure to learn the pd-relationship online can lead to significant performance loss, especially when the reward rate is high. Further, optimal active learning can significantly outperform the myopic learning policy, which is a passive learning approach. We also find that adjusting the myopic load by a carefully chosen constant can yield good approximations to the optimal policies.
Supplemental Material
sj-pdf-1-pao-10.1177_10591478241305339 - Supplemental material for Learning to Balance the Performance and Deterioration of Aging Systems Through Derating
Supplemental material, sj-pdf-1-pao-10.1177_10591478241305339 for Learning to Balance the Performance and Deterioration of Aging Systems Through Derating by Jue Wang in Production and Operations Management
Footnotes
Acknowledgments
The author thanks the department editor (Panos Kouvelis), the anonymous senior editor, and three anonymous referees for constructive feedback.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant RGPIN-2019-05671.
Notes
How to cite this article
Wang J (2024) Learning to Balance the Performance and Deterioration of Aging Systems Through Derating. Production and Operations Management 34(7): 1743–1758.
