The improvement of surgical quality and the corresponding early detection of its changes is of increasing importance. To this end, sequential monitoring procedures such as the risk-adjusted CUmulative SUM chart are frequently applied. The patient risk score population (patient mix), which considers the patients’ perioperative risk, is a core component for this type of quality control chart. Consequently, it is important to be able to adapt different shapes of patient mixes and determine their impact on the monitoring scheme. This article proposes a framework for modeling the patient mix by a discrete beta-binomial and a continuous beta distribution for risk-adjusted CUSUM charts. Since the model-based approach is not limited by data availability, any patient mix can be analyzed. We examine the effects on the control chart’s false alarm behavior for more than 100,000 different scenarios for a cardiac surgery data set. Our study finds a negative relationship between the average risk score and the number of false alarms. The results indicate that a changing patient mix has a considerable impact and, in some cases, almost doubles the number of expected false alarms.
Improving the quality of health care, especially for surgical procedures, has become increasingly important. Monitoring methods from statistical process control, such as control charts, can support this task. One of the most widely used tools for this purpose is the risk-adjusted CUmulative SUM (RA CUSUM) chart developed by Steiner et al.1 It can quickly detect surgical performance changes by sequentially testing a log-likelihood ratio statistic adjusting for the perioperative risk of patients. In general, this risk is quantified by a risk scoring system where each patient is assigned an individual score. The resulting patient risk score population (patient mix) is subsequently used as a core component by the monitoring scheme. Since Steiner et al.1 presented the cardiac surgery data set (see the left panel in Figure 1), researchers have mainly used this particular patient mix in their studies. However, the data set has an inverse J-shape and represents only one empirical patient mix. This makes it difficult to consider and evaluate other scenarios. The paper’s main aim is to propose a flexible model-based approach for the patient mix to study the impact of changes on the number of false alarms of the monitoring scheme. Our modeling approach has high practical relevance as it overcomes the limitation to a single data set, allowing for investigations of different scenarios and more general conclusions.
Cardiac surgery data. Relative frequencies of risk scores from the observed patient mix and modeled probabilities.
The approach uses the beta-binomial and beta probability distribution to represent the patient mix on a discrete or continuous scale. Both the design and calibration of the RA CUSUM charts are simplified without losing their general insights. In addition to an identified or estimated risk model, knowledge of the patient risk distribution is required. For this purpose, it is much easier to set only a few model parameters of a valid probability distribution instead of collecting enough entries for all risk scores. Moreover, a modeling approach can reduce susceptibility to sampling errors due to artifacts or variability that may occur in the original data set, as shown in the left panel in Figure 1. Another motivation is the often encountered limited access to sensitive empirical data from the health care system due to data confidentiality. With our model-based approach, a patient mix can be transparently constructed by specifying the probability distribution and the following three parameters: maximum risk score and shape parameters and . Consequently, only this information needs to be provided to enable independent reproducibility of a patient mix.
Of broad interest are the impacts of individual RA components, i.e., the risk model or patient risk population, on the RA control charts’ characteristics. Inadequate choices or changes over time of these components can lead to increased number of false alarms. In practice, this often reduces the acceptance of the monitoring scheme. Previous studies, which took into account the variation in patient mix, examined only a small number of different scenarios. A key feature of our parametric modeling approach is the representation of various shapes of patient risk distributions by specific alterations of the model parameters, see Figure 2. In this way, a variety of patient risk distribution shapes can be accommodated, and unlike previous attempts, any number of scenarios can be evaluated.
Different shapes of modeled patient risk score distributions.
We conduct a sensitivity analysis for the patient mix and gain insights into the false alarm behavior for more than 100,000 different scenarios for several control chart designs. Additionally, we examine the ability and speed of the monitoring scheme to detect various out-of-control situations. This study finds a negative relationship between the average risk score and the number of false alarms. The results indicate that a changing patient mix has a considerable impact, and in some cases, almost doubles the number of expected false alarms. In addition, we find that the proposed discrete and continuous distributions allow the patient mix to be modeled comparably well. It is shown that both choices have similar effects on the RA control chart performance.
The remaining part of this article is organized as follows. Section 2 gives an overview of the existing literature on modeling the patient mix and the heuristic generation of patient risk distributions. Section 3 provides background information on the data set used in our study and describes our modeling proposal. Section 4 introduces the RA CUSUM chart.
Accuracy comparisons for the numerical methods and patient mix models applied are given in Section 5. Section 6 presents a comprehensive sensitivity analysis of the patient mix. A simulation study in Section 7 demonstrates potential challenges in applying RA monitoring schemes. Section 8 contains some concluding remarks and discusses possible extensions. Further technical details can be found in Appendices A-D.
Related literature
In the context of statistical process monitoring, there are already excellent reviews of risk-adjusted procedures.2–5 Therefore, the following overview focuses on existing modeling approaches for the patient mix and the distributions that have been used. It also reviews earlier attempts to obtain different subgroups of patient populations from empirical data heuristically.
The use of probability distributions to model the patient mix for risk-adjusted (RA) control charts has had limited utilization in the past. Therefore, researchers mainly draw samples from the empirical distribution of the data. However, some probability distributions have been utilized. Hussein et al.6 assumed that the risk factor that characterizes the patient mix follows the Poisson distribution. Although this distribution is suitable for integer-based risk scores, its flexibility and fit to the data is often limited. Other researchers assumed a continuous risk score. While the application of the uniform7,8 and the normal9,10 distribution is of more theoretical than practical interest, the exponential distribution11–13 with its single parameter enables at least the representation of an inverse J-shaped patient mix. More advanced approaches consider a beta distribution.14–16 With a bounded interval and the ability to represent most of the shapes mentioned above, this distribution currently provides the most flexible approach for modeling a patient risk mix. However, integer risk scores that are approximated by assuming a continuous variable do not fully reflect the underlying data. In this context, we will present alternatives that directly consider the discrete nature of the risk scores.
Several researchers have indicated that a change in patient risk distribution, which is usually assumed to be constant over time, can affect the control chart’s performance to monitor surgical performance.4,16–19, Previous studies, which took into account the variation in patient mix, often examined only a small number of different scenarios, which can be explained by limitations to simple manual manipulations or reclassifications of the used empirical data. Steiner et al. measured their RA approach’s sensitivity for the patient mix from two surgeons in their sample data with the most extreme patient mixes1 and for lowest risk and highest risk patients only.20 Chang21 divided the original data set into five risk categories and increased or decreased the number of patients within some categories by a fixed percentage to create new risk distributions of higher or lower risk. The author also obtained a series of risk distributions by combining a uniform or right-skewed distribution of patient numbers with different surgical risks. In their estimation error analysis for RA CUSUM, Jones and Steiner22 reduced the data set variability by examining only the lower end of the empirical risk distribution. They obtained the data for low-risk patients by restricting the sampling to risk scores . To generate different patient mix populations, Tian et al.18 combined some of the previous approaches. The authors created five risk distributions, the first two years’ data, the lower and the higher of risk scores, and risk scores of the patients corresponding to the first and sixth surgeon. Further studies mainly adopted the procedures of Tian et al.18 and Jones and Steiner.22 By reversing the order of the sample data’s predisposed risk distribution, Rossi et al.23 obtained more patients at high-risk and fewer patients at low-risk. Knoth et al.24 created two new patient mixes by removing patients without other health conditions and all patients below the median risk score. Only Loke and Gan16 considered a continuous distribution to explicitly model the patient mix and examine possible effects for a small number of scenarios.
In this article, we extend and combine ideas from Loke and Gan16 and Tian et al.18 We propose a framework that overcomes the shortcomings mentioned earlier for patient mix modeling and provides a comprehensive analysis of the patient mix’s impact. The following section outlines our modeling approach, which considers the continuous beta distribution, a discretized beta, and the discrete beta-binomial distribution.
The modeling approach
Data
For illustration purposes of our modeling approach, we use the cardiac surgery operations data set presented in Steiner et al.1 It consists of records from patients in a single UK Hospital between 1992 and 1998 for seven operating surgeons. Available variables include survival time of each patient after an operation measured in days, the operating surgeon, date of operation, some risk factors such as age, diabetes, gender or number of reoperation, and the integer-based Parsonnet risk score.25 However, in our approach for modeling the patient risk mix, we only use the Parsonnet score, as it already contains the influential risk factors to characterize the individual risk of a patient. Alternatively, other risk assessment systems for cardiac surgery4,26 can also be combined with our modeling approach. As a measure of outcome, we consider the 30-day mortality. Accordingly, the binary operation outcome is expressed as for survival and for a patient’s death. Following an established procedure in statistical process monitoring, the data is split into two segments. The first 2 years 1992/93 ( operations) of the data (Phase I), are used to set up the RA control chart. The five-year 1994/98 (Phase II) monitoring period is often used to illustrate the actual monitoring of surgical performances. Note that one of the surgeon’s clinical records (#4) has only been available since mid-1997; and, it is not part of the Phase I data.
The relative frequencies of the Parsonnet score for the Phase I data are displayed in Figure 3. It is visible that the patient mix exhibits an inverse J-shape and is right-skewed. A total of 95% of the operations were carried out on patients with a risk score below . It should be noted that there are some sparsely occupied risk scores in the remaining patients after . Furthermore, there are some prominent peak fluctuations visible (e.g. . A few of these peaks are likely related to the initial construction scheme and composition of the Parsonnet score. For example, a weight of three is added to the risk score for each of the risk factors: hypertension, morbid obesity, or diabetes. Also, factors like patient age or an unbalance of gender in the study population may cause some of these peaks. In summary, we can state that the patient mix shown in Figure 3 can be described by its discrete risk scores ranging from to (full model). Note that Paynabar et al.27 argued that the Parsonnet risk scores in this data set do not follow any known probability distribution. We will elaborate in the next sections on how the patient mix can be modeled by either a continuous or a discrete parametric probability distribution.
Cardiac surgery data set showing the relative frequencies of Parsonnet scores.
Beta distribution
The univariate beta distribution,28 referred to as beta, will be our initial choice to model the patient mix, since it was already used by some researchers. Furthermore, it has a bounded interval for the random variable and is flexible to adapt to different shapes of possible patient risk score populations. The probability density function (PDF) is given by
and the cumulative distribution function (CDF) is given by
where is the complete beta function and the generalized form is the incomplete beta function. The first two moments are defined as
To obtain parameter estimates for the beta distribution, a transformation of the integer risk scores into an interval is required. In this work, the entire data’s maximum risk score, , is used to rescale with to the interval midpoints. Then, the parameters and can be determined by the Method of Moments
with , , and the total number of patients . By applying (4), we estimate and . Furthermore, we propose the continuous beta distribution also for modeling in discrete form in order to account for the underlying original data structure. By dividing the beta distribution support into intervals of equal size and applying the CDF (2) with the estimated parameters and , we obtain the related probabilities for the discrete risk scores.
Beta-binomial distribution
As a second probability distribution for modeling the patient mix, we propose the beta-binomial distribution introduced by Skellam.29 It has similar modeling flexibility as the beta distribution but uses discrete values that by design fit the Parsonnet integer risk score. It is a compound distribution where Bin with trials and is a random variable with distribution beta. The probability function is given by
The first two raw moments are given as
The point estimates for and can be obtained by the Method of Moments
with , , and the total number of patients . Note that , in this case, is the patient mix’s average risk score because, unlike the beta distribution, no rescaling of the original data is required. Comparable to the beta distribution, we set the maximum Parsonnet score . Accordingly, (6) leads to a parameterization of the beta-binomial distribution with , and .
For a first evaluation of the model fit, we compare the modeled probabilities of the two discrete models: discrete beta() and beta-binomial() distribution with the relative frequencies of the original data. Figure 4 shows a graphical summary for 95% of the Phase I data (). Despite the peak value fluctuations mentioned, both models fit quite well with the inverse J-shape, , of the empirical data (full model).
Probabilities of discrete patient mix models and observed relative frequencies for 95% of the Phase I data.
RA CUSUM and average run length calculation
The CUSUM scheme,30 a sequential procedure for rapid detection of changes, can be modified to consider specific risk factors.31–34 One of the most widely applied methods for this purpose is the RA Bernoulli CUSUM chart developed by Steiner et al.1 It is based on the log-likelihood ratio score and uses a risk model in which for patient the risk score is used as an explanatory variable. The logistic regression model
is usually applied. For the Phase I data, we obtain the regression coefficients (Maximum Likelihood) and . Subsequently, the probability of death for a patient with Parsonnet score can be calculated by the inverse function
Following the approach of Steiner et al.,1 we compute the log-likelihood ratio statistic
using risk score in (8) and outcome and test the odds ratio under the null () and alternative Hypothesis (). For , can be simplified to
The CUSUM statistics
give a signal when the upper or lower control limit is exceeded. The upper-sided CUSUM chart signals, if and the lower-sided if, . Each of the two CUSUM statistics (10), (11) can be designed and used separately to detect a specific shift in the odds ratio using in (9), for example, to detect a deterioration () or an improvement () in surgical performance. However, usually, a two-sided design is used. The control limit is generally chosen based on a performance measure. In this study, we consider the Average Run Length (ARL). It is defined as the expected number of patients until a signal is given. When the surgical process is in-control, the ARL referred to as should be high; otherwise, there would be too many false alarms. If an assignable cause is present and the process is out-of-control, the ARL referred to as should be low to signal the change in the process quickly. In the following, we consider three different methods to approximate the ARL of RA CUSUM schemes.
The first method is the Monte Carlo simulation. It allows the determination of the control charts run-length under the assumptions of both a discrete and continuous distribution of the patient mix. However, especially for large ARL values, it requires a considerable amount of time to achieve a high precision level.2,24 Therefore, it is employed in this study only to verify the accuracy of the numerical methods used.
A second established method for calculating run-length properties is the Markov chain approach. It exploits the Markov property and can be applied to either a continuous or discrete random variable. The main idea is to discretize the CUSUM interval by subintervals into a finite state space.35 The transition probabilities representing the possible changes within the control chart are combined into a transition matrix. The ARL can be computed by manipulating the matrix and solving a linear system of equations.2 The log-likelihood ratio score’s CDF derived in Appendix A is used to determine the continuous case’s transition probabilities. For the discrete case, a finite state space is generated by multiplying the irrational log-likelihood ratio scores (9) and the control limit by a large number (scaling parameter), followed by appropriate rounding to the nearest integer. A reduction of the approximation error can be achieved by increasing the number of subintervals or the scaling parameter. However, this leads to an increased matrix dimension, which corresponds to increased computational complexity. Further technical details on the Markov chain approach can be found in Appendix C.
As a third method, applicable only to the continuous case, we utilize an integral equation approach to characterize the . The basic idea is to consider a Fredholm integral equation of the second kind. It is solved numerically by the collocation method using Chebyshev polynomials of degree . The degree of the polynomials determines the size of the linear system of equations. Thus, a larger increases both the matrix dimension and the overall accuracy of the method. In some cases, the function may not be smooth over the entire CUSUM continuation region, leading to considerable inaccuracies and instabilities in the results. An extension, piece-wise collocation, can improve the approximation behavior by appropriately dividing the interval into subintervals before applying the actual collocation. However, since this procedure can increase the approximation’s stability and accuracy, it also features an enlarged matrix dimension of the linear system of equations. Further technical details on the collocation procedures can be found in Appendix B.
In the next section, the ARL approximation accuracy and stability of the individual methods, taking into account our modeling proposal’s distributions, are discussed and compared.
ARL approximation accuracy
Since our calculations are mainly based on numerical methods, we investigate the ARL approximation accuracy before further analysis.2 Webster and Pettitt7 have shown that factors such as degree of discretization of the CUSUM interval , ARL size, and patient mix can affect the ARL’s approximation stability. Moreover, this investigation is conducted because the accuracy of results reported in the literature for the same setup varied widely.24
We follow Steiner et al.’s1 initial setup of the RA CUSUMs. These are two one-sided control charts, one to detect a shift in performance by doubling the odds ratio () and one by halving the odds ratio (). It involves the Phase I data of the first 2 years, including operations, a risk model (7) with regression coefficients and and control limits for the upper and for the lower RA CUSUM. The choice of these control limits leads to large in-control ARL values, which, due to the desired accuracy, result in large linear systems of equations expressed by large matrix dimensions and are therefore computationally intensive. However, numerical methods based on the Markov chain and the piece-wise collocation method can quickly compute the ARL. Incrementally increasing the resulting system dimension allows us to check the numerical approximations’ accuracy and stability for each setup: approximation method, patient mix model, and chart to detect deterioration or improvement. Additionally, all numerical methods are validated by Monte Carlo simulations with replications.
Resulting ARL approximations for detecting deterioration are shown in Figure 5 and for improvement in Figure 6. The x-axis refers for each method to the size of the matrix dimensions used to solve the linear system of equations. It expresses the numerical effort required to achieve comparable accuracy and stability, regardless of the approximation method. In both Figures 5 and 6 (top left panel), we observe a divergence between the results reported by Steiner et al.1 and our ARL results for the full-score model. This is mainly related to the Markov chain’s matrix dimensions and rounding scheme. In Steiner et al., the ARL was computed using a simple rounding scheme and a Markov chain configuration with a matrix dimension of . This results in an ARL equal to around for each chart. In this work, however, we employ a pairwise rounding scheme that provides more stable ARL approximations and is applied up to a matrix dimension of 80,000. The resulting ARL values are and , respectively. While an accuracy issue was first mentioned in Tian et al.,18 a detailed discussion is given in the Appendix in Knoth et al.24 Comparing the ARL approximation stability for of the full model with discrete beta and beta-binomial in the upper row of Figure 5, we notice that all three methods behave similarly. They approach a slightly different level for matrix dimensions less than 20,000 relatively quickly. The lower row in Figure 5 compares approximation results for methods assuming a continuous patient mix modeled with the beta distribution. The piece-wise collocation method provides the most stable approximations. Polynomials of the degree less than leading to matrix dimensions less than are sufficient to achieve a reasonable accuracy.
In-control ARL approximation for detecting deterioration () with control limit for different methods and patient mix models (). (Upper row) Discrete risk scores, computations by Markov chain approach for patient mix based on (top left) empirical data, (top) beta-binomial and (top right) discrete beta. (Lower row) Continuous risk scores modeled with beta and approximated by (bottom left) Markov chain, (bottom) piece-wise collocation and (bottom right) full Collocation. Superimposed are Monte Carlo simulations with replications () and three standard errors ().
In-control ARL approximation for detecting improvement () with control limit for different methods and patient mix models (). (Upper row) Discrete risk scores, computations by Markov chain approach for patient mix based on (top left) empirical data, (top) beta-binomial and (top right) discrete beta(). (Lower row) Continuous risk scores modeled with beta and approximated by (bottom left) Markov chain, (bottom) piece-wise collocation and (bottom right) full collocation. Superimposed are Monte Carlo simulations with replications () and three standard errors ().
For the Markov chain method, a funnel-shaped approximation of the ARL is observed, with greater instability in the approximation behavior for smaller matrix dimensions. This suggests that a high degree of discretization of the CUSUM interval, corresponding to a large matrix dimension, is necessary to achieve the desired accuracy. The full collocation method does not approach a stable approximation, supporting the necessity of using one of these other two methods. For the detection of improvement, , shown in Figure 6, all three discrete models (full-score, beta-binomial, and discrete beta) show a similarly smooth approximation behavior for the ARL as for . Additional investigations for the numerical methods show that the beta-binomial distribution also provides stable ARL approximations for other model parameterizations. However, we observe an unstable approximation behavior for the beta distribution, especially for . This can be explained by the numerical integration challenges of the beta probability distribution. See Supplemental Material: Figures S.1 to S.6.
Sensitivity analysis of the patient mix
In this section, we first compare the full model with the model-based patient mixes ARL performance. The modeling approach is then applied to evaluate the patient mix’s sensitivity to risk distribution changes. In this way, we can determine the false alarm behavior of the monitoring scheme. Finally, we compare different patient risk distributions in their detection rates for various shifts in surgical performance.
Model comparison
The final values for the approximation methods and patient mix models studied in the previous section are summarized in Table 1. It shows that the numerical methods’ final values are close to the Monte Carlo simulation results. The maximum standard error in simulation procedures is less than . The constant gap in the ARL between the two CUSUM designs and is primarily related to the choice of specific control limits, and , from the illustrative example. To avoid this gap and enhance comparability, we calibrate all control charts in subsequent analyzes to an of . Interestingly, we observe that the resulting ARL performance of RA CUSUM charts with a patient mix modeled by the proposed probability distributions deviates only between and from the full score model. For both models, which utilize either a discrete beta or beta-binomial distribution, the deviation from the full model is about . The continuous beta distribution also exhibits only a small deviation of . This result strongly supports our proposal to model the patient mix by a suitable probability distribution.
Final in-control ARL values for patient distributions and approximation methods.
Patient distribution
Approximation method
Full model
MC-Simulation
7 388.0
6 112.1
Markov chain
7387.7
6112.1
Beta-binomial
MC-Simulation
7162.5
5907.4
Markov chain
7162.4
5908.2
Discrete beta
MC-Simulation
7163.2
5914.3
Markov chain
7162.1
5914.4
Beta
MC-Simulation
7039.9
5815.6
Markov chain
7040.3
5814.6
Piece-wise collocation
7040.5
5815.1
Collocation
7039.4
5815.1
*MC-Simulation standard error .
To investigate the effects of patient mixes on false alarm behavior and detection ability in the following sections, we choose the beta-binomial distribution, since it provides reliable and stable approximation results (see Figures 5, 6, and Table 1) as well takes into account the discrete data structure. However, almost comparable results for the following analysis can be obtained for a discrete beta distribution or a continuous beta distribution; see Supplemental Material (Figures S.7 to S.10 and Table T.1).
False alarm behavior
Several studies have indicated that a change in patient risk distribution, which is usually assumed to be constant, can affect the control chart’s performance to monitor surgical performance.16–19 To examine and quantify possible effects on the false alarm behavior (), we perform a sensitivity analysis by modeling different patient mixes with a beta-binomial distribution.
A previous study16 used the beta distribution to model the patient mix for a different data set. However, the analysis was restrictive in the number of scenarios examined (). To obtain greater insight into the ARL’s behavior concerning possible effects of changes in risk distribution, we take up the idea of Loke an Gan16 and investigate this problem on a larger scale. More specifically, we apply the Markov chain approximation method () for a beta-binomial distribution. We set up a one-sided RA CUSUM chart tuned to detect a shift in the odds ratio of and apply the risk model in (7) with and . For the true model () the target is specified to 7500. An iterative procedure is applied based on a full grid search with four-digit decimal precision to determine the corresponding control limit . Then, the ARLs of 102,771 different points on a grid with step size are calculated to determine their deviations from the target .
Figure 7 shows the resulting isolines for the and for the expected value (5) of the Parsonnet score following a beta-binomial distribution. We observe that both types of isolines extend almost parallel as straight lines. Hence, the changes in the parameters describing the risk distribution may directly affect the . These effects and their peculiarities can vary depending on the magnitude and direction of the parameter changes. For example, the in-control ARL reduces if either increases and or decreases and . This effect gets amplified when both parameters change in opposite directions ( increase while decrease). The three patterns of change described above correspond to a beta-binomial distribution, which models a risk distribution with an increased expected value. This corresponds to a patient population in which more operations are performed on average on patients with higher risk scores. Furthermore, simultaneous increases or decreases of and can have compensatory effects that only lead to relocation on the same isoline. Thus the shape of the risk distribution has changed, but the expected value (5) remains constant, see Figure 7. In addition to these compensatory effects, more pronounced changes in the magnitude of a single shape parameter can also reduce or increase in the average risk score and, therefore, consequently lead to changes in the . For example, the decrease of may outweigh that of , leading to an increase in the average risk score and reducing the in-control ARL. Inversion of the above-mentioned influences of the parameters and leads to contrary patterns of change in the risk distribution and an increased . Hence, the control chart’s false alarm behavior is impacted less by the actual shape of risk distribution but primarily by the average risk score and its change.
In-control ARLs for beta-binomial models showing isolines of ARL () and expected risk score (). The control chart is calibrated for a beta-binomial distribution () to an of .
Table 2 shows the impact of possible deviations from the first two years (Phase I) of the data set used as a calibrated reference scenario on the in-control ARL, similar to Figure 7. It further lists characteristics from different patient subsets of the original observed data and their related beta-binomial models. The upper part of the table includes risk distributions for six individual surgeons from the first two years of the data set and also for comparison, data from the remaining five years (Phase II), and the entire data set. Furthermore, we add two artificial risk mixes with extremely low and high-risk to illustrate the severity of possible effects.18 Except for one artificial patient mix, the inverse J-shaped form of the risk distribution () remains unchanged in all subgroups. The patient mix characteristics (median and average risk score) from the original empirical and the beta-binomial model are also close. Substantial deviations from the are not observed for Phase II and the entire data set, but for individual surgeons of Phase I in the range of to compared to the reference scenario. Deviations of the in-control ARL are similar for RA CUSUM control charts to detect deterioration or improvement, but slightly more pronounced in each subset for improvement, similar as in Tian et al.18 The lower part of Table 2 lists patient mixes of individual years from the year 1994 onwards to study changes of the risk distribution over time and validate potential effects. We observe that the annual patient mix is relatively constant and that only minor deviations ( to ) from the occur.
Patient distribution characteristics (average and median risk score) and their corresponding in-control ARLs.
Patient distribution
Beta-binomial
Empirical data
Parameter
Average
Median
Average
Median
Cases
Artificial high-risk
19.4
17
4342.0
3 983.0
Surgeon #2
12.5
10
6062.8
5902.2
12.4
10
287
Surgeon #1
11.3
8
6466.0
6255.3
11.3
7
565
Surgeon #7
10.5
7
6816.9
6793.6
10.5
8
260
Phase II
9.8
6
7134.8
7152.9
9.8
7
4776
Surgeon #3
9.6
6
7176.5
7130.9
9.6
6
324
Complete data
9.5
6
7235.7
7246.6
9.5
7
6994
Phase I*
8.9
5
7500.5
7500.3
8.9
6
2218
Surgeon #6
5.5
3
9731.5
10276.3
5.6
3
474
Surgeon #5
4.3
2
10759.2
11523.1
4.4
3
308
Artificial low-risk
2.6
1
12433.5
13483.3
Single year 1994
10.5
7
6761.9
6641.8
10.6
7
969
Single year 1995
9.8
7
7073.2
7027.8
9.9
7
1134
Single year 1996
10.7
7
6713.1
6661.4
10.7
8
877
Single year 1997
9.4
7
7382.6
7536.0
9.4
7
950
Single year 1998
8.3
6
7974.4
8241.0
8.3
6
846
*Reference scenario.
Next, we extend our investigations on the same parameter scale as before and vary the CUSUM design parameter , which controls the CUSUM chart’s sensitivity to detect a certain change in the odds ratio. Figure 8 shows the resulting isolines for control charts designed to detect small, medium, and large shifts in surgical performance () with and () with , respectively. We note that the observed patterns of isolines are similar to Figure 7 for all six charts independently of the considered shift size . The choice of a CUSUM tuning parameter closer to one, , leads to larger deviations in the (narrower isolines) when the patient mix has changed. Furthermore, the choice of a larger leads to smaller deviations in the ARL profiles (wider isolines). This effect is somewhat less pronounced for detecting improvement .
In-control ARLs for beta-binomial models showing isolines of ARLs (—) for different out-of-control shift sizes . All six charts are calibrated for a beta-binomial distribution () to an of .
Based on our results, we confirm Tian et al.’s18 findings, namely that the can vary considerably between different risk distributions and that there is a decreasing trend between and average risk score. However, justified by our study’s scope, we can further generalize them. Moreover, by employing the isolines, we show a consistent (negative) relationship between the average risk score and and that this relationship is valid independently of the design shift magnitude .
Detection ability
Since the false alarm behavior of RA CUSUM charts can be significantly influenced by the patient mix, knowledge of the ability and speed of detection of changes in surgical performance, taking the risk distribution into account, is of valuable interest. Previous performance evaluations in the literature20,21,23 limited their examinations of out-of-control ARL () effects to a particular changed patient mix after online monitoring with the control chart had already started. In this scenario, regular recalibration of control limits is usually suggested to adapt to the new patient risk mix. However, comparisons of different patient risk distributions that may occur in different hospitals for the same type of surgery and assessing of their individual detection rate have not yet been performed.
Given the risk model in (7) with coefficients and , we compare the performance of different patient populations modeled with the beta-binomial distribution. Again, Markov chain approximations () are used to calculate the ARLs of five different risk distributions. Three are derived from the original data, the complete Phase I, the two surgeons with the most extreme patient mix (Surgeon #2 and #5), and two artificial patient mixes with more extreme risk distributions, see Section 6.2. Various surgical performance levels can be studied by modifying the odds ratio with
where characterizes improvement and deterioration. In (12), the probability of death after surgery of a patient with a Parsonnet score is given by (8) if performed by an average surgeon estimated from in-control data and if performed by a surgeon with a changed surgical performance . Subsequently, these modified probabilities are then used to calculate the log-likelihood ratio statistic in (9) and the CUSUM scores in (10), (11).
Figure 9 depicts the resulting ARL profiles of one-sided RA CUSUM charts for and . For comparison, the in-control ARL of all charts is set to . Different levels of are shown on a grid from to with stepsize . We note that for very tiny changes (), there are only minor differences in the detection of changes in surgical performance between different patient mixes. More considerable changes in the odds ratio, characterized by a marked improvement or deterioration in surgical performance, are detected more quickly in RA CUSUM charts monitoring a higher risk patient mix. The spread of the for specifc changes in the odds ratio can be more than twice as high, depending on the patient mix’s risk distribution.
Out-of-control ARLs indicated on the logarithmic scale of one-sided RA CUSUM charts for five beta-binomial models and different levels of surgical performance . The left panel shows results for detecting deterioration and the right panel for detecting improvement in surgical performance.
A summary of the setting is given in Table 3. From the performance evaluation, we conclude that RA CUSUM control charts’ detection speed depends, similar to the , mainly on the patient mix’s average risk score. Consequently, a change in surgical performance is more likely to be detected more quickly in a patient mix with a higher average risk score.
Out-of-control ARL for different patient risk distributions.
Patient distribution
Beta-binomial
Artificial low-risk
4.0636
296
3.6770
601
Surgeon #5
4.2001
267
3.8221
536
Phase I
4.5443
209
4.2252
378
Surgeon #2
4.7494
179
4.4536
312
Artificial high-risk
5.0736
142
4.8326
224
Simulation study
In this section, the model-based approach is utilized to develop alternative scenarios and highlight potential patient mix-related challenges when surgical performance is monitored online. For Phase I (training data), we assume a beta-binomial() distribution to represent the patient mix and a risk model with coefficients and . Two control charts with the control limits and are set up, giving an in-control ARL of each chart. Phase II (monitoring data) is generated by the model-based approach, and a change-point is introduced at patient number . This divides the data into two parts: in-control, , and out-of-control, , where either a shift in the patient mix, surgical performance , or both occurs:
Two patient mixes of explicit lower-risk and higher-risk compared to the in-control patient mix were selected to demonstrate the effects of a single step change in risk distribution on the control charting procedure. However, we emphasize that the model-based approach allows investigations of any parameterization of patient mixes and shift patterns, i.e., multilevel or linear changes.
Figure 10 shows the online monitoring of surgical performance in three different scenarios with RA CUSUM charts. The upper CUSUM charts are designed to detect deterioration, and the lower ones to detect an improvement in surgical performance. The left panel (Scenario A) displays the first signal at patient , correctly indicating a deterioration in performance. In contrast, the middle panel (Scenario B) incorrectly signals performance deterioration at patient , although the actual performance () has not changed, only the distribution to higher-risk patients. Furthermore, it is also possible to fully miss a change in performance. Scenario C’s control charts do not show a change in performance because the actual performance shift () overlays with a patient mix change to lower-risk patients. These examples clearly illustrate that it is challenging to distinguish whether the given signal is caused by an actual change in surgical performance or the patient mix. It also highlights the RA CUSUM charts’ fundamental problem namely its non-adaptability to risk distribution changes. Therefore, inferences from RA CUSUM control charts alone may be misleading or incorrect without considering the patient mix’s influence.16
Online monitoring of surgical performance with RA CUSUM charts is displayed for three different scenarios. Scenario A (left panel) correctly signals a performance deterioration. Scenario B (middle panel) incorrectly signals performance deterioration. In scenario C (right panel), an actual change in performance is not detected.
Discussion
In this research, we propose a framework to flexibly model the patient risk score population using different probability distributions and investigate the effects of patient mix changes. Our study finds that both a beta-binomial and a discretized beta distribution are well suited to model the patient mix by the discrete risk scoring system’s underlying properties. However, a continuous beta-distributed risk score yields almost comparable results. In order to obtain reliable and precise results of the control chart performance measure ARL for the considered distributions, various numerical methods were applied and their approximation accuracy was checked. Because we employ a flexible parametric model instead of the complete empirical distribution, data availability is not an issue, and various patient mix changes can be analyzed. The sensitivity analysis of the patient mix with more than different scenarios in Section 6 showed that the control chart’s false alarm behavior is less influenced by the actual shape of the risk distribution but primarily by the average risk score and its change. Specifically, the parallel isolines of in-control ARL and expected risk score exhibit a consistent negative relationship. This supports the application of a joint monitoring scheme of patient mix and surgical risk.16
We show that chart parameters, in this article, the control limit, based solely on Phase I patient mix data, should be used cautiously as Phase II data may differ from Phase I data. Accordingly, as shown in Section 6.2, the actual false alarm behavior can be severely affected. An alternative approach for setting up the control chart is to choose the desired and then select the control limit that allows for such a false alarm rate. These Dynamic Probability Control Limits (DPCL) introduced by Zhang and Woodall36 in a RA setting can maintain the prespecified throughout the whole monitoring period and are by design robust to patient mix changes. However, our proposed approach does not fix the issue of keeping the false alarm rate constant in the monitoring period. On the contrary, it highlights the classical’s RA Bernoulli CUSUM chart’s fundamental problem: the non-adaptability to changes in risk distribution from Phase I (training data) to Phase II (monitoring data). Thus, as shown in Section 7, conclusions from RA CUSUM charts alone that do not consider the patient mix’s influence or change can be misleading or incorrect. We recommend that users of RA CUSUM charts either apply DPCLs or a joint monitoring scheme of the surgical risk and the patient risk distribution.
The proposed modeling approach provides versatile opportunities for future research and extensions. On the one hand, it can be adapted for other CUSUM-type methods such as the RA CUSUM based on multiresponses19,37 or the CUSUM.38 This also offers users of the variable life-adjusted display33 an application option. On the other hand, it can be a starting point for the development of new adaptive self-starting RA CUSUM charts that do not rely on Phase I data.
Supplemental Material
sj-pdf-1-smm-10.1177_09622802211053205 - Supplemental material for Modeling the patient mix for risk-adjusted CUSUM charts
Supplemental material, sj-pdf-1-smm-10.1177_09622802211053205 for Modeling the patient mix for risk-adjusted CUSUM charts by PhilippWittenberg in Statistical Methods in Medical Research
Footnotes
Acknowledgements
The author thanks the anonymous reviewer and the editor for their constructive comments, which improved the the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
ORCID iD
Philipp Wittenberg
Supplemental material
Supplementary material for this article is available online.
Appendices
References
1.
SteinerSHCookRJFarewellVT, et al. Monitoring surgical performance using risk-adjusted cumulative sum charts. Biostatistics2000; 1: 441–452.
2.
GriggOAFarewellVTSpiegelhalterDJ. Use of risk-adjusted CUSUM and RSPRTcharts for monitoring in medical contexts. Stat Methods Med Res2003; 12: 147–170.
3.
WoodallWH. The use of control charts in health-care and public-health surveillance. J Qual Technol2006; 38: 89–104.
4.
WoodallWHFogelSLSteinerSH. The monitoring and improvement of surgical-outcome quality. J Qual Technol2015; 47: 383–399.
5.
SachlasABersimisSPsarakisS. Risk-adjusted control charts: Theory, methods, and applications in health. Stat Biosci2019; 11: 630–658.
6.
HusseinAKasemANkurunzizaS, et al. Performance of risk-adjusted cumulative sum charts when some assumptions are not met. Commun Stat Simul Comput2015; 46: 823–830.
7.
WebsterRAPettittAN. Stability of approximations of average run length of risk-adjusted CUSUM schemes using the Markov approach: Comparing two methods of calculating transition probabilities. Commun Stat Simul Comput2007; 36: 471–482.
8.
ZengLZhouS. A Bayesian approach to risk-adjusted outcome monitoring in healthcare. Stat Med2011; 30: 3431–3446.
9.
LiuLLaiXZhangJ, et al. Online profile monitoring for surgical outcomes using a weighted score test. J Qual Technol2018; 50: 88–97.
10.
LiuJLaiXWangJ, et al. A fast online monitoring approach for surgical risks. Math Biosci Eng2020; 17: 3130–3146.
11.
SegoLHReynoldsMRWoodallWH. Risk-adjusted monitoring of survival times. Stat Med2009; 28: 1386–1401.
12.
SteinerSHJonesM. Risk-adjusted survival time monitoring with an updating exponentially weighted moving average (EWMA) control chart. Stat Med2009; 29: 444–454.
13.
SteinerSHMackayRJ. Monitoring risk-adjusted medical outcomes allowing for changes over time. Biostatistics2014; 15: 665–676.
14.
GriggOSpiegelhalterD. A simple risk-adjusted exponentially weighted moving average. J Am Stat Assoc2007; 102: 140–152.
15.
GanFFTanT. Risk-adjusted number-between failures charting procedures for monitoring a patient care process for acute myocardial infarctions. Health Care Manag Sci2010; 13: 222–233.
16.
LokeCKGanFF. Joint monitoring scheme for clinical failures and predisposed risks. Qual Technol Quant Manag2012; 9: 3–21.
17.
RogersCAReevesBCCaputoM, et al. Control chart methods for monitoring cardiac surgical performance and their interpretation. J Thorac Cardiovasc Surg2004; 128: 811–819.
18.
TianWSunHZhangX, et al. The impact of varying patient populations on the in-control performance of the risk-adjusted CUSUM chart. Int J Qual Health Care2015; 27: 31–36.
19.
TangXGanFFZhangL. Risk-adjusted cumulative sum charting procedure based on multiresponses. J Am Stat Assoc2015; 110: 16–26.
20.
SteinerSCookRFarewellV. Risk-adjusted monitoring of binary surgical outcomes. Med Decis Making2001; 21: 163–169.
21.
ChangTC. Cumulative sum schemes for surgical performance monitoring. J R Stat Soc Ser A Stat Soc2008; 171: 407–432.
22.
JonesMASteinerSH. Assessing the effect of estimation error on risk-adjusted CUSUM chart performance. Int J Qual Health Care2011; 24: 176–181.
23.
RossiGSartoSDMarchiM. A new risk-adjusted Bernoulli cumulative sum chart for monitoring binary health data. Stat Methods Med Res2016; 25: 2704–2713.
24.
KnothSWittenbergPGanFF. Risk-adjusted CUSUM charts under model error. Stat Med2019; 38: 2206–2218.
25.
ParsonnetVDeanDBernsteinA. A method of uniform stratification of risk for evaluating the results of surgery in acquired adult heart disease. Circulation1989; 79: I3–I12.
26.
ThaljiNMSuriRMGreasonKL, et al. Risk assessment methods for cardiac surgery and intervention. Nat Rev Cardiol2014; 11: 704–714.
27.
PaynabarKJinJJYehAB. Phase I risk-adjusted control charts for monitoring surgical performance by considering categorical covariates. J Qual Technol2012; 44: 39–53.
28.
JohnsonNLKotzSBalakrishnanN. Beta distributions. In: Continuous univariate distributions, Vol. 2. 2nd ed. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons, 1995, pp.210–275.
29.
SkellamJG. A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. J R Stat Soc Series B Stat Methodol1948; 10: 257–261.
LieRTHeuchIIrgensLM. A new sequential procedure for surveillance of Down's syndrome. Stat Med1993; 12: 13–25.
32.
de LevalMRFrançoisKBullC, et al. Analysis of a cluster of surgical failures: Application to a series of neonatal arterial switch operations. J Thorac Cardiovasc Surg1994; 107: 914–924.
33.
LovegroveJValenciaOTreasureT, et al. Monitoring the results of cardiac surgery by variable life-adjusted display. Lancet1997; 350: 1128–1130.
34.
PolonieckiJValenciaOLittlejohnsP. Cumulative risk adjusted mortality chart for detecting changes in death rate: observational study of heart surgery. BMJ1998; 316: 1697–1700.
35.
HawkinsDMOlwellDH. Cumulative sum charts and charting for auality improvement. New York: Springer, 1998.
36.
ZhangXWoodallWH. Dynamic probability control limits for risk-adjusted Bernoulli CUSUM charts. Stat Med2015; 34: 3336–3348.
37.
GanFFTangXZhuY, et al. Monitoring the quality of cardiac surgery based on three or more surgical outcomes using a new variable life-adjusted display. Int J Qual Health Care2017; 29: 427–432.
38.
WittenbergPGanFFKnothS. A simple signaling rule for variable life-adjusted display derived from an equivalent risk-adjusted CUSUM chart. Stat Med2018; 37: 2455–2473.
39.
AtkinsonKE. A survey of numerical methods for the solution of Fredholm Integral Equations of the second kind. Philadelphia: Society for Industrial and Applied Mathematics, 1976.
40.
KnothS. Computation of the ARL for CUSUM- schemes. Comput Stat Data Anal2006; 51: 499–512.
41.
ShuLHuangWSuY, et al. Computation of the run-length percentiles of CUSUM control charts under changes in variances. J Stat Comput Simul2013; 83: 1238–1251.
42.
HuangWShuLJiangW, et al. Evaluation of run-length distribution for cusum charts under gamma distributions. IIE Trans2013; 45: 981–994.
43.
KnothS. On the calculation of the ARL for beta EWMA control charts. In: Knoth S and Schmid W (eds) Frontiers in statistical quality control 13. ISQC 2019. Cham: Springer, 2021, pp.25–44.
44.
BrookDEvansDA. An approach to the probability distribution of CUSUM run length. Biometrika1972; 59: 539–549.
45.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2021. URL https://www.R-project.org/.
46.
WittenbergPKnothS. vlad: Variable Life Adjusted Display and Other Risk-Adjusted Quality Control Charts, 2021. URL https://CRAN.R-project.org/package=vlad. R package version: 0.2.2.
47.
EddelbuettelDFrançoisR. Rcpp: Seamless R and C++ integration. J Stat Softw2011; 40: 1–18.
48.
EddelbuettelDSandersonC. RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Comput Stat Data Anal2014; 71: 1054–1063.
49.
EddelbuettelDEmersonJWKaneMJ. BH: Boost C++ Header Files – Metapackage, 2018. R package version 1.66.0-1.
50.
WickhamH. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org.
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.