
Abstract

The ctl command implements a recently proposed statistical methodology to assess the level of agreement or interchangeability between two quantitative measurement methods. It is based on tolerance limits that are specified by the user. The methodology requires repeated measurements by at least one of the two measurement methods. It accommodates heteroskedastic measurement errors and often performs well even when the user has only one measurement by one of the two measurement methods and at least five repeated measurements from the other. It provides a more direct assessment of the agreement level than the Bland-Altman limits of agreement method and circumvents some of its deficiencies.

Keywords

st0785, ctl, agreement, tolerance limits, differential bias, proportional bias, method comparison

1 Introduction

In clinical research, when the characteristic of interest is continuous, the Bland-Altman limits of agreement (LOA) method (1986) is frequently used to assess the agreement or interchangeability between two measurement methods (for example, a new method to be compared with a standard one). A typical case is assessing the agreement between a new blood pressure device and a standard one. The Bland-Altman methodology, however, does not directly quantify the level of agreement; rather, one appraises the amount of disagreement by inspecting the distance between the lower and upper limits of agreement.

To quantify the level of agreement more directly, Lin et al. (2002) proposed the concept of “coverage probability”, defined as the probability that the absolute difference between the measurements made by the two methods on the same subject is less than a predefined value. Their methodology, however, does not account for the fact that the level of agreement might not be constant but may depend on the value of the true underlying latent trait (note that because of measurement errors, the measured trait always differs from the true latent trait, for example, the true blood pressure level), and it allows one to assess only the overall agreement level. In addition, the variance of the measurement errors is implicitly assumed to be constant (that is, homoskedastic), which is often too strong an assumption (in empirical applications, the variability of a trait is often observed to be larger for larger values of the trait; see Taffé [2021] and Taffé et al. [2022]). Also, a possible bias of the new measurement method is not assessed. For these reasons, Stevens, Steiner, and MacKay (2017) extended this methodology to allow the coverage probability to depend on the value of the latent trait and on the amount of bias. They later extended their methodology further (2018) to allow for heteroskedastic measurement errors (that is, to allow the variance of the measurement errors to depend on the true underlying latent trait) and called their extended agreement concept “probability of agreement”.

However, several important limitations of the Stevens, Steiner, and MacKay (2017, 2018) methodology have been identified (a strongly parametric specification, constant tolerance limits, pointwise rather than simultaneous confidence bands around the probability of agreement, no simulations, etc.), and a new statistical methodology that overcomes these shortcomings has recently been developed (Taffé 2023). This new methodology still builds on the coverage probability concept of Lin et al. (2002) and the probability of agreement concept of Stevens, Steiner, and MacKay (2018), but no parametric assumption is made regarding the distribution of the true underlying latent trait. Rather, the latent trait is estimated by an empirical Bayes approach. In addition, both pointwise and simultaneous confidence bands around the conditional agreement curve have been developed, so investigators may adopt either one according to their inference goal.

In this article, I will briefly present the theory and illustrate the use of the new ctl command. The user will have to specify the slope and intercept of the tolerance limits, and the level of agreement will be quantified by the conditional probability of agreement. The latter shows the level of agreement (measured on the probability scale) between the two measurement methods, given the expected value of the individual latent trait. Two new graphs will be introduced: the tolerance limits plot, which allows one to visualize the user-defined clinical tolerance limits, and the conditional probability of agreement plot, which shows the level of agreement between the two measurement methods given the expected value of the individual latent trait. The methodology requires repeated measurements taken on each individual for at least one of the two measurement methods. Regarding this last remark, let me emphasize that without repeated measurements by at least one of the two instruments, one cannot distinguish the differential bias from the proportional bias or account for the heterogeneity of the measurement error variances, which results in biased estimates when using the conventional Bland-Altman methodology (Taffé 2021).

Note that at least three other commands are available to assess the agreement or interchangeability between two measurement methods: concord (Steichen and Cox 1998), biasplot (Taffé et al. 2017, 2023), and blandaltman (Chatfield et al. 2023). The ctl command differs fundamentally from these three commands in that the clinical tolerance limits are first specified based on clinical appraisal, and the level of agreement (that is, the conditional probability of agreement) is then computed. With the three other commands, the scatterplot of the differences is generated first, and the LOA are then computed.

2 Methodology of the clinical tolerance limits

Consider the general measurement error model, where method 2 is considered the reference standard:

\begin{aligned} y_{1ij} &= α_{1} + β_{1} x_{i} + ε_{1ij}, & ε_{1ij} \mid x_{i} &\sim N\{0, σ_{ε_{1}}^{2}(x_{i}; θ_{1})\} \\ y_{2ij} &= x_{i} + ε_{2ij}, & ε_{2ij} \mid x_{i} &\sim N\{0, σ_{ε_{2}}^{2}(x_{i}; θ_{2})\} \\ x_{i} &\sim f_{x}(μ_{x}, σ_{x}^{2}) \end{aligned} \tag{1}

where $y_{1ij}$ is the $j$th repeated measurement by method 1 on individual $i$, with $j = 1, \dots, n_{1i}$ and $i = 1, \dots, N$; $y_{2ij}$ is the $j$th measurement by method 2 on individual $i$, with $j = 1, \dots, n_{2i}$, to allow different numbers of repeated measurements per individual by the two instruments; $x_{i}$ is a latent variable with density $f_{x}$ representing the true unknown trait; and $ε_{1ij}$ and $ε_{2ij}$ represent the measurement errors of methods 1 and 2.

The $α_{1}$ parameter measures the differential bias and the $β_{1}$ parameter the proportional bias. It is assumed that the latent variable $x_{i}$ represents the true unknown but constant value of the trait for individual i.

It is assumed that the variances of these errors, that is, $σ_{ε_{1}}^{2} (x_{i}; θ_{1})$ and $σ_{ε_{2}}^{2} (x_{i}; θ_{2})$ , are heteroskedastic and increase with the level of the true latent trait $x_{i}$ (Taffé 2018):

\begin{aligned} σ_{ε_{1}} (x_{i}; θ_{1}) = ({\hat{θ}}_{1}^{(0)} + {\hat{θ}}_{1}^{(1)} x_{i}) \sqrt{π / 2} \\ σ_{ε_{2}} (x_{i}; θ_{2}) = ({\hat{θ}}_{2}^{(0)} + {\hat{θ}}_{2}^{(1)} x_{i}) \sqrt{π / 2} \end{aligned}
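One way to read the $\sqrt{π/2}$ factor (an interpretation, not stated explicitly above): if $ε \sim N(0, σ^{2})$, then $E|ε| = σ\sqrt{2/π}$, so a linear model $θ^{(0)} + θ^{(1)} x_{i}$ for the mean absolute error translates into $σ(x_{i}) = (θ^{(0)} + θ^{(1)} x_{i})\sqrt{π/2}$. The Python sketch below checks this identity by simulation; the function names and the θ values are hypothetical and chosen only for illustration.

```python
import math
import random


def error_sd(x, theta0, theta1):
    """Measurement-error SD at latent value x, assuming the mean absolute error
    is linear in x, E|eps| = theta0 + theta1 * x (hypothetical helper)."""
    return (theta0 + theta1 * x) * math.sqrt(math.pi / 2)


def mean_abs_error(sd, n_draws=200_000, seed=1):
    """Monte Carlo estimate of E|eps| for eps ~ N(0, sd**2)."""
    rng = random.Random(seed)
    return sum(abs(rng.gauss(0.0, sd)) for _ in range(n_draws)) / n_draws


# Made-up coefficients for illustration: E|eps| = 1.0 + 0.05 * x
theta0, theta1, x = 1.0, 0.05, 40.0
sd = error_sd(x, theta0, theta1)
print(f"implied SD at x = {x:g}: {sd:.3f}")
print(f"E|eps| from the linear model: {theta0 + theta1 * x:.3f}")
print(f"E|eps| by simulation:         {mean_abs_error(sd):.3f}")
```

The simulated mean absolute error should match the linear-model value up to Monte Carlo noise, which is exactly what the $\sqrt{π/2}$ rescaling is meant to guarantee.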

Note that this measurement error model differs slightly from the classical measurement error model (Bland and Altman 1999; Dunn 2004) in that the heteroskedasticity depends on the true latent trait and not on the observed average $(y_{1i} + y_{2i}) / 2$. This is conceptually more appealing because it links the variance of the measurement errors to the value of the latent trait (a relationship often observed in practice, where the variance is small for low values of the trait and larger for high values, although the opposite could also hold).

2.1 Computation of the conditional and overall or marginal agreement

Consider the differences $d_{ij} = y_{1ij} - y_{2ij}$, and assume that lower $C_{L}(x_{i})$ and upper $C_{U}(x_{i})$ clinical tolerance limits, which may depend on the true latent trait, have been defined a priori, that is, before seeing the data, based on clinical considerations. One simple option is to set constant values that do not depend on the true latent trait:

\begin{aligned} C_{L} (x_{i}) = a \\ C_{U} (x_{i}) = b \end{aligned}

Alternatively, one may define limits that depend on the true latent trait in a specific form, for example, linear:

\begin{aligned} C_{L} (x_{i}) = - a - b x_{i} \\ C_{U} (x_{i}) = a + b x_{i} \end{aligned}

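Here, $a$ represents the upper tolerance limit for a zero value of the latent trait (its smallest value when $b \geq 0$ and $x_{i} \geq 0$), and $b$ is the proportion of the latent trait added to this limit as the trait increases; for example, $b = 0.15$ adds 15% of $x_{i}$.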
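Because ctl takes only an intercept and a slope and derives the lower limit by a negative sign (see the intercept() and slope() options in section 3.2), the symmetric linear limits above can be sketched as a small Python helper. The name `tolerance_limits` is hypothetical and not part of ctl; a slope of 0 recovers constant limits of $\pm a$.

```python
def tolerance_limits(x, intercept, slope):
    """Return (C_L(x), C_U(x)) for symmetric linear limits:
    C_U(x) = intercept + slope * x and C_L(x) = -C_U(x)."""
    upper = intercept + slope * x
    return -upper, upper


# Constant limits of +/-5 (a = 5, b = 0) versus funnel-shaped limits of +/-15% of x (a = 0, b = 0.15)
for x in (10.0, 50.0, 100.0):
    print(x, tolerance_limits(x, 5.0, 0.0), tolerance_limits(x, 0.0, 0.15))
```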

Given model (1) and the variance specification above, the conditional probability of agreement is computed as

P\{C_{L}(x_{i}) < d_{ij} < C_{U}(x_{i}) \mid x_{i}\} \equiv π(x_{i})

This is the probability that the difference $d_{ij}$ lies between the two tolerance limits, $C_{L}(x_{i})$ and $C_{U}(x_{i})$, for a specific value $x_{i}$ of the latent trait.
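Under model (1), $d_{ij} = α_{1} + (β_{1} - 1)x_{i} + ε_{1ij} - ε_{2ij}$. If one additionally assumes that the two measurement errors are independent (an assumption made for this sketch, not spelled out above), then $d_{ij} \mid x_{i}$ is normal with mean $α_{1} + (β_{1} - 1)x_{i}$ and variance $σ_{ε_{1}}^{2}(x_{i}) + σ_{ε_{2}}^{2}(x_{i})$, so $π(x_{i})$ is a difference of two normal CDFs. The minimal Python sketch below, with hypothetical names, evaluates it at known parameter values; ctl itself works with estimated parameters and a prediction of $x_{i}$.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def conditional_agreement(x, alpha1, beta1, sd1, sd2, lower, upper):
    """pi(x) = P{C_L(x) < d < C_U(x) | x}, assuming
    d | x ~ N(alpha1 + (beta1 - 1) * x, sd1(x)**2 + sd2(x)**2)."""
    mean_d = alpha1 + (beta1 - 1.0) * x
    sd_d = (sd1(x) ** 2 + sd2(x) ** 2) ** 0.5
    return PHI((upper(x) - mean_d) / sd_d) - PHI((lower(x) - mean_d) / sd_d)


# Sanity check: no bias, unit error SDs, limits of +/-1.96 * sqrt(2) give pi(x) close to 0.95
half_width = 1.96 * 2 ** 0.5
p = conditional_agreement(
    x=50.0, alpha1=0.0, beta1=1.0,
    sd1=lambda x: 1.0, sd2=lambda x: 1.0,
    lower=lambda x: -half_width, upper=lambda x: half_width,
)
print(round(p, 3))
```

Passing the limits and error SDs as functions of $x$ keeps the same code valid for constant, linear, or any other user-defined tolerance limits.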

The ctl command will also compute the overall or marginal agreement,

P\{C_{L}(x_{i}) < d_{ij} < C_{U}(x_{i})\} \equiv π

which represents the overall probability of agreement for all the measurements.
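The conclusions of this article state that the marginal agreement is estimated simply as the proportion of observations lying between the two tolerance limits, and example 3 explains that the unobserved $x_{i}$ is replaced by its BLUP when the limits are evaluated. Under those two points, the computation reduces to a count. A Python sketch with hypothetical names and made-up inputs:

```python
def marginal_agreement(differences, predicted_x, intercept, slope):
    """Share of differences d_ij strictly inside (-(a + b*xhat), a + b*xhat),
    where xhat is the predicted latent trait (e.g., BLUP) of the matching individual."""
    if len(differences) != len(predicted_x) or not differences:
        raise ValueError("need equally long, non-empty inputs")
    inside = 0
    for d, xhat in zip(differences, predicted_x):
        upper = intercept + slope * xhat
        if -upper < d < upper:
            inside += 1
    return inside / len(differences)


d = [2.0, -2.0, 4.0, 0.5]
xhat = [10.0, 20.0, 30.0, 40.0]
print(marginal_agreement(d, xhat, intercept=5.0, slope=0.0))   # constant limits +/-5: 1.0
print(marginal_agreement(d, xhat, intercept=0.0, slope=0.15))  # funnel limits +/-15% of xhat: 0.75
```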

2.2 Estimation of the model parameters

Taffé (2018) developed a two-step method to estimate the parameters of (1). Because of the complexity of the model, inference is based on simulation (Taffé 2023). The number of simulations, set by the nbsimul() option of the ctl command, defaults to 1,000, which should be sufficient to compute the required critical value in most cases. The seed() option allows the user to change the seed used by the random-number generator from its default value (seed(123456789)).
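The simulation scheme behind ctl's confidence bands is not detailed here. As a generic illustration only, not ctl's implementation, a simultaneous critical value can be obtained by simulating standardized deviations at a grid of latent-trait values and taking the 95th percentile of their maximum absolute value; in the sketch, `n_sim` and `seed` play roles analogous to nbsimul() and seed(). It assumes independent standard normal deviations across the grid, whereas estimates at neighboring points of a smooth curve are positively correlated, which tends to lower the critical value.

```python
import random
from statistics import NormalDist


def simultaneous_critical_value(n_points, n_sim=1000, level=0.95, seed=123456789):
    """Empirical `level` quantile of max_k |Z_k| over n_points grid points,
    with the Z_k drawn as independent N(0, 1) (simplifying assumption)."""
    rng = random.Random(seed)
    maxima = sorted(
        max(abs(rng.gauss(0.0, 1.0)) for _ in range(n_points)) for _ in range(n_sim)
    )
    index = min(n_sim - 1, int(level * n_sim))  # simple empirical quantile
    return maxima[index]


K = 10
c_sim = simultaneous_critical_value(K)
# Exact value under independence: P(max_k |Z_k| <= c) = (2 * Phi(c) - 1) ** K = 0.95
c_exact = NormalDist().inv_cdf((1 + 0.95 ** (1 / K)) / 2)
print(f"pointwise: 1.960, simulated simultaneous: {c_sim:.3f}, exact (independent): {c_exact:.3f}")
```

With more simulations, the empirical quantile settles closer to its target, which is the trade-off the nbsimul() default of 1,000 balances against computing time.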

3 The ctl command

The ctl command makes the clinical tolerance limits methodology (Taffé 2023) available to Stata users. The user first specifies the desired intercept and slope of the lines defining the clinical tolerance limits and then chooses which graph to generate: the tolerance limits plot or the conditional probability of agreement plot.

3.1 Syntax

The syntax for using ctl is

ctl [if] [in], idvar(varname) ynew(varname) yref(varname)
[intercept(real) slope(real) cpaplot tlplot pointwise simultaneous pdf
results nbsimul(integer) seed(integer)]

When the goal is to assess the agreement level at a specific value of the latent trait, a pointwise confidence interval suffices because, in repeated samples, 95% of such intervals cover the true value at that point. However, when interest lies in several points of the support or in the whole curve, a simultaneous confidence band is required because it guarantees the nominal joint coverage rate regardless of the number of points considered.
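The multiple-testing issue can be quantified with a short Monte Carlo sketch. It is illustrative only and assumes that the standardized estimation errors at K points are independent standard normals, which exaggerates the problem relative to a smooth curve. Pointwise ±1.96 intervals at K points all cover their targets only about $0.95^{K}$ of the time, whereas intervals built with the critical value $c = Φ^{-1}\{(1 + 0.95^{1/K})/2\}$ keep joint coverage near 95%.

```python
import random
from statistics import NormalDist


def joint_coverage(n_points, crit, n_sim=20000, seed=2023):
    """Fraction of replicates in which |Z_k| <= crit holds at every one of n_points points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        if all(abs(rng.gauss(0.0, 1.0)) <= crit for _ in range(n_points)):
            hits += 1
    return hits / n_sim


K = 10
z_pointwise = NormalDist().inv_cdf(0.975)                           # about 1.96
z_simultaneous = NormalDist().inv_cdf((1 + 0.95 ** (1 / K)) / 2)    # exact under independence
print(f"pointwise:    joint coverage {joint_coverage(K, z_pointwise):.3f} (theory {0.95 ** K:.3f})")
print(f"simultaneous: joint coverage {joint_coverage(K, z_simultaneous):.3f}")
```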

3.2 Options

idvar(varname) defines the variable identifying the individual. idvar() is required.

ynew(varname) specifies the new measurement method. ynew() is required.

yref(varname) specifies the reference standard method. yref() is required.

intercept(real) specifies the desired intercept of the upper clinical tolerance limit. The lower limit is automatically defined by a negative sign.

slope(real) specifies the desired slope of the upper clinical tolerance limit. The lower limit is automatically defined by a negative sign.

cpaplot graphs the conditional probability of agreement plot. By default, ctl will compute and graph both a pointwise and a simultaneous 95% confidence band around the conditional probability of agreement.

tlplot graphs the clinical tolerance limits plot.

pointwise graphs a pointwise confidence band only. By specifying either the pointwise option or the simultaneous option, the user can select the preferred inference goal.

simultaneous graphs a simultaneous confidence band only.

pdf saves the graphs in PDF (instead of Stata’s .gph format).

results generates a file called ctl_results.dta, which contains the original data plus the estimates computed by the program (all the variables are prefixed by my_, as in my_varname).

nbsimul(integer) allows one to change the default value (that is, nbsimul(1000)) of the number of simulations carried out to compute the confidence bands. For example, to set the number of simulations to 2,000, specify nbsimul(2000).

seed(integer) allows one to set the seed for random-number generation. The default is seed(123456789).

3.3 Stored results

ctl stores the following in r():

4 Numerical examples

4.1 Example 1

To illustrate the use of the ctl command, we will consider the following simulated dataset:

The first column, id, represents the individual; the second, t, the index for the repeated measurements; the third, y1, the measurements made by method 1; and the fourth, y2, the measurements made by method 2.

The data have been generated using the simulation model¹

\begin{aligned} y_{1ij} &= 4 + 0.8 x_{i} + ε_{1ij}, & ε_{1ij} \mid x_{i} &\sim N\{0, (0.2 x_{i})^{2}\} \\ y_{2ij} &= x_{i} + ε_{2ij}, & ε_{2ij} \mid x_{i} &\sim N\{0, (1.75 + 0.08 x_{i})^{2}\} \\ x_{i} &\sim \text{Uniform}[10, 100] \end{aligned}

where $i = 1, \dots, 100$ indexes the individuals and the number of repeated measurements per individual is $n_{1i} = n_{2i} = 5$.

There are 5 repeated measurements per individual by both the reference standard y2 and the new measurement method y1. The new method (y1) has a differential bias of 4 and a proportional bias of 0.8. In addition, the variance of the measurement errors of method y1 is larger than that of the reference method y2 over most of the range of the latent trait (for $x_{i} > 1.75/0.12 \approx 14.6$, where $0.2x_{i} > 1.75 + 0.08x_{i}$). Note that when individuals do not have the same number of observations by the reference standard, the precision (that is, variance) of the predicted value of the true latent trait (best linear unbiased prediction [BLUP] of $x$) varies across individuals, and a smoothing of the confidence bands has been implemented using fractional polynomials of degree 2.
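For readers who want to generate data of this kind outside Stata, a minimal Python sketch of the simulation model above follows. It uses the stated parameters but its own random-number stream, so it will not reproduce ctl_dataset1.dta; the function name and the row layout (id, t, y1, y2, as in the listing) are illustrative choices.

```python
import random


def simulate_example1(n_individuals=100, n_repeats=5, seed=123456789):
    """Rows (id, t, y1, y2) from the example 1 model:
    y1 = 4 + 0.8 x + e1, sd(e1) = 0.2 x; y2 = x + e2, sd(e2) = 1.75 + 0.08 x; x ~ U[10, 100]."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n_individuals + 1):
        x = rng.uniform(10.0, 100.0)  # true latent trait, constant within the individual
        for t in range(1, n_repeats + 1):
            y1 = 4.0 + 0.8 * x + rng.gauss(0.0, 0.2 * x)
            y2 = x + rng.gauss(0.0, 1.75 + 0.08 * x)
            rows.append((i, t, y1, y2))
    return rows


data = simulate_example1()
print(len(data), data[0])
```

Because $E(d_{ij} \mid x_{i}) = 4 - 0.2x_{i}$ and $x_{i}$ averages 55, the mean difference in such a sample should be around $-7$.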

The clinical tolerance limits, for the sake of simplicity, have been set to constant values here and do not depend on the true latent trait:

\begin{aligned} C_{L} (x_{i}) = - 5 \\ C_{U} (x_{i}) = 5 \end{aligned}

The simulated data have been saved in the file named ctl_dataset1.dta.

We load the example dataset:

. use ctl_dataset1

Below, we use the ctl command and specify an intercept of 5 and a slope of 0, which yields constant clinical tolerance limits of ±5. We specify the tlplot option to draw the clinical tolerance limits plot. For convenience, the program also reports the estimates of the differential and proportional biases along with their 95% confidence intervals:

Figure 1. Tolerance limits plot

By inspecting the tolerance limits plot, which shows which observations fall within the clinical tolerance limits, investigators can visually check whether the specified limits correspond to their clinical intuition.

One can specify the option cpaplot to see the conditional probability of agreement plot:

Figure 2. Conditional probability of agreement plot

The conditional probability of agreement plot allows one to assess the level of agreement (that is, the conditional probability of agreement) for a given expected value of the true latent trait (that is, BLUP of x). For example, when the expected value of the latent trait is 20, the level of agreement is about 70%. When the expected value is 60, the level of agreement is about 25%. Finally, when the expected value is 100, the level of agreement is only about 15%. The overall agreement (shown in the subtitle) is 39%. The pointwise confidence band shows the uncertainty in the estimate at a specific value of the latent trait, whereas the simultaneous band accounts for the uncertainty in the whole curve and allows one to assess the uncertainty in the estimated conditional probability of agreement simultaneously at several values or even all values of the latent trait. When one uses either the pointwise option or the simultaneous option, only the pointwise or the simultaneous confidence band, respectively, will be drawn.

Based on inspection of the previous tolerance limits plot, the investigator decides that it is now more clinically relevant to specify nonconstant tolerance limits, which depend on the true latent trait. For the sake of simplicity, it is assumed here that this dependence is appropriately described using the following linear relationships:

\begin{aligned} C_{L} (x_{i}) = - 0.15 x_{i} \\ C_{U} (x_{i}) = 0.15 x_{i} \end{aligned}

In this case, the tolerance limits are narrower for small values and wider for large values of the latent trait:

Figure 3. Tolerance limits plot

This time, the investigator is happier with these tolerance limits and proceeds further by computing the conditional probability of agreement plot:

Figure 4. Conditional probability of agreement plot

Now, because of the funnel-shaped tolerance limits, the conditional probability of agreement is more homogeneous for values of the latent trait between 20 and 100 but drops sharply for values below 20.
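Because example 1 uses simulated data, the curves in figures 2 and 4 can be compared with the agreement implied by the data-generating model itself. The Python sketch below evaluates $π(x)$ under the true parameters, assuming independent normal errors so that $d \mid x \sim N\{4 - 0.2x, (0.2x)^{2} + (1.75 + 0.08x)^{2}\}$. It ignores estimation error and the use of the BLUP on the horizontal axis, so it is only a rough benchmark. Under these assumptions, it gives roughly 0.66, 0.24, and 0.14 at $x = 20$, 60, and 100 for the ±5 limits, close to the values read from figure 2, and a much flatter curve (about 0.40 to 0.44 between 20 and 100, about 0.30 at 10) for the ±0.15x limits.

```python
from statistics import NormalDist

PHI = NormalDist().cdf


def true_agreement(x, intercept, slope):
    """pi(x) under the example 1 data-generating model with symmetric limits
    +/-(intercept + slope * x), assuming independent normal errors for y1 and y2."""
    mean_d = 4.0 + (0.8 - 1.0) * x                                # differential + proportional bias
    sd_d = ((0.2 * x) ** 2 + (1.75 + 0.08 * x) ** 2) ** 0.5       # SD of y1 - y2 given x
    upper = intercept + slope * x
    return PHI((upper - mean_d) / sd_d) - PHI((-upper - mean_d) / sd_d)


for x in (10, 20, 40, 60, 80, 100):
    print(f"x={x:3d}  constant +/-5: {true_agreement(x, 5, 0):.2f}"
          f"   funnel +/-0.15x: {true_agreement(x, 0, 0.15):.2f}")
```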

Note that the conditional probability of agreement plot is saved to the current working directory under the name cpa_plot.gph. To change aspects of the graph (color, title, markers, etc.), as usual, one may use the Graph Editor by clicking on Start Graph Editor from the File menu within the Graph window.

4.2 Example 2

Using the same simulation model as in example 1, we generated a second example dataset with unequal numbers of repeated measurements per individual (between 1 and 3 for method y1 and between 5 and 10 for method y2) to illustrate that the methodology can still be used in this setting:

Figure 5. Tolerance limits plot (left) and conditional probability of agreement plot (right)

4.3 Example 3

As a last example, consider real data on systolic blood pressure (SBP) from a previous study (Taffé, Halfon, and Halfon 2020; the dataset is not public):

This scatterplot presents the repeated SBP measurements made on each individual (see marker labels) by the Microlife watchHomeBP oscillometric blood pressure device (that is, new method y1) and the invasive arterial blood pressure method based on an arterial indwelling catheter (that is, reference method y2). There were 10 repeated pairs per individual, except for one patient where only 6 pairs were available.

In this example, the clinical tolerance limits have been set as follows:

\begin{aligned} C_{L} (x_{i}) = - 0.1 x_{i} \\ C_{U} (x_{i}) = 0.1 x_{i} \end{aligned}

That is, the tolerance limits are narrower for small values and wider for large values of the latent trait. Because $x_{i}$ is, by definition, not observed, it is replaced by its best linear unbiased prediction (BLUP of $x$). Consequently, in the tolerance limits plot, the differences are allowed to vary around zero within a funnel-shaped band of ±10% of the BLUP of $x_{i}$ (that is, of the true latent trait) for agreement to be declared between the two paired measurements:
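Since the limits are evaluated at the BLUP of $x_{i}$, a simplified illustration may help. Assume a homoskedastic normal random-effects model for the reference method only, $y_{2ij} = x_{i} + ε_{2ij}$ with $x_{i} \sim N(μ_{x}, σ_{x}^{2})$ and $ε_{2ij} \sim N(0, σ^{2})$, with $μ_{x}$, $σ_{x}^{2}$, and $σ^{2}$ treated as known. The BLUP then shrinks each individual's mean of reference measurements toward $μ_{x}$, and both the shrinkage and the prediction variance are smaller for individuals with more reference measurements. This is not the heteroskedastic empirical Bayes estimator used by ctl; the function names and numbers below are hypothetical.

```python
def blup(ref_mean, n_ref, mu_x, var_x, var_err):
    """BLUP of x_i and its prediction variance Var(x_i | ybar_i) under a
    homoskedastic normal random-effects model with known parameters."""
    var_mean = var_err / n_ref                 # variance of the individual's mean of y2
    shrink = var_x / (var_x + var_mean)        # weight given to the individual's own mean
    prediction = mu_x + shrink * (ref_mean - mu_x)
    prediction_var = var_x * var_mean / (var_x + var_mean)
    return prediction, prediction_var


def funnel_limits(x_hat, share=0.10):
    """Tolerance limits of +/- share * x_hat (+/-10% of the BLUP, as in example 3)."""
    return -share * x_hat, share * x_hat


# Same observed mean (140), different numbers of reference measurements; illustrative values only
for n in (1, 6, 10):
    x_hat, v = blup(ref_mean=140.0, n_ref=n, mu_x=120.0, var_x=400.0, var_err=100.0)
    print(n, round(x_hat, 2), round(v, 2), [round(c, 2) for c in funnel_limits(x_hat)])
```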

Figure 6. Scatterplot of the repeated SBP measurements made on each individual by the invasive arterial blood pressure and watchHomeBP devices

Figure 7. Tolerance limits plot (left) and conditional probability of agreement plot (right)

The conditional probability of agreement plot shows that the level of agreement is better for blood pressures in the normal range (≤ 120 mmHg). The pointwise 95% confidence band is useful for assessing the level of agreement at a specific value of SBP. However, when one is interested in assessing the agreement level at several values or across an entire interval of the latent trait, the simultaneous confidence band is required. Here, as expected, the simultaneous band is wider than the pointwise band because it must guarantee at least a 95% coverage rate for the simultaneous inference.

5 Conclusions

Based on simulated and real data, I have illustrated the use of the ctl command to assess the level of agreement between two quantitative measurement methods. The package implements both pointwise and simultaneous confidence bands around the probability of agreement curve to allow formal inference. This is particularly relevant from a clinical perspective. For example, thanks to the simultaneous confidence band, it is possible to assess the level of agreement between the two measurement methods at any number of values or over any interval of values of the latent trait, which is not straightforward with pointwise confidence bands because of the multiple-testing problem. This is very useful because the level of agreement may turn out to be high enough only for large values of the latent trait and not for low values, which restricts the usefulness of the new measurement method to large expected values of the true latent trait.

Requiring repeated measurements by one of the two measurement methods might discourage the applied researcher from using this methodology. However, repeated measurements by at least one of the two measurement methods are necessary for mathematical identification. Indeed, when the variance of the measurement errors of each instrument is not constant (it often increases with the latent trait) and their ratio is unknown, which is usually the case in the biomedical field, having only one measurement by each of the two measurement methods does not allow one to identify all the parameters of the model (Dunn 2004). In this setting, the investigator may resort to the Bland-Altman LOA method, which does not require individual repeated measurements. However, as already shown (Taffé 2018, 2021), the LOA plot may be misleading in this setting, particularly when the ratio of the variances is not proportional to the proportional bias.
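The identification problem can be made concrete in a simplified version of model (1) with constant error variances $σ_{ε_{1}}^{2}$ and $σ_{ε_{2}}^{2}$ and uncorrelated errors (an illustration only; the heteroskedastic case is harder still). With a single measurement $y_{1}$, $y_{2}$ by each method, the first two moments provide five equations for six unknowns ($α_{1}$, $β_{1}$, $μ_{x}$, $σ_{x}^{2}$, $σ_{ε_{1}}^{2}$, $σ_{ε_{2}}^{2}$):

\begin{aligned} E(y_{1}) &= α_{1} + β_{1} μ_{x}, & E(y_{2}) &= μ_{x} \\ \operatorname{Var}(y_{1}) &= β_{1}^{2} σ_{x}^{2} + σ_{ε_{1}}^{2}, & \operatorname{Var}(y_{2}) &= σ_{x}^{2} + σ_{ε_{2}}^{2} \\ \operatorname{Cov}(y_{1}, y_{2}) &= β_{1} σ_{x}^{2} \end{aligned}

A second measurement by method 2 supplies the missing equation, $\operatorname{Cov}(y_{2ij}, y_{2ik}) = σ_{x}^{2}$ for $j \neq k$, after which every parameter can be solved in turn:

\begin{aligned} σ_{x}^{2} &= \operatorname{Cov}(y_{2ij}, y_{2ik}), & β_{1} &= \operatorname{Cov}(y_{1}, y_{2}) / σ_{x}^{2}, & α_{1} &= E(y_{1}) - β_{1} μ_{x} \\ σ_{ε_{1}}^{2} &= \operatorname{Var}(y_{1}) - β_{1}^{2} σ_{x}^{2}, & σ_{ε_{2}}^{2} &= \operatorname{Var}(y_{2}) - σ_{x}^{2} \end{aligned}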

The marginal probability of agreement is simply estimated by the proportion of observations between the two tolerance limits and provides an overall summary of the agreement level. Depending on how the tolerance limits have been defined, the conditional probability of agreement may be similar in shape to the percentage of agreement proposed by Taffé (2020). The former is based on predefined tolerance limits, whereas the latter depends on the width of the LOA and the amount of bias and does not require the investigator to set tolerance limits. Which one should be preferred depends on the information available a priori to the investigator for setting the limits. I have illustrated that the definition of the tolerance limits may strongly affect the calculated level of conditional agreement, whereas the percentage of agreement depends solely on the variability and bias found in the data. I recommend computing both measures of agreement and thoroughly inspecting the plots before drawing conclusions about agreement.

Finally, note that this modeling strategy rests on the assumption that the individual true latent trait is constant within individuals, that is, $x_{i j} \equiv x_{i}$ . This means that the repeated measurements should ideally be taken in sequence within a time interval where this assumption is sensible. It is theoretically possible to extend the methodology to other settings where the latent trait has a time trend (Taffé 2018). However, this is beyond the scope of this article and will be the subject of a future project.

6 Declaration of conflicting interests

The author declares no potential conflicts of interest with respect to the research, authorship, or publication of this article.

7 Programs and supplemental material

To install the software files as they existed at the time of publication of this article, type

Supplemental Material

sj-dta-1-stj-10.1177_1536867X251365501 - Supplemental material for ctl: A package for assessing agreement based on clinical tolerance limits

Supplemental material, sj-dta-1-stj-10.1177_1536867X251365501 for ctl: A package for assessing agreement based on clinical tolerance limits by Patrick Taffé in The Stata Journal


sj-dta-2-stj-10.1177_1536867X251365501 - Supplemental material for ctl: A package for assessing agreement based on clinical tolerance limits


sj-dta-3-stj-10.1177_1536867X251365501 - Supplemental material for ctl: A package for assessing agreement based on clinical tolerance limits


sj-txt-1-stj-10.1177_1536867X251365501 - Supplemental material for ctl: A package for assessing agreement based on clinical tolerance limits


References

Bland, J. M., and D. G. Altman. 1986. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 327: 307–310. 10.1016/S0140-6736(86)90837-8.

Bland, J. M., and D. G. Altman. 1999. Measuring agreement in method comparison studies. Statistical Methods in Medical Research 8: 135–160. 10.1177/096228029900800204.

Chatfield, M. D., T. J. Cole, H. C. W. de Vet, L. Marquart-Wilson, and D. M. Farewell. 2023. blandaltman: A command to create variants of Bland-Altman plots. Stata Journal 23: 851–874. 10.1177/1536867X231196488.

Dunn, G. 2004. Statistical Evaluation of Measurement Errors: Design and Analysis of Reliability Studies. 2nd ed. London: Arnold.

Lin, L., A. S. Hedayat, B. Sinha, and M. Yang. 2002. Statistical methods in assessing agreement: Models, issues, and tools. Journal of the American Statistical Association 97: 257–270. 10.1198/016214502753479392.

Steichen, T. J., and N. J. Cox. 1998. sg84: Concordance correlation coefficient. Stata Technical Bulletin 43: 35–39. Reprinted in Stata Technical Bulletin Reprints, vol. 8, pp. 137–143. College Station, TX: Stata Press.

Stevens, N. T., S. H. Steiner, and R. J. MacKay. 2017. Assessing agreement between two measurement systems: An alternative to the limits of agreement approach. Statistical Methods in Medical Research 26: 2487–2504. 10.1177/0962280215601133.

Stevens, N. T., S. H. Steiner, and R. J. MacKay. 2018. Comparing heteroscedastic measurement systems with the probability of agreement. Statistical Methods in Medical Research 27: 3420–3435. 10.1177/0962280217702540.

Taffé, P. 2018. Effective plots to assess bias and precision in method comparison studies. Statistical Methods in Medical Research 27: 1650–1660. 10.1177/0962280216666667.

Taffé, P. 2020. Assessing bias, precision, and agreement in method comparison studies. Statistical Methods in Medical Research 29: 778–796. 10.1177/0962280219844535.

Taffé, P. 2021. When can the Bland and Altman limits of agreement method be used and when it should not be used. Journal of Clinical Epidemiology 137: 176–181. 10.1016/j.jclinepi.2021.04.004.

Taffé, P. 2023. Use of clinical tolerance limits for assessing agreement. Statistical Methods in Medical Research 32: 195–206. 10.1177/09622802221137743.

Taffé, P., Halfon, and Halfon. 2020. A new statistical methodology overcame the defects of the Bland-Altman method. Journal of Clinical Epidemiology 124: 1–7. 10.1016/j.jclinepi.2020.03.018.

Taffé, P., Peng, Stagg, and Williamson. 2017. biasplot: A package to effective plots to assess bias and precision in method comparison studies. Stata Journal 17: 208–221. 10.1177/1536867X1701700111.

Taffé, P., Peng, Stagg, and Williamson. 2023. Extended biasplot command to assess bias, precision, and agreement in method comparison studies. Stata Journal 23: 97–118. 10.1177/1536867X231161978.

Taffé, P., Zuppinger, G. M. Burger, and Gonseth-Nussle. 2022. The Bland-Altman method should not be used when one of the two measurement methods has negligible measurement errors. PLOS ONE 17: e0278915. 10.1371/journal.pone.0278915.

About the author

Patrick Taffé is a senior research fellow at the Center for Primary Care and Public Health (Unisante), Division of Biostatistics, University of Lausanne, Switzerland.
