Abstract
The
Introduction
In clinical research, when the characteristic of interest is continuous, the Bland-Altman limits of agreement (LOA) method (1986) is frequently used to assess the agreement or interchangeability between two measurement methods (for example, a new one to be compared with the standard one). A typical example is assessing the agreement between a new blood pressure device and a standard one. The Bland-Altman methodology, however, does not quantify the level of agreement directly; rather, one appraises the amount of disagreement by inspecting the distance between the lower and upper LOA.
To quantify the level of agreement more directly, Lin et al. (2002) proposed the concept of “coverage probability”, defined as the probability that the absolute difference between the two measurement methods made on the same subject is less than a predefined value. Their methodology, however, allows one to assess only the overall agreement level: it does not account for the fact that the level of agreement might not be constant and may depend on the value of the true underlying latent trait (because of measurement errors, the measured trait always differs from the true latent trait, for example, the true blood pressure level). In addition, the variance of measurement errors is implicitly assumed to be constant (that is, homoskedastic), which is often too strong an assumption (in empirical applications, the variability of a trait is often observed to be larger for larger values of the trait; see Taffé [2021] and Taffé et al. [2022]). Also, a possible bias of the new measurement method is not assessed. For these reasons, Stevens, Steiner, and MacKay (2017) extended this methodology to allow the coverage probability to depend on the value of the latent trait and on the amount of bias. Later, they further extended their methodology (2018) to allow for heteroskedastic measurement errors (that is, to allow the variance of measurement errors to depend on the true underlying latent trait) and called their extended agreement concept “probability of agreement”.
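As a rough illustration of the coverage probability idea, the sketch below computes the probability that the absolute difference between two methods is below a tolerance value. It assumes, purely for illustration, that the differences are normally distributed with constant mean and standard deviation. The function name and the numerical values are hypothetical and are not taken from the cited articles.

```python
from statistics import NormalDist


def coverage_probability(mean_diff, sd_diff, delta):
    """P(|D| < delta) when the difference D is assumed N(mean_diff, sd_diff**2)."""
    dist = NormalDist(mu=mean_diff, sigma=sd_diff)
    return dist.cdf(delta) - dist.cdf(-delta)


# Hypothetical values: differences in mmHg and a tolerance of 10 mmHg.
unbiased = coverage_probability(mean_diff=0.0, sd_diff=5.0, delta=10.0)
biased = coverage_probability(mean_diff=5.0, sd_diff=5.0, delta=10.0)
print(round(unbiased, 4), round(biased, 4))
```

With the same error variability, a nonzero mean difference lowers the coverage probability. This is one reason an overall figure that ignores bias and heteroskedasticity can hide where the two methods disagree.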
However, several important limitations to the Stevens, Steiner, and MacKay (2017, 2018) methodology have been identified (a strongly parametric specification, constant tolerance limits, pointwise rather than simultaneous confidence bands around the probability of agreement, no simulations, etc.), and a new statistical methodology that overcomes these limitations has recently been developed (Taffé 2023). This new methodology is still based on the coverage probability concept of Lin et al. (2002) and the probability of agreement concept of Stevens, Steiner, and MacKay (2018), but no parametric assumption is made regarding the true underlying latent trait. Rather, the latent trait is estimated by an empirical Bayes approach. In addition, both pointwise and simultaneous confidence bands around the conditional agreement curve have been developed, so that investigators may adopt either one according to their inference goal.
In this article, I will briefly present the theory and illustrate the use of the new
Note that there are at least three other commands—
Methodology of the clinical tolerance limits
Consider the general measurement error model, where method 2 is considered the reference standard:
The
It is assumed that the variances of these errors, that is,
Note that this measurement error model is slightly different from the classical measurement error model (Bland and Altman 1999; Dunn 2004) in that the heteroskedasticity depends on the true latent trait and not on the observed average
Computation of the conditional and overall or marginal agreement
Consider the differences
Alternatively, one may define limits that depend on the true latent trait in a specific form, for example, linear:
Given (1) and assumptions, the conditional probability of agreement is computed as
The
Estimation of the model parameters
Taffé (2018) has developed a two-step method to estimate the parameters of (1). Because of the complexity of the model, inference is carried out based on a simulation method (Taffé 2023). Therefore, the
The ctl command
The
Syntax
The syntax for using
[
When the goal is to assess the agreement level at a single, specific value of the latent trait, a pointwise confidence interval suffices: over repeated samples, 95% of the computed intervals are expected to cover the true value at that point. However, when interest lies in several points of the support or in the whole curve, a simultaneous confidence band is required because it guarantees the nominal coverage rate for the joint inference, whatever the number of points considered.
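The difference between the two kinds of bands can be sketched with simulated curves. The example below builds a pointwise percentile band and a simultaneous band based on the maximum standardized deviation over the grid. This is one common construction and is an assumption here; it is not necessarily the algorithm implemented in the command. The simulated curves, grid, and function names are all hypothetical.

```python
import random
import statistics


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def confidence_bands(curves, level=0.95):
    """Return (pointwise, simultaneous) bands as lists of (lower, upper) per grid point."""
    alpha = 1.0 - level
    columns = list(zip(*curves))  # one tuple of simulated values per grid point
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]

    # Pointwise: percentile interval computed separately at each grid point.
    pointwise = [(quantile(col, alpha / 2), quantile(col, 1 - alpha / 2))
                 for col in columns]

    # Simultaneous: largest standardized deviation of each curve over the grid,
    # then the `level` quantile of those maxima as a common critical value.
    max_dev = [max(abs(v - m) / s for v, m, s in zip(curve, means, sds))
               for curve in curves]
    crit = quantile(max_dev, level)
    simultaneous = [(m - crit * s, m + crit * s) for m, s in zip(means, sds)]
    return pointwise, simultaneous


def inside(curve, band):
    """True if the whole curve lies within the band at every grid point."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(curve, band))


# Hypothetical simulated curves: 500 draws on a grid of 20 latent-trait values.
random.seed(1)
grid = [5 * k for k in range(1, 21)]
curves = []
for _ in range(500):
    shift = random.gauss(0, 0.03)  # shared across the grid to mimic correlated draws
    curves.append([0.8 - 0.005 * x + shift + random.gauss(0, 0.02) for x in grid])

pointwise, simultaneous = confidence_bands(curves)
pointwise_joint = sum(inside(c, pointwise) for c in curves) / len(curves)
simultaneous_joint = sum(inside(c, simultaneous) for c in curves) / len(curves)
print(pointwise_joint, simultaneous_joint)
```

The two printed fractions are the shares of simulated curves lying entirely inside each band. The pointwise band is calibrated one grid point at a time, so its joint coverage over the whole grid falls short of the nominal level, whereas the simultaneous band is calibrated on whole curves.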
Options
Stored results
Numerical examples
Example 1
To illustrate the use of the
The first column,
The data have been generated using the simulation model
There are 5 repeated measurements per individual both by the reference standard
The clinical tolerance limits, for the sake of simplicity, have been set to constant values here and do not depend on the true latent trait:
The simulated data have been saved in the file named
We load the example dataset:
Below, we use the

Tolerance limits plot
By inspecting the tolerance limits plot, which shows the observations within the clinical limits, investigators can visually check whether the specified limits correspond to their clinical intuition.
One can specify the option

Conditional probability of agreement plot
The conditional probability of agreement plot allows one to assess the level of agreement (that is, the conditional probability of agreement) for a given expected value of the true latent trait (that is, BLUP of x). For example, when the expected value of the latent trait is 20, the level of agreement is about 70%. When the expected value is 60, the level of agreement is about 25%. Finally, when the expected value is 100, the level of agreement is only about 15%. The overall agreement (shown in the subtitle) is 39%. The pointwise confidence band shows the uncertainty in the estimate at a specific value of the latent trait, whereas the simultaneous band accounts for the uncertainty in the whole curve and allows one to assess the uncertainty in the estimated conditional probability of agreement simultaneously at several values or even all values of the latent trait. When one uses either the pointwise option or the simultaneous option, only the
Based on inspection of the previous tolerance limits plot, the investigator decides that it is now more clinically relevant to specify nonconstant tolerance limits, which depend on the true latent trait. For the sake of simplicity, it is assumed here that this dependence is appropriately described using the following linear relationships:

Tolerance limits plot
This time, the investigator is more satisfied with the tolerance limits and proceeds to compute the conditional probability of agreement plot:

Conditional probability of agreement plot
Now, because of the funnel-shaped tolerance limits, the conditional probability of agreement is more homogeneous for values of the latent trait between 20 and 100 but drops sharply for values below 20.
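The effect of the shape of the tolerance limits on the conditional probability of agreement can be sketched as follows. The example assumes, for illustration only, that the difference between the two methods, given the latent trait, is normally distributed with a bias and a standard deviation that both increase linearly with the latent trait. The bias, standard deviation, and limit functions are made up; they are not the simulation model of example 1, and the sketch does not reproduce the figure.

```python
from statistics import NormalDist


def conditional_agreement(x, lower, upper, bias, sd):
    """P(lower(x) < D < upper(x) | x) with D | x assumed N(bias(x), sd(x)**2)."""
    dist = NormalDist(mu=bias(x), sigma=sd(x))
    return dist.cdf(upper(x)) - dist.cdf(lower(x))


# Made-up bias and error standard deviation, both growing with the latent trait x.
def bias(x):
    return 0.05 * x


def sd(x):
    return 2.0 + 0.1 * x


# Constant limits of +/- 5 units.
def const_lower(x):
    return -5.0


def const_upper(x):
    return 5.0


# Funnel-shaped limits that widen linearly with x (hypothetical coefficients).
def funnel_lower(x):
    return -(2.0 + 0.1 * x)


def funnel_upper(x):
    return 2.0 + 0.1 * x


grid = [20, 40, 60, 80, 100]
constant = [conditional_agreement(x, const_lower, const_upper, bias, sd) for x in grid]
funnel = [conditional_agreement(x, funnel_lower, funnel_upper, bias, sd) for x in grid]

for x, p_const, p_funnel in zip(grid, constant, funnel):
    print(x, round(p_const, 3), round(p_funnel, 3))
```

Under these assumptions, constant limits give a conditional probability of agreement that falls steadily as the latent trait increases, whereas limits that widen with the error variability give a much flatter curve.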
Note that the conditional probability of agreement plot is saved to the current working directory under the name
Using the same simulation model as in example 1, we have generated a second example dataset with unequal numbers of repeated measurements per individual (between 1 and 3 for method

Tolerance limits plot (left) and conditional probability of agreement plot (right)
As a last example, consider real data on systolic blood pressure (SBP) from a previous study (Taffé, Halfon, and Halfon 2020; the dataset is not public):
This scatterplot presents the repeated SBP measurements made on each individual (see marker labels) by the Microlife watchHomeBP oscillometric blood pressure device (that is, new method
In this example, the clinical tolerance limits have been set as follows:
That is, the tolerance limits are narrower for small values and larger for large values of the latent trait. Because

Scatterplot of the repeated SBP measurements made on each individual by the invasive arterial blood pressure and watchHomeBP devices

Tolerance limits plot (left) and conditional probability of agreement plot (right)
The conditional probability of agreement plot shows that the level of agreement is better for blood pressures in the normal range (≤ 120 mmHg). The pointwise 95% confidence band is useful for assessing the level of agreement at a specific value of SBP. However, when one is interested in assessing the agreement level at several values or even across an entire interval of the latent trait, the simultaneous confidence band is required. Here we see that the simultaneous band is, as expected, wider than the pointwise band, which is what allows it to guarantee at least a 95% coverage rate for the simultaneous inference.
Based on simulated data, I have illustrated the use of the
Requiring repeated measurements by one of the two measurement methods might discourage the applied researcher from using this methodology. However, repeated measurements by at least one of the two measurement methods are necessary for mathematical identification. Indeed, when the variance of the measurement errors of each instrument is not constant (it often increases with the latent trait) and their ratio is unknown, which is usually the case in the biomedical field, having only one measurement by each of the two measurement methods does not allow one to identify all the parameters of the model (Dunn 2004). In this setting, the investigator may resort to the Bland-Altman LOA method, which does not require individual repeated measurements. However, as already shown (Taffé 2018, 2021), the LOA plot may be misleading in this setting, particularly when the ratio of the variances is not proportional to the proportional bias.
The marginal probability of agreement has simply been estimated by the proportion of observations between the two tolerance limits and provides an overall summary of the agreement level. One may notice that, depending on how the tolerance limits have been defined, the conditional probability of agreement may be similar in shape to the percentage of agreement proposed by Taffé (2020). The former is based on predefined tolerance limits, whereas the latter depends on the width of the LOA and the amount of bias and does not require the investigator to set tolerance limits. Which one should be preferred depends on the information available a priori to the investigator for setting the limits. I have illustrated that the definition of the tolerance limits may strongly influence the level of the conditional agreement calculated, whereas the percentage of agreement depends solely on the variability and bias found in the data. I recommend computing both measures of agreement and thoroughly inspecting the plots before drawing conclusions about the agreement.
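The proportion estimate described above can be sketched in a few lines. The example assumes that the tolerance limits apply to the differences between the two methods and may depend on a predicted value of the latent trait. That reading, the function name, the data, and the coefficients of the limits are all hypothetical.

```python
def marginal_agreement(differences, latent, lower, upper):
    """Share of differences lying within [lower(x), upper(x)]."""
    inside = [lower(x) <= d <= upper(x) for d, x in zip(differences, latent)]
    return sum(inside) / len(inside)


# Hypothetical data: differences (new minus reference) and predicted latent values.
differences = [-3.0, 1.5, 4.0, -7.0, 12.0, 0.5, -1.0, 9.0]
latent = [20.0, 25.0, 40.0, 55.0, 60.0, 75.0, 90.0, 100.0]

# Constant limits of +/- 5 units.
constant = marginal_agreement(
    differences, latent, lambda x: -5.0, lambda x: 5.0
)

# Linear, funnel-shaped limits that widen with the latent trait (made-up coefficients).
funnel = marginal_agreement(
    differences, latent, lambda x: -(2.0 + 0.1 * x), lambda x: 2.0 + 0.1 * x
)

print(constant, funnel)
```

The same data yield different marginal agreement under the two sets of limits. This is the sensitivity to the choice of tolerance limits noted above.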
Finally, note that this modeling strategy rests on the assumption that the individual true latent trait is constant within individuals, that is,
Declaration of conflicting interests
The author declares no potential conflicts of interest with respect to the research, authorship, or publication of this article.
Programs and supplemental material
To install the software files as they existed at the time of publication of this article, type
Supplemental Material
Supplemental material for ctl: A package for assessing agreement based on clinical tolerance limits, by Patrick Taffé, in The Stata Journal:
sj-dta-1-stj-10.1177_1536867X251365501
sj-dta-2-stj-10.1177_1536867X251365501
sj-dta-3-stj-10.1177_1536867X251365501
sj-txt-1-stj-10.1177_1536867X251365501
