Sage Journals: Discover world-class research

Abstract

The growth hormone-2000 biomarker method, based on the measurements of insulin-like growth factor-I and the amino-terminal pro-peptide of type III collagen, has been developed as a powerful technique for the detection of growth hormone misuse by athletes. Insulin-like growth factor-I and amino-terminal pro-peptide of type III collagen are combined in gender-specific formulas to create the growth hormone-2000 score, which is used to determine whether growth hormone has been administered. To comply with World Anti-Doping Agency regulations, each analyte must be measured by two methods. Insulin-like growth factor-I and amino-terminal pro-peptide of type III collagen can be measured by a number of approved methods, each leading to its own growth hormone-2000 score. Single decision limits for each growth hormone-2000 score have been introduced and developed by Bassett, Erotokritou-Mulligan, Holt, Böhning and their co-authors in a series of papers. These have been incorporated into the guidelines of the World Anti-Doping Agency. A joint decision limit was constructed based on the sample correlation between the two growth hormone-2000 scores generated from an available sample to increase the sensitivity of the biomarker method. This paper takes this idea further into a fully developed statistical approach. It constructs combined decision limits when two growth hormone-2000 scores from different assay combinations are used to decide whether an athlete has been misusing growth hormone. The combined decision limits are directly related to tolerance regions and constructed using a Bayesian approach. It is also shown to have highly satisfactory frequentist properties. The new approach meets the required false-positive rate with a pre-specified level of certainty.

Keywords

Bayesian tolerance regions, growth hormone misuse detection, growth hormone-2000 scores, decision limits, tolerance limits, tolerance regions

1 Introduction

As a powerful anabolic agent of considerable therapeutic value, growth hormone (GH) is misused in sport to enhance performance.¹ To preserve the fairness of competition, its use is prohibited by the World Anti-Doping Agency (WADA).^2–5 Two methods for the detection of GH misuse are currently available and approved by WADA: the isoform test developed by Bidlingmaier et al.⁶ (see also WADA^2,3) and the GH-2000 biomarker test developed by the GH-2000 and GH-2004 projects.⁷ The latter method depends on the measurements of two GH sensitive biomarkers, the insulin-like growth factor-I (IGF-I) and the amino-terminal pro-peptide of type III collagen (P-III-NP), both of which rise in response to exogenous GH administration.^8,9 The measured concentrations of the two biomarkers are combined in sex-specific and age-adjusted discriminant functions^10,11,7,12 to allow the calculation of a score, the GH-2000 score. It is possible that the score may take a negative value.

The measurements of IGF-I and P-III-NP are carried out by choosing two specific assays. As the measured results differ slightly from one assay to another, each assay pair generates an assay-specific GH-2000 score, which differs from the GH-2000 scores generated by other assay pairs. These scores are the basis of the data generating process. Currently, there are three IGF-I assays and two P-III-NP assays approved by WADA. The IGF-I assays are a mass spectrometry (MS)-based approach, Immunotech A15729 IGF-I IRMA (Immunotech SAS, Marseille, France), and Immunodiagnostic Systems iSYS IGF-I (Immunodiagnostics Systems Limited, Boldon, UK). The P-III-NP assays are UniQ™ P-III-NP RIA (Orion Diagnostica, Espoo, Finland) and Siemens ADVIA Centaur P-III-NP (Siemens Healthcare Laboratory Diagnostics, Camberley, UK). For more details and background on these assays, see Holt et al.⁷ As any GH-2000 score requires one IGF-I assay paired with one P-III-NP assay, there are $3 \times 2 = 6$ possible GH-2000 scores. Depending on the available technology, laboratories choose the appropriate GH-2000 scores for evaluating their samples, and a decision limit based on one single GH-2000 score has been developed in Holt et al.⁷ and Böhning et al.¹³ by assuming that a GH-2000 score from an athlete without GH misuse has a normal distribution.

As stated in Holt et al.,⁷ a result will be declared as an adverse analytical finding (i.e. indicative of doping) only if the confirmation procedure results in GH-2000 scores greater than the decision limits for two pairs of analytes. These decision limits are constructed on the basis of the univariate normal distributions of the associated GH-2000 scores. Erotokritou-Mulligan et al.,¹¹ with further details in Holt et al.,⁷ construct combined decision limits on the basis of a bivariate normal distribution. The idea is motivated by the intuition that a reduced correlation between the two GH-2000 scores could lead to reduced decision limits, and thus increase the sensitivity of GH misuse detection. We describe details of this method in Section 3.3.

The purpose of this paper is to provide a valid construction method of combined decision limits when two GH-2000 scores, based on two different pairs of IGF-I and P-III-NP assays, are used jointly in assessing the compliance of a sample. It is shown here how the combined decision limits are directly related to a particular tolerance region, and can be constructed so that the false positive rate (FPR) is controlled at a pre-specified level $1 - β$, say 1 in 10,000, with a pre-specified confidence or belief $1 - α$, for example, $95\%$, about the possible value of $(μ, Σ)$, assuming the two GH-2000 scores have a bivariate normal distribution $N_{2}(μ, Σ)$. This is in contrast to the previously mentioned method discussed in Erotokritou-Mulligan et al.,¹¹ where such a property is assumed to hold bona fide.

A Bayesian approach is adopted in this paper since a frequentist solution is much harder to construct and not available thus far (see Section 3.2 for more details). The frequentist property of the Bayesian combined decision limits is also assessed by simulation, which shows that the Bayesian combined decision limits can also be interpreted as frequentist combined decision limits.

The paper is organized as follows. Section 2 collects some known distributional results that will be used in Section 3. Section 3 considers the construction of decision limits. A very brief review of the construction of a single decision limit for one GH-2000 score is given in Section 3.1. Section 3.2 constructs Bayesian combined decision limits for two GH-2000 scores. The method is then illustrated with a real data set in Section 3.3. A simulation study is presented in Section 3.4 to assess the frequentist property of the Bayesian combined decision limits given in Section 3.3. Finally, the paper closes with a brief discussion in Section 4.

2 Preliminary distributional results

In this section, we collect some known distributional results, which are used in Section 3 for the construction of decision limits. More details about these results can be found in Guttman,¹⁴ Box and Tiao¹⁵ and Anderson.¹⁶

Following Holt et al.⁷ and Böhning et al.,¹³ we assume that the GH-2000 scores $x = (x_{1}, \dots, x_{k})^{'}$ from an athlete without GH misuse have a $k$-variate normal distribution $N_{k}(μ, Σ)$, with both $μ$ and $Σ$ unknown. For the problem considered in this paper, we are only interested in the case of $k = 2$ since only two GH-2000 scores are involved. We assume further that we have observed a random sample from the population $N_{k}(μ, Σ)$:

$$x_{1} = \begin{pmatrix} x_{11} \\ \vdots \\ x_{1k} \end{pmatrix}, \dots, x_{n} = \begin{pmatrix} x_{n1} \\ \vdots \\ x_{nk} \end{pmatrix} \overset{\text{i.i.d.}}{\sim} N_{k}(μ, Σ).$$

Denote $X = (x_{1}, \dots, x_{n})$,

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}, \quad \text{and} \quad V = \frac{1}{n-1} \sum_{i=1}^{n} (x_{i} - \bar{x})(x_{i} - \bar{x})^{'}.$$

In this paper, the non-informative reference prior distribution of $(μ, Σ^{-1})$ given by

$$p(μ, Σ^{-1}) \propto p(μ)\, p(Σ^{-1}) \propto |Σ^{-1}|^{-(k+1)/2} \tag{1}$$

is used since the sample $X$ is all we have. An additional incentive for using the non-informative reference prior is that the Bayesian decision limit for the special case of $k = 1$ is also the frequentist decision limit; see Section 3.1 below.

The posterior distribution of $(μ, Σ^{-1})$ based on the observed data $X$ is then given by

$$p(μ, Σ^{-1} | X) \propto |Σ^{-1}|^{(n-k-1)/2} \exp\left\{ -\frac{1}{2} \operatorname{tr}\, Σ^{-1} \left[ (n-1)V + n(μ - \bar{x})(μ - \bar{x})^{'} \right] \right\},$$

where $\operatorname{tr} A$ denotes the trace of matrix $A$. Integrating out $μ$ gives the posterior distribution of $Σ^{-1}$,

$$Σ^{-1} | X \sim W_{k}\left( [(n-1)V]^{-1}, n-k \right) \tag{2}$$

in the notation of Box and Tiao,¹⁵ and the posterior conditional (on $Σ^{-1}$) distribution of $μ$,

$$p(μ | Σ^{-1}, X) = \frac{p(μ, Σ^{-1} | X)}{p(Σ^{-1} | X)} \sim N_{k}(\bar{x}, Σ/n). \tag{3}$$

3 Decision limits

In this section, we consider the construction of decision limits. In Section 3.1, we provide a very brief review of the construction of a decision limit for one single GH-2000 score, that is, for the case of $k = 1$, which helps the understanding of Section 3.2. Section 3.2 studies the construction of combined decision limits based on two GH-2000 scores, that is, for the case of $k = 2$. Section 3.3 illustrates the computation of the decision limits by using the dataset on GH-2000 scores given in Holt et al.⁷ Section 3.4 presents the results of simulation studies.

3.1 Decision limit for one GH-2000 score

Let $a = a (X)$ denote the decision limit for one GH-2000 score. Hence a future sample observation $y$ is declared to be positive if and only if $y$ is larger than $a (X)$ . To control the FPR at the pre-specified level $1 - β$ , it is desirable to have

$$P_{y | μ, σ^{2}} \{ y > a(X) \} \leq 1 - β,$$

which is equivalent to

$$P_{y | μ, σ^{2}} \{ y \leq a(X) \} \geq β \tag{4}$$

under the assumption that $y$ is from the population distribution $N(μ, σ^{2})$. Here the probabilities are calculated with respect to the distribution of $y$ conditional on $(μ, σ^{2})$. Note that the probability in (4) depends on the value of $(μ, σ^{2})$, for which we have only the posterior distribution $p(μ, σ^{2} | X)$ after observing the data $X$. Hence this probability cannot be guaranteed to be at least $β$ for every possible value of $(μ, σ^{2}) \sim p(μ, σ^{2} | X)$ and, instead, we guarantee with a pre-specified $1 - α$ (close to one) belief (or confidence) with respect to possible values of $(μ, σ^{2})$ that the probability in (4) is at least $β$, that is,

$$P_{μ, σ^{2} | X} \{ P_{y | μ, σ^{2}} \{ y \leq a(X) \} \geq β \} = 1 - α. \tag{5}$$

One recognizes this as the defining equation for $(-\infty, a(X)]$ to be a Bayesian $1 - α$ confidence and $β$ content upper tolerance interval for the population $N(μ, σ^{2})$. Bayesian tolerance intervals were first introduced in Aitchison;¹⁷ Guttman^14,18 are excellent references on the topic.

Following Guttman¹⁴ (pp. 140-141), the $a(X)$ that solves equation (5) under the non-informative prior for $(μ, Σ)$ given in (1) with $k = 1$ is given by

$$a(X) = \bar{x} + \frac{1}{\sqrt{n}} \sqrt{V}\, t_{n-1, \sqrt{n} z_{β}, 1-α} \tag{6}$$

where $z_{β}$ denotes the $β$ quantile of the standard normal distribution $N(0, 1)$, and $t_{n-1, \sqrt{n} z_{β}, 1-α}$ denotes the $1 - α$ quantile of the non-central $t$ distribution with degrees of freedom $n - 1$ and non-centrality parameter $\sqrt{n} z_{β}$. This Bayesian $1 - α$ confidence and $β$ content upper tolerance interval $(-\infty, a(X)]$ is also the frequentist $1 - α$ confidence and $β$ content upper tolerance interval (cf. Guttman¹⁴ (p. 141)). This is the additional incentive for using the non-informative reference prior in this paper.
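Equation (6) can be evaluated with any library that provides quantiles of the non-central $t$ distribution. The following is a minimal sketch in Python with SciPy, which is an assumption of this illustration and not the authors' R code; the function names `tolerance_factor` and `single_decision_limit` are hypothetical.

```python
import numpy as np
from scipy import stats


def tolerance_factor(n, beta, conf):
    """Return t_{n-1, sqrt(n) z_beta, conf} / sqrt(n), the multiplier of sqrt(V) in (6)."""
    z_beta = stats.norm.ppf(beta)        # beta quantile of N(0, 1)
    nc = np.sqrt(n) * z_beta             # non-centrality parameter sqrt(n) * z_beta
    return stats.nct.ppf(conf, df=n - 1, nc=nc) / np.sqrt(n)


def single_decision_limit(x, beta, conf):
    """Decision limit a(X) of equation (6) for a one-dimensional sample x."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    v = x.var(ddof=1)                    # sample variance V with divisor n - 1
    return xbar + np.sqrt(v) * tolerance_factor(x.size, beta, conf)
```

With the settings used later in the paper one would call `single_decision_limit(scores, beta=0.9999, conf=0.95)` on the vector of observed scores; numerical accuracy of the non-central $t$ quantile for large non-centrality parameters depends on the SciPy version and should be checked.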

3.2 Combined decision limits

In this subsection, we have $k = 2$ . Hence a future sample observation $y = (y_{1}, y_{2})^{'}$ is declared to be positive if and only if both $y_{1} > a_{1} (X)$ and $y_{2} > a_{2} (X)$ as stated in Holt et al.,⁷ that is,

$$y \in S(X) \quad \text{with} \quad S(X) = \{ y : y_{1} > a_{1}(X) \text{ and } y_{2} > a_{2}(X) \}. \tag{7}$$

To control the FPR at the pre-specified level $1 - β$, it is desirable that

$$P_{y | μ, Σ^{-1}} \{ y \in S(X) \} \leq 1 - β,$$

which is equivalent to

$$P_{y | μ, Σ^{-1}} \{ y \in \bar{S}(X) \} \geq β \tag{8}$$

under the assumption that $y$ is from the population distribution $N_{2}(μ, Σ)$, where the probabilities are calculated with respect to the distribution of $y$ conditional on $(μ, Σ^{-1})$, and $\bar{S}(X)$ denotes the complement of $S(X)$. As in the case of $k = 1$ in Section 3.1, the probability in (8) depends on the value of $(μ, Σ^{-1})$, for which we have only the posterior distribution $p(μ, Σ^{-1} | X)$ after observing the data $X$. Hence this probability cannot be guaranteed to be at least $β$ for every possible value of $(μ, Σ^{-1})$ from the posterior distribution $p(μ, Σ^{-1} | X)$, and we guarantee with a pre-specified $1 - α$ belief (or confidence) about the possible values of $(μ, Σ^{-1})$ that the probability in (8) is at least $β$, that is,

$$P_{μ, Σ^{-1} | X} \{ P_{y | μ, Σ^{-1}} \{ y \in \bar{S}(X) \} \geq β \} = 1 - α. \tag{9}$$

One recognizes immediately that $\bar{S}(X)$ is a Bayesian $1 - α$ confidence and $β$ content tolerance region for the population $N_{2}(μ, Σ)$. But this particular tolerance region has not been considered before to the best of our knowledge.

Under the frequentist framework, tolerance intervals/regions were introduced first by Wilks.¹⁹ Guttman,^14,18 Hahn and Meeker,²⁰ Krishnamoorthy and Mathew²¹ and Meeker et al.²² are excellent references on tolerance intervals/regions. The R package tolerance²³ allows the computation of many tolerance intervals/regions. Until very recently, the only available frequentist $β$ content and $1 - α$ confidence tolerance region specifically for the multivariate normal distribution $N_{k}(μ, Σ)$ was of the ellipsoidal form

$$R(X) = \{ y : (y - \bar{x})^{'} V^{-1} (y - \bar{x}) \leq c \},$$

where $c$ is the critical constant that needs to be determined so that

$$P_{\bar{x}, V} \{ P_{y | \bar{x}, V} \{ y \in R(X) \} \geq β \} = 1 - α, \tag{10}$$

where the probability $P_{y | \bar{x}, V}\{\cdot\}$ is calculated with respect to the random variable $y$ conditional on $(\bar{x}, V)$, and $P_{\bar{x}, V}\{\cdot\}$ is calculated with respect to $(\bar{x}, V)$. The central ellipsoidal tolerance region of Dong and Mathew,²⁴ also of the form $R(X)$ above but with a larger $c$, is conservative, that is, the probability on the left side of the equation in (10) is strictly larger than $1 - α$.

One key factor in the choice of the $R(X)$ above is that the probability $P_{\bar{x}, V}\{\cdot\}$ in (10) does not depend on the unknown parameters $(μ, Σ)$. But even in this case the computation of $c$ is very challenging and only approximation methods are available; see, for example, Krishnamoorthy and Mathew,²⁵ Krishnamoorthy and Mondal,²⁶ and Mbodj and Mathew.²⁷ If $R(X)$ is replaced by $\bar{S}(X)$ then the corresponding probability expression depends on the unknown $Σ$ in a complicated manner and so the computation of tolerance regions of forms different from $R(X)$ is much harder.

But most recently, rectangular (including one-sided or mixed-sided) tolerance regions of $β$ content and $1 - α$ confidence, specifically for multivariate normal distribution, have been constructed in Lucagbo²⁸ (Section 4.7) by using a parametric bootstrap. These tolerance regions are of different forms from the tolerance region $\bar{S} (X)$ in (9) considered in this paper.

For nonparametric rectangular (including one-sided or mixed-sided) tolerance regions, the reader is referred to Young and Mathew²⁹ and Lucagbo²⁸ (Sections 5.6 and 5.7) for the latest development.

In the Bayesian framework, there is no published work on $1 - α$ confidence and $β$ content tolerance region of the form $R (X)$ for $N_{k} (μ, Σ)$ even with $k = 2$ . The reader is referred to Chen³⁰ (Chapter 3) for the latest development on the construction of nonparametric Bayesian tolerance regions.

To determine $(a_{1} (X), a_{2} (X))$ of $S (X)$ in (7) from the only constraint in (9), we set

$$a_{1}(X) = a_{1}(λ, X) = \bar{x}_{1} + λ \sqrt{V_{11}}, \quad a_{2}(X) = a_{2}(λ, X) = \bar{x}_{2} + λ \sqrt{V_{22}}, \tag{11}$$

where $\bar{x}_{i}$ is the $i$th element of $\bar{x}$, $V_{ii}$ is the $i$th diagonal element of $V$, $i = 1, 2$, and $λ$ is the critical constant that needs to be determined from (9). These two expressions of $a_{i}(X)$ are sensible if one compares them with the decision limit $a(X)$ in (6) for the case of one GH-2000 score. Hence $S(X) = S(λ, X)$ and $λ$ is solved from

$$P_{μ, Σ^{-1} | X} \{ P_{y | μ, Σ^{-1}} \{ y \in S(X) \} \leq 1 - β \} = 1 - α, \tag{12}$$

which is equivalent to (9).

Algorithm 3.2.

For computing $λ$ by simulation for given $X$

Step 1: simulate one $(μ, Σ^{- 1})$ from the posterior distribution $p (μ, Σ^{- 1} | X)$ .

Step 2: given the simulated $(μ, Σ^{-1})$ in Step 1, solve $λ$ from $P_{y | μ, Σ^{-1}} \{ y \in S(X) \} = 1 - β$.

Step 3: repeat Steps 1 and 2 a large number $L$ of times, $L = 100{,}000$ say, to get the corresponding $λ_{1}, \dots, λ_{L}$; order these values as $λ_{[1]} \leq \dots \leq λ_{[L]}$ and use $λ_{[⟨(1 - α) L⟩]}$ as the $λ$ we want. Here $⟨(1 - α) L⟩$ denotes the integer part of $(1 - α) L$.

We use a simulation method, given by Algorithm 3.2, to compute the $λ$ from (12). It is well known that the $(1 - α)$ sample quantile $λ_{[⟨(1 - α) L⟩]}$ in Algorithm 3.2 converges almost surely to the $(1 - α)$ population quantile $λ$ that solves (9) as $L \to \infty$. Hence $λ_{[⟨(1 - α) L⟩]}$ can be regarded as accurate so long as the number of simulations $L$ is large enough. The computation results given in the penultimate paragraph of Section 3.3 below show that $L = 100{,}000$ is sufficiently large for the problem considered in this paper.

Now Step 1 can be implemented by using the distributional results in (2) and (3) in the following way. We first simulate one $Σ^{-1}$ from $W_{2}([(n-1)V]^{-1}, n-2)$ and then one $μ$ from $N_{2}(\bar{x}, Σ/n)$ to generate one $(μ, Σ^{-1})$. To simulate one $Σ^{-1}$ from $W_{2}([(n-1)V]^{-1}, n-2)$, we use the Bartlett decomposition (see Smith and Hocking³¹ and the references therein) in the following way. Step (a): generate independent random variables $u_{11} \sim χ_{n-1}^{2}$, $u_{22} \sim χ_{n-2}^{2}$ and $u_{12} \sim N(0, 1)$ to form the matrix $U = \begin{pmatrix} \sqrt{u_{11}} & u_{12} \\ 0 & \sqrt{u_{22}} \end{pmatrix}$. Step (b): set $Σ^{-1} = [(n-1)V]^{-1/2}\, U^{'} U\, [(n-1)V]^{-1/2}$, which has the required Wishart distribution. This simulation method for $Σ^{-1}$ works for a general $k \geq 2$. Alternatively one can directly use, for example, the R package rWishart.
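Steps (a) and (b) translate almost line by line into code. The sketch below is in Python with NumPy rather than R, purely as an illustration; `sym_inv_sqrt` and `sample_posterior` are hypothetical names, and the symmetric matrix square root is one possible choice for $[(n-1)V]^{-1/2}$.

```python
import numpy as np


def sym_inv_sqrt(a):
    """Symmetric inverse square root of a symmetric positive-definite matrix."""
    w, q = np.linalg.eigh(a)
    return (q / np.sqrt(w)) @ q.T        # Q diag(w^{-1/2}) Q'


def sample_posterior(xbar, v, n, rng):
    """Draw one (mu, Sigma) from the posterior given by (2) and (3), for k = 2."""
    # Step (a): upper-triangular Bartlett factor U.
    u = np.array([[np.sqrt(rng.chisquare(n - 1)), rng.standard_normal()],
                  [0.0, np.sqrt(rng.chisquare(n - 2))]])
    # Step (b): Sigma^{-1} = [(n-1)V]^{-1/2} U'U [(n-1)V]^{-1/2}.
    b = sym_inv_sqrt((n - 1) * v)
    sigma_inv = b @ u.T @ u @ b
    sigma = np.linalg.inv(sigma_inv)
    # Equation (3): mu | Sigma ~ N_2(xbar, Sigma / n).
    mu = rng.multivariate_normal(xbar, sigma / n)
    return mu, sigma
```

A generator such as `rng = np.random.default_rng(2024)` (the seed is arbitrary) can be passed in so that repeated runs are reproducible.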

For Step 2, we have from (7) and (11) that

$$\begin{aligned} & P_{y | μ, Σ^{-1}} \{ y \in S(X) \} \\ &= P_{y | μ, Σ^{-1}} \left\{ y_{1} \geq \bar{x}_{1} + λ \sqrt{V_{11}},\; y_{2} \geq \bar{x}_{2} + λ \sqrt{V_{22}} \right\} \\ &= P_{y | μ, Σ^{-1}} \left\{ Z_{1} \leq - \frac{\bar{x}_{1} - μ_{1} + λ \sqrt{V_{11}}}{\sqrt{σ_{11}}},\; Z_{2} \leq - \frac{\bar{x}_{2} - μ_{2} + λ \sqrt{V_{22}}}{\sqrt{σ_{22}}} \right\} \end{aligned} \tag{13}$$

where $Z_{1} = -(y_{1} - μ_{1}) / \sqrt{σ_{11}}$ and $Z_{2} = -(y_{2} - μ_{2}) / \sqrt{σ_{22}}$ have distribution $N_{2}\left(0, \begin{pmatrix} 1 & ρ \\ ρ & 1 \end{pmatrix}\right)$, with $Σ = (σ_{ij})$ and $ρ = σ_{12} / \sqrt{σ_{11} σ_{22}}$. The probability in (13) can be computed directly by using the function pmvnorm of the R package mvtnorm; see Genz and Bretz³² and Genz et al.³³ for more details. Furthermore, note that this probability is monotone decreasing in $λ$. Hence the unique solution $λ$ of $P_{y | μ, Σ^{-1}} \{ y \in S(X) \} = 1 - β$ can be easily computed by using a numerical searching algorithm; for example, the bisection method is used in our coding. From our experience, the computation of one $λ$ in Step 2 takes only a small fraction of a second on an ordinary personal computer (PC); see more details in the next subsection.

If one uses, in Step 2, an inner loop of simulation to compute an approximation to $λ$ , similar to an idea used in, for example, Krishnamoorthy and Mathew²⁵ to construct the frequentist ellipsoidal tolerance region $R (X)$ , then the computation is much more time-consuming and the resultant $λ$ is much less accurate. Hence, this is not recommended for computing $λ$ .
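Algorithm 3.2 as a whole can be sketched in a few dozen lines. The version below is an illustration in Python with SciPy and makes several substitutions that are assumptions of this sketch, not the authors' implementation: the bivariate normal probability in (13) is computed by one-dimensional numerical integration instead of pmvnorm, Brent's root finder replaces the bisection search, and Step 1 uses SciPy's Wishart sampler in its standard parameterization, where $n - 1$ degrees of freedom correspond to the chi-square factors $χ_{n-1}^{2}$ and $χ_{n-2}^{2}$ of the Bartlett construction. All function names are hypothetical.

```python
import numpy as np
from scipy import integrate, optimize, stats
from scipy.special import ndtr


def quadrant_prob(c1, c2, rho):
    """P{Z1 <= c1, Z2 <= c2} for a standard bivariate normal with |rho| < 1."""
    s = np.sqrt(1.0 - rho * rho)

    def integrand(z):
        # Density of Z1 at z times P{Z2 <= c2 | Z1 = z}.
        return np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi) * ndtr((c2 - rho * z) / s)

    value, _ = integrate.quad(integrand, -np.inf, c1, epsabs=1e-14, epsrel=1e-10)
    return value


def solve_lambda(mu, sigma, xbar, v, beta):
    """Step 2: solve P{y in S(lambda, X)} = 1 - beta for lambda, using (13)."""
    sd_pop = np.sqrt(np.diag(sigma))
    sd_smp = np.sqrt(np.diag(v))
    rho = sigma[0, 1] / (sd_pop[0] * sd_pop[1])

    def excess(lam):
        c = -(xbar - mu + lam * sd_smp) / sd_pop   # upper limits for Z1, Z2 in (13)
        return quadrant_prob(c[0], c[1], rho) - (1.0 - beta)

    # The probability is decreasing in lambda, so the root in the bracket is unique.
    return optimize.brentq(excess, -10.0, 15.0, xtol=1e-10)


def combined_lambda(xbar, v, n, beta, conf, n_draws, rng):
    """Algorithm 3.2 with n_draws posterior draws (the paper uses L = 100,000)."""
    scale = np.linalg.inv((n - 1) * v)
    lams = np.empty(n_draws)
    for i in range(n_draws):
        # Step 1 (assumption: standard Wishart parameterization, df = n - 1).
        sigma_inv = stats.wishart.rvs(df=n - 1, scale=scale, random_state=rng)
        sigma = np.linalg.inv(sigma_inv)
        mu = rng.multivariate_normal(xbar, sigma / n)
        lams[i] = solve_lambda(mu, sigma, xbar, v, beta)
    lams.sort()
    return lams[int(conf * n_draws) - 1]           # Step 3: order statistic [<(1 - alpha) L>]
```

The combined decision limits then follow from (11) as `xbar + lam * np.sqrt(np.diag(v))`. This pure-Python loop is far slower per draw than a compiled bivariate normal routine, so it is suited to checking the logic with a small `n_draws` and not to production use.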

3.3 Applications to the dataset on GH-2000 scores

In this subsection, we compute the decision limits given in the last two subsections using the available sample observations on GH-2000 from Holt et al.⁷ For the purpose of illustration, we focus on the following two GH-2000 scores: (1) Siemens IDS generated by using the P-III-NP assay Siemens ADVIA Centaur and IGF-I assay Immunodiagnostic Systems iSYS IGF-I and (2) Orion liquid chromatography-tandem mass spectrometry (LC-MS/MS) generated by P-III-NP assay UniQ™ P-III-NP RIA and IGF-I assay LC-MS/MS. These two GH-2000 scores are available for a sample of $n = 917$ female athletes and plotted by the 917 dots in Figure 1. There were 932 female athletes in the sample originally, but some had Siemens IDS readings missing, and some had Orion LC-MS/MS readings missing. Hence, only the $n = 917$ female athletes having both readings available are used in the analysis below.

Figure 1.

Plots of the single and combined decision limits based on the observed data. Single decision limits are given by the dotted lines. Combined decision limits are given by the upper-right quadrant $S$ bounded by two solid lines.

We set FPR $1 - β = 1/10{,}000$ and confidence level $1 - α = 95\%$, which are currently adopted by WADA. If one wants to use the GH-2000 score Siemens IDS ($y_{1}$) only to decide whether a future female athlete with reading $y = (y_{1}, y_{2})^{'}$ is positive on GH misuse, then the single decision limit is computed from the formula in (6) and given by $a(X) = 9.3445$. It is depicted by the vertical dotted line in Figure 1. Hence, a future female athlete is judged to be positive if and only if $y_{1} > 9.3445$. If one wants to use the GH-2000 score Orion LC-MS/MS ($y_{2}$) only to decide whether a future female athlete is positive, then the single decision limit is computed again from the formula in (6) and given by $a(X) = 8.5703$. It is depicted by the horizontal dotted line in Figure 1. Hence, a future female athlete is judged to be positive if and only if $y_{2} > 8.5703$.

On the other hand, if one wants to use both the GH-2000 scores Siemens IDS and Orion LC-MS/MS to decide whether a future female athlete with reading $y = (y_{1}, y_{2})^{'}$ is positive, then the combined decision limits in (11) are used and computed by using Algorithm 3.2. By using $L = 100{,}000$ simulations, $λ$ is calculated to be 3.5578, which gives $a_{1}(X) = 8.9881$ and $a_{2}(X) = 8.1952$. These two decision limits are depicted by the two solid lines, respectively, in Figure 1, and the set $S(X)$ is given by the upper-right quadrant formed by these two solid lines and indicated by the letter $S$ in the figure. Hence, a future female athlete is judged to be positive if and only if both $y_{1} > 8.9881$ and $y_{2} > 8.1952$.

The decision limits constructed according to the property in (9) or (12) have the following interpretation. With $1 - α$ belief or confidence about the possible value of $(μ, Σ)$, the probability that a future athlete whose GH-2000 reading $y$ follows the distribution $N_{2}(μ, Σ)$ is wrongly judged to be positive, that is, the FPR, is no more than 1 in 10,000.

The computation of $λ$ based on $L = 100{,}000$ simulations takes about 210 s on an ordinary Windows PC (Intel(R) Core(TM) i5-6600 CPU 3.30GHz, RAM 8.0 GB). We have tried five different random seeds for the random number generator, which give the corresponding $λ$-values: 3.5572, 3.5574, 3.5567, 3.5594, and 3.5579. This indicates that the $λ$ value computed using $L = 100{,}000$ is likely to be accurate to the second decimal place at least. Indeed, one computation we have done using $L = 1{,}000{,}000$ simulations produces $λ = 3.5572$ and takes about 2210 s (37 min). Hence, the computation method proposed is fast and accurate enough for practical purposes with $L = 100{,}000$.

It is valuable to compare the proposed combined decision limits with the combined decision limits of Erotokritou-Mulligan et al.¹¹ and Holt et al.,⁷ which are mentioned in Section 1. They are given by $\tilde{a}_{1}(X) = \bar{x}_{1} + \tilde{λ} \sqrt{V_{11}}$ and $\tilde{a}_{2}(X) = \bar{x}_{2} + \tilde{λ} \sqrt{V_{22}}$, and so are of similar form to the new combined decision limits given in (11). However, the critical constant is given by $\tilde{λ} = \tilde{k} + z_{1-α} \sqrt{(1 + \tilde{k}^{2}/2)/n}$ with $\tilde{k}$ being solved from $P\{W_{1} > \tilde{k}, W_{2} > \tilde{k}\} = 1 - β$, where $(W_{1}, W_{2})^{'}$ has distribution $N_{2}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \tilde{ρ} \\ \tilde{ρ} & 1 \end{pmatrix}\right)$ with $\tilde{ρ}$ being the usual sample correlation coefficient between the two GH-2000 scores $x_{1}$ and $x_{2}$ based on the sample $X$. For the case considered in this subsection, it is computed that $\tilde{ρ} = V_{12} / \sqrt{V_{11} V_{22}} = 0.852$, $\tilde{k} = 3.4049$, $\tilde{λ} = 3.5465$, $\tilde{a}_{1}(X) = 8.9755$, and $\tilde{a}_{2}(X) = 8.1820$.
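As a consistency check on the reported constants (a worked calculation added for illustration, taking $z_{0.95} \approx 1.6449$), substituting $\tilde{k} = 3.4049$ and $n = 917$ into the formula for $\tilde{λ}$ gives

$$\tilde{λ} = 3.4049 + 1.6449 \sqrt{\frac{1 + 3.4049^{2}/2}{917}} \approx 3.4049 + 1.6449 \times 0.08609 \approx 3.5465,$$

which agrees with the value of $\tilde{λ}$ quoted above.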

While $\tilde{a}_{1}(X) = 8.9755$ and $\tilde{a}_{2}(X) = 8.1820$ are quite close to $a_{1}(X) = 8.9881$ and $a_{2}(X) = 8.1952$, respectively, for the specific sample observed, the construction of $\tilde{a}_{1}(X)$ and $\tilde{a}_{2}(X)$ does not guarantee that $\{ y : y_{1} \leq \tilde{a}_{1}(X) \text{ or } y_{2} \leq \tilde{a}_{2}(X) \}$ forms a $1 - α$ confidence and $β$ content tolerance region for the population $N_{2}(μ, Σ)$. Indeed a simulation study in the next subsection shows that the true probability of $\{ y : y_{1} \leq \tilde{a}_{1}(X) \text{ or } y_{2} \leq \tilde{a}_{2}(X) \}$ covering $β$ content of the population $N_{2}(μ, Σ)$ could deviate from the nominal level $1 - α$ in both directions.

3.4 Simulation studies

In this subsection, a simulation study is carried out to assess whether the Bayesian $1 - α$ confidence and $β$ content tolerance region $\bar{S} (X)$ in Section 3.3 is also a frequentist $1 - α$ confidence and $β$ content tolerance region for $N_{2} (μ, Σ)$ . Specifically, we assess whether

$$P_{\bar{x}, V} \{ P_{y | \bar{x}, V} \{ y \in \bar{S}(X) \} \geq β \} \geq 1 - α \tag{14}$$

holds for all possible values of $μ$ and $Σ$; here the probability $P_{y | \bar{x}, V}\{\cdot\}$ is calculated with respect to $y \sim N_{2}(μ, Σ)$ conditional on the sample mean and covariance matrix $(\bar{x}, V)$, and $P_{\bar{x}, V}\{\cdot\}$ is calculated with respect to $(\bar{x}, V)$, which depends on the random sample $X$ from $N_{2}(μ, Σ)$.

It can be shown that $P_{\bar{x}, V} \{ P_{y | \bar{x}, V} \{ y \in \bar{S}(X) \} \geq β \}$ does not depend on $μ$. Hence, it is only necessary to assess whether (14) holds for all possible values of $Σ = \begin{pmatrix} σ_{11} & ρ \sqrt{σ_{11} σ_{22}} \\ ρ \sqrt{σ_{11} σ_{22}} & σ_{22} \end{pmatrix}$ with $μ = 0$. From the observed sample on GH-2000 scores in Section 3.3, the 99% confidence intervals for $ρ$, $σ_{11}$ and $σ_{22}$ are given, respectively, by $(0.827, 0.874)$, $(1.097, 1.396)$ and $(1.215, 1.546)$. So the following three values $(0.83, 0.85, 0.87)$ are used for $ρ$, $(1.10, 1.25, 1.40)$ for $σ_{11}$, and $(1.22, 1.38, 1.55)$ for $σ_{22}$ in the simulation study, with a total of 27 combinations of $Σ$. The range of these 27 combinations covers the likely true value of $Σ$. Furthermore, FPR $1 - β = 1/10{,}000$, confidence level $1 - α = 95\%$ and sample size $n = 917$ are used as in Section 3.3.

Algorithm 3.4.

For computing the (outer) probability in (14) by simulation

Step 1: simulate one sample $X = (x_{1}, \dots, x_{n})$ from $N_{2} (0, Σ)$ .

Step 2: use the sample $X$ to compute the region $\bar{S}(X)$ by using Algorithm 3.2; the number of simulations used to compute $λ$ is $L = 100{,}000$.

Step 3: compute $P_{y | \bar{x}, V} \{ y \in \bar{S}(X) \} = 1 - P_{y | \bar{x}, V} \{ y \in S(X) \}$ by using an expression similar to (13) with $μ = 0$ for $P_{y | \bar{x}, V} \{ y \in S(X) \}$, and the R function pmvnorm.

Step 4: repeat Steps 1 to 3 a large number of times, say $M = 1000$; the proportion of times that $P_{y | \bar{x}, V} \{ y \in \bar{S}(X) \} \geq β$ is used as the required probability.

The (outer) probability $P_{\bar{x}, V} {\cdot}$ in (14) is approximated by a proportion using Algorithm 3.4. It takes about 60 h to compute each probability in Table 1 on the same computer as mentioned in Section 3.3, with most computation time spent on computing the $λ$ -values in the $M = 1000$ repetitions.
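The structure of Algorithm 3.4 can be illustrated with a generic coverage routine that accepts any rule for computing the two decision limits. Because plugging Algorithm 3.2 into every replicate is expensive, the sketch below (Python with SciPy, an assumption of this illustration) uses the closed-form comparator limits $\tilde{a}_{1}(X)$ and $\tilde{a}_{2}(X)$ of Section 3.3 as the example rule; the names `upper_quadrant_prob`, `old_limits` and `coverage` are hypothetical, and the bivariate normal probability is obtained by one-dimensional numerical integration instead of pmvnorm.

```python
import numpy as np
from scipy import integrate, optimize, stats
from scipy.special import ndtr


def upper_quadrant_prob(a1, a2, rho):
    """P{W1 > a1, W2 > a2} for a standard bivariate normal with |rho| < 1."""
    s = np.sqrt(1.0 - rho * rho)

    def integrand(z):
        # By symmetry this equals P{Z1 <= -a1, Z2 <= -a2} with the same rho.
        return np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi) * ndtr((-a2 - rho * z) / s)

    return integrate.quad(integrand, -np.inf, -a1, epsabs=1e-14, epsrel=1e-10)[0]


def old_limits(x, beta, conf):
    """Comparator limits (a~1, a~2) computed from a sample x of shape (n, 2)."""
    n = x.shape[0]
    xbar, v = x.mean(axis=0), np.cov(x, rowvar=False)
    sd = np.sqrt(np.diag(v))
    rho = v[0, 1] / (sd[0] * sd[1])
    k = optimize.brentq(lambda t: upper_quadrant_prob(t, t, rho) - (1.0 - beta), -5.0, 10.0)
    lam = k + stats.norm.ppf(conf) * np.sqrt((1.0 + k * k / 2.0) / n)
    return xbar + lam * sd


def coverage(limit_fn, sigma, n, beta, conf, m, rng):
    """Steps 1 to 4 of Algorithm 3.4 for a generic rule limit_fn(x, beta, conf)."""
    sd = np.sqrt(np.diag(sigma))
    rho = sigma[0, 1] / (sd[0] * sd[1])
    hits = 0
    for _ in range(m):
        x = rng.multivariate_normal(np.zeros(2), sigma, size=n)     # Step 1, mu = 0
        a = limit_fn(x, beta, conf)                                 # Step 2
        fpr = upper_quadrant_prob(a[0] / sd[0], a[1] / sd[1], rho)  # Step 3: P{y in S(X)}
        hits += fpr <= 1.0 - beta                                   # content of complement >= beta
    return hits / m                                                 # Step 4
```

To assess the new limits instead, `limit_fn` would be replaced by a function that runs Algorithm 3.2 on each simulated sample, which is where almost all of the computation time is spent.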

Table 1.

The probability in (14) for given $Σ$ .

		$σ_{11} = 1.10$	$σ_{11} = 1.25$	$σ_{11} = 1.40$
$ρ = 0.83$	$σ_{22} = 1.22$	0.951	0.951	0.951
	$σ_{22} = 1.38$	0.951	0.951	0.951
	$σ_{22} = 1.55$	0.951	0.951	0.951
$ρ = 0.85$	$σ_{22} = 1.22$	0.951	0.951	0.951
	$σ_{22} = 1.38$	0.951	0.951	0.951
	$σ_{22} = 1.55$	0.951	0.951	0.951
$ρ = 0.87$	$σ_{22} = 1.22$	0.953	0.953	0.953
	$σ_{22} = 1.38$	0.953	0.953	0.953
	$σ_{22} = 1.55$	0.953	0.953	0.953

Table 1 presents the simulation results on the probability in (14). It is clear from Table 1 that the probabilities are very close to $1 - α = 0.95$ for all the 27 configurations of $ρ, σ_{11}$ and $σ_{22}$ . Since the range of these 27 configurations most likely covers the true value of $Σ$ , it follows therefore that it is most likely the inequality in (14) holds for the unknown true value of $Σ$ . Hence the Bayesian combined decision limits can also be interpreted as the frequentist combined decision limits of approximate $1 - α$ confidence.

The results in Table 1 seem to indicate that the probability in (14) depends on $Σ$, that is, $σ_{11}, σ_{22}$ and $ρ$, only through $ρ$. But this is difficult to prove analytically since the critical constant $λ$ of $S(X)$ depends on $Σ$ in a complicated manner in Step 1 of Algorithm 3.2. We have carried out a further simulation study on the probability in (14) with $σ_{11} = σ_{22} = 1$ and various values of $ρ$ in the wider range $[-0.9, 0.9]$. The results are given by Prob (new) in Table 2.

Table 2.

The probability in (14) for given $Σ$ with $σ_{11} = σ_{22} = 1$ .

$ρ =$	$- 0.9$	$- 0.8$	$- 0.7$	$- 0.6$	$- 0.5$	$- 0.4$	$- 0.3$	$- 0.2$	$- 0.1$	0
$Prob (new) =$	0.949	0.950	0.950	0.950	0.949	0.950	0.950	0.951	0.950	0.950
$Prob (old) =$	0.998	0.995	0.986	0.975	0.963	0.956	0.953	0.952	0.949	0.948
$ρ =$	0.9	0.8	0.7	0.6	0.5	0.4	0.3	0.2	0.1
$Prob (new) =$	0.955	0.950	0.950	0.953	0.958	0.961	0.954	0.950	0.951
$Prob (old) =$	0.945	0.940	0.939	0.941	0.948	0.950	0.950	0.943	0.946

The results on Prob (new) in Table 2 indicate that, if the probability in (14) does depend on $Σ$ only through $ρ$ , then this probability seems to be quite close to $1 - α = 0.95$ across the wide range $[- 0.9, 0.9]$ of $ρ$ -values.

Finally, we have carried out a simulation study to assess the probability corresponding to the probability in (14) but for the combined decision limits $\tilde{a}_{1}(X)$ and $\tilde{a}_{2}(X)$ of Erotokritou-Mulligan et al.¹¹ and Holt et al.⁷ It is clear that this probability depends on $Σ$ only through $ρ$, and the simulation results on this probability are given by Prob (old) in Table 2. As pointed out in the last subsection, their construction does not guarantee that $\{ y : y_{1} \leq \tilde{a}_{1}(X) \text{ or } y_{2} \leq \tilde{a}_{2}(X) \}$ is a $1 - α$ confidence and $β$ content tolerance region for the population $N_{2}(μ, Σ)$. From the results in Table 2, it can be seen that the probability of $\{ y : y_{1} \leq \tilde{a}_{1}(X) \text{ or } y_{2} \leq \tilde{a}_{2}(X) \}$ covering $β = 0.9999$ content of the population $N_{2}(μ, Σ)$ tends to be a bit smaller than the nominal level $1 - α = 0.95$ when the true value of $ρ$ is around 0.7. Strong deviations from the nominal level occur for $ρ$ smaller than $-0.5$. In practice, negative correlations between two GH-2000 scores are unlikely to occur. But in other applications where the two scores have a large negative correlation, the method of Erotokritou-Mulligan et al.¹¹ and Holt et al.⁷ will produce $\tilde{a}_{1}(X)$ and $\tilde{a}_{2}(X)$ that are larger than necessary. In contrast, with the new combined decision limits, the probabilities Prob (new) are consistently close to the nominal level across the range of $ρ$ values.

4 Discussion

Decision limits based on the GH-2000 scores produced by the various pairs of analytical assays employed have been published. These limits are applied to each score individually, but the scores from two pairs of assays must both exceed their limits before an athlete has to answer a case for the misuse of GH. In other words, WADA mandated the measurement of each analyte by two methods, which means that each sample has two GH-2000 scores. Hence, it is natural to use the correlation between the two scores to develop lower decision limits that increase the sensitivity of the biomarker method. The lower the correlation between the two GH-2000 scores under consideration, the more sensitive the biomarker test becomes.
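The effect of the correlation on the size of combined limits can be illustrated numerically. The sketch below is not the method of this paper: it assumes, purely for illustration, that $μ$ and $Σ$ are known, that both scores are standardised to unit variance, and that a common limit $a$ is used for both scores, so it makes no allowance for estimation uncertainty. The function names are hypothetical. It solves $P(Y_{1} > a, Y_{2} > a) = 1 - β = 0.0001$ for $a$ at several values of $ρ$.

```python
import math


def upper_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def joint_exceedance(a, rho, upper=10.0, n=2000):
    """P(Y1 > a, Y2 > a) for a standard bivariate normal with correlation rho.

    Uses P = integral over z >= a of phi(z) * P(Y2 > a | Y1 = z), evaluated
    with composite Simpson's rule on [a, upper]. Requires |rho| < 1 and an
    even n. The tail beyond `upper` is negligible for the limits used here.
    """
    s = math.sqrt(1.0 - rho * rho)

    def integrand(z):
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        return phi * upper_tail((a - rho * z) / s)

    h = (upper - a) / n
    total = integrand(a) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3.0


def common_limit(rho, fpr=1e-4, lo=-5.0, hi=5.0):
    """Bisection for the common limit a with P(Y1 > a, Y2 > a) = fpr.

    Assumes known parameters (illustration only); the joint exceedance
    probability is decreasing in a, so bisection on [lo, hi] applies.
    """
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if joint_exceedance(mid, rho) > fpr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6, 0.9):
        print(f"rho = {rho:.1f}: common limit = {common_limit(rho):.3f}")
```

At $ρ = 0$ the condition reduces to $P(Y_{1} > a)^{2} = 0.0001$, giving $a ≈ 2.33$, compared with about 3.72 for a single standardised score at the same FPR. As $ρ$ increases towards 1 the common limit rises towards the single-score value, which is the loss of sensitivity described above.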

While combined decision limits have their benefits, there are some drawbacks. First, the decision limit for one assay pair could change depending on which other pair of assays is used, and that could lead to confusion. Ideally, a given GH-2000 score would have a unique decision limit, not one that depends on which other GH-2000 score is used in the pair. Secondly, it has become possible to measure IGF-I by mass spectrometry, which is now the preferred method. WADA does not mandate measurement by a second assay when an analyte is measured by mass spectrometry because of the greater reliability and traceability of the method compared with immunoassays. Hence, using the same assay for IGF-I in two GH-2000 scores increases the correlation, and the potential for increased sensitivity of the biomarker test diminishes. On the other hand, two mass spectrometric methods for IGF-I (intact and digest) may become available in the near future, with the potential to decrease the correlation between the two GH-2000 scores in the pair.

Having said that, it is nevertheless valuable to have a statistical theory for constructing combined decision limits. These combined decision limits should have the same pre-specified $1 - α$ confidence and $1 - β$ FPR as the single decision limit. A Bayesian approach is used in this paper to construct the combined decision limits. Our simulation study in Section 3.4 shows that the Bayesian combined decision limits also have satisfactory frequentist properties and so can be regarded as frequentist combined decision limits too. The R code, available from the authors, makes the method ready to use.

Combined decision limits of other forms are also worth investigating in future. For example, it also seems sensible to use combined decision limits of the form $T (X) = \{y : y_{1} > a_{1} (X) \text{ or } y_{2} > a_{2} (X)\}$ with $a_{1} (X)$ and $a_{2} (X)$ of the forms in (11). That is, a future athlete is judged to be positive if either of the two readings is too high. The corresponding $\bar{T} (X)$ becomes a one-sided rectangular tolerance region for a bivariate normal distribution considered recently in Lucagbo²⁸ (Section 4.7). It would be interesting to compare the tolerance region $\bar{T} (X)$ constructed using the Bayesian method as in this paper with the tolerance region $\bar{T} (X)$ constructed using the parametric bootstrap of Lucagbo.²⁸
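Taking $\bar{T} (X)$ to be the complement of $T (X)$, as the notation suggests, the link between this alternative rule and a rectangular tolerance region can be written out explicitly. This is a sketch in the notation of this section, with $Y = (Y_{1}, Y_{2})$ denoting the two scores of a future athlete who has not taken GH, assumed here to follow $N_{2} (μ, Σ)$:

$$
\bar{T} (X) = \{y : y_{1} \leq a_{1} (X) \text{ and } y_{2} \leq a_{2} (X)\}, \qquad
P\{Y \in T (X) \mid X\} = 1 - P\{Y_{1} \leq a_{1} (X),\; Y_{2} \leq a_{2} (X) \mid X\}.
$$

Requiring $\bar{T} (X)$ to have content at least $β$ with confidence $1 - α$ is therefore the same as requiring the FPR of this either-reading rule to be at most $1 - β$ with confidence $1 - α$.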

The computation method of Section 3.2 can potentially be explored in the construction of a frequentist tolerance region of the ellipsoidal form $R (X)$ in (10) for $k = 2$ at least, which is probably the most useful case in applications of tolerance regions. Furthermore, the construction of the Bayesian tolerance region of the ellipsoidal form $R (X)$ can also be investigated, even though it is not of direct interest to GH misuse detection.

While the GH misuse detection motivates this work, one can envisage other potential applications of the methodology developed in this paper. For example, suitable decision limits can be constructed to trigger an alert on whether a child of a given age is over/underweight or over/underheight.

For nonparametric tolerance regions, the reader is referred to Young and Mathew,²⁹ Lucagbo²⁸ (Chapter 5) and Chen³⁰ (Chapter 3) for the latest developments.

Footnotes

Acknowledgements

We thank the referees for constructive comments.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Dankmar Böhning

Yang Han

References

1. Holt RIG. Is human growth hormone an ergogenic aid? Drug Test Anal 2009; 9: 412–418.

2. World Anti-Doping Agency. World anti-doping program guidelines for hGH isoform differential immunoassays for anti-doping analyses, https://www.wada-ama.org/sites/default/files/resources/files/WADA-Guidelines-for-hGH-Differential-Immunoassays-v2.1-2014-EN.pdf (2014, accessed 25 October 2017).

3. World Anti-Doping Agency. The world anti-doping code international standard: Prohibited list 2016, https://www.wada-ama.org/sites/default/files/resources/files/wada-2016-prohibited-list-en.pdf (2016, accessed 25 October 2017).

4. World Anti-Doping Agency. Human growth hormone (hGH) isoform differential immunoassays for doping control analyses, https://www.wada-ama.org/sites/default/files/2022-01/td2021gh_final_eng_0.pdf (2021, accessed 4 May 2021).

5. World Anti-Doping Agency. Laboratory guidelines - human growth hormone (hGH) biomarkers test, https://www.wada-ama.org/en/resources/laboratory-guidelines-human-growth-hormone-hgh-biomarkers-test (2021, accessed 4 May 2021).

6. Bidlingmaier, Strasburger. Test method: GH. Baillieres Best Pract Res Clin Endocrinol Metab 2000; 14: 99–109.

7. Holt RIG, Böhning, Guha, et al. The development of decision limits for the GH-2000 detection methodology using additional insulin-like growth factor-I and amino-terminal pro-peptide of type III collagen assays. Drug Test Anal 2015; 7: 745–755.

8. Longobardi, Keay, Ehrnborg, et al. Growth hormone (GH) effects on bone and collagen turnover in healthy adults and its potential as a marker of GH abuse in sports: a double blind, placebo-controlled study. The GH-2000 study group. J Clin Endocrinol Metab 2000; 85: 1505–1512.

9. Dall, Longobardi, Ehrnborg, et al. The effect of four weeks of supraphysiological growth hormone administration on the insulin-like growth factor axis in women and men. GH-2000 study group. J Clin Endocrinol Metab 2000; 85: 4193–4200.

10. Powrie, Bassett, Rosen, et al. Detection of growth hormone abuse in sport. Growth Horm IGF Res 2007; 17: 220–226.

11. Erotokritou-Mulligan, Guha, Stow, et al. The development of decision limits for the implementation of the GH-2000 detection methodology using current commercial insulin-like growth factor-I and amino-terminal pro-peptide of type III collagen assays. Growth Horm IGF Res 2012; 22: 53–58.

12. Böhning, Guha, et al. Statistical methodology for age-adjustment of the GH-2000 score detecting growth hormone misuse. BMC Med Res Methodol 2016; 16: 147.

13. Böhning, Liu, Holt RIG, et al. Exact statistical calculation of the uncertainty term in the decision limits based on the GH2000 score for growth hormone misuse detection (doping). Stat Methods Med Res 2019; 28: 928–936.

14. Guttman. Statistical tolerance regions: Classical and Bayesian. London: Griffin, 1970.

15. Box GEP, Tiao. Bayesian inference in statistical analysis. New York: Wiley, 1992.

16. Anderson. An introduction to multivariate statistical analysis. 3rd ed. New York: Wiley, 2003.

17. Aitchison. Bayesian tolerance regions. J R Stat Soc Ser B 1964; 26: 161–175.

18. Guttman. Tolerance regions. In: Kotz (eds) Encyclopedia of statistical sciences, 2nd ed. New York: Wiley, 2006, pp. 8644–8659.

19. Wilks. Determination of sample sizes for setting tolerance limits. Ann Math Stat 1941; 12: 91–96.

20. Hahn, Meeker. Statistical intervals: A guide to practitioners. New York: Wiley, 1991.

21. Krishnamoorthy, Mathew. Statistical tolerance regions – theory, applications, and computation. New York: Wiley, 2009.

22. Meeker, Hahn, Escobar. Statistical intervals: A guide for practitioners and researchers. 2nd ed. New York: Wiley, 2017.

23. Young. tolerance: An R package for estimating tolerance intervals. J Stat Softw 2010; 36: 1–39.

24. Dong, Mathew. Central tolerance regions and reference regions for multivariate normal population. J Multivar Anal 2015; 134: 50–60.

25. Krishnamoorthy, Mathew. Comparison of approximation methods for computing tolerance factors for a multivariate normal population. Technometrics 1999; 41: 234–249.

26. Krishnamoorthy, Mondal. Improved tolerance factors for multivariate normal distributions. Commun Stat Simul Comput 2006; 35: 461–478.

27. Mbodj, Mathew. Approximate ellipsoidal tolerance regions for multivariate normal populations. Stat Prob Lett 2015; 97: 41–45.

28. Lucagbo. Rectangular statistical regions with applications in laboratory medicine and calibration. PhD Thesis, University of Maryland, Baltimore County, USA, 2021.

29. Young, Mathew. Nonparametric hyperrectangular tolerance and prediction regions for setting multivariate reference regions in laboratory medicine. Stat Methods Med Res 2020; 29: 3569–3585.

30. Chen. Prediction sets via parametric and non-parametric Bayes: With applications in pharmaceutical industry. PhD Thesis, Leiden University, The Netherlands, 2021.

31. Smith, Hocking. Algorithm AS 53: Wishart variate generator. J R Stat Soc Ser C 1972; 21: 341–345.

32. Genz, Bretz. Computation of multivariate normal and t probabilities (Lecture Notes in Statistics). New York: Springer, 2009.

33. Genz, Bretz, Miwa, et al. mvtnorm: Multivariate normal and t distributions. R package version 1.1-1, https://cran.r-project.org/web/packages/mvtnorm/mvtnorm.pdf (2021).
