Abstract

Receiver operating characteristic analysis is one of the most popular approaches for evaluating and comparing the accuracy of medical diagnostic tests. Although various methodologies have been developed for estimating receiver operating characteristic curves and their associated summary indices, there is no consensus on a single framework that can provide consistent statistical inference while handling the complexities associated with medical data. Such complexities might include non-normal data, covariates that influence the diagnostic potential of a test, ordinal biomarkers or censored data due to instrument detection limits. We propose a regression model for the transformed test results which exploits the invariance of receiver operating characteristic curves to monotonic transformations and accommodates these features. Simulation studies show that the estimates based on transformation models are unbiased and yield coverage at nominal levels. The methodology is applied to a cross-sectional study of metabolic syndrome where we investigate the covariate-specific performance of weight-to-height ratio as a non-invasive diagnostic test. Software implementations for all the methods described in the article are provided in the tram add-on package to the R system for statistical computing and graphics.

Keywords

Transformation model, receiver operating characteristic curve, area under the receiver operating characteristic curve, diagnostic test, distribution regression, ordinal outcome, censoring, Youden index, overlapping coefficient, limit of detection

1. Introduction

Estimating receiver operating characteristic (ROC) curves for evaluating the performance of medical diagnostic tests has been a major focus of the statistical literature over recent decades.^1,2 Diagnostic tests screen for the presence or absence of a disease. Characterizing their accuracy is essential to ensure appropriate prevention, treatment and monitoring of diseases. ROC curves are a valuable tool in determining the diagnostic potential of a test and continue to be extensively applied in biomedical studies as new tests or biomarkers are developed in radiology, oncology, genetics, and other related fields. More applications can be expected as technology advances, and analyzing the resulting data requires a computationally straightforward approach that provides accurate and consistent statistical inference.

Previous research has focused on extending statistical methodology for ROC curve estimation to address issues such as adjustment for covariates,^3,4 incorporating censoring due to instrument detection limits^5,6 and robustness to model misspecification.⁷ In addition, a wide variety of parametric and nonparametric methods have been proposed within frequentist and Bayesian paradigms (see Inácio et al.⁸ for a recent review). However, there is no consensus on an analytic approach that can handle all these issues simultaneously.

An attractive feature of the ROC curve, which has rarely been exploited for its estimation, is that it remains invariant to monotonic transformations of the test results. Although transformations have been used to bring continuous test results into a form that approximately satisfies the assumptions of a suitable parametric model,⁴ estimation of a transformation function has been limited to the Box-Cox power transformation family.^9,10 For rank-based methods, the transformation function can be left unspecified, but in all cases, a restriction to normality has been previously imposed on the model for the ROC curve.¹¹

In this article, we present a new unifying methodological framework for estimating ROC curves and their associated summary indices by modeling the relationship between the transformed test results and potential covariates. We employ transformation models to jointly estimate the transformation function and regression parameters.^12,13 This approach specifies a parametric model for the ROC curve but remains distribution-free because we do not impose any strong assumptions about the transformation function. Using the estimated parameters, we show how to evaluate covariate effects on the discriminatory performance of diagnostic tests. Unlike nonparametric methods, which are flexible but difficult to interpret and implement, transformation models retain flexibility while remaining easy to interpret and implement. R implementations of all methods discussed in this article are available, along with a set of supporting examples.

1.1. Notation and preliminaries

Let the random variable $Y$ denote the continuous result of a diagnostic test and let $D$ denote the disease status, with $D = 1$ if a subject is diseased and $D = 0$ if nondiseased. We denote quantities conditional on the disease status using subscripts. For example, $Y_{1}$ and $Y_{0}$ are the test results in the diseased and nondiseased populations with cumulative distribution functions (CDFs) given by $F_{1}$ and $F_{0}$ and densities $f_{1}$ and $f_{0}$, respectively. Suppose that the subject is diagnosed as diseased when their test result exceeds a threshold value, $c$. By convention, we assume that larger values of the test result are more indicative of the disease. The probabilities of correctly identifying a diseased and a nondiseased subject are the sensitivity, $P (Y_{1} > c) = 1 - F_{1} (c)$, and the specificity, $P (Y_{0} \leq c) = F_{0} (c)$, respectively. The set of pairs $(1 - specificity, sensitivity)$ for all $c \in \mathbb{R}$ produces the ROC curve. By setting $p = 1 - F_{0} (c)$, an equivalent representation of the ROC curve is

ROC (p) = 1 - F_{1} (F_{0}^{- 1} (1 - p))

Summary indices of the ROC curve quantify the degree of separation between the distributions of $Y_{1}$ and $Y_{0}$. The most widely used index is the area under the ROC curve (AUC) defined by

AUC = P (Y_{1} > Y_{0}) = \int_{0}^{1} ROC (p) d p

The AUC represents the probability that the test result of a randomly selected diseased subject exceeds that of a randomly selected nondiseased subject and is directly related to the Mann–Whitney–Wilcoxon U-statistic (MWW).¹⁴ Alternative indices include the Youden index,¹⁵ $J$, which combines sensitivity and specificity over all possible thresholds to provide the maximum potential effectiveness of a diagnostic test, given by

J = \max_{c \in \mathbb{R}} [F_{0} (c) - F_{1} (c)]

The Youden index is equivalent to the Smirnov (or the two-sample Kolmogorov-Smirnov) test statistic¹⁶ and can be represented as half the $L_{1}$ distance between the two densities or as the complement of the overlapping coefficient (OVL)^17–20:

J = \frac{1}{2} \int | f_{0} (y) - f_{1} (y) | d y = 1 - \int \min [f_{1} (y), f_{0} (y)] d y = 1 - OVL

Additionally, the threshold corresponding to $J$, at which the sum of sensitivity and specificity is maximized, denoted as $c^{*}$, is often used in clinical practice as the optimal classification threshold to screen subjects.
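As a minimal illustration of these definitions, the following Python sketch computes nonparametric plug-in versions of the AUC (through the Mann–Whitney count) and of the Youden index with its threshold from two small samples. The function names and the toy data are hypothetical and serve only to make the definitions concrete; this is not the estimator proposed in this article.

```python
def empirical_auc(diseased, nondiseased):
    """Mann-Whitney estimate of P(Y1 > Y0); ties are counted as 1/2."""
    wins = 0.0
    for y1 in diseased:
        for y0 in nondiseased:
            if y1 > y0:
                wins += 1.0
            elif y1 == y0:
                wins += 0.5
    return wins / (len(diseased) * len(nondiseased))


def empirical_youden(diseased, nondiseased):
    """Return (J, c*) maximizing F0_hat(c) - F1_hat(c) over observed values."""
    best_j, best_c = 0.0, None
    for c in sorted(set(diseased) | set(nondiseased)):
        spec = sum(y <= c for y in nondiseased) / len(nondiseased)  # F0_hat(c)
        sens = sum(y > c for y in diseased) / len(diseased)         # 1 - F1_hat(c)
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_c = j, c
    return best_j, best_c


# Hypothetical toy data: larger values are more indicative of disease.
y0 = [0.8, 1.1, 1.4, 1.9, 2.3]
y1 = [1.6, 2.1, 2.8, 3.0, 3.5]
auc = empirical_auc(y1, y0)
j, c_star = empirical_youden(y1, y0)
```

With these data, 22 of the 25 diseased/nondiseased pairs are correctly ordered, so the empirical AUC is 0.88. Several thresholds attain the same empirical $J$ here, which is one reason a smooth model-based estimate of $c^{*}$ can be preferable.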

Covariates may impact the level and the accuracy of a diagnostic test. In order to appropriately understand the accuracy of the test in subpopulations, we can use covariate-specific or conditional ROC curves.²¹ Let $X$ denote a vector of covariates that are hypothesized to have an impact on the accuracy of the test. The conditional CDF in the diseased population is given by $F_{1} (y ∣ x) = P (Y_{1} \leq y ∣ X = x)$ and analogously given for the nondiseased population. The covariate-specific ROC can be written as

ROC (p ∣ x) = 1 - F_{1} (F_{0}^{- 1} (1 - p ∣ x) ∣ x)

(1)

with its counterpart conditional summary indices, $AUC (x)$ and $J (x)$, defined accordingly. The covariate-specific ROC curve can be generated by modeling the conditional distribution of the test results, known as the induced or indirect methodology.³

1.2. Overview

The article proceeds as follows. In Section 2, we propose a transformation modeling framework for parameterizing ROC curves from which we derive closed-form expressions for associated AUC and Youden summary indices. We discuss maximum likelihood estimation procedures for our model and corresponding inference. In Section 3, we assess the empirical performance of our methods using simulated data. We apply our approach to a cross-sectional study for detection of metabolic syndrome in Section 4 and conclude the article with a discussion.

2. Methods

2.1. Transformation model

The ROC curve is a composition of distribution functions and thus is invariant to strictly monotonically increasing transformations of $Y$ . We propose a model for the conditional distribution of the transformed test result given the disease status and covariates. This transformation is obtained from the data and leads to a distribution-free framework to parameterize the covariate-specific ROC curve and its summary indices.

Suppose there exists a strictly monotonically increasing function $h$ such that the relationship between the transformed test result and the covariates follows a shift-scale model

h (Y) = μ_{d} (x) + σ_{d} (x) Z

where $D = d$ specifies the disease indicator ($D = 0$ for nondiseased and $D = 1$ for diseased), $X = x$ is a fixed set of covariates, $μ_{d} (x)$ is the shift term, $σ_{d} (x)$ is the scale term, and $Z \in \mathbb{R}$ is a latent random variable with an a priori known absolutely continuous log-concave CDF, $F_{Z}$. Given that $D$ and $X$ are fixed, the conditional CDF of $Y$ is

P (Y \leq y ∣ D = d, X = x) = F_{d} (y ∣ x) = F_{Z} (\frac{h (y) - μ_{d} (x)}{σ_{d} (x)})

(2)

Equation (2) represents a general class of models called transformation models.^22,13 The transformation function $h$ uniquely characterizes the distribution of $Y$, similar to the density or distribution function. Plugging this conditional CDF of $Y$ into equation (1), $h$ cancels out and the covariate-specific ROC curve is given by

ROC (p ∣ x) = 1 - F_{Z} (ζ (x) F_{Z}^{- 1} (1 - p) - δ (x))

(3)

where

δ (x) = \frac{μ_{1} (x) - μ_{0} (x)}{σ_{1} (x)} \quad \text{and} \quad ζ (x) = \frac{σ_{0} (x)}{σ_{1} (x)}

Thus, the ROC curve is completely determined by the shift and scale terms of the model.

The binormal²³ and bilogistic²⁴ ROC curves can be obtained by setting $F_{Z}$ to the standard normal distribution function ${probit}^{- 1} = Φ$ , or the standard logistic distribution function ${logit}^{- 1} (x) = expit (x) = (1 + \exp (- x))^{- 1}$ , in equation (3), respectively. Similarly, the proportional hazard²⁵ and reverse proportional hazard alternatives²⁶ for the ROC curve also fall within the purview of our transformation model with $F_{Z}$ specified as ${cloglog}^{- 1} (x) = 1 - \exp (- \exp (x))$ (minimum extreme value distribution function) and ${loglog}^{- 1} (x) = \exp (- \exp (- x))$ (maximum extreme value distribution function), respectively.
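To make equation (3) and the four choices of $F_{Z}$ concrete, the following Python sketch evaluates the induced ROC curve for given values of $δ$ and $ζ$. The dictionary `LINKS` and the function `roc` are hypothetical names introduced for illustration only; they are not part of the tram package.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# Each entry maps a link name to the pair (F_Z, F_Z^{-1}).
LINKS = {
    "probit": (_STD_NORMAL.cdf, _STD_NORMAL.inv_cdf),
    "logit": (lambda z: 1.0 / (1.0 + math.exp(-z)),
              lambda u: math.log(u / (1.0 - u))),
    "cloglog": (lambda z: 1.0 - math.exp(-math.exp(z)),
                lambda u: math.log(-math.log(1.0 - u))),
    "loglog": (lambda z: math.exp(-math.exp(-z)),
               lambda u: -math.log(-math.log(u))),
}


def roc(p, delta, zeta=1.0, link="probit"):
    """ROC(p) = 1 - F_Z(zeta * F_Z^{-1}(1 - p) - delta) for 0 < p < 1."""
    cdf, quantile = LINKS[link]
    return 1.0 - cdf(zeta * quantile(1.0 - p) - delta)


# Sensitivity at 80% specificity (p = 0.2) for a shift of delta = 1.
sens_by_link = {name: roc(0.2, delta=1.0, link=name) for name in LINKS}
```

For the shift-only model ($ζ = 1$), the complementary log-log choice reduces to $ROC (p) = p^{\exp (- δ)}$ and the log-log choice to $ROC (p) = 1 - (1 - p)^{\exp (δ)}$, which follows by substituting the respective $F_{Z}$ into equation (3).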

However, to the best of our knowledge, the only literature where the transformation function $h$ is included in the model formulation of the ROC curve is Zou,²⁷ who jointly models the shift term and the parameters of a Box-Cox power transformation function. A key point of this article is that we explicitly estimate $h$ jointly with $μ (x)$ from the observed data and are not restricted to normality imposed by power transformation families. Thus, the methods we propose allow for proper propagation of uncertainty from the estimated transformation function $\hat{h}$ into the estimates of the shift and scale terms of the model.

The ROC curve in equation (3) follows a parametric model depending on $F_{Z}$, but is distribution-free in the sense of Alonzo and Pepe,²⁸ because no assumptions are made about the transformation $h$ and consequently about the distribution of the test results. The approach to model the test results as a function of the disease status and covariates was originally proposed in the latent variable ordinal regression setting by Tosteson and Begg²¹ and extended by Pepe³ to modeling covariate effects directly on the ROC curve.

Tosteson and Begg²¹ pointed out that to ensure concavity of the induced ROC curve, the scale term must be omitted, that is, $σ_{d} (x) = 1$ for $d \in \{0, 1\}$. The ROC curve is termed proper if it is concave or, equivalently, if the derivative of the ROC curve is a monotonically decreasing function.²⁹ A concave ROC curve is desirable as it yields the maximal sensitivity for a given value of specificity.³⁰ Because the decision criterion for classifying subjects is optimal when the ROC curve is concave, we focus in the remainder of this work on the model involving only the shift term. Hence, the effect of covariates on the ROC curve is contained in the difference between the shift terms for diseased and nondiseased subjects, $δ (x) = μ_{1} (x) - μ_{0} (x)$. For a relaxation of this assumption, see Siegfried et al.,³¹ who additionally estimate the scale functions through regression models.

2.1.1. Two-sample case

We first consider the case of two samples without covariates. Let the shift term take the form $μ_{d} (x) = δ d$ . The CDF of the test results in the nondiseased population is given by $F_{0} (y) = F_{Z} (h (y))$ and in the diseased population by $F_{1} (y) = F_{Z} (h (y) - δ)$ . Using Equation (3), the induced ROC curve can be expressed as

\begin{aligned} ROC (p) & = 1 - F_{Z} (h (h^{- 1} (F_{Z}^{- 1} (1 - p))) - δ) \\ & = 1 - F_{Z} (F_{Z}^{- 1} (1 - p) - δ) \end{aligned}

(4)

The model assumption implies that a monotone function $h$ exists to transform both $Y_{1}$ and $Y_{0}$ into the same distribution, $Z \sim F_{Z}$, separated by a shift parameter, $δ$. The induced ROC curve from this model does not assume a particular distribution of the test result; rather, it quantifies the difference between the test result distributions on the scale of a user-defined $F_{Z}$. In this sense, the difference between the test result distributions is described by $δ$. Each choice of $F_{Z}$ leads to a different interpretation of $δ$. For example, when $F_{Z}$ is selected to be the standard normal distribution function, $δ$ is essentially Cohen’s d, the standardized difference in means of the transformed test results comparing the diseased and nondiseased groups, $E [h (Y_{1}) - h (Y_{0})]$. Similarly, when $F_{Z}$ is the standard logistic distribution function, $\exp (δ)$ is the ratio of odds of having a positive test result comparing diseased and nondiseased groups. Closed-form expressions can be derived for summary indices of the ROC curve by solving the appropriate integrals. The expressions of AUC, $J$, the optimal threshold $c^{*}$, sensitivity and specificity at $c^{*}$ are given for some choices of $F_{Z}$ in Table 1.

Table 1.

Closed-form expressions for the area under the receiver operating characteristic curve (AUC), Youden Index ( $J$ ), optimal classification threshold ( $c^{*}$ ), sensitivity ( $Sens$ ), and specificity ( $Spec$ ) at $c^{*}$ in terms of the shift parameter $δ$ in the linear transformation model given by $F_{d} (y) = F_{Z} (h (y) - δ d)$ .

Index	$F_{Z} = {probit}^{- 1}$	$F_{Z} = {logit}^{- 1}$	$F_{Z} = {cloglog}^{- 1}$	$F_{Z} = {loglog}^{- 1}$
$AUC$	$Φ (\frac{δ}{\sqrt{2}})$	$\begin{cases} \frac{\exp (δ) (\exp (δ) - 1 - δ)}{(\exp (δ) - 1)^{2}} & δ \neq 0 \\ 1 / 2 & δ = 0 \end{cases}$	$expit (δ)$	$expit (δ)$
$J$	$1 - 2 Φ (\frac{- \lvert δ \rvert}{2})$	$1 - 2 expit (\frac{- \lvert δ \rvert}{2})$	$\exp (\frac{- \lvert δ \rvert}{e^{\lvert δ \rvert} - 1}) - \exp (\frac{\lvert δ \rvert}{e^{- \lvert δ \rvert} - 1})$	$\exp (\frac{- \lvert δ \rvert}{e^{\lvert δ \rvert} - 1}) - \exp (\frac{\lvert δ \rvert}{e^{- \lvert δ \rvert} - 1})$
$c^{*}$	$h^{- 1} (\frac{δ}{2})$	$h^{- 1} (\frac{δ}{2})$	$h^{- 1} (\log (\frac{δ}{1 - e^{- δ}}))$	$h^{- 1} (\log (\frac{e^{δ} - 1}{δ}))$
$Sens (c^{*})$	$Φ (\frac{δ}{2})$	$expit (\frac{δ}{2})$	$\exp (\frac{- δ}{e^{δ} - 1})$	$1 - \exp (\frac{δ}{e^{- δ} - 1})$
$Spec (c^{*})$	$Φ (\frac{δ}{2})$	$expit (\frac{δ}{2})$	$1 - \exp (\frac{δ}{e^{- δ} - 1})$	$\exp (\frac{- δ}{e^{δ} - 1})$
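The closed-form AUC and Youden index expressions in Table 1 can be checked numerically. The Python sketch below codes them as functions of $δ$ and compares the logistic AUC with a midpoint-rule integral of the induced ROC curve. All function names are hypothetical and the numerical check is an illustration only, not part of the proposed methodology.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def auc_closed_form(delta, link):
    """AUC as a function of the shift parameter (AUC row of Table 1)."""
    if link == "probit":
        return PHI(delta / math.sqrt(2.0))
    if link == "logit":
        if delta == 0.0:
            return 0.5
        e = math.exp(delta)
        return e * (e - 1.0 - delta) / (e - 1.0) ** 2
    if link in ("cloglog", "loglog"):
        return expit(delta)
    raise ValueError(f"unknown link: {link}")


def youden_closed_form(delta, link):
    """Youden index as a function of the shift parameter (J row of Table 1)."""
    a = abs(delta)
    if link == "probit":
        return 1.0 - 2.0 * PHI(-a / 2.0)
    if link == "logit":
        return 1.0 - 2.0 * expit(-a / 2.0)
    if link in ("cloglog", "loglog"):
        if a == 0.0:
            return 0.0
        # expm1(a) = e^a - 1 and expm1(-a) = e^{-a} - 1
        return math.exp(-a / math.expm1(a)) - math.exp(a / math.expm1(-a))
    raise ValueError(f"unknown link: {link}")


def numeric_auc(roc, n_points=20000):
    """Midpoint-rule approximation of the integral of ROC(p) over (0, 1)."""
    return sum(roc((i + 0.5) / n_points) for i in range(n_points)) / n_points


def logistic_roc(p, delta=1.0):
    # ROC(p) = 1 - expit(logit(1 - p) - delta)
    return 1.0 - expit(math.log((1.0 - p) / p) - delta)


gap = abs(numeric_auc(logistic_roc) - auc_closed_form(1.0, "logit"))
```

The quantity `gap` should be negligible if the closed form and the induced ROC curve agree; the same comparison can be repeated for the other choices of $F_{Z}$.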

2.1.2. Conditional ROC curve

The accuracy of a diagnostic test may be influenced by a set of covariates $X$ . To evaluate their effect on the ROC curve and its summary indices, we assume a linear transformation model with a shift term that takes the form

μ_{d} (x) = δ d + x^{⊤} ξ + d x^{⊤} γ

(5)

where $ξ, γ \in \mathbb{R}^{P}$ are the coefficient vectors for the covariates and interaction term, respectively. Under this model, the resulting covariate-specific ROC curve is

ROC (p ∣ x) = 1 - F_{Z} (F_{Z}^{- 1} (1 - p) - (δ + x^{⊤} γ))

where the covariate effect on the ROC curve is given by the difference in shift terms between diseased and nondiseased subjects, $δ (x) = δ + x^{⊤} γ$. Similarly, the covariate-specific AUC is given by

AUC (x) = P (Y_{0} < Y_{1} ∣ X = x) = a (δ (x)) = a (δ + x^{⊤} γ)

(6)

where $a : \mathbb{R} \to [0, 1]$ is the AUC function from the first row of Table 1 for different choices of $F_{Z}$. The expressions for $J$, $c^{*}$, sensitivity and specificity can analogously be adjusted to account for covariates, with $δ$ replaced by $δ + x^{⊤} γ$ in Table 1. In the case of a single continuous covariate $X = x \in \mathbb{R}$, the interpretation of the interaction coefficient is as follows. For each possible specificity value, a unit increase in $x$ results in a $γ$-unit increase in the ROC curve (or an increase in the sensitivity) on the scale of $F_{Z}$. If $γ$ is positive, an increase in $x$ corresponds to an increase in the ROC curve, indicating that a test is better able to discriminate the two populations for larger values of $x$, and vice versa. Note that the ROC curve varies with the covariate contingent upon the presence of an interaction between $d$ and $x$. For $γ = 0$, the covariate affects the distribution of the test results from the diseased and nondiseased population, but not the ROC curve. That is, for all levels of $x$, the difference between the transformed distributions $h (Y_{1})$ and $h (Y_{0})$ is given by $δ$ and the ROC curve is unchanged. Analogous interpretations hold when we are interested in modeling a set of covariates $X$, which could possibly include categorical covariates.
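The role of the interaction coefficient can be illustrated with a short Python sketch of equation (6) under the probit choice of $F_{Z}$, where $a (δ) = Φ (δ / \sqrt{2})$. The function name and the coefficient values are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def covariate_specific_auc(delta, gamma, x):
    """AUC(x) = Phi((delta + x^T gamma) / sqrt(2)) under the probit link."""
    linear_predictor = delta + sum(g * xi for g, xi in zip(gamma, x))
    return NormalDist().cdf(linear_predictor / math.sqrt(2.0))


# Hypothetical coefficients: one covariate with a positive interaction effect.
delta, gamma = 1.0, [0.3]
auc_low = covariate_specific_auc(delta, gamma, [0.0])
auc_high = covariate_specific_auc(delta, gamma, [1.0])

# With gamma = 0 the covariate does not change the AUC.
auc_no_interaction = covariate_specific_auc(delta, [0.0], [1.0])
```

Because $a$ is monotonically increasing, a positive $γ$ yields a larger AUC at the larger covariate value, while $γ = 0$ returns the two-sample AUC regardless of $x$.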

Standard regression techniques have also been proposed as an alternative to assess the effect of covariates on summary indices rather than deriving the induced ROC curve. For example, Dodd et al.³² model the partial AUC as a regression function of covariates. Our model equivalently results in a regression model for the AUC, where $δ + x^{⊤} γ$ is in the form of a usual linear predictor and $a$ is a monotonically increasing inverse link function which defines the scale for the regression coefficients. As will be shown in Section 2.2, an advantage of our approach is that we do not rely on less efficient binary regression techniques and directly estimate the regression parameters of the transformation model using maximum likelihood estimation. In Supplemental Material Section A, we show that our method is additionally related to the probabilistic index model (PIM) of Thas et al.^33,34

We can also consider more general and potentially nonlinear formulations of the shift and scale terms in our framework. For the special case of $F_{Z} = {probit}^{- 1}$ , the AUC from a shift-scale transformation model³¹ is given by

P (Y_{0} < Y_{1} ∣ X_{0} = x_{0}, X_{1} = x_{1}) = Φ (\frac{μ_{1} (x_{1}) - μ_{0} (x_{0})}{\sqrt{σ_{0} (x_{0})^{2} + σ_{1} (x_{1})^{2}}})

where $X_{0}$ and $X_{1}$ are the corresponding (potentially different) sets of covariates in the nondiseased and diseased populations, respectively. When the scale term depends only on a single set of covariates, $σ_{0} (x_{0}) = σ_{1} (x_{1}) = σ (x)$, despite varying sets of covariates in the shift terms, all the expressions hold from Table 1 with $δ$ replaced by $\frac{μ_{1} (x_{1}) - μ_{0} (x_{0})}{σ (x)}$. However, such closed-form expressions cannot be derived for other choices of $F_{Z}$ when the scale term depends on the disease indicator or on different sets of covariates. In such cases, AUCs and other summary indices can be derived using numerical techniques on the induced ROC curve.
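One simple numerical technique is to integrate the induced ROC curve of equation (3) directly. The Python sketch below does this with a midpoint rule for a shift-scale model with fixed shift and scale values and, for the probit case, compares the result with the closed-form expression above. The function names, the integration rule and the parameter values are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def shift_scale_roc(p, mu0, mu1, sigma0, sigma1):
    """Equation (3) with F_Z = Phi, delta = (mu1 - mu0)/sigma1, zeta = sigma0/sigma1."""
    delta = (mu1 - mu0) / sigma1
    zeta = sigma0 / sigma1
    return 1.0 - STD_NORMAL.cdf(zeta * STD_NORMAL.inv_cdf(1.0 - p) - delta)


def numeric_auc(roc, n_points=20000):
    """Midpoint-rule approximation of the area under ROC(p) on (0, 1)."""
    return sum(roc((i + 0.5) / n_points) for i in range(n_points)) / n_points


def binormal_auc(mu0, mu1, sigma0, sigma1):
    """Closed form Phi((mu1 - mu0) / sqrt(sigma0^2 + sigma1^2))."""
    return STD_NORMAL.cdf((mu1 - mu0) / math.sqrt(sigma0**2 + sigma1**2))


# Hypothetical parameter values with unequal scale terms.
numeric = numeric_auc(lambda p: shift_scale_roc(p, 0.0, 1.0, 1.0, 2.0))
closed = binormal_auc(0.0, 1.0, 1.0, 2.0)
```

For other choices of $F_{Z}$, only the CDF and quantile function inside `shift_scale_roc` would need to be exchanged; the numerical integration step stays the same.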

2.2. Estimation

In this section, we propose estimation methods for a transformation model with univariate test results. We provide an explicit parameterization of the transformation function and the shift term. We then maximize the likelihood contributions for a potentially exact continuous, right-, left-, or interval-censored datum to jointly estimate the model parameters. This enables us to fully determine the ROC curve and its summary indices as well as handle test results which are ordinal or impacted by instrument detection limits.

2.2.1. Parameterization

We parameterize the transformation function as

h (y ∣ ϑ) = b (y)^{⊤} ϑ = \sum_{m = 0}^{M} ϑ_{m} b_{m} (y) \quad \text{for } y \in \mathbb{R}

(7)

where $b (y) = (b_{0} (y), \dots, b_{M} (y))^{⊤}$ is a vector of $M + 1$ basis functions with coefficients $ϑ \in \mathbb{R}^{M + 1}$. Polynomials in Bernstein form offer a computationally attractive choice of basis that provides a flexible way of estimating the underlying transformation function. The Bernstein basis polynomials of order $M$ are defined on the interval $[l, u]$ as

b_{m} (y) = \binom{M}{m} {\tilde{y}}^{m} (1 - \tilde{y})^{M - m}, \quad m = 0, \dots, M

(8)

where $\tilde{y} = \frac{y - l}{u - l} \in [0, 1]$. The restriction $ϑ_{m} \leq ϑ_{m + 1}$ for $m = 0, \dots, M - 1$ guarantees the monotonicity of $h$. Observe that the transformation function is linear in the parameters that define it and any nonlinearity of the test results is modeled by the basis functions. If the order $M$ is chosen to be sufficiently large, Bernstein polynomials can uniformly approximate any real-valued continuous function on an interval.³⁵

2.2.2. Likelihood

Denote the complete parameter vector as $θ = (β^{⊤}, ϑ^{⊤})^{⊤}$, where $β = (δ, ξ^{⊤}, γ^{⊤})^{⊤} \in \mathbb{R}^{2 P + 1}$ is the vector of regression coefficients parameterizing the function $μ_{d}$ from Section 2.1 and $ϑ \in \mathbb{R}^{M + 1}$ is the vector of basis coefficients. We follow the maximum likelihood approach proposed by Hothorn et al.¹³ to jointly estimate $β$ and $ϑ$. The advantages of embedding the model in the likelihood framework are as follows. (i) All forms of random censoring (right, left, and interval) as well as truncation can directly be incorporated into likelihood contributions defined in terms of the distribution function.³⁶ Supplemental Material Section A details how ordinal biomarkers can be accommodated in the proposed modeling framework using interval-censored likelihood contributions. (ii) If the given model is correctly specified, under regularity conditions, the maximum likelihood estimator (MLE) will be asymptotically the most efficient estimator. (iii) The MLE is asymptotically normally distributed and has a variance that can be estimated from the inverse of the Fisher information matrix. This can be used to generate confidence intervals (CIs) for the estimated parameters. (iv) The MLE is equivariant, which implies invariance of the score test (or the Lagrange multiplier test) to reparameterizations.^37,38 Specifically, we show in Section 2.3.1 that, by inverting the score test, our method produces confidence bands for the ROC curve and appropriate score intervals for its summary indices.

The likelihood contribution of a single observation $O = (Y, D, X)$, where $Y \in (\underline{y}, \bar{y}] = \{y \in \mathbb{R} : \underline{y} < y \leq \bar{y}\}$, is given by

L (θ ∣ O) = \left\{ \begin{array}{lll} f_{Z} (h (y ∣ ϑ) - μ_{d} (x ∣ β)) h^{'} (y ∣ ϑ) & y \in \mathbb{R} & \text{exact continuous} \\ 1 - F_{Z} (h (\underline{y} ∣ ϑ) - μ_{d} (x ∣ β)) & y \in (\underline{y}, \infty) & \text{right censored} \\ F_{Z} (h (\bar{y} ∣ ϑ) - μ_{d} (x ∣ β)) & y \in (- \infty, \bar{y}) & \text{left censored} \\ F_{Z} (h (\bar{y} ∣ ϑ) - μ_{d} (x ∣ β)) - F_{Z} (h (\underline{y} ∣ ϑ) - μ_{d} (x ∣ β)) & y \in (\underline{y}, \bar{y}] & \text{interval censored} \end{array} \right.

where $f_{Z}$ is the density function of $Z$ and $h^{'} (y ∣ ϑ)$ is the first derivative of the transformation function with respect to $y$. Given a sample of $N$ independent and identically distributed observations $O_{i}$ for $i = 1, \dots, N$, the log-likelihood is given by $ℓ (θ) = \sum_{i = 1}^{N} \log (L_{i} (θ))$, where $L_{i}$ is the likelihood contribution of observation $i$. The (unconditional) maximum likelihood estimate of $θ$ is the solution to the optimization problem

\hat{θ} = (\hat{β}, \hat{ϑ}) = \underset{β, ϑ}{\arg \max} ℓ (β, ϑ)

subject to the monotonicity constraint $ϑ_{m} \leq ϑ_{m + 1}$ for $m = 0, \dots, M - 1$. The resulting ROC curve only depends on $β$, which is decoupled from the parameters $ϑ$ needed to model the transformation function. The score function is defined as the first derivative of the log-likelihood function with respect to each of the parameters and is given by

S (θ) = \begin{pmatrix} \frac{\partial ℓ (θ)}{\partial β} \\ \frac{\partial ℓ (θ)}{\partial ϑ} \end{pmatrix} = \begin{pmatrix} S_{β} (θ) \\ S_{ϑ} (θ) \end{pmatrix}

We perform constrained optimization using the likelihood and score contributions to determine the maximum likelihood estimates for $β$ and $ϑ$ (for computational details, see Hothorn³⁹). The asymptotic variance of the MLE can further be estimated from the inverse of the expected Fisher information matrix, which is the variance-covariance matrix of the score function and is defined as

I (θ) = - E \begin{pmatrix} \frac{\partial^{2} ℓ (θ)}{\partial β \partial β^{⊤}} & \frac{\partial^{2} ℓ (θ)}{\partial β \partial ϑ^{⊤}} \\ \frac{\partial^{2} ℓ (θ)}{\partial ϑ \partial β^{⊤}} & \frac{\partial^{2} ℓ (θ)}{\partial ϑ \partial ϑ^{⊤}} \end{pmatrix} = \begin{pmatrix} I_{β, β} (θ) & I_{β, ϑ} (θ) \\ I_{β, ϑ} (θ)^{⊤} & I_{ϑ, ϑ} (θ) \end{pmatrix}

The matrix is partitioned such that the submatrix $I_{β, β} (θ)$ corresponds to the parameters related to the disease indicator and covariates.
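The four likelihood contributions can be written as a single function of the interval bounds $(\underline{y}, \bar{y}]$. The Python sketch below does so for the probit choice of $F_{Z}$, with the transformation function and its derivative passed in as callables. The function name, the convention that equal bounds encode an exact observation, and the use of $h = \log$ in the example are assumptions made for illustration; the sketch does not reproduce the constrained optimization used in the actual software.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # F_Z = Phi, f_Z = standard normal density


def log_lik_contribution(lower, upper, shift, h, h_prime):
    """Log-likelihood contribution of one observation with Y in (lower, upper].

    lower == upper encodes an exact continuous observation,
    upper == inf a right-censored one and lower == -inf a left-censored one.
    """
    if lower == upper:  # exact continuous: f_Z(h(y) - mu) * h'(y)
        return math.log(Z.pdf(h(lower) - shift)) + math.log(h_prime(lower))
    upper_cdf = 1.0 if upper == math.inf else Z.cdf(h(upper) - shift)
    lower_cdf = 0.0 if lower == -math.inf else Z.cdf(h(lower) - shift)
    return math.log(upper_cdf - lower_cdf)


def h_log(y):
    return math.log(y)


def h_log_prime(y):
    return 1.0 / y


# Hypothetical observations for a subject with shift term mu_d(x) = 0.5.
exact = log_lik_contribution(2.0, 2.0, 0.5, h_log, h_log_prime)
left = log_lik_contribution(-math.inf, 0.5, 0.5, h_log, h_log_prime)
right = log_lik_contribution(3.0, math.inf, 0.5, h_log, h_log_prime)
interval = log_lik_contribution(1.0, 2.0, 0.5, h_log, h_log_prime)
```

Summing such contributions over all observations gives the log-likelihood $ℓ (θ)$; in the proposed framework $h$ would be the Bernstein polynomial of equation (7) rather than a fixed logarithm.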

2.2.3. Limit of detection

Instrument precision can affect the evaluation of diagnostic biomarkers. For example, when biomarker levels are at or below the limit of detection (LOD) $y_{LOD}$, the observed value lies in an interval $(- \infty, y_{LOD})$ and the resulting measurement is left censored. Often a replacement value is substituted for such measurements. Alternatively, only biomarker values above the LOD are used for the ROC analysis. It has been shown that these approaches lead to biased estimation.⁴⁰ Various adjustments to ROC curves and their summary indices have been proposed to handle such censored measurements.^41,6,42 However, these methods typically do not account for covariates. Our framework naturally accounts for such observations through the likelihood contribution for left-censored test results. Similarly, the right-censored likelihood contribution accounts for measurements which are affected by an upper limit of detection. Thus, our method provides a smooth covariate-specific ROC curve for all values of specificity with estimates and inference appropriately incorporating the observed information.
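In practice, measurements affected by detection limits first have to be encoded as intervals before the corresponding likelihood contributions can be formed. The following Python sketch shows one possible encoding; the function name and the convention of returning equal bounds for exact values are hypothetical and not taken from the tram package.

```python
import math


def censoring_interval(value, lower_lod=None, upper_lod=None):
    """Return interval bounds (lower, upper) encoding one measurement.

    Values at or below the lower LOD are left censored, values at or above
    the upper LOD are right censored, and all others are treated as exact.
    """
    if lower_lod is not None and value <= lower_lod:
        return (-math.inf, lower_lod)
    if upper_lod is not None and value >= upper_lod:
        return (upper_lod, math.inf)
    return (value, value)


# Hypothetical biomarker readings with detection limits 0.5 and 50.
readings = [0.5, 3.2, 17.0, 50.0]
intervals = [censoring_interval(y, lower_lod=0.5, upper_lod=50.0) for y in readings]
```

No replacement value is substituted for the censored readings; only the information that the true value lies beyond the detection limit is retained.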

2.3. Confidence intervals

In the following section, we present three methods to calculate confidence bands for the ROC curve and CIs for its summary indices. Since these quantities are functions $G : \mathbb{R}^{2 P + 1} \to \mathbb{R}$ of the regression parameters $β$ in the model, appropriate methods are needed to maintain nominal coverage of a CI for $G (β)$. The methods discussed include inverting the score test, the multivariate delta method and simulation from the asymptotic distribution of the estimate. The methods are ordered by their degree of theoretical justification. We start with score intervals, which are invariant to parameter transformations but become computationally expensive when dealing with a large set of parameters. We then discuss estimating the variance using the delta method and conclude with a simple simulation method which is versatile without being computationally demanding.

2.3.1. Score intervals

In the two-sample univariate case where $δ$ defines the ROC curve, as in equation (4), we can construct score intervals for $δ$. Unlike the Wald and other commonly used intervals, score intervals are especially desirable as they are invariant to transformations of the parameters. A score CI for $G (δ)$ (e.g. the AUC $a (δ)$) provides the same level of coverage as would a score CI for $δ$. In turn, under a correctly specified model, a score CI for $δ$ allows the construction of accurately covered uniform confidence bands for the ROC curve as well as intervals for its summary indices such as the AUC and the Youden index.

We first generate score CIs for $δ$ by inverting the score test. In this case, the null hypothesis is given by $H_{0} : δ = δ_{0}$ where $δ_{0}$ is a specific value of the parameter of interest. Under $H_{0}$ , the restricted (conditional) MLE for $ϑ$ can be obtained by

\hat{ϑ} (δ_{0}) = \underset{ϑ}{\arg \max} ℓ (δ_{0}, ϑ)

or as a solution of the $M + 1$ score equations $S_{ϑ} (δ_{0}, ϑ) = 0$. Note that this estimate is a function of $δ_{0}$. Letting $\tilde{θ} = (δ_{0}, \hat{ϑ} (δ_{0}))$, the quadratic (Rao) score statistic simplifies to

\begin{aligned} R (δ_{0}) & = S (\tilde{θ})^{⊤} I^{- 1} (\tilde{θ}) S (\tilde{θ}) \\ & = (S_{δ} (\tilde{θ})^{⊤}, 0^{⊤}) I^{- 1} (\tilde{θ}) (S_{δ} (\tilde{θ})^{⊤}, 0^{⊤})^{⊤} \\ & = S_{δ} (\tilde{θ})^{⊤} A_{δ, δ} (\tilde{θ}) S_{δ} (\tilde{θ}) \end{aligned}

where $A_{δ, δ} (\tilde{θ})$ denotes the submatrix corresponding to $δ$ of the inverse Fisher information matrix and is given by the inverse of the Schur complement, $(I_{δ, δ} (θ) - I_{δ, ϑ} (θ) I_{ϑ, ϑ}^{- 1} (θ) I_{δ, ϑ} (θ)^{⊤})^{- 1}$. Under $H_{0}$, $R (δ_{0})$ converges asymptotically to a chi-square distribution with 1 degree of freedom, $R (δ_{0}) \overset{D}{⟶} χ_{1}^{2}$. This result is explained by Rao.⁴³ Inverting the score statistic by enumerating values of $δ_{0}$ allows for the construction of $(1 - α)$ score CIs for $δ$ defined as

\{δ_{0} \in \mathbb{R} ∣ R (δ_{0}) < χ_{1}^{2} (1 - α)\}

where $χ_{1}^{2} (1 - α)$ is the $(1 - α)$ quantile of the chi-squared distribution with 1 degree of freedom. Equivalently, we can use the square root of the score statistic to construct a $(1 - α)$ score interval using quantiles of the standard normal distribution, $\{δ_{0} \in \mathbb{R} ∣ Φ^{- 1} (α / 2) < \sqrt{R (δ_{0})} \leq Φ^{- 1} (1 - α / 2)\}$. Finally, we apply the function $G$ to both the lower and upper limits of the interval to construct score confidence bands for the ROC curve or score CIs for its summary indices.

In this scalar case, the score statistic is given by $R (δ_{0}) = S_{δ} (\tilde{θ})^{2} A_{δ, δ} (\tilde{θ})$. Testing whether there is a significant difference between the nondiseased and diseased populations corresponds to the hypothesis test $H_{0} : δ = 0$. This is computationally efficient because only $R (0)$ needs to be computed. However, computing score CIs requires updating the restricted MLEs $\hat{ϑ} (δ_{0})$ for an enumeration of $δ_{0}$ values. This becomes computationally intractable for more than one parameter, when a higher-dimensional grid of parameter values has to be enumerated.
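The inversion step itself can be sketched generically: given any function that returns $R (δ_{0})$, a grid of candidate values is enumerated and those not rejected are retained. In the Python sketch below, the function `invert_score_test` is a hypothetical helper and the quadratic `toy_score_statistic` is only a stand-in; it is not the profiled score statistic of the transformation model, which would require refitting $\hat{ϑ} (δ_{0})$ at every grid point.

```python
import math
from statistics import NormalDist


def invert_score_test(score_statistic, grid, alpha=0.05):
    """Return the smallest and largest delta_0 on the grid with R(delta_0) below
    the (1 - alpha) quantile of the chi-square distribution with 1 df."""
    # The chi-square(1) quantile equals the squared standard normal quantile.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    accepted = [d for d in grid if score_statistic(d) < critical]
    return min(accepted), max(accepted)


def toy_score_statistic(delta_0, estimate=1.0, std_error=0.25):
    """Stand-in statistic (assumption): quadratic in the distance to an estimate."""
    return ((estimate - delta_0) / std_error) ** 2


def probit_auc(delta):
    return NormalDist().cdf(delta / math.sqrt(2.0))


grid = [-1.0 + i / 1000 for i in range(4001)]  # candidate values in [-1, 3]
delta_lower, delta_upper = invert_score_test(toy_score_statistic, grid)

# G = AUC is monotone in delta, so it is applied to both interval limits.
auc_interval = (probit_auc(delta_lower), probit_auc(delta_upper))
```

Because the AUC is a monotone function of $δ$, the transformed limits retain the coverage of the interval for $δ$, which is the invariance property discussed above.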

2.3.2. Delta method

Since the MLE satisfies

\sqrt{n} (\hat{β} - β) \overset{D}{\to} N_{2 P + 1} (0, A_{β, β} (θ))

then by the multivariate delta method, $G (\hat{β})$ also asymptotically follows a normal distribution with

V (G (\hat{β})) = \frac{1}{n} \nabla G (β)^{⊤} A_{β, β} (θ) \nabla G (β)

where $\nabla G (β)$ is the gradient of $G$ evaluated at $β$ and the inverse Fisher information matrix $A_{β, β} (θ)$ is given by the inverse of the Schur complement, $(I_{β, β} (θ) - I_{β, ϑ} (θ) I_{ϑ, ϑ}^{- 1} (θ) I_{β, ϑ} (θ)^{⊤})^{- 1}$. For example, when the shift term takes the linear form as in equation (5) and $G$ defines the AUC function for $F_{Z} = {probit}^{- 1}$, the entries of $\nabla G (β)$ are given by

\frac{\partial G (β)}{\partial δ} = \frac{1}{\sqrt{2}} C, \quad \frac{\partial G (β)}{\partial ξ_{i}} = 0 \quad \text{and} \quad \frac{\partial G (β)}{\partial γ_{i}} = \frac{x_{i}}{\sqrt{2}} C

where

C = ϕ (\frac{δ + x^{⊤} γ}{\sqrt{2}})

ϕ

is the density of the standard normal distribution and

i

indexes the

P

covariates. In general, the gradient can be estimated by calculating such derivatives and evaluating the resulting function at the MLE. Similarly, the variance-covariance matrix of the estimated parameters

A_{β, β} (θ)

can be computed by inverting the numerically evaluated Hessian matrix. Thus, a

(1 - α)

level CI for

G (β)

is given by

G (β) \pm Φ^{- 1} (α / 2) \sqrt{\hat{V} (G (\hat{β}))}
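As a concrete illustration of the delta method, the following minimal Python sketch treats the two-sample probit case without covariates, where the AUC is $Φ (δ / \sqrt{2})$ and the gradient reduces to the single entry $\frac{1}{\sqrt{2}} ϕ (δ / \sqrt{2})$. The function names are hypothetical, and the standard error of $\hat{δ}$ is assumed to be supplied by a fitted model; this is not the tram implementation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf = Phi, pdf = phi


def probit_auc(delta):
    """AUC of the two-sample probit model: Phi(delta / sqrt(2))."""
    return STD_NORMAL.cdf(delta / sqrt(2))


def probit_auc_delta_ci(delta_hat, se_delta, alpha=0.05):
    """Delta-method CI for the AUC (hypothetical helper).

    delta_hat : estimated shift parameter
    se_delta  : its estimated standard error (assumed given)
    """
    auc_hat = probit_auc(delta_hat)
    # Gradient of G(delta) = Phi(delta / sqrt(2)) with respect to delta.
    grad = STD_NORMAL.pdf(delta_hat / sqrt(2)) / sqrt(2)
    # One-parameter case of grad' * Cov(beta_hat) * grad.
    se_auc = abs(grad) * se_delta
    # Upper standard normal quantile; with the +/- form this gives the
    # same interval as using the alpha / 2 quantile.
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return auc_hat, auc_hat - z * se_auc, auc_hat + z * se_auc
```

Because the interval is symmetric on the AUC scale, its limits can leave $[0, 1]$ when the AUC is close to 1, which is one practical reason to consider the score or simulated intervals instead.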

2.3.3. Simulated intervals

When the function $G$ has complex derivatives, as would be the case for nonlinear shift terms $μ_{d} (x)$ or when calculating optimal thresholds $c^{*}$ where $G$ includes the inverse of the transformation function, constructing CIs using the delta method becomes infeasible. For these cases, we apply a simple simulation-based algorithm which utilizes the asymptotic normality of the MLE to calculate CIs for the ROC curve and its summary indices, which are functions of the parameters of interest. The steps of the algorithm to construct $(1 - α)$ level CIs for $G (\hat{β})$ can be summarized as follows:

1. Generate $B$ independent samples from the asymptotic multivariate normal distribution of the parameter estimates, $N_{P + 1} (\hat{β}, \frac{1}{n} {\hat{A}}_{β, β} (\hat{θ}))$, and denote them by ${\hat{β}}_{1}^{*}, \dots, {\hat{β}}_{B}^{*}$.

2. For each sample $b = 1, \dots, B$, calculate the function of interest $G ({\hat{β}}_{b}^{*})$.

3. Construct the CI $(Q_{G ({\hat{β}}^{*})} (α / 2), Q_{G ({\hat{β}}^{*})} (1 - α / 2))$, where $Q_{G ({\hat{β}}^{*})}$ is the empirical quantile function of the sample $G ({\hat{β}}_{1}^{*}), \dots, G ({\hat{β}}_{B}^{*})$.

A similar algorithm is presented by Mandel,⁴⁴ who discusses its asymptotic validity and presents several examples showing that its empirical coverage adheres to nominal levels, with results similar to those of the delta method.
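The three steps of the simulation-based algorithm can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the estimate `beta_hat`, its covariance matrix `cov` (playing the role of $\frac{1}{n} \hat{A}_{β, β}$), and the example function `auc_at_x` are hypothetical values chosen for illustration, and the helper names are not part of any package.

```python
import random
from math import sqrt
from statistics import NormalDist


def cholesky(cov):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def empirical_quantile(sorted_vals, p):
    """Linearly interpolated empirical quantile of a sorted sample."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def simulated_ci(beta_hat, cov, G, B=5000, alpha=0.05, seed=1):
    """Simulation-based (1 - alpha) CI for G(beta)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    draws = []
    for _ in range(B):
        # Step 1: one draw from N(beta_hat, cov) via beta_hat + L z.
        z = [rng.gauss(0.0, 1.0) for _ in beta_hat]
        beta_star = [b + sum(L[i][k] * z[k] for k in range(i + 1))
                     for i, b in enumerate(beta_hat)]
        # Step 2: evaluate the function of interest at the draw.
        draws.append(G(beta_star))
    # Step 3: empirical alpha/2 and 1 - alpha/2 quantiles.
    draws.sort()
    return (empirical_quantile(draws, alpha / 2),
            empirical_quantile(draws, 1 - alpha / 2))


# Hypothetical example: probit AUC at covariate value x = 1,
# with beta = (delta, gamma).
def auc_at_x(beta, x=1.0):
    delta, gamma = beta
    return NormalDist().cdf((delta + gamma * x) / sqrt(2))


beta_hat = [1.0, 0.3]                    # assumed estimates
cov = [[0.04, 0.005], [0.005, 0.01]]     # assumed covariance of beta_hat
ci_lower, ci_upper = simulated_ci(beta_hat, cov, auc_at_x)
```

Since only evaluations of $G$ are needed, the same routine applies unchanged when $G$ has no convenient derivative, for example when it involves the inverse of the transformation function.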

3. Empirical evaluation

We conducted a simulation study to evaluate the performance of our estimators in the two-sample setting. We chose this setting to be able to compare various estimators commonly used in practice. The software details of all the methods used alongside their respective features and references are summarized in Table 2.

Table 2.
Overview of the different methods used in the simulation study.

		ROC		AUC		Youden index
Reference	R package	Estimate	CB	Estimate	CI	Estimate	CI
Hothorn³⁹	tram	$✓$	$✓$	$✓$	$✓$	$✓$	$✓$
Harrell Jr^45,46	rms			$✓$	$✓$
Thas et al.³³	pim			$✓$	$✓$
Therneau⁴⁷	survival			$✓$	$✓$
Robin et al.⁴⁸	pROC	$✓$	$✓$	$✓$	$✓$	$✓$
Fay⁴⁹	asht			$✓$	$✓$
Konietschke et al.⁵⁰	nparcomp			$✓$	$✓$
Khan and Brandenburger⁵¹	ROCit	$✓$	$✓$	$✓$	$✓$	$✓$
Feng et al.⁵²	auRoc			$✓$	$✓$
Perez-Jaume et al.⁵³	ThresholdROC					$✓$	$✓$
Ridout and Linkie⁵⁴	overlap					$✓$	$✓$
Franco-Pereira et al.⁵⁵	-					$✓$	$✓$
Pérez Fernández et al.⁵⁶	nsROC	$✓$	$✓$	$✓$	$✓$	$✓$

ROC: receiver operating characteristic; AUC: area under the ROC curve; CI: confidence interval; CB: confidence band.

References to the original publications along with R software details are given. A check mark ($✓$) indicates that a method computes the specific metric. The metrics included estimates for the ROC curve, AUC, and Youden index as well as the corresponding CBs or CIs.

We considered a data generating process (DGP) in which the nondiseased test results followed a standard normal distribution, $F_{0} (y) = Φ (y)$, and the diseased test results followed a distribution with the CDF $F_{1} (y) = F_{Z_{DGP}} (F_{Z_{DGP}}^{- 1} (Φ (y)) - δ)$. To obtain different shapes of the ROC curve, we considered three choices of $F_{Z_{DGP}} \in \{{probit}^{- 1}, {logit}^{- 1}, {cloglog}^{- 1}\}$ and varied $δ$ such that $AUC \in \{0.5, 0.65, 0.8, 0.95\}$ or $J \in \{0, 0.25, 0.5, 0.8\}$, leading to a variety of configurations. Under this simulation parameterization, the true ROC curves followed the form of equation (5) and the true summary indices could be calculated as a function of $δ$ from Table 1.
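To make the link between $δ$ and the summary indices concrete, the sketch below evaluates the ROC curve implied by the two CDFs above, $ROC (p) = 1 - F_{1} (F_{0}^{- 1} (1 - p)) = 1 - F_{Z_{DGP}} (F_{Z_{DGP}}^{- 1} (1 - p) - δ)$, and obtains the AUC and Youden index numerically. The closed-form expressions of Table 1 are not reproduced here; the midpoint rule, the grid search and the bisection are stand-ins chosen for illustration, and all function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_N = NormalDist()

# Each entry holds (F_Z, F_Z^{-1}) for one choice of inverse link.
LINKS = {
    "probit": (_N.cdf, _N.inv_cdf),
    "logit": (lambda z: 1 / (1 + exp(-z)), lambda p: log(p / (1 - p))),
    "cloglog": (lambda z: 1 - exp(-exp(z)), lambda p: log(-log(1 - p))),
}


def roc(p, delta, link):
    """ROC(p) = 1 - F_Z(F_Z^{-1}(1 - p) - delta) for 0 < p < 1."""
    F, F_inv = LINKS[link]
    return 1 - F(F_inv(1 - p) - delta)


def auc(delta, link, grid=2000):
    """AUC as the integral of ROC(p) over (0, 1), midpoint rule."""
    h = 1 / grid
    return h * sum(roc((k + 0.5) * h, delta, link) for k in range(grid))


def youden(delta, link, grid=2000):
    """Youden index J = max_p {ROC(p) - p}, approximated on a grid."""
    h = 1 / grid
    return max(roc((k + 0.5) * h, delta, link) - (k + 0.5) * h
               for k in range(grid))


def delta_for_auc(target, link, lo=0.0, hi=20.0, tol=1e-6):
    """Bisection for the shift delta >= 0 that yields a target AUC."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if auc(mid, link) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, `delta_for_auc(0.8, "logit")` returns the shift that gives an AUC of 0.8 under the logit configuration, and `youden(delta, "logit")` then gives the corresponding Youden index.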

The conventional binormal model corresponded to $F_{Z_{DGP}} = {probit}^{- 1} = Φ$ and induced proper binormal ROC curves. This was the only configuration where the test results for both groups were normally distributed. We included this configuration to ascertain the loss of power associated with our estimators when the standard binormal assumption held. For other choices of $F_{Z_{DGP}}$ with $AUC > 0.5$, the resulting distributions of the diseased test results were non-normal, with variances and higher moments differing between the two groups. Specifically, the configuration $F_{Z_{DGP}} = {logit}^{- 1}$ led to light-tailed distributions for the diseased test results, while $F_{Z_{DGP}} = {cloglog}^{- 1}$ led to skewed, heavy-tailed distributions. The corresponding density functions for the data generating model with selected AUC values are given in Figure 1.

Figure 1.

Density functions for the model used to generate the data for the simulations. The nondiseased test results followed a standard normal distribution corresponding to an $AUC = 0.5$ . The diseased test results varied with three choices of $F_{Z_{DGP}}$ : ${probit}^{- 1}$ , ${logit}^{- 1}$ , and ${cloglog}^{- 1}$ each of which had an AUC of 0.5, 0.65, 0.8, and 0.95. DGP: data generating process; AUC: area under the receiver operating characteristic curve.

For 10,000 replications of each configuration, we generated balanced data sets with sample sizes $N_{0} = N_{1} \in \{25, 50, 100\}$. The transformation models discussed in Section 2 were fitted to the simulated data sets assuming a parameterization of the transformation function given by a Bernstein basis polynomial of order $M = 6$ (see Hothorn³⁹ for a discussion on suitable choices for $M$). The true data-generating model had a nonlinear transformation function $h = F_{Z_{DGP}}^{- 1} \circ Φ$. Our model estimation procedure aimed to approximate this function alongside the shift parameter $δ$. The functions implementing transformation models for different choices of $F_{Z}$ are available from the tram add-on package.³⁹ Note that the function $F_{Z_{DGP}}$ is for the data generating process in the simulation study and is distinct from $F_{Z}$, the inverse link function used in the model. When $F_{Z_{DGP}} = F_{Z}$, the model is correctly specified for the DGP.
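For readers unfamiliar with this parameterization, the following sketch evaluates a transformation function built from a Bernstein basis of order $M$, using the standard basis functions $\binom{M}{k} t^{k} (1 - t)^{M - k}$ on a rescaled argument $t \in [0, 1]$. The coefficient values and support limits are hypothetical, and how the tram package constructs and constrains the basis internally is not shown here and may differ in detail.

```python
from math import comb


def bernstein_basis(t, M):
    """The M + 1 Bernstein basis polynomials of order M at t in [0, 1]."""
    return [comb(M, k) * t**k * (1 - t)**(M - k) for k in range(M + 1)]


def transformation(y, coefs, lower, upper):
    """h(y) = sum_k coefs[k] * b_k(t), with t = (y - lower) / (upper - lower).

    Non-decreasing coefficients give a non-decreasing h on [lower, upper].
    """
    M = len(coefs) - 1
    t = (y - lower) / (upper - lower)
    return sum(c * b for c, b in zip(coefs, bernstein_basis(t, M)))


# Hypothetical order-6 example: seven increasing coefficients on [-4, 4].
coefs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.5]
h_values = [transformation(y, coefs, -4.0, 4.0) for y in (-4, -2, 0, 2, 4)]
```

A polynomial of this kind can approximate smooth monotone functions such as $h = F_{Z_{DGP}}^{- 1} \circ Φ$ on a bounded interval, which is the role it plays in the simulation.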

Figure 2 displays the distribution of bias for the AUC estimates using the proposed methods under the various simulation configurations. We found that all three methods had minimal bias for $AUC = 0.5$, where the test results were unable to distinguish between the two groups. The models with $F_{Z} \in \{{probit}^{- 1}, {logit}^{- 1}\}$ yielded approximately unbiased AUC estimates in all cases, even when they were misspecified for the true data generating process. However, estimates based on the proportional hazards model, $F_{Z} = {cloglog}^{- 1}$, were biased for all data generating processes except the one for which it was correctly specified.

Figure 2.

Distribution of bias from the simulation study for estimation of the AUC. The DGP for nondiseased results was $F_{0} (y) = Φ (y)$ and for diseased results $F_{1} (y) = F_{Z_{DGP}} (F_{Z_{DGP}}^{- 1} (Φ (y)) - δ)$. We varied $F_{Z_{DGP}} \in \{{probit}^{- 1}, {logit}^{- 1}, {cloglog}^{- 1}\}$, the AUC, and the sample size. The proposed methods varied over the same inverse link functions. An alignment of colors between the column (DGP) and the fill of the box plot indicates that the method is correctly specified for the DGP. DGP: data generating process; AUC: area under the receiver operating characteristic curve.

We compared our approaches to a set of alternative methods (see Table 2) for computing CIs for the AUC and Youden index. We detail the empirical coverage and average width of the CIs for the AUC in Supplemental Figures S1 and S2, respectively. Estimates based on transformation models (R packages tram, rms, and pim) yielded coverage close to the nominal level and significantly outperformed the other methods when the model was correctly specified for the true data generating process. All other methods generally performed close to nominal levels for low to medium AUC values (0.5–0.8), but broke down for higher AUC values. In addition, the score CIs from the transformation model with $F_{Z} = {logit}^{- 1}$ were accurate even when it was misspecified for the true data generating process. However, methods that used $F_{Z} = {cloglog}^{- 1}$ gave CIs that were shorter in length (overconfident).

Analogously, Supplemental Figures S3 and S4 detail the coverage and length of the CIs for the Youden index. The methods based on the overlap coefficient failed to cover the configuration where $J = 0$ because their lower limits were never below 0. Our methods estimated CIs for $δ \in \mathbb{R}$, which naturally accounted for this scenario. The transformation model with $F_{Z} = {logit}^{- 1}$ provided coverage at nominal levels for all simulation configurations with a relatively small CI width. The approach of Franco-Pereira et al.⁵⁵ (FP) was also accurate under model misspecification but was more involved. Namely, it consisted of estimating Box-Cox transformation parameters under a binormal framework with bootstrap variance, all carried out on the logit scale and then back-transformed. In a setting with covariates, censoring or $J = 0$, this methodology would be limited.

Supplemental Figures S5 and S6 show the coverage and area of the confidence bands for the ROC curve. All the approaches based on transformation models covered the configuration with $AUC = 0.5$ accurately. However, the other approaches did not yield coverage close to nominal levels in this configuration with the exception of Martínez-Camblor et al.,⁵⁷ whose confidence bands had a significantly larger area indicating lower power. For all other configurations, only transformation models which were correctly specified for the true data-generating model provided accurate results.

In addition to the simulations described above, we considered three other scenarios to evaluate the robustness of our proposed methods to model misspecification. The details for each of these scenarios are given in Supplemental Materials Section B. In terms of the AUC, we noticed that our models were generally robust to misspecification, but could break down in certain cases. In particular, the proportional hazards model with $F_{Z} = {cloglog}^{- 1}$ resulted in poor performance under misspecified configurations, indicating that it should be used with caution.

4. Application

The prevalence of obesity has increased consistently in most countries over the past decade, and this trend is a serious global health concern.⁵⁸ Obesity contributes directly to an increased risk of cardiovascular disease (CVD) and its risk factors, including type 2 diabetes, hypertension, and dyslipidemia.^59,60 Metabolic syndrome (MetS) refers to the joint presence of several cardiovascular risk factors and is characterized by insulin resistance.⁶¹ The National Cholesterol Education Program Adult Treatment Panel III (NCEP-ATP III) criteria are the most widely used definition of MetS, but they require laboratory analysis of a blood sample. This has led to the search for non-invasive techniques that allow reliable and early detection of MetS.

Waist-to-height ratio (WHtR) is a well-known anthropometric index used to predict visceral obesity. Visceral obesity is an independent risk factor for the development of MetS by means of the increased production of free fatty acids, whose presence obstructs insulin activity.⁶² This suggests that higher values of WHtR, reflecting obesity and CVD risk factors, are more indicative of incident MetS. Several studies have found that WHtR is highly predictive of MetS.^63–65 However, as waist circumference changes with age and gender,⁶⁶ it is also important to study whether the performance of WHtR at diagnosing MetS is affected by these variables. Evaluation of WHtR as a predictor of MetS after adjusting for covariates is necessary so that more tailored interventions can be initiated to improve outcomes.

We illustrate our methods using data from a cross-sectional study designed to validate the use of WHtR and systolic blood pressure (SBP) as markers for early detection of MetS in a working population from the Balearic Islands (Spain). Detailed descriptions of the study methodology and population characteristics are reported in Romero-Saldaña et al.⁶⁷ Briefly, data on 60 799 workers were collected during their periodic work health assessments between 2012 and 2016. Presence of MetS was determined by the NCEP-ATP III criteria and the sample included 5487 workers with MetS.

4.1. Two-sample analysis

We first examined the unconditional performance of WHtR as a marker to diagnose MetS, denoted $Y$ and $D$, respectively. We fitted a linear transformation model with corresponding ROC curve of the form in equation (5), where $δ$ is the shift parameter, for various choices of the inverse link function $F_{Z}$. Inference for the AUC and $J$ was based on the closed-form expressions from Table 1. The resulting estimates are presented in Table 3. The AUCs were consistently bounded away from $0.5$, indicating a good capacity of WHtR to discriminate between workers with and without MetS. This can also be seen from the estimated ROC curve plotted in Figure 3, which lies well above the diagonal line, as well as from the modeled densities, which have a small degree of overlap. The CIs and uniform confidence bands were narrow due to the large sample size.

Figure 3.

Estimates from the linear transformation model with a single shift parameter, $h (Y) = δ d + Z$, where $Z$ is chosen to follow a standard logistic distribution. (A) Density functions of WHtR for the workers who were diagnosed with MetS (dotted line) and those who were not (solid line). (B) ROC curve for WHtR as a marker of MetS; 95% uniform score confidence bands are represented by gray shaded areas. MetS: metabolic syndrome; WHtR: waist-to-height ratio.

Table 3.

Estimates and 95% score confidence intervals of the shift parameter, AUC and $J$ in the two-sample linear transformation model for WHtR as a marker of MetS.

$F_{Z}$	$δ$	AUC	$J$
${probit}^{- 1}$	1.492 (1.462, 1.521)	0.854 (0.849, 0.859)	0.544 (0.535, 0.553)
${logit}^{- 1}$	2.785 (2.730, 2.841)	0.871 (0.866, 0.875)	0.602 (0.593, 0.611)
${cloglog}^{- 1}$	1.186 (1.157, 1.215)	0.766 (0.761, 0.771)	0.412 (0.403, 0.421)
${loglog}^{- 1}$	1.425 (1.397, 1.453)	0.806 (0.802, 0.810)	0.484 (0.475, 0.492)

AUC: area under the receiver operating characteristic curve; MetS: metabolic syndrome; WHtR: waist-to-height ratio.

4.2. Conditional ROC analysis

Next, we investigated whether the discriminatory ability of WHtR in separating workers with and without MetS varies with covariates. We considered a transformation model that included the main effects of covariates plus interaction terms with the disease indicator, which leads to the ROC curve given by

$$ROC (p \mid x) = 1 - F_{Z} (F_{Z}^{- 1} (1 - p) - (δ + γ^{⊤} x)),$$

where the covariates $x$ included age, gender, and tobacco consumption. The choice of $F_{Z} = {logit}^{- 1}$ was made using repeated holdout validation. We describe this model selection procedure in Supplemental Material Section C and show the results for different model choices and parameterizations.
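The covariate-specific ROC curve for the logit choice can be evaluated directly from the expression above. The sketch below is illustrative only: the coefficient values and the covariate coding are hypothetical and are not the fitted values from this study, and the closed-form AUC used here is obtained by integrating the logit ROC expression over $p$, which may be written differently in Table 1.

```python
from math import exp, log


def expit(z):
    """Standard logistic CDF, the inverse link F_Z = logit^{-1}."""
    return 1 / (1 + exp(-z))


def logit(p):
    """Quantile function F_Z^{-1} of the standard logistic distribution."""
    return log(p / (1 - p))


def conditional_roc(p, x, delta, gamma):
    """ROC(p | x) = 1 - F_Z(F_Z^{-1}(1 - p) - (delta + gamma' x)), 0 < p < 1."""
    shift = delta + sum(g * xi for g, xi in zip(gamma, x))
    return 1 - expit(logit(1 - p) - shift)


def conditional_auc(x, delta, gamma):
    """AUC(x) from integrating the logit ROC curve over p in (0, 1).

    With s = delta + gamma' x and t = exp(s), the integral equals
    t * (t - 1 - s) / (t - 1)**2, with limit 0.5 as s tends to 0.
    """
    s = delta + sum(g * xi for g, xi in zip(gamma, x))
    if abs(s) < 1e-8:
        return 0.5
    t = exp(s)
    return t * (t - 1 - s) / (t - 1) ** 2


# Hypothetical coefficients: x = (centered age, female indicator).
delta, gamma = 2.5, (-0.02, 0.4)
auc_female_older = conditional_auc((10.0, 1.0), delta, gamma)
auc_male_older = conditional_auc((10.0, 0.0), delta, gamma)
```

Under this model the covariates act only through the scalar shift $δ + γ^{⊤} x$, so any two covariate profiles with the same shift share the same ROC curve and AUC.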

Figure 4 displays the covariate-specific ROC curves fitted to these data. The performance of WHtR appeared to be better for females than for males and decreased with age. The effect of smoking, although significant in the model, did not seem to substantially alter the ROC curves when the other covariates were held fixed. To inspect the covariate effects further, we calculated the age- and gender-specific AUCs and Youden indices from the model. Figure 5 clearly shows that the capability of WHtR to distinguish workers with MetS was consistently better for females and decreased with age.

Figure 4.

Estimated covariate-specific ROC curves for WHtR as a marker of MetS for female (solid line) and male workers (dashed line). Vertical panels represent a specific age (30, 40, 50) and horizontal panels smoking status. ROC: receiver operating characteristic; MetS: metabolic syndrome; WHtR: waist-to-height ratio.

Figure 5.

Age-based AUC and Youden indices where WHtR is used as a marker to detect MetS for non-smoking female (solid line) and male (dashed line) workers. 95% Wald pointwise confidence bands are represented by gray shaded areas. AUC: area under the receiver operating characteristic curve; MetS: metabolic syndrome; WHtR: waist-to-height ratio.

5. Discussion

This article presents a new modeling framework for ROC analysis that can be used to characterize the accuracy of medical diagnostic tests. Our model is based on estimating an unknown transformation function for the test results and yields a distribution-free yet model-based estimator for the ROC curve. Covariates that influence the diagnostic accuracy of tests can naturally be accommodated as regression parameters into the model and covariate-specific summary indices such as the AUC and Youden index are easily computed using closed-form expressions.

Our proposed approach has several features which distinguish it from contemporary methods of ROC analysis. Firstly, we employ maximum likelihood to jointly estimate all parameters defining the transformation function and regression coefficients. This implies the variation in the estimated transformation parameters is accounted for and appropriately propagated to inference for the ROC curve. In turn, asymptotic efficiency is guaranteed for our estimators and we avoid reliance on resampling procedures for the construction of CIs. Secondly, transformation models focus on estimating the conditional distribution function, whose evaluation directly provides the likelihood contributions for interval-, right-, and left-censored data, which commonly arise due to instrument detection limits. Thirdly, no strong assumptions are made regarding the transformation function, which results in a highly flexible model that retains interpretability of the regression coefficients. Lastly, software implementations for all the methods described in this article are available in the tram R add-on package (see Supplemental Material for example code), thus enabling a unified framework for ROC analysis.

In our simulation study, interestingly, we found that a model with $F_{Z} = {logit}^{- 1}$ provided accurate results even when it was misspecified for the true data generating process. This model also behaves very similarly to the semiparametric cumulative probability model,⁶⁸ both of which estimate a log-odds ratio $δ$ . The equivalence of the transformation model’s odds ratio to the MWW test statistic has been well studied.⁶⁹ The MWW statistic has a bounded influence function and is robust to contaminations of the specified model.⁷⁰ Due to their equivalence, we hypothesize that the transformation model with $F_{Z} = {logit}^{- 1}$ is also endowed with the same robustness properties as the MWW and thus can be chosen when no a priori model is known.

One aspect that warrants further investigation is model selection, specifically with regard to the choice of $F_{Z}$. One strategy would be to define $F_{Z}$ tailored to a specific interpretation of the parameters $δ$, $β$, and $γ$, for example, as log-odds ratios with $F_{Z} = {logit}^{- 1}$ or as log-hazard ratios with $F_{Z} = {cloglog}^{- 1}$.³⁴ A second option is to use some form of cross validation in combination with model assessment via the probability integral transform (PIT) (as discussed in Supplemental Materials Section C). Third, and in analogy to single index models, one could introduce parameters to $F_{Z}$ such that the shape of the inverse link function is estimated along with all other model parameters (McLain and Ghosh⁷¹ discuss a family of link functions including the complementary log-log and logit). Finally, we could completely relax the assumption that the difference between the diseased and nondiseased distributions is described by a shift term. In this case, separate transformation functions would be allowed in each of the two groups. Namely, consider a stratified model where the nondiseased results follow a distribution with the CDF $F_{0} (y) = F_{Z} (h_{0} (y))$ and the diseased results a distribution with the CDF $F_{1} (y) = F_{Z} (h_{1} (y))$. Defining a new transformation function $r = h_{1} \circ h_{0}^{- 1} \circ F_{Z}^{- 1} : [0, 1] \to \mathbb{R}$, the smooth ROC curve with no shift assumptions is given by $ROC (p) = 1 - F_{Z} (r (1 - p))$. This model has more flexibility but sacrifices the properness property desirable for ROC curves. Furthermore, care has to be taken in defining the correct likelihood contributions for accurate inference in this model, as uncertainty enters from both transformation functions.

In future work, we plan to pursue various extensions of transformation models for ROC analysis to consider (1) penalty terms for high-dimensional covariates,⁷² (2) mixed effects for clustered observations,⁷³ and (3) covariate-dependent transformation functions through forest-based machine learning methods.⁷⁴

Supplemental Material

Supplemental material, sj-pdf-1-smm-10.1177_09622802231176030, for Estimating transformations for evaluating diagnostic tests with covariate adjustment by Ainesh Sewak and Torsten Hothorn in Statistical Methods in Medical Research

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Swiss National Science Foundation, grant number 200021_184603.

ORCID iD

Torsten Hothorn

Supplemental material

Supplemental materials for this article are available online.

References

Pepe

. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford, UK: Oxford University Press, 2003.

Zou

Liu

Bandos

, et al. Statistical Evaluation of Diagnostic Performance: Topics in ROC Analysis. Boca Raton, FL, USA: CRC Press, 2011.

Pepe

. A regression modelling framework for receiver operating characteristic curves in medical diagnostic testing. Biometrika 1997; 84: 595–608.

Faraggi

. Adjusting receiver operating characteristic curves and related indices for covariates. J R Stat Soc: Ser D (The Statistician) 2003; 52: 179–192.

Perkins

Schisterman

Vexler

. Receiver operating characteristic curve inference from a sample with a limit of detection. Am J Epidemiol 2007; 165: 325–333.

Bantis

Yan

Tsimikas

, et al. Estimation of smooth ROC curves for biomarkers with limits of detection. Stat Med 2017; 36: 3830–3843.

Inácio

Lourenço

de Carvalho

, et al. Robust and flexible inference for the covariate-specific receiver operating characteristic curve. Stat Med 2021; 40: 5779–5795.

Inácio

Rodríguez-Álvarez

Gayoso-Diz

. Statistical evaluation of medical tests. Annu Rev Stat Appl 2021; 8: 41–67.

Zou

Tempany

Fielding

, et al. Original smooth receiver operating characteristic curve estimation from continuous data: statistical methods for analyzing the predictive value of spiral CT of ureteral stones. Acad Radiol 1998; 5: 680–687.

10.

Zou

Hall

. Two transformation models for estimating an ROC curve derived from continuous data. J Appl Stat 2000; 27: 621–631.

11.

Zou

Hall

Shapiro

. Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Stat Med 1997; 16: 2143–2156.

12.

Hothorn

Kneib

Bühlmann

. Conditional transformation models. J R Stat Soc: Ser B (Statistical Methodology) 2014; 76: 3–27.

13.

Hothorn

Möst

Bühlmann

. Most likely transformations. Scand J Stat 2018; 45: 110–134.

14.

Bamber

. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol 1975; 12: 387–415.

15.

Youden

. Index for rating diagnostic tests. Cancer 1950; 3: 32–35.

16.

Komaba

Johno

Nakamoto

. A novel statistical approach for two-sample testing based on the overlap coefficient, 2022. https://arxiv.org/abs/2206.03166. arXiv:2206.03166 [math.ST].

17.

Weitzman

. Measures of Overlap of Income Distributions of White and Negro Families in the United States. 3. Washington, DC: US Bureau of the Census, 1970. Washington, D.C.

18.

Feller

. An Introduction to Probability Theory and Its Applications. New York, NY, USA: Wiley, 1991.

19.

Schmid

Schmidt

. Nonparametric estimation of the coefficient of overlapping—theory and empirical application. Comput Stat Data Anal 2006; 50: 1583–1596.

20.

Martínez-Camblor

. About the use of the overlap coefficient in the binary classification context. Commun Stat-Theor Method 2022; 1–11.

21.

Tosteson

ANA

Begg

. A general regression methodology for ROC curve estimation. Med Decis Making 1988; 8: 204–215.

22.

Bickel

Doksum

. An analysis of transformations revisited. J Am Stat Assoc 1981; 76: 296–311.

23.

Dorfman

Alf Jr

. Maximum-likelihood estimation of parameters of signal-detection theory and determination of confidence intervals-rating-method data. J Math Psychol 1969; 6: 487–496.

24.

Ogilvie

Creelman

. Maximum-likelihood estimation of receiver operating characteristic curve parameters. J Math Psychol 1968; 5: 377–391.

25.

Gönen

Heller

. Lehmann family of ROC curves. Med Decis Making 2010; 30: 509–517.

26.

Khan

. Resilience family of receiver operating characteristic curves. IEEE Trans Reliab 2022. DOI: 10.1109/TR.2022.3194710.

27.

Zou

. Analysis of Some Transformation Models for the Two-sample Problem With Special Reference to Receiver Operating Characteristic Curves. PhD thesis, University of Rochester, 1997.

28.

Alonzo

Pepe

. Distribution-free ROC analysis using binary regression techniques. Biostatistics 2002; 3: 421–432.

29.

Pan

Metz

. The “proper” binormal model: parametric receiver operating characteristic curve estimation with degenerate data. Acad Radiol 1997; 4: 380–389.

30.

McIntosh

Pepe

. Combining several screening tests: optimality of the risk score. Biometrics 2002; 58: 657–664.

31.

Siegfried

Kook

Hothorn

. Distribution-free location-scale regression. Am Stat 2022. DOI: 10.1080/00031305.2023.2203177.

32.

Dodd

Pepe

. Partial AUC estimation and regression. Biometrics 2003; 59: 614–623.

33.

Thas

Neve

Clement

, et al. Probabilistic index models. J R Stat Soc: Ser B (Statistical Methodology) 2012; 74: 623–671.

34.

De Neve

Thas

Gerds

. Semiparametric linear transformation models: effect measures, estimators, and applications. Stat Med 2019; 38: 1484–1501.

35.

Farouki

. The bernstein polynomial basis: a centennial retrospective. Comput Aided Geom Des 2012; 29: 379–419.

36.

Lindsey

. Parametric Statistical Inference. Oxford, UK: Oxford University Press, 1996.

37.

Rao

. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 44. Cambridge, UK: Cambridge University Press, 1948. pp. 50–57.

38.

Dagenais

Dufour

. Invariance, nonlinear models, and asymptotic tests. Economet: J Economet Soc 1991; 59: 1601–1615.

39.

Hothorn

. Most likely transformations: the mlt package. J Stat Softw 2020; 92: 1–68.

40.

Lynn

. Maximum likelihood inference for left-censored HIV RNA data. Stat Med 2001; 20: 33–45.

41.

Mumford

Schisterman

Vexler

, et al. Pooling biospecimens and limits of detection: effects on ROC curve analysis. Biostatistics 2006; 7: 585–598.

42.

Xiong

Luo

Agboola

, et al. A family of estimators to diagnostic accuracy when candidate tests are subject to detection limits—application to diagnosing early stage Alzheimer’s disease. Stat Methods Med Res 2022; 31: 882–898.

43.

Rao

. Score test: historical review and recent developments. In Advances in Ranking and Selection, Multiple Comparisons, and Reliability. Boston, MA, USA: Birkhäuser, 2005; pp. 8–20.

44.

Mandel

. Simulation-based confidence intervals for functions with complicated derivatives. Am Stat 2013; 67: 76–81.

45.

Harrell Jr

. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. 608. New York: Springer, 2001.

46.

Harrell Jr

. rms Regression Modeling Strategies, 2022. https://CRAN.R-project.org/package=rms. R package version 6.3-0.

47.

Therneau

. survival: A Package for Survival Analysis in R, 2022. https://CRAN.R-project.org/package=survival. R package version 3.3-1.

48.

Robin

Turck

Hainard

, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform 2011; 12: 77.

49.

Fay

. asht: Applied Statistical Hypothesis Tests, 2022. https://CRAN.R-project.org/package=asht. R package version 0.9.7.

50.

Konietschke

Placzek

Schaarschmidt

, et al. nparcomp: an R software package for nonparametric multiple comparisons and simultaneous confidence intervals. J Stat Softw 2015; 64: 1–17. DOI: http://www.jstatsoft.org/v64/i09/ .

51.

Khan

MRA

Brandenburger

. ROCit: Performance Assessment of Binary Classifier with Visualization, 2020. https://CRAN.R-project.org/package=ROCit. R package version 2.1.1.

52.

Feng

Manevski

Perme

. auRoc: Various Methods to Estimate the AUC, 2020. https://CRAN.R-project.org/package=auRoc. R package version 0.2-1.

53.

Perez-Jaume

Skaltsa

Pallarès

, et al. ThresholdROC: optimum threshold estimation tools for continuous diagnostic tests in R. J Stat Softw 2017; 82: 1–21.

54.

Ridout

Linkie

. Estimating overlap of daily activity patterns from camera trap data. J Agric Biol Environ Stat 2009; 14: 322–337.

55.

Franco-Pereira

Nakas

Reiser

, et al. Inference on the overlap coefficient: the binormal approach and alternatives. Stat Methods Med Res 2021; 30: 2672–2684.

56.

Pérez Fernández

Martínez Camblor

Filzmoser

, et al. nsROC: an R package for non-standard ROC curve analysis. R J 2018; 10: 55–77.

57.

Martínez-Camblor

Pérez-Fernández

Corral

. Efficient nonparametric confidence bands for receiver operating-characteristic curves. Stat Methods Med Res 2018; 27: 1892–1908.

58.

Abarca-Gómez

Abdeen

Hamid

, et al. Worldwide trends in body-mass index, underweight, overweight, and obesity from 1975 to 2016: a pooled analysis of 2416 population-based measurement studies in 128.9 million children, adolescents, and adults. Lancet 2017; 390: 2627–2642.

59.

Zalesin

Franklin

Miller

, et al. Impact of obesity on cardiovascular disease. Endocrinol Metab Clin North Am 2008; 37: 663–684.

60.

Grundy

. Obesity, metabolic syndrome, and cardiovascular disease. J Clin Endocr Metab 2004; 89: 2595–2600.

61.

Eckel

Alberti

KGMM

Grundy

, et al. The metabolic syndrome. Lancet 2010; 375: 181–183.

62.

Bosello

Zamboni

. Visceral obesity and metabolic syndrome. Obes Rev 2000; 1: 47–56.

63.

Shao

Shen

, et al. Waist-to-height ratio, an optimal predictor for obesity and metabolic syndrome in Chinese adults. J Nutr Health Aging 2010; 14: 782–785.

64.

Romero-Saldaña

Fuentes-Jiménez

Vaquero-Abellán

, et al. New non-invasive method for early detection of metabolic syndrome in the working population. Eur J Cardiovasc Nurs 2016; 15: 549–558.

65.

Suliga

Ciesla

Głuszek-Osuch

, et al. The usefulness of anthropometric indices to identify the risk of metabolic syndrome. Nutrients 2019; 11: 2598.

66.

Stevens

Katz

Huxley

. Associations between gender, age and waist circumference. Eur J Clin Nutr 2010; 64: 6–15.

67. Romero-Saldaña, Tauler, Vaquero-Abellán, et al. Validation of a non-invasive method for the early detection of metabolic syndrome: a diagnostic accuracy test in a working population. BMJ Open 2018; 8: e020476.

68. Tian, Hothorn, et al. An empirical comparison of two novel transformation models. Stat Med 2020; 39: 562–576.

69. Wang, Tian. The equivalence between Mann-Whitney Wilcoxon test and score test based on the proportional odds model for ordinal responses. In: 4th International Conference on Industrial Economics System and Industrial Security Engineering (IEIS). Kyoto, Japan: IEEE, pp. 1–5.

70. Hampel. The influence curve and its role in robust estimation. J Am Stat Assoc 1974; 69: 383–393.

71. McLain, Ghosh. Efficient sieve maximum likelihood estimation of time-transformation models. J Stat Theory Pract 2013; 7: 285–303.

72. Kook, Hothorn. Regularized transformation models: the tramnet package. R J 2021; 13: 581–594.

73. Tamási, Hothorn. tramME: mixed-effects transformation models using template model builder. R J 2021; 13: 581–594.

74. Hothorn, Zeileis. Predictive distribution modeling using transformation forests. J Comput Graph Stat 2021; 30: 1181–1196.


| Reference | R package | ROC estimate | ROC CB | AUC estimate | AUC CI | Youden index estimate | Youden index CI |
|---|---|---|---|---|---|---|---|
| Hothorn³⁹ | tram | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Harrell Jr⁴⁵,⁴⁶ | rms | | | ✓ | ✓ | | |
| Thas et al.³³ | pim | | | ✓ | ✓ | | |
| Therneau⁴⁷ | survival | | | ✓ | ✓ | | |
| Robin et al.⁴⁸ | pROC | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Fay⁴⁹ | asht | | | ✓ | ✓ | | |
| Konietschke et al.⁵⁰ | nparcomp | | | ✓ | ✓ | | |
| Khan and Brandenburger⁵¹ | ROCit | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Feng et al.⁵² | auRoc | | | ✓ | ✓ | | |
| Perez-Jaume et al.⁵³ | ThresholdROC | | | | | ✓ | ✓ |
| Ridout and Linkie⁵⁴ | overlap | | | | | ✓ | ✓ |
| Franco-Pereira et al.⁵⁵ | - | | | | | ✓ | ✓ |
| Pérez Fernández et al.⁵⁶ | nsROC | ✓ | ✓ | ✓ | ✓ | ✓ | |


Supplemental Material

sj-pdf-1-smm-10.1177_09622802231176030 - Supplemental material for Estimating transformations for evaluating diagnostic tests with covariate adjustment
