Sage Journals: Discover world-class research

Abstract

The discriminative and predictive power of a continuous-valued marker for survival outcomes can be summarized using the receiver operating characteristic and predictiveness curves, respectively. In this paper, fully parametric and semi-parametric copula-based constructions of the joint model of the marker and the survival time are developed for characterizing, plotting, and analyzing both curves along with other underlying performance measures. The formulations require a copula function, a parametric specification for the margin of the marker, and either a parametric distribution or a non-parametric estimator for the margin of the time to event, to respectively characterize the fully parametric and semi-parametric joint models. Estimation is carried out using maximum likelihood and a two-stage procedure for the parametric and semi-parametric models, respectively. Resampling-based methods are used for computing standard errors and confidence bounds for the various parameters, curves, and associated measures. Graphical inspection of residuals from each conditional distribution is employed as a guide for choosing a copula from a set of candidates. The performance of the estimators of various classification and predictiveness measures is assessed in simulation studies, assuming different copula and censoring scenarios. The methods are illustrated with the analysis of two markers using the familiar primary biliary cirrhosis data set.

Keywords

Area under the curve, cumulative/dynamic, discrimination, incident/dynamic, risk prediction, right-censored data, standardized total gain, time-varying performance

Introduction

Marker development is a major research topic for supporting medical decision-making in the prognosis and timely treatment of progressive diseases. Consequently, the development and estimation of marker performance and decision-analytic measures are crucial topics in medical research. Important aspects of measurement include discrimination and prediction, which can be quantified by the receiver operating characteristic (ROC) and the predictiveness curves, respectively. For time-to-event data, both curves can be obtained when considering the status of a survival outcome at a certain time of the follow-up study, in order to both assess the performance of the marker and identify subjects that are at high risk of experiencing the outcome.

Previously proposed methodologies for studying time-varying performance measures have mainly addressed the estimation of a specific type of ROC curve only by employing non-parametric and semi-parametric (SP) techniques¹; also, the estimation of the time-varying area under the ROC curve, which is a measure that describes the discriminatory value of the marker, has been an ongoing research focus, with existing smoothing methods proving to be cumbersome when it comes to implementation.² Furthermore, the estimates of the ROC curve and its summaries when employing kernel smoothing methods are not transformation invariant; that is, monotone transformations to either the marker or the survival time, or both, may lead to different estimates for the ROC curve and its summaries.³ This is exacerbated by the fact that there is no methodology that brings both curves and their summaries together under a common framework, with a partial estimation of the predictiveness curve being the best scenario when a parametric ROC model is provided.⁴

Since employing different techniques to separately estimate measures from each curve may result in misleading inferences, the correct representation of the joint model of the continuous-valued marker $M$ and the survival time $T$ should both enable the characterization of both curves and lead to coherent estimates. In this paper, two copula-based approaches are proposed to characterize such a joint model. The first model is an extension of the model for cross-sectional data in Escarela et al.,⁵ which is conveniently constructed by customizing and linking, with a copula, a parametric marginal continuous cumulative distribution function (CDF) for $M$ and a parametric marginal survival distribution function for $T$, leading to a fully parametric (FP) formulation. Such a construction is appealing since the copula that characterizes the joint behavior of monotone increasing transforms of $M$ and $T$ is exactly the same copula as for the original pair $(M, T)$; that is, the unique copula associated with $(M, T)$ is invariant under monotone increasing transformations of the margins,⁶ which clearly benefits both the modeling and the inference process.

The second model represents the joint density function of the marker and the survival time by applying a latent variable transformation to the margin of the survival time in the aforementioned parametric copula model, in such a way that the transformed model allows for the marginal survival distribution of the time to event to be specified as discrete. This results in an SP specification that is particularly useful when the parametric margin of $T$ does not appear to be adequate, or when the survival data are either discrete or subject to interval-censoring. Expressions of the ROC and predictiveness curves along with their summaries are then obtained in terms of the resulting joint models. Residual-based diagnostics are proposed to criticize the fit of the two joint models and, thus, to help in the choice of a copula. Simulation studies are carried out to assess the performance of the procedures under different copula and censoring scenarios. As an illustration, the performance of two markers for survival outcomes of the Mayo Clinic Primary Biliary Cholangitis (PBC) follow-up is studied.

Methods

Background: Marker performance measures

Define the dynamic false positive rate at cut-point $m$ and time $t$ as $FP(m, t) = \Pr\{M > m \mid T > t\}$, and define the cumulative true positive and incident true positive rates at cut-point $m$ and time $t$ as

$$\begin{aligned} TP^{C}(m, t) &= \Pr\{M > m \mid T \leq t\} \quad \text{and} \\ TP^{I}(m, t) &= \Pr\{M > m \mid T = t\}, \end{aligned}$$

respectively.

Two main definitions of the ROC curve have been proposed for measuring the time-dependent discriminating ability of the marker. The Cumulative/Dynamic ROC curve at time $t$, $ROC^{C/D}(q, t)$, is defined as the plot of $\{[FP(c, t), TP^{C}(c, t)]; c \in \mathbb{R}\}$, where $\mathbb{R}$ is the real line; that is,

$$ROC^{C/D}(q, t) = TP^{C}[FP^{-1}(q, t), t] \tag{1}$$

where $FP^{-1}(q, t) = \inf_{c}\{c : FP(c, t) \leq q\}$, for $q \in [0, 1]$. The Incident/Dynamic ROC curve at time $t$, $ROC^{I/D}(q, t)$, is defined as the plot of $\{[FP(c, t), TP^{I}(c, t)]; c \in \mathbb{R}\}$; that is,

$$ROC^{I/D}(q, t) = TP^{I}[FP^{-1}(q, t), t], \quad q \in [0, 1] \tag{2}$$

Common measures of marker discrimination effectiveness at a specific time $t$ are the areas under the ROC curve, $AUC^{C/D}(t) = \int_{0}^{1} ROC^{C/D}(x, t)\,dx$ and $AUC^{I/D}(t) = \int_{0}^{1} ROC^{I/D}(x, t)\,dx$; here, $AUC^{C/D}(t)$ is equal to the probability that the marker will assign a higher value to a randomly chosen subject who has died by time $t$ than to a randomly chosen subject who is still alive by time $t$, whereas $AUC^{I/D}(t)$ is the probability that the marker will assign a higher value to a randomly chosen subject who dies at time $t$ than to a randomly chosen subject who is still alive by time $t$. $AUC^{C/D}(t)$ is useful in a setting where discrimination accuracy of short-term survival has to be assessed, and $AUC^{I/D}(t)$ is a summary that indicates time-varying performance with no need to select a specific time-frame.¹

The risk function at time $t$ is defined as $Risk(m, t) = \Pr\{T \leq t \mid M = m\}$. If $Risk(m, t)$ is monotone increasing in $m$ at time $t$, the time-dependent predictiveness curve is defined as

$$R(v, t) = \Pr\{T \leq t \mid M = F_{M}^{-1}(v)\} \tag{3}$$

where $F_{M}^{-1}(\cdot)$ is the quantile function corresponding to the marginal CDF of $M$, denoted here by $F_{M}(\cdot)$. The predictiveness function in equation (3) is the plot of $\{[F_{M}(m), Risk(m, t)]; m \in \mathbb{R}\}$. For a fixed time $t$, $R(v, t)$ displays the distribution of the predicted risk of the event occurring within time $t$, and its inverse is the CDF of the risk function; that is, $R^{-1}(u, t) = \Pr\{Risk(M, t) \leq u\}$, for $u \in (0, 1)$. In clinical decision-making, pre-specified percentiles of $R^{-1}$ are used as thresholds for clustering subjects who share similar risks; for instance, if $p_{U}$ is the percentile for the high-risk group according to a prespecified criterion, then $1 - R^{-1}(p_{U}, t)$ is the proportion of the population deemed to be at high risk. A summary measure of time-dependent risk prediction is the total gain, which is defined as

$$TG(t) = \int_{0}^{1} | R(v, t) - \Pr\{T \leq t\} | \, dv$$

High values of total gain are obtained when the predictiveness curve is steep, which is a consequence of a useful predictive marker, reaching a maximum at time $t$ equal to $2 S_{T}(t)[1 - S_{T}(t)]$,⁷ where $S_{T}(t) = \Pr\{T > t\}$ is the marginal survival function of $T$. An alternative summary quantification of predictiveness is the time-varying standardized total gain, which is defined as $STG(t) = TG(t) / \{2 S_{T}(t)[1 - S_{T}(t)]\}$, and provides a time-dependent measure that allows for comparisons across different studies.

A global measure of marker usefulness can be established using a measure of concordance between the marker and the outcome such as the C-index, whose definition for survival outcomes is given by the probability that an individual who died at an earlier time $T_{j}$ than another individual’s $T_{k}$ has a larger value of the marker; that is⁸:

$$\text{C-index} = \Pr\{M_{j} > M_{k} \mid T_{j} < T_{k}\} \tag{4}$$

Extending the results in Liu et al.,⁹ it can be shown that the C-index can be expressed as a linear function of Kendall’s tau; namely, $\text{C-index} = (1 - τ)/2$, where $τ$ is Kendall’s tau, defined by

$$τ = \Pr\{(M_{j} - M_{k})(T_{j} - T_{k}) > 0\} - \Pr\{(M_{j} - M_{k})(T_{j} - T_{k}) < 0\} \tag{5}$$

with $(M_{j}, T_{j})$ and $(M_{k}, T_{k})$ being independent copies of $(M, T)$. Markers whose association with the time to event lies between that of countermonotonicity, where $\text{C-index} = 1$ and $τ = -1$, and that of independence, where $\text{C-index} = 1/2$ and $τ = 0$, are expected to exhibit useful discriminative and predictive properties.
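The relation between the C-index in equation (4) and Kendall’s tau in equation (5) can be checked empirically. The following is a minimal sketch in Python (the paper itself used R); it assumes a small, made-up sample with no ties and no censoring, and the function names `kendall_tau` and `c_index` are illustrative rather than taken from the paper.

```python
from itertools import combinations


def kendall_tau(markers, times):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs."""
    pairs = list(zip(markers, times))
    concordant = discordant = 0
    for (mj, tj), (mk, tk) in combinations(pairs, 2):
        product = (mj - mk) * (tj - tk)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(pairs) * (len(pairs) - 1) // 2
    return (concordant - discordant) / n_pairs


def c_index(markers, times):
    """Proportion of ordered pairs with T_j < T_k in which M_j > M_k."""
    usable = higher = 0
    for mj, tj in zip(markers, times):
        for mk, tk in zip(markers, times):
            if tj < tk:
                usable += 1
                if mj > mk:
                    higher += 1
    return higher / usable


# Hypothetical uncensored sample: larger marker values tend to die earlier.
markers = [2.1, 0.9, 1.5, 0.3, -0.7]
times = [1.0, 2.5, 3.1, 4.2, 6.0]

tau = kendall_tau(markers, times)
c = c_index(markers, times)
print(tau, c, (1 - tau) / 2)
```

With right-censored data the pairwise comparison is not always available, which is one reason the model-based expression in terms of the copula is attractive.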

Copula modeling

A strategy for constructing a joint model for the continuous random vector $(M, T)$ is provided by the copula function, which is defined as a bivariate distribution function whose margins are uniform on $[0, 1]$. In the usual setting, given continuous CDFs $F_{X}(x) = \Pr\{X \leq x\}$ and $F_{Y}(y) = \Pr\{Y \leq y\}$ of $X$ and $Y$, respectively, Sklar’s theorem indicates that there is a unique copula $C(\cdot, \cdot)$ such that the joint CDF of $(X, Y)$ can be defined as $F_{XY}(x, y) \equiv C[F_{X}(x), F_{Y}(y)]$. Since several popular copulas represent dependence structures whose association, as measured by Kendall’s tau, mainly covers the positive range when used for $F_{XY}(x, y)$, this study instead adopts the following joint representation:

$$H_{MT}(m, t) = \Pr\{M \leq m, T > t\} \equiv C[F_{M}(m), S_{T}(t)] \tag{6}$$

The formulation in equation (6) is known as the semi-survival copula, and the association between $M$ and $T$ is the negative of that obtained when $C$ enters the model as in the joint CDF of $(M, T)$,¹⁰ thus allowing popular copula families, whose association range mainly covers the interval $(0, 1)$, to model useful discriminative and predictive markers.

Define $C_{j}'(u, v) = \partial C(x_{1}, x_{2}) / \partial x_{j} |_{(x_{1}, x_{2}) = (u, v)}$, for $j = 1, 2$. It can be shown that

$$\begin{aligned} FP(m, t) &= 1 - H_{MT}(m, t) / S_{T}(t), \\ TP^{C}(m, t) &= \frac{H_{MT}(m, t) - F_{M}(m) - S_{T}(t) + 1}{1 - S_{T}(t)}, \\ TP^{I}(m, t) &= 1 - C_{2}'[F_{M}(m), S_{T}(t)] \quad \text{and} \\ Risk(m, t) &= 1 - C_{1}'[F_{M}(m), S_{T}(t)] \end{aligned}$$

Also, in the semi-survival copula setting given by equation (6), it can be shown that the corresponding Kendall’s tau can be expressed in terms of the underlying copula as (Nelsen¹¹) $τ = 1 - 4 \int\int_{[0, 1]^{2}} C(u, v) \, dC(u, v)$, which is free of marginal effects.

When $T$ follows a non-parametric distribution in the form of a step-wise function, with jumps at times $t_{1} < t_{2} < \dots < t_{r}$, it is possible to adopt the latent variable strategy described in de Leon and Wu¹² in order to construct the joint model for $(M, T)$. Assume that the random couple $(M, Y)$ is continuous, with $Y$ taking values in the positive reals, and let $T = t_{k} I_{(t_{k-1}, t_{k}]}(Y)$, where $I_{A}(x) = 1$ if $x \in A$ and $I_{A}(x) = 0$ otherwise, $t_{0} = 0$ and $k = 1, 2, \dots, r$; then

$$\Pr\{M \leq m, T = t_{k}\} = \Pr\{M \leq m, Y > t_{k-1}\} - \Pr\{M \leq m, Y > t_{k}\} \tag{7}$$

The joint density function $f_{MT}^{(sp)}(m, t)$ of $(M, T)$ is obtained by differentiating equation (7) with respect to $m$, and can be written in terms of the copula representation in equation (6) as follows:

$$f_{MT}^{(sp)}(m, t_{k}) = f_{M}(m) \{ C_{1}'[F_{M}(m), S_{T}^{(np)}(t_{k-1})] - C_{1}'[F_{M}(m), S_{T}^{(np)}(t_{k})] \} \tag{8}$$

for $k = 1, 2, \dots, r$ and $m \in \mathbb{R}$; here, $S_{T}^{(np)}(t) = \Pr\{T > t\}$ is an ordinal discrete survival distribution, with $S_{T}^{(np)}(t) = S_{T}^{(np)}(t_{k})$ for $t \in [t_{k}, t_{k+1})$. The corresponding joint CDF, defined as $F_{MT}^{(sp)}(m, t) = \Pr\{M \leq m, T \leq t\}$, can be computed as:

$$F_{MT}^{(sp)}(m, t) = \int_{-\infty}^{m} \sum_{k : t_{k} \leq t} f_{MT}^{(sp)}(x, t_{k}) \, dx \tag{9}$$

It follows that

$$\begin{aligned} FP(m, t) &= 1 - \frac{F_{M}(m) - F_{MT}^{(sp)}(m, t)}{S_{T}^{(np)}(t)}, \\ TP^{C}(m, t) &= 1 - \frac{F_{MT}^{(sp)}(m, t)}{1 - S_{T}^{(np)}(t)}, \\ TP^{I}(m, t) &= 1 - \frac{\int_{-\infty}^{m} f_{MT}^{(sp)}(x, t) \, dx}{S_{T}^{(np)}(t_{k-1}) - S_{T}^{(np)}(t_{k})} \end{aligned}$$

for $t \in (t_{k-1}, t_{k}]$, and

$$Risk(m, t) = \frac{\sum_{k : t_{k} \leq t} f_{MT}^{(sp)}(m, t_{k})}{f_{M}(m)}$$

Inference

Estimation

Consider the pair of marker and survival time $(M_{i}, T_{i})$, and let $D_{i}$ denote an independent right-censoring time for independent subjects $i = 1, \dots, n$. Let $m_{i}$ be the observed value of $M_{i}$, let $t_{i}$ be the observed value of $\min(T_{i}, D_{i})$, and let $δ_{i} = I(T_{i} = t_{i})$ be the corresponding censoring status, where $I(\cdot)$ is an indicator function. For the FP model in equation (6), the observed likelihood function is given by

$$\begin{aligned} L(θ, ω, λ) &= \prod_{i=1}^{n} \left[ - \left. \frac{\partial^{2} H_{MT}(x, y)}{\partial x \, \partial y} \right|_{(x, y) = (m_{i}, t_{i})} \right]^{δ_{i}} \left[ \left. \frac{\partial H_{MT}(x, y)}{\partial x} \right|_{(x, y) = (m_{i}, t_{i})} \right]^{1 - δ_{i}} \\ &= \prod_{i=1}^{n} \{ f_{M}(m_{i}) f_{T}(t_{i}) c[F_{M}(m_{i}), S_{T}(t_{i})] \}^{δ_{i}} \{ f_{M}(m_{i}) C_{1}'[F_{M}(m_{i}), S_{T}(t_{i})] \}^{1 - δ_{i}} \end{aligned}$$

where $θ$ is the dependence parameter of the copula function, $ω$ and $λ$ are the vectors of parameters of $F_{M}(m)$ and $S_{T}(t)$, respectively, $f_{M}(m) = dF_{M}(m)/dm$, $f_{T}(t) = -dS_{T}(t)/dt$, and $c(u, v) = \partial^{2} C(u, v) / \partial u \, \partial v$. In this study, the function nlm of the R language was employed to find the optimum of $-\log L(θ, ω, λ)$ and the corresponding Hessian. Standard errors (SEs) for the parameters can be computed as the square root of the diagonal entries of the inverse of the observed information matrix.

For the SP model in equation (8), a two-stage estimation procedure was adopted. The first stage involves the estimation of both margins $F_{M} (m)$ and $S_{T}^{(np)} (t)$ under the independence working assumption; here, maximum likelihood is used to estimate $ω$ , and the Kaplan–Meier estimator is used for $S_{T}^{(np)} (t)$ . The second stage involves maximum likelihood of the dependence parameter $θ$ with the margins held fixed from the first stage; that is, by maximizing the following likelihood function:

$$L(θ) = \prod_{i=1}^{n} [\hat{f}_{MT}^{(sp)}(m_{i}, t_{i})]^{δ_{i}} \left[ \hat{f}_{M}(m_{i}) - \sum_{k : t_{k} \leq t_{i}} \hat{f}_{MT}^{(sp)}(m_{i}, t_{k}) \right]^{1 - δ_{i}}$$

where $\hat{f}_{M}(m)$ and $\hat{f}_{MT}^{(sp)}(m, t)$ are the estimators of $f_{M}(m)$ and $f_{MT}^{(sp)}(m, t)$, respectively.

SE’s and confidence intervals (CI’s) for the various parameters and quantities can be obtained by extending the conditional bootstrap for univariate censored data in Karrison¹³. The resulting procedure is in fact a specialization of the non-parametric bootstrap for a copula model with censoring presented in Lawless and Yilmaz¹⁴, and is described as follows:

1. Obtain the estimates of $θ$ and $ω$, and of either $λ$ or $S_{T}^{(np)}(t)$ depending on the model specification. These are then employed to characterize the populations from which the pairs of marker and survival time are sampled.

2. Generate random pairs $U^{*} = \{(u_{i}^{*}, v_{i}^{*}); i = 1, \dots, n\}$ from the fitted copula model $C(u, v; \hat{θ})$, and then obtain $m_{i}^{*} = \hat{F}_{M}^{-1}(u_{i}^{*})$ and $y_{i}^{*} = \hat{F}_{T}^{-1}(1 - v_{i}^{*})$, for $i = 1, \dots, n$.

3. Simulate censoring times $\{d_{i}^{*}; i = 1, \dots, n\}$ from an estimated censoring distribution, which can be obtained from the Kaplan–Meier estimate of $\{(t_{i}, 1 - δ_{i}); i = 1, \dots, n\}$, and then let $t_{i}^{*} = \min(y_{i}^{*}, d_{i}^{*})$ and $δ_{i}^{*} = I(y_{i}^{*} = t_{i}^{*})$.

4. Obtain the estimates $\hat{θ}^{*}$, $\hat{ω}^{*}$, $\hat{λ}^{*}$ for the bootstrap dataset $\{(m_{i}^{*}, t_{i}^{*}, δ_{i}^{*}); i = 1, \dots, n\}$.

5. Compute the statistic of interest $\hat{s}^{*} = s(\hat{θ}^{*}, \hat{ω}^{*}, \hat{λ}^{*})$.

6. Repeat steps 2–5 $B$ times.

With the sequence of bootstrap estimates $\hat{s}^{*}$, the SE of $\hat{s}$ is estimated by the standard deviation of the bootstrap estimates. Also, a $100(1 - α)\%$ CI for $s$ can be estimated using the pivotal method,¹⁵ which is computed as $(2\hat{s} - \tilde{F}_{\hat{s}^{*}}^{-1}(1 - α/2), \; 2\hat{s} - \tilde{F}_{\hat{s}^{*}}^{-1}(α/2))$, where $\hat{s}$ is the point estimator of $s$ from the observed dataset, and $\tilde{F}_{\hat{s}^{*}}^{-1}(\cdot)$ is the empirical quantile function corresponding to the CDF of the bootstrap estimates $\hat{s}^{*}$. In this study, the R package copula was employed to obtain $U^{*}$ from each copula family described below. In the illustration below, the implementation of the bootstrap procedure did not appear to be computationally demanding.
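The final step, turning a sequence of bootstrap estimates into a pivotal interval, can be sketched as follows. This is a minimal Python illustration (the study itself used R); the function names are hypothetical, and the empirical quantile is taken as the smallest order statistic whose empirical CDF reaches the requested level, which is one of several reasonable conventions.

```python
import math
import statistics


def empirical_quantile(values, p):
    """Smallest x such that the empirical CDF of `values` at x is >= p."""
    ordered = sorted(values)
    index = max(1, math.ceil(p * len(ordered)))
    return ordered[index - 1]


def pivotal_ci(s_hat, boot_estimates, alpha=0.05):
    """Pivotal bootstrap interval: (2*s_hat - q_{1-alpha/2}, 2*s_hat - q_{alpha/2})."""
    lower = 2.0 * s_hat - empirical_quantile(boot_estimates, 1.0 - alpha / 2.0)
    upper = 2.0 * s_hat - empirical_quantile(boot_estimates, alpha / 2.0)
    return lower, upper


def bootstrap_se(boot_estimates):
    """Standard deviation of the bootstrap estimates."""
    return statistics.stdev(boot_estimates)


# Toy example with four made-up bootstrap replicates and a 50% interval.
boot = [1.0, 2.0, 3.0, 4.0]
print(pivotal_ci(2.5, boot, alpha=0.5), bootstrap_se(boot))
```

Note that the upper bootstrap quantile enters the lower limit and vice versa, which is what distinguishes the pivotal interval from the plain percentile interval.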

Copula selection from a given set of candidates can be guided by model diagnostics for each conditional distribution of each copula model, as both conditional distributions uniquely determine the joint model. Model criticism for the conditional distribution of $T$ given $M$ can be borrowed from the survival analysis literature because the marker $M$ is assumed to be always observed. Let $\hat{Λ}_{i} = -\log[\hat{S}_{T \mid M}(t_{i} \mid M = m_{i})]$ be the estimated conditional cumulative hazard at time $t_{i}$ given that $M = m_{i}$, with $\hat{S}_{T \mid M}(t \mid M = m)$ being the estimate of $\Pr\{T > t \mid M = m\}$ and computed as the estimate of $1 - Risk(m, t)$. If $\hat{S}_{Λ}(Λ)$ denotes the Kaplan–Meier estimator for $\{(\hat{Λ}_{i}, δ_{i}); i = 1, \dots, n\}$ and the conditional distribution of $T$ given $M$ is correct, then the plot of $\log[-\log \hat{S}_{Λ}(\hat{Λ}_{i})]$ vs $\log(\hat{Λ}_{i})$, for $i = 1, \dots, n$, should roughly yield a straight line with slope $1$.

Residuals for assessing the distribution of $M$ given $T$ can be obtained by modifying Dunn and Smyth’s¹⁶ normalized quantile residuals for a univariate continuous CDF $F_{X}(x)$. If $(x_{1}, x_{2}, \dots, x_{n})$ denotes the observed random sample, such quantile residuals are defined as $r_{i} = Φ^{-1}(\hat{F}_{X}(x_{i}))$, where $\hat{F}_{X}(x_{i})$ denotes the fitted distribution function of $X$ for the $i$-th subject. If $F_{X}(x)$ is correct, the residuals are independent and exactly normal, and thus usual graphic inspections, such as the quantile-quantile plot, can be employed for the assessment; in the current conditional context, however, such residuals $r_{M \mid T}(m_{i}, t_{i})$ must be defined according to the censoring status as either the estimate of the conditional CDF of $M$ given that $T$ has been observed, or the conditional CDF of $M$ given that $T$ exceeds the censoring time; that is,

$$\begin{aligned} r_{M \mid T}(m_{i}, t_{i}) &= δ_{i} \, \widehat{\Pr}\{M \leq m_{i} \mid T = t_{i}\} + (1 - δ_{i}) \, \widehat{\Pr}\{M \leq m_{i} \mid T > t_{i}\} \\ &= δ_{i} [1 - \widehat{TP^{I}}(m_{i}, t_{i})] + (1 - δ_{i}) [1 - \widehat{FP}(m_{i}, t_{i})] \end{aligned}$$

Families of copulas and margins

This study employed the four most commonly applied copula families, including the Gaussian which is defined as $C (u, v; ρ) = Φ_{2} [Φ^{- 1} (u), Φ^{- 1} (v); ρ]$ , where $Φ_{2} (\cdot, \cdot; ρ)$ is the joint CDF of the bivariate Gaussian distribution with the vector of means equal to $(0, 0)^{T}$ , and the covariance matrix equal to a $2 \times 2$ non-singular matrix with $1$ in each diagonal entry and the dependence parameter $ρ$ in each off-diagonal entry, $ρ \in [- 1, 1]$ , and $Φ^{- 1} (\cdot)$ is the inverse of the standard normal CDF $Φ (\cdot)$ . The corresponding conditional copula and the copula density functions are¹⁷

$$\begin{aligned} C_{1}'(u, v; ρ) &= Φ\left( \frac{Φ^{-1}(v) - ρ Φ^{-1}(u)}{\sqrt{1 - ρ^{2}}} \right) \quad \text{and} \\ c(u, v; ρ) &= \frac{1}{\sqrt{1 - ρ^{2}}} \exp\left\{ \frac{2 ρ Φ^{-1}(u) Φ^{-1}(v)}{2 (1 - ρ^{2})} - \frac{ρ^{2} \{[Φ^{-1}(u)]^{2} + [Φ^{-1}(v)]^{2}\}}{2 (1 - ρ^{2})} \right\}, \end{aligned}$$

respectively; also, it can be verified¹⁸ that Kendall’s tau for $H_{MT}(m, t)$ is given by $τ = -(2/π) \arcsin(ρ)$, implying that the range of useful markers corresponds to $ρ \in (0, 1)$. In the estimation, this study adopted an arctanh-link (Fisher’s transformation) for $ρ$, so that $ρ$ takes values in $(-1, 1)$.
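The Gaussian copula quantities above are straightforward to evaluate numerically. The sketch below is a minimal Python illustration using only the standard library (`statistics.NormalDist` supplies $Φ$ and $Φ^{-1}$); the function names are hypothetical and not part of the paper or of the R copula package.

```python
from math import asin, atanh, exp, pi, sqrt, tanh
from statistics import NormalDist

_NORMAL = NormalDist()


def gaussian_c1(u, v, rho):
    """Conditional copula: derivative of C(u, v; rho) with respect to u."""
    x, y = _NORMAL.inv_cdf(u), _NORMAL.inv_cdf(v)
    return _NORMAL.cdf((y - rho * x) / sqrt(1.0 - rho ** 2))


def gaussian_density(u, v, rho):
    """Gaussian copula density c(u, v; rho)."""
    x, y = _NORMAL.inv_cdf(u), _NORMAL.inv_cdf(v)
    exponent = (2.0 * rho * x * y - rho ** 2 * (x * x + y * y)) / (2.0 * (1.0 - rho ** 2))
    return exp(exponent) / sqrt(1.0 - rho ** 2)


def kendall_tau_semi_survival(rho):
    """Kendall's tau between M and T under the semi-survival representation."""
    return -(2.0 / pi) * asin(rho)


def rho_from_link(kappa):
    """Inverse of the arctanh link: maps the real line into (-1, 1)."""
    return tanh(kappa)


rho = rho_from_link(atanh(0.6))
print(gaussian_c1(0.3, 0.7, rho), gaussian_density(0.3, 0.7, rho))
print(kendall_tau_semi_survival(sqrt(2.0) / 2.0))  # rho = sin(pi/4) gives tau of about -0.5
```

Since $c(u, v; ρ) = \partial C_{1}'(u, v; ρ) / \partial v$, a finite difference of `gaussian_c1` in `v` offers a quick consistency check on the density.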

Three popular copulas belonging to the Archimedean class were also employed in this study. The Archimedean class considered here is represented as $C(u, v; θ) = ψ_{θ}^{-1}[ψ_{θ}(u) + ψ_{θ}(v)]$, where $ψ_{θ}(\cdot)$ is a convex decreasing generating function with parameter $θ$ that satisfies $ψ_{θ}(0) = \infty$ and $ψ_{θ}(1) = 0$. It can be shown that the conditional copula and the copula density functions can be written in terms of the generating function and its derivatives as¹⁹:

$$\begin{aligned} C_{1}'(u, v; θ) &= ψ_{θ}'(u) / ψ_{θ}'[C(u, v; θ)] \quad \text{and} \\ c(u, v; θ) &= - \frac{ψ_{θ}'(u) \, ψ_{θ}'(v) \, ψ_{θ}''[C(u, v; θ)]}{\{ψ_{θ}'[C(u, v; θ)]\}^{3}}, \end{aligned}$$

respectively, and that Kendall’s tau for $H_{MT}(m, t)$ is given by²⁰:

$$τ = -1 - 4 \int_{0}^{1} \frac{ψ_{θ}(t)}{ψ_{θ}'(t)} \, dt$$

This study adopted the following Archimedean copulas, which are characterized by their generating functions:

$$\begin{aligned} \text{Clayton:} \quad & ψ_{θ}(t) = t^{-θ} - 1, & θ &\in (0, \infty) \\ \text{Frank:} \quad & ψ_{θ}(t) = -\log\left( \frac{\exp\{-θ t\} - 1}{\exp\{-θ\} - 1} \right), & θ &\in \mathbb{R} \setminus \{0\} \\ \text{Gumbel:} \quad & ψ_{θ}(t) = (-\log t)^{θ}, & θ &\in (1, \infty) \end{aligned}$$

In the estimation procedures described above, the dependence parameters of the Clayton and Gumbel copulas were parameterized as $θ = \exp(κ)$ and $θ = \exp(κ) + 1$, respectively, so that $θ$ remains within its parameter space while the parameter $κ$ takes values on the real line.
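As a rough numerical check of the Kendall’s tau formula for Archimedean generators, the following Python sketch integrates $ψ_{θ}(t)/ψ_{θ}'(t)$ with a midpoint rule for the Clayton and Gumbel generators listed above, and also shows the two reparameterizations. The function names and the choice of a midpoint rule are illustrative assumptions, not the paper’s implementation.

```python
from math import exp, log


def clayton_theta(kappa):
    """Clayton link: theta = exp(kappa) lies in (0, inf)."""
    return exp(kappa)


def gumbel_theta(kappa):
    """Gumbel link: theta = exp(kappa) + 1 lies in (1, inf)."""
    return exp(kappa) + 1.0


def clayton_ratio(t, theta):
    """psi(t) / psi'(t) for the Clayton generator psi(t) = t^(-theta) - 1."""
    psi = t ** -theta - 1.0
    dpsi = -theta * t ** (-theta - 1.0)
    return psi / dpsi


def gumbel_ratio(t, theta):
    """psi(t) / psi'(t) for the Gumbel generator psi(t) = (-log t)^theta."""
    psi = (-log(t)) ** theta
    dpsi = -theta * (-log(t)) ** (theta - 1.0) / t
    return psi / dpsi


def tau_semi_survival(ratio, theta, n=20000):
    """tau = -1 - 4 * integral_0^1 psi/psi' dt, midpoint rule on n cells."""
    step = 1.0 / n
    integral = sum(ratio((i + 0.5) * step, theta) for i in range(n)) * step
    return -1.0 - 4.0 * integral


# With theta = 2, both families give tau close to -0.5 in this representation.
print(tau_semi_survival(clayton_ratio, clayton_theta(log(2.0))))
print(tau_semi_survival(gumbel_ratio, gumbel_theta(0.0)))
```

These values agree with the closed forms $τ = -θ/(θ + 2)$ for Clayton and $τ = -(1 - 1/θ)$ for Gumbel, which follow from evaluating the integral analytically.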

Since biomarkers are often skewed,²¹ this study adopted the skewed-normal distribution for the marginal distribution of $M$ , whose probability density function (PDF) is characterized by²²:

$$f_{M}(m; ω) = \frac{2}{ω_{2}} ϕ\left( \frac{m - ω_{1}}{ω_{2}} \right) Φ\left[ ω_{3} \left( \frac{m - ω_{1}}{ω_{2}} \right) \right] \tag{10}$$

where $m \in \mathbb{R}$, $ω = (ω_{1}, ω_{2}, ω_{3})$, with $ω_{1} \in (-\infty, \infty)$, $ω_{2} \in (0, \infty)$ and $ω_{3} \in (-\infty, \infty)$ being the location, scale and shape parameters, respectively, and $ϕ(\cdot)$ denotes the standard normal PDF. For the parametric model, it was assumed that the marginal distribution of the survival time is Weibull, with the corresponding CDF given by:

$$F_{T}(t; λ_{1}, λ_{2}) = 1 - \exp\left\{ -\left( \frac{t}{λ_{2}} \right)^{λ_{1}} \right\} \tag{11}$$

where $t \in (0, \infty)$, and $λ_{1} \in (0, \infty)$ and $λ_{2} \in (0, \infty)$ are the shape and scale parameters, respectively. In the estimation process, this study employed a log-link for the parameters whose parameter space is $(0, \infty)$.

Simulation studies

To evaluate and investigate the performance of the approaches developed here, datasets under the Frank and Gumbel copula models with an assumed association corresponding to $τ = - 0.5$ , and two censoring scenarios were generated and then fitted using the FP and SP models. $F_{M} (m)$ was taken as the skewed-normal distribution characterized in equation (10), with $ω_{1} = 0$ , $ω_{2} = 1$ and $ω_{3} = 2$ , and $F_{T} (t)$ was taken as the Weibull represented in equation (11), with $λ_{1} = 1.5$ and $λ_{2} = 1$ . The censoring times were generated from a uniform distribution over $[0, b]$ , where $b$ was determined according to a predetermined proportion of censored times $Pr {T > D}$ . Under the assumed families of distributions for $D$ and $T$ , it can be shown that if $D$ and $T$ are independent

$$\Pr\{T > D\} = \frac{λ_{2}}{b λ_{1}} \left\{ Γ\left( \frac{1}{λ_{1}} \right) - Γ^{(upper)}\left[ \frac{1}{λ_{1}}, \left( \frac{b}{λ_{2}} \right)^{λ_{1}} \right] \right\}$$

where $Γ^{(upper)}(\cdot, \cdot)$ is the upper incomplete gamma function and $Γ(\cdot)$ is the gamma function; thus, for the $40\%$ and $60\%$ of censored times considered in the simulations below, $b = 2.2189$ and $b = 1.3207$, respectively.
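The censoring proportion can also be obtained directly as $\Pr\{T > D\} = b^{-1} \int_{0}^{b} S_{T}(d) \, dd$, which avoids the incomplete gamma function altogether. The Python sketch below uses a midpoint rule for the integral and bisection to recover $b$ for a target proportion; the function names and numerical settings are illustrative assumptions rather than the procedure used in the paper.

```python
from math import exp


def censoring_proportion(b, shape, scale, n=4000):
    """Pr{T > D} for D ~ Uniform[0, b] and Weibull T, via a midpoint rule."""
    step = b / n
    total = sum(exp(-(((i + 0.5) * step) / scale) ** shape) for i in range(n))
    return total * step / b


def solve_upper_limit(target, shape, scale, lo=1e-6, hi=100.0, iterations=60):
    """Bisection for b; the censoring proportion decreases as b grows."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if censoring_proportion(mid, shape, scale) > target:
            lo = mid  # too much censoring: widen the censoring window
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Weibull margin from the simulation design: shape 1.5, scale 1.
print(censoring_proportion(2.2189, 1.5, 1.0))  # close to 0.40
print(censoring_proportion(1.3207, 1.5, 1.0))  # close to 0.60
print(solve_upper_limit(0.40, 1.5, 1.0))
```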

In this study, 1000 datasets of independent vectors $\{(u_{i}, v_{i}); i = 1, \dots, 250\}$ with uniform margins on $[0, 1]$ were generated from each copula. The marker and actual survival time were computed as $m_{i} = \hat{F}_{M}^{-1}(u_{i})$ and $y_{i} = \hat{F}_{T}^{-1}(1 - v_{i})$, respectively. For each dataset, censoring times $\{d_{i}; i = 1, \dots, n\}$ from a uniform distribution over $[0, b]$ were generated, and the survival time and the corresponding censoring indicator were computed as $t_{i} = \min(y_{i}, d_{i})$ and $δ_{i} = I(y_{i} = t_{i})$, respectively. The FP and SP models were then fitted to the resulting dataset, given by $\{(m_{i}, t_{i}, δ_{i}); i = 1, \dots, n\}$; here, the parametric families were correctly specified in the corresponding components to construct both models. Results of bias and root mean squared error (RMSE) of the estimators from the FP and SP models for $ROC^{C/D}(q, t)$, $AUC^{C/D}(t)$, $ROC^{I/D}(q, t)$, $AUC^{I/D}(t)$, $R(q, t)$ and $STG(t)$, for $q = 0.25$ and $q = 0.50$ at the median $t_{0.50}$ and third quartile $t_{0.75}$ of $T$, classified by $40\%$ and $60\%$ of censored times, are shown in Tables 1 and 2 for the Frank and Gumbel copula models, respectively.

Table 1.

Bias and RMSE for estimators of $ROC^{C/D}(q, t)$, $AUC^{C/D}(t)$, $ROC^{I/D}(q, t)$, $AUC^{I/D}(t)$, $R(q, t)$ and $STG(t)$, at $q = 0.25, 0.50$ and $t = t_{0.50}, t_{0.75}$ for the FP and SP models, from $1{,}000$ simulations of datasets of size $n = 250$, classified by $40\%$ and $60\%$ of censoring of a Frank copula model corresponding to $τ = -0.5$.

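Censoring = $40\%$
Measure	FP bias ($t_{0.50}$)	FP RMSE ($t_{0.50}$)	SP bias ($t_{0.50}$)	SP RMSE ($t_{0.50}$)		FP bias ($t_{0.75}$)	FP RMSE ($t_{0.75}$)	SP bias ($t_{0.75}$)	SP RMSE ($t_{0.75}$)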
${ROC}^{C / D} (0.25, t)$	$0.0003$	$0.0318$	− $0.0044$	$0.0321$		$0.0006$	$0.0288$	− $0.0028$	$0.0293$
${ROC}^{C / D} (0.50, t)$	− $0.0003$	$0.0152$	− $0.0028$	$0.0159$		− $0.0003$	$0.0151$	− $0.0035$	$0.0160$
${AUC}^{C / D} (t)$	− $0.0001$	$0.0191$	− $0.0022$	$0.0193$		− $0.0001$	$0.0186$	− $0.0021$	$0.0188$
${ROC}^{I / D} (0.25, t)$	$0.0029$	$0.0331$	$0.0027$	$0.0364$		$0.0022$	$0.0250$	$0.0042$	$0.0295$
${ROC}^{I / D} (0.50, t)$	$0.0014$	$0.0262$	− $0.0007$	$0.0292$		$0.0012$	$0.0260$	$0.0006$	$0.0303$
${AUC}^{I / D} (t)$	$0.0010$	$0.0196$	$0.0005$	$0.0218$		$0.0009$	$0.0178$	$0.0015$	$0.0209$
$R (0.25, t)$	− $0.0003$	$0.0364$	$0.0036$	$0.0460$		$0.0007$	$0.0537$	$0.0018$	$0.0693$
$R (0.50, t)$	− $0.0025$	$0.0432$	− $0.0015$	$0.0594$		− $0.0002$	$0.0280$	− $0.0015$	$0.0366$
$STG (t)$	− $0.0001$	$0.0344$	− $0.0031$	$0.0345$		$0.0002$	$0.0335$	− $0.0031$	$0.0337$
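Censoring = $60\%$
Measure	FP bias ($t_{0.50}$)	FP RMSE ($t_{0.50}$)	SP bias ($t_{0.50}$)	SP RMSE ($t_{0.50}$)		FP bias ($t_{0.75}$)	FP RMSE ($t_{0.75}$)	SP bias ($t_{0.75}$)	SP RMSE ($t_{0.75}$)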
${ROC}^{C / D} (0.25, t)$	− $0.0009$	$0.0381$	− $0.0061$	$0.0386$		− $0.0004$	$0.0348$	− $0.0065$	$0.0358$
${ROC}^{C / D} (0.50, t)$	− $0.0011$	$0.0185$	− $0.0040$	$0.0195$		− $0.0010$	$0.0185$	− $0.0046$	$0.0202$
${AUC}^{C / D} (t)$	− $0.0007$	$0.0229$	− $0.0033$	$0.0231$		− $0.0007$	$0.0224$	− $0.0033$	$0.0228$
${ROC}^{I / D} (0.25, t)$	$0.0017$	$0.0406$	$0.0032$	$0.0440$		$0.0001$	$0.0335$	$0.0242$	$0.0410$
${ROC}^{I / D} (0.50, t)$	$0.0020$	$0.0324$	− $0.0013$	$0.0352$		$0.0005$	$0.0346$	$0.0175$	$0.0467$
${AUC}^{I / D} (t)$	$0.0001$	$0.0242$	$0.0004$	$0.0263$		$0.0005$	$0.0237$	$0.0139$	$0.0333$
$R (0.25, t)$	$0.0028$	$0.0455$	$0.0079$	$0.0550$		$0.0012$	$0.0759$	$0.0038$	$0.1131$
$R (0.50, t)$	− $0.0001$	$0.0519$	$0.0038$	$0.0669$		− $0.0005$	$0.0385$	− $0.0052$	$0.0597$
$STG (t)$	− $0.0011$	$0.0410$	− $0.0047$	$0.0412$		− $0.0007$	$0.0403$	− $0.0043$	$0.0407$

RMSE: root mean squared error; FP: fully parametric; SP: semi-parametric; ROC: receiver operating characteristic; MSE: mean squared error; AUC: area under the curve.

Table 2.

Bias and RMSE for the same estimators, time points and censoring levels as in Table 1, for datasets generated from the Gumbel copula model corresponding to $τ = -0.5$.

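Censoring = $40\%$
Measure	FP bias ($t_{0.50}$)	FP RMSE ($t_{0.50}$)	SP bias ($t_{0.50}$)	SP RMSE ($t_{0.50}$)		FP bias ($t_{0.75}$)	FP RMSE ($t_{0.75}$)	SP bias ($t_{0.75}$)	SP RMSE ($t_{0.75}$)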
${ROC}^{C / D} (0.25, t)$	$0.0026$	$0.0324$	− $0.0026$	$0.0341$		− $0.0023$	$0.0334$	− $0.0036$	$0.0343$
${ROC}^{C / D} (0.50, t)$	− $0.0003$	$0.0184$	− $0.0032$	$0.0199$		− $0.0014$	$0.0188$	− $0.0029$	$0.0199$
${AUC}^{C / D} (t)$	$0.0013$	$0.0197$	− $0.0016$	$0.0208$		$0.0015$	$0.0203$	− $0.0014$	$0.0209$
${ROC}^{I / D} (0.25, t)$	$0.0033$	$0.0258$	$0.0032$	$0.0294$		− $0.0018$	$0.0186$	− $0.0034$	$0.0204$
${ROC}^{I / D} (0.50, t)$	$0.0014$	$0.0237$	− $0.0009$	$0.0266$		$0.0023$	$0.0209$	$0.0010$	$0.0225$
${AUC}^{I / D} (t)$	$0.0016$	$0.0167$	$0.0008$	$0.0189$		$0.0012$	$0.0140$	$0.0012$	$0.0152$
$R (0.25, t)$	− $0.0001$	$0.0344$	$0.0030$	$0.0446$		$0.0016$	$0.0501$	$0.0013$	$0.0624$
$R (0.50, t)$	$0.0015$	$0.0432$	$0.0007$	$0.0544$		$0.0022$	$0.0341$	− $0.0010$	$0.0427$
$STG (t)$	$0.0037$	$0.0343$	− $0.0003$	$0.0360$		$0.0024$	$0.0352$	− $0.0017$	$0.0361$
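Censoring = $60\%$
Measure	FP bias ($t_{0.50}$)	FP RMSE ($t_{0.50}$)	SP bias ($t_{0.50}$)	SP RMSE ($t_{0.50}$)		FP bias ($t_{0.75}$)	FP RMSE ($t_{0.75}$)	SP bias ($t_{0.75}$)	SP RMSE ($t_{0.75}$)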
${ROC}^{C / D} (0.25, t)$	$0.0007$	$0.0368$	− $0.0061$	$0.0389$		− $0.0007$	$0.0368$	− $0.0058$	$0.0380$
${ROC}^{C / D} (0.50, t)$	− $0.0009$	$0.0211$	− $0.0054$	$0.0229$		− $0.0005$	$0.0208$	− $0.0043$	$0.0220$
${AUC}^{C / D} (t)$	$0.0001$	$0.0224$	− $0.0037$	$0.0236$		$0.0005$	$0.0223$	− $0.0027$	$0.0231$
${ROC}^{I / D} (0.25, t)$	$0.0025$	$0.0317$	$0.0024$	$0.0348$		$0.0016$	$0.0241$	$0.0172$	$0.0329$
${ROC}^{I / D} (0.50, t)$	$0.0003$	$0.0287$	− $0.0024$	$0.0313$		$0.0018$	$0.0265$	$0.0134$	$0.0323$
${AUC}^{I / D} (t)$	$0.0009$	$0.0203$	− $0.0001$	$0.0222$		$0.0009$	$0.0178$	$0.0101$	$0.0223$
$R (0.25, t)$	$0.0015$	$0.0444$	$0.0080$	$0.0551$		$0.0012$	$0.0714$	$0.0042$	$0.1048$
$R (0.50, t)$	$0.0009$	$0.0507$	$0.0033$	$0.0635$		$0.0003$	$0.0444$	− $0.0042$	$0.0683$
$STG (t)$	$0.0018$	$0.0390$	− $0.0040$	$0.0407$		$0.0008$	$0.0387$	− $0.0038$	$0.0399$

RMSE: root mean squared error; FP: fully parametric; SP: semi-parametric; ROC: receiver operating characteristic; MSE: mean squared error; AUC: area under the curve.

Both tables show similar results, with the estimators of the FP model tending to exhibit smaller biases and RMSEs than those of the SP model, most notably when censoring is $60 %$ . The discrepancies between the two sets of estimators do not appear to be substantial in any scenario, and there are no noticeable differences or trends when varying $q$ and $t$ . In general, all estimators perform well, showing relatively small biases and RMSEs.
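As a reminder of how the two summaries reported in the tables are computed from simulation replicates, the following minimal Python sketch evaluates bias and RMSE for a set of Monte Carlo estimates of a single measure. The true value, the noise level and the number of replicates are hypothetical choices made only for illustration; they are not the settings of the simulation studies.

```python
import math
import random

def bias_and_rmse(estimates, true_value):
    """Return (bias, RMSE) of Monte Carlo estimates around a known true value."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, math.sqrt(mse)

# Hypothetical replicates: a true AUC of 0.80 estimated with small Gaussian noise.
random.seed(1)
true_auc = 0.80
replicates = [true_auc + random.gauss(0.0, 0.02) for _ in range(1000)]

bias, rmse = bias_and_rmse(replicates, true_auc)
print(f"bias = {bias:.4f}, RMSE = {rmse:.4f}")
```

Since the mean squared error equals the squared bias plus the variance, the RMSE can never be smaller than the absolute bias, which is the pattern seen in every row of the tables.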

Illustration: Application to the PBC dataset

The methods presented in this study are illustrated using data from the Mayo Clinic trial in PBC conducted between 1974 and 1984, which is available in the survival package of the R program. PBC is an autoimmune disease in which the bile ducts in the liver are slowly destroyed, leading to irreversible scarring of liver tissue (cirrhosis), and eventually liver decompensation and, consequently, premature death.²³ Patients with PBC who are at high risk of death can potentially benefit from a liver transplant, the only curative treatment for PBC²⁴; therefore, identifying such a high-risk group with a marker is crucial for medical decision making.

The dataset considered here consisted of a cohort of 312 patients with PBC. There were 125 (40%) deaths and 19 liver transplant recipients during the follow-up. The main goal of the study was to assess the time-varying discriminative and predictive power of a marker in mortality outcomes after registration, and thus the time to transplantation was taken as a censored time. Similar to the illustration in Bansal and Heagerty,²⁵ this study analyzed the performance of the following two markers, which were computed as the linear predictor obtained from the usual Cox regression model: (a) 4-cov, from the model that includes orthogonal polynomials of degree 1 of albumin and age, natural logarithm of prothrombin time and edema status (two levels: edema despite diuretic therapy and otherwise); and (b) mayo, from adding the natural logarithm of bilirubin to the set of covariates in 4-cov, thus emulating the Mayo marker.²⁶

Figure 1 displays conditional residual plots for $T$ given $M$ and $M$ given $T$ of FP and SP models constructed with the Clayton, Frank, Gaussian and Gumbel copulas. The Clayton and Gaussian copulas yield the worst fits, whereas the Frank and Gumbel copulas appear to give reasonable fits to the 4-cov and mayo markers, respectively. The plots show negligible discrepancies when comparing the residuals of the FP and SP models for each copula and marker. Accordingly, this study adopted the Frank and Gumbel copulas for the 4-cov and mayo markers, respectively, in both the FP and SP models.

Figure 1.

Conditional residual plots from the fully parametric and semi-parametric model fits to the markers 4-cov and mayo using the Clayton, Frank, Gaussian and Gumbel copulas.

For the FP Frank copula model of 4-cov, it was found that the parameter $\log λ_{1}$ of the Weibull distribution was non-significant ( $p$-value $= 0.55$ ), so it was set to $0$ in order to obtain the best-fitting model; that is, an exponential distribution was used for the margin of $T$ . The maximum likelihood estimates (MLEs) of the resulting model were: $\hat{θ} = 5.498$ (SE $0.560$ ), ${\hat{ω}}_{1} = 8.841$ (SE $0.490$ ), ${\hat{\log ω}}_{2} = 0.490$ (SE $0.050$ ), ${\hat{ω}}_{3} = 4.503$ (SE $0.825$ ) and ${\hat{\log λ}}_{2} = 8.398$ (SE $0.081$ ); here, the corresponding Kendall’s tau was estimated as $\hat{τ} = - 0.487$ , with its 95% bootstrap CI computed as $(- 0.535, - 0.421)$ . The two-stage estimates of the SP Frank copula model for 4-cov were: $\hat{θ} = 5.061$ (SE $0.544$ ), ${\hat{ω}}_{1} = 8.825$ (SE $0.067$ ), ${\hat{\log ω}}_{2} = 0.462$ (SE $0.051$ ) and ${\hat{ω}}_{3} = 4.561$ (SE $1.101$ ), with $\hat{τ} = - 0.420$ and its 95% bootstrap CI computed as $(- 0.519, - 0.389)$ .

For the FP Gumbel model of mayo, the parameter $κ$ that is linked to the dependence parameter as $θ = \exp (κ) + 1$ turned out not to be significant ( $p$-value $= 0.69$ ), so it was set to $0$ ; therefore, $θ$ was set to $2$ , which corresponds to $τ = - 0.5$ . The resulting parsimonious model yielded the following MLEs: ${\hat{ω}}_{1} = 5.988$ (SE $0.077$ ), ${\hat{\log ω}}_{2} = 0.842$ (SE $0.040$ ), ${\hat{ω}}_{3} = 6.301$ (SE $1.289$ ), ${\hat{\log λ}}_{1} = 0.136$ (SE $0.063$ ) and ${\hat{\log λ}}_{2} = 8.422$ (SE $0.082$ ). The two-stage estimation for the SP Gumbel copula model of mayo yielded $\hat{θ} = 2.022$ , with the corresponding 95% bootstrap CI computed as $(1.802, 2.350)$ ; consequently, $θ$ was set to $2$ as well, which gave the following estimates: ${\hat{ω}}_{1} = 5.975$ (SE $0.083$ ), ${\hat{\log ω}}_{2} = 0.841$ (SE $0.049$ ) and ${\hat{ω}}_{3} = 6.595$ (SE $2.235$ ).
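The link between the copula parameter and Kendall's tau quoted above can be checked numerically. The Python sketch below uses the standard textbook formulas $τ = 1 - 1 / θ$ for the Gumbel family and $τ = 1 - (4 / θ) [1 - D_{1} (θ)]$ for the Frank family, where $D_{1}$ is the first-order Debye function. The function names and the Simpson integration are illustrative choices, not the R code that accompanies the paper. The formulas return the magnitude of the association; the negative sign reported in the text is assumed here to reflect the orientation of the dependence between the marker and the survival time.

```python
import math

def gumbel_tau(theta):
    """Kendall's tau of a Gumbel copula with parameter theta >= 1."""
    return 1.0 - 1.0 / theta

def debye1(x, n=2000):
    """First-order Debye function D1(x) = (1/x) * int_0^x t / (e^t - 1) dt,
    approximated with composite Simpson's rule on n (even) subintervals."""
    def f(t):
        # The integrand tends to 1 as t -> 0.
        return 1.0 if t == 0.0 else t / math.expm1(t)
    h = x / n
    total = f(0.0) + f(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0 / x

def frank_tau(theta):
    """Kendall's tau of a Frank copula with parameter theta > 0."""
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))

# kappa = 0 gives theta = exp(0) + 1 = 2 for the Gumbel model of mayo.
theta_gumbel = math.exp(0.0) + 1.0
print("Gumbel |tau|:", gumbel_tau(theta_gumbel))

# FP Frank estimate for 4-cov reported in the text.
print("Frank |tau|:", round(frank_tau(5.498), 3))
```

With $θ = 2$ the Gumbel formula returns $0.5$ , and with $θ = 5.498$ the Frank formula returns a value close to $0.487$ , in agreement in magnitude with the FP estimates quoted above.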

Figure 2 shows estimates and 95% confidence bounds of ${ROC}^{C / D} (q, t)$ , ${ROC}^{I / D} (q, t)$ and $R (q, t)$ for both markers at years 1, 4 and 6 from both the FP and SP copula models; here, the confidence bounds were constructed by joining pointwise CIs for each curve at various cut points of the marker quantile $q$ . Both models yield very similar fitted curves, with the confidence bounds being slightly wider for the SP model. All ROC curves indicate that both markers provide useful discriminating power for either outcome; in addition, the shapes of the confidence bounds of the ROC curves suggest that the corresponding population curves are at least approximately convex, a crucial feature for rational decision making²⁷ which implies that larger values of the marker are associated with a higher likelihood of outcome presence (e.g. Lloyd²⁸). When comparing the discriminatory power of the two markers, mayo outperforms 4-cov since the corresponding ROC curves show estimates closer to the point $(0, 1)$ , particularly for years 1 and 4; also, the three predictiveness curves of mayo appear to be steeper and closer to the point $(1, 1)$ , allowing for better selection of marker thresholds, which indicates that mayo provides higher predictive accuracy.
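The construction of confidence bounds by joining pointwise CIs can be sketched as follows. This Python example assumes percentile bootstrap intervals, which is only one of several possible bootstrap CI methods and may differ from the one used for the figures; the function names and the toy curves are hypothetical.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 1]) of a pre-sorted list."""
    k = (len(sorted_values) - 1) * p
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (k - lo) * (sorted_values[hi] - sorted_values[lo])

def pointwise_bounds(boot_curves, level=0.95):
    """boot_curves: list of bootstrap curves, each a list of values evaluated on
    a common grid of q. Returns (lower, upper) lists built from the pointwise
    percentile interval at each grid point."""
    alpha = (1.0 - level) / 2.0
    lower, upper = [], []
    for j in range(len(boot_curves[0])):
        column = sorted(curve[j] for curve in boot_curves)
        lower.append(percentile(column, alpha))
        upper.append(percentile(column, 1.0 - alpha))
    return lower, upper

# Toy example: three "bootstrap curves" on a two-point grid, 50% bounds.
toy_curves = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
toy_lower, toy_upper = pointwise_bounds(toy_curves, level=0.5)
print(toy_lower, toy_upper)
```

Bounds built this way have pointwise, not simultaneous, coverage: each grid point is covered at the nominal level, but the whole curve is not guaranteed to lie inside the band with that probability.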

Figure 2.

Parametric and semi-parametric estimates, along with 95% confidence bounds, of the receiver operating characteristic (ROC) curves ${ROC}^{C / D} (q, t)$ and ${ROC}^{I / D} (q, t)$ , and of $R (q, t)$ , for markers 4-cov and mayo at times $t = 365$ , $t = 1460$ and $t = 2190$ (days after registration).

Figure 3 depicts estimates and 95% confidence bounds of ${AUC}^{C / D} (t)$ , ${AUC}^{I / D} (t)$ and $STG (t)$ , for $t$ greater than one year, for 4-cov and mayo from the FP and SP models; here, the confidence bounds were drawn by joining the pointwise CIs for each measure at different time points. The estimated functions appear somewhat similar for both models, with the SP approach estimating wider confidence bounds, particularly for 4-cov. The cumulative/dynamic plots indicate that 4-cov keeps a fairly constant discriminatory rate for the status of the event occurring within time $t$ at around 0.8, and that the corresponding discriminatory rate of mayo decreases steadily with time to $0.8$ . The incident/dynamic plots show that the rate for discriminating the status of the event instantaneously occurring at time $t$ decreases steadily with time to 0.6 using either marker, with both markers yielding estimates of the AUC that exceed $0.75$ in the first 2000 days of the follow-up. More marked differences between the two markers can be noticed in the plots of $STG (t)$ , where the estimate for 4-cov appears fairly constant over time at around 0.5, whereas the estimate for mayo decreases steadily from around 0.7 to around 0.5, which supports the predictiveness superiority of mayo.

Figure 3.

Parametric and semi-parametric estimates, along with 95% confidence bounds, for the area under the curve (AUC) measures ${AUC}^{C / D} (t)$ and ${AUC}^{I / D} (t)$ , and for $STG (t)$ , corresponding to markers 4-cov and mayo.

Discussion

In this paper, two new copula-based approaches to the representation of the joint model of marker and time to event were proposed, leading to parametric and SP characterizations of marker classification and predictiveness performance measures for survival outcomes. While the parametric representation of the model allows for the independent customization of each margin, such tailored specification is only required for the margin of the marker in the SP formulation, with the margin of the survival time being provided by a non-parametric estimator. In both models, the copula function encodes the dependence structure between the two random variables.

Maximum likelihood and a two-stage procedure were employed to obtain estimates in the parametric and SP specifications, respectively, and bootstrapping was performed for both approaches to obtain SEs and CIs. A strategy for choosing a copula from a set of candidates was based on assessing diagnostic plots for each conditional distribution. Simulation studies, which considered bias and RMSE for the assessment of the point estimators corresponding to classification and predictiveness measures, showed that both approaches performed well under different dependence and censoring scenarios.

The analysis of two markers from the PBC Mayo Clinic data illustrated the modeling process of the proposed methods, with both approaches producing similar results. This is the first study that displays ROC and predictiveness curves under the same framework. Meticulous investigation of the copula choice was necessary, and residual plots for the conditional distributions offered a useful tool for identifying the copula class with the best fit for each marker. Unlike previous studies, where the methodologies focused mainly on point estimators, the displays of the time-varying curves are by no means erratic (see e.g. Bansal and Heagerty,¹ Kamarudin et al.,²⁹ and Viallon and Latouche³⁰); in addition, the curves estimated here and their summaries showed CIs that are relatively narrow. Therefore, the methods developed in this paper provided an enhancement in the understanding of the classification and predictiveness characteristics of each marker.

The quantities that are used to compute a marker are registered at the beginning of the follow-up and tend to be fully observed, with the resulting observations being relatively easy to model parametrically by employing valid transformations, similar to the modeling of the usual parametric ROC analysis (e.g. Hanley³¹). Linking an FP model for the marker with a non-parametric survival distribution for the time to event via the copula, in order to obtain the SP joint model, represents a convenient extension of the parametric joint model when univariate parametric models fail to give satisfactory fits to the survival margin, or when the survival time is subject to sampling schemes that lead to complex incomplete data structures such as interval censoring, which are difficult to model parametrically in the bivariate setting.

Although the copula-based methods do not lead to closed-form estimates of marker performance measures, they are appealing alternatives since the estimation procedures are for all measures within the same framework, and they consider all marker values and all failure and censoring times, with no information appearing to be lost. Previously proposed methods for estimating ${AUC}^{I / D} (t)$ only, for instance, have mainly been based on smoothing techniques, and have proved to be highly influenced by data corresponding to earlier time points than $t$ , which can be considered a drawback since the focus is on assessing the prospective performance of a marker, not the retrospective.²

While the copula-based methods presented in this study represent a convenient way to model the dependence between the marker and the survival time, they were formulated using four copula families only. Various bivariate models have been defined in terms of finite mixtures of copulas, leading to rich classes of joint models that can be employed to study complex dependencies³²,³³; however, with the adoption of such mixture models, new problems arise, including the customization of the margins in each component, which might not be easy to address, particularly because the survival time is subject to various types of censoring.

In the cross-sectional data context, there has been interest in comparing ROC curves corresponding to two correlated markers. The three main comparison scenarios include³⁴: (a) Testing whether the two ROC curves are equal for all marker quantiles $q$ , (b) Testing whether the two AUC’s are equal, and (c) Testing whether the two ROC curves are equal at a particular marker quantile. In the survival outcome setting, Zhang and Shao³⁵ developed vine copula-based algorithms to generate multivariate data for correlated markers, focusing mainly on the simulation for given concordance indexes. It is fair to say that extensions of the characterizations presented in this paper can be used in the three-dimensional framework by using vine copulas, to then elaborate the corresponding hypothesis tests for the various comparison scenarios. This, of course, warrants further research.

Footnotes

Software availability

R functions and codes for this manuscript can be accessed through the following GitHub link respecting the corresponding copyrights:

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This research was supported in part with individual career development grants from CONACYT, Mexico, through Sistema Nacional de Investigadores.

ORCID iD

Gabriel Escarela

Appendix

Proofs for the continuous-valued time forms of $FP (m, t)$ , ${TP}^{C} (m, t)$ , ${TP}^{I} (m, t)$ and $Risk (m, t)$ in the “Methods” Section.

Clearly

$$\begin{aligned} FP (m, t) & = 1 - \Pr \{M \leq m \mid T > t\} \\ & = 1 - H_{M T} (m, t) / S_{T} (t) \end{aligned}$$

Since

$${TP}^{C} (m, t) = \Pr \{M > m, T \leq t\} / \Pr \{T \leq t\}$$

and employing the following survival copula $C^{(S)}$ given in Nelsen¹¹ (p. 32)

$$C^{(S)} (u, v) = u + v - 1 + C (1 - u, 1 - v)$$

where $(u, v) \in [0, 1] \times [0, 1]$ , then

$$\begin{aligned} \Pr \{M > m, T \leq t\} & = \Pr \{M \leq m, T > t\} \\ & \quad - \Pr \{M \leq m\} - \Pr \{T > t\} + 1 \end{aligned} \tag{12}$$

and thus

$${TP}^{C} (m, t) = \frac{H_{M T} (m, t) - F_{M} (m) - S_{T} (t) + 1}{1 - S_{T} (t)}$$

Also,

$$\begin{aligned} {TP}^{I} (m, t) & = 1 - \Pr \{M \leq m \mid T = t\} \\ & = 1 + \frac{\left. \frac{\partial H (x, y)}{\partial y} \right|_{(x, y) = (m, t)}}{f_{T} (t)} \\ & = 1 - C_{2}^{'} [F_{M} (m), S_{T} (t)] \end{aligned}$$

here, $f_{T} (t) = - d S (t) / d t$ is the PDF of $T$ . The Risk function can be obtained using equation (12) as follows:

$$\begin{aligned} Risk (m, t) & = \frac{\left. \frac{- \partial \Pr \{M > x, T \leq y\}}{\partial x} \right|_{(x, y) = (m, t)}}{f_{M} (m)} \\ & = \frac{\left. \frac{\partial [F_{M} (x) - H (x, y)]}{\partial x} \right|_{(x, y) = (m, t)}}{f_{M} (m)} \\ & = 1 - C_{1}^{'} [F_{M} (m), S_{T} (t)] \end{aligned}$$

here, $f_{M} (m) = d F (m) / d m$ is the PDF of $M$ .
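To make the closed forms derived in this Appendix concrete, the following Python sketch evaluates $FP$ , ${TP}^{C}$ , ${TP}^{I}$ and $Risk$ for a Frank copula, one of the four families considered in the paper, taking $u = F_{M} (m)$ and $v = S_{T} (t)$ as inputs so that $H_{M T} (m, t) = C (u, v)$ . The Frank formulas are the standard ones; the function names and the values of $u$ and $v$ are hypothetical, and the dependence parameter is the FP estimate reported for 4-cov. This is an illustration under those assumptions, not the accompanying R implementation.

```python
import math

def frank_cdf(u, v, theta):
    """Frank copula C(u, v) for theta != 0."""
    a = math.expm1(-theta * u)
    b = math.expm1(-theta * v)
    d = math.expm1(-theta)
    return -math.log1p(a * b / d) / theta

def frank_d1(u, v, theta):
    """Partial derivative of C(u, v) with respect to its first argument."""
    a = math.expm1(-theta * u)
    b = math.expm1(-theta * v)
    d = math.expm1(-theta)
    return math.exp(-theta * u) * b / (d + a * b)

def frank_d2(u, v, theta):
    """Partial derivative with respect to the second argument; the Frank
    copula is symmetric, so the roles of u and v are simply swapped."""
    return frank_d1(v, u, theta)

def measures(u, v, theta):
    """Performance measures at u = F_M(m) and v = S_T(t), with 0 < u, v < 1."""
    h = frank_cdf(u, v, theta)  # H_MT(m, t) = Pr{M <= m, T > t}
    return {
        "FP": 1.0 - h / v,
        "TP_C": (h - u - v + 1.0) / (1.0 - v),
        "TP_I": 1.0 - frank_d2(u, v, theta),
        "Risk": 1.0 - frank_d1(u, v, theta),
    }

# Hypothetical evaluation point: marker at its median, 70% survival at time t.
result = measures(u=0.5, v=0.7, theta=5.498)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Sweeping $u$ over $(0, 1)$ for a fixed $v$ and plotting ${TP}^{C}$ or ${TP}^{I}$ against $FP$ traces the corresponding ROC curve, while plotting $Risk$ against $u$ traces the predictiveness curve.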

References

1. Bansal, Heagerty. A tutorial on evaluating the time-varying discrimination accuracy of survival models used in dynamic decision making. Med Decis Making 2018; 38: 904–916.
2. van Geloven, Zwinderman, et al. Estimation of incident dynamic AUC in practice. Comput Stat Data Anal 2021; 154: 1–15.
3. Tang. Compare diagnostic tests using transformation-invariant smoothed ROC curves. Stat Plan Infer 2010; 140: 3540–3551.
4. Huang, Pepe. A parametric ROC model-based approach for evaluating the predictiveness of continuous markers in case-control studies. Biometrics 2009; 65: 1133–1144.
5. Escarela, Rodríguez, Núñez-Antonio. Copula modeling of receiver operating characteristic and predictiveness curves. Stat Med 2020; 39: 4252–4266.
6. Genest, Favre A-C. Everything you always wanted to know about copula modeling but were afraid to ask. J Hydrol Eng 2007; 12: 347–368.
7. Bura, Gastwirth. The binary regression quantile plot: assessing the importance of predictors in binary regression. Biom J 2001; 4: 5–21.
8. Heagerty, Zheng. Survival model predictive accuracy and ROC curves. Biometrics 2005; 61: 92–105.
9. Liu, Ning, Cheng, et al. A flexible and robust method for assessing conditional association and conditional concordance. Stat Med 2019; 38: 3656–3668.
10. Chaieb, Rivest, Abdous. Estimating survival under a dependent truncation. Biometrika 2006; 93: 655–669.
11. Nelsen. An introduction to copulas. New York: Springer-Verlag, 2006.
12. de Leon. Copula-based regression models for a bivariate mixed discrete and continuous outcome. Stat Med 2011; 30: 175–185.
13. Karrison. Bootstrapping censored data with covariates. J Stat Comput Simul 1990; 36: 195–207.
14. Lawless, Yilmaz. Semiparametric estimation in copula models for bivariate sequential survival times. Biom J 2011; 53: 779–796.
15. Davison, Hinkley. Bootstrap methods and their application. Cambridge: Cambridge University Press, 1997.
16. Dunn, Smyth. Randomized quantile residuals. J Computational Graph Stat 1996; 5: 236–244.
17. Schepsmeier, Stöber. Derivatives and Fisher information of bivariate copulas. Stat Paper 2014; 55: 525–542.
18. Fang H-B, Fang K-T, Kotz. The meta-elliptical distributions with given marginals. J Multivar Anal 2002; 82: 1–16.
19. Schmitz. Copulas and Stochastic Processes. Rheinisch-Westfälische Technische Hochschule Aachen University. PhD Thesis, 2004.
20. Genest, MacKay. The joy of copulas: Bivariate distributions with uniform marginals. Am Stat 1986; 40: 280–283.
21. van Domelen, Mitchell, Perkins, et al. Gamma models for estimating the odds ratio for a skewed biomarker measured in pools and subject to errors. Biostatistics 2019; 22: 250–265.
22. Azzalini. A class of distributions which includes the normal ones. Scand J Stat 1985; 12: 171–178.
23. Lammers, Kowdley, van Buuren. Predicting outcome in primary biliary cirrhosis. Ann Hepatol 2014; 13: 316–326.
24. Liermann Garcia, Evangelista Garcia, McMaster, et al. Transplantation for primary biliary cirrhosis: retrospective analysis of 400 patients in a single center. Hepatology 2001; 33: 22–27.
25. Bansal, Heagerty. A comparison of landmark methods and time-dependent ROC methods to evaluate the time-varying performance of prognostic markers for survival outcomes. Diagn Prognostic Res 2019; 3. Article number: 14.
26. Dickson, Grambsch, Fleming, et al. Prognosis in primary biliary cirrhosis: model for decision making. Hepatology 1989; 10: 1–7.
27. Egan. Signal detection theory and ROC analysis. New York: Academic Press, 1975.
28. Lloyd. Estimation of a ROC curve. Stat Probab Lett 2002; 59: 99–111.
29. Kamarudin, Cox, Kolamunnage-Dona. Time-dependent ROC curve analysis in medical research: current methods and applications. BMC Med Res Methodol 2017; 17. Article number: 53.
30. Viallon, Latouche. Discrimination measures for survival outcomes: connection between the AUC and the predictiveness curve. Biom J 2011; 53: 217–236.
31. Hanley. The use of the ’binormal’ model for parametric ROC analysis of quantitative diagnostic tests. Stat Med 1996; 15: 1575–1585.
32. Arakelian, Karlis. Clustering dependencies via mixtures of copulas. Commun Stat – Simul Comput 2021; 43: 1644–1661.
33. Zhuang, Diao, Grace. A Bayesian nonparametric mixture model for grouping dependence structures and selecting copula functions. Economet Stat 2022; 39: 172–189.
34. Bantis, Feng. Comparison of two correlated ROC curves at a given specificity or sensitivity level. Stat Med 2016; 35: 4352–4367.
35. Zhang, Shao. A numerical strategy to evaluate performance of predictive scores via a copula-based approach. Stat Med 2020; 39: 2671–2684.

Copula modeling for the estimation of measures of marker classification and predictiveness performance with survival outcomes
