Abstract

Most studies of longitudinal quantile regression assume a correctly specified model. Nevertheless, a single model can hardly perform well under all conditions, and assessing which conditions are (approximately) satisfied in order to determine the optimal model is rather difficult. In the case of the mixed effect model, misspecification of the fixed effect part reduces the accuracy of random effect prediction and affects the efficiency of the cumulative function estimator. On the other hand, limited research has focused on incorporating multiple candidate procedures in longitudinal data analysis, which is an urgent need. This paper proposes an exponential aggregation weighting algorithm for longitudinal quantile regression. Based on the secondary smoothing loss function, we establish oracle inequalities for the aggregated estimator. The proposed method is applied to evaluate the cumulative $τ$ th quantile function for the additive mixed effect model with a right-censored history process, and an aggregation-based best linear prediction for random effects is constructed as well. We show that the asymptotic properties are conveniently established owing to the smoothing scheme. Simulation studies are carried out to demonstrate the validity of the method, which is further illustrated by analyzing the data set from a multicenter automatic defibrillator implantation trial.

Keywords

Aggregation, longitudinal quantile regression, random effect, right censoring, smoothing loss function

Introduction

Quantile regression (Koenker and Bassett¹) is an increasingly prevalent tool. It provides a constructive strategy for examining the effect of covariates on the entire response distribution and offers a more flexible, robust approach for data analysis. In longitudinal studies, the conditional $τ$ th quantile regression of a given probability level $τ \in (0, 1)$ is assumed as

Q_{τ} (y (t) | x (t)) = q_{τ} (x (t))

(1)

where $y (t)$ and $x (t)$ are the response variable and the $p \times 1$ dimensional covariate vector at time $t$ , respectively. $q_{τ} (x (t))$ is an unspecified function which contains conditional $τ$ th quantile of model errors among $x (t)$ . For a determined $q_{τ} (x (t))$ , there is a great deal of literature for related statistical inferences. Yin and Cai² proposed a working independent method for linear quantile regression (LQR) with correlated failure time data. Lu and Fan³ estimated parameters in weighted quantile regression. Tang et al.⁴ proposed a B-spline-based variable selection procedure for varying-coefficient quantile regression, and Wang and Sun⁵ extended the approach to partial linear varying coefficient model with the within-subject correlation structure.

One important issue in longitudinal quantile regression is dealing with censoring. Galvao et al.⁶ proposed a three-step estimator for panel quantile regression with fixed individual effects and left censoring. Harding and Lamarche⁷ developed a penalized estimation with a parametric or nonparametric mechanism to reduce attrition bias in Big Data, which is applicable to censored data as well. Recently, Liu et al.⁸ discussed the estimation of the cumulative $τ$ th quantile function ( $τ$ -CQF) for a history process with a right-censored time-to-event variable. Nevertheless, most of the existing methodologies are based on a fixed specification of model (1). In fact, many sampling designs require accounting for the correlation between observations that belong to the same unit or cluster. Random effects provide a flexible device for such complex data analysis. Koenker,⁹ Geraci and Bottai,¹⁰ and Kulkarni et al.¹¹ generalized equation (1) to the longitudinal quantile mixed effect model (LQMM) with linear fixed effect and random intercepts. More recently, Geraci¹² developed an algorithm for the covariate-affected random effect structure using a Gaussian quadrature technique, and Geraci¹³ proposed a spline-based approximation for the nonparametric additive form of the fixed effect pattern to make an inference.

In practice, identifying an exactly correct model for the data at hand is rather difficult. Traditional model selection is commonly used to search for the closest approximation. Machado¹⁴ proposed a robust selection criterion for general M-estimators which is applicable to quantile regression, and Koenker¹⁵ suggested the traditional Akaike information criterion. The shortcoming of these methods is that they frequently ignore the uncertainty associated with the choice of the model when reporting precision estimates. One approach to incorporating such uncertainty is model averaging, which combines competing specifications via propensity weights. Since the pioneering contribution of Hjort and Claeskens,¹⁶ the number of works on the topic has proliferated in the past few years. For instance, Hansen¹⁷ proposed the Mallows’ criterion for selecting weights of least squares regression models; Lu and Su¹⁸ used a jackknife model averaging strategy to select weights in general quantile regression; and Wang et al.¹⁹ generalized the method to high-dimensional covariates.

It is noted that the aforementioned approaches derive the weights from a given information criterion and emphasize the parametric model assumption. To allow comparison with complex nonparametric forms, Yang²⁰ proposed an adaptive estimation for least squares regression by model mixing. Recently, Shan and Yang²¹ and Gu and Zou²² extended the method to develop aggregation algorithms for quantile regression and expectile regression, respectively. The estimators are theoretically shown to approach the best candidate. However, few works have considered aggregation for longitudinal quantile regression, especially with a view to evaluating the $τ$ -CQF. To fill this gap, we contribute an adaptive estimation for longitudinal quantile regression. An exponential aggregation weighting (EAW) algorithm is adopted to combine multiple candidate estimating procedures, which efficiently guards against the misspecification of the fixed effect $q_{τ} (x (t))$ and performs optimally without knowing which of the original procedures works the best.

In an effort to acquire the asymptotic properties of the aggregated estimator, we establish oracle inequalities under particular risk measures. Since the risk bound under the squared error loss can hardly be derived in quantile regression (Gu and Zou²²), we propose a novel smoothing scheme for the quantile check loss function, which supplies strong convexity and $L$ -smoothness. Furthermore, we apply the EAW algorithm to the additive mixed effect model and construct an estimator of the $τ$ -CQF. One can verify that the estimator behaves consistently with that of the best procedure under the secondary smoothing loss. We also develop an EAW-based best linear prediction (EAW-BLP) of random effects via the proposed weights. A real dataset from the multicenter automatic defibrillator implantation trial (MADIT) is analyzed to illustrate the usability of the proposed methodology.

The rest of the paper is organized as follows. In Section 2.1 we introduce the EAW algorithm for (1). We propose a secondary smoothing approximation to the check loss function and employ it in the weighting algorithm. We apply the EAW algorithm to the additive mixed effect model and propose an estimator for the $τ$ -CQF based on a right-censored history process in Section 2.2, and the prediction of the random effects is modified with the aggregated weights in Section 2.3. We establish oracle inequalities for the EAW estimator under the original prediction risk and the squared loss in Section 3.1, and the related consistency of the $τ$ -CQF estimator in Section 3.2. Simulation studies are presented in Section 4.1 to examine the validity of our algorithm, and we check the predicting accuracy of EAW-BLP for random effects in Section 4.2. We apply the method to the MADIT dataset in Section 5. All technical proofs are given in the Appendix.

2. Methodology

In this section, we establish several aggregated estimators for longitudinal quantile regression. We first propose the EAW algorithm for model (1), and apply it to evaluate the cumulative $τ$ th quantile function ( $τ$ -CQF) for the additive mixed effect model with a right-censored history process. In addition, the prediction of the random effects is discussed in the aggregation framework as well.

2.1 EAW algorithm

Suppose there is a sample of $n$ subjects to be observed and for the $i$ th subject, $y_{i} (t_{i j})$ and $x_{i} (t_{i j})$ are collected at time points $t_{i j}$ ( $j = 1, \dots, n_{i}$ ), where $n_{i}$ is the respective total number of observations. Assume that the observed times of all subjects are bounded by $T$ . There is a large corpus of strategies for obtaining a consistent estimator from the observations $(x_{i} (t_{i j}), y_{i} (t_{i j}), t_{i j})$ if the structure of equation (1) is known. However, the specification may change across contexts, and a single estimator is prone to loss of efficiency. Let $Δ = \{δ_{k} : k \geq 1\}$ be a set of candidate procedures and $δ_{k}$ be the $k$ th procedure for $q_{τ} (x (t))$ . Further, let ${\hat{q}}_{τ}^{(k)} (x (t))$ be the related estimator from $δ_{k}$ . There is no special restriction on $δ_{k}$ as they can be either parametric or nonparametric. For instance, $δ_{1}$ may be a standard parametric linear regression, $δ_{2}$ a varying-coefficient regression, $δ_{3}$ a nonparametric additive model, etc. Moreover, the cardinality of $Δ$ can be either finite or countably infinite. The goal in this section is to combine multiple candidate procedures in $Δ$ to produce an adaptive estimator.

Gu and Zou²² proposed the aggregated regression for the asymmetric squared loss. Note that for quantile regression the loss function is $ρ_{τ} (v) = v (τ - I (v < 0))$ .¹⁵ One can hardly obtain an inequality similar to (2.4) in Gu and Zou²² since this loss cannot be shown to be strongly convex or $L$ -smooth, although these properties would provide a succinct route in the follow-up theoretical study. Although Shan and Yang²¹ considered a surrogate of $ρ_{τ} (v)$ , which is nondifferentiable at $v = 0$ , they only contributed the oracle inequality via the surrogate loss. Comparatively, direct smoothing strategies such as piecewise approximations of $ρ_{τ} (v)$ ²³,²⁴ involve the first power (or equivalently, the absolute value) of $v$ , and thus it is hard to guarantee the strong convexity, which is essential to attain the risk bound under the squared error loss. To overcome the difficulty, we propose a secondary smoothing approximation of $ρ_{τ} (v)$ as

ρ_{τ, h}^{*} (v) = ρ_{τ, h} (v) + a (h) v^{2}

(2)

where $a (h)$ is a function of $h \in (0, + \infty)$ , and $ρ_{τ, h} (v)$ is the smoothing approximation of $ρ_{τ} (v)$ that is convex and $L$ -smooth in $\mathbb{R}$ . In this paper we use the structure of Aravkin et al.²⁴ as

ρ_{τ, h} (v) = \begin{cases} (1 - τ) | v | - h (1 - τ)^{2} / 2, & v < - (1 - τ) h, \\ v^{2} / (2 h), & - (1 - τ) h \leq v \leq τ h, \\ τ | v | - h τ^{2} / 2, & v > τ h . \end{cases}

From Lemma 1 in Appendix, it can be proved that $ρ_{τ, h}^{*} (v)$ is strongly convex, $L$ -smooth and has the minimum at the same point as $ρ_{τ} (v)$ .

There are two smoothing coefficients in the secondary smoothing loss function (2). Figure 1 plots the curves of $ρ_{τ, h}^{*} (v)$ under varying pairs of $(h, a (h))$ . One can see that the discrepancy between the tails of $ρ_{τ, h}^{*} (v)$ and $ρ_{τ} (v)$ is dominated by $a (h)$ , whereas the global approximation to the original check loss is governed by $h$ . Therefore, in order to keep $ρ_{τ, h}^{*} (v)$ close to $ρ_{τ} (v)$ across the board, $a (h) h$ should be controlled and $a (h)$ should be chosen smaller than $h$ . Related discussions of the constraints are presented in the next sections to protect the asymptotic efficiency of the aggregated estimation.
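As an illustration of the loss in (2), the following minimal Python sketch evaluates the check loss, its Huber-type smoothing and the secondary smoothing loss at a single residual. The function names and the particular values of $h$ and $a (h)$ used with them are assumptions made here for illustration; they are not taken from the paper.

```python
def check_loss(v, tau):
    """Original quantile check loss rho_tau(v) = v * (tau - I(v < 0))."""
    return v * (tau - (1.0 if v < 0 else 0.0))


def smoothed_check_loss(v, tau, h):
    """Huber-type smoothing rho_{tau,h}(v), following the piecewise form above."""
    if v < -(1.0 - tau) * h:
        # left linear tail, shifted so the pieces join continuously
        return (1.0 - tau) * abs(v) - h * (1.0 - tau) ** 2 / 2.0
    if v > tau * h:
        # right linear tail
        return tau * abs(v) - h * tau ** 2 / 2.0
    # quadratic zone around the origin
    return v ** 2 / (2.0 * h)


def secondary_smoothed_loss(v, tau, h, a):
    """Secondary smoothing loss (2): rho_{tau,h}(v) + a * v**2, with a = a(h)."""
    return smoothed_check_loss(v, tau, h) + a * v ** 2
```

A call such as `secondary_smoothed_loss(1.0, 0.25, 0.1, 0.001)` adds the quadratic term `0.001 * 1.0 ** 2` to the smoothed loss, so a larger `a` bends the tails away from the check loss, while `h` controls the width of the quadratic zone around zero.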

Figure 1.

Curves comparing the proposed secondary smoothing loss function with the original check loss function. The dotted lines represent the secondary smoothing loss under different values of $(h, a (h))$ , and the solid line is the original check loss.

In what follows, we propose the EAW algorithm for longitudinal quantile regression. The algorithm is based on multiple splits of the observed dataset, as is typically done in machine learning, and uses the secondary smoothing loss in the weighting process.

Algorithm 1

Step 1.

( $b$ th Random Split) Given $1 \leq b \leq B$ , let $(x_{i}^{b}, y_{i}^{b}, t_{i}^{b})$ denote the $b$ th random permutation of $(x_{i}, y_{i}, t_{i})$ at the subject level (where $x_{i} = (x_{i} (t_{i 1}), \dots, x_{i} (t_{i n_{i}}))^{⊤}$ , $y_{i} = (y_{i} (t_{i 1}), \dots, y_{i} (t_{i n_{i}}))^{⊤}$ , $t_{i} = (t_{i 1}, \dots, t_{i n_{i}})^{⊤}$ ), and let $N_{0} = \max (1, ⌊ c n ⌋)$ for some $0 < c < 1$ . Split the permuted sample into a training set $N_{1}^{b} = \{x_{i}^{b}, y_{i}^{b}, t_{i}^{b}\}_{i = 1}^{N_{0}}$ and a test set $N_{2}^{b} = \{x_{i}^{b}, y_{i}^{b}, t_{i}^{b}\}_{i = N_{0} + 1}^{n}$ .

Step 2.

For each procedure $δ_{k} \in Δ$ , estimate the conditional quantile in equation (1) by $N_{1}^{b}$ , denoted as ${\hat{q}}_{τ, b}^{(k)} (x (t))$ .

Step 3.

Given a sequence $\{π_{k} : δ_{k} \in Δ\}$ satisfying $π_{k} \geq 0$ and $\sum_{δ_{k} \in Δ} π_{k} = 1$ , use $N_{2}^{b}$ to calculate the $b$ th exponential aggregation weights as ${\hat{Ω}}_{k, N_{0} + 1}^{b} = π_{k}$ and, for $N_{0} + 2 \leq i \leq n$ ,

{\hat{Ω}}_{k, i}^{b} = \frac{π_{k} \exp \{- λ \sum_{l = N_{0} + 1}^{i - 1} \sum_{j = 1}^{n_{l}} {\hat{L}}_{τ, b, l j}^{(k)}\}}{\sum_{δ_{k'} \in Δ} π_{k'} \exp \{- λ \sum_{l = N_{0} + 1}^{i - 1} \sum_{j = 1}^{n_{l}} {\hat{L}}_{τ, b, l j}^{(k')}\}}

(3)

where ${\hat{L}}_{τ, b, l j}^{(k)} = ρ_{τ, h}^{*} (y_{l}^{b} (t_{l j}) - {\hat{q}}_{τ, b}^{(k)} (x_{l}^{b} (t_{l j})))$ , $λ > 0$ is the tuning parameter and $ρ_{τ, h}^{*} (v)$ is the secondary smoothing check loss function defined in (2).

Step 4.

Repeat Steps 1-3 $B$ times. The aggregated estimator of $q_{τ} (x (t))$ is then obtained by

{\hat{q}}_{τ}^{M} (x (t)) = \frac{1}{B} \sum_{b = 1}^{B} \sum_{i = N_{0} + 1}^{n} \sum_{δ_{k} \in Δ} \frac{{\hat{Ω}}_{k, i}^{b}}{n - N_{0}} {\hat{q}}_{τ, b}^{(k)} (x (t))
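Steps 1, 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-subject smoothed losses are assumed to have been computed already from fitted candidate procedures (Step 2 is omitted), the prior weights are assumed strictly positive or zero, and the names `random_split`, `eaw_weights` and `aggregate_split` are hypothetical.

```python
import math
import random


def random_split(n_subjects, c, rng):
    """Step 1: permute subject indices and split at N0 = max(1, floor(c * n))."""
    ids = list(range(n_subjects))
    rng.shuffle(ids)
    n0 = max(1, math.floor(c * n_subjects))
    return ids[:n0], ids[n0:]


def eaw_weights(subject_losses, priors, lam):
    """Step 3: sequential exponential aggregation weights for one split.

    subject_losses[k][l]: smoothed loss of procedure k summed over the
        observations of the l-th test subject (l = 0, ..., m - 1).
    priors[k]: prior weight pi_k (non-negative, summing to one).
    lam: tuning parameter lambda > 0.
    Returns weights[i][k]; weights[0] equals the priors, as in (3).
    """
    n_proc = len(priors)
    m = len(subject_losses[0])
    cum = [0.0] * n_proc  # cumulative loss over the test subjects seen so far
    weights = []
    for i in range(m):
        # log of the unnormalised weight; shifting by the max avoids underflow
        logw = [math.log(priors[k]) - lam * cum[k] if priors[k] > 0 else -math.inf
                for k in range(n_proc)]
        top = max(logw)
        unnorm = [math.exp(lw - top) for lw in logw]
        total = sum(unnorm)
        weights.append([u / total for u in unnorm])
        for k in range(n_proc):
            cum[k] += subject_losses[k][i]
    return weights


def aggregate_split(weights, predictions):
    """Inner sums of Step 4 for one split: average the weights over test
    positions and combine the candidate predictions at one covariate value."""
    m = len(weights)
    n_proc = len(predictions)
    avg = [sum(w[k] for w in weights) / m for k in range(n_proc)]
    return sum(avg[k] * predictions[k] for k in range(n_proc)), avg
```

Averaging the first return value of `aggregate_split` over the $B$ splits would give the aggregated estimate at the chosen covariate value; because the splits do not depend on each other, that outer loop could be run in parallel.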

Remark 1. In addition to smoothing coefficients, there are several tuning parameters in Algorithm 1: $λ$ controls the propensity of corresponding procedures in the weighting process; $\{π_{k} : k \geq 1\}$ are regarded as prior weights of candidates. In longitudinal quantile regression, the performance is similar to that in Shan and Yang²¹ and Gu and Zou²²: the weights tend to average all of the candidate procedures as $λ \to 0$ and to select the procedure with the best historic smoothing loss on the test set as $λ \to \infty$ . Theoretically, we can set

π_{k} = \frac{\exp \{- \sum_{l = 1}^{N_{0}} \sum_{j = 1}^{n_{l}} {\hat{L}}_{τ, b, l j}^{(k)}\}}{\sum_{δ_{k'} \in Δ} \exp \{- \sum_{l = 1}^{N_{0}} \sum_{j = 1}^{n_{l}} {\hat{L}}_{τ, b, l j}^{(k')}\}}

as a reasonable criterion and the upper-bound of $λ$ will be exhibited in Section 3.1.
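The roles of $λ$ and of the prior weights can be illustrated numerically. The sketch below is an assumption-laden illustration: `exp_weights` and `prior_from_training` are hypothetical helper names, the cumulative losses are made-up numbers, and strictly positive priors are assumed so that their logarithms exist.

```python
import math


def exp_weights(cum_losses, priors, lam):
    """Normalised weights proportional to priors[k] * exp(-lam * cum_losses[k]).

    Assumes every prior is strictly positive.
    """
    logw = [math.log(p) - lam * c for p, c in zip(priors, cum_losses)]
    top = max(logw)  # subtract the max before exponentiating for stability
    unnorm = [math.exp(lw - top) for lw in logw]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def prior_from_training(train_losses):
    """Prior pi_k from cumulative training losses, matching the displayed
    formula: a uniform base weight with the exponent taken at lam = 1."""
    uniform = [1.0 / len(train_losses)] * len(train_losses)
    return exp_weights(train_losses, uniform, 1.0)


# Illustrative cumulative test losses for three hypothetical procedures.
priors = [0.2, 0.5, 0.3]
losses = [5.0, 1.0, 3.0]
nearly_prior = exp_weights(losses, priors, 1e-9)   # close to the priors
nearly_select = exp_weights(losses, priors, 1e3)   # mass on the smallest loss
```

With a very small `lam` the weights stay near the priors, so all candidates are averaged; with a very large `lam` essentially all mass moves to the procedure with the smallest cumulative loss.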

Remark 2. Although the EAW algorithm is computationally intensive when the number of random partitions is large, it eliminates the loss of accuracy caused by a single random split of the sample. Moreover, the multiple splits can be carried out in parallel because the results of the loops are mutually independent. These points are borne out in the theoretical proofs and empirical studies. Besides, if the sequential updating mechanism is of less importance in practice, one can use the full data set to implement Step 2 and produce the weights via the split sets.

2.2. Estimating $τ$ -CQF with censored history process

In this subsection, we evaluate $τ$ -CQF for LQMM with right-censored history process. For the data in the form $(x_{i}^{⊤} (t_{i j}), y_{i} (t_{i j}), z_{i}^{⊤} (t_{i j}), t_{i j})$ , $t_{i j} \in [0, T]$ , $i = 1, \dots, n$ , $j = 1, \dots, n_{i}$ , we consider the additive mixed effect model whose $τ$ th conditional quantile is:

Q_{τ} (y_{i} (t_{i j}) | x_{i} (t_{i j}), z_{i} (t_{i j}), u_{i}) = q_{F, τ} (x_{i} (t_{i j})) + z_{i}^{⊤} (t_{i j}) u_{i}

(4)

where $u_{i}$ is the $q$ -dimensional subject-specific zero-mean random effects vector and $z_{i} (t_{i j})$ is a vector of endogenous time-dependent covariates associated with $u_{i}$ . Generally, we assume $u_{i} = (u_{1, i}, \dots, u_{q, i})^{⊤}$ is independent of the model error term and identically distributed according to a determined density $f_{u}$ with zero mean and a $τ$ -dependent $q \times q$ covariance matrix $Φ_{τ}$ . Moreover, denote $z_{i} = (z_{i} (t_{i 1}), \dots, z_{i} (t_{i n_{i}}))^{⊤}$ . $y_{i} | x_{i}, z_{i}, u_{i}$ is supposed to be independently distributed with the joint asymmetric Laplace (AL) distribution, which provides a possible parametric link between minimizing the sum of check loss and maximizing the AL-likelihood (see Koenker and Machado²⁵ and Geraci and Bottai²⁶).

Note that $y_{i} | x_{i}, z_{i}, u_{i} \sim AL (μ, σ, τ)$ has unknown parameters $μ$ and $σ$ , where the $j$ th component of $μ$ is $q_{F, τ} (x_{i} (t_{i j})) + z_{i}^{⊤} (t_{i j}) u_{i}$ . The joint density of $(y_{i}, u_{i})$ is

f (y_{i}, u_{i} | μ, σ, τ, Φ_{τ}) = f (y_{i} | u_{i}, μ, σ, τ) f_{u} (u_{i} | Φ_{τ})

(5)

Fitting model (4) is equivalent to estimating parameters in (5). If $q_{F, τ} (x_{i} (t_{i j}))$ is correctly specified, a number of methodologies have been proposed to make related inference including parametric methods (e.g. Geraci¹²) and nonparametric methods (e.g. Geraci¹³). As it cannot be determined under different conditions, the EAW algorithm is applicable for constructing the best approximation. Actually, let $Δ = \{δ_{k} : k = 1, \dots, K\}$ be the set of $K$ candidate procedures; an aggregated estimator for the fixed effect part in equation (4) is

{\hat{q}}_{F, τ}^{M} (x (t)) = \frac{1}{B} \sum_{b = 1}^{B} \sum_{i = N_{0} + 1}^{n} \sum_{δ_{k} \in Δ} \frac{{\hat{Ω}}_{k, i}^{b}}{n - N_{0}} {\hat{q}}_{F, τ, b}^{(k)} (x (t))

(6)

where ${\hat{q}}_{F, τ, b}^{(k)} (x (t))$ is the estimator by $δ_{k}$ with the $b$ th random split and ${\hat{Ω}}_{k, i}^{b}$ is given in (3). Since the random effect part is specified as a known structure, its contribution is already accounted for in ${\hat{q}}_{F, τ, b}^{(k)} (x (t))$ and it is not added into the weighting process.

For the $i$ th subject, let $T_{i}$ and $C_{i}$ represent the terminal time and censoring time, respectively, and let $η_{i} = I (T_{i} \leq C_{i})$ be the censoring indicator. $\{y_{i} (t_{i j}), t_{i j} \geq 0\}$ can hence be taken as observations of the history process that is stopped at the observed event time $T_{i}^{*} = \min \{C_{i}, T_{i}\}$ . Our main interest is to estimate the $τ$ -CQF from the actual observations $\{(x_{i}^{⊤} (t_{i j}), y_{i} (t_{i j}), z_{i}^{⊤} (t_{i j}), t_{i j}, η_{i}) : i = 1, \dots, n; j = 1, \dots, n_{i}\}$ with $t_{i n_{i}} \leq T_{i}^{*}$ .

Let $K_{1} (t) = \Pr (T_{i} > t)$ and $K_{2} (t) = \Pr (C_{i} > t)$ be the survival functions of $T_{i}$ and $C_{i}$ , and $K (t) = \Pr (T_{i}^{*} > t)$ be the survival function of the actual cut-off time $T_{i}^{*}$ . Without loss of generality, we assume that the censoring time is independent of the terminal time and the history process for each subject. Following Liu et al.,⁸ the $τ$ th quantile state function ( $τ$ -QSF) of (4) at time $t$ is defined as $H_{τ} (t) = Q_{τ} (y (t) | x (t), z (t), u)$ . We combine the inverse probability of censoring weighting (IPCW) method with the EAW estimator (6) to estimate the $τ$ -QSF as

{\hat{H}}_{τ} (t) = \frac{1}{n} \sum_{i = 1}^{n} \frac{I (T_{i}^{*} \geq t)}{\hat{K} (t)} {\hat{q}}_{F, τ}^{M} (x_{i} (t))

where $\hat{K} (t) = {\hat{K}}_{1} (t) {\hat{K}}_{2} (t)$ , and ${\hat{K}}_{l} (t)$ $(l = 1, 2)$ are the corresponding Kaplan–Meier estimators of $K_{1} (t)$ and $K_{2} (t)$ , respectively.
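The IPCW estimator above can be sketched in plain Python. This is a minimal illustration under assumptions rather than the authors' code: `kaplan_meier` and `ipcw_qsf` are hypothetical names, the aggregated fixed-effect estimates $q$ at time $t$ are taken as given inputs, and the censoring survival function is estimated by treating censored subjects ( $η_{i} = 0$ ) as the events.

```python
def kaplan_meier(times, events):
    """Return a function t -> Kaplan-Meier estimate of Pr(event time > t).

    times[i]: observed time T_i^* = min(T_i, C_i).
    events[i]: 1 if the event of interest was observed at times[i], else 0.
    """
    event_times = sorted(set(t for t, e in zip(times, events) if e == 1))
    steps = []  # pairs (event time, survival value from that time on)
    surv = 1.0
    for s in event_times:
        at_risk = sum(1 for t in times if t >= s)
        deaths = sum(1 for t, e in zip(times, events) if t == s and e == 1)
        surv *= 1.0 - deaths / at_risk
        steps.append((s, surv))

    def survival(t):
        value = 1.0
        for s, sv in steps:
            if s <= t:
                value = sv
            else:
                break
        return value

    return survival


def ipcw_qsf(t, obs_times, eta, q_hat_at_t):
    """IPCW estimate of the tau-QSF at time t.

    q_hat_at_t[i]: aggregated fixed-effect estimate for subject i at time t;
        entries for subjects no longer at risk are ignored.
    """
    k1 = kaplan_meier(obs_times, eta)                    # terminal time
    k2 = kaplan_meier(obs_times, [1 - e for e in eta])   # censoring time
    k_hat = k1(t) * k2(t)
    if k_hat <= 0.0:
        raise ValueError("estimated survival is zero at t; the weight is undefined")
    n = len(obs_times)
    at_risk_sum = sum(q for ts, q in zip(obs_times, q_hat_at_t) if ts >= t)
    return at_risk_sum / (n * k_hat)
```

The guard on `k_hat` reflects the fact that the inverse weight only makes sense where the estimated survival probability is bounded away from zero.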

The $τ$ -CQF in the time period $[0, s]$ is defined as $μ_{τ} (s) = \int_{0}^{s} H_{τ} (t) d t$ . Using the above estimator we can obtain that

{\hat{μ}}_{τ} (s) = \frac{1}{n} \sum_{i = 1}^{n} \int_{0}^{s} \frac{I (T_{i}^{*} \geq t)}{{\hat{K}}_{1} (t) {\hat{K}}_{2} (t)} {\hat{q}}_{F, τ}^{M} (x_{i} (t)) d t, s \in [0, L]

(7)

However, in practice, one can only observe $y_{i} (t)$ and covariate $x_{i} (t)$ at discrete time points $t_{i j}$ , $j = 1, \dots, n_{i}$ ; $i = 1, \dots, n$ , rather than in a continuous interval. Therefore, the linear interpolation in Deng²⁷ is exploited to compute (7).

Let $D_{i} = \{t_{i j} : \max (t_{i j}) \leq T_{i}^{*}; j = 0, 1, \dots, n_{i}\}$ be the set of actual observed time points for the $i$ th subject, where $t_{i 0} = 0$ means that observations of all subjects can be gathered at the initial time $t = 0$ . We rearrange $t_{i j}$ for all subjects as $0 = t_{(0)} < \dots < t_{(N)}$ , and $D = \{t_{(k)}, k = 0, \dots, N\}$ is the set of ordered distinct observed time points. For $t_{(k)} \in D_{1} \cup \dots \cup D_{n}$ , the fitted $τ$ -QSF is given as

{\hat{H}}_{τ} (t_{(k)}) = \frac{1}{n} \sum_{i = 1}^{n} \frac{I (T_{i}^{*} \geq t_{(k)})}{{\hat{K}}_{1} (t_{(k)}) {\hat{K}}_{2} (t_{(k)})} {\hat{q}}_{F, τ}^{M} (x_{i} (t_{(k)}))

and for $t \in (t_{(k - 1)}, t_{(k)}]$ between two adjacent observed time points,

{\hat{H}}_{τ} (t) = {\hat{H}}_{τ} (t_{(k - 1)}) + \frac{t - t_{(k - 1)}}{t_{(k)} - t_{(k - 1)}} \{{\hat{H}}_{τ} (t_{(k)}) - {\hat{H}}_{τ} (t_{(k - 1)})\}

As a consequence, the estimator of $τ$ -CQF at all observed time points $t_{(k)} \in D$ is

\begin{aligned} {\hat{μ}}_{τ} (t_{(k)}) = \sum_{l = 1}^{k} \int_{t_{(l - 1)}}^{t_{(l)}} {\hat{H}}_{τ} (t) d t = \frac{1}{2} \sum_{l = 1}^{k} \{{\hat{H}}_{τ} (t_{(l)}) + {\hat{H}}_{τ} (t_{(l - 1)})\} (t_{(l)} - t_{(l - 1)}) \end{aligned}

(8)
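The interpolation and the trapezoidal sum in (8) can be sketched as follows, assuming the fitted $τ$ -QSF values at the ordered time points are already available; `interpolate_qsf` and `cumulative_qf` are hypothetical names chosen for this illustration.

```python
def interpolate_qsf(t, time_grid, qsf_values):
    """Linear interpolation of the tau-QSF between neighbouring grid points."""
    for k in range(1, len(time_grid)):
        lo, hi = time_grid[k - 1], time_grid[k]
        if lo <= t <= hi:
            frac = (t - lo) / (hi - lo)
            return qsf_values[k - 1] + frac * (qsf_values[k] - qsf_values[k - 1])
    raise ValueError("t lies outside the observed time grid")


def cumulative_qf(time_grid, qsf_values):
    """tau-CQF at each grid point via the trapezoidal sum in (8).

    time_grid: ordered distinct times 0 = t_(0) < ... < t_(N).
    qsf_values: fitted tau-QSF at those times.
    Returns a list whose k-th entry is the integral from t_(0) to t_(k).
    """
    mu = [0.0]
    for l in range(1, len(time_grid)):
        width = time_grid[l] - time_grid[l - 1]
        mu.append(mu[-1] + 0.5 * (qsf_values[l] + qsf_values[l - 1]) * width)
    return mu
```

Because the interpolant is piecewise linear, the trapezoidal sum is its exact integral, so no further numerical quadrature is needed at the grid points.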

It is obviously shown that the estimation of (7) can be expressed by using $\{{\hat{μ}}_{τ} (t_{(k)}) : k = 1, \dots, N\}$ in (8). For computational convenience in evaluating ${\hat{H}}_{τ} (t)$ , we use a nonparametric approximation for ${\hat{K}}_{1} (t) {\hat{K}}_{2} (t)$ , namely $n^{- 1} \sum_{i = 1}^{n} I (T_{i}^{*} \geq t)$ , and the estimators of $H_{τ} (t)$ and $μ_{τ} (t)$ are reformed by ${\tilde{H}}_{τ} (t)$ and ${\tilde{μ}}_{τ} (t)$ , replacing each ${\hat{H}}_{τ} (t_{(k)})$ term with

{\tilde{H}}_{τ} (t_{(k)}) = \sum_{i = 1}^{n} \frac{I (T_{i}^{*} \geq t_{(k)})}{\sum_{j = 1}^{n} I (T_{j}^{*} \geq t_{(k)})} {\hat{q}}_{F, τ}^{M} (x_{i} (t_{(k)}))

(9)
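Expression (9) is simply the average of the aggregated estimates over the subjects still under observation at the given time. A minimal sketch, with the hypothetical name `risk_set_qsf` and the aggregated estimates supplied as inputs:

```python
def risk_set_qsf(t, obs_times, q_hat_at_t):
    """Risk-set version (9) of the tau-QSF: mean of the aggregated
    fixed-effect estimates over subjects with observed time T_i^* >= t."""
    at_risk = [q for ts, q in zip(obs_times, q_hat_at_t) if ts >= t]
    if not at_risk:
        raise ValueError("no subject is at risk at time t")
    return sum(at_risk) / len(at_risk)
```

This form avoids computing the two Kaplan–Meier curves, which is the computational convenience mentioned above.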

2.3. EAW-based prediction for random effects

Predicting random effects for LQMM is an ongoing research issue. Geraci and Bottai²⁶ proposed a best linear prediction (BLP) of $u_{i}$ for the linear fixed effect case. For a given probability level, the predictor depends on the estimation of parameters in (5) under correct specification of $q_{F, τ} (x_{i} (t_{i j}))$ and the structure of $u_{i}$ , and its mean squared error (MSE) reaches the minimum. In (4) we assume that random effects have a linear pattern $z_{i}^{⊤} (t_{i j}) u_{i}$ . For the $i$ th cluster, the BLP of $u_{i}$ is given as

{\hat{u}}_{i}^{(τ)} = {\hat{Φ}}_{τ} z_{i}^{⊤} {\hat{Σ}}_{i}^{- 1} (y_{i} - \hat{E} y_{i})

where $Σ_{i} = z_{i} Φ_{τ} z_{i}^{⊤} + {Cov}_{i}$ and ${Cov}_{i}$ is regarded as the covariance matrix of random errors in (4). However, since diverse procedures result in changeable estimators, the main purpose of this subsection is to find the most effective prediction. A natural thought is to utilize the exponential aggregated weights to contribute the predictor among candidate $q_{F, τ} (x (t))$ .

As mentioned previously, candidate predictors can be generated with the full data set as long as we ignore the sequential updating mechanism. This is suitable since the predictor may lack efficiency if we only use a fraction of the sample to predict all random effects. Let ${\hat{u}}_{i}^{(τ), k}$ be the BLP under procedure $δ_{k}$ for the $i$ th cluster, and ${\hat{Ω}}_{k, m}^{b}$ $(N_{0} + 1 \leq m \leq n)$ be the exponential aggregated weights from Algorithm 1. Denote

{\hat{Ω}}_{k} = \frac{1}{B} \sum_{b = 1}^{B} \sum_{m = N_{0} + 1}^{n} \frac{{\hat{Ω}}_{k, m}^{b}}{n - N_{0}}

Then the EAW-based BLP of the random effects can be computed as

{\hat{u}}_{i}^{(τ), M} = \sum_{δ_{k} \in Δ} {\hat{Ω}}_{k} {\hat{u}}_{i}^{(τ), k}, i = 1, \dots, n

(10)
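The two displays above amount to averaging the sequential weights and then taking a weighted combination of the per-procedure predictors. A minimal sketch, assuming the weights from each split and the candidate BLPs have already been computed; `average_weights` and `eaw_blp` are hypothetical names.

```python
def average_weights(weights_by_split):
    """Average weights_by_split[b][m][k] over test positions m and splits b,
    giving one overall weight per procedure k."""
    n_proc = len(weights_by_split[0][0])
    total = [0.0] * n_proc
    for split in weights_by_split:
        n_test = len(split)
        for row in split:
            for k in range(n_proc):
                total[k] += row[k] / n_test
    return [s / len(weights_by_split) for s in total]


def eaw_blp(omega, blp_by_procedure):
    """EAW-based BLP (10) for one cluster.

    omega[k]: averaged weight of procedure k.
    blp_by_procedure[k]: BLP vector of the random effects under procedure k.
    """
    dim = len(blp_by_procedure[0])
    return [sum(omega[k] * blp_by_procedure[k][d] for k in range(len(omega)))
            for d in range(dim)]
```

Since the averaged weights are non-negative and sum to one, the combined predictor is a convex combination of the candidate BLPs.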

3. Asymptotic properties

In this section, we establish the asymptotic properties of the proposed estimators. The performance of the EAW estimator is assessed through oracle inequalities, which show that its risks are automatically close to those of the best candidate. Using the secondary smoothing loss function (2), we derive a corresponding risk bound under quadratic losses. In addition, the consistency of the proposed $τ$ -CQF estimator holds as long as the candidate set contains a consistent procedure, which is conveniently verified under the smoothing strategy.

3.1. Oracle inequalities

We illustrate the theoretical properties of the EAW estimator of equation (1). The intended adaptivity of the estimator is exhibited via oracle inequalities, which provide statistical risk bounds in the sense of the original check loss and the squared error loss, respectively. For longitudinal data, risk bounds depend on the data collection scheme as well. Define the counting process of observation time points for the $i$ th subject as $N_{i} (t) = \sum_{j = 1}^{n_{i}} I (t_{i j} \leq t)$ ; following Fan and Li,²⁸ we assume $N_{i} (t)$ to be a random sample from a certain population on the finite interval. Moreover, let the $L_{2}$ norm of a generic function $f$ with respect to the distribution $P_{X}$ of $X$ be defined by $‖ f ‖^{2} = \int f^{2} (x) d P_{X}$ . The following conditions are imposed for the theoretical results.

(A.1) $\sup_{δ_{k} \in Δ} \max_{t \in [0, T]} | {\hat{q}}_{τ}^{(k)} (x (t)) - q_{τ} (x (t)) | \leq C_{τ}$ with probability one.

(A.2) Let $e_{τ} (t) = y (t) - q_{τ} (x (t))$ . The $\max_{t}$ -sub-exponential norm of $e_{τ} (t)$ given $x (t)$ and $t$ is bounded by $K_{τ}$ , where the $\max_{t}$ -sub-exponential norm of $X (t)$ is defined as

$\max_{t} ‖ X (t) ‖_{SEXP} := \max_{t \in [0, T]} \{\sup_{p \geq 1} p^{- 1} {(E | X (t) |^{p})}^{1 / p}\}$

Further, there exists a positive $K_{ρ}^{'}$ such that $\int_{- K_{τ}}^{K_{τ}} | ρ_{τ, h}^{*} (v) - ρ_{τ} (v) | d v \leq K_{ρ}^{'}$ .

Condition (A.1) and the upper-boundedness of the sub-exponential norm are fairly common in the related work on aggregation. The constraints are mild and are satisfied provided that $q_{τ} (x (t))$ and the variance of $e_{τ} (t)$ are bounded almost surely. In addition, the boundedness of the integral of $| ρ_{τ, h}^{*} (K_{τ}) - ρ_{τ} (K_{τ}) |$ on $[- K_{τ}, K_{τ}]$ is equivalent to the upper-boundedness of $a (h) h$ , which implies that $ρ_{τ, h}^{*}$ performs close to $ρ_{τ}$ . Based on the above conditions, it can be shown that the EAW estimator behaves optimally with the best procedure via the following theorem:

Theorem 1
Under conditions $(A .1)$ – $(A .2)$ , if $λ$ satisfies

$λ \leq \min \left\{\frac{\underline{c} \exp (- 2 \bar{c} C_{τ} (4 e {\bar{K}}_{ζ})^{- 1})}{(16 \sqrt{2} T^{2} K_{τ}^{2} M_{0} + 16 {\bar{c}}^{2} T C_{τ}^{2} M_{0})}, \frac{(2 e)^{- 1}}{4 T C_{τ} {\bar{K}}_{ζ}}\right\}$

the risk of EAW estimator under the original check loss $ρ_{τ} (\cdot)$ satisfies

$\begin{aligned} E \int ρ_{τ} (y - {\hat{q}}_{τ}^{M} (x (t))) d N (t) \leq \inf_{δ_{k} \in Δ} \left\{E \int ρ_{τ} (y - {\hat{q}}_{τ}^{(k)} (x (t))) d N (t) + \frac{\log (1 / π_{k})}{λ (n - N_{0})} + H_{1}\right\} \end{aligned}$

(11)

Moreover, the risk of EAW estimator under the squared error loss satisfies

$\begin{aligned} E \int {‖ {\hat{q}}_{τ}^{M} (x (t)) - q_{τ} (x (t)) ‖}^{2} d N (t) \leq \inf_{δ_{k} \in Δ} \left\{\frac{\bar{c}}{\underline{c}} E \int {‖ {\hat{q}}_{τ}^{(k)} (x (t)) - q_{τ} (x (t)) ‖}^{2} d N (t) + \frac{\log (1 / π_{k})}{\underline{c} λ (n - N_{0})}\right\} \end{aligned}$

(12)

where $M_{0} = \exp \{(4 T {\bar{K}}_{ζ})^{- 1} + (4 e)^{- 1}\}$ , and $H_{1}$ , ${\bar{K}}_{ζ}$ , $\underline{c}$ and $\bar{c}$ are positive, which will be presented in the proof.

Remark 3. Through the proof we have $H_{1} = T h \max \{τ^{2}, (1 - τ^{2})\}$ ; the risk bound under the original check loss is unavoidably affected by the smoothing parameter because the aggregated weights are computed via the secondary smoothing loss function. On the other hand, the smoothing parameter $h$ is restricted by $K_{ρ}^{'}$ so that it cannot be too large. When $h$ is chosen near zero, the EAW estimator performs approximately as well as that based on the original loss. Compared with Shan and Yang,²¹ the convergence rate of the EAW estimator is fast as long as $H_{1} + \log (1 / π_{k}) / (λ (n - N_{0})) = o ((n - N_{0})^{- 1 / 2})$ . In practice, if we only care about the prediction risk, one can use $h = (n - N_{0})^{- 1}$ , $a (h) = n^{- 1} h$ to fit a drastic transition of ${\hat{e}}_{τ}^{(k)} (t)$ .
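As a quick check of the rate condition in Remark 3, the following worked step substitutes $h = (n - N_{0})^{- 1}$ into the stated form of $H_{1}$ . It is a sketch that treats $T$ , $λ$ and $π_{k}$ as fixed while $n - N_{0} \to \infty$ , which is an assumption made here for illustration, and it uses only that $\max \{τ^{2}, (1 - τ^{2})\} \leq 1$ for $τ \in (0, 1)$ .

```latex
\begin{aligned}
H_{1} + \frac{\log (1 / π_{k})}{λ (n - N_{0})}
  &= \frac{T \max \{τ^{2}, (1 - τ^{2})\}}{n - N_{0}}
     + \frac{\log (1 / π_{k})}{λ (n - N_{0})} \\
  &\leq \frac{T + λ^{- 1} \log (1 / π_{k})}{n - N_{0}}
   = O ((n - N_{0})^{- 1})
   = o ((n - N_{0})^{- 1 / 2}) .
\end{aligned}
```

Under these assumptions the suggested choice of $h$ therefore meets the condition with room to spare.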

Remark 4. The second inequality in Theorem 1 could not have been obtained under the original loss of quantile regression. However, it provides a useful basis for the asymptotic properties of the $τ$ -CQF estimator. From a model specification perspective, (12) shows that the EAW estimator enjoys an analogous consistency as long as $Δ$ contains the correct candidate. This requires that $\bar{c} / \underline{c}$ is bounded above theoretically, which is by no means incompatible with Condition (A.2).

3.2. Consistency for $τ$ -CQF estimator

In this subsection, we demonstrate the consistency of the estimators in Section 2.2. Let $M (t) = \{x (t), z (t)\}$ denote the observed covariate process, such as baseline information and study time, and let ${\bar{B}}_{Y} (t) = \{Y (s) : s < t\}$ and ${\bar{B}}_{M} (t) = \{M (s) : s < t\}$ be the longitudinal response history and the longitudinal covariate history prior to time $t$ . A set of regularity conditions is given to establish the theoretical property.

(B.1) Conditional on $u$ , ${\bar{B}}_{M} (t)$ , ${\bar{B}}_{Y} (t)$ and $T \geq t$ , $M (t)$ is fully observed and its distribution depends only on ${\bar{B}}_{M} (t)$ for $t \in [0, L]$ . In addition, $M (t)$ is continuously differentiable in $[0, L]$ with probability one and $\max_{t \in [0, L]} ‖ M^{'} (t) ‖ < \infty$ , where $M^{'} (t)$ denotes the derivative of $M (t)$ at time $t$ .

(B.2) For any $t < L$ , the intensity of the counting process $N_{C} (t)$ given $u$ , ${\bar{B}}_{M} (t)$ , ${\bar{B}}_{Y} (t)$ and $T \geq t$ is determined only by ${\bar{B}}_{M} (t)$ and $M (t)$ , where $N_{C} (t) = \sum_{i = 1}^{C} I (t_{i} \leq t)$ .

(B.3) For $δ_{k} \in Δ$ , there exists a nonstochastic function ${\bar{q}}_{F, τ}^{(k)} (x (t))$ such that for $t \in [0, T]$ , $‖ {\hat{q}}_{F, τ}^{(k)} (x (t)) - {\bar{q}}_{F, τ}^{(k)} (x (t)) ‖ = o_{p} (1)$ . Besides, $‖ {\hat{Φ}}_{τ}^{(k)} - {\bar{Φ}}_{τ}^{(k)} ‖ = o_{p} (1)$ with some ${\bar{Φ}}_{τ}^{(k)}$ in the support of $Φ_{τ}$ .

(B.4) Given a positive constant $M_{0}$ , $\max_{t \in [0, T]} ‖ {\bar{q}}_{F, τ}^{(k)} (x (t)) ‖ \leq M_{0}$ and $\min_{‖ b ‖ = 1} b^{⊤} {\bar{Φ}}_{τ}^{(k)} b > M_{0}^{- 1}$ for $δ_{k} \in Δ$ .

(B.5) There exists a constant $a > 0$ such that $\min \{K_{1} (L), K_{2} (L)\} \geq a$ .

(B.6) There exists some $K_{ρ}^{''} > 0$ such that $a (h) h \geq K_{ρ}^{''}$ .

The aforementioned assumptions are widely considered in Zeng and Cai,²⁹ Deng,²⁷ and Liu et al.,⁸ and are essential to longitudinal time-to-event data with a censoring mechanism. Condition (B.3) is typically imposed to handle the misspecification of $q_{F, τ} (x (t))$ , and guarantees the convergence of candidate procedures as well. Condition (B.6), combined with Condition (A.2), states that the smoothing coefficients of $ρ_{τ, h}^{*} (v)$ are controlled in an interval to the right of the origin to guarantee both the approximation of $ρ_{τ} (v)$ and the consistency of $τ$ -QSF. Based on the above conditions, we establish the consistency of the estimators for $τ$ -QSF and $τ$ -CQF by the following theorem:

Theorem 2

Suppose that Conditions $(A .1)$ – $(A .2)$ and $(B .1)$ – $(B .6)$ hold. For a given probability level $0 < τ < 1$ , if ${\bar{q}}_{F, τ}^{(k)} (x (t)) = q_{F, τ} (x (t))$ for some $δ_{k} \in Δ$ , then ${\hat{H}}_{τ} (t)$ is a consistent estimator of $τ$ -QSF, and ${\hat{μ}}_{τ} (s)$ in (7) is consistent for $μ_{τ} (s)$ .
Remark 5. Theorem 2 implies that the consistency of the $τ$-CQF estimator relies on the effective estimation of the fixed effect, i.e. $Δ$ must contain the correct specification of $q_{F, τ} (x (t))$. Instead of predicting random effects, the constraint of model (4) can be changed so that it degenerates into a classical case for a particular application. Traditional quantile loss estimation may be considered as one of the candidate procedures. See Galvao et al.⁶ and Harding and Lamarche⁷ for corresponding discussions.

It is worth noting that the same conclusion as in Theorem 2 can be derived in terms of ${\tilde{μ}}_{τ} (s)$ and ${\tilde{H}}_{τ} (t)$, which are more popular for handling practical problems.
3. Numerical studies

3.1. Simulation

In this section, we report the performance of the EAW estimators proposed in the previous sections through Monte Carlo simulation studies for longitudinal quantile regression. We first design an experiment for equation (1), then fit $τ$-QSF and $τ$-CQF for the mixed effect model.

Experiment 1. In this experiment, we consider a general longitudinal quantile model as follows:

\begin{aligned} y_{i} (t_{i j}) & = q_{1, τ} + q_{2, τ} + q_{3, τ} ε_{i} (t_{i j}) \\ q_{1, τ} & = β_{1, τ} x_{1, i} (t_{i j}) + β_{3, τ} x_{3, i} \\ q_{2, τ} & = (0.5 - τ) \{α_{1, τ} (t_{i j}) x_{1, i} (t_{i j}) + α_{2, τ} (t_{i j}) x_{2, i}\} \\ q_{3, τ} & = (0.75 - τ) \{f_{2, τ} (x_{2, i}) + f_{3, τ} (x_{3, i})\} \end{aligned}

(13)

For $τ \in (0, 1)$, model (13) is composed of the constant-coefficient linear part $q_{1, τ}$, the varying-coefficient part $q_{2, τ}$ and the nonparametric part $q_{3, τ}$. The observation time points of each subject take values in the scheduled set $\{1, \dots, [T]\}$, where $[T]$ is the terminal time satisfying $T \sim U (10, 30)$. The random errors $ε_{i} = (ε_{i} (t_{i 1}), \dots, ε_{i} (t_{i n_{i}}))^{⊤}$ are generated from a multivariate normal distribution with mean $0_{1 \times n_{i}}$ and an AR(1) structure variance-covariance matrix

Σ_{n_{i}} (ρ) = (\begin{matrix} 1 & ρ & \dots & ρ^{n_{i} - 1} \\ ρ & 1 & \dots & ρ^{n_{i} - 2} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ ρ^{n_{i} - 1} & ρ^{n_{i} - 2} & \dots & 1 \end{matrix})

where $ρ = 0.5$ gives a medium level of correlation. Besides, the covariates are generated as $x_{1, i} (t) \sim U (t / 20, 2 + t / 20)$ and

(\begin{matrix} x_{2, i} \\ x_{3, i} \end{matrix}) \sim N ((\begin{matrix} 1 \\ 0 \end{matrix}), (\begin{matrix} 1 & [T] / 35 \\ [T] / 35 & 1 \end{matrix}))

and the constant coefficients are $(β_{1, τ}, β_{3, τ}) = (0.5, - 1)$. The varying coefficients are $α_{1, τ} (t) = 0.8 \sin (π t / 20)$ and $α_{2, τ} (t) = 3 - \cos ((t - 25) π / 15)$, and the nonparametric forms are $f_{2, τ} (x) = (2 x - 1) / (x^{2} + 1)$ and $f_{3, τ} (x) = \exp (0.5 x + 1)$, respectively.
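As an illustration only, the data-generating design of Experiment 1 can be sketched in Python as follows. This is a minimal sketch, not the authors' simulation code: the function names `ar1_cov` and `simulate_subject` are hypothetical, $[T]$ is read as the integer part of $T$, and the off-diagonal covariance entry of $(x_{2, i}, x_{3, i})$ uses that subject's own $[T]$. Both readings are assumptions.

```python
import numpy as np

def ar1_cov(n, rho):
    """AR(1) covariance matrix with entries rho ** |j - k|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_subject(tau, rng, rho=0.5):
    """Draw one subject's trajectory from model (13) (illustrative sketch)."""
    T = rng.uniform(10, 30)
    n_i = int(np.floor(T))                   # [T] read as the integer part (assumption)
    t = np.arange(1, n_i + 1, dtype=float)   # scheduled times 1, ..., [T]

    # Covariates: x1 is time-varying, (x2, x3) are time-invariant bivariate normal.
    x1 = rng.uniform(t / 20, 2 + t / 20)
    c = n_i / 35
    x2, x3 = rng.multivariate_normal([1.0, 0.0], [[1.0, c], [c, 1.0]])

    # Standard normal errors with AR(1) correlation across visits.
    eps = rng.multivariate_normal(np.zeros(n_i), ar1_cov(n_i, rho))

    a1 = 0.8 * np.sin(np.pi * t / 20)
    a2 = 3 - np.cos((t - 25) * np.pi / 15)
    f2 = (2 * x2 - 1) / (x2 ** 2 + 1)
    f3 = np.exp(0.5 * x3 + 1)

    q1 = 0.5 * x1 - 1.0 * x3                 # (beta_1, beta_3) = (0.5, -1)
    q2 = (0.5 - tau) * (a1 * x1 + a2 * x2)
    q3 = (0.75 - tau) * (f2 + f3)            # scalar, since x2 and x3 do not vary in time
    y = q1 + q2 + q3 * eps
    return {"t": t, "x1": x1, "x2": x2, "x3": x3,
            "q1": q1, "q2": q2, "q3": q3, "y": y}
```

Calling `simulate_subject(0.5, np.random.default_rng(0))` returns one unbalanced trajectory; repeating the call $n$ times gives a full sample.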

According to the above specification, the $τ$th conditional quantile function is $Q_{τ} (y (t) | x (t)) = q_{1, τ} + q_{2, τ} + q_{3, τ} b_{τ}$, where $b_{τ}$ is the $τ$-quantile of $N (0, 1)$. Intuitively, different terms dominate the true model at particular quantile levels (for example, when $τ = 0.5$, $Q_{τ} (y (t) | x (t)) = q_{1, τ}$, and the varying-coefficient regression approximates the model better as $τ$ nears $0.75$). To compare the EAW estimator with a single procedure, we construct three candidate models to fit (13):

LQR:

Q_{1, τ} (y (t) | x (t)) = θ_{1} x_{1} (t) + θ_{2} x_{2} + θ_{3} x_{3}

Varying-coefficient LQR (VCLQR):

Q_{2, τ} (y (t) | x (t)) = θ_{1} (t) x_{1} (t) + θ_{2} (t) x_{2} + θ_{3} (t) x_{3}

Nonparametric additive quantile regression (NAQR):

Q_{3, τ} (y (t) | x (t)) = θ_{1} (x_{1} (t)) + θ_{2} (x_{2}) + θ_{3} (x_{3})

We use linear quantile regression estimation for LQR, and B-spline estimation for VCLQR and NAQR. We take the number of subjects as $n = 200$ with $500$ independent replications. In the EAW procedure, all subjects are randomly partitioned into a training set of size $N_{0} = n / 2$ and a test set containing the others; we take $π_{k} = 1 / 3$ for $k = 1, 2, 3$ and set the tuning parameter $λ = 1$. The algorithm is averaged over $B = 50$ random splits. Moreover, the smoothing parameters of the secondary smoothing loss function are set as $h = 1$ and $a (h) = 0.01$. Denote the estimate of the true quantile function $Q_{τ}$ from a specific procedure by ${\hat{Q}}_{τ}$. The estimated (original check loss based) prediction risk (PR) and the estimated root MSE (RMSE) are used as quantitative criteria:

\begin{aligned} PR (τ) = \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{n_{i}} \sum_{j = 1}^{n_{i}} ρ_{τ} (y_{i} (t_{i j}) - {\hat{Q}}_{τ, i j}) \\ RMSE (τ) = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} \frac{1}{n_{i}} \sum_{j = 1}^{n_{i}} {({\hat{Q}}_{τ, i j} - Q_{τ, i j})}^{2}} \end{aligned}
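The two criteria above can be computed as in the following minimal Python sketch. It assumes each subject's responses, fitted quantiles and true quantiles are stored as one array per subject (a ragged list); the function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def check_loss(v, tau):
    """Check loss rho_tau(v) = v * (tau - 1{v < 0})."""
    v = np.asarray(v, dtype=float)
    return v * (tau - (v < 0))

def prediction_risk(y, q_hat, tau):
    """PR: mean over subjects of the within-subject mean check loss."""
    per_subject = [check_loss(np.asarray(yi) - np.asarray(qi), tau).mean()
                   for yi, qi in zip(y, q_hat)]
    return float(np.mean(per_subject))

def rmse(q_hat, q_true):
    """RMSE: root of the mean over subjects of the within-subject mean squared error."""
    per_subject = [np.mean((np.asarray(qh) - np.asarray(qt)) ** 2)
                   for qh, qt in zip(q_hat, q_true)]
    return float(np.sqrt(np.mean(per_subject)))
```

Averaging within a subject first, and then across subjects, mirrors the $1 / n_{i}$ and $1 / n$ factors in the formulas, so subjects with many visits do not dominate the criteria.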

Table 1 reports the performance measures for all candidate procedures and the EAW estimator under different probability levels, with the best among the candidates in bold. One can notice that when $τ < 0.5$ the nonlinear form $q_{3, τ}$ plays a leading role and NAQR fits the model better than LQR and VCLQR. Conversely, when $τ$ is larger than 0.75, VCLQR is more suitable for the specified model. As $τ$ tends to the median, all three specifications yield a satisfactory PR, but the LQR model has a smaller RMSE than the other two nonparametric estimators. All the above results are fundamentally consistent with our prediction. Figure 2 presents the frequency of the values of the aggregated weights over the computations, which indicates that the weights concentrate around the best candidate, and that the PR and RMSE of the EAW estimator are close to those of the “most approximate” specification of the true model under different probability levels. This confirms the good performance of our adaptive aggregation method in the longitudinal quantile model. The numbers in parentheses are the standard errors of the corresponding indicators, which show that the EAW estimator keeps a stable deviation if one of the above candidates leads to an effective result.
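The behaviour of the weights described above can be illustrated with a generic exponential weighting step. The sketch below is an illustration under stated assumptions rather than the paper's algorithm: it takes the accumulated test-set loss of each candidate as given (the paper evaluates the secondary smoothing loss defined in an earlier section, which is not reproduced here), it uses the standard form in which the weight of candidate $k$ is proportional to $π_{k} \exp (- λ \times \text{loss}_{k})$, and all function names are hypothetical.

```python
import numpy as np

def exp_aggregation_weights(losses, prior, lam=1.0):
    """Generic exponential weights: w_k proportional to prior_k * exp(-lam * loss_k)."""
    losses = np.asarray(losses, dtype=float)
    prior = np.asarray(prior, dtype=float)
    log_w = np.log(prior) - lam * losses
    log_w -= log_w.max()              # subtract the max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

def aggregate(predictions, weights):
    """Convex combination of K candidate predictions: shape (K, m) -> (m,)."""
    return np.asarray(weights) @ np.asarray(predictions)

def average_over_splits(loss_per_split, prior, lam=1.0):
    """Average the weights over B random splits; loss_per_split has shape (B, K)."""
    return np.mean([exp_aggregation_weights(l, prior, lam)
                    for l in loss_per_split], axis=0)
```

With a uniform prior, a candidate whose loss is clearly smaller than the others receives a weight close to one, which is the concentration pattern the histograms are meant to show.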

Figure 2.

Histograms of exponential aggregation weights of candidate procedures for either $τ = 0.5$ or $0.75$ under $200$ sample size and $500$ replications.

To demonstrate the rationality of the methodology in Section 2.2, we conduct the following simulation.

Experiment 2. Assume the additive mixed effect model (4) has the following structure:

y (t) = β_{1, τ} t + β_{2, τ} x_{2} + (τ - 0.5) α_{τ} (t) + z u_{1} + u_{2} + f_{τ} (x_{2}) ε

(14)

Table 1.

PRs, RMSEs, and related SEs (in parentheses) for the different estimation procedures in Experiment 1, with correlation coefficient $ρ = 0.5$ for the random errors. All results are multiplied by $10^{2}$.

	Measure	Procedure
$τ$	( $\times 10^{2}$ )	LQR	VCLQR	NAQR	Aggregation
0.05	PR	27.94(0.98)	28.16(1.19)	24.18(0.62)	25.26(1.17)
	RMSE	165.59(8.89)	166.03(11.18)	91.10(9.64)	114.10(20.79)
0.1	PR	41.50(1.02)	41.02(1.14)	37.95(0.84)	38.60(1.13)
	RMSE	116.79(4.30)	115.14(6.71)	73.66(7.49)	75.47(10.23)
0.25	PR	53.34(1.13)	53.57(1.14)	51.82(1.10)	51.77(1.08)
	RMSE	52.96(1.46)	59.46(2.18)	42.22(3.86)	38.55(3.10)
0.4	PR	43.47(0.90)	46.20(0.97)	43.37(0.91)	43.31(0.90)
	RMSE	18.00(0.96)	44.72(2.93)	20.74(2.91)	16.66(1.42)
0.5	PR	31.60(0.62)	35.92(0.79)	32.10(0.64)	31.60(0.62)
	RMSE	2.18(1.36)	42.04(3.20)	14.38(2.43)	2.37(1.31)
0.75	PR	11.86(0.94)	8.88(0.58)	12.98(1.00)	8.69(0.58)
	RMSE	44.02(2.80)	44.73(1.67)	48.58(3.14)	43.14(1.64)
0.9	PR	14.63(0.74)	14.58(0.57)	15.14(0.84)	14.29(0.57)
	RMSE	196.81(8.10)	152.37(2.78)	201.72(7.63)	153.34(2.90)
0.95	PR	10.65(0.42)	12.42(0.50)	10.86(0.56)	11.49(0.78)
	RMSE	308.83(11.03)	254.33(5.16)	317.35(10.55)	259.91(7.53)

Note: The best indicator among the candidates is presented in bold.

RMSE: root mean squared error; LQR: linear quantile regression; VCLQR: varying-coefficient linear quantile regression; NAQR: nonparametric additive quantile regression; PR: prediction risk.

In model (14), we take $x_{2}$ to be independently distributed as $χ_{1}^{2}$ and $z = ξ_{1} + ξ_{2}$, where $ξ_{1} \sim N (0, 1)$ and $ξ_{2} \sim N (0, 1)$. The fixed effect parameters $(β_{1, τ}, β_{2, τ})$ are set to $(0.5, 0.5)$, with $α_{τ} (t) = 3 \sin (π (t - 10) / 15)$ and $f_{τ} (x_{2}) = \exp (- 0.3 x_{2} + 1)$, respectively. The random effects are generated as

(\begin{matrix} u_{1, i} \\ u_{2, i} \end{matrix}) \sim N (0, (\begin{matrix} 4 & 0 \\ 0 & 4 \end{matrix}))

and the random error $ε$ has the $τ$th quantile $b_{τ}$, with its distribution set as either the standard normal distribution $N (0, 1)$ or the standard Laplace distribution $Lap (0, 1)$.

To specify different observed times of an unbalanced design, we utilize the following hazard function to generate the survival time $T$ :

h (t) = h_{0} \exp \{w γ + α (β_{1, τ} t + β_{2, τ} x_{2})\}

Besides, the censoring point $C$ can be sampled from a uniform distribution $U (a_{C}, b_{C})$. For each subject, the process with censored observations in (14) is generated similarly to Deng,²⁷ where the lifetime random sample $\{v_{i} : i = 1, \dots, n\}$ with hazard $h (t)$ is generated via the following expression:

v_{i} = \frac{1}{α β_{1, τ}} \log \{1 - \frac{α β_{1, τ} \log (1 - s_{i})}{h_{0} \exp (w γ + α β_{2, τ} x_{2, i})}\}

where $α = 0.25$, $h_{0} = 2 \times 10^{- 3}$, $w \sim U (0, 1)$, $γ = 1$, and $s_{i} \sim U (0, 1)$. Set the simulated terminal time $t_{i} = \min \{v_{i}, c_{i}\}$, where the censoring rate is controlled at about $32 \%$ via $a_{C} = 20$ and $b_{C} = 40$. Hence, for a probability level $τ$, the right-censored longitudinal observations are generated by

y_{i} (s) = β_{1, τ} s + β_{2, τ} x_{2, i} + (τ - 0.5) α_{τ} (s) + z_{i} u_{1, i} + u_{2, i} + f_{τ} (x_{2, i}) ε_{i}

(15)

where $i = 1, \dots, n$ and $s = 1, \dots, ⌈ t_{i} ⌉$. We specify LQR and NAQR with constant coefficients to fit the fixed effect part in (15) by using the “LQMM” package, and supply a misspecified linear quantile model (MLQR)

Q_{τ} (y_{i} (t) | x_{i} (t), z_{i} (t), u_{i}, t) = β_{1, τ} \sqrt{t} + (β_{2, τ} + u_{1, i}) z_{i} + β_{3, τ} + u_{2, i}

for the median case. The EAW algorithm is adopted to combine the candidate models and to estimate $τ$-QSF and $τ$-CQF, respectively. The simulation studies are based on 500 successful completions of the algorithm. A simple computation shows that the “true functions” $H_{τ} (t)$ and $μ_{τ} (t)$ are

\begin{aligned} H_{τ} (t) & = 0.5 t + (3 τ - 1.5) \sin (\frac{π (t - 10)}{15}) + M_{f} b_{τ} + 0.5 \\ μ_{τ} (t) & = 0.25 t^{2} - \frac{(45 τ - 22.5)}{π} \cos (\frac{π (t - 10)}{15}) + (M_{f} b_{τ} + 0.5) t \end{aligned}

where $M_{f}$ is the average of the sample of $f_{τ} (x_{2, i})$ generated among simulations.
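The survival-time expression above is the inverse of the cumulative hazard implied by $h (t)$: writing $A = h_{0} \exp (w γ + α β_{2, τ} x_{2})$, the cumulative hazard is $A \{\exp (α β_{1, τ} t) - 1\} / (α β_{1, τ})$, and setting it equal to $- \log (1 - s_{i})$ gives $v_{i}$. A minimal Python sketch of this step and of the censoring is given below. The function names are hypothetical, and drawing a separate $w$ for every subject is an assumption, since the text does not say whether $w$ is subject-specific.

```python
import numpy as np

def cumulative_hazard(t, base, ab):
    """Integral of base * exp(ab * u) over [0, t]."""
    return base * (np.exp(ab * t) - 1) / ab

def draw_survival_times(x2, rng, alpha=0.25, beta1=0.5, beta2=0.5,
                        h0=2e-3, gamma=1.0):
    """Inverse-transform sampling of lifetimes v_i under the hazard h(t)."""
    x2 = np.asarray(x2, dtype=float)
    n = x2.shape[0]
    w = rng.uniform(0, 1, n)          # one w per subject (assumption)
    s = rng.uniform(0, 1, n)
    base = h0 * np.exp(w * gamma + alpha * beta2 * x2)   # hazard at t = 0
    ab = alpha * beta1
    v = np.log(1 - ab * np.log(1 - s) / base) / ab
    return v, w, s

def censor(v, rng, a_c=20.0, b_c=40.0):
    """Apply uniform censoring; delta = 1 marks an observed event."""
    c = rng.uniform(a_c, b_c, v.shape[0])
    t = np.minimum(v, c)
    delta = (v <= c).astype(int)
    return t, delta
```

Given the terminal times `t`, the visit grid for subject $i$ is then `np.arange(1, np.ceil(t[i]) + 1)`, matching $s = 1, \dots, ⌈ t_{i} ⌉$.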

We implement the aforementioned experiment with sample sizes $n = 120$ and $300$, using $2 / 3$ of the subjects for training and the other $1 / 3$ for testing. The smoothing parameters of the secondary smoothing loss function take the same values as in Experiment 1, and the weighting process is carried out with a single partition. We present the estimated PR and RMSE of the fixed effect part in Tables 2 and 3. Moreover, the accuracy of the estimators of $H_{τ} (t)$ and $μ_{τ} (t)$ is measured by the average MSE (AMSE):

AMSE = \frac{1}{T R} \sum_{r = 1}^{R} \int_{0}^{T} \{{\hat{f}}^{(r)} (t) - f (t)\}^{2} d t

where ${\hat{f}}^{(r)} (t)$ is the corresponding estimated function of $f (t)$ in the $r$th replication. It can be seen that in most cases the aggregated estimator performs well regardless of the choice of $τ$ and the distribution of $ε$. The PR and RMSE of the fixed effect lead to the same conclusion as in Experiment 1. Note that model (15) approaches the linear structure for $τ = 0.5$, and the nonparametric specification gives a better result otherwise. Although the number of splits for the aggregation is set to $B = 1$, the EAW estimator still has the expected properties and hence is verified to be feasible. For the estimators of $τ$-QSF and $τ$-CQF, the performance of the AMSEs is consistent with that of the PRs and RMSEs under the different specifications: when $τ = 0.5$, LQR has smaller AMSEs than NAQR, while the latter fits the model much better otherwise, and the EAW estimator preserves good accuracy in all cases. Figure 3 visualizes the above conclusions; the tails of the fitted curves are partly biased since the number of observed subjects decreases due to failures and censoring. In particular, for the median, we supply an additional misspecified mixed effect model, MLQR, for comparison. The results show that our EAW estimator can effectively identify the incorrect model as well.
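For reference, the AMSE criterion used above can be approximated numerically when the estimated curves are evaluated on a common time grid. The following Python sketch uses the trapezoidal rule for the integral; the function name `amse`, the common grid and the choice of quadrature rule are assumptions made for illustration.

```python
import numpy as np

def amse(f_hat_reps, f_true, grid):
    """Average over R replications of (1 / T) * integral of squared error.

    f_hat_reps: array of shape (R, m), estimates on the grid
    f_true:     array of shape (m,), true function on the grid
    grid:       increasing time points spanning [0, T]
    """
    grid = np.asarray(grid, dtype=float)
    sq_err = (np.asarray(f_hat_reps, dtype=float) - np.asarray(f_true, dtype=float)) ** 2
    widths = np.diff(grid)
    # Trapezoidal rule along the time axis, one integral per replication.
    integrals = np.sum(0.5 * (sq_err[:, 1:] + sq_err[:, :-1]) * widths, axis=1)
    T = grid[-1] - grid[0]
    return float(integrals.mean() / T)
```

Because the integral is divided by $T$, the value is on the scale of a squared error per unit time, which makes curves observed over different horizons comparable.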

Figure 3.

Fitted values versus the true curve of $τ$ -QSF, $τ$ -CQF for either $τ = 0.5$ or 0.9 under different specifications with 300 subjects. The first row shows fitted curves with standard normal errors, and the second row is obtained under Laplace errors.

Table 2.

Fitting the fixed effect of Experiment 2, estimating $τ$ -QSF and $τ$ -CQF with different candidate procedures, including the incorrect specification, where $ε \sim N (0, 1)$ .

			Procedure
n	$τ$	Measure	LQR	MLQR	NAQR	Aggregation
120	0.25	PR	1.404(0.066)	–	1.322(0.066)	1.335(0.067)
		RMSE	0.767(0.200)	–	0.410(0.199)	0.466(0.198)
		${AMSE}_{H (t)}$	0.869	–	0.403	0.507
		${AMSE}_{μ (t)}$	179.941	–	74.138	86.788
	0.5	PR	1.580(0.057)	1.648(0.059)	1.582(0.058)	1.581(0.058)
		RMSE	0.212(0.130)	1.210(0.147)	0.354(0.171)	0.295(0.183)
		${AMSE}_{H (t)}$	0.120	2.202	0.377	0.227
		${AMSE}_{μ (t)}$	30.061	105.668	62.796	38.539
	0.9	PR	0.950(0.082)	–	0.829(0.059)	0.844(0.061)
		RMSE	1.010(0.188)	–	0.579(0.201)	0.615(0.190)
		${AMSE}_{H (t)}$	2.356	–	0.420	0.615
		${AMSE}_{μ (t)}$	211.733	–	81.527	86.382
300	0.25	PR	1.364(0.045)	–	1.321(0.042)	1.324(0.043)
		RMSE	0.581(0.113)	–	0.300(0.145)	0.334(0.146)
		${AMSE}_{H (t)}$	0.780	–	0.224	0.283
		${AMSE}_{μ (t)}$	71.095	–	40.393	45.674
	0.5	PR	1.577(0.036)	1.644(0.037)	1.579(0.036)	1.578(0.036)
		RMSE	0.147(0.102)	1.153(0.096)	0.248(0.140)	0.190(0.118)
		${AMSE}_{H (t)}$	0.070	2.414	0.189	0.121
		${AMSE}_{μ (t)}$	16.696	101.240	36.221	22.317
	0.9	PR	0.862(0.041)	–	0.836(0.043)	0.836(0.040)
		RMSE	0.875(0.078)	–	0.431(0.165)	0.483(0.161)
		${AMSE}_{H (t)}$	2.643	–	0.237	0.375
		${AMSE}_{μ (t)}$	149.028	–	47.922	54.292

Note: The best indicator among the candidates is presented in bold.

RMSE: root mean squared error; LQR: linear quantile regression; NAQR: nonparametric additive quantile regression; MLQR: misspecified linear quantile model; PR: prediction risk; AMSE: average mean squared error.

Table 3.

Fitting the fixed effect of Experiment 2, estimating $τ$ -QSF and $τ$ -CQF with different candidate procedures, including the incorrect specification, where $ε \sim Lap (0, 1)$ .

			Procedure
n	$τ$	Measure	LQR	MLQR	NAQR	Aggregation
120	0.25	PR	1.551(0.064)	–	1.475(0.054)	1.493(0.059)
		RMSE	0.672(0.164)	–	0.477(0.215)	0.494(0.185)
		${AMSE}_{H (t)}$	0.928	–	0.526	0.655
		${AMSE}_{μ (t)}$	114.439	–	85.878	82.199
	0.5	PR	1.759(0.061)	1.821(0.061)	1.760(0.062)	1.760(0.061)
		RMSE	0.210(0.137)	1.202(0.141)	0.350(0.169)	0.286(0.163)
		${AMSE}_{H (t)}$	0.121	2.148	0.376	0.226
		${AMSE}_{μ (t)}$	30.418	100.972	58.478	37.915
	0.9	PR	0.986(0.069)	–	0.901(0.048)	0.926(0.054)
		RMSE	1.048(0.170)	–	0.716(0.210)	0.767(0.198)
		${AMSE}_{H (t)}$	2.593	–	0.576	1.008
		${AMSE}_{μ (t)}$	195.698	–	100.184	98.928
300	0.25	PR	1.514(0.041)	–	1.480(0.038)	1.484(0.038)
		RMSE	0.542(0.069)	–	0.358(0.157)	0.379(0.143)
		${AMSE}_{H (t)}$	0.924	–	0.295	0.406
		${AMSE}_{μ (t)}$	58.788	–	47.932	49.196
	0.5	PR	1.766(0.037)	1.828(0.037)	1.767(0.037)	1.767(0.037)
		RMSE	0.150(0.096)	1.138(0.090)	0.247(0.127)	0.196(0.114)
		${AMSE}_{H (t)}$	0.069	2.358	0.210	0.116
		${AMSE}_{μ (t)}$	16.520	92.745	34.048	23.640
	0.9	PR	0.928(0.032)	–	0.907(0.032)	0.913(0.032)
		RMSE	0.946(0.070)	–	0.562(0.162)	0.608(0.160)
		${AMSE}_{H (t)}$	2.704	–	0.332	0.628
		${AMSE}_{μ (t)}$	157.932	–	60.054	64.797

Note: The best indicator among the candidates is presented in bold.

RMSE: root mean squared error; LQR: linear quantile regression; NAQR: nonparametric additive quantile regression; MLQR: misspecified linear quantile model; PR: prediction risk; AMSE: average mean squared error.

3.2. Performance of EAW-based predictors

To check the efficiency of (10), we design a series of simulations for the similar mixed effect model in Experiment 2 but with different distributions of $u_{i}$ :

Case I
$(u_{1, i}, u_{2, i})^{⊤}$ follows a bivariate normal distribution with the same covariance as in Experiment 2;
Case II
$u_{1, i}$ and $u_{2, i}$ are independently distributed with the symmetric Laplace distribution with $(σ_{1}, σ_{2}) = (1, 2)$ .
Taking the normal error as an example, the simulation is carried out for $τ = 0.25, 0.5, 0.9$. Other settings, such as the candidate procedures and splitting strategies, are the same as in Experiment 2. To reflect the predictive accuracy, Table 4 reports the comparison of MSEs between the EAW-based BLP ${\hat{u}}^{(τ)}$ and the other separate specifications. It implies that the EAW predictor performs generally well at all probability levels, and gets close to the “correct estimating procedure” as $n$ increases. The scatter-curves of MSEs with different cluster sizes are plotted in Figure 4, which shows that the EAW-based BLP is efficient where the specifications differ significantly. It is relevant to point out that when the cluster size is small, the MSE of ${\hat{u}}^{(τ)}$ may vary considerably, and we recommend increasing the number of random splits for EAW to reduce the volatility.

Table 4.
MSEs of BLP ${\hat{u}}^{(τ)}$ with different distributions. Each pair $(a, b)$ represents the MSE of $({\hat{u}}_{1}^{(τ)}, {\hat{u}}_{2}^{(τ)})$ respectively.

			Procedure
n	$τ$	MSE	LQR	MLQR	NAQR	EAW-Prediction
120	0.25	Case I	(0.205, 1.198)	–	(0.211, 0.504)	(0.211, 0.598)
		Case II	(0.180, 1.740)	–	(0.200, 0.931)	(0.193, 1.024)
	0.5	Case I	(0.186, 0.343)	(0.321, 0.441)	(0.195, 0.404)	(0.188, 0.350)
		Case II	(0.163, 0.444)	(0.242, 0.732)	(0.164, 0.633)	(0.163, 0.469)
	0.9	Case I	(0.253, 1.432)	–	(0.330, 0.523)	(0.311, 0.576)
		Case II	(0.235, 2.531)	–	(0.231, 0.772)	(0.230, 0.957)
300	0.25	Case I	(0.203, 0.694)	–	(0.224, 0.452)	(0.224, 0.456)
		Case II	(0.180, 1.332)	–	(0.177, 0.643)	(0.177, 0.687)
	0.5	Case I	(0.184, 0.310)	(0.297, 0.403)	(0.186, 0.345)	(0.185, 0.321)
		Case II	(0.164, 0.403)	(0.218, 0.537)	(0.164, 0.550)	(0.164, 0.412)
	0.9	Case I	(0.265, 0.740)	–	(0.312, 0.530)	(0.308, 0.536)
		Case II	(0.237, 1.226)	–	(0.247, 0.620)	(0.246, 0.639)

Note: The best indicator among the candidates is presented in bold.

LQR: linear quantile regression; NAQR: nonparametric additive quantile regression; MLQR: misspecified linear quantile model; MSE: mean squared error; BLP: best linear prediction.

Figure 4.
Mean squared errors of predicted random effects under different model specifications in Experiment 2. Columns correspond to different distributions, with the probability level being either 0.25 or 0.9.
4. MADIT data analysis


To illustrate the practical advantages of the EAW estimation, we apply the methodology to analyze the MADIT data. The dataset has been studied by Deng²⁷ and Liu et al.⁸ Note that most of the existing statistical modeling and estimation is based on a particular specification, without demonstrating that the fitted model or estimating methodology outperforms the alternatives. We will demonstrate the rationality and the advantage of our method.

The MADIT was designed to evaluate the effectiveness of an implantable cardiac defibrillator (ICD) in preventing sudden death in high-risk patients. In the collected database, a total of 181 patients from 36 centers in the United States were recruited to be fully sequential and randomly observed. Of these, 89 patients were assigned to the ICD group and the others to the conventional therapy group, and the observations were heavily censored. Concretely, the data set consists of the patient ID code, treatment code (1 for ICD and 0 for conventional), observed survival time in days, death indicator (1 for death and 0 for censored), cost type, and daily cost from the start to the completion of the trial. The following types of cost were present in the experiment: Type 1 is for hospitalization and emergency department visits; Type 2 is for outpatient tests and procedures; Type 3 is for physician/specialist visits; Type 4 is for community services; Type 5 is for medical supplies; and Type 6 is for medications.

It is widely accepted that medical costs are not affected by the treatment type, which usually enters the survival part to influence the risk rate of a subject; thus we do not include it in the outcome model. On the other hand, since patients may incur more than one cost type simultaneously, even at the same time point, seven significant covariates $x_{1, i} (t), \dots, x_{7, i} (t)$ are included, where $x_{r, i} (t) = 1$ if the observation of Type $r$ ($r = 1, \dots, 6$) is 1 and $x_{r, i} (t) = 0$ otherwise, and $x_{7, i} (t) = f (t)$ is imposed to depict the effect of the treatment time. For comparison with Liu et al.,⁸ the same data preprocessing method is adopted, that is, the data set is compressed by summing the daily costs over each 12-day period. We set up both longitudinal quantile regression (denoted as “Liu et al.”) and additive mixed effect models (denoted as “AQMM”) to fit the MADIT data:

\begin{aligned} Q_{τ}^{(1)} (y_{i} (t) | x_{i} (t)) & = β_{0, τ} + \sum_{p = 1}^{6} β_{p, τ} x_{p, i} (t) + β_{7, τ} f (t) \\ Q_{τ}^{(2)} (y_{i} (t) | x_{i} (t), u_{i}) & = β_{0, τ} + \sum_{p = 1}^{6} β_{p, τ} x_{p, i} (t) + β_{7, τ} f (t) + t u_{1, i} + u_{2, i} \end{aligned}

(16)

where $f (t)$ is specified as either $f (t) = t$ or $f (t) = \log (t)$, such that $Q_{τ}^{(1)} (y_{i} (t) | x_{i} (t))$ represents the conditional quantile in Liu et al.⁸ In the second model, $u_{i} = (u_{1, i}, u_{2, i})^{⊤}$ are assumed to be zero-mean random effects with a symmetric positive-definite variance-covariance matrix.
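The 12-day compression used in the preprocessing can be sketched as follows. This is only an illustration on a hypothetical array of daily costs for one subject, not the actual MADIT file layout; in particular, keeping the last incomplete block as a shorter period is an assumption.

```python
import numpy as np

def compress_daily_costs(daily_cost, period=12):
    """Sum one subject's daily costs over consecutive `period`-day blocks.

    A final block with fewer than `period` days is kept and summed as is
    (assumption; the source does not describe how the tail is treated).
    """
    daily_cost = np.asarray(daily_cost, dtype=float)
    n_blocks = int(np.ceil(daily_cost.size / period))
    padded = np.zeros(n_blocks * period)
    padded[:daily_cost.size] = daily_cost    # zero-pad the incomplete tail
    return padded.reshape(n_blocks, period).sum(axis=1)
```

For a subject followed for 25 days, the function returns three values: the totals for days 1 to 12, days 13 to 24, and day 25 alone.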

We implement the EAW estimator among $Q_{τ}^{(1)} (y_{i} (t) | x_{i} (t))$ and $Q_{τ}^{(2)} (y_{i} (t) | x_{i} (t), u_{i})$ in (16) with 50 independent splits (with smoothing parameters $h = 1$ and $a (h) = 0.01$). Table 5 lists the estimated 5-year and total treatment period cumulative quantile costs under $τ = 0.25, 0.5$ and $0.75$, which show features similar to those in Liu et al.⁸ Note that the fitted residuals (marked by “Res” in Table 5) indicate that the additive quantile mixed effect model (AQMM) is more suitable, as it accounts for the between-subject variability by means of random effects; the change of cumulative costs is relatively flat across quantile levels. On the other hand, the EAW estimator based on either Liu et al. or AQMM maintains favorable results across probability levels, while the candidate models perform differently. This is consistent with the conclusions of the simulation studies.

Table 5.

Fitting residuals, 5-year and total period estimated cumulative quantile costs for complete MADIT dataset with different specifications of $f (t)$ . Each pair $(a, b)$ represents the estimators of Liu et al. and the AQMM, respectively.

$τ$	$f (t)$	Res	Year 1	Year 2	Year 3	Year 4	Year 5	Total
0.25	$t$	(0.244, 0.165)	(14994.9, 25965.9)	(22126.3, 35858.4)	(30010.7, 46002.8)	(38714.6, 55162.6)	(50143.9, 67284.4)	(50477.0, 67502.7)
	$\log (t)$	(0.239, 0.156)	(17054.9, 25385.7)	(23851.2, 33152.1)	(31236.2, 41556.1)	(39218.9, 49856.2)	(49807.8, 61978.3)	(50122.1, 62233.1)
	EAW	(0.240, 0.163)	(15808.7, 24421.2)	(22403.7, 32936.1)	(29596.9, 41872.7)	(37403.5, 50268.9)	(47723.5, 61847.4)	(48034.7, 62075.8)
0.5	$t$	(0.295, 0.179)	(34057.1, 32324.2)	(49548.2, 47122.3)	(67216.7, 64547.2)	(87270.7, 82876.0)	(115124, 110822)	(115659, 111360)
	$\log (t)$	(0.291, 0.170)	(35683.4, 34328.1)	(49955.8, 48064.4)	(65798.0, 63780.7)	(83348.8, 79930.6)	(107456, 104107)	(107954, 104611)
	EAW	(0.293, 0.173)	(33559.3, 32111.3)	(47427.9, 45482.5)	(62916.5, 60892.3)	(80154.9, 76831.5)	(103880, 100760)	(104372, 101260)
0.75	$t$	(0.224, 0.169)	(75748.2, 38218.7)	(109885, 55962.3)	(152986, 76972.3)	(191616, 99362.4)	(263317, 133616)	(264146, 134270)
	$\log (t)$	(0.224, 0.164)	(76145.9, 44585.7)	(107538, 62092.7)	(146112, 82036.1)	(180552, 102626)	(242617, 133366)	(243389, 133970)
	EAW	(0.224, 0.164)	(71899.8, 39855.1)	(102233, 56616.5)	(139718, 75990.3)	(173305, 96243.2)	(234000, 126702)	(234763, 127308)

MADIT: multicenter automatic defibrillator implantation trial; AQMM: additive quantile mixed effect models; EAW: exponential aggregation weighting.

In order to inspect the efficiency of the different models, we randomly select 120 of the 181 subjects in the data set to train the specified models, and use the remaining 61 subjects for the out-of-sample test. The prediction test prediction risk (PT-PR) and the prediction test root MSE (PT-RMSE) of the estimated $Q_{τ}$ are proposed to test the goodness of fit of the models for the MADIT data, where

\begin{aligned} \text{PT-PR} (τ) & = \frac{1}{61} \sum_{i = 1}^{61} \frac{1}{n_{i}} \sum_{t = 1}^{n_{i}} ρ_{τ} \{y_{i} (t) - {\hat{q}}_{τ} (x_{i} (t)) - t {\hat{u}}_{1, i} - {\hat{u}}_{2, i}\} \\ \text{PT-RMSE} (τ) & = [\frac{1}{61} \sum_{i = 1}^{61} \frac{1}{n_{i}} \sum_{t = 1}^{n_{i}} \{y_{i} (t) - {\hat{q}}_{τ} (x_{i} (t)) - t {\hat{u}}_{1, i} - {\hat{u}}_{2, i}\}^{2}]^{1 / 2} \end{aligned}

where $(y_{i} (t), x_{i} (t), t) \in \{\text{Test Dataset}\}$, ${\hat{u}}_{1, i}$ and ${\hat{u}}_{2, i}$ are the respective BLPs of the random effects for the test data, and $t {\hat{u}}_{1, i} + {\hat{u}}_{2, i} = 0$ for Liu et al. The split procedure is repeated 50 independent times and the results are presented in Table 6, which shows that the fit of the different models differs across probability levels. For instance, the two AQMMs report smaller PT-PR and PT-RMSE than those of Liu et al. in most cases, and the horizontal comparison shows that the mixed effect models with different $f (t)$ have their own merits for different values of $τ$ (e.g. the model with $f (t) = t$ fits better as $τ$ moves far away from the median), while Liu et al. with $f (t) = \log (t)$ is more suitable than that with $f (t) = t$. The differences are enlarged by introducing the random effects. However, the EAW estimator overcomes this uncertainty and outperforms a particular candidate when the gap between the fitted models is large; the corresponding aggregated weights show almost the same tendencies as the EAW results as well. Consequently, it can be considered as the “uniformly optimal fitting” among candidate specifications and different quantile levels for MADIT data.
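The out-of-sample criteria PT-PR and PT-RMSE defined above can be computed as in the short Python sketch below. It assumes the test subjects are stored as one array per subject, that the time index $t$ runs over the period numbers $1, \dots, n_{i}$, and that the predicted random effects are supplied by the user (zeros for the Liu et al. model); the function name `pt_metrics` is hypothetical.

```python
import numpy as np

def pt_metrics(y, q_hat, u1, u2, tau):
    """Return (PT-PR, PT-RMSE) over the test subjects.

    y, q_hat: lists of arrays, one per test subject
    u1, u2:   predicted random slope and intercept per subject
    """
    pr, mse = [], []
    for yi, qi, a, b in zip(y, q_hat, u1, u2):
        t = np.arange(1, len(yi) + 1)                 # period index 1, ..., n_i
        r = np.asarray(yi, dtype=float) - np.asarray(qi, dtype=float) - t * a - b
        pr.append(np.mean(r * (tau - (r < 0))))       # within-subject check loss
        mse.append(np.mean(r ** 2))                   # within-subject squared error
    return float(np.mean(pr)), float(np.sqrt(np.mean(mse)))
```

Passing `u1 = u2 = [0.0] * len(y)` reproduces the convention $t {\hat{u}}_{1, i} + {\hat{u}}_{2, i} = 0$ used for the model without random effects.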

Table 6.

PT-PRs, PT-RMSEs (in parentheses) and aggregated weights (the second row of each group) for MADIT data: linear quantile models constructed in Liu et al., AQMM, and EAW-estimation.

		Specification of $f (t)$
$τ$	Model	$t$	$\log (t)$	EAW-Estimation
0.1	Liu et al.	0.137(1.523)	0.136(1.504)	0.136(1.510)
		0.363	0.637	–
	AQMM	0.102(0.854)	0.103(0.873)	0.102(0.862)
		0.522	0.478	–
0.25	Liu et al.	0.235(0.789)	0.233(0.777)	0.234(0.779)
		0.295	0.705	–
	AQMM	0.154(0.417)	0.154(0.417)	0.154(0.414)
		0.437	0.563	–
0.5	Liu et al.	0.279(0.568)	0.278(0.560)	0.278(0.562)
		0.293	0.707	–
	AQMM	0.168(0.269)	0.165(0.257)	0.166(0.260)
		0.333	0.667	–
0.75	Liu et al.	0.226(0.799)	0.225(0.786)	0.225(0.790)
		0.290	0.710	–
	AQMM	0.159(0.408)	0.160(0.440)	0.160(0.417)
		0.461	0.539	–
0.9	Liu et al.	0.126(1.345)	0.126(1.333)	0.126(1.336)
		0.423	0.577	–
	AQMM	0.107(0.908)	0.107(0.948)	0.107(0.948)
		0.460	0.540	–

Note: The best indicator among the candidates is presented in bold.

AQMM: additive quantile mixed effect models; PT-PR: prediction test prediction risk; PT-RMSE: prediction test root mean squared error; MADIT: multicenter automatic defibrillator implantation trial; EAW: exponential aggregation weighting.

5. Concluding remarks

This paper investigates adaptive estimation for longitudinal quantile regression via the EAW algorithm. Under time-dependent covariates and a right-censored history process, the EAW estimator and the IPCW method are combined to derive $τ$-CQF for the mixed effect model. Based on the secondary smoothing approximation of the check loss function, the oracle inequalities are easily established. We show that the estimator of $τ$-CQF is consistent as long as one of the candidates converges to the correct specification. Further, the BLP of the random effects is modified by EAW. The applicability of the method is demonstrated in the MADIT data analysis.

It is worth pointing out that the secondary smoothing loss has some good theoretical properties: it is differentiable at the origin, and guarantees convexity and strong smoothness as well. On the other hand, the quadratic smooth pattern provides strong convexity analogous to that of squared-type losses, which is essential for the risk bound under squared error loss, and simplifies the verification of the consistency of the cumulative quantile function estimator. However, in practice, the smoothing parameters should not be set so large that they affect the aggregated result; finding a suitable recommendation for $h$ and $a (h)$ for a real dataset is not easy and remains to be further explored.

The idea of aggregation is not difficult to extend to fixed effect models: when the individual effect is specified, candidates of $q_{τ} (x (t))$ are estimated by the corresponding methods (Kato et al.³⁰ and Koenker,⁹ among others) and the EAW estimator is weighted from each ${\hat{q}}_{τ}^{(k)} (x (t))$. Due to space limitations we did not discuss this in detail; it is also of potential interest under complex censoring mechanisms. Throughout the paper we use a nonparametric estimation for censoring, whereas the methods in Galvao et al.⁶ and Harding and Lamarche⁷ are both under a specific procedure of the mechanism. One can naturally generalize it to a multiply robust case when both the conditional quantile and the response probability tend to be misspecified. Besides, the aggregated estimator of (4) depends on the linear structure of the random effects, and we wonder how sensitive the performance is to varying degrees of misspecification. These will be valuable topics in our further research.

Supplemental Material

Supplemental material, sj-zip-1-smm-10.1177_09622802231164730 for Adaptive aggregation for longitudinal quantile regression based on censored history process by Wei Xiong, Dianliang Deng, Dehui Wang and Wanying Zhang in Statistical Methods in Medical Research

Footnotes

Acknowledgments

The authors thank the Editor, Associate Editor, and Referees for their insightful comments. They thank all who helped them in writing this LATEX sample file.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research is supported by National Natural Science Foundation of China (No. 12271231, 12001229, 11901053), China Scholarship Council and Natural Sciences and Engineering Research Council of Canada (NSERC).

ORCID iD

Dianliang Deng

Supplemental material

Supplementary material for this article is available online.

Appendix

In the Appendix, we give the proofs of the theoretical results stated in the main sections. Because the aggregation procedure relies on the secondary approximation of $ρ_{τ} (\cdot)$, the following lemma establishes the $L$-smoothness and strong convexity of $ρ_{τ, h}^{*} (\cdot)$, which are essential for verifying the risk bounds of the EAW estimator.
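For reference, the two properties named here have the following standard definitions for a differentiable function $f$ on the real line (see, e.g., Nesterov³²). The symbols $f$, $L$ and $\mu$ are generic placeholders introduced for this illustration; the specific constants for $ρ_{τ, h}^{*}$ are those given by the lemma.

```latex
% L-smoothness: the derivative is Lipschitz with constant L > 0
|f'(u) - f'(v)| \le L\,|u - v| \quad \text{for all } u, v,
% which implies the quadratic upper bound
f(v) \le f(u) + f'(u)(v - u) + \frac{L}{2}(v - u)^2 .

% Strong convexity with modulus \mu > 0: a quadratic lower bound
f(v) \ge f(u) + f'(u)(v - u) + \frac{\mu}{2}(v - u)^2 \quad \text{for all } u, v .
```

Together the two bounds sandwich $f$ between quadratics, which is the form typically used when deriving risk bounds for exponentially weighted aggregates.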

References

1. Koenker, Bassett Jr. Regression quantiles. Economet: J Economet Soc 1978; 46: 33–50.

2. Yin, Cai. Quantile regression models with multivariate failure time data. Biometrics 2005; 61: 151–161.

3. Fan. Weighted quantile regression for longitudinal data. Comput Stat 2015; 30: 569–592.

4. Tang, Wang, Zhu. Variable selection in quantile varying coefficient models with longitudinal data. Comput Stat Data Anal 2013; 57: 435–449.

5. Wang, Sun. Efficient parameter estimation and variable selection in partial linear varying coefficient quantile regression model with longitudinal data. Stat Pap 2020; 61: 967–995.

6. Galvao, Lamarche, Lima. Estimation of censored quantile regression for panel data with fixed effects. J Am Stat Assoc 2013; 108: 1075–1089.

7. Harding, Lamarche. A panel quantile approach to attrition bias in big data: Evidence from a randomized experiment. J Econom 2019; 211: 61–82.

8. Liu, Deng, Wang. Estimating the quantile medical cost under time-dependent covariates and right censored time-to-event variable based on a state process. Stat Methods Med Res 2020; 29: 2041–2062.

9. Koenker. Quantile regression for longitudinal data. J Multivar Anal 2004; 91: 74–89.

10. Geraci, Bottai. Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 2007; 8: 140–154.

11. Kulkarni, Biswas, Das. A joint quantile regression model for multiple longitudinal outcomes. AStA Adv Stat Anal 2019; 103: 453–473.

12. Geraci. Linear quantile mixed models: The LQMM package for Laplace quantile regression. J Stat Softw 2014; 57: 1–29.

13. Geraci. Additive quantile regression for clustered data with an application to children’s physical activity. J R Stat Soc Ser C Appl Stat 2019; 68: 1071.

14. Machado. Robust model selection and M-estimation. Econ Theory 1993; 9: 478–493.

15. Koenker. Quantile regression. Econom Soc Mono. Cambridge: Cambridge University Press, 2005.

16. Hjort, Claeskens. Frequentist model average estimators. J Am Stat Assoc 2003; 98: 879–899.

17. Hansen. Least squares model averaging. Econometrica 2007; 75: 1175–1189.

18. Jackknife model averaging for quantile regressions. J Econom 2015; 188: 40–58.

19. Wang, Zhang, Wan, et al. Jackknife model averaging for high-dimensional quantile regression. Biometrics 2021.

20. Yang. Adaptive regression by mixing. J Am Stat Assoc 2001; 96: 574–588.

21. Shan, Yang. Combining regression quantile estimators. Stat Sin 2009; 19: 1171–1191.

22. Zou. Aggregated expectile regression by exponential weighting. Stat Sin 2019; 29: 671–692.

23. Jennings, Wong, Teo. Optimal control computation to account for eccentric movement. The ANZIAM J 1996; 38: 182–193.

24. Aravkin, Kambadur, Lozano, et al. Sparse quantile huber regression for efficient and robust estimation. arXiv preprint, 2014; arXiv: 1402.4624.

25. Koenker, Machado JAF. Goodness of fit and related inference processes for quantile regression. J Am Stat Assoc 1999; 94: 1296–1310.

26. Geraci, Bottai. Linear quantile mixed models. Stat Comput 2014; 24: 461–479.

27. Deng. Estimating the cumulative mean function for history process with time-dependent covariates and censoring mechanism. Stat Med 2016; 35: 4624–4636.

28. Fan. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc 2004; 99: 710–723.

29. Zeng, Cai. Asymptotic results for maximum likelihood estimation in joint analysis of repeat measurements and survival time. Ann Stat 2005; 33: 2132–2163.

30. Kato, Galvao, Montes-Rojas. Asymptotics for panel quantile regression models with individual effects. J Econom 2012; 170: 76–91.

31. Mkhadri, Ouhourane, Oualkacha. A coordinate descent algorithm for computing penalized smooth quantile regression. Stat Comput 2017; 27: 865–883.

32. Nesterov. Introductory lectures on convex optimization: A basic course (Vol. 87). Springer Science & Business Media, 2003.

33. Catoni. Statistical learning theory and stochastic optimization: Ecole d’Eté de Probabilités de Saint-Flour, XXXI-2001 (Vol. 1851). Springer Science & Business Media, 2004.

34. Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint, arXiv:1011.3027, 2010.

35. Phadia, Van Ryzin. A note on convergence rates for the product limit estimator. Ann Stat 1980; 8: 673–678.
