Sage Journals: Discover world-class research

Abstract

Predicting the risk of death for chronic patients is highly valuable for informed medical decision-making. This paper proposes a general framework for dynamic prediction of the risk of death of a patient given her hospitalization history. Predictions are based on a joint model for the death and hospitalization processes, thereby avoiding the potential bias arising from selection of survivors. The framework is valid for arbitrary models of the hospitalization process: it requires independence of neither hospitalization times nor gap times. In particular, we study the prediction of the risk of death in a renewal model for hospitalizations, a common approach to recurrent event modeling. In the renewal model, the distribution of hospitalizations throughout the follow-up period affects the risk of death. This differs from the previously studied Poisson model for the hospitalization process, under which only the number of hospitalizations matters. We apply our methodology to a prospective, observational cohort study of 512 patients treated for chronic obstructive pulmonary disease in one of six outpatient respiratory clinics run by the Respiratory Service of Galdakao University Hospital, with a median follow-up of 4.7 years. We find that more concentrated hospitalizations increase the risk of death and that the hazard ratio for death increases continuously with the number of hospitalizations during follow-up.

Keywords

Prediction, joint model, frailty, renewal process, hospitalization process, chronic obstructive pulmonary disease (COPD)

1. Introduction

When studying disease evolution in chronic patients, data on the patient’s hospitalization history is often available. Recurrent hospitalizations may indicate worsening of the disease and are therefore expected to be informative about the risk of death. Hence, several works have studied the relationship between hospitalizations and death.^1–4 Developing prediction frameworks for the risk of death that account for the history of hospitalizations of the patient may be crucial for the effective management of patients. This paper provides a dynamic prediction framework under a joint frailty model for death and hospitalizations, considering that hospitalization times may be dependent within a patient. The joint modeling approach overcomes the informative censoring problem in the simultaneous analysis of death and hospitalization recurrences.⁵ Moreover, compared to the prediction of risk with baseline patient characteristics, dynamic prediction allows for continuously updated risk assessments that incorporate a patient’s evolving clinical history. It is therefore particularly suitable for predicting the risk of death of chronic patients, who are monitored during their disease’s evolution.

Our development is motivated by the need to better understand the relationship between recurrent hospitalizations and mortality in patients with chronic obstructive pulmonary disease (COPD), a connection that is well documented as a critical factor in disease progression and long-term prognosis. For instance, Soler-Cataluña et al.⁴ found that severe exacerbations requiring hospitalization are a significant predictor of mortality in COPD patients; Serra-Picamal et al.⁶ observed that the number of re-hospitalizations in COPD patients is directly related to increased all-cause mortality, emphasizing the prognostic significance of recurrent hospital admissions. Suissa et al.³ studied this relationship in a Cox regression framework, finding that subsequent hospitalizations increased the risk of death and that the time between hospitalizations decreased over the course of follow-up. We contribute to the understanding of hospitalizations and death for COPD patients by estimating a joint model (avoiding the informative censoring bias) and predicting the risk of death given the history of hospitalizations. Specifically, our methodological contribution is guided by the following clinical hypothesis: “The risk of death is larger for patients who experience hospitalizations that are concentrated in time.”

In this work, we develop a dynamic prediction framework that is valid for any model of the hospitalization process. In particular, we study and compare the risk of death when the Poisson (calendar timescale) and renewal (gap timescale) models are used for the hospitalization process. We show that the distribution of hospitalizations throughout follow-up affects the risk of death in the renewal model. This result differs from the prediction of death within a Poisson model, where only the number of hospitalizations matters.⁷ We thereby contribute to the study of the modeling implications of the choice of the timescale.⁸ In addition, we find that in the renewal model, the monotonicity of the baseline hazard for hospitalizations, together with the direction of the association between death and hospitalizations, determines whether concentrated or dispersed hospitalizations lead to a higher risk of death. Our results help link clinical knowledge with features of the hospitalization process. We are able to confirm the clinical hypothesis for COPD patients: concentrated hospitalizations lead to a higher risk of death.

Dynamic prediction is an active research area.^7,9–12 Mauguen et al.⁷ studied dynamic prediction of the risk of a terminal event given a Poisson model for recurrent events, that is, under the assumption that recurrent event times are independent. We relax this assumption and generalize their results to arbitrary models for the recurrent event. The general model encompasses, among others, the renewal model for recurrent events, which is frequently used for the hospitalization process.^2,3,13 Several works have relaxed the conditional independence assumption of joint models by introducing a copula to account for the dependence between the terminal and recurrent processes.^9,14 Following this strand, Liang et al.¹⁰ propose a Bayesian hierarchical model in which the copula parameter is allowed to be individual and recurrence specific. These works provide a more flexible structure for the dependence between the terminal and recurrent events, while maintaining independence between recurrent event gap times. This paper follows a parallel path: we require independence of neither recurrent event times nor gap times, allowing for arbitrary models for the recurrent event, although we maintain conditional independence between the terminal and recurrent events.

This paper is organized as follows. Section 2 introduces the motivating case: the study of the risk of death of COPD patients. Section 3 presents the methodological contributions of the paper. In Section 4, we apply our results to predict the risk of death, given the hospitalization history, for COPD patients. Section 5 concludes with a discussion. A more detailed technical discussion of the methodological contributions and the results is presented in the Appendixes.

2. Motivating study: Risk of death of COPD patients

COPD’s influence on global health and healthcare policies is significant and expanding. It is currently one of the most paradigmatic chronic respiratory conditions, with projections indicating further escalation in the years ahead.¹⁵ COPD accounted for 3% of all deaths in 2021 across OECD countries,¹⁶ and it was the third leading cause of death in 2019 according to the World Health Organization March 2023 report. The burden of disease will continue to increase, especially in women and regions with low to medium gross domestic product or income.¹⁷

Exacerbations, and particularly hospitalizations, represent critical events in the management of COPD due to their substantial negative consequences.¹⁸ Hospitalizations have a significant clinical impact on patients with COPD. They further compromise already impaired pulmonary function and health-related quality of life.¹⁹ Moreover, hospitalizations increase the risk of subsequent readmissions, cardiovascular events, and mortality both during and after the acute episode.^4,20,21 The cycle of hospitalization and readmission not only elevates healthcare costs but also intensifies these adverse outcomes for patients.

Several prediction models have been developed for COPD readmissions or mortality risk.^22–25 Furthermore, increasing hospitalization frequency has been shown to increase the risk of mortality.³ Given the importance of these two outcomes (hospitalization and mortality) and their intrinsic relationship, joint modeling of both processes is necessary for two reasons. On the one hand, it reduces the potential bias present in Cox regression models. This bias arises because long-lived patients tend to experience more hospitalizations, which could mask the effect of hospitalizations in increasing the risk of death. On the other hand, it allows us to evaluate the effect that hospitalization history (frequency and distribution) has on mortality and to predict mortality risk based on the history of hospitalizations during follow-up. In the COPD literature, joint modeling has primarily focused on linking longitudinal biomarkers, such as forced expiratory volume trajectories, with the risk of exacerbations.²⁶ However, although recurrent exacerbations and hospitalizations are crucial for predicting mortality in COPD patients, the application of joint frailty models to examine the relationship between these events and death remains underexplored. This presents an opportunity to deepen our understanding of disease progression and to improve individualized risk prediction in COPD.

In this work, we considered a prospective, observational cohort study of 512 patients recruited after being treated for COPD in one of six outpatient respiratory clinics run by the Respiratory Service of Galdakao University Hospital. Patients were consecutively included in the study if they had been diagnosed with COPD for at least 6 months and had been stable for at least 6 weeks. The protocol was approved by the Ethics and Research Committees of the hospital (reference 16/14). All candidate patients were given detailed information about the study, and all those included provided written informed consent. Sociodemographic, smoking-habit, and clinical variables were recorded. Pulmonary function tests included forced spirometry, body plethysmography, and measurement of carbon monoxide diffusing capacity in percentage (DLCO). These tests were performed in accordance with the standards of the European Respiratory Society.²⁷ For reference values, we used those of the European Coal and Steel Community.²⁸ All variables were measured at baseline. The median follow-up time was 4.7 years (interquartile range: 2.66 to 5.13 years). Patient hospitalizations were reviewed during the follow-up period. A brief description of the main variables used for this study is presented in Table 1. More detailed information regarding the dataset can be found elsewhere.²⁹

Table 1.
Descriptive statistics for the whole sample.

	Number	NA	Median	Interquartile range
Patients	512
Death events	95 (18.5%)
Follow-up time (years)			4.70	2.66–5.13
Hospitalization events (total)	496
Hospitalizations per patient			0	0–1
Hospitalization length (days)			4	3–7
Variables:
Age (years)		0	65	59–71
Female	130 (25.4%)	0
DLCO		5	61.0	45.8–76.6
${FEV}_{1}$		0	55.6	44.9–68.0
Previous hosp. $= 1$	100 (19.5%)	0
Previous hosp. $\geq 2$	72 (14.1%)	0

NA: number of missing values; DLCO: carbon monoxide diffusing capacity in percentage; ${FEV}_{1}$: forced expiratory volume in one second in percentage; Previous hosp: hospitalizations during the 2 years prior to the start of follow-up.

3. Dynamic prediction of the risk of death given hospitalization history

3.1. Joint frailty model for death and hospitalization

We briefly introduce the joint frailty model for death, the terminal event, and hospitalization, the recurrent event. A more detailed description can be found in previous work.^1,5,30,31 For each patient $i$ in the sample, we observe the following: (i) the death or censoring time $T_{i}^{d}$ , (ii) $δ_{i}$ an indicator of whether the individual was censored, (iii) $J_{i} \geq 0$ the number of hospitalizations prior to censoring or death, (iv) $(T_{i j}^{r})_{j = 1}^{J_{i}}$ the times for each hospitalization (an empty vector if $J_{i} = 0$ ), and (v) $Z_{i}$ a vector of baseline covariates.

Estimation and prediction are based on conditional independence of the death and hospitalization processes given a frailty variable $u_{i}$ , with density $g$ supported on $[0, \infty)$ . The history of the processes for patient $i$ up to just before time $t$ , $F_{i} (t)$ , consists of knowledge about $T_{i}^{d} \geq t$ , the frailty variable $u_{i}$ , the covariates $Z_{i}$ , and the history of hospitalizations up to just before time $t$ . Let $H_{i} (t)$ denote the history of hospitalizations prior to $t$ . This consists of the number and timing of patient $i$ ’s hospitalizations before time $t$ :

\begin{aligned} H_{i}(t) = \bigl(J_{i}(t), (T_{ij}^{r})_{j=1}^{J_{i}(t)}\bigr), \quad \text{where } J_{i}(t) = \sum_{j=1}^{J_{i}} \mathbf{1}(T_{ij}^{r} < t) \end{aligned}
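A minimal Python sketch of this definition follows. The function name `history_before` is hypothetical, and the sketch assumes the hospitalization times are stored in increasing order. Note the strict inequality in $J_{i}(t)$: a hospitalization occurring exactly at $t$ is not yet part of $H_{i}(t)$.

```python
from bisect import bisect_left

def history_before(t, hosp_times):
    """Return H_i(t) = (J_i(t), times) for hospitalizations strictly before t.

    hosp_times must be sorted in increasing order (T_i1^r < T_i2^r < ...).
    """
    j = bisect_left(hosp_times, t)  # index of first time >= t, i.e. the count of times < t
    return j, tuple(hosp_times[:j])

# A patient hospitalized at days 120 and 400 of follow-up (illustrative values).
times = [120.0, 400.0]
print(history_before(100.0, times))  # (0, ())
print(history_before(120.0, times))  # (0, ()): strict inequality
print(history_before(500.0, times))  # (2, (120.0, 400.0))
```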

The hazards of the death and hospitalization processes, $α^{d}$ and $α^{r}$, respectively, follow a proportional hazards model with frailty:

\begin{aligned} α^{d}(t | F_{i}(t)) &= u_{i}^{γ} \cdot \exp(β_{d}' Z_{i}^{d}) \cdot λ_{0}^{d}(t), \quad \text{and} \\ α^{r}(t | F_{i}(t)) &= u_{i} \cdot \exp(β_{r}' Z_{i}^{r}) \cdot λ^{r}(t | H_{i}(t)) \end{aligned}

(3.1)

In the above equations, $Z_{i}^{d}$ and $Z_{i}^{r}$ are two (possibly overlapping) collections of the variables in $Z_{i}$, $β_{d}$ and $β_{r}$ are their corresponding coefficients, and $γ$ is a parameter that characterizes the dependence between the processes. Estimation of the model is conducted with maximum likelihood, once a distribution for the frailty variable $u_{i}$ has been specified.³⁰

The function $λ_{0}^{d}$ is the baseline hazard function for the death process, and $λ^{r}$ is the hazard function for the next hospitalization given the hospitalization history. Various models may be considered for $λ^{r}$, depending on how the time intervals between hospitalizations are to be modeled. In this work, we focus on the non-homogeneous Poisson model (calendar timescale) and the renewal model (gap timescale). In the non-homogeneous Poisson model, $λ^{r}(t | H_{i}(t)) = λ_{0}^{r}(t)$, where $λ_{0}^{r}$ is the baseline hazard for hospitalizations. In the renewal model, $λ^{r}(t | H_{i}(t)) = λ_{0}^{r}(t - T_{i J_{i}(t)}^{r})$, with the convention $T_{i0}^{r} = 0$ (cf. equation (1.5) in Cook and Lawless³²).
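The two specifications differ only in the argument passed to the baseline hazard. The Python sketch below illustrates this, together with the full intensity $α^{r}$ of (3.1) written with $C_{r} = \exp(β_{r}' Z_{i}^{r})$. The Weibull baseline and its parameter values are illustrative assumptions, and all function names are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Illustrative baseline hazard: shape * t^(shape - 1) / scale^shape."""
    return shape * t ** (shape - 1.0) / scale ** shape

def next_hosp_hazard(t, hosp_times, baseline, model):
    """lambda^r(t | H_i(t)) on the calendar ("poisson") or gap ("renewal") timescale."""
    if model == "poisson":
        return baseline(t)  # the hospitalization history is ignored
    if model == "renewal":
        # Last hospitalization strictly before t; T_i0^r = 0 by convention.
        last = max((s for s in hosp_times if s < t), default=0.0)
        return baseline(t - last)
    raise ValueError(f"unknown model: {model!r}")

def hosp_intensity(t, hosp_times, u, C_r, baseline, model):
    """alpha^r(t | F_i(t)) = u * C_r * lambda^r(t | H_i(t)), with C_r = exp(beta_r' Z_i^r)."""
    return u * C_r * next_hosp_hazard(t, hosp_times, baseline, model)

def baseline(t):
    return weibull_hazard(t, shape=1.5, scale=365.0)  # hypothetical parameter values

times = [120.0, 400.0]  # days since inclusion
print(next_hosp_hazard(500.0, times, baseline, "poisson"))  # lambda_0^r(500)
print(next_hosp_hazard(500.0, times, baseline, "renewal"))  # lambda_0^r(500 - 400)
```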

3.2. Dynamic prediction in joint frailty models

We have developed an expression for the probability that a patient dies between $T$ and $T + w$, given the observed hospitalization history, as implied by the model in (3.1). The key feature of this expression is that it is valid for any model of the hospitalization process, that is, for any specification of $λ^{r}$.

Suppose that we have followed up a patient for $T$ years. We thus have information on her hospitalization history during those years: we know that she was hospitalized on exactly $J \geq 0$ occasions and that these happened at times $t_{1}, \dots, t_{J}$. We denote this realization of the hospitalization history during follow-up by $h(T) = (J, (t_{j})_{j=1}^{J})$. That is, $h(T)$ is a specific realization of the random history $H_{i}(T)$. Also, let $z$ be a realization of the random covariates $Z_{i}$. For $T, w \geq 0$, the probability of interest is

\begin{aligned} P(T, w | z, h(T)) &= P(T_{i}^{d} \leq T + w | T_{i}^{d} \geq T, Z_{i} = z, H_{i}(T) = h(T)) \\ &= \frac{\int_{0}^{\infty} \bigl[S_{0}^{d}(T)^{C_{d} u^{γ}} - S_{0}^{d}(T + w)^{C_{d} u^{γ}}\bigr] \cdot u^{J} \cdot S^{r}(T | h(T))^{C_{r} u} \cdot g(u)\, du}{\int_{0}^{\infty} S_{0}^{d}(T)^{C_{d} u^{γ}} \cdot u^{J} \cdot S^{r}(T | h(T))^{C_{r} u} \cdot g(u)\, du} \end{aligned}

(3.2)

In the above expression, $C_{d} = \exp(β_{d}' z^{d})$ and $C_{r} = \exp(β_{r}' z^{r})$ are the proportionality indexes for the death and hospitalization processes, respectively. The function $S_{0}^{d}(t) = \exp\{-\int_{0}^{t} λ_{0}^{d}(s)\, ds\}$ is the baseline survival function for death. The function

\begin{aligned} S^{r}(t | h(t)) = \exp\Bigl\{-\int_{0}^{t} λ^{r}(s | h(s))\, ds\Bigr\} \end{aligned}

is the survival function for hospitalizations given their history. Note that for each $s \in [0, t]$, $h(s)$ only considers the hospitalizations that happened prior to $s$.

Derivation of equation (3.2) relies on the independence of the death and hospitalization processes given the frailty (and covariates) and on applications of Bayes’ rule. The result uses the density of the hospitalization history provided in Theorem 2.1 of Cook and Lawless.³² Mauguen et al.⁷ (p. 5370), in contrast, derive their prediction results using “independence of patient recurrent event times.” In our general model, we relax this assumption so that hospitalization times may be dependent. We refer to Appendix A for more details.
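Equation (3.2) reduces to two one-dimensional integrals over the frailty, which are straightforward to evaluate numerically. The Python sketch below assumes the gamma frailty used in Section 4 (mean one, variance $θ$) and takes cumulative hazards as inputs, so the hospitalization model enters only through $Λ^{r}(T | h(T)) = -\log S^{r}(T | h(T))$. The factor $C_{r}^{J} \prod_{j} λ^{r}(t_{j} | h(t_{j}))$ of the history density does not depend on $u$ and cancels in the ratio, as does the normalizing constant of $g$. The function name, the trapezoid rule on $v = \log u$, and the grid bounds (chosen for moderate parameter values) are our illustrative choices, not part of the original method.

```python
import math

def death_risk(J, cumhaz_r, cumhaz_d_T, cumhaz_d_Tw, C_d, C_r, gamma, theta, n=20_000):
    """P(T^d <= T + w | T^d >= T, z, h(T)) from equation (3.2), gamma frailty.

    J           -- number of hospitalizations in [0, T)
    cumhaz_r    -- Lambda^r(T | h(T)) = -log S^r(T | h(T)); the only model-specific input
    cumhaz_d_T  -- -log S_0^d(T), baseline cumulative hazard for death at T
    cumhaz_d_Tw -- -log S_0^d(T + w)
    theta       -- frailty variance (the gamma frailty has mean 1)
    """
    # Integrate over v = log(u). Up to constants that cancel in the ratio,
    # u^J * S^r(T | h(T))^(C_r u) * g(u) * du/dv = exp(a * v - rate * u).
    a = J + 1.0 / theta
    rate = 1.0 / theta + C_r * cumhaz_r
    v_mode = math.log(a / rate)                 # mode of that kernel in v
    v_lo = v_mode - 1.0 - 50.0 / a              # kernel < e^-50 of its peak below v_lo
    v_hi = v_mode + 1.0 + math.log1p(50.0 / a)  # ... and above v_hi
    step = (v_hi - v_lo) / n

    log_k = [a * (v_lo + k * step) - rate * math.exp(v_lo + k * step) for k in range(n + 1)]
    peak = max(log_k)
    num = den = 0.0
    for k, lk in enumerate(log_k):
        v = v_lo + k * step
        weight = (0.5 if k in (0, n) else 1.0) * math.exp(lk - peak)  # trapezoid rule
        u_gamma = math.exp(min(gamma * v, 700.0))       # u^gamma, clamped against overflow
        s_T = math.exp(-C_d * u_gamma * cumhaz_d_T)     # S_0^d(T)^(C_d u^gamma)
        s_Tw = math.exp(-C_d * u_gamma * cumhaz_d_Tw)   # S_0^d(T + w)^(C_d u^gamma)
        den += weight * s_T
        num += weight * (s_T - s_Tw)
    return num / den

# Hypothetical inputs: same J, different S^r(T | h(T)), positive dependence (gamma > 0).
args = dict(J=2, cumhaz_d_T=0.2, cumhaz_d_Tw=0.4, C_d=1.0, C_r=1.0, gamma=0.73, theta=0.5)
print(death_risk(cumhaz_r=1.0, **args))  # larger S^r(T | h(T))
print(death_risk(cumhaz_r=3.0, **args))  # smaller S^r(T | h(T))
```

With $γ = 0$ the frailty terms cancel and the function returns $1 - \exp\{-C_{d}(Λ_{0}^{d}(T + w) - Λ_{0}^{d}(T))\}$, so the hospitalization history has no effect, which is a convenient sanity check.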

Equation (3.2) is valid for any model of the hazard for the next hospitalization given their history: $λ^{r}$ . This hazard may depend on the history of hospitalizations $h (t)$ in various ways. The choice of the hospitalization model enters into the prediction through the shape of the survival function for hospitalizations given their history $S^{r} (t | h (t))$ . Below, we discuss the Poisson and renewal models.

Poisson model for hospitalizations

The hazard for the next hospitalization at time $t \in [0, T]$ is $λ^{r}(t | h(t)) = λ_{0}^{r}(t)$, where $λ_{0}^{r}$ is the baseline hazard for hospitalizations. That is, the hazard for the next hospitalization does not depend on the history. Therefore, the survival function for hospitalizations at the end of the follow-up period is $S^{r}(T | h(T)) = S_{0}^{r}(T)$, where $S_{0}^{r}(t) = \exp\{-\int_{0}^{t} λ_{0}^{r}(s)\, ds\}$ is the baseline survival function for hospitalizations. Plugging this into equation (3.2) gives equation (5) in Mauguen et al.⁷

Renewal model for hospitalizations

The hazard for the next hospitalization given the history at time $t \in [0, T]$ is $λ^{r}(t | h(t)) = λ_{0}^{r}(t - t_{J(t)})$, where $J(t)$ is the number of hospitalizations before time $t$. For convenience, set $t_{0} = 0$ and $t_{J+1} = T$. Partitioning the follow-up period $[0, T]$ into $J + 1$ inter-hospitalization intervals, and noting that $t_{J(s)} = t_{j-1}$ for $s \in (t_{j-1}, t_{j}]$, leads to

\begin{aligned} S^{r}(T | h(T)) &= \prod_{j=1}^{J+1} \exp\Bigl\{-\int_{t_{j-1}}^{t_{j}} λ_{0}^{r}(s - t_{J(s)})\, ds\Bigr\} = \prod_{j=1}^{J+1} \exp\Bigl\{-\int_{t_{j-1}}^{t_{j}} λ_{0}^{r}(s - t_{j-1})\, ds\Bigr\} \\ &= \prod_{j=1}^{J+1} \exp\Bigl\{-\int_{0}^{t_{j} - t_{j-1}} λ_{0}^{r}(s)\, ds\Bigr\} = \prod_{j=1}^{J+1} S_{0}^{r}(t_{j} - t_{j-1}) \end{aligned}
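The product formula can be checked numerically: integrating the history-dependent hazard $λ_{0}^{r}(s - t_{J(s)})$ over $[0, T]$ should reproduce the product of baseline survival functions evaluated at the gap times. The Python sketch below does this for an illustrative Weibull baseline (shape 1.5, scale 5; both values are hypothetical) and also prints the Poisson value $S_{0}^{r}(T)$, which ignores the hospitalization times. It uses `math.prod` (Python 3.8 or later).

```python
import math

def weibull_hazard(t, shape=1.5, scale=5.0):
    """Illustrative baseline hazard lambda_0^r(t) (hypothetical parameter values)."""
    return shape * t ** (shape - 1.0) / scale ** shape

def weibull_cumhaz(t, shape=1.5, scale=5.0):
    """Lambda_0^r(t) = (t / scale)^shape, the integral of weibull_hazard over [0, t]."""
    return (t / scale) ** shape

def renewal_survival(hosp_times, T):
    """S^r(T | h(T)) as the product of S_0^r over the J + 1 gap times."""
    grid = [0.0, *sorted(hosp_times), T]  # t_0 = 0, t_1, ..., t_J, t_{J+1} = T
    return math.prod(math.exp(-weibull_cumhaz(b - a)) for a, b in zip(grid, grid[1:]))

def renewal_survival_by_integration(hosp_times, T, n=120_000):
    """exp{-integral over [0, T] of lambda_0^r(s - t_{J(s)}) ds}, midpoint rule."""
    times, step, total, j = sorted(hosp_times), T / n, 0.0, 0
    for k in range(n):
        s = (k + 0.5) * step
        while j < len(times) and times[j] < s:  # J(s): hospitalizations strictly before s
            j += 1
        last = times[j - 1] if j > 0 else 0.0   # t_{J(s)}, with t_0 = 0
        total += weibull_hazard(s - last) * step
    return math.exp(-total)

times, T = [4.0, 8.0], 12.0
print(renewal_survival(times, T))                 # product formula
print(renewal_survival_by_integration(times, T))  # direct integration; should agree
print(math.exp(-weibull_cumhaz(T)))               # Poisson model: S_0^r(T), ignores times
```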

In the Poisson model, since $S^{r}$ does not depend on $(t_{1}, \dots, t_{J})$, only the number of hospitalizations $J$ matters for predicting the risk of death. In contrast, in the renewal model, the survival function for hospitalizations given their history depends on the hospitalization gap times $t_{j} - t_{j-1}$. Hence, following equation (3.2), two patients who both have $J = 2$ hospitalizations may have different risks of death. This result adds to the study of the modeling implications of the choice of the timescale.⁸ Indeed, “the choice of the timescale has to be made with the advice of clinicians” (Mauguen et al.,⁷ p. 5373) and “may be driven by features of the underlying process, the objectives of analyses, or the results of model checking” (Cook and Lawless,³² p. 10). This result and the next section contribute to these ends by linking clinical knowledge with features of the model for the hospitalization process.

3.3. Renewal model: Relevance of the distribution of hospitalizations

Under the renewal model, the distribution of hospitalizations plays a significant role in predicting the risk of death. A natural question, then, is which pattern of hospitalization times leads to the highest risk of death: is it when hospitalizations are concentrated, or when they are spread out through the follow-up period? For instance, the clinical hypothesis in the study of COPD patients is that concentrated hospitalizations lead to a higher risk of death. In this section, we show that the answer depends on two features of the model: the relationship between the two processes ($γ$) and the shape of the baseline hazard function for hospitalizations ($λ_{0}^{r}$). A formal proof of the results is presented in Appendix B.

We consider a patient with $J \geq 1$ hospitalizations at times $t_{1}, \dots, t_{J}$; at least one hospitalization is needed for their timing to affect the risk of death. In the renewal model, the survival function of hospitalizations given their history at time $T$ is given by

S^{r} (T | h (T)) = \prod_{j = 1}^{J + 1} S_{0}^{r} (t_{j} - t_{j - 1})

(3.3)

For a fixed number of hospitalizations $J$, the value of the survival function $S^{r}(T | h(T))$ is determined by the gap times $t_{j} - t_{j-1}$. We say that hospitalizations are dispersed when they are equispaced in the $[0, T]$ interval. For instance, a patient with two hospitalizations during the first year of follow-up, one at month 4 and the other at month 8, has dispersed hospitalizations. This translates into three equal gap times of 4 months. Conversely, we regard hospitalizations as concentrated when they are close to each other. For instance, a patient with two hospitalizations at months 8 and 10 has concentrated hospitalizations. This translates into a large gap time of 8 months, followed by two small gap times of 2 months.

Two patients with different distributions of hospitalizations will have a different value of the survival function $S^{r} (T | h (T))$ , even if both have experienced the same number of hospitalizations. To better understand the relationship between the hospitalization pattern and the risk of death, it is worth focusing on the cumulative hazard function for hospitalizations:

\begin{aligned} Λ^{r} (T | h (T)) = \int_{0}^{T} λ^{r} (s | h (s)) d s = \sum_{j = 1}^{J + 1} \int_{t_{j - 1}}^{t_{j}} λ_{0}^{r} (s - t_{J (s)}) d s \end{aligned}

The cumulative hazard function is the area under the hazard function of hospitalizations given their history. Note that there is an inverse relationship between the cumulative hazard and the survival function, $S^{r}(T | h(T)) = \exp\{-Λ^{r}(T | h(T))\}$; cf. equations (1.3) and (1.5) in Aalen et al.³³ The goal is to find which distribution of hospitalizations leads to larger values of the cumulative hazard (i.e. smaller values of the survival function) at time $T$. We show that this depends on whether the baseline hazard for hospitalizations is increasing or decreasing.

Let us consider two patients under two different scenarios for the baseline hazards. During the first year of follow-up ( $T = 12$ months), Patient A has dispersed hospitalizations at times $(t_{1}, t_{2}) = (4, 8)$ . Patient B has concentrated hospitalizations at times $(t_{1}, t_{2}) = (8, 10)$ . In Scenario 1, the baseline hazard for hospitalizations is increasing: $λ_{0}^{r} (t) = t$ , $t \in [0, T]$ . In Scenario 2, the baseline hazard is decreasing: $λ_{0}^{r} (t) = 12 - t$ , $t \in [0, T]$ .

The first column of Figure 1 plots the “timing of the last hospitalization before $t$ ” variables $t_{J_{A} (t)}$ and $t_{J_{B} (t)}$ . For instance, for Patient A (top row), this variable jumps from 0 to 4 at time $t = 4$ , since from that time onwards the last hospitalization happened at time $t_{1} = 4$ . These are the building blocks to compute the hazard for hospitalizations given their history in the renewal model.

Figure 1.

First column: Timing of the last hospitalization before time $t$ for Patient A (top row) and Patient B (bottom row). Second and third columns: Hazard and cumulative hazard for hospitalizations given their history. The second column corresponds to an increasing hazard $λ_{0}^{r} (t) = t$ (Scenario 1) and the third column to a decreasing hazard $λ_{0}^{r} (t) = 12 - t$ (Scenario 2). Top row corresponds to Patient A (dispersed hospitalizations at $t_{1} = 4$ and $t_{2} = 8$ ) and the bottom row to Patient B (concentrated hospitalizations at $t_{1} = 8$ and $t_{2} = 10$ ).

The second and third columns of Figure 1 plot the hazard for hospitalizations given their history, $λ^{r}(t | h(t)) = λ_{0}^{r}(t - t_{J(t)})$, for $t \in [0, T]$. The second column corresponds to an increasing baseline hazard $λ_{0}^{r}(t) = t$ (Scenario 1) and the third column to a decreasing baseline hazard $λ_{0}^{r}(t) = 12 - t$ (Scenario 2). The top row corresponds to Patient A (dispersed hospitalizations) and the bottom row to Patient B (concentrated hospitalizations). The cumulative hazard (the area under the hazard) is highlighted in all plots. When the baseline hazard is increasing (Scenario 1), Patient A accumulates less hazard than Patient B: the values of the cumulative hazard at time $T$ are $24$ and $36$ for Patients A and B, respectively. The opposite happens when the baseline hazard is decreasing (Scenario 2): Patient A accumulates more hazard than Patient B, with cumulative hazards at time $T$ of $120$ and $108$ for Patients A and B, respectively.
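The cumulative hazards in this example follow from summing $Λ_{0}^{r}$ over the gap times, with $Λ_{0}^{r}(x) = x^{2}/2$ in Scenario 1 and $Λ_{0}^{r}(x) = 12x - x^{2}/2$ in Scenario 2 (the integrals of the two baseline hazards). A short Python sketch that recomputes them from the hospitalization times (function names are illustrative):

```python
def renewal_cumhaz(hosp_times, T, baseline_cumhaz):
    """Lambda^r(T | h(T)): sum of Lambda_0^r over the gap times t_j - t_{j-1}."""
    grid = [0.0, *hosp_times, T]  # t_0 = 0 and t_{J+1} = T
    return sum(baseline_cumhaz(b - a) for a, b in zip(grid, grid[1:]))

def cumhaz_scenario1(x):
    return x ** 2 / 2           # integral of lambda_0^r(t) = t

def cumhaz_scenario2(x):
    return 12 * x - x ** 2 / 2  # integral of lambda_0^r(t) = 12 - t

T = 12.0
patients = {"A (dispersed)": [4.0, 8.0], "B (concentrated)": [8.0, 10.0]}
for label, cumhaz in (("Scenario 1", cumhaz_scenario1), ("Scenario 2", cumhaz_scenario2)):
    for patient, times in patients.items():
        print(label, patient, renewal_cumhaz(times, T, cumhaz))
# Scenario 1 A (dispersed) 24.0
# Scenario 1 B (concentrated) 36.0
# Scenario 2 A (dispersed) 120.0
# Scenario 2 B (concentrated) 108.0
```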

The fact that the relationship between the distribution of hospitalizations and the cumulative hazard (and hence the survival function) depends on the shape of the baseline hazard can be generalized beyond the example (cf. Proposition 1 in Appendix B):

If the baseline hazard for hospitalizations is increasing, then dispersed hospitalizations lead to larger values of the survival function for hospitalizations at time $T$ . Conversely, if the baseline hazard is decreasing, then concentrated hospitalizations lead to larger values of the survival function.

Up to this point, we have studied the impact of the distribution of hospitalizations on the value of the survival function for hospitalizations at time $T$. But what is the effect of the latter on the risk of death? Equation (3.2) characterizes how the risk of death depends on the value of the survival function for hospitalizations. We argue that this dependence is determined by the sign of the parameter $γ$, that is, by whether the death and hospitalization processes are positively or negatively related.

As before, it is convenient to study the problem through the lens of the cumulative hazard for hospitalizations. Consider Scenario 1, where Patient A accumulates less hazard than Patient B. Given this fact, how does the model update the expected value of the frailty variable for each patient? Patient B has been “exposed” to a larger cumulative hazard than Patient A, so Patient B was expected to experience more hospitalizations. However, both experienced the same number of hospitalizations. Thus, Patient B is inferred to be less frail than Patient A. In conclusion, among patients with the same number of hospitalizations, those with larger values of the cumulative hazard are expected to have smaller frailty values.

The preceding discussion may be rephrased as “when comparing two patients with the same number of hospitalizations, patients with larger values of the survival function for hospitalizations ( $S^{r} (t | h (t))$ ) are expected to have larger frailty values”. When $γ > 0$ , this translates into a higher risk of death. The opposite happens when $γ < 0$ . Note that this finding does not rely on a specific distribution for the frailty variable (e.g. the gamma distribution), since it rests on how the distribution is updated once hospitalizations are observed. The following result summarizes the above discussion (cf. Proposition 2 in Appendix B):

When the death and hospitalization processes are positively related ( $γ > 0$ ), larger values of $S^{r} (T | h (T))$ imply a higher risk of death. When the relation is negative ( $γ < 0$ ), larger values of $S^{r} (T | h (T))$ imply a lower risk of death. If the processes are unrelated ( $γ = 0$ ), the history of hospitalizations has no effect on the risk of death.
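Under the gamma frailty used in Section 4 (the argument above does not need it), this update has a closed form if one looks only at the hospitalization history: the history contributes $u^{J} \exp\{-u C_{r} Λ^{r}(T | h(T))\}$ to the likelihood, so the posterior of $u$ is gamma with shape $1/θ + J$ and rate $1/θ + C_{r} Λ^{r}(T | h(T))$. The Python sketch below, which for simplicity ignores the survival factor $S_{0}^{d}(T)^{C_{d} u^{γ}}$ in (3.2), shows that for fixed $J$ a larger cumulative hazard lowers the expected frailty. The function name and the values of $θ$ and $Λ^{r}$ are illustrative.

```python
def posterior_frailty_mean(J, cumhaz_r, theta, C_r=1.0):
    """E[u | hospitalization history] under a Gamma(mean 1, variance theta) prior.

    Only the hospitalization likelihood u^J * exp(-u * C_r * cumhaz_r) is used, so the
    posterior is Gamma with shape 1/theta + J and rate 1/theta + C_r * cumhaz_r.
    The survival factor S_0^d(T)^(C_d u^gamma) of equation (3.2) is ignored here.
    """
    return (1.0 / theta + J) / (1.0 / theta + C_r * cumhaz_r)

# Same J = 2, different cumulative hazards (illustrative values, theta = 0.5).
print(posterior_frailty_mean(J=2, cumhaz_r=2.0, theta=0.5))  # 1.0
print(posterior_frailty_mean(J=2, cumhaz_r=4.0, theta=0.5))  # about 0.667
```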

The two results introduced in this section describe how the distribution of hospitalizations affects the risk of death in the renewal model. Table 2 summarizes the main findings (see also Theorem 1 in Appendix B). For instance, when death and hospitalizations are positively related, and the baseline hazard for hospitalizations is decreasing, the risk of death is higher for patients with concentrated hospitalizations.

Table 2.

Hospitalization pattern that leads to a higher risk of death under the renewal model, according to the shape of the baseline hazard for hospitalizations and the relationship between the death and hospitalization processes.

		Relationship between death and hospitalizations
		Positive ( $γ > 0$ )	Negative ( $γ < 0$ )
Baseline hazard ( $λ_{0}^{r}$ )	Increasing	Dispersed	Concentrated
	Decreasing	Concentrated	Dispersed

The results in Table 2 indicate that the renewal model for hospitalizations is flexible enough to provide a wide range of predictions for the risk of death. Depending on the parameters of the model ($γ$ and the shape of the baseline hazard for hospitalizations), either concentrated or dispersed hospitalizations will be associated with a lower risk of death. In practice, this result can be exploited in several ways. In the modeling process, one may incorporate clinical knowledge by imposing parameter restrictions on the model. Alternatively, one can estimate an unrestricted renewal model and then check whether the results match the prior clinical knowledge. We follow the latter approach in Section 4.1.

4. Risk of death given hospitalization history for COPD patients

We fit a joint frailty model for hospitalization and death of COPD patients, with Weibull baseline hazards for death and hospitalization: $λ_{0}^{e}(t) = σ_{e} t^{σ_{e} - 1} / b_{e}^{σ_{e}}$ for $e = d$ (death) and $e = r$ (hospitalizations). Weibull hazards are convenient for this problem because the shape parameter immediately reveals how the distribution of hospitalizations will affect the risk of death: if $σ_{r} > 1$, the baseline hazard for hospitalizations is strictly increasing, and if $σ_{r} < 1$, it is strictly decreasing. We specified a gamma distribution for the frailty variable $u_{i}$, with density $g(u) = u^{1/θ - 1} \exp(-u/θ) / (θ^{1/θ} Γ(1/θ))$; that is, $u_{i}$ has mean one and variance $θ$. In Appendix C, we provide results for alternative baseline hazards (log-logistic and Gompertz) and frailty distributions (log-normal). Results are similar across all specifications.
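As a quick check of this parametrization, the Python sketch below integrates the gamma density numerically for an illustrative $θ = 0.5$ and recovers total mass one, mean one, and variance $θ$. It uses the midpoint rule, which is adequate here because the density is bounded when $θ \leq 1$; the function names are hypothetical.

```python
import math

def gamma_frailty_density(u, theta):
    """g(u) = u^(1/theta - 1) exp(-u/theta) / (theta^(1/theta) Gamma(1/theta)), u > 0."""
    k = 1.0 / theta
    log_g = (k - 1.0) * math.log(u) - u / theta - k * math.log(theta) - math.lgamma(k)
    return math.exp(log_g)

def frailty_moments(theta, upper=60.0, n=100_000):
    """Midpoint-rule mass, mean and variance of g on (0, upper); assumes theta <= 1."""
    step = upper / n
    m0 = m1 = m2 = 0.0
    for k in range(n):
        u = (k + 0.5) * step
        p = gamma_frailty_density(u, theta) * step
        m0 += p
        m1 += u * p
        m2 += u * u * p
    return m0, m1, m2 - m1 ** 2

print(frailty_moments(0.5))  # approximately (1.0, 1.0, 0.5)
```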

We consider both a Poisson (calendar timescale) model and a renewal (gap timescale) model for hospitalizations. In the Poisson model, time is measured in “days since inclusion in the study.” In the renewal model, time is measured in “days since inclusion in the study” for the first hospitalization and in “days since last hospitalization” for subsequent ones. COPD patients tend to have short hospital stays, with a median of 4 days (see Table 1). Therefore, we keep the timescale as “days,” as opposed to “days out of hospital,”² which makes prediction results more interpretable.

Table 3 shows the estimated joint models, considering both renewal and Poisson specifications for the hospitalization process. We include age, sex, DLCO, and forced expiratory volume in 1 second in percentage (${FEV}_{1}$) as common covariates for the hospitalization and death processes, and the number of hospitalizations in the 2 years prior to the first evaluation as an additional covariate for the hospitalization process. This choice was based on the results of previous studies and on statistical significance tests at the 0.05 level. The covariate estimates are similar in the renewal and Poisson models. In particular, in the renewal model, patients with lower values of ${FEV}_{1}$ or DLCO have a higher risk of hospitalization and of death: the ${FEV}_{1}$ and DLCO hazard ratios (HRs) are 0.962 and 0.985 for hospitalization and 0.982 and 0.961 for death, respectively. The risk increases with age (HR = 1.050 and HR = 1.106 per year for hospitalization and death, respectively). The results are adjusted for sex, which was not statistically significant (the confidence intervals for its HRs include 1). The number of hospitalizations in the 2 years prior to joining the study was statistically significant for the risk of hospitalization (HR = 1.750 for one and HR = 2.283 for two or more previous hospitalizations). Hospitalization and death risks are positively related ($γ = 0.730$ and $γ = 0.715$ in the renewal and Poisson models, respectively).

Table 3.
Fit results.

		Renewal model		Poisson model
		HR/Estimate	95% CI	HR/Estimate	95% CI
Death	Age (years)	1.106	(1.069, 1.143)	1.107	(1.07, 1.146)
	Female	0.645	(0.326, 1.277)	0.657	(0.33, 1.309)
	DLCO	0.961	(0.947, 0.976)	0.960	(0.946, 0.975)
	${FEV}_{1}$	0.982	(0.966, 0.999)	0.982	(0.965, 0.998)
	Baseline hazard:
	Scale ( $b_{d}$ )	4487.937	(3297.038, 5678.837)	4313.069	(3195.404, 5430.735)
	Shape ( $σ_{d}$ )	1.723	(1.41, 2.036)	1.748	(1.431, 2.065)
Hospitalizations	Age (years)	1.050	(1.03, 1.071)	1.056	(1.035, 1.078)
	Female	1.387	(0.934, 2.059)	1.406	(0.923, 2.141)
	DLCO	0.985	(0.976, 0.994)	0.983	(0.973, 0.992)
	${FEV}_{1}$	0.962	(0.951, 0.973)	0.959	(0.947, 0.971)
	Prev. hosp $=$ 1	1.750	(1.215, 2.523)	1.859	(1.264, 2.733)
	Prev. hosp $\geq$ 2	2.283	(1.547, 3.368)	2.632	(1.745, 3.97)
	Baseline hazard:
	Scale ( $b_{r}$ )	3520.435	(2485.132, 4555.739)	2651.037	(2055.216, 3246.859)
	Shape ( $σ_{r}$ )	0.857	(0.789, 0.925)	1.142	(1.042, 1.241)
Frailty	$γ$	0.730	(0.359, 1.101)	0.715	(0.381, 1.049)
	Variance ( $θ$ )	1.268	(0.851, 1.685)	1.567	(1.141, 1.993)

Sample: 507 patients, 91 death events, and 473 hospitalization events. Columns: HR = hazard ratio (reported for covariates); Estimate = parameter estimate (reported for scale, shape, and frailty parameters); CI = confidence interval. Covariates: Age, female, DLCO = carbon monoxide diffusing capacity in percentage; ${FEV}_{1}$ = forced expiratory volume in one second in percentage; Prev. hosp = hospitalizations during the 2 years prior to the start of follow-up. Age, DLCO, and ${FEV}_{1}$ are expressed as deviations from the baseline patient. Baseline patient (median): age = 65, sex = male, DLCO = 61.0, ${FEV}_{1}$ = 55.6, Prev. hosp = 0. Log-likelihoods: $- 4452.081$ (renewal) and $- 4455.676$ (Poisson).

Figure 2 displays predictions of the risk of death given the hospitalization history for four different patients. All of them resemble the median patient (65-year-old, male, DLCO of 61.0, ${FEV}_{1}$ of 55.6, and with no hospitalizations prior to follow-up) but experienced a different number of hospitalizations during follow-up, from zero to three (91% of the sample has at most three hospitalizations during follow-up). The figure also shows the “unconditional” risk of death, that is, the risk obtained without conditioning on the hospitalization history.⁷ Predictions were made 2 and 4 years after follow-up began ($T = 2$ and $T = 4$). Each panel displays the risk of death between $T$ and $T + w$. For instance, the risk of death of the median patient who experiences one hospitalization in the first 2 years of follow-up, as predicted by the renewal model, is given by the green line in panel (a). Its value at $T + w = 5$ is $0.1974$: the risk of death between the second and fifth years, conditional on surviving up to the second, is 19.74%. This risk increases to 32.01% for a patient with the same characteristics but three hospitalizations in the first 2 years of follow-up (Figure 2(a)). That is, conditioning on the number of hospitalizations has an impact on the predicted risk of death, with more hospitalizations leading to a higher risk. In addition, the predicted risk is smaller under the renewal model than under the Poisson model (see, e.g., panels (a) and (c) in Figure 2). The difference is most pronounced for patients experiencing a large number of hospitalizations: two or three, corresponding to the red and purple lines, respectively. In the Poisson model, hospitalizations occur independently of each other given the frailty and covariates. Therefore, a large number of hospitalizations can only be interpreted as a sign of a large frailty.
In the renewal model, the risk of hospitalization is explained not only by the frailty variable but also by the time since the last hospitalization. Thus, if a patient has just been hospitalized, the risk of another hospitalization is expected to be high, even if the patient's frailty is not. This is also reflected in the variance of the frailty variable, which is larger for the Poisson model ($θ = 1.268$ and $θ = 1.567$ for the renewal and Poisson models, respectively; see Table 3).

Figure 2.

Prediction of the risk of death given the hospitalization history at different follow-up times. Results for the median patient (65-year-old, male, DLCO of 61.0, ${FEV}_{1}$ of 55.6, and with no hospitalizations prior to follow-up), differing in the number of hospitalizations during follow-up (0 to 3). “Unconditional” means not conditioning on the hospitalization history. The shaded area gives the 90% confidence interval for “Unconditional.” (a) Renewal model. Prediction after 2 years of follow-up. (b) Renewal model. Prediction after 4 years of follow-up. (c) Poisson model. Prediction after 2 years of follow-up. (d) Poisson model. Prediction after 4 years of follow-up. DLCO: carbon monoxide diffusing capacity in percentage; ${FEV}_{1}$ : forced expiratory volume in one second in percentage.

The dynamic prediction approach allows us to approximate the HR for death between different patients at different follow-up times. At follow-up time $T$, the HR between patients with covariates $z$ and $z^{'}$ and histories $h (T)$ and $h^{'} (T)$ is approximated by $P (T, w | z, h (T)) / P (T, w | z^{'}, h^{'} (T))$, where $w$ is a short window (7 days). Table 4 shows the HRs for the median patient at different follow-up times $T$, depending on the number of hospitalizations. The reference is the median patient with one hospitalization. We find that, under both models, the HRs increase as patients experience more hospitalizations. This contrasts with the results reported in Table 4 of Suissa et al.,³ where HRs for death tend to stabilize around 1.5 after the third hospitalization. We believe that the difference comes from the attenuation bias caused by informative censoring in the Cox model fitted by Suissa et al.³ Our joint modeling approach is not subject to this bias.
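Why a short window approximates a hazard ratio can be checked in a simplified setting without frailty. Assuming two hypothetical patients whose death hazards are $c \cdot λ_{0}^{d} (t)$ with $c = 1.5$ and $c = 1$ (Weibull baseline from the renewal model in Table 3, time in days), the ratio of their window probabilities approaches 1.5 as $w$ shrinks, since $P (T, w) \approx c \, λ_{0}^{d} (T) \, w$ for small $w$. In the joint model, the same reasoning applies to the hazards obtained after integrating out the frailty given the history. The function name is hypothetical.

```python
import math


def window_death_prob(T: float, w: float, c: float, shape: float, scale: float) -> float:
    """P(T < T^d <= T + w | T^d > T) for a death hazard c * lambda_0(t) with Weibull lambda_0."""
    def cumhaz(t: float) -> float:
        return c * (t / scale) ** shape
    return -math.expm1(-(cumhaz(T + w) - cumhaz(T)))


# Death baseline from the renewal model (Table 3), time in days.
SHAPE_D, SCALE_D = 1.723, 4487.937
T = 2 * 365.25
for w in (365.25, 30.0, 7.0):
    ratio = (window_death_prob(T, w, 1.5, SHAPE_D, SCALE_D)
             / window_death_prob(T, w, 1.0, SHAPE_D, SCALE_D))
    print(w, round(ratio, 4))  # moves towards the true hazard ratio 1.5 as w shrinks
```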

Table 4.

Hazard ratios at different follow-up times for patients with different numbers of hospitalizations. The reference (denominator) has one hospitalization prior to $T$ . Results for the median patient (65-year-old, male, DLCO of 61.0, ${FEV}_{1}$ of 55.6, and with no hospitalizations prior to follow-up).

		Follow-up time in years ( $T$ )
		1	2	4	6	8
2 hosp.	Renewal	1.395	1.394	1.392	1.393	1.397
	Poisson	1.421	1.434	1.445	1.452	1.458
3 hosp.	Renewal	1.740	1.743	1.741	1.743	1.750
	Poisson	1.768	1.807	1.841	1.857	1.870
4 hosp.	Renewal	2.042	2.056	2.060	2.064	2.074
	Poisson	2.046	2.124	2.203	2.232	2.252
5 hosp.	Renewal	2.300	2.337	2.357	2.363	2.377
	Poisson	2.263	2.387	2.532	2.583	2.612

DLCO: carbon monoxide diffusing capacity in percentage; ${FEV}_{1}$ : forced expiratory volume in one second in percentage.

4.1. Dependence on the distribution of hospitalizations

The key result from Section 3.3 is that, in a renewal model for the hospitalization process, the risk of death given hospitalizations depends on the distribution of the latter. The dependence is characterized by the relationship between the processes (parameter $γ$ ) and the shape of the baseline hazard for hospitalizations (which, in the Weibull case, corresponds to the parameter $σ_{r}$ ). The estimates of these parameters are (see Table 3)

\begin{aligned} \hat{γ} = 0.730 \quad \text{and} \quad \hat{σ}_{r} = 0.857 \end{aligned}

That is, the death and hospitalization processes are positively related, and the baseline hazard for hospitalizations is decreasing in time. Recall that, in the renewal model, time refers to the time since the start of follow-up for the first hospitalization and to the time since the last hospitalization for subsequent ones. By the results of Section 3.3, this combination implies that, for a given number of hospitalizations, the risk of death is lowest when hospitalizations are dispersed.
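The mechanism behind this result can be illustrated numerically. In the renewal model, the hospitalization history enters the posterior frailty density of Appendix A through $u^{J}$ and through $S^{r} (T | h (T))^{C_{r} u} = \exp\{- C_{r} u \sum_{k} Λ_{0}^{r} (x_{k})\}$, where the $x_{k}$ are all gap times, including the open gap ending at $T$. For a fixed $J$, a larger sum shifts the posterior frailty downwards, which lowers the risk of death when $γ > 0$. Because $Λ_{0}^{r} (x) = (x / b_{r})^{σ_{r}}$ is concave for $σ_{r} < 1$, evenly spread gaps give the larger sum. The sketch below compares the Figure 3 patterns for two hospitalizations at $T = 2$ years; the function name is hypothetical, and $σ_{r} = 1.2$ is a hypothetical contrast value, not an estimate.

```python
from typing import Sequence


def renewal_cumhaz(hosp_times: Sequence[float], T: float, shape: float, scale: float) -> float:
    """Sum of Weibull cumulative hazards (x / scale)**shape over all renewal gaps up to T.

    The gaps run from inclusion to the first admission, between admissions, and from
    the last admission to T (open gap). Larger values mean a smaller S^r(T | h(T)).
    """
    total, previous = 0.0, 0.0
    for t in sorted(hosp_times):
        total += ((t - previous) / scale) ** shape
        previous = t
    return total + ((T - previous) / scale) ** shape


T = 2 * 365.25                      # prediction after 2 years, in days
dispersed = [T / 3, 2 * T / 3]      # Figure 3 patterns for two hospitalizations
concentrated = [99 * T / 100, T]
for sigma_r in (0.857, 1.2):        # renewal estimate (Table 3) and a hypothetical sigma_r > 1
    print(sigma_r,
          round(renewal_cumhaz(dispersed, T, sigma_r, 3520.435), 4),
          round(renewal_cumhaz(concentrated, T, sigma_r, 3520.435), 4))
```

With $σ_{r} = 0.857$ the dispersed pattern gives the larger sum, and hence the lower predicted risk; with $σ_{r} > 1$ the ordering reverses.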

Figure 3 shows the prediction of the risk of death given the hospitalization history for four patients with median covariate values. The patients differ in the number of hospitalizations during follow-up (two or three) and in how these are distributed: dispersed (in orange) or concentrated (in green). All panels show that, under the renewal model, the predicted risk of death is higher for patients with concentrated hospitalizations than for those with dispersed ones. In Appendix D, we show how these differences in risk of death between patients with dispersed and concentrated hospitalizations vary across different values of $σ_{r}$.

Figure 3.

Prediction of the risk of death given the hospitalization history at different follow-up times. Results for the median patient (65-year-old, male, DLCO of 61.0, ${FEV}_{1}$ of 55.6, and with no hospitalizations prior to follow-up) differ in the timing of the hospitalizations. Letting $T$ denote the follow-up time, dispersed hospitalizations occur at points $(T / 3, 2 T / 3)$ and $(T / 4, T / 2, 3 T / 4)$ for the cases of two and three hospitalizations, respectively. Concentrated hospitalizations occur at points $(99 T / 100, T)$ and $(98 T / 100, 99 T / 100, T)$ for the cases of two and three hospitalizations, respectively. For the Poisson model, the distribution of hospitalizations is irrelevant; its prediction depends only on the number of hospitalizations in each case. (a) Two hospitalizations. Prediction after 2 years of follow-up. (b) Two hospitalizations. Prediction after 4 years of follow-up. (c) Three hospitalizations. Prediction after 2 years of follow-up. (d) Three hospitalizations. Prediction after 4 years of follow-up. DLCO: carbon monoxide diffusing capacity in percentage; ${FEV}_{1}$ : forced expiratory volume in one second in percentage.

The differences in the risk of death between concentrated and dispersed hospitalizations are sizeable. Table 5 shows the HRs at different follow-up times for patients with different distributions of hospitalizations. Note that, even after conditioning on clinical and sociodemographic variables, the distribution of hospitalizations is an important risk factor for death. For instance, at the fourth year, the HR of three concentrated versus three dispersed hospitalizations (HR = 1.050) is roughly equivalent to the HR associated with being 6 months older, holding the remaining clinical characteristics fixed. In addition, for a given number of hospitalizations, the HR of concentrated versus dispersed hospitalizations increases with follow-up time.
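The age equivalence can be checked with a one-line calculation, assuming that age enters the death hazard log-linearly with the per-year HR of 1.106 from Table 3: an HR of 1.050 then corresponds to $\log (1.050) / \log (1.106) \approx 0.48$ years, that is, about 5.8 months. The helper name is hypothetical.

```python
import math


def equivalent_age_years(hr: float, hr_per_year: float) -> float:
    """Age difference (in years) whose HR equals `hr` when age enters the hazard log-linearly."""
    return math.log(hr) / math.log(hr_per_year)


years = equivalent_age_years(1.050, 1.106)  # Table 5 (T = 4, three hospitalizations) vs age HR for death
print(round(12 * years, 1))  # 5.8 (months)
```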

Table 5.

Hazard ratios at different follow-up times for patients with different distributions of hospitalizations (renewal model). For a fixed number of hospitalizations, we compute the hazard ratio of concentrated hospitalizations (numerator) over dispersed hospitalizations (denominator). Results for the median patient (65-year-old, male, DLCO of 61.0, ${FEV}_{1}$ of 55.6, and with no hospitalizations prior to follow-up).

	Follow-up time in years ( $T$ )
	1	2	4	6	8
2 hosp.	1.017	1.027	1.041	1.048	1.052
3 hosp.	1.019	1.033	1.050	1.059	1.065
4 hosp.	1.02	1.035	1.056	1.068	1.074
5 hosp.	1.019	1.035	1.06	1.074	1.081

DLCO: carbon monoxide diffusing capacity in percentage; ${FEV}_{1}$ : forced expiratory volume in one second in percentage.

We propose a statistical test of whether the distribution of hospitalizations affects the risk of death. The distribution of hospitalizations is irrelevant if $γ = 0$ (no dependence between death and hospitalizations) or if the baseline hazard for hospitalizations $λ_{0}^{r} (t)$ is constant. For a Weibull hazard, the second condition translates into $σ_{r} = 1$. That is, the null hypothesis of no relevance of the distribution of hospitalizations is $H_{0} : γ = 0 \text{ or } σ_{r} = 1$. We construct a Wald test, based on the $Δ$-method covariance matrix, for the equivalent hypothesis $H_{0} : γ \cdot (σ_{r} - 1) = 0$ versus $H_{1} : γ \cdot (σ_{r} - 1) \neq 0$. Let $θ$ denote the vector of all model parameters (with a slight abuse of notation, since $θ$ also denotes the frailty variance), with $γ$ and $σ_{r}$ in the first and second positions, respectively. Following Shao,³⁴ p. 433, the test statistic is

\begin{aligned} \hat{W} \equiv \frac{R (\hat{θ})^{2}}{\nabla R (\hat{θ})^{⊤} \hat{Σ} \nabla R (\hat{θ})} \end{aligned}

where $R (θ) = γ \cdot (σ_{r} - 1)$, $\nabla R (θ) = (σ_{r} - 1, γ, 0, \dots, 0)^{⊤}$ is the gradient of $R$, $\hat{θ}$ is the maximum likelihood estimator of the parameters, and $\hat{Σ}$ is an estimator of the asymptotic covariance matrix of $\hat{θ}$. Under $H_{0}$, $\hat{W} \overset{d}{\to} χ_{1}^{2}$. For our fit of the renewal model, we obtain $\hat{W} = 6.3$ ($p$-value of 0.012). Thus, we reject the null hypothesis of no effect of the distribution of hospitalizations on the risk of death at the 0.05 level. This test can be extended to other baseline hazards, provided one can test whether $λ_{0}^{r} (t) = c$ for all $t \in [0, \infty)$.
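A minimal Python sketch of this Wald test, assuming the $2 \times 2$ block of $\hat{Σ}$ for $(\hat{γ}, \hat{σ}_{r})$ is available from the fitted model. That block is not reported in Table 3, so the example only checks the conversion of the reported $\hat{W} = 6.3$ into its $p$-value, using $P (χ_{1}^{2} > w) = \operatorname{erfc} (\sqrt{w / 2})$. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_1_pvalue(w: float) -> float:
    """Upper tail P(chi^2_1 > w) = P(|Z| > sqrt(w)) for a standard normal Z."""
    return math.erfc(math.sqrt(w / 2.0))


def wald_product_test(gamma: float, sigma_r: float,
                      cov: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Delta-method Wald test of H0: gamma * (sigma_r - 1) = 0.

    cov is the 2x2 block of the estimated covariance matrix for (gamma_hat, sigma_r_hat);
    the other parameters drop out because the gradient of R is zero in their positions.
    """
    r = gamma * (sigma_r - 1.0)
    grad = (sigma_r - 1.0, gamma)
    var_r = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))
    w = r * r / var_r
    return w, chi2_1_pvalue(w)


print(round(chi2_1_pvalue(6.3), 3))  # 0.012, matching the reported p-value for W = 6.3
```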

4.2. Evaluation of prediction accuracy

We evaluate the prediction accuracy of both the renewal and the Poisson models. To that end, we compute the time-dependent Brier score, which measures the squared loss between the “survived up to $T + w$” indicator and the predicted probability of surviving up to $T + w$, conditional on being alive at time $T$.³⁵ We use an inverse probability of censoring weighted estimator to account for censoring.³⁶ In addition, we evaluate the area under the curve (AUC) of the time-dependent receiver operating characteristic (ROC).¹⁰ The AUC is the probability that, for a randomly chosen pair of patients, the predicted risk of the patient who died before $T + w$ exceeds that of the patient who survived up to $T + w$. It thus measures the performance of the predicted risk $P (T, w | z, h (T))$ when used to classify patients as “high death risk” or “low death risk.” We use the conditional inverse probability of censoring weighting (CIPCW) estimator in Blanche et al.³⁷ To estimate both metrics, we use 10-fold cross-validation: we divide the sample into 10 folds and estimate the model 10 times, each time leaving out the patients in one of the folds. To predict the risk of death for a patient, we use the model estimated without the observations in that patient's fold.⁷
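The Brier score computation can be sketched as follows. This is a simplified illustration, not necessarily the exact estimator of the cited references: it assumes that censoring is independent of covariates (so a marginal Kaplan-Meier estimate of the censoring survival $G$ is used), weights deaths in $(T, T + w]$ by $G (T) / G (t^{-})$ and survivors by $G (T) / G (T + w)$, and gives zero weight to patients censored inside the window. Function names and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def km_censoring_survival(times: Sequence[float], events: Sequence[int]) -> Callable[..., float]:
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censorings (event == 0) as events."""
    steps: List[Tuple[float, float]] = []
    g = 1.0
    for c in sorted({t for t, e in zip(times, events) if e == 0}):
        at_risk = sum(1 for t in times if t >= c)
        n_cens = sum(1 for t, e in zip(times, events) if t == c and e == 0)
        g *= 1.0 - n_cens / at_risk
        steps.append((c, g))

    def G(t: float, left_limit: bool = False) -> float:
        value = 1.0
        for c, g_c in steps:  # steps are sorted by time
            if c < t or (c == t and not left_limit):
                value = g_c
        return value

    return G


def ipcw_brier(T: float, w: float, times: Sequence[float], events: Sequence[int],
               risk: Sequence[float]) -> float:
    """IPCW Brier score for death in (T, T + w] among patients still under observation at T.

    risk[i] is the predicted probability of death in the window, e.g. P(T, w | z_i, h_i(T)).
    """
    G = km_censoring_survival(times, events)
    horizon, g_T = T + w, G(T)
    total, n_at_risk = 0.0, 0
    for t, e, p in zip(times, events, risk):
        if t <= T:
            continue  # died or censored before the prediction time
        n_at_risk += 1
        if t <= horizon and e == 1:   # died in the window: outcome 1
            total += (1.0 - p) ** 2 * g_T / G(t, left_limit=True)
        elif t > horizon:             # still alive at T + w: outcome 0
            total += p ** 2 * g_T / G(horizon)
        # censored inside the window: weight 0 (accounted for by the other weights)
    if n_at_risk == 0:
        raise ValueError("no patients under observation at T")
    return total / n_at_risk


# Toy data: follow-up times in years, 1 = death, 0 = censored.
times = [0.8, 1.5, 2.2, 3.0, 3.6, 4.5]
events = [1, 1, 0, 1, 0, 0]
risk = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
print(ipcw_brier(1.0, 3.0, times, events, risk))
```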

Figure 4 shows the results for the renewal and Poisson models. As a benchmark, we have included a Weibull failure time regression model with the number of hospitalizations $J_{i} (t)$ as a time-dependent covariate (on top of age, sex, DLCO, and ${FEV}_{1}$ ).³⁸ This model resembles the one considered in Suissa et al.,³ but with a parametric baseline hazard. Predictions are made after 1 and 2 years of follow-up ( $T$ ). The prediction horizon is 4 years (58% of our sample is alive and uncensored at that time). Smaller values of the Brier score indicate a higher prediction accuracy. For comparison, the Brier score of the constant predictor $P (T, w | z, h (T)) = 0.5$ is $0.25$ .³⁵ On the other hand, higher values of the AUC indicate a higher discrimination capacity. For comparison, the AUC of a random classifier (independent of the risk of the given patient) is $0.5$ .
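For intuition about the AUC and its 0.5 benchmark, the pairwise definition can be sketched without censoring. The CIPCW estimator used in the paper additionally reweights pairs to account for censoring, which this hypothetical helper omits.

```python
from typing import Sequence


def empirical_auc(risk_died: Sequence[float], risk_survived: Sequence[float]) -> float:
    """Share of (died, survived) pairs in which the patient who died in (T, T + w] had the higher
    predicted risk; ties count one half. Assumes complete follow-up up to T + w (no censoring)."""
    score = 0.0
    for a in risk_died:
        for b in risk_survived:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(risk_died) * len(risk_survived))


# A constant predictor cannot rank patients, so every pair is a tie and the AUC is 0.5.
print(empirical_auc([0.2, 0.2], [0.2, 0.2, 0.2]))  # 0.5
```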

Figure 4.

Time-dependent Brier score and AUC for three models: renewal, Poisson, and regression model with number of hospitalizations as time-dependent covariate. Predictions are made in intervals of 4 months, from follow-up times $T = 1$ and $T = 2$ years up to $T + w = 4$ years. (a) Brier score and (b) AUC–ROC. AUC: area under the curve; ROC: receiver operating characteristic.

Figure 4 shows that prediction under the joint modeling approach outperforms the inclusion of the number of hospitalizations as a time-dependent covariate, both in terms of Brier score and AUC. The renewal and Poisson models perform similarly: for the dataset at hand, the renewal model has slightly better Brier scores (prediction accuracy), while the Poisson model has slightly better AUC (discrimination capacity). Prediction accuracy decreases with the prediction window $w$ for all methods (larger Brier scores), and discrimination capacity increases mildly with it. Moreover, the later prediction time $T = 2$ leads to a small decrease in accuracy (larger Brier scores) and a small increase in discrimination (larger AUC). Nonetheless, results regarding the evolution of performance over the prediction horizon ($w$) should be interpreted with caution, as the available sample shrinks as the horizon grows.

5. Discussion

We have developed a general dynamic prediction framework for the risk of a terminal event (death) given the recurrent event (hospitalization) history in a joint frailty model. The result is valid for any model for the recurrent event process, allowing for any dependence structure between recurrence times. This contributes to the literature on the dynamic prediction of terminal events given recurrent events. In particular, Mauguen et al.⁷ obtained a prediction result solely for a Poisson process for the recurrent event (assuming independence between the recurrent event times). On the other hand, Liang et al.¹⁰ obtained prediction results complementary to this proposal, relaxing the conditional independence between the terminal and recurrent events. Nevertheless, they assumed independence between recurrence gap times (renewal model).

We have studied how the distribution of hospitalizations throughout the follow-up period determines the risk of death when hospitalizations are modeled following a renewal process. In contrast to the Poisson case, where only the number of hospitalizations during follow-up matters, we have found that the risk of death depends on the gap times between hospitalizations. The dependence between the risk of death and the distribution of hospitalizations is characterized by two features: whether the two processes are positively or negatively related (parameter $γ$) and whether the baseline hazard for hospitalizations is increasing or decreasing. These results link clinical knowledge with features of the hospitalization process and can therefore be helpful in the modeling process. In fact, Suissa et al.³ observed that the time between hospitalizations of patients with COPD decreases over time, so that the most concentrated hospitalizations occur at the end of follow-up. In this work, we propose a methodology that allows us to show that, in our COPD cohort and under the renewal model, patients with concentrated hospitalizations have a higher predicted risk of death than those with dispersed hospitalizations.

Time-dependent external covariates could be included in the hazard for either of the two processes. Note that time-dependent covariates are usually assumed to be constant between hospitalizations (see Cook and Lawless,³² p. 65). Under that assumption, one could generalize our results to allow for external time-dependent covariates in the hospitalization process. Time-dependent covariates in the death process pose an additional obstacle: the value of the covariates must be known over the time interval $[T, T + w]$, and this information is not generally available at the prediction point $T$. Hence, to predict with external time-dependent covariates in the death process, the researcher must be willing to assume that the value of these covariates remains unchanged over the whole interval $[T, T + w]$.

We applied our methodology to a dataset of patients with COPD. We found that the risk of death is generally higher when the recurrent hospitalization process is modeled using the Poisson specification. We believe that this is a consequence of hospitalization times being independent in the Poisson model. We see that, in both models, the number of hospitalizations greatly impacts the risk of death, in line with previous results.^3,4,20 In this study, we advance the understanding of COPD’s evolution through several contributions. First, within the renewal modeling framework, we identify the distribution of hospitalizations as a significant risk factor. Second, by employing a joint modeling approach, we effectively mitigate attenuation bias—arising from the fact that patients who survive longer are inherently more likely to accumulate hospital admissions—thus revealing HRs that increase in the number of hospitalizations over the follow-up. Finally, the use of a dynamic prediction framework enables real-time estimation of mortality risk, continuously updated with each new recurrent event throughout the follow-up period.

In our application to COPD data, we considered a Weibull model for the baseline hazards. The hazard of this model is monotone, with a single parameter determining whether it is increasing or decreasing. This eases the characterization of the dependence between the risk of death and the distribution of hospitalizations, as it is reduced to two parameters. Appendix C provides fits for various combinations of the Weibull, Gompertz, and log-logistic hazards. In all the renewal models, the baseline hazard for hospitalization appears to be monotone decreasing.

If the practitioner is uncertain about the baseline hazard being monotonic, we recommend fitting a semiparametric joint model (e.g. with spline approximations of the baseline hazards).³⁰ If the fit for the baseline hazard for hospitalizations is non-monotonic, a general conclusion about whether a concentrated or dispersed history of hospitalizations leads to a higher risk cannot be drawn. Nevertheless, our modeling framework remains applicable and valuable: it allows for individualized risk predictions based on each patient’s characteristics and event history. These personalized predictions can be used to compare the death risk of two patients with differing hospitalization trajectories.

In the COPD study, hospitalizations were modeled as instantaneous events due to their short duration in our dataset (median length of stay: 4 days; interquartile range: 3–7 days). If hospital stays were longer or more variable, an alternative approach could be considered within the renewal framework. It consists of measuring time from the discharge of the previous hospitalization to the admission of the next. The primary distinction between this approach and the one used in the paper lies in the interpretation of the inter-hospitalization period: whether it includes or excludes time spent hospitalized.

There are different avenues for future work. Our results correspond to Setting 1 in Mauguen et al.⁷—we believe that our results could be readily extended to cover Setting 2. Moreover, to increase prediction accuracy, it may be of interest to consider the evolution of different biomarkers alongside the history of hospitalizations. The present paper has solely considered a joint model for the death and hospitalization process. Nevertheless, since biomarker data is often available for chronic patients, it is appealing to include it in the model. This requires proposing a joint model for a terminal, a recurrent, and a longitudinal outcome, as in Król et al.³⁹ A promising line of research is to simultaneously relax conditional independence between the terminal and recurrent events, as in Liang et al.,¹⁰ and allow for dependence of the recurrence times.

Additionally, our study of the renewal model for hospitalizations has revealed some of its limitations. In the renewal model, the distribution of hospitalizations affects the risk of death only through the gap times between hospitalizations. Say, for instance, that we have followed two patients for one year. The first was hospitalized at months 1 and 2; the second, at months 10 and 11. Counting the time from inclusion to the first hospitalization and from the last hospitalization to the end of follow-up, both patients have the same gap times: one long gap of 10 months and two short gaps of 1 month. The model thus predicts the same risk of death for both patients (given equal covariates), which may not be entirely plausible from a clinical perspective.
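The example can be verified directly. Since the prediction in Appendix A depends on the history only through $J$ and the sum of $Λ_{0}^{r}$ over the gap times, any two histories with the same multiset of gaps (including the open gap at the end of follow-up) yield the same prediction. A hypothetical helper that lists the gaps in months:

```python
from typing import List, Sequence


def gap_multiset(hosp_months: Sequence[float], follow_up: float) -> List[float]:
    """Sorted renewal gap times, including the open gap up to the end of follow-up."""
    gaps: List[float] = []
    previous = 0.0
    for t in sorted(hosp_months):
        gaps.append(t - previous)
        previous = t
    gaps.append(follow_up - previous)
    return sorted(gaps)


early = gap_multiset([1.0, 2.0], follow_up=12.0)
late = gap_multiset([10.0, 11.0], follow_up=12.0)
print(early, late, early == late)  # [1.0, 1.0, 10.0] [1.0, 1.0, 10.0] True
```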

In view of this result, it may be of interest to study more complex models for the hospitalization process. Indeed, one may think that the hospitalization process should have characteristics from both the Poisson and a renewal model.^8,40 For instance, the hazard for the next hospitalization may be modeled as:

\begin{aligned} λ^{r} (t | h (t)) = \sum_{k = 1}^{K} c_{k} λ_{0}^{r} (t - t_{J (t)}) 1 (τ_{k - 1} \leq t < τ_{k}) \end{aligned}

where $c_{k}$ are proportionality constants and $τ_{k}$ are thresholds. Alternatively, one could consider that the hazard for hospitalization depends on the number of previous hospitalizations:

\begin{aligned} λ^{r} (t | h (t)) = λ_{0, J (t)}^{r} (t - t_{J (t)}) \end{aligned}

where $(λ_{0, j}^{r})_{j = 0}^{\bar{J}}$ is a collection of $\bar{J} + 1$ baseline hazard functions. These models incorporate differences between hospitalizations at the beginning and at the end of the follow-up period. Our prediction result in equation (3.2) is valid for arbitrary models of the hospitalization process and is therefore amenable to such extension. The study of these extensions and the characterization of the risk of death given different distributions of hospitalizations is left for future work.
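Both alternatives can be sketched as hazard functions of the current time and the admission history. This is an illustrative sketch only: the function names, the values of $c_{k}$ and $τ_{k}$, and the extra Weibull baseline hazards in the usage example are hypothetical, and for the second model we assume that patients with more than $\bar{J}$ previous hospitalizations keep using the last hazard. With $K = 1$ and $c_{1} = 1$, or with a single baseline hazard, both reduce to the renewal model.

```python
import bisect
from typing import Callable, Sequence

Hazard = Callable[[float], float]


def piecewise_renewal_hazard(t: float, hosp_times: Sequence[float], base: Hazard,
                             c: Sequence[float], tau: Sequence[float]) -> float:
    """sum_k c_k * base(t - t_J(t)) * 1(tau_{k-1} <= t < tau_k), with tau = [tau_0, ..., tau_K]."""
    k = bisect.bisect_right(tau, t)  # so that tau[k-1] <= t < tau[k]
    if k == 0 or k > len(c):
        raise ValueError("t lies outside [tau_0, tau_K)")
    last = max((s for s in hosp_times if s < t), default=0.0)  # last admission before t, or inclusion
    return c[k - 1] * base(t - last)


def count_specific_hazard(t: float, hosp_times: Sequence[float],
                          bases: Sequence[Hazard]) -> float:
    """lambda_{0,J(t)}(t - t_J(t)); counts above len(bases) - 1 reuse the last hazard (assumption)."""
    previous = [s for s in hosp_times if s < t]
    j = min(len(previous), len(bases) - 1)
    last = max(previous, default=0.0)
    return bases[j](t - last)


def weibull(shape: float, scale: float) -> Hazard:
    """Weibull baseline hazard as a function of the gap time."""
    return lambda x: shape * x ** (shape - 1.0) / scale ** shape


base = weibull(0.857, 3520.435)
print(piecewise_renewal_hazard(400.0, [100.0, 350.0], base, c=[1.0, 1.5], tau=[0.0, 365.0, 1e9]))
print(count_specific_hazard(400.0, [100.0, 350.0], [base, weibull(0.9, 3000.0), weibull(0.8, 2500.0)]))
```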

Finally, it is worth mentioning that the software developed to implement the methodological proposal presented in this paper is available at the corresponding author’s GitHub page.

Footnotes

ORCID iDs

Telmo Pérez-Izquierdo

Irantzu Barrio

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was financially supported in part by grants from the Departamento de Educación, Política Lingüística y Cultura del Gobierno Vasco IT1456-22, IT1866-26 and BERC 2022-2025 program, by the Ministry of Science and Innovation through BCAM Severo Ochoa accreditation CEX2021-001142-S/MICIN/AEI/10.13039/501100011033, by MICIU/AEI/10.13039/2501100011033 and FEDER, UE, PID2024-156800OB-I00, by the BMTF “Mathematical Modeling Applied to Health” Project and the Network for Research on Chronicity, Primary Care, and Health Promotion (RICAPPS RD21/0016/0017 and RD21/0016/001) and grant PI13/02352 of the Instituto de Salud Carlos III. The COPD study received an unrestricted grant from Laboratorios Menarini.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Appendix A Risk of death given the hospitalization history

The probability of interest is

\begin{aligned} P (T, w | z, h (T)) = P (T_{i}^{d} \leq T + w | T_{i}^{d} > T, Z_{i} = z, H_{i} (T) = h (T)) \end{aligned}

for $T, w \geq 0$ and hospitalization history $h (T) = (J, (t_{j})_{j = 1}^{J})$. As in Mauguen et al.,⁷ conditioning on the frailty variable one gets

\begin{aligned} P (T, w | z, h (T)) \\ = \int_{0}^{\infty} P (T_{i}^{d} \leq T + w | T_{i}^{d} > T, Z_{i} = z, H_{i} (T) = h (T), u_{i} = u) \cdot g (u | T_{i}^{d} > T, Z_{i} = z, H_{i} (T) = h (T)) d u \end{aligned}

We deal with the terms inside the integral separately.

First, using the conditional independence of the two processes given frailty and covariates:

\begin{aligned} P (T_{i}^{d} \leq T + w | T_{i}^{d} > T, Z_{i}, H_{i} (T), u_{i}) & = \frac{P (T < T_{i}^{d} \leq T + w | Z_{i}, H_{i} (T), u_{i})}{P (T_{i}^{d} > T | Z_{i}, H_{i} (T), u_{i})} \\ & = \frac{P (T < T_{i}^{d} \leq T + w | Z_{i}, u_{i})}{P (T_{i}^{d} > T | Z_{i}, u_{i})} \\ & = \frac{P (T_{i}^{d} > T | Z_{i}, u_{i}) - P (T_{i}^{d} > T + w | Z_{i}, u_{i})}{P (T_{i}^{d} > T | Z_{i}, u_{i})} \end{aligned}

Furthermore, the survival function for death given covariates and the frailty term takes the following shape:

\begin{aligned} P (T_{i}^{d} > T | Z_{i}, u_{i}) = \exp \left\{ - \int_{0}^{T} α^{d} (s | F_{i} (s)) d s \right\} \end{aligned}

This can be obtained by applying Theorem 2.1 in Cook and Lawless³² to the case of “0 death events before $T$,” noting that (i) “0 death events before $T$” is equivalent to $T_{i}^{d} > T$ and (ii) $F_{i} (0)$ is the $σ$-algebra generated by $(Z_{i}, u_{i})$. By the model in (3.1), the above equation becomes

\begin{aligned} P (T_{i}^{d} > T | Z_{i}, u_{i}) & = \exp \left\{ u_{i}^{γ} \cdot \exp (β_{d}^{'} Z_{i}^{d}) \cdot \left( - \int_{0}^{T} λ_{0}^{d} (s) d s \right) \right\} \\ & = \exp \left\{ u_{i}^{γ} \cdot \exp (β_{d}^{'} Z_{i}^{d}) \cdot \log S_{0}^{d} (T) \right\} \\ & = \exp \left\{ \log \left( S_{0}^{d} (T)^{C_{d i} u_{i}^{γ}} \right) \right\} = S_{0}^{d} (T)^{C_{d i} u_{i}^{γ}} \end{aligned} \tag{A.1}

where $C_{d i} = \exp (β_{d}^{'} Z_{i}^{d})$. Thus, the first part of the integral is

\begin{aligned} P (T_{i}^{d} \leq T + w | T_{i}^{d} > T, Z_{i}, H_{i} (T), u_{i}) = \frac{S_{0}^{d} {(T)}^{C_{d i} u_{i}^{γ}} - S_{0}^{d} (T + w)^{C_{d i} u_{i}^{γ}}}{S_{0}^{d} {(T)}^{C_{d i} u_{i}^{γ}}} \end{aligned}

To compute the second part of the integral, we use Bayes’ rule to write the conditional density of the frailty variable as

\begin{aligned} g (u | T_{i}^{d} > T, Z_{i} = z, H_{i} (T) = h (T)) & = \frac{P (T_{i}^{d} > T, H_{i} (T) = h (T) | Z_{i} = z, u_{i} = u) g (u)}{\int_{0}^{\infty} P (T_{i}^{d} > T, H_{i} (T) = h (T) | Z_{i} = z, u_{i} = u) g (u) d u} \end{aligned} \tag{A.2}

where we rely on $Z_{i}$ and $u_{i}$ being independent. Moreover, by the conditional independence of the two processes given frailty and covariates:

\begin{aligned} P (T_{i}^{d} > T, H_{i} (T) = h (T) | Z_{i}, u_{i}) = P (T_{i}^{d} > T | Z_{i}, u_{i}) \cdot P (H_{i} (T) = h (T) | Z_{i}, u_{i}) \end{aligned}

The first term in the product follows from equation (A.1). We apply Theorem 2.1 in Cook and Lawless³² to obtain the second term:

\begin{aligned} P (H_{i} (T) = h (T) | Z_{i}, u_{i}) = \left( \prod_{j = 1}^{J} α^{r} (t_{j} | F_{i} (t_{j})) \right) \cdot \exp \left\{ - \int_{0}^{T} α^{r} (s | F_{i} (s)) d s \right\} \end{aligned}

Then, by the model in (3.1),

\begin{aligned} \prod_{j = 1}^{J} α^{r}(t_{j} | F_{i}(t_{j})) & = u_{i}^{J} C_{r i}^{J} \prod_{j = 1}^{J} λ^{r}(t_{j} | H_{i}(t_{j})) \quad \text{and} \\ \exp\left\{-\int_{0}^{T} α^{r}(s | F_{i}(s))\, ds\right\} & = S^{r}(T | h(T))^{C_{r i} u_{i}} \end{aligned}

where $C_{r i} = \exp(β_{r}^{'} Z_{i}^{r})$. The result in the second row follows mutatis mutandis from equation (A.1).
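For a renewal model, in which the baseline intensity depends on the history only through the time elapsed since the most recent hospitalization (or since time 0 if there is none), $\log S^{r}(T | h(T))$ is minus the sum of the baseline cumulative hazards over the completed gaps and the open gap $(t_{J}, T]$; under the Poisson model, where $λ^{r}(t | H_{i}(t)) = λ_{0}^{r}(t)$, it is simply $\log S_{0}^{r}(T)$. The sketch below assumes a hypothetical Weibull baseline and illustrative times, and shows that two histories with the same $J$ give different values of $S^{r}(T | h(T))$ under the renewal model but identical values under the Poisson model.

```python
from typing import Sequence


def weibull_cum_hazard(t: float, scale: float, shape: float) -> float:
    """Hypothetical Weibull cumulative baseline hazard Lambda_0^r(t) = (t / scale) ** shape."""
    return (t / scale) ** shape


def log_sr_renewal(event_times: Sequence[float], T: float, scale: float, shape: float) -> float:
    """log S^r(T | h(T)) when the clock restarts at each hospitalization:
    minus the cumulative hazard summed over (0, t_1], ..., (t_{J-1}, t_J], (t_J, T]."""
    times = sorted(event_times)
    starts = [0.0] + times
    ends = times + [T]
    return -sum(weibull_cum_hazard(e - s, scale, shape) for s, e in zip(starts, ends))


def log_sr_poisson(event_times: Sequence[float], T: float, scale: float, shape: float) -> float:
    """log S^r(T | h(T)) = log S_0^r(T): the hospitalization times do not enter."""
    return -weibull_cum_hazard(T, scale, shape)


spread = [250.0, 500.0, 750.0]     # J = 3, evenly spaced over T = 1000
clustered = [900.0, 920.0, 940.0]  # J = 3, concentrated near T
for name, hist in [("spread", spread), ("clustered", clustered)]:
    print(name, log_sr_renewal(hist, 1000.0, 4000.0, 0.85), log_sr_poisson(hist, 1000.0, 4000.0, 0.85))
```

With shape 1 (an exponential baseline) the two models coincide, because the gaps always sum to $T$; other shapes generally make the spacing of hospitalizations, and not only their number, enter the prediction.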

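We can now plug the above results into equation (A.2). A key aspect of the model is that the factor $C_{r i}^{J} \prod_{j = 1}^{J} λ^{r}(t_{j} | H_{i}(t_{j}))$ does not depend on $u_{i}$: it can be taken outside the integral in the denominator and cancels with the same factor in the numerator. Writing $C_{d}$ and $C_{r}$ for $C_{d i}$ and $C_{r i}$ evaluated at $Z_{i} = z$, we obtain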

\begin{aligned} g(u | T_{i}^{d} > T, Z_{i} = z, H_{i}(T) = h(T)) = \frac{S_{0}^{d}(T)^{C_{d} u^{γ}} \cdot u^{J} \cdot S^{r}(T | h(T))^{C_{r} u} \cdot g(u)}{\int_{0}^{\infty} S_{0}^{d}(T)^{C_{d} u^{γ}} \cdot u^{J} \cdot S^{r}(T | h(T))^{C_{r} u} \cdot g(u)\, du} \end{aligned}
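In general the normalizing integral has no closed form, so a simple way to evaluate this density is on a grid of frailty values. The sketch below assumes, for illustration only, a gamma frailty with mean 1 and variance $θ$ for $g$; that choice, the grid settings, the function names and the input values are all hypothetical. Inputs are $\log S_{0}^{d}(T)$ and $\log S^{r}(T | h(T))$ (from either hospitalization model), $J$, $C_{d}$, $C_{r}$, $γ$ and $θ$. The log of the integrand is computed first, and its maximum is subtracted before exponentiating to avoid underflow.

```python
import math
from typing import List, Tuple


def gamma_log_density(u: float, theta: float) -> float:
    """Hypothetical frailty density g: Gamma with shape 1/theta, scale theta (mean 1, variance theta)."""
    k = 1.0 / theta
    return (k - 1.0) * math.log(u) - u / theta - math.lgamma(k) - k * math.log(theta)


def frailty_posterior(log_s0d_T: float, log_sr_T: float, J: int, c_d: float, c_r: float,
                      gamma: float, theta: float, u_max: float = 20.0,
                      n: int = 4000) -> Tuple[List[float], List[float]]:
    """Midpoint-grid approximation of g(u | T^d > T, Z = z, H(T) = h(T)) on (0, u_max)."""
    step = u_max / n
    grid = [(m + 0.5) * step for m in range(n)]
    # log of S_0^d(T)^(C_d u^gamma) * u^J * S^r(T | h(T))^(C_r u) * g(u)
    logs = [c_d * u ** gamma * log_s0d_T + J * math.log(u) + c_r * u * log_sr_T
            + gamma_log_density(u, theta) for u in grid]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    norm = step * sum(weights)  # midpoint approximation of the denominator, rescaled by exp(-top)
    return grid, [wt / norm for wt in weights]


def posterior_mean(grid: List[float], dens: List[float]) -> float:
    step = grid[1] - grid[0]
    return step * sum(u * d for u, d in zip(grid, dens))


# Hypothetical inputs: identical survival terms, with and without three hospitalizations.
inputs = dict(log_s0d_T=-0.1, log_sr_T=-0.5, c_d=1.0, c_r=1.0, gamma=0.7, theta=1.3)
print(posterior_mean(*frailty_posterior(J=0, **inputs)),
      posterior_mean(*frailty_posterior(J=3, **inputs)))
```

Holding the other inputs fixed, the factor $u^{J}$ shifts the frailty density towards larger values as $J$ grows.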

We conclude by putting all the results together to obtain equation (3.2):

\begin{aligned} P(T, w | z, h(T)) & = \int_{0}^{\infty} P(T_{i}^{d} \leq T + w | T_{i}^{d} > T, Z_{i} = z, H_{i}(T) = h(T), u_{i} = u) \cdot g(u | T_{i}^{d} > T, Z_{i} = z, H_{i}(T) = h(T))\, du \\ & = \frac{\int_{0}^{\infty} \left[S_{0}^{d}(T)^{C_{d} u^{γ}} - S_{0}^{d}(T + w)^{C_{d} u^{γ}}\right] \cdot u^{J} \cdot S^{r}(T | h(T))^{C_{r} u} \cdot g(u)\, du}{\int_{0}^{\infty} S_{0}^{d}(T)^{C_{d} u^{γ}} \cdot u^{J} \cdot S^{r}(T | h(T))^{C_{r} u} \cdot g(u)\, du} \end{aligned}
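Equation (3.2) can be evaluated with a single one-dimensional quadrature over $u$: factoring $S_{0}^{d}(T)^{C_{d} u^{γ}}$ out of the bracket shows that the prediction is the expectation of $1 - \{S_{0}^{d}(T + w)/S_{0}^{d}(T)\}^{C_{d} u^{γ}}$ under the conditional frailty density derived above. The sketch below assumes, for illustration only, a gamma frailty with mean 1 and variance $θ$; inputs are $\log S_{0}^{d}(T)$, $\log S_{0}^{d}(T + w)$ and $\log S^{r}(T | h(T))$, and all names and values are hypothetical.

```python
import math


def gamma_log_density(u: float, theta: float) -> float:
    """Hypothetical frailty density g: Gamma with mean 1 and variance theta."""
    k = 1.0 / theta
    return (k - 1.0) * math.log(u) - u / theta - math.lgamma(k) - k * math.log(theta)


def dynamic_death_risk(log_s0d_T: float, log_s0d_Tw: float, log_sr_T: float, J: int,
                       c_d: float, c_r: float, gamma: float, theta: float,
                       u_max: float = 20.0, n: int = 4000) -> float:
    """Midpoint-rule evaluation of P(T, w | z, h(T)) in (3.2) on (0, u_max).

    The common step size cancels between numerator and denominator.
    """
    grid = [(m + 0.5) * u_max / n for m in range(n)]
    logw = [c_d * u ** gamma * log_s0d_T + J * math.log(u) + c_r * u * log_sr_T
            + gamma_log_density(u, theta) for u in grid]
    top = max(logw)  # rescale before exponentiating to avoid underflow
    num = den = 0.0
    for u, lw in zip(grid, logw):
        weight = math.exp(lw - top)
        # 1 - (S_0^d(T + w) / S_0^d(T)) ** (C_d u^gamma), computed stably with expm1
        risk_u = -math.expm1(c_d * u ** gamma * (log_s0d_Tw - log_s0d_T))
        num += weight * risk_u
        den += weight
    return num / den


# Hypothetical inputs: log S_0^d(T) = -0.10, log S_0^d(T + w) = -0.18, log S^r(T | h(T)) = -0.5.
base = dict(log_s0d_T=-0.10, log_s0d_Tw=-0.18, log_sr_T=-0.5,
            c_d=1.0, c_r=1.0, gamma=0.7, theta=1.3)
print(dynamic_death_risk(J=0, **base), dynamic_death_risk(J=3, **base))
```

For the Poisson model, `log_sr_T` becomes $\log S_{0}^{r}(T)$, so the history enters only through $J$; for the renewal model it depends on the spacing of the hospitalization times as well.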

References

1. Liu, Wolfe, Huang. Shared frailty models for recurrent events and a terminal event. Biometrics 2004; 60: 747–756.

2. González, Fernandez, Moreno, et al. Sex differences in hospital readmission among colorectal cancer patients. J Epidemiol Commun Health 2005; 59: 506–511.

3. Suissa, Dell’Aniello, Ernst. Long-term natural history of chronic obstructive pulmonary disease: Severe exacerbations and mortality. Thorax 2012; 67: 957–963.

4. Soler-Cataluña, Martínez-García MÁ, Sánchez, et al. Severe acute exacerbations and mortality in patients with chronic obstructive pulmonary disease. Thorax 2005; 60: 925–931.

5. Huang, Wang. Joint modeling and estimation for recurrent event processes and failure time data. J Am Stat Assoc 2004; 99: 1153–1165.

6. Serra-Picamal, Roman, Escarrabill, et al. Hospitalizations due to exacerbations of COPD: A big data perspective. Respir Med 2018; 145: 219–225.

7. Mauguen, Rachet, Mathoulin-Pélissier, et al. Dynamic prediction of risk of death using history of cancer recurrences in joint frailty models. Stat Med 2013; 32: 5366–5380.

8. Duchateau, Janssen, Kezic, et al. Evolution of recurrent asthma event rate over time in frailty models. J R Stat Soc Ser C: Appl Stat 2003; 52: 355–363.

9. Emura, Nakatochi, Matsui, et al. Personalized dynamic prediction of death according to tumour progression and high-dimensional genetic factors: Meta-analysis with a joint model. Stat Methods Med Res 2018; 27: 2842–2858.

10. Liang, et al. Tackling dynamic prediction of death in patients with recurrent cardiovascular events. Stat Med 2023; 42: 3487–3507.

11. Ren, Wang, Luo. Dynamic prediction using joint models of longitudinal and recurrent event data: A Bayesian perspective. Biostat Epidemiol 2021; 5: 250–266.

12. Loe, Murray. Random forest for dynamic risk prediction of recurrent events: A pseudo-observation approach. Biostatistics 2025; 26: kxaf007.

13. Louzada, Macera, Cancho. A gap time model based on a multiplicative marginal rate function that accounts for zero-recurrence units. Stat Methods Med Res 2017; 26: 2000–2010.

14. Emura, Nakatochi, Murotani, et al. A joint frailty-copula model between tumour progression and death for meta-analysis. Stat Methods Med Res 2017; 26: 2649–2666.

15. Soriano, Kendrick, Paulson, et al. Prevalence and attributable health burden of chronic respiratory diseases, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. Lancet Respir Med 2020; 8: 585–596.

16. OECD. Health at a glance 2023: OECD indicators. Paris: OECD Publishing, 2023.

17. Boers, Barrett, et al. Global burden of chronic obstructive pulmonary disease through 2050. JAMA Netw Open 2023; 6: e2346598.

18. Agustí, Celli, Criner, et al. Global Initiative for Chronic Obstructive Lung Disease 2025 report: GOLD executive summary, 2025. https://goldcopd.org/2025-gold-report/.

19. Esteban, Quintana, Moraza, et al. Impact of hospitalisations for exacerbations of COPD on health-related quality of life. Respir Med 2009; 103: 1201–1208.

20. Esteban, Quintana, Aburto, et al. Predictors of mortality in patients with stable COPD. J Gen Intern Med 2008; 23: 1829–1834.

21. Müllerova, Maselli, Locantore, et al. Hospitalized exacerbations of COPD: Risk factors and outcomes in the ECLIPSE cohort. Chest 2015; 147: 999–1007.

22. Quintana, Anton-Ladislao, Orive, et al. Predictors of short-term COPD readmission. Intern Emerg Med 2022; 17: 1481–1490.

23. Shah, Nwaru, Sheikh, et al. Development and validation of a multivariable mortality risk prediction model for COPD in primary care. NPJ Prim Care Respir Med 2022; 32: 21.

24. Aramburu, Arostegui, Moraza, et al. COPD classification models and mortality prediction capacity. Int J Chron Obstruct Pulmon Dis 2019; 14: 605–613.

25. Arostegui, Legarreta, Barrio, et al. A computer application to predict adverse events in the short-term evolution of patients with exacerbation of chronic obstructive pulmonary disease. JMIR Med Inform 2019; 7: e10773.

26. Zhudenkov, Palmér, Jauhiainen, et al. Longitudinal ${FEV}_{1}$ and exacerbation risk in COPD: Quantifying the association using joint modelling. Int J Chron Obstruct Pulmon Dis 2021; 16: 101–111.

27. Lung volumes and forced expiratory flows. Report Working Party Standardization of Lung Function Tests, European Community for Steel and Coal. Official statement of the European Respiratory Society. Eur Respir J 1993; 16: 5–40.

28. Stanojevic, Graham, Cooper, et al. Official ERS technical standards: Global Lung Function Initiative reference values for the carbon monoxide transfer factor for Caucasians. Eur Respir J 2020; 56: 1750010.

29. Esteban, Aguirre, Aramburu, et al. Influence of physical activity on the prognosis of COPD patients: The HADO.2 score – health, activity, dyspnoea and obstruction. ERJ Open Res 2024; 10: 00488-2023.

30. Rondeau, Mathoulin-Pelissier, Jacqmin-Gadda, et al. Joint frailty models for recurring events and death using maximum penalized likelihood estimation: Application on cancer events. Biostatistics 2007; 8: 708–721.

31. Huang, Liu. A joint frailty model for survival and gap times between recurrent events. Biometrics 2007; 63: 389–397.

32. Cook, Lawless. The statistical analysis of recurrent events. New York: Springer, 2007.

33. Aalen, Borgan, Gjessing. Survival and event history analysis: A process point of view. New York: Springer Science & Business Media, 2008.

34. Shao. Mathematical statistics. New York: Springer Science & Business Media, 2003.

35. Graf, Schmoor, Sauerbrei, et al. Assessment and comparison of prognostic classification schemes for survival data. Stat Med 1999; 18: 2529–2545.

36. Gerds, Schumacher. Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biom J 2006; 48: 1029–1040.

37. Blanche, Dartigues, Jacqmin-Gadda. Review and comparison of ROC curve estimators for a time-dependent outcome with marker-dependent censoring. Biom J 2013; 55: 687–704.

38. Kalbfleisch, Prentice. The statistical analysis of failure time data. Hoboken, NJ: John Wiley & Sons, 2002.

39. Król, Tournigand, Michiels, et al. Multivariate joint frailty model for the analysis of nonlinear tumor kinetics and dynamic predictions of death. Stat Med 2018; 37: 2148–2161.

40. Cook. Modeling two-state disease processes with random effects. Lifetime Data Anal 1997; 3: 315–335.

Component	Covariate or parameter	Renewal model: HR/Estimate	Renewal model: 95% CI	Poisson model: HR/Estimate	Poisson model: 95% CI
Death	Age (years)	1.106	(1.069, 1.143)	1.107	(1.07, 1.146)
	Female	0.645	(0.326, 1.277)	0.657	(0.33, 1.309)
	DLCO	0.961	(0.947, 0.976)	0.960	(0.946, 0.975)
	${FEV}_{1}$	0.982	(0.966, 0.999)	0.982	(0.965, 0.998)
	Baseline hazard:
	Scale ($b_{d}$)	4487.937	(3297.038, 5678.837)	4313.069	(3195.404, 5430.735)
	Shape ($σ_{d}$)	1.723	(1.41, 2.036)	1.748	(1.431, 2.065)
Hospitalizations	Age (years)	1.050	(1.03, 1.071)	1.056	(1.035, 1.078)
	Female	1.387	(0.934, 2.059)	1.406	(0.923, 2.141)
	DLCO	0.985	(0.976, 0.994)	0.983	(0.973, 0.992)
	${FEV}_{1}$	0.962	(0.951, 0.973)	0.959	(0.947, 0.971)
	Prev. hosp. = 1	1.750	(1.215, 2.523)	1.859	(1.264, 2.733)
	Prev. hosp. ≥ 2	2.283	(1.547, 3.368)	2.632	(1.745, 3.97)
	Baseline hazard:
	Scale ($b_{r}$)	3520.435	(2485.132, 4555.739)	2651.037	(2055.216, 3246.859)
	Shape ($σ_{r}$)	0.857	(0.789, 0.925)	1.142	(1.042, 1.241)
Frailty	$γ$	0.730	(0.359, 1.101)	0.715	(0.381, 1.049)
	Variance ($θ$)	1.268	(0.851, 1.685)	1.567	(1.141, 1.993)

Dynamic prediction of death risk given a renewal hospitalization process
