Dynamic prediction by landmarking with data from cohort subsampling designs

Abstract

Longitudinal data are often available in cohort studies and clinical settings, such as covariates collected at cohort follow-up visits or prescriptions captured in electronic health records. If such longitudinal information is correlated with the health event of interest, it can be incorporated to dynamically predict the probability of that event with greater precision. Landmarking is a popular approach to dynamic prediction. There are well-established methods for landmarking using full cohort data, but collecting data on all cohort members may not be feasible when resources are limited. Instead, one may select a subset of the cohort using a subsampling design and collect data only on this subset. In this work, we present conditional likelihood and inverse-probability weighted methods for landmarking using data from cohort subsampling designs, and discuss considerations for choosing a particular method. Simulations are conducted to evaluate the applicability of the methods and their predictive performance in different scenarios. The results show that our methods achieve predictive performance similar to that of the full cohort analysis while using only a small fraction of the full cohort data. We use real nested case-control data from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial to illustrate the methods.

Keywords

Case-cohort study; cohort subsampling design; Cox proportional hazards model; dynamic prediction; inverse probability weighting; landmarking; nested case-control study
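
The abstract does not spell out how the inverse-probability weights are obtained. As a rough illustration only, the sketch below computes one standard choice for nested case-control data: Kaplan-Meier-type inclusion probabilities of the kind used in Samuelsen's pseudolikelihood approach. It is not necessarily the estimator used by the authors. The function name and arguments are hypothetical, and the sketch assumes that every cohort member enters at time 0, that there are no tied event times, and that `m` controls per case are drawn without replacement from the risk set.

```python
def ncc_inclusion_probabilities(times, events, m):
    """Kaplan-Meier-type inclusion probabilities for a nested case-control sample.

    times  : follow-up time of each cohort member (entry at time 0 is assumed)
    events : 1 if the member is a case at times[i], otherwise 0
    m      : number of controls sampled per case (assumed design parameter)
    """
    case_times = sorted(t for t, d in zip(times, events) if d == 1)
    probs = []
    for t_i, d_i in zip(times, events):
        if d_i == 1:
            # Cases are always included in the subsample.
            probs.append(1.0)
            continue
        # Probability that a non-case is never drawn as a control,
        # multiplied over every case time at which it is still at risk.
        p_never = 1.0
        for t_j in case_times:
            if t_j <= t_i:
                n_at_risk = sum(1 for t in times if t >= t_j)
                # m controls are drawn from the n_at_risk - 1 non-failing members.
                p_never *= 1.0 - min(1.0, m / (n_at_risk - 1))
        probs.append(1.0 - p_never)
    return probs


# Toy cohort of four members; the values are made up for illustration.
times = [2.0, 4.0, 5.0, 6.0]
events = [1, 0, 1, 0]
probs = ncc_inclusion_probabilities(times, events, m=1)
# A non-case that was never eligible as a control has probability 0 and
# would not appear in the subsample, so it receives no weight here.
weights = [1.0 / p if p > 0 else 0.0 for p in probs]
```

In an inverse-probability weighted landmark analysis, weights of this kind would be attached to the sampled subjects who are still at risk at each landmark time before fitting the weighted model.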
