Abstract
The case-cohort design is an economical approach to estimating the effect of risk factors on a survival outcome when collecting exposure information or covariates on all patients in a large cohort study is expensive. Variables often have a group structure, as with categorical variables and highly correlated continuous variables. The existing literature for case-cohort data is limited to identifying non-zero variables at the individual level only. In this article, we propose a bi-level variable selection method that selects non-zero groups and the non-zero variables within those groups for case-cohort data when variables have a group structure. The proposed method allows the number of variables to diverge as the sample size increases. We establish the asymptotic properties of the estimator, including bi-level variable selection consistency and asymptotic normality. We also conduct simulations to compare the proposed method with existing methods and apply these methods to the Busselton Health data.
