Abstract
The identification of risk features associated with disease plays a crucial role in biomedical fields. These features are often used to provide evidence for clinical decision-making. However, in the presence of between-center heterogeneity, covariate effects across data centers may exhibit inconsistent directions, making feature selection challenging. In this work, we propose a novel framework to select reproducible risk features whose underlying effects are consistent across different centers. We quantify the feature reproducibility based on the sign-consistency criterion, which provides an acceptable level of heterogeneity in effect sizes and ensures the reasonable similarity of reproducible signals. Compared with the existing feature selection methods, our proposed method effectively protects data privacy and does not rely on the assumption of data homogeneity. Extensive simulations demonstrated that the proposed method has greater power than existing methods do. We apply the proposed approach to analyze data from the China Health and Retirement Study Longitudinal Study (CHARLS) and identify nine important risk factors that show reproducible associations with depression.
Get full access to this article
View all access options for this article.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
