Abstract
Purpose:
Cardiovascular risk factors (CVRFs) later in life potentiate risk for late cardiovascular disease (CVD) from cardiotoxic treatment among survivors. This study evaluated the association between baseline CVRFs and CVD in the early survivorship period.
Methods:
This analysis included patients ages 0–29 at initial diagnosis and reported in the institutional cancer registry between 2010 and 2017 (n = 1228). Patients who died within 5 years (n = 168), those not seen in the oncology clinic (n = 312), and those with CVD within one year of diagnosis (n = 17) were excluded. CVRFs (hypertension, diabetes, dyslipidemia, and obesity) within 1 year of initial diagnosis were constructed and extracted from the electronic health record based on discrete observations, ICD9/10 codes, and RxNorm codes for antihypertensives.
Results:
Among survivors (n = 731), 10 incident cases (1.4%) of CVD were observed between 1 and 5 years after the initial diagnosis. Public health insurance (p = 0.04) and late effects risk strata (p = 0.01) were positively associated with CVD. Among survivors with public insurance (n = 495), two additional cases of CVD were identified from claims data with an incidence of 2.4%. Survivors from rural areas had a 4.1 times greater risk of CVD compared with survivors from urban areas (95% CI: 1.1–15.3), despite adjustment for late effects risk strata.
Conclusion:
Constructing clinically computable phenotypes for CVRFs among survivors through informatics methods was feasible. Although CVRFs were not associated with CVD in the early survivorship period, survivors from rural areas were more likely to develop CVD.
Introduction
Remarkable progress in the overall survival of children, adolescents, and young adults (CAYAs) with cancer comes at a significant cost with late therapy-associated toxicities, and there is a critical need for risk-based care. Seventy-four percent of survivors developed at least one chronic health condition in adulthood with a 30-year cumulative incidence of 42% for a severe or life-threatening condition or death. 1 Adolescents and young adults (AYA) also face unique challenges after cancer treatment.2,3 Survivorship-focused, evidence-based care is essential to mitigate sequelae, such as heart failure and other cardiovascular diseases (CVD), to optimize the health of survivors and promote health equity. 4 Nevertheless, adherence to guideline recommendations poses a significant challenge, particularly among survivors with public insurance.5–7
Cardiovascular risk factors (CVRFs) potentiate cardiotoxicity from anthracyclines and chest radiation among survivors. CVRFs later in life elevate the risk for CVD in a near multiplicative fashion, with an excess relative risk for heart failure of 44.5 due to the interaction of hypertension and anthracyclines. 8 The inclusion of CVRFs, when added to treatment-related exposures, further refines risk prediction of subsequent heart failure among survivors. 9 For adults with cancer, baseline cardiovascular risk assessment before initiation of cardiotoxic chemotherapy aims to ameliorate cardiovascular complications. 10 As the prevalence of obesity, hypertension, and diabetes increases among CAYAs, consideration of CVRFs before diagnosis or during the early survivorship period is becoming more critical to personalized therapy aiming to reduce CVD risk.11–15 CVRFs during childhood are associated with an increased risk of fatal and nonfatal cardiovascular events in adulthood. 16 Moreover, disparities in CVRF burden by race/ethnicity have been observed among adult survivors of childhood cancer. 17
Previously identified disparities among adolescents and survivors from nonurban areas in survivorship care, such as optimal subspecialty follow-up and receipt of a survivorship care plan, may also influence downstream treatment-related CVD. 18 Vulnerable populations, such as those from rural areas, are already at increased risk for CVRFs and CVD later in life.19–21 Moreover, fragmented health care systems for AYA survivors and those from nonurban areas underscore the role of data standards in surmounting siloed data challenges through interoperability to improve patient care. 22 Recent advances in data science, such as clinically computable phenotypes for hypertension and diabetes, offer strategies to leverage real-world data and accelerate population health research.23,24 Conceptually, a clinically computable phenotype refers to the use of discrete, structured data, such as diagnosis codes and medications, that are interpretable by computer processes to represent a disease concept.
The objectives of this study were to implement a clinical informatics approach to identify CVRFs before or during cancer treatment among CAYAs with cancer and then analyze the impact of CVRFs on the subsequent development of CVD in the early survivorship period. As a secondary aim, survivors were linked to the Oklahoma Health Care Authority (OKHCA) data to evaluate potential inequities among survivors from nonurban areas and ameliorate underdetection bias from institutional data.
Methods
Survivor cohort construction
The institutional cancer registry at the Stephenson Cancer Center at the University of Oklahoma reports all newly diagnosed cases to the National Cancer Database (NCDB).25,26 The cancer registry contained the necessary demographic information (age at diagnosis, gender, and ZIP code to determine rurality). The cohort included children (ages 0–12 years), adolescents (ages 13–18 years), and young adults (ages 19–29 years) who were evaluated in the academic pediatric oncology or medical oncology clinics and received their first course of treatment at their respective centers (Fig. 1). To reflect the reliability of cancer registry data for these age groups and ensure a longitudinal follow-up of 5 years, 5-year survivors diagnosed between January 1, 2010, and December 31, 2017, were included. This research was submitted to and approved by the University of Oklahoma Health Sciences Review Board (IRB# 14731) on June 15, 2022.

Fig. 1. Childhood, adolescent, and young adult cancer survivorship cohort construction.
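The sequential exclusions behind the analytic cohort can be checked with simple arithmetic. The following Python sketch uses only the counts reported in this article; the step labels, variable names, and function name are illustrative and are not part of the study's data pipeline.

```python
# Cohort attrition using the counts reported in this article.
# Step labels and names are illustrative only.
REGISTRY_TOTAL = 1228

EXCLUSIONS = [
    ("not seen in an oncology clinic", 312),
    ("death within 5 years of diagnosis", 168),
    ("CVD before or within 1 year of diagnosis", 17),
]


def apply_exclusions(total, exclusions):
    """Return a list of (label, remaining) after each sequential exclusion."""
    remaining = total
    flow = []
    for label, n_excluded in exclusions:
        remaining -= n_excluded
        flow.append((label, remaining))
    return flow


flow = apply_exclusions(REGISTRY_TOTAL, EXCLUSIONS)
for label, remaining in flow:
    print(f"excluded ({label}): n remaining = {remaining}")

# Size of the analytic survivor cohort after all exclusions.
analytic_n = flow[-1][1]
```

The first step reproduces the group with established oncology-related care, and the last step reproduces the analytic survivor cohort.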
Disease classification and late effects risk stratification
As part of NCDB standards, the International Classification of Diseases–Oncology, third edition (ICD-O3) was used to group diagnoses into primary malignancy categories based on the International Classification of Childhood Cancer, third edition (ICCC-3). 27 Coding for bone tumors, central nervous system tumors, Hodgkin’s lymphoma, non-Hodgkin’s lymphoma, leukemia, neuroblastoma, retinoblastoma, sarcoma, Wilms tumor, and other categories were previously reported. 28 The cancer registry captures whether patients received chemotherapy, surgery, radiation, or transplants as dichotomous variables. 25 Late effects risk stratification, based on primary diagnosis and dichotomous treatment exposures, was conducted based on the British Childhood Cancer Survivor Study risk groups. 29
Cardiovascular risk factors and cardiovascular disease
The Clinical Research Data Warehouse Team at the University of Oklahoma Health Sciences Center used Structured Query Language (SQL) to extract key data elements for CVRFs and race/ethnicity from the electronic health record (EHR). The primary CVRFs for this analysis included hypertension, diabetes, obesity, and dyslipidemia. The Common Terminology Criteria for Adverse Events (CTCAE, v5.0) were used to classify CVRFs. 30 For hypertension, CTCAE Grade ≥2 was defined as a diagnosis consistent with hypertension and an outpatient prescription for an antihypertensive medication (Supplementary Data S1). The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) provides a critical framework for reliable data standards and supports research with real-world data.31,32 For medication data, OMOP CDM utilizes RxNorm codes, and previous research supports the utility of this model to classify antihypertensive medications automatically extracted from the EHR. 33 We leveraged the RxNorm Concept Unique Identifier to ascertain survivors with outpatient prescriptions for antihypertensive medications before diagnosis or within the first year of initial diagnosis. Grade ≥2 diabetes was defined as an ICD-9/10 code consistent with diabetes or hemoglobin A1c (HbA1c) ≥6.5% from discrete observational lab data. For obesity, discrete data elements were extracted from the EHR to classify survivors as obese according to CTCAE Grade ≥3 with a body mass index ≥30 (or >95th percentile based on age- and sex-specific distributions) prior to diagnosis or within one year of initial diagnosis. Finally, dyslipidemia was defined based on ICD-9/10 coding. For all CVRFs, the window before diagnosis or within 1 year of initial diagnosis was chosen to ensure temporality for subsequent CVD and as an estimate for the on-therapy period before survivorship (Fig. 2).

Fig. 2. Temporal relationship between cardiovascular risk factors (CVRFs) at baseline or during treatment and the development of cardiovascular disease (CVD) in the early survivorship period.
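The four CVRF definitions described above can be expressed as simple rules over discrete data elements. The following Python sketch is a minimal illustration and not the study's SQL implementation. The `BaselineRecord` structure and its field names are hypothetical, and the sketch assumes each field has already been restricted to the window before diagnosis or within 1 year of initial diagnosis.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BaselineRecord:
    """Hypothetical per-survivor summary of discrete EHR data.

    Assumes every field is already limited to the window before
    diagnosis or within 1 year of initial diagnosis.
    """
    hypertension_dx_code: bool = False      # ICD-9/10 code consistent with hypertension
    antihypertensive_rx: bool = False       # outpatient RxNorm-coded prescription
    diabetes_dx_code: bool = False          # ICD-9/10 code consistent with diabetes
    hba1c_values: List[float] = field(default_factory=list)  # percent
    bmi: Optional[float] = None             # kg/m^2
    bmi_percentile: Optional[float] = None  # age- and sex-specific percentile
    dyslipidemia_dx_code: bool = False      # ICD-9/10 code for dyslipidemia


def classify_cvrfs(rec):
    """Apply the four phenotype rules described in the text."""
    # Hypertension requires both a diagnosis and an outpatient prescription.
    hypertension = rec.hypertension_dx_code and rec.antihypertensive_rx
    # Diabetes requires a diagnosis code or any HbA1c of 6.5% or higher.
    diabetes = rec.diabetes_dx_code or any(v >= 6.5 for v in rec.hba1c_values)
    # Obesity requires BMI >= 30 or a percentile above the 95th.
    obesity = (rec.bmi is not None and rec.bmi >= 30) or (
        rec.bmi_percentile is not None and rec.bmi_percentile > 95
    )
    return {
        "hypertension": hypertension,
        "diabetes": diabetes,
        "obesity": obesity,
        "dyslipidemia": rec.dyslipidemia_dx_code,
    }


# Invented example: a hypertension code without a prescription does not
# meet the hypertension rule, while one HbA1c of 6.7% meets the diabetes rule.
example = BaselineRecord(hypertension_dx_code=True, hba1c_values=[5.4, 6.7], bmi=27.0)
result = classify_cvrfs(example)
```

Keeping each rule as a single boolean expression makes the phenotype easy to audit against the written definition and to port to a query language.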
Heart failure or cardiomyopathy was the primary CVD outcome for this analysis (Supplementary Data S2).16,34 The date of initial diagnosis was used to landmark the date of diagnosis for CVD. Survivors with CVD before diagnosis or within one year of diagnosis were excluded from the analysis. An incident case of CVD was defined as an ICD-9/10 code consistent with cardiomyopathy or heart failure 1–5 years after the initial cancer diagnosis to reflect the early survivorship period.
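The landmark rule for CVD can be sketched as a small classifier over the dates of qualifying diagnosis codes. This Python example is a hedged illustration. The function names, the 365.25-day year, and the handling of events falling exactly on the 1-year or 5-year boundary are assumptions, as the text does not specify them.

```python
from datetime import date


def years_between(start, end):
    """Elapsed time in years, assuming 365.25 days per year."""
    return (end - start).days / 365.25


def classify_cvd(cancer_dx_date, cvd_code_dates):
    """Return 'excluded', 'incident', or 'none' for one survivor.

    cvd_code_dates holds the dates of ICD-9/10 codes consistent with
    heart failure or cardiomyopathy. Boundary handling is an assumption.
    """
    if not cvd_code_dates:
        return "none"
    first_code = min(cvd_code_dates)
    elapsed = years_between(cancer_dx_date, first_code)
    if elapsed <= 1:
        # CVD before diagnosis or within 1 year: excluded from the analysis.
        return "excluded"
    if elapsed <= 5:
        # First code falls 1 to 5 years after the cancer diagnosis.
        return "incident"
    # First code falls after the early survivorship period.
    return "none"


# Invented dates for illustration only.
status = classify_cvd(date(2012, 3, 1), [date(2014, 6, 15)])
```

Using the earliest qualifying code as the event date means a survivor with a code before the landmark is excluded even if later codes also exist.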
OKHCA data
The cancer registry and institutional EHR data were linked to Medicaid records from OKHCA. Medicaid number was used as the primary identifier for linkage and, for survivors without a match, was supplemented by other identifiers such as date of birth and first and last names. The ICD-9/10 codes for CVD, as described above, were used and landmarked by date of cancer and CVD diagnosis to detect incident cases during the early survivorship period.
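A two-pass deterministic linkage of this kind can be sketched as follows. This Python example is an assumption about how such a linkage might be structured and does not describe the actual OKHCA procedure. The field names (`registry_id`, `medicaid_id`, `dob`, `first_name`, `last_name`), the name normalization, and the choice to leave ambiguous fallback matches unlinked are all hypothetical.

```python
def normalize(text):
    """Lowercase and strip whitespace so trivial name variants compare equal."""
    return "".join(text.lower().split())


def demo_key(row):
    """Fallback key built from date of birth and first and last names."""
    return (row["dob"], normalize(row["first_name"]), normalize(row["last_name"]))


def link_survivors(registry_rows, medicaid_rows):
    """Map each registry_id to a medicaid_id, or None when no match is found."""
    by_id = {m["medicaid_id"]: m for m in medicaid_rows if m.get("medicaid_id")}
    by_demo = {}
    for m in medicaid_rows:
        by_demo.setdefault(demo_key(m), []).append(m)

    links = {}
    for r in registry_rows:
        # Pass 1: exact match on Medicaid number.
        match = by_id.get(r.get("medicaid_id"))
        if match is None:
            # Pass 2: demographic fallback, accepted only if unambiguous.
            candidates = by_demo.get(demo_key(r), [])
            if len(candidates) == 1:
                match = candidates[0]
        links[r["registry_id"]] = match["medicaid_id"] if match else None
    return links


# Invented rows for illustration only.
registry = [
    {"registry_id": "R1", "medicaid_id": "M100", "dob": "2001-05-02",
     "first_name": "Alex", "last_name": "Doe"},
    {"registry_id": "R2", "medicaid_id": None, "dob": "1999-11-30",
     "first_name": "Sam ", "last_name": "ROE"},
    {"registry_id": "R3", "medicaid_id": None, "dob": "2005-01-01",
     "first_name": "Pat", "last_name": "Poe"},
]
medicaid = [
    {"medicaid_id": "M100", "dob": "2001-05-02", "first_name": "Alex", "last_name": "Doe"},
    {"medicaid_id": "M200", "dob": "1999-11-30", "first_name": "Sam", "last_name": "Roe"},
]
links = link_survivors(registry, medicaid)
```

Requiring a unique candidate in the fallback pass trades some sensitivity for fewer false links, which matters when linked claims are used to count rare events.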
Statistical analyses
Descriptive statistics including mean, standard deviation, median, and interquartile range were calculated for continuous variables (age at diagnosis). Percentages and counts were calculated for categorical variables (age group, sex, race/ethnicity, rurality, primary diagnosis, late effects risk group, hypertension, diabetes, dyslipidemia, and obesity). The chi-square test was used to examine the association between each predictor and CVD status if all cell counts were >5. Fisher’s exact test was used for cell counts ≤5. Unadjusted risk ratio (RR), RR adjusted for late effects risk strata, and the corresponding 95% confidence intervals (CI) for examining the association between rurality and CVD in the early survivorship period were calculated using a modified Poisson regression model with robust error variance. Manual backward variable selection was used with an alpha threshold of 0.05. A characteristic was considered a confounder if its removal changed the estimates for the remaining characteristics by 20% or more. Collinearity was assessed with a threshold of 0.70. 35 Missing values were excluded from the analysis, all of which were in the group without a cardiac event (0.3% were missing race/ethnicity, 0.7% were missing rurality, and 3.3% were missing late effects risk stratification due to incomplete exposure documentation). All analyses were performed using SAS 9.4.
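The rule for choosing between the chi-square test and Fisher's exact test can be illustrated with a short Python sketch using only the standard library. The analyses in this study were performed in SAS, so this is not the authors' code. The function names and the example table are hypothetical, and the Fisher implementation handles only 2x2 tables, with the two-sided p-value defined as the sum of probabilities of tables no more likely than the observed one.

```python
from math import comb


def choose_test(table):
    """Follow the rule in the text: chi-square only if every cell count is >5."""
    cells = [count for row in table for count in row]
    return "chi-square" if all(c > 5 for c in cells) else "fisher-exact"


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Invented 2x2 table for illustration only (not study data).
table = [[3, 1], [1, 3]]
test_name = choose_test(table)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With sparse outcomes such as the small number of incident CVD cases here, most predictor-by-outcome tables contain a cell of 5 or fewer, so the exact test is the branch that typically applies.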
Results
Cardiovascular risk factors and cardiovascular disease among survivors
Between 2010 and 2017, there were 1228 CAYAs with cancer reported to the institutional cancer registry who completed their first course of treatment at the Jimmy Everest Center or the Stephenson Cancer Center. Among those with established oncology-related care (n = 916), an overall 5-year survival of 82% was observed. The analytic cohort excluded those with early documented death (n = 168) and those not seen in an oncology-related clinic (n = 312) (Fig. 1). To establish a temporal relationship between CVRFs and the detection of CVD, survivors with CVD before or during the first year of treatment (n = 17) were also excluded (Fig. 2). Among the analytic survivor cohort, 10 incident cases (1.4%) of CVD were observed between 1 year and 5 years after the initial diagnosis. Grade ≥2 hypertension was observed in 106 survivors (14.5%), 37 met the criteria for diabetes (5.1%), 8 had dyslipidemia (1.1%), and 226 were obese (30.9%). All 10 of the cardiac events were observed in survivors with OKHCA coverage, while 67% of survivors without an event had OKHCA coverage (p = 0.04). Survivors at high, moderate, and low risk had a cumulative incidence of cardiac events of 5.2%, 1.1%, and 0.4%, respectively; the percent of patients in each risk category differed significantly between survivors with and without CVD (p = 0.01). There were no statistically significant associations between CVRFs and CVD in the early survivorship period (Table 1).
Table 1. Cardiovascular Risk Factors During Treatment and Cardiovascular Disease in the Early Survivorship Period
Variables collapsed due to small numbers and concern for confidentiality.
Based on the British Childhood Cancer Survivor Study risk groups. 29
CNS, central nervous system; IQR, interquartile range; SD, standard deviation.
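The percentages reported for the analytic cohort follow directly from the stated counts. The Python sketch below recomputes them as a consistency check. It uses only counts given in the text, and the dictionary keys and helper name are illustrative.

```python
# Counts reported in the text for the analytic survivor cohort.
ANALYTIC_N = 731
counts = {
    "incident CVD": 10,
    "hypertension": 106,
    "diabetes": 37,
    "dyslipidemia": 8,
    "obesity": 226,
}


def percent(count, total, digits=1):
    """Proportion expressed as a percentage, rounded to the given digits."""
    return round(100 * count / total, digits)


percentages = {name: percent(n, ANALYTIC_N) for name, n in counts.items()}
for name, value in percentages.items():
    print(f"{name}: {counts[name]}/{ANALYTIC_N} = {value}%")
```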
OKHCA data analysis
Data linkage of the analytic survivor cohort (n = 731) with OKHCA claims data showed that 67.7% of survivors had Medicaid coverage (n = 495). The inclusion of claims data identified two additional cases of CVD 1–5 years after initial diagnosis that were not captured by institutional data, which yielded a cumulative incidence of 2.4% (n = 12). Among survivors with OKHCA coverage, those from small towns/isolated rural areas accounted for 50% of the incident cases of CVD, despite representing 17.5% of all survivors. Survivors from rural areas had a cumulative CVD incidence of 6.9% compared with 1.9% and 1.3% of survivors from large towns and urban areas, respectively (p = 0.02). Similar to the full cohort, there was a significant association between late effects risk strata and CVD (p = 0.006). Demographics, such as age, gender, and race/ethnicity, as well as CVRFs, were not significantly associated with CVD in the early survivorship period among those with OKHCA coverage (Table 2).
Table 2. Cardiovascular Risk Factors During Treatment and Cardiovascular Disease in the Early Survivorship Period Among Survivors with Oklahoma Health Care Authority Coverage
Variables collapsed due to small numbers and concern for confidentiality.
Based on the British Childhood Cancer Survivor Study risk groups. 29
Individually significant predictors related to CVD were age, age group, rurality, and late effects risk strata. Age and age group at diagnosis had high collinearity (r = 0.83); thus continuous age was chosen as the preferred indicator for the model. However, age was not significantly related to CVD when included alongside other predictors and was dropped. Patient late effects risk strata were determined to be a confounder and were retained in the final model. Therefore, multivariable modified Poisson regression modeling showed that there was a persistent association between rurality and CVD, as survivors from small town/isolated rural areas had a 4.1 times greater risk (95% confidence interval: 1.1–15.3) of CVD compared with survivors from urban areas after adjustment for late effects risk strata (Table 3).
Table 3. Modeling for Association Between Rurality and Cardiovascular Disease in the Early Survivorship Period
Adjusted for late effects risk strata.
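To make the risk ratio concrete, the following Python sketch computes an unadjusted risk ratio for a binary exposure with a large-sample confidence interval on the log scale. It is a simplified stand-in and not the adjusted modified Poisson model fitted in SAS for this study. For a single binary exposure the point estimate from such a model equals the crude ratio of risks, but the interval below is not guaranteed to match the robust interval from the regression. The function name and the counts in the example are hypothetical and are not study data.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed, level=0.95):
    """Crude risk ratio with a Wald-type confidence interval on the log scale.

    Requires at least one event in each group.
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard large-sample standard error of log(RR).
    se_log_rr = sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts for illustration only (not study data):
# 6 events among 100 exposed and 4 events among 400 unexposed.
rr, lower, upper = risk_ratio(6, 100, 4, 400)
print(f"RR = {rr:.1f} (95% CI: {lower:.1f}-{upper:.1f})")
```

The wide interval produced by so few events in the hypothetical example mirrors why estimates based on a small number of incident cases, as in this cohort, carry substantial uncertainty.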
A comparison of survivors with and without OKHCA coverage yielded several noteworthy differences (Table 4). Although primary diagnosis and late effects risk groups were not associated with OKHCA coverage, rurality was significantly associated with OKHCA coverage, as 78.4% and 75.5% of survivors from small town/isolated rural areas and large towns had OKHCA coverage, respectively, compared with 63.7% of survivors from urban areas (p < 0.01). Moreover, race/ethnicity (p < 0.01), female sex (p = 0.03), and young adult age group at diagnosis (p = 0.02) significantly differed by OKHCA coverage status. Regarding CVRFs, 17% of survivors with coverage had hypertension compared with 9% of those without OKHCA coverage (p < 0.01). For obesity, prevalence was 33% and 26% among survivors with and without OKHCA coverage, respectively (p = 0.03).
Table 4. Comparison of Survivors with and Without Oklahoma Health Care Authority Coverage
Discussion
In this single-institution cohort of CAYA survivors, clinical informatics tools based on discrete data elements from the EHR were leveraged to construct clinically computable phenotypes and evaluate the prevalence of CVRFs prior to diagnosis and during treatment. This represents a feasible approach to identify CVRFs on a population health level for at-risk survivors. No significant associations were observed between CVRFs and CVD in the early survivorship period for this cohort, yet this analysis and its methods inform efforts to harness real-world data to drive improvement in survivorship-focused care. Furthermore, the presented analyses showed that survivors at high risk for late effects in general, and those with OKHCA coverage, were at increased risk of CVD in the early survivorship period. Claims data augmented the detection of cardiac events among survivors with OKHCA coverage, and the analysis from this subcohort suggested that those from rural areas were at increased risk of CVD even after adjustment for late effects risk strata. Rural–urban differences in the general population, particularly inequities in cardiovascular health, underscore the need to prevent CVD, especially for CAYA survivors at risk for late morbidity and mortality.
The disproportionate burden of CVRFs and CVD in rural areas in the United States is well documented. In 2020, the American Heart Association released a call to action to reduce longstanding inequities in CVD among rural populations with a focus on individual factors, social determinants of health, and health delivery systems. Indeed, adults in rural areas demonstrate a higher risk of mortality from heart failure, at an individual and a community level, from population-based studies throughout the United States.20,36–38 The evidence of recent progress on closing the rural–urban gap is mixed, and the persistence of these geographic disparities in the general population should inform health care delivery interventions for survivors.21,39 Stratification by OKHCA coverage, particularly with the highlighted differences in race/ethnicity, rurality, hypertension, and obesity, further controls for these potential confounders and helps characterize this vulnerable population. The observed increased risk of CVD among survivors of CAYA cancer from rural areas in Oklahoma with public insurance suggests that, even as soon as 1–5 years after the initial diagnosis, there is an opportunity to act and mitigate risk. Guidelines and evidence-based interventions to target modifiable CVRFs are critical to optimize screening and promote cardiovascular health equity.40–43
Data science and the development of clinical informatics tools have the potential to catalyze improvements in health services research, guide population health management, and drive systems-level changes to promote equity for all survivors of CAYA cancer. The presented methodology, derived from data standards such as RxNorm’s RxCUI codes for antihypertensive medications, and the novel creation of clinically computable phenotypes support the feasibility of such tools to characterize modifiable risk factors among survivors at a population health level. 33 The analyses of the Oklahoma cohort failed to identify significant associations between CVRFs and CVD, which may reflect limitations in this cohort or perhaps suggest that further refinement of clinically meaningful phenotypes to predict CVD is needed. Nevertheless, data standards are foundational to ensure the interoperability of key information between health systems, both from a research and clinical operations viewpoint. 44 Moreover, the Childhood Cancer Data Initiative seeks to address the fragmented data ecosystem and has made progress toward an infrastructure to facilitate data sharing to learn from every child, adolescent, and young adult with cancer. 45 More than a decade after the Health Information Technology for Economic and Clinical Health Act, lessons across the health care field in various specialties and domains offer insights to adapt evidence-based technologies for oncology and survivorship-focused care.46,47 Reproducible clinically computable phenotypes for CVRFs, such as those presented herein, allow for both scalability with real-world data to drive future research and direct applicability for population health management (e.g., targeting survivors at high risk for CVD to ensure adherence to echocardiogram surveillance guidelines or engagement with healthy lifestyle interventions).
Leveraging this population-based, informatics-supported framework also guides survivorship clinic leaders in the implementation of effective programs to benefit survivors most at risk for CVD.
The observations and analyses from this CAYA survivor cohort require contextualization for potential limitations. First, this cohort represented a single institution. While the majority of children in the state are treated at Oklahoma Children’s Hospital, young adults may have received treatment at community-based oncology centers, and there is one other site in Oklahoma that cares for children with cancer. Therefore, the data may not be representative of the state of Oklahoma or generalizable to other regions. Data linkage with claims data uncovered rural–urban differences in CVD, which likely reflects detection bias from institutional data as the absence of diagnosis records does not necessarily mean the absence of disease. 48 Alternatively, the observed differences may only exist in the Medicaid population.
Underdetection of CVRFs, such as dyslipidemia or diabetes, is also possible if they are not routinely assessed or documented in the EHR. The lack of robust historical data before 2009 and moderate cohort size may have contributed to insufficient power to detect potential associations between CVRFs and CVD. Additionally, in this cohort, acute cardiotoxicity was observed, and events within a year of diagnosis were excluded from analysis, as assessment of baseline CVRFs before diagnosis was likely incomplete and would have muddled the temporal relationship. The lack of granularity of treatment-level data, such as cumulative anthracycline and chest radiation exposures used in more sophisticated CVD risk prediction, and potential differential completion of treatment among rural and young adult cohorts highlight the need for improved data quality to enhance population-based survivorship analyses.9,49–52 Indeed, tools such as Passport for Care offer opportunities to leverage treatment-specific data elements to guide long-term follow-up based on the Children’s Oncology Group guidelines.49,53
The long latency period for heart failure, specifically, poses a significant challenge to capture enough events to facilitate real-world evidence for the association between baseline CVRFs and subsequent CVD in the early survivorship period.54–56 One approach to circumvent this long latency period is to identify early markers of cardiac dysfunction, such as echocardiogram parameters and cardiac biomarkers, which are useful predictors of subsequent CVD risk.57–59 Previously developed and validated natural language processing (NLP) algorithms, such as EchoExtractor, serve as an example of open-source informatics to automatically extract echocardiogram parameters. 60 Left ventricular ejection fraction was the most commonly extracted echocardiogram measurement and the system has subsequently provided key data for population health studies on cardiac function, including the scalability of this system at multiple hospital sites.61–63 The sole reliance on ICD-9/10 coding, while based on methods from large multi-institutional cohorts, may also lead to misclassification of cardiac events, which could be amenable to more precise measurements from echocardiograms. 16 Even with the implementation of such tools, underdetection bias may persist if echocardiogram reports are unavailable. Adolescent survivors in Oklahoma were previously identified as approximately five times more likely to receive suboptimal guideline-adherent echocardiogram surveillance. 49
In conclusion, clinical informatics tools to integrate data from various sources for cohort construction and apply data standards to characterize CVRFs highlight opportunities to leverage data to improve survivorship-focused care for CAYAs impacted by cancer. Survivors from rural areas may be at increased risk for CVD, even in the early survivorship period. Modifiable CVRFs at baseline and during treatment merit additional investigation to determine their impact on later CVD for survivors. This study provides a framework to adapt clinical informatics-based approaches for CAYA survivors to promote interoperability based on data standards, facilitate interinstitutional collaborations to detect relevant predispositions to CVD, and, ultimately, improve care for equitable outcomes among all survivors.
Authors’ Contributions
Each person listed as an author is aware of the content of the article and has participated in the study to a significant extent. D.H.N., A.J., A.B., and D.B. developed the concept and drafted the article. D.H.N., S.C., A.B., and W.B. contributed to the data preparation, modeling, and prepared tables and figures. All other authors provided guidance on the methodology, reviewed the article, and provided critical revisions.
Footnotes
Author Disclosure Statement
The authors declare no competing interests.
Funding Information
D.N., A.J., and T.R. were supported by a Team Science Grant from the Presbyterian Health Foundation. D.N. received additional funding from the Oklahoma Shared Clinical and Translational Resources Pilot Award. A.J. was supported through a grant from the National Institute for Minority Health and Health Disparities (U19MD020537). S.C. was partially supported by the Oklahoma Shared Clinical and Translational Resources (U54GM104938) with an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Consent to Participate
The IRB determined that the study met the criteria for a waiver of informed consent and was approved to be conducted without obtaining consent.
Availability of Data and Materials
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
Supplemental Material
References
