Abstract
Introduction
The incidence of early-onset liver cancer (EOLC) has been increasing in many countries, yet evidence on its etiology remains limited, particularly outside the Asian population. This case-control study explores the comorbidity patterns of EOLC and develops race/ethnicity-specific machine learning (ML) models to predict liver cancer risk.
Methods
We included patients diagnosed with primary liver cancer between ages 18 and 49 from the University of California Health Data Warehouse, matching each patient with five controls. ML classification methods, including decision trees, random forests, logistic regression, XGBoost, and LightGBM, were used to assess liver cancer risk based on demographics and comorbidities. Model performance was evaluated using F1 scores, and SHapley Additive exPlanations (SHAP) was applied to identify the most influential comorbidities within each racial group.
Results
A total of 1574 patients and 7870 controls were identified. Asian and Pacific Islanders (API) had significantly higher rates of Hepatitis B virus (HBV) infection, while Hispanics had higher prevalences of cirrhosis, hypertension, diabetes, and Hepatitis C virus (HCV) infection. Whites showed higher rates of anxiety, asthma, hypothyroidism, and cholangitis. Race/ethnicity-specific models for API (F1 score = 0.77, AUC = 0.90) and Hispanics (F1 score = 0.77, AUC = 0.92) outperformed the model for Whites (F1 score = 0.64, AUC = 0.87) in the validation dataset. The SHAP results indicated that HBV infection was the dominant comorbidity for API, and HCV and metabolic disorders were notable among Hispanics. In contrast, the White population showed a broader and less concentrated comorbidity pattern.
Conclusions
Our study highlights significant racial disparities in comorbidity patterns for early-onset liver cancer, demonstrating the potential of ML models to identify high-risk populations and inform targeted prevention strategies.
Introduction
Primary liver cancer, mainly including hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC), is the sixth most frequently diagnosed cancer worldwide.1 Liver cancer exhibits distinct incidence rates across different populations, with a range of factors contributing to its development within specific racial groups.2-4 From 1998 to 2015, the incidence rates of liver and intrahepatic bile duct cancer in the U.S. increased, peaking with an annual percent change (APC) of 4.5%. This was followed by a stabilization period, with a marginal APC of 0.3% from 2015 to 2021.5
Early-onset liver cancer (EOLC) is often defined as liver cancer diagnosed in individuals younger than 50 years of age. According to data from the Surveillance, Epidemiology, and End Results registries, the incidence of early-onset HCC decreased in the United States from 2010 to 2019, whereas the incidence of early-onset ICC increased during the same period. 6 Additionally, the incidence of EOLC has been rising in regions such as East Asia, Australia, Slovakia, and Uganda.7,8 The reasons for these different incidence patterns across different populations are unknown.
The clinical characteristics of EOLC differ from those of liver cancer diagnosed at older ages. For example, whereas 80-90% of all HCC patients have cirrhosis, only 12.7-33.3% of young-onset HCC cases had liver cirrhosis.8,9 In addition, young HCC patients had a significantly higher rate of Hepatitis B surface antigen (HBsAg) positivity, better liver function, and a more advanced tumor stage at diagnosis compared with the older group.9 This indicates a distinct precancerous disease pattern in EOLC. So far, a few studies have identified risk factors and precancerous diseases for EOLC, including male gender, Hepatitis B virus (HBV), smoking, family history, and previous chronic liver disease.10-14 However, these studies were conducted primarily in Asian populations, and evidence in other races is sparse, leaving gaps in understanding the etiologies of EOLC.10-14 A deeper understanding of comorbidity patterns may help identify risk factors and high-risk populations across different racial groups.
By virtue of the medical record data from the University of California Health Data Warehouse (UCHDW), we had an opportunity to conduct a retrospective case-control study to examine the comorbidity patterns among racial/ethnic groups. The UCHDW is a research data warehouse aggregating electronic health records (EHR) data from 6 UC Health campuses (Davis, San Francisco, Los Angeles, Riverside, Irvine, and San Diego). It contains high-quality clinical information, including diagnoses, lab tests, prescriptions, and more. In addition, advances in analytical methods, especially the development of machine learning (ML) approaches, make it possible to analyze large-scale, real-world EHR data.15 Unlike traditional statistical models, which rely on strict assumptions regarding data distributions and face challenges with missing data, ML techniques are more flexible and better suited to handle complex and incomplete datasets.
Therefore, we conducted this study to leverage the power of both UCHDW and ML to examine the patterns of comorbidities across different races, develop race/ethnicity-specific ML models to predict liver cancer, and identify the most important comorbidities in each racial/ethnic group. The goal is to identify high-risk groups and promote targeted prevention and control of liver cancer among younger populations.
Methods
Study Population
We initiated a matched case-control study based on the UCHDW, which contained de-identified data on over 10 million patients dating back to 2012. 16 The dataset reflects California's diverse population, including substantial representation across various racial and ethnic groups. The UCHDW is encoded according to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), which standardizes data structure and coding terminologies to facilitate data sharing across healthcare institutions. 17 Patients diagnosed with primary liver cancer (SNOMED code 95214007, ICD-O-3 and ICD-10 code C22, and/or ICD-9 code 155), aged 18 to 49 years, were included. We further excluded codes for secondary cancer, metastasis, and hepatoblastoma, the latter having a distinct etiology as the most common childhood liver cancer. Then, we identified standard concept IDs and retrieved relevant descendant codes to capture all patients with liver cancer. In addition, we excluded patients with prior diagnoses of other cancer types to minimize the inclusion of metastatic liver cancer cases. We also removed individuals with missing demographic information (e.g., age, gender, race/ethnicity) from the analysis. To enhance the completeness of the patient cohort, we only included patients with two or more hospital visits in the UCHDW, which provides a more reliable representation of disease prevalence by reducing the likelihood of including patients with incomplete or sporadic data. 18 For each liver cancer case, we identified all eligible controls matched on sex, race, and birth year—demographic factors known to influence liver cancer risk—to ensure balanced representation across racial and ethnic groups. The further eligibility of controls included no history of any cancers, having at least two hospital visits, and the observation period encompassing the diagnosis date of the corresponding cases, which ensured comparable exposure windows. 
In the pool of eligible controls for each case, we applied a random sampling approach using PySpark window functions with randomized row ordering to select five controls. A 1:5 case-to-control ratio was chosen to maximize the information gained from the limited number of cases, enhance statistical efficiency, and improve model stability. 19 The index date for each case was defined as the first date of diagnosis of primary liver cancer in the UCHDW, and the corresponding index date for the five matched controls was the same as the index date of the matched case.
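The control-selection step described above can be sketched as follows. This is a minimal illustration in plain Python rather than the PySpark code used in the study: each eligible control row receives a random sort key, rows are ordered by that key within each case (the analogue of a window partitioned by case and ordered randomly), and the first five rows are kept. The function name `sample_controls`, the identifiers, and the seed are hypothetical, and the sketch assumes that sampling is done independently for each case.

```python
import random
from collections import defaultdict

def sample_controls(eligible, k=5, seed=0):
    """Pick up to k controls per case by randomized row ordering.

    eligible: list of (case_id, control_id) pairs, one row per eligible control.
    Returns a dict mapping case_id to the list of selected control_ids.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    by_case = defaultdict(list)
    for case_id, control_id in eligible:
        # Random key per row: the analogue of ordering a window by a random value.
        by_case[case_id].append((rng.random(), control_id))

    selected = {}
    for case_id, rows in by_case.items():
        rows.sort()  # randomized order within the case's partition
        # Keep the first k rows (the analogue of row_number <= k).
        selected[case_id] = [control_id for _, control_id in rows[:k]]
    return selected

# Hypothetical pool: case_A has 12 eligible controls, case_B has only 3.
eligible_pairs = [("case_A", f"ctrl_{i}") for i in range(12)]
eligible_pairs += [("case_B", f"ctrl_{i}") for i in range(20, 23)]
picked = sample_controls(eligible_pairs)
```

In this sketch a case with fewer than five eligible controls, such as `case_B`, simply returns all of them; how such cases are handled afterwards is a separate design decision.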
This study uses a limited data set (LDS) version of the UCHDW that has all patient identifiers removed except for service dates and year of birth. The use of the LDS UCHDW in secure research computing enclaves was approved jointly by the Institutional Review Boards (IRB) of all UC Health campuses as non-human subject research. The reporting of this study conforms to STROBE guidelines. 20
Data Retrieval
Patients were grouped based on self-reported race and ethnicity groups, identified using the following SNOMED codes: non-Hispanic Asian and Pacific Islanders (API) (race: 8515, 8557; ethnicity: 38003564), Hispanic (ethnicity: 38003563), non-Hispanic White (race: 8527; ethnicity: 38003564), and other or unknown races. The last group comprised liver cancer cases in non-Hispanic Black, American Indian/Alaska Native, and other racial/ethnic groups, as well as those with unknown race/ethnicity. Additional demographic information, including year of birth, gender, and socioeconomic status (SES), was obtained from the UCHDW as follows. Age was calculated by subtracting the year of birth from the year of the first liver cancer diagnosis (the index date) for each case-control matched group. SES was assessed using the Area Deprivation Index (ADI) scores, which combine various socioeconomic indicators at the census block group level (such as income, education, employment, and housing quality) to quantify neighborhood-level socioeconomic disadvantage.21 The ADI ranks neighborhoods from least to most deprived and is divided into deciles represented by integers from 1 to 10.22 Missing ADI scores were imputed using MissForest, an ML algorithm based on random forests for handling missing data. Compared to traditional imputation methods, MissForest can handle both continuous and categorical variables, captures complex relationships and interactions between variables, and does not rely on assumptions about the underlying data distribution.23
Comorbidities were defined as preexisting conditions diagnosed before or at the onset of liver cancer, with symptoms or complications of certain diseases excluded, as they are signs or adverse events associated with comorbidities. 24 To ensure adequate statistical power, we included comorbidities with a prevalence greater than 1% among cases or those previously reported as risk factors for liver cancer or liver diseases. For example, Crohn’s disease, one of the inflammatory bowel diseases, and colon polyps are closely associated with several liver and biliary diseases.25-27 Although the prevalence of these conditions was slightly lower than 1%, we included them in this study. Patients were identified as having specific comorbidities if they had corresponding diagnoses (identified by SNOMED or ICD codes), abnormal lab test results, or records of specific medications before or on the index date. Detailed criteria for each comorbidity are described in Table S1. Comorbidities were further classified by subtype of the diseases and whether the condition had begun within or more than one year before the index date. The comorbidity classification categories are provided in Table S2.
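The timing rule described above (a condition counts only if recorded before or on the index date, and is then classified by whether it began within or more than one year before that date) can be illustrated with a short sketch. The function name, the category labels, and the use of 365 days as the one-year boundary are assumptions made for illustration; the study's exact criteria are given in Tables S1 and S2.

```python
from datetime import date

def classify_onset(first_diagnosis, index_date):
    """Classify a comorbidity by its onset relative to the index date.

    Returns None when the condition was first recorded after the index date,
    since it is then not a preexisting comorbidity.
    """
    if first_diagnosis > index_date:
        return None
    days_before = (index_date - first_diagnosis).days
    # Assumed boundary: 365 days or fewer counts as "within one year".
    return "within_1_year" if days_before <= 365 else "more_than_1_year"

index_date = date(2020, 1, 15)                           # hypothetical index date
recent = classify_onset(date(2019, 6, 1), index_date)    # 228 days before
older = classify_onset(date(2015, 3, 1), index_date)     # several years before
later = classify_onset(date(2020, 2, 1), index_date)     # after the index date
```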
Statistical Analysis
Table 1. Demographic Characteristics of Cases and Controls
Note: ADI, Area Deprivation Index; IQR, interquartile range.
In this study, we leveraged the AutoML modules provided on the Databricks Platform (Databricks, Inc, San Francisco, CA) to train and construct race/ethnicity-specific predictive ML models for the risk of EOLC using information on age, sex, ADI scores, and comorbidities. The dataset was randomly split into training (60%), validation (20%), and test (20%) datasets for model development, tuning, and evaluation, respectively. Classification methods utilized by Databricks AutoML, including decision trees, random forests, logistic regression, XGBoost, and LightGBM, were employed. The effectiveness of the selected models in predicting cancer risk has been demonstrated in previous studies.28,29 To ensure comparability, all models were trained and evaluated using the same predefined set of features. Feature selection and hyperparameter tuning were automatically managed by Databricks AutoML for each model. Databricks facilitated the hyperparameter tuning by integrating distributed optimization libraries like Optuna and Ray Tune with MLflow for tracking, enabling scalable and efficient model selection across clusters. AutoML also addressed class imbalance in the dataset by down-sampling the majority classes and applying class weights when an imbalance was detected. 30 The F1 scores for all classification methods are presented in Table S3, and the method achieving the highest F1 score on the validation dataset was selected for final evaluation and reporting. The F1 score was used as the primary evaluation metric due to its effectiveness in balancing precision and recall, particularly in imbalanced datasets in which each case was matched with five controls. 31 The area under the curve (AUC) values were also reported to evaluate the models’ performance. A summary of the model training and validation process is provided in Table S4. We also used SHapley Additive exPlanations (SHAP) to interpret the machine learning model outputs. 
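To make the F1-based selection rule concrete, the sketch below computes F1 from confusion-matrix counts and picks the method with the highest validation F1. All counts are hypothetical and are not results from this study; the function name `f1_score` is local to this sketch and is not the Databricks AutoML implementation.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical validation counts per method: (true pos, false pos, false neg).
validation_counts = {
    "logistic_regression": (60, 30, 40),
    "random_forest": (70, 25, 30),
    "xgboost": (75, 20, 25),
}
scores = {name: f1_score(*counts) for name, counts in validation_counts.items()}
best_method = max(scores, key=scores.get)  # method with the highest validation F1

# With a 1:5 case-to-control ratio, labeling everyone a control is right for
# 5 of every 6 patients (accuracy about 0.83) but finds no cases, so F1 is 0.
all_negative_f1 = f1_score(tp=0, fp=0, fn=100)
```

The last line shows why F1 is more informative than accuracy in a 1:5 matched design: a model that never predicts a case scores zero.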
SHAP is a unified approach based on Shapley values from cooperative game theory that quantifies the average marginal contribution of each feature across all possible feature combinations. This allows for transparent and consistent interpretation of how each comorbidity influences the predicted liver cancer risk while accounting for complex interactions with other variables. 32 Mean SHAP values were calculated to summarize the overall contribution of each feature to model prediction results using the SHAP package. A summary of the SHAP method, including its mathematical foundation, model compatibility, and application in this study, is provided in Table S5.
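The idea of averaging a feature's marginal contribution over all feature combinations can be illustrated with a brute-force Shapley computation on a toy scoring function. This is a sketch only: the SHAP package relies on more efficient, model-specific algorithms, and the function `toy_risk` and its weights are invented for illustration and are not estimates from this study.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution
        phi[f] = total
    return phi

def toy_risk(present):
    """Hypothetical risk score with an HBV x cirrhosis interaction."""
    score = 0.0
    if "HBV" in present:
        score += 0.6
    if "cirrhosis" in present:
        score += 0.3
    if "HBV" in present and "cirrhosis" in present:
        score += 0.2  # interaction term, split equally between the two features
    return score

phi = shapley_values(["HBV", "cirrhosis", "asthma"], toy_risk)
```

In this toy example the interaction term is shared equally, so HBV receives 0.7, cirrhosis 0.4, and asthma 0, and the values sum to the score with all three conditions present (1.1).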
Subgroup analysis among patients with HCC diagnosis, the major histological type of primary liver cancer, was further conducted. For comparison purposes, a similar data extraction and analysis process was applied to analyze the comorbidity patterns among late-onset liver cancer (LOLC) patients who were first diagnosed at age 50 years or older, along with five matched controls. All data extractions and analyses were performed on Databricks via Amazon Web Services (AWS, Amazon.com, Inc, Seattle, WA) 13.3 LTS, SQL 3.5.1, and Python 3.10.12.
Results
Among 9,447,655 patients included in the UCHDW dataset between January 1, 2012, and August 5, 2024, we identified 2288 patients who were first diagnosed with liver cancer between the ages of 18 and 49 years. Of these, we excluded 605 patients with a diagnosis of another cancer before liver cancer, one patient with unknown gender information, 105 patients with fewer than two visits, and three patients matched with only two controls. Finally, 1574 EOLC patients remained in the analysis, and 7870 matched controls were identified. The process of data collection is shown in Figure S1.
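The exclusion counts reported above can be checked with simple arithmetic. The dictionary keys below are descriptive labels chosen for this sketch; the numbers are those stated in the text.

```python
identified = 2288  # patients first diagnosed with liver cancer at ages 18-49

exclusions = {
    "prior_other_cancer": 605,
    "unknown_gender": 1,
    "fewer_than_two_visits": 105,
    "matched_with_only_two_controls": 3,
}

cases = identified - sum(exclusions.values())  # 2288 - 714
controls = cases * 5                           # 1:5 case-to-control ratio
```

The result is 1574 cases and 7870 controls, consistent with the reported totals.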
The baseline demographic characteristics among cases and controls are shown in Table 1. The median age of the study population was 43 years (IQR 35-47 years), and 59.1% were male. Among the liver cancer patients, 22.2% were API, 28.3% were Hispanic, 29.6% were White, and 19.9% were of other or unknown races. The median ADI was 5 (IQR 3-7) among cases and 4 (IQR 2-7) among controls. We identified 31 comorbidities in the analyses, including liver diseases (HBV infection, Hepatitis C virus (HCV) infection, cirrhosis, steatosis of the liver, autoimmune liver disease), biliary diseases (gallstone, cholesterolosis of the gallbladder, cholangitis), metabolic disorders (diabetes, hyperlipidemia, hypertension), mental health disorders (anxiety, depressive disorder), gastrointestinal diseases (gastroesophageal reflux disease or peptic ulcer, ulcerative colitis, Crohn’s disease, polyp of the large intestine), renal conditions (chronic kidney disease, kidney stone), substance use disorders (alcohol dependence, nicotine dependence), respiratory or allergic diseases (asthma, obstructive sleep apnea, allergic rhinitis), cardiovascular disease (congenital heart disease, coronary arteriosclerosis), and other health conditions (vitamin D deficiency, hypothyroidism, anemia, Human Immunodeficiency Virus [HIV] infection). Comorbidities included in the LOLC models were different due to the different prevalence of the diseases in the older patients. For example, congenital heart disease, HIV infection, ulcerative colitis, and Crohn’s disease were not included in the LOLC models because of the extremely low prevalence. In contrast, cerebrovascular disease, myocardial infarction, peripheral vascular disease, chronic obstructive pulmonary disease, prostatic hyperplasia, gout, osteoarthritis, osteoporosis, cataract, and diverticular disease of the colon were included in the model for the LOLC.
Top Ten Comorbidities With Highest Prevalence Among Early- and Late-Onset Liver Cancer Cases by Race
Note: OR, odds ratio; HBV, Hepatitis B Virus; HCV, Hepatitis C Virus; GERD, Gastroesophageal Reflux Disease.
aNo controls with HCV infection in the API group.
Early-Onset Liver Cancer and HCC Prediction Model Performance
Note: HCC, Hepatocellular carcinoma; AUC, Area under the curve.
aClassification methods, including decision trees, random forests, logistic regression, XGBoost, and LightGBM, were employed. The method with the highest F1 score on the validation dataset was selected for final evaluation and reporting.
The feature importance plots based on SHAP values reveal the most important comorbidities associated with EOLC and LOLC by race/ethnicity (Figure 1). For API patients, HBV and cirrhosis showed the highest mean SHAP values, with HBV reaching up to 1.0 in younger populations. However, in older populations, the mean SHAP value for HBV was around 0.5, and the importance of other comorbidities, such as cirrhosis, GERD/ulcer, hypertension, and hyperlipidemia, became more pronounced. Vitamin D deficiency and asthma also appeared among the important factors in EOLC, although with much lower mean SHAP values than HBV and cirrhosis. In Hispanic patients, cirrhosis, GERD/ulcer, hypertension, hyperlipidemia, and HCV showed high mean SHAP values in both early- and late-onset liver cancer, suggesting that HCV infection and metabolic syndrome-related risks are more prominent in this population. White patients exhibited a more dispersed pattern of comorbidity importance, with generally lower mean SHAP values across comorbidities. Apart from cirrhosis, metabolic disease, HCV infection, cholangitis, mental health disorders, and nicotine dependence were also important predictors in young White liver cancer patients. The relatively lower and more evenly spread SHAP values across conditions in White patients suggest a more heterogeneous comorbidity profile, consistent with the model's lower F1 score and AUC for this group. Race/ethnicity-specific feature importance plots associated with early-onset HCC are displayed in Figure S2.
Figure 1. Race/ethnicity-specific feature importance plots based on SHAP values. Panels A and B represent the EOLC and LOLC models for Asian/Pacific Islanders, Panels C and D for Hispanics, Panels E and F for Whites, and Panels G and H for Other/Unknown.
Note: EOLC, early-onset liver cancer; LOLC, late-onset liver cancer; HBV, Hepatitis B Virus; GERD, Gastroesophageal Reflux Disease; VD, Vitamin D; ADI, Area Deprivation Index; HCV, Hepatitis C Virus; SHAP, SHapley Additive exPlanations.
Discussion
EOLC patients exhibited distinct comorbidity profiles by race/ethnicity groups, with HBV infection as the predominant comorbidity in API patients, HCV infection and metabolic disorder-related comorbidities playing significant roles in Hispanic patients, and a more diverse, less concentrated comorbidity profile in White patients. To our knowledge, this is the first study to comprehensively evaluate the comorbidity patterns of EOLC patients by race/ethnicity, providing further insights into the etiology of EOLC and supporting targeted strategies for liver cancer prevention in young populations.
Racial and ethnic disparities in hepatitis virus infections contribute to varying levels of liver cancer risk among different racial/ethnic groups. HBV infection is the most significant comorbidity among API patients. Although chronic HBV infection rates are generally low in the U.S. (<1%), increased immigration from HBV-endemic regions, such as East Asia and the Pacific Islands, might have led to the rising prevalence of HBV in this population.34,35 Immigrants from these areas face chronic HBV risks similar to those in their home countries, where hepatitis B surface antigen prevalence exceeds 2%.36-39 Screening and vaccination for HBV in this community are also inadequate. A survey of Asian American primary care providers revealed that 50% did not routinely screen all their Asian patients for HBV. Additionally, over 80% of these providers reported that less than half of their adult Asian patients had received the HBV vaccine.40 In addition, the stigma of HBV infection can further hinder efforts to improve vaccination coverage and early screening, exacerbating the risk of chronic infection and liver cancer.41 Etiologically, HBV can integrate near oncogenes, altering gene expression or function and promoting malignant transformation without cirrhosis, which may contribute to the early onset of liver cancer.42 In contrast, HCV infection is most prevalent among Hispanics, with a prevalence of 1.5%, which might be associated with higher rates of illicit drug use and limited access to testing and treatment services in this group.43,44 Additionally, socioeconomic disadvantages, such as lower income levels and reduced access to healthcare, impede early diagnosis and treatment of HCV, further contributing to elevated infection rates.45 We also observed that HCV prevalence is relatively lower in EOLC cases than in LOLC cases.
This finding aligns with the National Health and Nutrition Examination Survey (NHANES) data, which indicates that individuals aged 55-64 are 6.4 times more likely to have active HCV infection than those aged 18-40. 46 The lower rate of spontaneous viral clearance among older adults may partly explain this discrepancy. 47 Furthermore, as blood screening for HCV began in 1990, many older individuals may have acquired the virus through medical procedures or intravenous drug use before the implementation of widespread preventive measures. 48
Apart from hepatitis virus infections, non-infectious comorbidities also display distinct racial disparities. Metabolic conditions, such as hypertension and diabetes, are most prevalent among Hispanics, which results from a combination of genetic, lifestyle, and socioeconomic factors.49 For instance, the R230C variant in Hispanic individuals has been linked to low high-density lipoprotein cholesterol (HDL-C) levels, while a rare Adiponectin, C1Q And Collagen Domain Containing (ADIPOQ) gene mutation is associated with increased risks of heart disease and insulin resistance.50 These factors may contribute to the high prevalence of metabolic disorders and the high obesity rate of 43.7-47% among Hispanic adults.51-53 Socioeconomic barriers further exacerbate the status of metabolic disorders, as many Hispanic individuals face challenges such as lack of health insurance, limited English proficiency, and low education or literacy levels.51,54 In addition, communities with lower socioeconomic status often experience reduced access to nutritious food and safe living environments, increasing the risk of developing chronic metabolic diseases.55 In contrast, mental disorders are more prominent in Whites, particularly among younger individuals.
This pattern aligns with findings from the general population, where lifetime prevalence rates of mental disorders were highest among Whites (45.6%), followed by Latinos (38.8%) and Blacks (37.0%).56-60 Additionally, cultural differences and stigma can affect how mental disorders are reported, potentially leading to lower rates in the racial minorities but higher rates in Whites.56,61,62 Moreover, asthma and hypothyroidism are more common among Whites, consistent with their higher prevalence rates of 9.4% and 8.1%, respectively, in this population.63,64 Cholangitis, particularly primary sclerosing cholangitis (PSC)—a significant risk factor for hepatobiliary cancer—also shows a higher prevalence among younger White patients.65-67 Notably, the incidence of PSC has been rising in several countries, which may contribute to the increasing trend of EOLC.68-70 Intriguingly, many of those comorbidities involve autoimmune and inflammatory pathological processes, with their heightened prevalence likely influenced by a combination of genetic susceptibility, socioeconomic factors, and environmental influences. For example, the HLA-Cw*0701 allele is associated with genetic susceptibility to primary sclerosing cholangitis in Whites. 71 Furthermore, previous studies have identified that higher socioeconomic status and education levels of White individuals are independently associated with increased risks of thyroid disease. 64 Collectively, these results underscore the etiological heterogeneity of EOLC and support the need for risk assessment and clinical surveillance strategies that reflect the predominant comorbidity patterns within each racial group. Understanding that liver cancer may develop through different pathways—such as viral, metabolic, autoimmune, or psychosocial—depending on the population context is essential for informing precision prevention efforts.
The use of machine learning and SHAP values enabled us to quantify racial disparities in comorbidity risk profiles. Our findings showed that race- and ethnicity-specific models for API and Hispanic patients outperformed those for White patients. Additionally, the HCC model demonstrated superior performance compared to the general liver cancer models, highlighting the importance of tailored ML approaches that account for specific racial groups and cancer subtypes. Such models may enable more accurate risk assessments and provide insights for targeted prevention efforts. For example, despite the global implementation of universal HBV vaccination since 2008, HBV remains a dominant risk factor for liver cancer among API patients, particularly in younger populations. 72 This highlights the need for enhancing vaccination coverage and early screening in API communities, as well as continued efforts to identify and treat chronic HBV infections. Validating HBV screening results and ensuring timely follow-up care is also crucial for effectively managing chronic HBV infections and reducing the risk of liver cancer in these populations. 73 For Hispanic communities, targeted interventions aimed at increasing access to HCV screening and treatment, along with preventing metabolic diseases and obesity, are essential. 74 Notably, culturally sensitive interventions have been shown to improve the metabolic health of Hispanic participants, as evidenced by reductions in body mass index (BMI), blood pressure, lipid levels, and hemoglobin A1c. 75 In contrast, the more diverse comorbidity patterns observed in White patients may reflect broader genetic, socioeconomic, and lifestyle diversity, which contributes to a range of conditions impacting early diagnosis. These findings suggest a need for public health interventions that address a broader spectrum of risk factors, extending beyond hepatitis virus infections and metabolic disorders. 
In particular, psychosocial and immune-mediated risk factors warrant further exploration and targeted prevention efforts.
The performance metrics, including F1 scores and AUC values, highlighted the effectiveness of our machine learning models in predicting EOLC risk across diverse racial and ethnic groups. High F1 scores reflect a balance between precision and sensitivity, while robust AUC values underscore the models’ discriminative power. These findings are consistent with the results from the existing liver cancer prediction studies. For example, a machine learning model combining soft ensembles of random forest, XGBoost, and logistic regression achieved an AUC of 0.872 for predicting HCC risk in chronic hepatitis B patients on antiviral therapy. 76 Another study, using data from 377,065 participants in the NIH-AARP Diet and Health Study, applied a RUSBoosted Trees model and reported an AUC of 0.72 in the training sample and 0.65 in the validation sample for HCC risk prediction. 77 While differences in datasets and study designs limit direct comparisons, our models showed competitive performance and underscored the value of incorporating racial and ethnic factors to develop more equitable, population-specific prediction strategies.
There are several limitations to this study. The UCHDW dataset is not population-based and only captures patient care data generated at the UC Health system. This limits our ability to access the complete medical history of all included patients. To address this limitation, we employed a comprehensive strategy to identify comorbidities, including diagnosis codes, lab test results, and history of specific medications. We also included only those patients who stayed relatively persistent within the UC Health system, defined as having at least two visits recorded in the dataset, to ensure sufficient periods of exposure. Furthermore, the missing rates for behavioral variables (e.g., smoking, alcohol consumption) and BMI were high among young patients, preventing us from assessing the impact of those factors on EOLC prediction. To mitigate this, we included diagnoses of comorbidities such as alcohol and nicotine dependence, as well as metabolic syndrome-related comorbidities, as proxy variables. Additionally, there is potential for bias related to SES and access to healthcare services. Patients with limited access or inadequate insurance coverage may be underrepresented, which could affect the observed prevalence and detection of comorbidities. Although we included the ADI as a proxy for neighborhood-level SES, residual confounding may persist. This limitation should be considered when interpreting the generalizability of our findings.
Conclusions
Collectively, our study underscores the disparity in EOLC risk profiles across racial and ethnic groups and the value of ML in identifying these complex patterns. The results show that HBV infection is the primary comorbidity among API patients, and Hispanic patients are notably affected by HCV and metabolic disorders. In addition, White patients exhibit a broader, less concentrated comorbidity pattern, with mental health disorders and inflammatory conditions also playing important roles. Targeted strategies for those comorbidities are needed to prevent liver cancer in young populations.
Supplemental Material
Supplemental Material for Racial Disparities in Comorbidity Patterns of Early-Onset Liver Cancer: A Machine Learning Analysis by Bingya Ma, Kai Zheng, Fa-Chyi Lee, Yunxia Lu in Cancer Control.
Footnotes
Ethical Considerations
This study uses a limited data set (LDS) version of the UC Health Data Warehouse (UCHDW) that has all patient identifiers removed except for service dates and year of birth. The use of the LDS UCHDW in secure research computing enclaves was approved jointly by the Institutional Review Boards (IRB) of all UC Health campuses as non-human subject research. As such, formal ethics approval and informed consent were not required.
Author Contributions
Bingya Ma: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Visualization, Writing - original draft, Writing - review & editing. Kai Zheng: Resources, Writing - review & editing. Fa-Chyi Lee: Writing - review & editing. Yunxia Lu: Conceptualization, Funding acquisition, Project administration, Supervision, Methodology, Writing - review & editing.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Bingya Ma was funded in part by the Cancer Epidemiology Education in Special Populations (CEESP) Program of the National Cancer Institute, Grant #R25 CA112383 and the UCI Presidential Funding of Dr Oladele Ogunseitan.
Declaration of Conflicting Interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data used in this study contain de-identified patient information and are not publicly available.
Supplemental Material
Supplemental material for this article is available online.
References
