Abstract
Cancer comorbidities often reflect the complex pathogenesis of cancers and provide valuable clues for discovering the underlying genetic mechanisms of cancers. In this study, we systematically mined and analyzed cancer-specific comorbidities from the FDA Adverse Event Reporting System. We stratified 3,354,043 patients by age and gender, and developed a network-based approach to extract comorbidity patterns from each patient group. We compared the comorbidity patterns among different patient groups and investigated the effect of age and gender on cancer comorbidity patterns. The results demonstrated that the comorbidity relationships between cancers and non-cancer diseases largely depend on age and gender. A few exceptions are depression, anxiety, and metabolic syndrome, whose comorbidity relationships with cancers are relatively stable across all patient groups. Evidence from the literature indicates that these stable cancer comorbidities reflect the pathogenesis of cancers. We applied our comorbidity mining approach to colorectal cancer and detected its comorbid associations with metabolic syndrome components, diabetes, and osteoporosis. Our results not only confirmed known cancer comorbidities but also generated novel hypotheses, which can illuminate the pathophysiology shared between cancers and their co-occurring diseases.
Keywords
Introduction
Disease phenotype relationships often reflect overlapping pathogenesis,1–3 and have thus been used to predict the genetic origins of diseases4–7 and to discover drug treatments.8,9 Disease comorbidity is an important aspect of disease phenotype. Comorbidity patterns often reveal unexpected disease links10 and offer novel insights into the genetic mechanisms of diseases.11,12 In particular, the comorbidity patterns of cancers affect cancer prognosis,13,14 treatment decisions,15 and the understanding of cancer mechanisms. A few recent studies have probed the genetic factors underlying the co-occurrence of cancer with autoimmune diseases,16,17 metabolic diseases,18 and inflammatory diseases.19 Genetic factors shared between cancer and its comorbidities have also been applied to develop cancer treatments.20
In this study, our goal is to systematically mine and analyze cancer-specific comorbidities. Systematic comorbidity studies have been conducted previously, but they did not focus on cancer comorbidities. Rzhetsky et al developed a statistical model to analyze a database of hospital medical records; they identified co-occurrence relationships among 160 diseases, with an emphasis on psychiatric disorders.21 Park et al and Hidalgo et al detected comorbidity patterns from Medicare claims using statistical measures; their studies focused on elderly patients aged 65 years or older.22,23 Roque et al mined disease correlations from the free text of electronic medical records from a psychiatric hospital.24 Unlike these works, we extracted comorbidity patterns specifically for cancers without restricting patient age or gender. We also investigated the effects of age and gender on cancer comorbidity patterns.
We extracted cancer-specific comorbidities from the FDA Adverse Event Reporting System (FAERS) with a data mining approach. The FAERS database contains records of 3,354,043 patients (male and female, at all age levels), 1,138 cancers of different types and stages, and 8,974 non-cancer health problems. These data offer a rich resource for network-based analysis of cancer comorbidity patterns across diverse patient populations. FAERS has been extensively mined to detect post-market drug safety signals, but its use for mining disease comorbidity patterns has not been explored. We first investigated the demographics of the patients and demonstrated that the data are valuable for comorbidity mining. Then, we stratified the patients by age and gender, and developed an automatic approach to extract comorbidity patterns from each patient group. Unlike previous studies, which used statistical approaches to calculate pairwise disease comorbidity measures, we applied association rule learning to mine comorbidity patterns among multiple diseases. By comparing the comorbidity patterns among different patient groups, we extracted population-specific cancer comorbidities and investigated the effect of age and gender on comorbid relationships.
Data and Methods
Data sets
We extracted patient-disease pairs from the adverse event reports for comorbidity mining. The adverse event reports contain records of 3,354,043 patients, of whom 2,213,399 (66%) have age information and 3,153,795 (94%) have gender information. Figure 1(a,b) shows the distributions of age and gender. Unlike Medicare claims, which contain only patients aged 65 years or older, the adverse event reports include patients with recorded ages ranging from one day to hundreds of years. With both disease and demographic data for millions of patients, we were able to study the potential effects of age and gender on disease comorbidity patterns. For comorbidity extraction, we stratified patients into five groups by age (Fig. 1a) and two groups by gender (Fig. 1b).
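A minimal Python sketch of the stratification step follows. It assumes each patient record is a dictionary with hypothetical keys `age`, `gender`, and `diseases`, and it assumes half-open age bins, since the text does not say which group a boundary age such as 40 belongs to.

```python
from collections import defaultdict

# Age strata named in the text; assigning a boundary age such as 40 to the
# upper group is an assumption made for this sketch.
AGE_BINS = [
    (0, 20, "<20"),
    (20, 40, "20-40"),
    (40, 60, "40-60"),
    (60, 80, "60-80"),
    (80, float("inf"), ">80"),
]


def age_group(age_years):
    """Return the age-group label for an age in years, or None if unusable."""
    if age_years is None:
        return None
    for low, high, label in AGE_BINS:
        if low <= age_years < high:
            return label
    return None  # e.g. negative ages


def stratify(patients):
    """Split patients into age strata and gender strata.

    Each patient is a dict with hypothetical keys 'age' (years or None),
    'gender' ('M', 'F' or None) and 'diseases' (a set of concept IDs).
    A patient missing age or gender is left out of that stratification only.
    """
    by_age, by_gender = defaultdict(list), defaultdict(list)
    for patient in patients:
        label = age_group(patient.get("age"))
        if label is not None:
            by_age[label].append(patient)
        if patient.get("gender") in ("M", "F"):
            by_gender[patient["gender"]].append(patient)
    return dict(by_age), dict(by_gender)


patients = [
    {"age": 35.0, "gender": "F", "diseases": {"C1", "C2"}},
    {"age": None, "gender": "M", "diseases": {"C1"}},
    {"age": 82, "gender": None, "diseases": {"C3"}},
]
by_age, by_gender = stratify(patients)
print(sorted(by_age), sorted(by_gender))  # ['20-40', '>80'] ['F', 'M']
```

In this sketch a patient without age information still enters a gender group, and vice versa, which matches the separate age and gender stratifications described above. Implausibly large recorded ages would fall into the >80 group unless they were filtered beforehand; how such records were handled is not stated here.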

The adverse event reporting system represents patient diseases by the indications of the drugs that patients take. These indication terms include not only disorders but also treatment procedures, such as surgery; common symptoms, such as pain; and ill-defined events, such as unevaluable events. We mapped the indication terms to concept unique identifiers (CUIs) in the Unified Medical Language System (UMLS), merged synonyms into unique concepts, and kept the concepts whose semantic types denote human disorders. From the 10,122 indication terms, we extracted 8,224 disorder concepts belonging to the 11 semantic types listed in Figure 1(c). Among these disorder concepts, we found 1,138 different cancers, identified by the semantic type Neoplastic Process (T191).
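The term-to-concept step can be sketched in Python as below. The identifiers `CUI_A` to `CUI_C` are placeholders rather than real UMLS CUIs, the lookup dictionaries stand in for an actual UMLS query, and `DISORDER_TYPES` is an illustrative subset of semantic types, not the 11 types of Figure 1(c); only T191 (Neoplastic Process) comes from the text.

```python
# Semantic type IDs treated as disorders. T191 (Neoplastic Process) is named
# in the text; the others are an illustrative subset, not the paper's 11 types:
# T047 Disease or Syndrome, T048 Mental or Behavioral Dysfunction.
DISORDER_TYPES = {"T047", "T048", "T191"}
CANCER_TYPE = "T191"


def extract_disorders(term_to_cui, cui_to_types):
    """Collapse indication terms into unique disorder and cancer concepts.

    term_to_cui: indication term -> CUI (synonymous terms share one CUI).
    cui_to_types: CUI -> set of UMLS semantic type IDs.
    """
    concepts = set(term_to_cui.values())  # merging synonyms into one concept
    disorders = {c for c in concepts if cui_to_types.get(c, set()) & DISORDER_TYPES}
    cancers = {c for c in disorders if CANCER_TYPE in cui_to_types[c]}
    return disorders, cancers


# Toy mapping with placeholder identifiers (not real UMLS CUIs).
term_to_cui = {
    "Colorectal cancer": "CUI_A",
    "Colorectal carcinoma": "CUI_A",
    "Depression": "CUI_B",
    "Surgery": "CUI_C",
}
# T061 = Therapeutic or Preventive Procedure, which is not a disorder type.
cui_to_types = {"CUI_A": {"T191"}, "CUI_B": {"T048"}, "CUI_C": {"T061"}}
disorders, cancers = extract_disorders(term_to_cui, cui_to_types)
print(sorted(disorders), sorted(cancers))  # ['CUI_A', 'CUI_B'] ['CUI_A']
```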
Extract stratified cancer comorbidities
Using the patient-disease data in each stratified group, we mined cancer comorbidities in three steps (Fig. 2). First, we applied an association rule mining algorithm to the patient-disease pairs to mine strong co-occurrence patterns among all possible disease combinations. Second, we constructed a comorbidity network from the resulting patterns. Finally, to extract comorbidities for cancers, we initiated a random walk on the network from a set of cancer nodes of interest and ranked the non-cancer diseases by their probabilities of being reached by the random walk. After repeating the three steps for each patient group, we traced the changes in cancer comorbidities across age and gender groups. The following subsections describe each step in detail.

Extract cancer comorbidities for each stratified patient group.
Mine comorbidity patterns
Most previous studies used relative risk and the ɸ-correlation to mine comorbidity patterns. However, both measures are intrinsically biased toward rare diseases and consider only pairwise relationships. We applied an association rule mining approach, which flexibly detects strong co-occurrence relationships not only between disease pairs but also among multiple diseases. Because of the large numbers of patients and diseases in the data, we implemented association rule mining with the frequent pattern growth algorithm,25 based on the Weka Java package,26 to efficiently search for association patterns. This algorithm has been successfully applied in the biomedical domain to extract drug adverse events.27
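For contrast, the two pairwise measures can be computed from four counts using their standard definitions: the number of patients with both diseases, the number with each disease, and the total number of patients. The counts in this Python sketch are hypothetical; they show that each measure scores exactly one disease pair, and that two rare diseases sharing only two patients receive a far larger relative risk than two common diseases sharing 800.

```python
import math


def relative_risk(c_ij, p_i, p_j, n):
    """Observed co-occurrences divided by the count expected under independence.

    c_ij: patients with both diseases; p_i, p_j: patients with each disease;
    n: total number of patients.
    """
    return c_ij * n / (p_i * p_j)


def phi_correlation(c_ij, p_i, p_j, n):
    """Pearson correlation between the two binary disease indicators."""
    numerator = c_ij * n - p_i * p_j
    denominator = math.sqrt(p_i * p_j * (n - p_i) * (n - p_j))
    return numerator / denominator


# Hypothetical counts in a population of 10,000 patients.
print(relative_risk(2, 10, 10, 10_000))          # two rare diseases: 200.0
print(relative_risk(800, 2000, 2000, 10_000))    # two common diseases: 2.0
print(phi_correlation(800, 2000, 2000, 10_000))  # 0.25
```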
The result of the algorithm is a list of patterns between two sets of diseases, represented in the form $X \Rightarrow Y$, where $X$ and $Y$ are disjoint sets of diseases.
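The frequent pattern growth algorithm requires a few parameters: the minimum support was set to 5, meaning that at least five patients must have all the diseases in a pattern at the same time; the maximum number of diseases in each pattern was limited to three; and confidence was chosen to measure and rank the patterns. The confidence score of a pattern $X \Rightarrow Y$ is the fraction of patients having all diseases in $X$ who also have all diseases in $Y$, that is, $\mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$, where $\mathrm{supp}(\cdot)$ is the number of patients who have every disease in the set.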
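The rule-mining step can be illustrated with a small Python sketch that uses the same thresholds (minimum support 5, at most three diseases per pattern) and ranks rules by confidence. It is a brute-force stand-in, not FP-growth and not Weka's implementation: it counts every disease subset of every patient directly, which finds the same frequent itemsets on toy data but would not scale to millions of patients.

```python
from collections import Counter
from itertools import combinations


def mine_rules(patient_diseases, min_support=5, max_size=3):
    """Mine rules X => Y by exhaustive itemset counting.

    Brute-force stand-in for FP-growth: every disease subset of size
    <= max_size is counted for every patient.
    patient_diseases: list of sets, one set of disease IDs per patient.
    Returns (X, Y, support, confidence) tuples sorted by confidence.
    """
    support = Counter()
    for diseases in patient_diseases:
        items = sorted(diseases)
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                support[frozenset(itemset)] += 1

    rules = []
    for itemset, count in support.items():
        if len(itemset) < 2 or count < min_support:
            continue
        # Every non-empty proper subset of a frequent itemset can be an antecedent.
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                x = frozenset(antecedent)
                y = itemset - x
                confidence = count / support[x]  # supp(X u Y) / supp(X)
                rules.append((x, y, count, confidence))
    rules.sort(key=lambda rule: rule[3], reverse=True)
    return rules


# Toy data: 6 patients with A and B, 4 with only A, 2 with B and C.
patients = [{"A", "B"}] * 6 + [{"A"}] * 4 + [{"B", "C"}] * 2
for x, y, count, conf in mine_rules(patients):
    print(sorted(x), "=>", sorted(y), count, conf)
# ['B'] => ['A'] 6 0.75
# ['A'] => ['B'] 6 0.6
```

FP-growth avoids this enumeration by compressing the patient records into a prefix tree and mining it recursively, which is what makes it practical at FAERS scale.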
Construct comorbidity network
For each stratified patient group, we constructed a disease comorbidity network to model the results of association rule mining. Given a pattern
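The exact rule for turning patterns into network edges is not given in this section. The Python sketch below assumes one plausible construction purely for illustration: each disease in a pattern's antecedent is linked to each disease in its consequent by an undirected edge weighted by the pattern's confidence, keeping the largest confidence when several patterns connect the same pair. Any other weighting would slot into the same structure.

```python
def build_network(rules):
    """Build an undirected, weighted disease network from association rules.

    HYPOTHETICAL construction: every disease in the antecedent is linked to
    every disease in the consequent, and the edge keeps the highest confidence
    among the rules that connect that pair.
    rules: iterable of (antecedent, consequent, support, confidence).
    Returns {disease: {neighbor: weight}}.
    """
    graph = {}
    for antecedent, consequent, _support, confidence in rules:
        for a in antecedent:
            for b in consequent:
                if confidence > graph.get(a, {}).get(b, 0.0):
                    graph.setdefault(a, {})[b] = confidence
                    graph.setdefault(b, {})[a] = confidence
    return graph


rules = [
    (frozenset({"A"}), frozenset({"B"}), 6, 0.6),
    (frozenset({"B"}), frozenset({"A"}), 6, 0.75),
    (frozenset({"A", "B"}), frozenset({"C"}), 5, 0.5),
]
network = build_network(rules)
print(network["A"]["B"], network["C"]["A"])  # 0.75 0.5
```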
Rank cancer comorbidities
Given any set of cancer nodes of interest as the “seeds,” we applied the random walk with restart algorithm to estimate the relevance score of each node to the seeds. The random walk algorithm takes the network structure into account without overemphasizing connections through highly connected nodes. Assume
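A minimal Python sketch of random walk with restart and of the subsequent ranking follows, using the common formulation $p_{t+1} = (1 - r) W p_t + r p_0$, where $W$ is the column-normalized weighted adjacency matrix and $p_0$ spreads the restart probability equally over the seeds. The restart probability of 0.7, the convergence tolerance, and the toy network with nodes `cancer`, `d1`, `d2`, and `d3` are assumptions, not values from this study.

```python
def random_walk_with_restart(graph, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state probabilities of a random walk that restarts at the seeds.

    graph: {node: {neighbor: weight}}, symmetric, every node with an edge.
    seeds: nodes the walk restarts from, each with equal probability.
    restart: restart probability r (0.7 is an assumed value).
    Iterates p <- (1 - r) * W p + r * p0, with W the column-normalized
    weighted adjacency matrix, until the L1 change falls below tol.
    """
    seeds = {s for s in seeds if s in graph}
    if not seeds:
        raise ValueError("no seed node is present in the network")
    p0 = {n: 0.0 for n in graph}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    out_weight = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in graph}
        for n, nbrs in graph.items():
            for m, w in nbrs.items():
                # n passes its probability to neighbors in proportion to edge weight.
                nxt[m] += (1 - restart) * p[n] * w / out_weight[n]
        delta = sum(abs(nxt[n] - p[n]) for n in graph)
        p = nxt
        if delta < tol:
            break
    return p


def rank_comorbidities(graph, seeds, is_cancer, top_k=10):
    """Rank non-cancer nodes by their relevance score to the seed cancers."""
    scores = random_walk_with_restart(graph, seeds)
    ranked = sorted(
        ((n, s) for n, s in scores.items() if not is_cancer(n)),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_k]


graph = {
    "cancer": {"d1": 1.0, "d2": 0.5},
    "d1": {"cancer": 1.0, "d2": 0.5},
    "d2": {"cancer": 0.5, "d1": 0.5, "d3": 0.1},
    "d3": {"d2": 0.1},
}
ranked = rank_comorbidities(graph, ["cancer"], is_cancer=lambda n: n == "cancer")
print([name for name, _score in ranked])  # ['d1', 'd2', 'd3']
```

Several seeds, such as the three colorectal cancer terms used later, are handled by splitting the restart probability evenly among them; whether the study weighted seeds equally is not stated here.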
Results
We report the results of three analyses of (1) relationships between cancers and non-cancer diseases; (2) relationships between cancers and diseases from two classes, nervous system diseases and metabolic diseases; and (3) colorectal cancer comorbidities. In each analysis, we show how the comorbidity patterns change with age and gender.
Cancer comorbidity patterns change with age and gender
We first combined all cancers regardless of type and stage, and ranked non-cancer diseases by their relevance scores (obtained from the random walk) within each age group. A higher relevance score indicates a stronger comorbidity association with cancers.
We found that cancers are associated with a broad spectrum of diseases, and that cancer comorbidity patterns change with age. Across the five age groups (<20, 20–40, 40–60, 60–80, and >80 years), we extracted 73 cancer comorbidities and categorized them into 12 disease classes. Table 1 lists the 12 disease classes, sorted by the variation of their average relevance scores across age groups. Cardiovascular diseases have greatly varying scores, which indicates that age strongly affects the comorbidity relationships between cancers and cardiovascular diseases. In contrast, liver diseases have the most stable scores among the 12 disease classes, indicating that their comorbidity relationships with cancers are relatively independent of patient age. We also compared the relevance scores with the prevalence of the 12 disease classes across age groups. Table 1 shows that the scores of cardiovascular diseases are highly correlated with their prevalence. Because a highly prevalent disease is more likely to co-occur with other diseases in the population by chance alone, the comorbidity relationship between cardiovascular diseases and cancers might be overestimated.
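The two quantities behind Table 1, the variation of each class's average relevance score across age groups and the association between scores and prevalence, can be computed as in this Python sketch. Which variation measure was used is not stated, so the sketch assumes the population standard deviation and uses the Pearson correlation; all class names and values are hypothetical.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_by_variation(table):
    """Sort disease classes by how much their average score varies across age groups.

    table: {class: (average relevance score per age group, prevalence per age group)}.
    Returns (class, score standard deviation, score-prevalence correlation) tuples.
    """
    rows = [
        (name, pstdev(scores), pearson(scores, prevalence))
        for name, (scores, prevalence) in table.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical values for the five age groups <20, 20-40, 40-60, 60-80, >80.
table = {
    "class_A": ([0.01, 0.03, 0.06, 0.09, 0.07], [0.05, 0.12, 0.25, 0.40, 0.35]),
    "class_B": ([0.020, 0.021, 0.019, 0.020, 0.020], [0.10, 0.11, 0.10, 0.12, 0.11]),
}
for name, spread, corr in rank_by_variation(table):
    print(f"{name}: sd={spread:.4f}, r={corr:.2f}")
```

A class whose scores track its prevalence closely, like the hypothetical `class_A`, is the situation in which the text cautions that a comorbidity may be overestimated.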
Score variations for 12 disease classes, which cover 73 cancer comorbidities, across all patient groups stratified by age.
Disease classes with non-zero relevance scores in all age groups.
Figure 3 shows the trends in cancer comorbidity patterns for the six disease classes that have non-zero relevance scores in all age groups (the disease classes marked with asterisks in Table 1). Cardiovascular diseases become more strongly associated with cancers as patients age, and the association peaks in the 60–80 age group. Respiration disorders occur more frequently among younger cancer patients, particularly in the <20 age group. The other disease classes have relatively stable comorbidity associations with cancer as patient age increases.

The left panel shows the average comorbidity scores of six disease classes.
We repeated the analysis of cancers and non-cancer diseases in the two gender groups. The results show that gender has little impact on most disease classes, except for cardiovascular diseases, which are more common among male cancer patients, and digestive system diseases, which have a stronger association with cancers among female patients (Fig. 4).

Comparison of the relevance scores for six disease classes between the two gender groups.
Stratified comorbidities reflect known cancer pathogenesis and mechanisms
In this section, we study the comorbidity relationships between cancers and individual diseases in two disease classes: nervous system diseases, and endocrine, nutritional, and metabolic diseases. We chose these disease classes because they contain the largest numbers of diseases and their comorbidity relationships with cancers are less likely to be overestimated than those of other disease classes.
We extracted 14 nervous system diseases from the top-ranked comorbidities in all age groups. Among them, anxiety, depression, and epilepsy are associated with cancers at most age levels, whereas schizophrenia and bipolar disorder tend to co-occur with cancers among patients younger than 60 years (Fig. 5). Gender has little impact on the comorbidity patterns of these diseases. Note that other nervous system disorders, such as Alzheimer's disease and Parkinson's disease, do not have strong associations with cancers in any patient group, although both diseases are common among elderly patients in our data. Previous studies have shown that these two diseases are inversely comorbid with cancer.28,29

Comorbidity relationships between nervous system diseases and cancers among five age groups.
Evidence from the literature supports the frequent co-occurrence of cancers with depression,30–32 anxiety,33,34 and epilepsy.35,36 A few studies link the role of these disorders as cancer comorbidities to impaired immune responses:37,38 multiple molecular immunological factors are compromised in chronic stress and depression, and these factors later contribute to the development and progression of some types of cancer. Conversely, cancers also increase the risk of these nervous system disorders; for example, cancer patients who have developed brain metastases have a greater risk of epilepsy.35 Regarding serious mental illnesses such as bipolar disorder, a recent study reported an increased risk of cancer among patients with bipolar disorder or schizophrenia.39 This study also pointed out that cancer incidence among patients with mental illness varies with age.
We repeated the same analysis on 16 endocrine, nutritional, and metabolic disorders. In most age groups, particularly among patients older than 20 years, cancers frequently co-occur with metabolic syndrome components, such as hypercholesterolemia and hyperlipidemia; endocrine and electrolyte disorders, such as hypothyroidism and hypokalemia; and diabetes mellitus (Fig. 6). Their cancer comorbidity patterns are independent of patient gender. Evidence from the literature suggests several factors that can explain the observed comorbidity between metabolic disorders and cancers. First, environmental factors contribute to the comorbidity relationship: previous studies show that metabolic disorders increase the risk of cancers, and that patients with metabolic disorders share lifestyles with cancer patients, such as high-fat diets and little exercise.40,41 Second, common molecular mechanisms also help explain the comorbidity relationship; for example, insulin resistance, which contributes to the development of metabolic syndrome and type 2 diabetes, has been associated with colon cancer.42,43 In addition, osteoporosis tends to occur among elderly cancer patients (Fig. 6). Research on the link between osteoporosis and breast cancer shows that elderly cancer patients are more likely to have low estrogen levels; because estrogen protects bone and reduces the risk of bone loss, low estrogen levels may predispose these patients to osteoporosis.44

Comorbidity relationships between endocrine, nutritional, and metabolic disorders and cancers among five patient groups with different age levels.
Study of colorectal cancer comorbidities generated novel hypotheses
Colorectal cancer is common around the world, often deadly, and complex, and its mechanisms are not yet fully understood.45 We applied our approach to extract comorbidities for colorectal cancer. In the random walk with restart algorithm, we selected “colorectal cancer,” “colorectal cancer recurrent,” and “colorectal cancer metastatic” as the seeds. Because no patients younger than 40 years have colorectal cancer in our data, we show results only for the three oldest patient groups.
A total of 44 diseases from 6 classes are associated with colorectal cancer across the age groups. These disease classes are digestive system disorders, cardiovascular diseases, inflammatory disorders, metabolic diseases, respiration diseases, and nervous system disorders. We further investigated the metabolic diseases in detail, since this class contains the largest number of colorectal cancer comorbidities. Figure 7 shows some of the metabolic diseases that are strongly associated with colorectal cancer. Hypercholesterolemia, hypothyroidism, and diabetes mellitus have comorbidity associations with colorectal cancer in all three age groups, although the strength of the associations tends to decrease with age. Gender has little impact on the comorbidity patterns of these three diseases. A large body of literature supports metabolic syndrome and type 2 diabetes as risk factors for colorectal cancer,46,47 and studies have also shown that insulin resistance may explain the co-occurrence of colorectal cancer and type 2 diabetes.48,49 In addition, osteoporosis is associated with colorectal cancer among elderly female patients. A recent retrospective study50 supports this result and showed that osteoporosis may increase the risk of colorectal cancer among postmenopausal women. Another study showed that an oral osteoporosis drug reduced the risk of colorectal cancer.51 The molecular basis of the observed comorbidity between osteoporosis and colorectal cancer is not yet clear, so studying the molecular mechanisms shared by the two diseases may yield new knowledge.

Comorbidity relationships between endocrine, nutritional, and metabolic disorders and colorectal cancer among three age groups.
Discussion
In this study, we mined cancer-specific comorbidities from large-scale data in the adverse event reporting system. Our approach flexibly detects comorbidity patterns for one or multiple types of cancer based on network analysis. Comparisons of cancer comorbidities among stratified patient groups show that many cancer comorbidity patterns depend on patient age and gender.
In future work, the comorbidity relationships identified by our approach could be applied to investigate cancer pathogenesis. Previous phenotype-based systematic gene prioritization approaches4–6 and genome-wide analyses12,52 usually treat all patients as equal or stratify patients only by race. Our results demonstrate the importance of age and gender for cancer comorbidity, and suggest stratifying patients by these two factors when incorporating cancer comorbidities into phenotype-driven approaches for identifying the genetic mechanisms of cancer.
We currently detect cancer comorbidities based on disease co-occurrence patterns. These co-occurrence patterns may indicate that cancers and their comorbidities mutually increase each other's risk. In addition, comorbidity patterns can be caused not only by a common genetic basis between cancers and other diseases but also by other factors, such as environmental factors, treatment-induced effects, and similar patient lifestyles. Incorporating more comprehensive patient-level data may help refine these disease relationships. For example, time-series data recording whether patients developed a disease before or after taking a drug could help infer whether a drug treating cancers induces their comorbidities.
Our results may be biased toward diseases whose drugs have high toxicity. FAERS collects adverse event reports on drugs from medical product manufacturers, health professionals, and the public. Hence, diseases without drug treatments are not included in the data. In addition, if a drug has high toxicity and causes many adverse events, the disease treated by the drug tends to appear frequently in the database and has a higher chance of co-occurring with other diseases. One advantage of the association rule mining approach is that confidence divides the co-occurrence count by the number of patients who have the antecedent diseases, so the scores of patterns whose antecedents include frequent diseases are automatically downweighted.
In addition, our study could be extended with a method to identify inverse cancer comorbidities, which also provide interesting clues about disease pathogenesis and mechanisms. Recent studies have used inverse cancer comorbidity to gain insight into central nervous system disorders.28,29 These studies were based on serendipitous epidemiological evidence of inverse comorbidity. A systematic analysis of all inverse cancer comorbidities may offer invaluable opportunities to understand cancers and other complex diseases.
Conclusions
We mined and analyzed cancer comorbidities through large-scale data mining across millions of patients. Our results show that cancers have comorbidity relationships with many kinds of diseases, and evidence from the literature indicates that these comorbidity patterns reflect complex cancer pathophysiology and mechanisms. Cancer comorbidity patterns also change with patient age and gender. Comorbidity patterns stratified by age and gender may lead to more reliable discoveries about cancer pathogenesis.
Author Contributions
Conceived and designed the experiments: RX. Analyzed the data: YC. Wrote the first draft of the manuscript: YC. Contributed to the writing of the manuscript: RX. Agree with manuscript results and conclusions: YC, RX. Jointly developed the structure and arguments for the paper: YC, RX. Made critical revisions and approved final version: YC, RX. All authors reviewed and approved the final manuscript.
