Abstract

Background:

Expanded analysis of tumor genomics data enables current and future patients to gain more benefits, such as improving diagnosis, prognosis, and therapeutics.

Methods:

Here, we report tumor genomic data from 1146 cases accompanied by simultaneous expert analysis from patients visiting our oncological clinic. We developed an analytical approach that leverages combined germline and cancer genetics knowledge to evaluate opportunities, challenges, and yield of potentially medically relevant data.

Results:

We identified 499 cases (44%) with variants of interest, defined as either potentially actionable or pathogenic in a germline setting, and that were reported in the original analysis as variants of uncertain significance (VUS). Of the 7405 total unique tumor variants reported, 462 (6.2%) were reported as VUS at the time of diagnosis, yet information from germline analyses identified them as (likely) pathogenic. Notably, we find that a sizable number of these variants (36%–79%) had been reported in heritable disorders and deposited in public databases before the year of tumor testing.

Conclusions:

This finding indicates the need to develop data systems to bridge current gaps in variant annotation and interpretation and to develop more complete digital representations of actionable pathways. We outline our process for achieving such methodologic integration. Sharing genomics data across medical specialties can enable more robust, equitable, and thorough use of patients’ genomics data. This comprehensive analytical approach and the new knowledge derived from its results highlight its multi-specialty value in precision oncology settings.

1. INTRODUCTION

The field of precision oncology has been significantly transformed by advances in high-throughput sequencing and the identification of targetable tumor-specific molecular changes. As a result, tumor genetic testing has become an integral component of diagnosis, prognosis, and therapeutic decision-making (Malone et al., 2020). However, health care professionals face the complex task of translating and interpreting genetic reports obtained from in-house or commercial testing laboratories.

Genomic tumor boards play a crucial role in the interpretation of variants of uncertain significance (VUS). The conclusions drawn by these boards naturally depend on the data aggregated for review. For instance, one tumor board may classify a variant as Tier II under established guidelines (Li et al., 2017) if it considers a recently published study surveying a rare cancer type, while another tumor board that does not consider the same study may classify it as a VUS. Thus, the determinations of tumor boards depend on the breadth of the aggregated data and how they are presented, which poses a practical and common data science requirement in genomics labs. Unfortunately, VUS are often ignored and not iteratively reassessed as new knowledge emerges (Deignan et al., 2019). This practice creates significant potential for overlooking essential genomic findings, resulting in the underutilization of precision medicine (Esteva et al., 2019; Norgeot et al., 2019). Furthermore, while tumor genomics predominantly focuses on identifying indications for therapies and clinical trials, many genetic variants relevant to patients and the precision oncology field fall outside these considerations. Hence, this study aims to evaluate the potential for gaining additional knowledge by reanalyzing tumor variants reported as VUS while incorporating genomics data that are less commonly used in oncology settings. We assembled a substantial cohort of 1146 tumors from patients evaluated by a multi-disciplinary Precision Oncology clinic. The reanalysis involved developing an enhanced variant interpretation approach, utilizing semantic-based algorithms that link knowledge of pathogenicity derived from heritable cancer syndromes, inherited non-cancer disorders, and mutations recurrent in other tumor types (i.e., “off-tumor” mutations).
Our findings reveal that 44% of the cohort had tumor-reported VUS that align with potentially actionable pathways or are considered pathogenic in a germline setting. These results support the notion that integrating information on genetic variants across rare diseases and cancer enhances the classification of variants and leads to new insights relevant to precision oncology.

2. METHODS

2.1. Genomics data annotation

We reanalyzed tumor profiling data obtained from Foundation Medicine for the first 1146 patients seen at Froedtert Hospital (between 2007 and 2018). Extensible Markup Language (XML) files, a human-readable and computable format, were obtained for each case, with all clinically reported elements identified by specific tags. The data received were reviewed in multiple ways, including mapping of patient identifiers to ensure homogeneity and the ability to link back to our electronic medical record (EMR)-based data warehouse. Our analytic pipeline leveraged the BioR annotation engine (Kocher et al., 2014) to annotate variants with population allele frequencies from gnomAD, histology and pathology from COSMIC (Catalog of Somatic Mutations in Cancer; Forbes et al., 2011), germline phenotype records from ClinVar (Landrum et al., 2014) and HGMD (Human Gene Mutation Database; Stenson et al., 2012), and disease inheritance patterns from OMIM (Online Mendelian Inheritance in Man; McKusick-Nathans Institute of Genetic Medicine, June 2019). Unless otherwise stated, gene and protein symbols are used according to HGNC (Human Genome Organization, Gene Nomenclature Committee) annotations. The annotated and deidentified dataset of 12,830 sample-variant pairs, from which the analyses of the current work are derived, is included (Supplementary Table S3).
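As a rough illustration of the annotation step described above, the following Python sketch parses a small XML report and joins each variant against lookup tables. The XML layout, the tag and attribute names, the placeholder gene GENE_X, and the lookup tables are all hypothetical; neither the vendor schema nor the BioR interface is described in the text, so this should not be read as the pipeline used in the study. The MSH6 p.E807* entry mirrors Case example 1 (listed as pathogenic in ClinVar and absent from gnomAD).

```python
import xml.etree.ElementTree as ET

# Hypothetical report layout: the real vendor schema is not given in the text.
REPORT_XML = """
<report sample-id="S001">
  <variant gene="MSH6" protein-change="p.E807*" status="actionable"/>
  <variant gene="GENE_X" protein-change="p.A1B" status="VUS"/>
</report>
"""

# Toy lookup tables standing in for ClinVar/HGMD, COSMIC, and gnomAD.
# Keys are (gene, protein change); GENE_X is a placeholder, not a real gene.
GERMLINE_PATHOGENIC = {("MSH6", "p.E807*")}
COSMIC_HISTOLOGY = {("GENE_X", "p.A1B"): ["lymphoid neoplasm"]}
GNOMAD_AF = {("GENE_X", "p.A1B"): 1e-4}


def parse_report(xml_text):
    """Return one record per <variant> element in the report."""
    root = ET.fromstring(xml_text.strip())
    sample = root.get("sample-id")
    return [
        {
            "sample": sample,
            "gene": node.get("gene"),
            "protein_change": node.get("protein-change"),
            "status": node.get("status"),
        }
        for node in root.iter("variant")
    ]


def annotate(record):
    """Attach germline, somatic, and population annotations to a record."""
    key = (record["gene"], record["protein_change"])
    annotated = dict(record)
    annotated["germline_pathogenic"] = key in GERMLINE_PATHOGENIC
    annotated["cosmic_histology"] = COSMIC_HISTOLOGY.get(key, [])
    annotated["gnomad_af"] = GNOMAD_AF.get(key)  # None means absent
    return annotated


annotated_reports = [annotate(r) for r in parse_report(REPORT_XML)]
```

Keeping the originally reported status next to the added annotations makes it straightforward to select variants that were reported as VUS but carry a germline or cancer annotation.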

2.2. Ethics statement

EMR data were deidentified for research using our in-house honest broker system that is sanctioned and monitored by the Medical College of Wisconsin IRB protocol PRO00013874. The Institutional Review Board (IRB) has waived individual consent for research studies leveraging this institutional deidentified research resource. This study was conducted with approval of IRB PRO00037572.

2.3. Defining cancer pathways

DNA damage repair (DDR) pathways were used as defined by The Cancer Genome Atlas Analysis Working Group (Knijnenburg et al., 2018). Cell cycle and checkpoint pathways were taken from Reactome (Croft et al., 2011), BioCarta (Nishimura, 2001), and KEGG (Kyoto Encyclopedia of Genes and Genomes; Kanehisa et al., 2012), plus the gene sets derived by Fischer (Fischer et al., 2016) and Whitfield (Whitfield et al., 2002), as indexed by MSigDB (Subramanian et al., 2005). We defined Growth Factor Receptor (GFR) gene sets as the Reactome, BioCarta, and KEGG pathways for EGFR (epidermal growth factor receptor), VEGFR (vascular endothelial growth factor receptor), FGFR (fibroblast growth factor receptor), PDGFR (platelet-derived growth factor receptor), ERBB2 (erythroblastic oncogene B; also known as human epidermal growth factor receptor 2, HER2), and TGFBR (transforming growth factor-β receptor) signaling, as indexed by MSigDB.
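One way to picture these pathway definitions is as a merge of gene sets drawn from several resources. The Python sketch below assumes the merge is a simple union, which the text does not state, and uses tiny illustrative gene sets; the function names and set contents are hypothetical and far smaller than the real MSigDB collections.

```python
# Toy, heavily truncated gene sets; real collections contain many more genes.
# The resource labels mimic those named above, but the contents are
# illustrative only.
GENE_SET_SOURCES = {
    "DDR": {"TCGA_DDR": {"MSH2", "MSH3", "MSH6", "MLH1"}},
    "EGFR": {"REACTOME": {"EGFR"}, "KEGG": {"EGFR", "ERBB2"}},
    "ERBB2": {"REACTOME": {"ERBB2"}, "BIOCARTA": {"ERBB2", "EGFR"}},
}


def build_pathways(sources):
    """Merge per-resource gene sets into one gene set per pathway (union)."""
    return {
        pathway: set().union(*per_resource.values())
        for pathway, per_resource in sources.items()
    }


def pathways_for_gene(gene, pathways):
    """List the pathways whose merged gene set contains the gene."""
    return sorted(name for name, genes in pathways.items() if gene in genes)


PATHWAYS = build_pathways(GENE_SET_SOURCES)
```

With these toy sets, `pathways_for_gene("MSH6", PATHWAYS)` returns only the DDR pathway, while a gene that appears in several receptor gene sets is assigned to each of them.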

3. RESULTS

3.1. Tumor-reported VUS are common across the cohort, and a subset exhibits potential of germline origin

We initiated the current study due to the need to understand how to better analyze genomic data from tumor testing. The cohort of 1146 individuals averages 61 ± 13 years of age and is evenly split between males and females (Table 1). The most common cancer types were colon (399 patients), lung (304 patients), and pancreas (191 patients), reflective of our catchment and practice (Supplementary Table S4). Significant clinical variables such as stage, grade, and cancer type were often unknown (not provided on the clinical report), emphasizing the need for patient data to be integrated within institutional information management systems. Importantly, we quantify the potential impact of re-annotation in three ways: the numbers of affected genetic variants, patients, and variant reports, which we defined as variant–patient pairs (Table 2). Across our cohort, 8818 distinct tumor genomic variants were clinically reported, with 1413 reported as actionable and 7405 as VUS. Some tumor variants were reported from multiple samples, leading to 2442 total actionable report instances and 8884 VUS report instances. In total, 548 patients (48%) received actionable report instances and 595 patients (52%) received VUS report instances. Tumor mutation burden was moderately associated with the number of reported VUS (Spearman’s rho = 0.33 with p < 4.5 × 10⁻¹⁰; Supplementary Fig. S1 and Supplementary Data S1). These data formed the baseline for benchmarking current and future methods for genomic interpretation in precision oncology.
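The three ways of counting described above (unique variants, patients, and variant–patient report instances) can be sketched in Python as follows. The rows are invented, the names are hypothetical, and patients are counted here in every class in which they have a report, which is simpler than the exclusive assignment used for the patient columns of Table 2.

```python
from collections import defaultdict

# Toy report rows: (patient, variant, reported class). All values are invented.
REPORTS = [
    ("P1", "GENE_A:c.1A>G", "actionable"),
    ("P1", "GENE_B:c.2C>T", "VUS"),
    ("P2", "GENE_B:c.2C>T", "VUS"),
    ("P3", "GENE_C:c.3G>A", "VUS"),
]


def tally(reports):
    """Count unique variants, patients, and report instances per class."""
    variants = defaultdict(set)
    patients = defaultdict(set)
    pairs = defaultdict(set)
    for patient, variant, reported_class in reports:
        variants[reported_class].add(variant)
        patients[reported_class].add(patient)
        pairs[reported_class].add((patient, variant))
    return {
        cls: {
            "unique_variants": len(variants[cls]),
            "patients": len(patients[cls]),
            "report_instances": len(pairs[cls]),
        }
        for cls in pairs
    }


COUNTS = tally(REPORTS)
```

In the toy data, the same VUS is reported for two patients, so it contributes one unique variant but two report instances.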

Table 1.
Cohort Demographics Among 1146 Cases

	Patients (n = 1146)
Mean age, years	60.3 ± 13.5 years
Sex
Male	581 (50.7%)
Female	565 (49.3%)
Grade^a
I	80
II	246
III	205
IV	45
Unknown	425
B-cell	16
T-cell	2
NK	1
No data	126
Pathological stage
0	8
1	53
2	96
3	120
4	182
Not applicable	114
Unknown	300
No data	273
Clinical stage
0	9
1	93
2	93
3	116
4	312
Not applicable	123
Unknown	226
No data	174
Organ of origin
Lung	195
Colon	114
Pancreas	70
Bone	61
Liver	39
Esophagus	37
Prostate	35
Kidney	29
Breast	29
Other	537
Tumor type^b	Patients, variants
COAD	399, 1422
LUAD	304, 1316
PDAC	191, 518
LNSCC	146, 481
PAAD	108, 392
EAD	66, 343
RAD	85, 280
LCA	96, 274
Other^b	770, 7997

Patient numbers are calculated from distinct patient/diagnosis combinations. That is, a patient may be counted for more than one diagnosis if they have multiple samples with different diagnoses.

Unique patients may have multiple tumors. The eight most frequent tumor types are shown; tumor types with fewer samples are grouped into Other.

COAD, colon adenocarcinoma; EAD, esophagus adenocarcinoma; LCA, liver cholangiocarcinoma; LNSCC, lung non-small cell carcinoma; LUAD, lung adenocarcinoma; PAAD, prostate acinar adenocarcinoma; PDAC, pancreas ductal adenocarcinoma; RAD, rectum adenocarcinoma.

Table 2.

Number of Tumor Variants, Variant Reports, and Patients Affected by Reported Evidence and After Re-Annotation

	(Sample, variant) report counts			Unique tumor variant counts			Unique patient counts^b
	Reported Rx/CT^a	Reported VUS	Total	Reported Rx/CT^a	Reported VUS	Total	Reported Rx/CT^a	Reported VUS	Total
No annotation	2016	8174	10,190	1153	6943	8096	523	96	619
(Likely) Pathogenic or COSMIC	426	710	1136	260	462	722	25	499	524
Total	2442	8884	12,830	1413	7405	8818	548	595	1143

Variants are reported as actionable if they have potential to change therapies (Rx) or clinical trial (CT) recruitment. Otherwise, they are of uncertain significance (VUS).

Patients were counted in the most extreme table cell for which they had variants, with preference top right < bottom left < top left < bottom right.
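The patient-counting rule in the footnote above can be read as a precedence order over the four cells of Table 2. The Python sketch below is one interpretation, assuming that "top" and "bottom" refer to the no-annotation and annotated rows and that "left" and "right" refer to the Rx/CT and VUS columns; the labels, function name, and data are hypothetical.

```python
from collections import Counter

# Table 2 cells as (annotation row, reported column); a higher rank wins.
CELL_RANK = {
    ("no_annotation", "VUS"): 0,    # top right
    ("annotated", "Rx/CT"): 1,      # bottom left
    ("no_annotation", "Rx/CT"): 2,  # top left
    ("annotated", "VUS"): 3,        # bottom right
}


def assign_patient_cells(reports):
    """Place each patient in the highest-ranked cell among their reports."""
    best = {}
    for patient, row, column in reports:
        cell = (row, column)
        if patient not in best or CELL_RANK[cell] > CELL_RANK[best[patient]]:
            best[patient] = cell
    return best


# Invented report rows: (patient, annotation row, reported column).
TOY_REPORTS = [
    ("P1", "no_annotation", "Rx/CT"),
    ("P1", "annotated", "VUS"),
    ("P2", "no_annotation", "VUS"),
    ("P2", "no_annotation", "Rx/CT"),
    ("P3", "no_annotation", "VUS"),
    ("P4", "annotated", "Rx/CT"),
    ("P4", "no_annotation", "VUS"),
]

ASSIGNED = assign_patient_cells(TOY_REPORTS)
CELL_COUNTS = Counter(ASSIGNED.values())
```

Because each patient lands in exactly one cell, the cell counts sum to the number of patients with any report, which is why the patient columns of Table 2 add up across cells.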

Because germline disease data are used less often as references by oncology laboratories than somatic resources, we first address the likelihood that a VUS observed in our tumor cohort could be of germline origin. Our purpose is to understand the nature of tumor-reported data and to motivate the inclusion of germline annotations. We analyzed the distribution of variant allele fraction (VAF) and found 53.6% of variants at >0.3 VAF. Further, previous large-scale studies (Mandelker et al., 2019) demonstrated that some genes have a high rate of germline validation compared with others. Using the subset of genes with high germline validation rates, we identified 538 variants (4.2%) from 426 patients (37%). Additionally, 54% of reported variants have 0.45 < VAF < 0.55. Therefore, we consider these variants as having a high likelihood of being of germline origin. To further test this likelihood, we assumed that variants reported in population genetics studies are more likely to be germline when reported in a tumor. Thus, we correlated tumor VAF with the variant minor allele frequency (MAF) in the general population. Pearson’s correlation between VAF and MAF was non-significant (p = 0.61), while Spearman’s was significant (p = 4.5 × 10⁻³). Interestingly, we observed a strong trend toward a 50% VAF across the range of MAF (Fig. 1A). This trend was most pronounced for variants also previously observed in germline diseases (Fig. 1B). All Mann–Whitney tests were statistically significant, indicating that novel VUS identified during tumor testing are more often of low VAF, supporting somatic origins, while VUS that were previously reported in germline genetic conditions have VAFs compatible with heterozygous variants from population genetics databases. Specifically, previously reported variants from publicly available sources were closer to VAF = 0.5 compared with the entire distribution of tumor-reported variants (p < 1 × 10⁻¹⁶), previously reported variants with a disease association from public sources were closer to VAF = 0.5 compared with those with any previous report, including those lacking a phenotype (p < 2 × 10⁻¹⁰), and previously reported variants in association with a germline condition were closer to VAF = 0.5 compared with those with any previous disease association (p = 9 × 10⁻³). Thus, the possibility that a subset of these tumor variants is present in the germline of patients cannot be ignored, opening a door for future investigations.
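To make the VAF reasoning concrete, the Python sketch below computes VAF from read counts, the fraction of variants in the 0.45 to 0.55 window, and a Mann–Whitney U statistic comparing how far two groups sit from VAF = 0.5. The read counts are invented, the function names are hypothetical, and only the U statistic is computed; a p value would normally come from a statistics library, and the text does not say which software the authors used.

```python
def vaf(alt_reads, total_reads):
    """Variant allele fraction: reads carrying the alternate allele / all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def fraction_in_window(vafs, low=0.45, high=0.55):
    """Share of variants whose VAF is compatible with heterozygosity."""
    return sum(low < v < high for v in vafs) / len(vafs)


def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys: pairs with x > y count 1, ties count 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )


# Invented (alt reads, total reads) for two groups of tumor-reported VUS.
novel = [vaf(a, t) for a, t in [(10, 100), (15, 100), (48, 100), (20, 100)]]
germline_known = [vaf(a, t) for a, t in [(49, 100), (51, 100), (47, 100), (30, 100)]]

# Distance from VAF = 0.5; larger values are less compatible with heterozygosity.
dist_novel = [abs(v - 0.5) for v in novel]
dist_known = [abs(v - 0.5) for v in germline_known]
u_statistic = mann_whitney_u(dist_novel, dist_known)
```

In this toy example, 14 of the 16 pairwise comparisons place the novel variant farther from VAF = 0.5 than the previously reported germline variant, which is the direction of effect described in the text.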

FIG. 1.

Most genomic variants identified in tumor genomics testing are suggestive of germline origin. (A) We plot the tumor variant allele fraction (VAF) defined as the fraction of DNA sequencing reads carrying the alternate allele, by the population minor allele frequency (MAF) from gnomAD. While many variants are observed at a low VAF, suggesting that they are somatic in origin, most variants appear to be heterozygous in the tumor cell population, suggesting that they could be germline in origin. That many of these variants are observed in the currently healthy population supports that they may be of germline origin. Color indicates symmetric diverging difference from VAF = 0.5 (shown along the vertical axis). (B) We plot VAF among different subsets of tumor variants using a combined boxplot and violin plot. Width in the violin plot is proportional to data density. Here, we defined a disease association as variants that are reported in COSMIC or are pathogenic according to HGMD or ClinVar. Then, the germline disease subset is according to only HGMD and ClinVar. The tumor VUS that are associated with previously characterized germline disorders are significantly more likely to have VAF ≈ 0.5, compared with all reported variants. All unique pairwise two-sided Mann–Whitney tests are statistically significant with summary p values shown above the plot.

3.2. Knowledge-based variant classification yields information of relevance to precision oncology

Subsequently, we evaluated which tumor-reported VUS were observed recurrently in cancer or previously classified as (likely) pathogenic germline disease alleles. This analysis highlighted 462 unique variants (6.2%; Table 2) across 499 patients (44% of the cohort; 710 report instances). Their presence in a sample was independent of the year the test was ordered, with a median of 42% of cases per year (Supplementary Fig. S2). We calculated the fraction of these variants that were reported in public databases at the time of testing. Depending on the testing year, 36%–79% had been reported in the previous calendar year’s database. The 462 variants occur in specific genes more often than others, and the pattern is different when variants were previously observed in cancer versus heritable conditions (Fig. 2). We identified NTRK1, MAP3K1, ARID1A, AR, and MSH6 as the five most frequently altered genes (Fig. 2A). In addition, variants previously reported for germline cancer predisposition most often occurred in MAPK1, MSH6, and APC (Fig. 2B), which is essential information for families with unknown congenital risk. Variants previously reported for non-cancer germline conditions most often occurred in AR, ARID1B, and ARID1A (Fig. 2C), while NTRK1, SPEN, and EPHB1 were the most frequently detected recurrent somatic variants (Fig. 2D). Importantly, variants observed in some genes, such as MAP3K1, ARID1A, and MSH6, were previously observed as altered in both germline and somatic contexts.
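The per-year fraction of variants already present in public databases can be sketched as follows. This Python example assumes each re-annotated report carries the year of testing and the year the variant first appeared in a public database; both field meanings and all values are hypothetical, and "previous calendar year’s database" is interpreted as a first public year strictly earlier than the test year.

```python
from collections import defaultdict


def fraction_known_before_test(reports):
    """Per test year, the share of reports whose variant was already public.

    reports: iterable of (test_year, first_public_year) pairs.
    """
    totals = defaultdict(int)
    known = defaultdict(int)
    for test_year, first_public_year in reports:
        totals[test_year] += 1
        if first_public_year < test_year:
            known[test_year] += 1
    return {year: known[year] / totals[year] for year in sorted(totals)}


# Invented (test year, first public year) pairs.
TOY_REPORTS = [
    (2015, 2012),
    (2015, 2016),
    (2016, 2014),
    (2016, 2015),
    (2016, 2017),
    (2016, 2010),
]

FRACTIONS = fraction_known_before_test(TOY_REPORTS)
```

Variants that entered a database only after the test year are counted as unknown at the time of testing, so they lower that year's fraction.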

FIG. 2.

Variants of uncertain significance (VUS) reported in tumor sequencing frequently alter critical genes. Different genetic resources contribute distinct information about variants, together building a more complete context for describing the tumor genome. We plot the most frequently occurring genes containing variants that were reported to be VUS, but upon reanalysis have known disease associations or have been reported in other cancers. Variants are colored by the class of disease associated with them. The affected genes are displayed as those most common (A) across all disease classes, where the most frequently affected genes contain contributions from all three disease contexts, or specifically observed (B) in inherited disorders with a cancer predisposition, (C) in inherited disorders that are non-cancer, and (D) in somatic alterations.

Among the 462 variants found in our cohort, 387 (83.7%) had been previously reported in cancer (Fig. 3A). These variants are observed across 362 samples (32% of the cohort). Notably, the organ of origin where the variant was previously observed was frequently not the same as our patient’s diagnosis (i.e., “off-tumor”; Fig. 3B). For instance, we find many variants in pancreas, colon, breast, or lung tumors that were previously reported in blood cancers. We find colon adenocarcinoma tumors harbored variants previously reported in renal, lymphoid, glial, and other cancers. We mapped specific diagnoses to a simplified histological ontology and found many such off-tumor scenarios (Fig. 3C). Thus, a conceptual framework combining germline and somatic data can significantly enhance information, which otherwise would remain obscured.

FIG. 3.

Diversity among patient diagnoses and the previously observed disease context of tumor VUS. (A) We show the cascade of tumor variants and affected samples by disease context (germline or cancer) and selected gene pathways, for the 710 variant report instances that are updated in our reanalysis. We first split the variants by whether they were previously observed in a germline or somatic context. If a variant was previously seen in both, we placed it on the germline side. Variants that were previously observed only in a germline context are colored red. Variants previously observed in both germline and somatic diseases are colored purple. We further broke down somatic variants by pathways and germline variants by mode of inheritance and selected diseases (see text for details). (B) We use a heatmap to indicate the relationship between patient diagnosis (y-axis) and the COSMIC histological type associated with their reported genomic variant (x-axis; see Supplementary Table S1). Only the 15 most common diagnoses are shown for brevity; adenocarcinoma is abbreviated as Adc. There is concordance among the most common associations. For example, our AML patients have tumor variants that are known in hematological neoplasms. However, there is a broad spread where variants previously seen in one tumor type are observed in patients with a different tumor type. COSMIC histological types are abbreviated as: RSC, rhabdomyosarcoma; PCH, pheochromocytoma; OSC, osteosarcoma; MFH, malignant fibrous histiocytoma; WT, Wilms tumor; MTA, mesothelioma; HGS, hemangioblastoma; ESP, Ewings sarcoma-PPNT; HN, hematopoietic neoplasm; LN, lymphoid neoplasm; MN, malignant neoplasm. (C) We mapped the specific patient diagnosis and histological terms from previous genetic reports to a simplified terminology and show the result as a heatmap with the number of observations in each pair shown as text.

3.3. Reclassifying variants of uncertain significance in tumors using germline information

Our cohort obtained highly heterogeneous germline testing, with many patients receiving none, a small subset receiving germline exome sequencing, and a larger subset receiving gene panel testing (Supplementary Fig. S3). Many of these tests either do not overlap or share only a subset of genes in common with each other or with the tumor test (Supplementary Fig. S4), leaving variant origins ambiguous. In total, we had germline results for 115 patients, 84 of which were negative reports (no variants reported), 35 revealed (likely) pathogenic variants, and 48 returned VUS. In this sub-cohort, 29 patients (25.2%) had tumor variants confirmed as germline. Each of these results was assisted by genetic counseling specialists. From this work, we describe three scenarios that epitomize how these data were followed up: (1) no germline follow-up, which left family members at potentially unknown risk; (2) the germline allele was confirmed, but there was no verifiable follow-up with family members; or (3) the germline allele was communicated to and tested for across the family, resulting in changes to family members’ clinical risk management. Thus, a higher-than-recognized proportion of tumor-reported variants may originate from the germline, underscoring the value of comparing tumor genomics with germline reference data.

3.4. Multi-disciplinary impact of clinical genomics results

We next considered selected case examples to further investigate the potential for added value of more broadly understanding tumor-reported genomic variants. These cases are additionally summarized in Table 3.

Table 3.
Three Selected Clinical Cases with Germline and Somatic Genomic Testing

Case	Modality	Family history	Cancer type	Findings	Clinical impact
1	Germline	NA	NA	Never tested.	Family not evaluated for possibility for increased risk.
1	Somatic	Gynecologic malignancies in two first-degree relatives	EAC, MSI-intermediate	Actionable variants in MSH6, p.E807* and splice site c.3439-2A>G.	NA
2	Germline	NA	NA	MLH1 p.S556T, Lynch syndrome	History consistent with Lynch. Recommendation for family testing; no verified follow-up.
2	Somatic	PDAC in parent	COAD, MSI-high	Loss of MLH1 and PMS2 by IHC. MLH1 p.S556T VUS.	NA
3	Germline	NA	NA	CDH1, p.Arg63*, pathogenic; more variants including in TP53* and BLM.	History consistent with hereditary diffuse gastric cancer. Family was tested, carriers identified, elected prophylactic surgery.
3	Somatic	BRCA in two relatives	STAD, MSI-stable	CDH1 gene at p.Arg63* VUS.	NA

BRCA, breast carcinoma; COAD, colorectal adenocarcinoma; EAC, esophageal adenocarcinoma; IHC, immunohistochemistry; PDAC, pancreatic ductal adenocarcinoma; STAD, stomach/gastric adenocarcinoma; VUS, variants of uncertain significance.

3.4.1. Case example 1

A male in his 70s was diagnosed with advanced esophageal adenocarcinoma. Recurrence occurred after 4 years and pembrolizumab was started. A positron emission tomography (PET) scan after eight cycles demonstrated a mild radiographic response and improvement in clinical symptoms. Tumor gene panel profiling was then ordered and identified two actionable variants in the MSH6 gene, p.E807* and splice site c.3439-2A>G. This tumor test also reported microsatellite instability (MSI)-intermediate and high tumor mutational burden (TMB). Germline pathogenic variants in MSH6 are associated with Lynch syndrome, a condition that significantly increases the risk for colorectal, stomach, endometrial, ovarian, and other cancers (Kohlmann and Gruber, 1993). Esophageal cancer is not commonly associated with Lynch syndrome. The patient’s family history was significant for two first-degree relatives with gynecological cancers at unknown ages. Without more information on his family history of cancer, this patient did not meet clinical criteria for germline genetic testing.

Our reanalysis revealed that the splice site c.3439-2A>G variant in MSH6 was seen at an allele fraction of 0.48, was absent from the gnomAD database, and was listed as pathogenic in ClinVar by multiple submitters. Further, the p.E807* variant is absent from the gnomAD database and listed as pathogenic in ClinVar by two submitters. In addition, recent studies found that MSH6 pathogenic variants are more likely to be of germline origin than somatic (∼60% vs 40%; Meric-Bernstam et al., 2016). It is therefore highly suspected that this variant is of germline origin, and the patient would have been recommended for germline genetic testing. Unfortunately, the patient was never referred to genetic counseling for confirmatory germline genetic testing and has since passed away. This case highlights the missed opportunity to identify a hereditary cancer syndrome that would significantly impact medical management recommendations for at-risk family members.

3.4.2. Case example 2

A male in his 60s was diagnosed with advanced colorectal adenocarcinoma and referred to genetic counseling due to his MSI-high tumor, with immunohistochemistry staining showing loss of MLH1 and PMS2. Family history was significant for an unknown primary cancer in a parent and lung cancer in family members who were smokers. Testing for BRAF V600E and MLH1 promoter hypermethylation was ordered, and neither marker was detected. Subsequently, a germline multi-gene hereditary cancer panel was ordered and revealed a pathogenic variant in the MLH1 gene, c.1667G>C (p.S556T), which is consistent with a diagnosis of Lynch syndrome. Following these results, pembrolizumab was initiated, but unfortunately there was disease progression. Subsequently, tumor gene panel profiling was ordered, which also reported the S556T variant in MLH1, but as a VUS. This tumor test also reported MSI-intermediate and high TMB. This case highlights the potential to miss clinically relevant variants due to reporting practices (no mention of potential germline significance) and the use of tumor-only testing, rather than paired samples, for somatic testing.

3.4.3. Case example 3

A male in his 70s, diagnosed with metastatic gastric invasive adenocarcinoma, poorly differentiated, with signet ring cell features, had tumor gene panel profiling ordered soon after diagnosis. Most notably, tumor profiling identified a variant in the CDH1 gene at p.Arg63* and reported the sample as MSI-stable with low TMB. Germline pathogenic variants in CDH1 are associated with hereditary diffuse gastric cancer (HDGC), a condition that is associated with an increased risk of diffuse gastric cancer and lobular breast cancer. The average age of HDGC onset is 38 years, with a range of 14–69 years (Kaurah and Huntsman, 1993). Although the age of onset in this patient was older than expected for HDGC, a referral was placed to genetic counseling for evaluation of this CDH1 variant. The patient reported two relatives diagnosed with breast cancer in their 50s, another diagnosed with pancreatic cancer in their 70s, and three more diagnosed with breast cancer at unknown ages. The patient did not know the type of breast cancer (lobular vs ductal) in these family members. A germline multi-gene hereditary cancer panel was ordered and revealed a pathogenic variant in CDH1, c.187C>T (p.Arg63*). Germline genetic testing also found a possibly mosaic, pathogenic variant in the TP53 gene at c.734G>A (p.Gly245Asp) and VUS in APC at c.2307A>T (p.Leu769Phe); in BLM at c.968A>G (p.Lys323Arg); in POLE at c.2773T>C (p.Ser925Pro); and in STK11 at c.464+5G>A (intronic). Each of these variants was previously reported on the tumor profiling report except for the STK11 c.464+5G>A (intronic) variant.

Subsequently, germline genetic testing was recommended for family members. One child tested positive for the familial pathogenic variant in CDH1 and a prophylactic gastrectomy was recommended. This case highlights the importance of recognizing potential germline variants on tumor profiling and the significant clinical impact on families when a hereditary cancer syndrome is identified.

3.5. Pathway mapping bears potential implications for therapeutics and genetic counseling

Because genomic information is used to decide cancer therapy, we investigated how many VUS from the original reports fell within actionable pathways (Fig. 3A). We considered DDR, cell cycle, and key growth factor receptors (GFRs). We found 26 samples affected by DDR pathway alterations. Three variants were specifically within the homology-directed repair pathway, which may suggest therapies that inhibit single-strand repair. Additional variants within this group affected MSH6, MSH2, MSH3, and MLH1, which are mismatch repair genes implicated in microsatellite instability. Cell cycle variants were observed across 53 samples. Tyrosine kinase receptor pathways were more commonly altered, represented by 73 variants across 90 samples. They were most frequent in PDGFR (49 samples), FGFR (29), EGFR (16), ERBB2 (13), TGFBR (9), and VEGFR (8) pathways. Genes in these pathways have been the most represented in tissue-agnostic basket trials, making their further assessment highly relevant to research (Offin et al., 2018; Park et al., 2020). Therefore, VUS identified in tumors warrant further consideration for their potential to affect actionable pathways.

3.6. Finding associations with other genetic diseases for VUS reported in tumors

Additionally, among the 462 variants, 122 (26%) had been previously reported, at the time of cancer diagnosis, as pathogenic in a germline setting. These 122 variants are observed in 17% of our cohort. Interestingly, 47 (39%) of these variants were also reported in other cancers (Fig. 3A). The most frequent germline phenotypes associated with these 122 variants (Supplementary Fig. S5A) are risk for gynecological malignancies, colorectal and breast cancers, and Lynch syndrome (Supplementary Fig. S5B). Most (68 variants) follow an autosomal dominant inheritance pattern, a moderate number (15) are X-linked, and few (10) are recessive (Fig. 3A). The most common disease with a dominant inheritance pattern was Lynch syndrome (26). We also observed three Fanconi anemia variants. Additional variants were previously reported for non-cancer diseases, including autism, Coffin-Siris syndrome, idiopathic infertility, and aortic aneurysm syndrome (Supplementary Fig. S5C). X-linked diseases include androgen insensitivity syndrome, idiopathic infertility, and Reifenstein syndrome (another form of androgen insensitivity). Recessive conditions included severe combined immunodeficiency disease. Thus, many variants reported as VUS in tumor sequencing are described as germline pathogenic variants conferring risk for heritable cancers, warranting their further consideration for both patients and family members.

3.7. Comparative performance of the current approach with state-of-the-art computational tools for variant annotation

We sought to compare our reanalysis with other state-of-the-art computational tools for assessing genomic variants. We used VIC (He et al., 2019) and CHASMplus (Tokheim and Karchin, 2019) to annotate our cohort’s genomics results. We again split the variant reports into those originally reported as actionable or as VUS, then by our re-annotation results, and finally by each tool’s classification. VIC classifications were partly concordant with re-annotation, yet more conservative (Supplementary Table S2A). Specifically, of the 426 actionable variant reports with re-annotation results, VIC identified 20 as having “strong evidence of clinical significance” and 235 as having “potential” evidence. Among the 710 VUS reports with re-annotation results, VIC identified none as having strong evidence and 73 as having potential evidence. All the variants with strong evidence by VIC had a prior germline pathogenic or cancer-associated annotation in our reanalysis. CHASMplus classifications similarly overlapped with re-annotation and were also more conservative (Supplementary Table S2B). Specifically, of the 426 actionable variant reports with re-annotation results, CHASMplus identified 145 as more likely to be driver mutations. Among the 710 VUS reports with re-annotation results, CHASMplus identified 12 as more likely to be driver mutations. Finally, we compared the numbers of unique protein-coding variants prioritized by all three methods. We found that 196 (22%) are shared by at least two approaches, while 909 are prioritized by only one (Supplementary Fig. S6). We interpret this finding as indicating that the two types of analysis, somatic prioritization algorithms and enhanced annotation, complement each other. Therefore, current state-of-the-art tools leave most variants as VUS, and different approaches are not highly concordant.
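The three-way comparison of prioritized variants can be sketched in Python as a count of how many methods support each unique variant. This is a minimal illustration under stated assumptions: the variant identifiers and their assignment to methods are invented, and the function name is ours rather than part of VIC, CHASMplus, or our pipeline.

```python
from collections import Counter

def overlap_summary(prioritized):
    """Count unique variants supported by one method versus two or more.

    `prioritized` maps a method name to the collection of variant
    identifiers that the method prioritized.
    """
    support = Counter()
    for variants in prioritized.values():
        support.update(set(variants))  # each method counts once per variant
    shared = sum(1 for n in support.values() if n >= 2)
    single = sum(1 for n in support.values() if n == 1)
    return {"shared_by_two_or_more": shared, "single_method": single}

# Hypothetical variant identifiers for each method (not study data).
prioritized = {
    "re-annotation": {"v1", "v2", "v3", "v4"},
    "VIC": {"v2", "v3", "v5"},
    "CHASMplus": {"v3", "v6"},
}
summary = overlap_summary(prioritized)
print(summary)
```

Counting support per unique variant, rather than per report, keeps a variant seen in many patients from inflating the apparent agreement between methods.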

4. DISCUSSION

The current article contributes to the field of genomics by proposing a broader bioinformatic analytical approach to reanalysis, which maximizes the yield from tumor genetic testing results in precision oncology. This multi-tier approach involves several key components: (1) the reclassification and annotation of tumor-derived variants for their potential role in cancer, (2) reclassification of VUS in tumors using germline information, (3) exploration of pathway associations with implications for therapeutics and genetic counseling, and (4) investigation of the association of other genetic diseases with tumor VUS. The data obtained from this approach emphasize the importance of untapped information in tumor genomics testing results, which can be effectively extracted through an enhanced annotation approach. Genomic data can provide valuable insights for patient care and family planning beyond the initial purpose of the test. Therefore, we argue that an enhanced annotation approach is necessary to maximize the yield from genetic testing.

Based on this knowledge, we propose three activities to enhance standard practices in genomics data interpretation and better support precision oncology: (1) critical evaluation of practice emphasis, (2) assessment of assays, and (3) integration across practices. The first recommended activity involves carefully evaluating the emphasis placed on different aspects of practice. In the context of congenital diseases, the primary concern lies in understanding etiology and making a diagnosis (Richards et al., 2015). In contrast, in oncology, the primary focus is on therapeutics and prognosis (Li et al., 2017). Consequently, the current reporting conventions, resources leveraged, and prioritization algorithms employed in oncology differ from those in inherited disorders (Fig. 4A). We demonstrate that a broader approach to variant annotation could enhance patient care by capturing information about alleles that are relevant to clinical disciplines beyond oncology, in line with recent guidelines (Li et al., 2017; Mandelker et al., 2019; Richards et al., 2015). While the primary goal of tumor profiling is to guide treatment decisions, up to 17% of patients may harbor an inherited (likely) pathogenic variant in a cancer susceptibility gene (DeLeonardis et al., 2019; Mandelker et al., 2017; Meric-Bernstam et al., 2016; Neben et al., 2019; Schrader et al., 2016). Moreover, data indicate that patients undergoing tumor profiling often desire to be informed of incidental findings (Meric-Bernstam et al., 2016). Therefore, we believe that these two paradigms must merge (Fig. 4B). In our view, all patients can benefit from systems that uniformly and comprehensively evaluate their data, because medicine has always aimed to personalize care.
In the era of systems biology, it is increasingly appreciated that a genetic diagnosis is critical to cancer treatment (beyond the organ of origin), that treatments can be developed for congenital genetic diseases, and more. To achieve this convergence and practical implementation, institutions must establish processes that enable dynamic sharing and annotation of data. This can be accomplished through retrospective research (as demonstrated in this study), real-time discussions at tumor boards (with adequate data science support), or institution-wide implementation of robust and well-vetted systems. Thus, two approaches to software architecture can be used—to build either a centralized system that can ensure compliance and uniformity or a distributed network of vetted software applications that can be accessed on demand. In either case, computational tools and systems must be established to account for diverse types of genomic annotations and bring them together to support decision-making. By diverse types of annotations, we refer to oncology and heritable conditions as in the current work and information beyond DNA annotations, such as from functional genomics, biochemistry, biophysics, and advanced computational modeling of gene products (Chi et al., 2023; Haque et al., 2023; Jorge et al., 2023; Ratnasinghe et al., 2023), for example. Properly implemented with clear guidelines paired with narrow-AI-based automations, adopting a more robust and integrated approach will undoubtedly add value to patient care.

FIG. 4.

Intersection and integration between practice emphases are necessary for comprehensive patient care. This figure diagrams what we view as key challenges in the field that, with the help of additional data systems including enhanced annotation and more data science to support oncology clinics, can be addressed to improve care. (A) Current state: The clinical question with highest priority in congenital diseases is to arrive at a diagnosis, whereas in oncology the highest priority question is typically treatment. Therefore, genomic variants are prioritized differently between congenital genomics (blue lines) and cancer (purple lines), often using different data systems. We depict that different data sources and systems may underlie different encounters and clinical evaluations, even for the same patient or over time. While all four dimensions are desired by any patient, the emphasis differs substantially by practice standards. (B) Proposed state: At each stage in patient care, we believe that the clinical care team should be able to ask cross-cutting and cross-disciplinary questions of the data and that doing so will yield richer and more useful information about individual patients and their unique contexts. Data and data science are therefore a common support for the continuum of care, which we depict as a single data system underlying all patients and across the interwoven questions of how to interpret genetic variation for congenital and malignant questions alike. We envision family members, diagnosticians, care providers, and more, working together to maximize care, leveraging diverse data that can support decision making about care or interventions.

Clinical genomics testing exhibits variability in reporting, even when the same sample is sent to different laboratories (Harrison et al., 2017; Madhavan et al., 2018). Tumors often display heterogeneity, resulting in the presence of different variants in different regions of the tumor tissue. Additionally, bioinformatics analysis pipelines may be optimized to identify certain types of mutations more effectively than others. The technical details of genomic assay design, which vary significantly among different vendors (Supplementary Fig. S4), further contribute to the complexity. These challenges are independent of the testing laboratory or the specific approach employed to enhance tumor variant interpretation.

The field continues to recognize the value that approaches such as ours can provide to patients. During this study, other research groups have proposed guidelines for interpreting tumor-derived genomics data (Mandelker et al., 2019). We have taken additional steps to identify phenotypic trends within the growth factor pathway, off-tumor associations, and non-cancer associations of tumor VUS observed in our patient cohort. Through a series of clinical vignettes, we further demonstrate how enhanced information could have benefited patients or their families. However, the increased integration of genomics data raises important considerations related to informed consent and the scope of data processing and linking that should be routinely addressed in clinical care (Fisher and Layman, 2018) and in support of guidelines for secondary findings (Miller et al., 2021). Above all, we strongly believe that an integrated approach will provide better support to medical practice, resulting in cost savings through a reduction in manual curation time and improved patient care by enabling increased access to and utilization of accurate data for addressing the relevant clinical questions.

Footnotes

ACKNOWLEDGMENTS

The authors thank the CTSI grant National Institutes of Health CTSA award, UL1TR001436, for resources, services, and facilities. This research was completed in part with computational resources and technical support provided by the Research Computing Center at the Medical College of Wisconsin.

AUTHORS’ CONTRIBUTIONS

E.D. performed formal analysis, data curation, and wrote the article. H.V.R. reviewed the article. B.W.T. supervised staff, facilitated resource access, contributed to data curation, and reviewed the article. S.S. performed formal analyses and investigation. J.L.G. supervised staff and reviewed the article. B.G. facilitated resource access and contributed to data curation. R.S. facilitated resource access. R.U. facilitated resource access, funding acquisition, study conceptualization, and wrote the article. M.T.Z. conceived the study, contributed to formal analyses and data curation, supervised staff, and wrote the article.

DATA AVAILABILITY STATEMENT

This work uses data derived for clinical care and anonymized for research by our IRB-approved investigative team. The authors have made selected data available through the current publication with broader access to the data possible on individual bases, with requisite IRB approvals. The deidentified primary dataset used in this study is available in .

AUTHOR DISCLOSURE STATEMENT

The authors report no conflict of interest.

FUNDING INFORMATION

This publication was supported in part by The Linda T. and John A. Mellowes Endowed Innovation and Discovery Fund and the Genomic Sciences and Precision Medicine Center of Medical College of Wisconsin.

SUPPLEMENTARY MATERIAL

References

1. Chi, Jorge, Jensen, et al. A multi-layered computational structural genomics approach enhances domain-specific interpretation of Kleefstra syndrome variants in EHMT1. Comput Struct Biotechnol J, 2023; 21:5249–5258.

2. Croft, O’Kelly, Wu, et al. Reactome: A database of reactions, pathways and biological processes. Nucleic Acids Res, 2011; 39(Database issue):D691–D697.

3. Deignan, Chung, Kearney, et al. ACMG Laboratory Quality Assurance Committee. Points to consider in the reevaluation and reanalysis of genomic test results: A statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med, 2019; 21(6):1267–1270.

4. DeLeonardis, Hogan, Cannistra, et al. When should tumor genomic profiling prompt consideration of germline testing? J Oncol Pract, 2019; 15(9):465–473.

5. Esteva, Robicquet, Ramsundar, et al. A guide to deep learning in healthcare. Nat Med, 2019; 25(1):24–29.

6. Fischer, Grossmann, Padi, et al. Integration of TP53, DREAM, MMB-FOXM1 and RB-E2F target gene analyses identifies cell cycle gene regulatory networks. Nucleic Acids Res, 2016; 44(13):6070–6086.

7. Fisher, Layman. Genomics, big data, and broad consent: A new ethics frontier for prevention science. Prev Sci, 2018; 19(7):871–879.

8. Forbes, Bindal, Bamford, et al. COSMIC: Mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res, 2011; 39(Database issue):D945–D950.

9. Haque, Kawai, Ratnasinghe, et al. RAG genomic variation causes autoimmune diseases through specific structure-based mechanisms of enzyme dysregulation. iScience, 2023; 26(10):108040.

10. Harrison, Dolinsky, Knight Johnson, et al. Clinical laboratories collaborate to resolve differences in variant interpretations submitted to ClinVar. Genet Med, 2017; 19(10):1096–1104.

11. He, Li, Yan, et al. Variant Interpretation for Cancer (VIC): A computational tool for assessing clinical impacts of somatic variants. Genome Med, 2019; 11(1):53.

12. Jorge, Chi, Mazaba, et al. Deep computational phenotyping of genomic variants impacting the SET domain of KMT2C reveal molecular mechanisms for their dysfunction. Front Genet, 2023; 14:1291307.

13. Kanehisa, Goto, Sato, et al. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res, 2012; 40(Database issue):D109–D114.

14. Kaurah, Huntsman. 1993. Hereditary diffuse gastric cancer. In: GeneReviews(R). (Adam, Ardinger, Pagon, et al. eds.) University of Washington, Seattle: Seattle, WA.

15. Knijnenburg, Wang, Zimmermann, et al. Cancer Genome Atlas Research Network. Genomic and molecular landscape of DNA damage repair deficiency across The Cancer Genome Atlas. Cell Rep, 2018; 23(1):239–254.e6.

16. Kocher, Quest, Duffy, et al. The Biological Reference Repository (BioR): A rapid and flexible system for genomics annotation. Bioinformatics, 2014; 30(13):1920–1922.

17. Kohlmann, Gruber. 1993. Lynch syndrome. In: GeneReviews(R). (Adam, Ardinger, Pagon, et al. eds.) University of Washington, Seattle: Seattle, WA.

18. Landrum, Lee, Riley, et al. ClinVar: Public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res, 2014; 42(Database issue):D980–D985.

19. Li, Datto, Duncavage, et al. Standards and guidelines for the interpretation and reporting of sequence variants in cancer: A joint consensus recommendation of the Association for Molecular Pathology, American Society of Clinical Oncology, and College of American Pathologists. J Mol Diagn, 2017; 19(1):4–23.

20. Madhavan, Ritter, Micheel, et al. ClinGen Cancer Somatic Working Group - standardizing and democratizing access to cancer molecular diagnostic data to drive translational research. Pac Symp Biocomput, 2018; 23:247–258.

21. Malone, Oliva, Sabatini PJB, et al. Molecular profiling for precision cancer therapies. Genome Med, 2020; 12(1):8.

22. Mandelker, Donoghue, Talukdar, et al. Germline-focussed analysis of tumour-only sequencing: Recommendations from the ESMO Precision Medicine Working Group. Ann Oncol, 2019; 30(8):1221–1231.

23. Mandelker, Zhang, Kemel, et al. Mutation detection in patients with advanced cancer by universal sequencing of cancer-related genes in tumor and normal DNA vs guideline-based germline testing. JAMA, 2017; 318(9):825–835.

24. McKusick-Nathans Institute of Genetic Medicine (Johns Hopkins University, Baltimore, MD). 2019. Online Mendelian Inheritance in Man. OMIM.

25. Meric-Bernstam, Brusco, Daniels, et al. Incidental germline variants in 1000 advanced cancers on a prospective somatic genomic profiling protocol. Ann Oncol, 2016; 27(5):795–800.

26. Miller, Lee, Chung, et al. ACMG SF v3.0 list for reporting of secondary findings in clinical exome and genome sequencing: A policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med, 2021.

27. Neben, Zimmer, Stedden, et al. Multi-gene panel testing of 23,179 individuals for hereditary cancer risk identifies pathogenic variant carriers missed by current genetic testing guidelines. J Mol Diagn, 2019; 21(4):646–657.

28. Nishimura. Biocarta. Biotech Software Internet Rep, 2001; 2(3):117–120.

29. Norgeot, Glicksberg, Butte. A call for deep-learning healthcare. Nat Med, 2019; 25(1):14–15.

30. Offin, Liu, Drilon. Tumor-agnostic drug development. Am Soc Clin Oncol Educ Book, 2018; 38:184–187.

31. Park JJH, Hsu, Siden, et al. An overview of precision oncology basket and umbrella trials for clinicians. CA Cancer J Clin, 2020; 70(2):125–137.

32. Ratnasinghe, Haque, Wagenknecht, et al. Beyond structural bioinformatics for genomics with dynamics characterization of an expanded KRAS mutational landscape. bioRxiv, 2023.

33. Richards, Aziz, Bale, et al. ACMG Laboratory Quality Assurance Committee. Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med, 2015; 17(5):405–424.

34. Schrader, Cheng, Joseph, et al. Germline variants in targeted tumor sequencing using matched normal DNA. JAMA Oncol, 2016; 2(1):104–111.

35. Stenson, Ball, Mort, et al. The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr Protoc Bioinformatics, 2012; Chapter 1:1.13.1–1.13.20.

36. Subramanian, Tamayo, Mootha, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA, 2005; 102(43):15545–15550.

37. Tokheim, Karchin. CHASMplus reveals the scope of somatic missense mutations driving human cancers. Cell Syst, 2019; 9(1):9–23.e8.

38. Whitfield, Sherlock, Saldanha, et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell, 2002; 13(6):1977–2000.
