Abstract
To identify somatic focal copy number aberrations (CNAs) in cancer specimens and to distinguish them from germ-line copy number variations (CNVs), we developed the software package FocalCall. FocalCall enables user-defined size cutoffs to recognize focal aberrations and builds on established array comparative genomic hybridization segmentation and calling algorithms. To distinguish CNAs from CNVs, the algorithm uses matched patient normal signals as references or, if these are not available, a list of known CNVs in a population. Furthermore, FocalCall differentiates between homozygous and heterozygous deletions as well as between gains and amplifications and is applicable to high-resolution array and sequencing data.
AVAILABILITY AND IMPLEMENTATION:
FocalCall is available as an R-package from: https://github.com/OscarKrijgsman/focalCall. The R-package will be available in Bioconductor.org as of release 3.0.
Introduction
The increase in the resolving power of DNA copy number profiling techniques has led to the simultaneous discovery of the extent of (1) copy number variations (CNVs) of germ-line origin in the general population 1 as well as (2) focal copy number aberrations (CNAs) of somatic origin in cancer specimens. 2 The limited size of focal CNAs offers an excellent opportunity to pinpoint potential driver genes in cancer. 3–6 CNVs, however, are usually an obstacle in the identification of cancer driver genes. Unfortunately, copy number assessment in tumors detects a mix of focal CNAs and CNVs, most of which have the same appearance (Fig. 1). A procedure that partly circumvents the interference of CNVs in tumor samples is the simultaneous analysis of matched patient normal DNA. However, if the diploid balance in a tumor is disturbed, ie, by a single copy gain, a heterozygous CNV will still give rise to a superimposed focal signal. To recognize the CNAs, a negative selection procedure can be applied by identifying CNVs detected in the healthy population through the analysis of a series of healthy normal copy number profiles, preferably patient group matched, or otherwise through an external database of genomic variants (ie, DGV). 7 Alternatively, an effective positive selection is the identification of focal homozygous deletions and high-level amplifications, which differ in amplitude from CNVs. 5 This approach, however, neglects many heterozygous focal CNAs.
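The superimposed focal signal described above can be illustrated with a small worked calculation. This is an idealized sketch, not part of FocalCall: it assumes a pure tumor sample, noise-free measurements, a germ-line heterozygous deletion as the CNV, and a single copy gain of the surrounding region. The function name `log2_ratio` and the copy numbers are illustrative assumptions.

```python
import math


def log2_ratio(tumor_copies: int, normal_copies: int) -> float:
    """Tumor versus matched normal copy number ratio on a log2 scale."""
    return math.log2(tumor_copies / normal_copies)


# Flanking region: the tumor gained one copy (3 copies), the normal is diploid.
flank = log2_ratio(3, 2)

# CNV locus: a germ-line heterozygous deletion leaves 1 copy in the normal.
# Case A (assumed): the gained chromosome carries the intact allele,
# so the tumor has 2 copies at the CNV locus.
case_a = log2_ratio(2, 1)

# Case B (assumed): the gained chromosome carries the deleted allele,
# so the tumor still has only 1 copy at the CNV locus.
case_b = log2_ratio(1, 1)

print(f"flank={flank:.3f} case_a={case_a:.3f} case_b={case_b:.3f}")
```

In both cases the ratio at the CNV locus differs from the ratio of the flanking gained region, so the germ-line CNV shows up as a focal step even though the matched normal was used as the reference.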

Figure 1. Copy number profiles of a lung cancer sequencing sample and matched patient normal signal. 12
Despite the great opportunities focal CNAs offer for cancer gene discovery, only a few software tools are available that take them into account, eg, GISTIC, WIFA, and Control-FREEC. 8–10 Both GISTIC and WIFA were developed for array data and can detect focal CNAs in series of samples, but not in individual tumor profiles. GISTIC has a dedicated option to discriminate focal CNAs from CNVs based on an external database. Control-FREEC was developed to calculate genome-wide copy number information from whole genome sequencing data and can distinguish CNAs from CNVs, provided a matched patient normal signal is available.
Here, we present FocalCall, which builds on commonly used segmentation and calling algorithms. 11 A user-defined size cutoff allows for the identification of focal CNAs in individual samples as well as in series of samples and for their distinction from CNVs. FocalCall accepts copy number data from high-resolution genome-wide array comparative genomic hybridization (aCGH) and single nucleotide polymorphism (SNP) arrays as well as data from sequencing experiments, 12 with or without a matched patient normal signal.
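The idea of a user-defined size cutoff can be sketched as a simple filter over called segments. This is a minimal illustration and not the FocalCall implementation: the `Segment` class, the `focal_segments` function, the call encoding (0 for normal, negative for losses, positive for gains), the coordinates, and the 3 Mb cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int  # first base pair of the segment (inclusive)
    end: int    # last base pair of the segment (inclusive)
    call: int   # assumed encoding: 0 = normal, < 0 = loss, > 0 = gain

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def focal_segments(segments, max_size_bp=3_000_000):
    """Return aberrant segments that are no larger than the size cutoff."""
    return [s for s in segments if s.call != 0 and s.size <= max_size_bp]


# Hypothetical segments of one tumor profile.
segments = [
    Segment("chr8", 1, 40_000_000, 1),            # broad gain, not focal
    Segment("chr8", 38_200_000, 38_400_000, 2),   # small high-level gain
    Segment("chr9", 21_900_000, 22_000_000, -2),  # small deep loss
    Segment("chr9", 30_000_000, 30_500_000, 0),   # small but copy-neutral
]

focal = focal_segments(segments)
for s in focal:
    print(s.chrom, s.start, s.end, s.call)
```

Lowering `max_size_bp` makes the selection stricter, so the cutoff can be matched to the resolution of the platform that produced the data.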
Methods
Patient Materials and Settings.
FocalCall was evaluated with four publicly available datasets: (1) shallow whole genome sequencing data (∼0.2× genome coverage) from tumor and normal DNA of a lung cancer patient 12 ; (2) SNP array (250K) data of 371 lung cancer patients without matched patient normal samples 2 ; (3) aCGH data (244K) of 74 glioblastoma multiforme (GBM) patients hybridized against their matched normal samples 13 ; and (4) aCGH data (105K) of 60 high-grade cervical cancer precursor lesions hybridized against a pool of 100 healthy individuals. 4 Dataset 4 is available from the Gene Expression Omnibus (GSE34575) and is used as an example dataset in the R-package.
Detection of Recurrent Aberrations.
Standard data output as produced by CGHcall 11 was used as input for the main function

Frequency plots of the GBM dataset of all aberrations (top) and focal aberrations and CNVs (bottom) as generated by FocalCall functions
Distinction between Focal CNAs and CNVs.
For each SRO (Supplementary Fig. 1), the percentage of overlap of focal CNAs with a normal reference or known CNVs is returned. If matched patient reference data are available, this can be provided in
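The percentage of overlap between a focal region and a set of known CNVs can be computed as sketched here. This is an assumption-laden illustration rather than the package's own code: the function name `overlap_percentage` is hypothetical, all intervals are assumed to lie on the same chromosome with inclusive base pair coordinates, and overlapping CNV entries are counted only once.

```python
def overlap_percentage(region, cnvs):
    """Percentage of `region` covered by at least one interval in `cnvs`.

    `region` is a (start, end) tuple and `cnvs` a list of such tuples,
    all on the same chromosome, with inclusive coordinates.
    """
    start, end = region
    # Clip each CNV to the region and drop CNVs that do not touch it.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in cnvs if s <= end and e >= start
    )
    covered = 0
    cursor = start - 1  # last base that has already been counted
    for s, e in clipped:
        s = max(s, cursor + 1)  # skip bases covered by an earlier CNV
        if e >= s:
            covered += e - s + 1
            cursor = e
    return 100.0 * covered / (end - start + 1)


# Hypothetical focal region of 100 bp and three known CNVs.
pct = overlap_percentage((100, 199), [(150, 250), (90, 109), (105, 119)])
print(pct)
```

A region could then be flagged as a likely CNV when this percentage exceeds a chosen threshold; the threshold itself is a design choice that is not specified here.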
Reporting of Focal CNA.
The function
Computational Time.
Computational times for the detection of focal CNAs in the GBM dataset (
Results
Detection of Focal CNAs in Single Patient and Series of Tumors.
The lung cancer sequencing data yielded a total of 38 focal gains and losses: 7 were identified as CNVs and 31 as focal CNAs, of which 6 were high-level amplifications (including FGFR1) and 4 were homozygous deletions (including CDKN2A, Fig. 1 and Supplementary Table).
The lung cancer SNP array dataset yielded a total of 503 focal CNAs with a frequency >5%. A total of 43 of the focal gains and losses overlapped with the CNV regions as archived in the DGV database. 7 All genes in focal CNAs detected by GISTIC in the original paper were also detected by FocalCall. 2 The remaining 460 detected focal CNAs were enriched for known cancer driver genes (
The GBM aCGH dataset yielded a total of 434 somatic focal CNAs and 90 CNVs. The focal CNAs encompassed known cancer driver genes like EGFR, PTEN, and CDKN2A. All 20 focal CNAs previously reported by GISTIC 13 were recognized by FocalCall. Additionally detected focal CNAs showed a highly significant enrichment for known cancer driver genes (
The cervical precursor lesion aCGH dataset yielded a total of 94 focal CNAs with FocalCall. Two of the identified genes, hsa-mir-375 and EYA2, were functionally tested and validated as a new oncogene and tumor suppressor gene. 4 The data and example scripts for this dataset are available in the R-package.
Conclusion
Focal CNAs provide an excellent opportunity to detect potential cancer driver genes. 6 Through advances in techniques, the resolution of DNA copy number detection has increased enormously and the changes we can identify have become smaller. Accurate detection of somatic aberrations and their distinction from germ-line CNVs are therefore essential. FocalCall offers researchers a user-friendly tool to detect focal CNAs in high-resolution DNA copy number data and provides multiple methods to distinguish these from CNVs. FocalCall builds on the widely used DNA copy number tool CGHcall 11 and on comprehensive genome analysis packages in the R/Bioconductor environment. In addition, FocalCall output in the IGV data format allows for easy browsing through the data and provides a direct link with the genes affected.
In conclusion, we provide an alternative and sensitive procedure for the detection of focal CNAs applicable to both individual and series of samples analyzed by either array or next-generation sequencing.
Author Contributions
Conceived and designed the experiments: OK, BY. Analyzed the data: OK, CB. Wrote the first draft of the manuscript: OK, BY. Contributed to the writing of the manuscript: OK, BY, MvdW, GAM. Agree with manuscript results and conclusions: OK, CD, GAM, MvdW, BY. Jointly developed the structure and arguments for the paper: OK, BY. Made critical revisions and approved final version: OK, CB, MvdW, GAM, BY. All authors reviewed and approved of the final manuscript.
Acknowledgments
We would like to thank Vanessa St. Aubyn for critically reading our manuscript and for useful comments.
