Abstract
Genomic variations are a focus of research aimed at uncovering the mechanisms of host–pathogen interactions and of diseases such as cancer. Next-generation sequencing (NGS) data are now routinely analyzed through dedicated pipelines to detect these variations. Surrogate NGS data in conjunction with genomic variations help to evaluate such pipelines and validate their outcomes, supporting the selection of appropriate tools for a given scientific question. I describe how existing approaches for simulating NGS data in conjunction with genomic variations fail to model local enrichments of single nucleotide polymorphisms (SNPs), so-called SNP clusters. Two distributions for count data are applied to publicly available collections of genomic variations. The results suggest modeling SNP cluster sizes with overdispersion-aware distributions.
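To illustrate what "overdispersion-aware" means for count data, the following is a minimal sketch and not the method used in the article. The abstract does not name the two distributions, so the Poisson and negative binomial used here are assumptions, as are the function names and the made-up cluster sizes. The sketch computes the variance-to-mean ratio of the counts and compares the log-likelihood of a Poisson fit with that of a negative binomial fitted by the method of moments.

```python
import math
from statistics import mean, variance


def poisson_loglik(counts, lam):
    """Log-likelihood of the counts under a Poisson with rate lam."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts, r, p):
    """Log-likelihood under a negative binomial with
    pmf(k) = Gamma(k + r) / (k! * Gamma(r)) * p**r * (1 - p)**k."""
    return sum(
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log(1 - p)
        for k in counts
    )


def compare_fits(counts):
    """Summarize dispersion and compare both fits (hypothetical helper)."""
    m = mean(counts)
    v = variance(counts)  # sample variance
    result = {
        "mean": m,
        "variance": v,
        "dispersion": v / m,  # close to 1 for Poisson-like data
        "poisson_loglik": poisson_loglik(counts, m),
    }
    if v > m:
        # Method of moments: mean = r(1-p)/p and var = r(1-p)/p^2,
        # hence p = mean/var and r = mean^2 / (var - mean).
        p = m / v
        r = m * m / (v - m)
        result["negbin_loglik"] = negbin_loglik(counts, r, p)
    return result


# Made-up SNP cluster sizes, for illustration only (not data from the article).
cluster_sizes = [1, 1, 1, 2, 1, 7, 1, 3, 1, 12, 1, 1, 2, 9, 1]
summary = compare_fits(cluster_sizes)
print(summary)
```

A dispersion value well above 1 indicates that the variance exceeds what a Poisson model allows. In that case a distribution with a separate dispersion parameter, such as the negative binomial assumed here, typically reaches a higher log-likelihood.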
