Abstract
Anomalous GC-content (i.e., significantly different from the genome average) is often used as one of the markers of horizontal gene transfer (HGT) events. Unfortunately, results obtained by traditional fixed-length window analysis depend strongly on the arbitrary choice of DNA window length. Here we present a new method for genome-wide statistical analysis of GC-content that is free of this drawback. The method is based on a set of nonparametric statistical tests and provides reliable estimates of both local and global GC-content; it can therefore identify small local areas (as short as 30 bp) with anomalous GC-content in a bacterial genome. Applied to the well-studied genome of Escherichia coli K-12, the tests show that approximately 21% of the genome belongs to anomalous GC-content areas. Among the top 23 anomalous GC-content areas, seven correspond to annotated prophages, four to Rhs elements, and two to IS elements. The remaining 10 areas contain putative horizontally transferred DNA and genes of still unknown function.
Software is available at http://mml.spbstu.ru/gcstat.
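
The window-length dependence of the traditional fixed-length window analysis mentioned above can be illustrated with a minimal sketch. This is not the method presented here and not the gcstat software: the function names, the toy sequence, and the deviation threshold are assumptions made purely for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for empty input."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window):
    """GC-content of consecutive non-overlapping windows of fixed length.

    A trailing fragment shorter than `window` is dropped.
    """
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def anomalous_windows(seq, window, threshold):
    """Indices of windows whose GC-content deviates from the sequence-wide
    average by more than `threshold` (an arbitrary, illustrative cutoff)."""
    mean = gc_content(seq)
    return [k for k, gc in enumerate(windowed_gc(seq, window))
            if abs(gc - mean) > threshold]


# Hypothetical toy sequence: a 10 bp GC-rich island in an AT-rich background.
toy = "AT" * 10 + "GC" * 5 + "AT" * 10  # 50 bp in total

# A 10 bp window aligned with the island flags it ...
print(anomalous_windows(toy, 10, 0.5))
# ... while a 25 bp window splits the island and dilutes it to the average.
print(anomalous_windows(toy, 25, 0.5))
```

On this toy input the same GC-rich island is detected with one window length and missed with another, which is the kind of arbitrariness a window-free statistical approach is meant to avoid.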