Abstract
Pathogenicity islands (PAIs) are a subset of genomic islands (GIs) that are acquired by horizontal gene transfer (HGT) and generally show a significant deviation from the core genome in G + C content, dinucleotide frequency or codon usage. The major approaches used for PAI identification are based on composition bias and/or similarity to known PAIs, and therefore limit the search either to GIs or to regions similar to previously annotated PAIs. PredictBias is a web application that identifies genomic and pathogenicity islands in prokaryotes on the basis of composition bias, presence of insertion elements, proximity to virulence-associated genes and absence from related non-pathogenic species. A virulence factor profile database (VFPD) has been developed from 213 virulence-associated protein families retrieved from the Pfam and PRINTS databases. PredictBias performs an RPSBLAST search of regions with significant composition bias against VFPD. If a region encodes at least one virulence-related protein, it is marked as a potential PAI (biased composition); otherwise, it is marked as a GI. Virulence-associated regions whose composition bias is not apparent because of ancient HGT are identified by scanning genome segments of 8 ORFs: segments with more than four significant hits to VFPD are marked as potential PAIs (unbiased composition). The 'compare genome feature' of PredictBias can be used to investigate the relative absence of potential PAIs in related non-pathogenic species, which further helps validate the results and define PAI boundaries. Performance measure analysis showed that the output of PredictBias agrees with known findings. PredictBias is available at www.davvbiotech.res.in/PredictBias.
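The two labelling rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PredictBias implementation. It assumes the RPSBLAST search against VFPD has already been run and reduced to one boolean per ORF (significant hit or not), and that composition-biased regions arrive as half-open ORF index ranges. The function names, the step size of one ORF for the 8-ORF scan, the skipping of windows that overlap biased regions, and the merging of overlapping qualifying windows are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int]  # half-open ORF index range [start, end)

WINDOW_ORFS = 8     # segment size stated in the abstract
HIT_THRESHOLD = 4   # a segment needs MORE than four significant VFPD hits


def classify_biased_region(region: Region, vf_hit: Sequence[bool]) -> str:
    """Label a composition-biased region: potential PAI if any ORF hits VFPD, else GI."""
    start, end = region
    if any(vf_hit[start:end]):
        return "potential PAI (biased composition)"
    return "GI"


def scan_unbiased_windows(vf_hit: Sequence[bool], biased: List[Region]) -> List[Region]:
    """Slide an 8-ORF window (step 1, an assumption) over ORFs outside biased regions.

    Windows with more than HIT_THRESHOLD hits are reported; overlapping or
    adjacent qualifying windows are merged into one candidate region (also an
    assumption made for this sketch).
    """
    def overlaps_biased(s: int, e: int) -> bool:
        return any(s < be and bs < e for bs, be in biased)

    found: List[Region] = []
    for s in range(0, len(vf_hit) - WINDOW_ORFS + 1):
        e = s + WINDOW_ORFS
        if overlaps_biased(s, e):
            continue  # already handled by the composition-bias rule
        if sum(vf_hit[s:e]) > HIT_THRESHOLD:
            if found and s <= found[-1][1]:
                found[-1] = (found[-1][0], e)  # extend the previous candidate
            else:
                found.append((s, e))
    return found


if __name__ == "__main__":
    # Toy genome of 30 ORFs; True marks an ORF with a significant VFPD hit.
    hits = [False] * 30
    hits[2] = True               # lies inside biased region (0, 5)
    for i in range(12, 17):      # five hits in a compositionally unbiased stretch
        hits[i] = True
    biased = [(0, 5), (20, 25)]
    for r in biased:
        print(r, classify_biased_region(r, hits))
    # (0, 5) potential PAI (biased composition)
    # (20, 25) GI
    print(scan_unbiased_windows(hits, biased))
    # [(9, 20)]
```

In the toy run, windows starting at ORFs 9 to 12 each contain all five hits and are merged into one candidate; a real pipeline would likely refine such boundaries, for example with the genome comparison step mentioned above.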
