Abstract
INFO (INterruption Finder and Organizer) has been used to locate the intron-exon splice junctions of coding sequences in human and other DNA. It compares the six conceptual translations of the input DNA sequence with sequences in protein databanks, using a similarity matrix and a windowing algorithm. The similarities detected both delineate the position of the gene and provide clues to the function of the gene product. In addition to a standard similarity matrix and windowing algorithm, INFO uses two novel steps, the MiniLibrary and Reverse Sequence steps, to enhance identification of small exons and to improve the precision with which junction nucleotides are delineated. Exons as small as about 30 bases can be reliably found, and >90% of junctions are precisely identified when canonical splice junction information is used. With the MiniLibrary and Reverse Sequence steps, the user need not optimize INFO parameters. In comparative test runs using 19 human DNA sequences, INFO found 108 of 111 exons with 0 reported false positives, compared with 111 exons and 51 false positives for BLASTX, 99 exons and 6 false positives for GRAIL II, 77 exons and 24 false positives for GeneMark, 61 exons and 9 false positives for GeneID, and 105 exons and 6 false positives for PROCRUSTES. The correlation coefficient for finding and positioning these 111 exons was greater than 98% for INFO. Comparable results were obtained in test runs of 13 nonhuman DNA sequences. INFO is applicable to DNA from any species, will become more robust as sequence databanks expand, and complements other heuristic approaches.
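The core comparison described above (six conceptual translations scored against a protein sequence over a sliding window) can be illustrated with a minimal Python sketch. This is not INFO's implementation: the function names are hypothetical, the standard genetic code is assumed, and a simple identity score (1 for a match, 0 otherwise) stands in for the similarity matrix, which the abstract does not specify. The MiniLibrary and Reverse Sequence steps are not modeled.

```python
# Illustrative sketch only; names and scoring are assumptions, not INFO's code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Map frame labels +1..+3 and -1..-3 to conceptual translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def best_window(query: str, subject: str, window: int):
    """Return (score, query_start, subject_start) of the best ungapped window.

    Identity scoring is a stand-in for a real similarity matrix.
    """
    best = (-1, 0, 0)
    for q in range(len(query) - window + 1):
        for s in range(len(subject) - window + 1):
            score = sum(
                1 for k in range(window) if query[q + k] == subject[s + k]
            )
            if score > best[0]:
                best = (score, q, s)
    return best


if __name__ == "__main__":
    frames = six_frames("ATGGCCAAGTGA")
    for label, protein in sorted(frames.items()):
        print(label, protein, best_window(protein, "XXMAKXX", 3))
```

In a tool of this kind, high-scoring windows would mark candidate exon regions in a particular frame, and their boundaries would then be refined toward splice junctions; that refinement is outside the scope of this sketch.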
