Abstract
RecA is a highly conserved bacterial protein that plays crucial roles in many cellular processes and is therefore a potential target for the chemotherapy of bacterial infections. Understanding the functional similarity between RecA proteins from different bacterial species should yield further insight into RecA biochemistry and could suggest new approaches to improving RecA-targeted drugs. In this technical note, the authors present an in silico method based on tri-oligonucleotide usage correlations (TOUC) to predict the functional similarity between two RecA orthologs. The TOUC values analyzed in this study agree well with the available experimental results. This method should prove useful in guiding future experiments aimed at furthering our understanding of RecA biochemistry and at developing new drugs that modulate the biological activities of RecA in bacteria.
Introduction
Short nucleic acid polymers (oligonucleotides), typically eight or fewer bases long, are a source of information that can be used to characterize a DNA sequence, for example with respect to restriction modification, structural constraints, or relatedness to other DNA sequences.4 Tri-oligonucleotide usage correlations (TOUC) reflect similarities between two DNA sequences in codon usage, gene expression class, phylogeny, and DNA structure.5,6 For RecA-encoding genes, tri-oligonucleotide usage patterns provide a much more accurate estimate of sequence relatedness than other measures, such as guanine plus cytosine (G+C) content, for three reasons. First, because RecA orthologs are evolutionarily, structurally, and functionally conserved, inhomogeneous oligonucleotide usage is avoided.7 Second, horizontal transfer of RecA-encoding genes is almost nonexistent, as predicted by the complexity hypothesis8; hence, the local fluctuations in base composition that horizontal gene transfer can generate are also avoided. Third, the reliability of oligonucleotide usage patterns depends on sequence length, and this analysis is not suited to sequences shorter than 1 kb.9 RecA-encoding genes are usually around 1 kb long, which is sufficient for meaningful oligonucleotide analysis.
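To make the contrast with G+C content concrete, the following minimal Python sketch compares two hypothetical toy sequences (not RecA genes) that have identical G+C content but different trinucleotide profiles. The helper names `gc_percent` and `trinucleotide_profile` are illustrative only, and counting overlapping trinucleotides on a single strand is a simplifying assumption, not necessarily how any published tool counts them.

```python
from collections import Counter
from itertools import product

# All 64 possible trinucleotides in a fixed order (AAA, AAC, ..., TTT)
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def gc_percent(seq: str) -> float:
    """G+C content of a sequence, in percent."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(b) for b in "GC") / len(seq)


def trinucleotide_profile(seq: str) -> list[int]:
    """Counts of all 64 overlapping trinucleotides (one strand only, an assumption)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return [counts[t] for t in TRINUCLEOTIDES]


# Hypothetical toy sequences with identical base composition (1,200 bp each)
seq_a = "ATGC" * 300
seq_b = "AGTC" * 300

print(gc_percent(seq_a), gc_percent(seq_b))
print(trinucleotide_profile(seq_a) == trinucleotide_profile(seq_b))
```

Both toy sequences have 50% G+C, yet their trinucleotide count vectors differ, so a composition-only measure cannot tell them apart while a usage profile can.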
In this study, we introduce an in silico method based on TOUC to predict the functional similarity between two RecA orthologs. To assess the predictive accuracy of the method, we compared our in silico data with previously published experimental data.
Methods
A multi-FASTA file of RecA-encoding genes was prepared from sequences retrieved from the National Center for Biotechnology Information (NCBI) Web site (http://www.ncbi.nlm.nih.gov). Oligonucleotide usage pattern correlations were computed using the TETRA stand-alone program, available at http://www.megx.net/tetra.9 Its "Batch Mode" was used to read the FASTA file and to compute the nucleotide usage patterns of all sequences in a fully automated manner. TETRA reports correlation coefficients that equal 1 for two identical nucleotide sequences and are less than 1, and possibly negative, for two different sequences.
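The following Python sketch illustrates one way such correlations can be computed from a multi-FASTA file. It is not the TETRA code and is not expected to reproduce the values in Table 1. We assume a z-score approach based on a maximal-order Markov model, similar in spirit to TETRA's tetranucleotide analysis but adapted here to trinucleotides: counts are taken on both strands, each observed trinucleotide count is compared with the count expected from its constituent dinucleotides, and the 64 z-scores of two sequences are compared with a Pearson correlation. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
TRIMERS = ["".join(p) for p in product(BASES, repeat=3)]


def read_fasta(text: str) -> dict[str, str]:
    """Parse multi-FASTA text into {header: upper-case sequence}."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks).upper()
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping k-mer counts on both strands (assumption); k-mers with non-ACGT letters are skipped."""
    counts = Counter()
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for i in range(len(strand) - k + 1):
            kmer = strand[i:i + k]
            if all(b in BASES for b in kmer):
                counts[kmer] += 1
    return counts


def trimer_zscores(seq: str) -> list[float]:
    """z-score of each trinucleotide n1n2n3 against the expectation N(n1n2)N(n2n3)/N(n2)."""
    n1, n2, n3 = (kmer_counts(seq, k) for k in (1, 2, 3))
    z = []
    for t in TRIMERS:
        mid, left, right = n1[t[1]], n2[t[:2]], n2[t[1:]]
        if mid == 0:
            z.append(0.0)
            continue
        expected = left * right / mid
        variance = expected * (mid - left) * (mid - right) / mid ** 2
        z.append((n3[t] - expected) / math.sqrt(variance) if variance > 0 else 0.0)
    return z


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; NaN if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def touc_matrix(records: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise trinucleotide z-score correlations for every pair of records."""
    z = {name: trimer_zscores(seq) for name, seq in records.items()}
    names = list(z)
    return {(a, b): pearson(z[a], z[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

For a real data set, the same functions would be applied to the text of the downloaded multi-FASTA file, for example `touc_matrix(read_fasta(fasta_text))`. Identical sequences give a correlation of 1, matching the behavior described for TETRA, and sequences shorter than about 1 kb should be interpreted with caution.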
Results and Discussion
Our previous plasmid complementation assays showed that Deinococcus radiodurans RecA protein complemented the RecA-deficient phenotype of Escherichia coli, whereas Deinococcus geothermalis RecA did not.10,11 It has also been shown that E. coli RecA protein partially complemented D. radiodurans RecA deficiency.12 These results suggest that RecA-targeted drugs developed against E. coli RecA protein may be effective against D. radiodurans but not against D. geothermalis. The amino acid sequence similarity between D. radiodurans RecA and E. coli RecA is 87.0%, and that between D. geothermalis RecA and E. coli RecA is 86.4%. Therefore, the differences observed in the complementation experiments cannot be explained by differences in the amino acid sequence similarity of the RecA proteins. Table 1 shows the correlations among representative RecA-encoding genes based on tri-oligonucleotide usage frequencies. The TOUC value for D. radiodurans RecA versus E. coli RecA (0.553) was higher than that for D. geothermalis RecA versus E. coli RecA (0.322). The TOUC values are thus consistent with the results of the complementation experiments.
Myxococcus xanthus has two RecA paralogs, RecA1 and RecA2. It has been shown that M. xanthus RecA2 fully complemented the UV sensitivity of an E. coli RecA-deficient strain, whereas M. xanthus RecA1 only partially complemented this strain, even though RecA1 and RecA2 do not differ significantly in their similarity to E. coli RecA.13 In this study, the TOUC values for M. xanthus RecA1 versus E. coli RecA and M. xanthus RecA2 versus E. coli RecA were 0.172 and 0.405, respectively. These TOUC values agree well with the experimental result.
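The logic of the two within-experiment comparisons above can be written down as a small check: for each pair of donors tested in the same E. coli complementation system, the donor whose RecA complemented better should have the higher TOUC value against E. coli RecA. The sketch below encodes only the values reported in the text; the class and function names are hypothetical, and passing this check merely restates the two comparisons rather than validating TOUC statistically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedComparison:
    """Two RecA donors tested for complementation in the same E. coli RecA-deficient host."""
    better_donor: str    # donor whose RecA complemented more fully
    worse_donor: str     # donor whose RecA complemented less fully (or not at all)
    touc_better: float   # TOUC of the better donor's RecA gene vs E. coli RecA
    touc_worse: float    # TOUC of the worse donor's RecA gene vs E. coli RecA


def touc_predicts_order(c: PairedComparison) -> bool:
    """True if the higher TOUC value belongs to the better-complementing donor."""
    return c.touc_better > c.touc_worse


# TOUC values against E. coli RecA, as reported in the text
COMPARISONS = [
    PairedComparison("D. radiodurans", "D. geothermalis", 0.553, 0.322),
    PairedComparison("M. xanthus RecA2", "M. xanthus RecA1", 0.405, 0.172),
]

for c in COMPARISONS:
    delta = c.touc_better - c.touc_worse
    print(f"{c.better_donor} vs {c.worse_donor}: TOUC difference = {delta:.3f}, "
          f"consistent = {touc_predicts_order(c)}")
```

A paired design is used because the text compares donors only within the same experimental system; no absolute TOUC cutoff for complementation is implied.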
Table 1. Correlations among RecA Proteins Based on Tri-Oligonucleotide Usage Frequencies
Phylogenetic relatedness between a donor and a RecA-deficient host neither guarantees nor precludes complementation by the donor's RecA protein. For example, although Neisseria gonorrhoeae (Betaproteobacteria) is phylogenetically related to E. coli (Gammaproteobacteria), N. gonorrhoeae RecA protein has been shown experimentally to only partially complement E. coli RecA deficiency.14 In this study, the TOUC value for N. gonorrhoeae RecA versus E. coli RecA was relatively low (0.25), supporting this experimental result. Conversely, RecA proteins from phylogenetically diverse species, such as Symbiobacterium thermophilum (Firmicutes), Magnetospirillum magneticum (Alphaproteobacteria), Anabaena sp. (Cyanobacteria), Thermotoga maritima (Thermotogae), and D. radiodurans (Thermus-Deinococcus), exhibited relatively high TOUC values with E. coli RecA.
RecA proteins from S. thermophilum (Firmicutes), Kineococcus radiotolerans (Actinobacteria), Geobacter metallireducens (Deltaproteobacteria), and M. magneticum (Alphaproteobacteria) exhibited high TOUC values (greater than 0.65) with both deinococcal RecA proteins. The family Thermaceae is closely related to the family Deinococcaceae, and the two families form a distinct phylum-level group within the Bacteria.15 Because the TOUC value for Thermus thermophilus RecA versus D. geothermalis RecA was 0.602 (Table 1), we expect that T. thermophilus RecA protein can complement RecA deficiency in D. geothermalis. In contrast, the TOUC value for T. thermophilus RecA versus D. radiodurans RecA was relatively low (0.403). We are therefore interested in determining whether T. thermophilus RecA protein is functional in D. geothermalis and D. radiodurans, and plasmid complementation experiments are under way in our laboratory to test this prediction and thereby the predictive value of the TOUC analysis.
Our study supports the notion that TOUC can be used to predict the degree of functional similarity of RecA proteins, which are potential targets for the development of new antibacterial drugs. TOUC analysis should prove useful in guiding future experiments aimed at furthering our understanding of RecA biochemistry and at developing drugs that modulate the biological activities of RecA in bacteria.
Acknowledgements
This work was partly supported by a grant-in-aid for scientific research from the Japan Society for the Promotion of Science (grant no. 22580098 to I.N.). We are grateful to Dr. Robert M. Campbell and one anonymous referee for their thoughtful comments, which greatly helped to improve this manuscript.
