Sage Journals: Discover world-class research

Abstract

Two-component systems (TCS) are short signalling pathways generally occurring in prokaryotes. They frequently regulate prokaryotic stimulus responses and thus are also of interest for engineering in biotechnology and synthetic biology. The aim of this study is to better understand and describe the rewiring of TCS by investigating different evolutionary scenarios.

Based on large-scale screens of TCS in different organisms, this study gives detailed data, concrete alignments, and structure analysis for three general modification scenarios in which TCS were rewired for new responses and functions: (i) sequence exchanges within single TCS domains; (ii) exchange of whole TCS domains; and (iii) addition of new components modulating TCS function.

As a result, the alignments given here define well how stimulus and promoter cassettes can be replaced to rewire TCS. The diverged TCS examples are non-trivial, and their design is challenging. Designed connector proteins may also be useful to modify TCS in selected cases.

Keywords

histidine kinase; engineering; promoter; sensor; response regulator; synthetic biology; sequence alignment; connector; Mycoplasma

Introduction

A key mechanism used by bacteria for sensing their environment is based on two-component systems (TCS). These systems typically consist of a sensor protein with a membrane-bound histidine kinase domain (HisKA) and a corresponding regulator protein with a response regulator domain (RR). The sensor protein detects specific changes in the environment and subsequently binds adenosine triphosphate (ATP). This causes a structural change of the sensor protein and, after autophosphorylation at a histidine residue, leads to phosphotransfer to the corresponding response regulator. The response regulator then changes its structure and mediates a cellular response.¹ The standard structure of TCS is well conserved.^2,3 Several databases describe different aspects of TCS.^4–7 Mutational analyses of individual TCS components have been described previously.^8,9 Design, rewiring, and modification of TCS have been studied for a long time, including efforts in biotechnology.^10–16 Still, engineering TCS successfully remains a major challenge, as direct design attempts only work well for controlled cases and short evolutionary distances.¹⁷ On closer inspection, specific information on individual functional sites and sequences is often lacking. We therefore looked closely at evolutionary changes in TCS in order to create a more solid basis for future design attempts. In synthetic biology, rewiring TCS allows the construction of synthetic networks.¹⁸ For this, the exchange of TCS promoters, the partial or full replacement of sensor and regulator, and the addition of further components are key.¹⁹ The specific motifs involved and the overall topology of the system determine the observed switching behavior.²⁰
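The phosphorelay described above can be illustrated with a toy kinetic model. The sketch below is a minimal illustration under stated assumptions, not a model used in this study: it assumes mass-action kinetics, saturating ATP, equal total amounts of sensor and regulator (each normalized to 1), and hypothetical rate constants (`k_auto`, `k_transfer`, `k_dephos`); the stimulus simply scales the autophosphorylation rate.

```python
def simulate_tcs(stimulus: float, k_auto: float = 1.0, k_transfer: float = 5.0,
                 k_dephos: float = 0.5, dt: float = 0.001, t_end: float = 50.0) -> float:
    """Integrate a toy sensor-kinase/response-regulator model with forward Euler.

    Returns the fraction of response regulator that is phosphorylated (RR~P)
    at t_end. All species are fractions of a total normalized to 1 (assumption);
    the rate constants are hypothetical illustration values.
    """
    hk_p = 0.0  # phosphorylated sensor kinase (HK~P)
    rr_p = 0.0  # phosphorylated response regulator (RR~P)
    for _ in range(int(t_end / dt)):
        hk = 1.0 - hk_p                     # unphosphorylated sensor
        rr = 1.0 - rr_p                     # unphosphorylated regulator
        autophos = stimulus * k_auto * hk   # His autophosphorylation, ATP assumed saturating
        transfer = k_transfer * hk_p * rr   # HK~P + RR -> HK + RR~P
        dephos = k_dephos * rr_p            # RR~P -> RR
        hk_p += dt * (autophos - transfer)
        rr_p += dt * (transfer - dephos)
    return rr_p


if __name__ == "__main__":
    for s in (0.0, 0.1, 1.0, 10.0):
        print(f"stimulus {s:5.1f} -> RR~P fraction {simulate_tcs(s):.3f}")
```

With these hypothetical parameters, RR~P at stimulus 1.0 settles near 0.85 (the smaller root of 2.5r^2 - 8r + 5 = 0), and a stronger stimulus gives more phosphorylated regulator.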

Consequently, the aim of this study is to describe and review evolutionary scenarios as a guide to rewiring two-component systems.

Based on a large-scale screen of available TCS from various databases (see Supplementary material), we considered three general scenarios ranging from local to more global changes of TCS: (i) Individual amino acid changes. These lead to direct sequence changes of sensors and regulators, e.g., changing the specificity of the stimulus or allowing the regulation of new genes. (ii) An alternative scenario considers more radical changes such as domain swapping. We performed large-scale screens and identified events in which such exchanges lead to a change in the overall function of a TCS. This can be exploited for more drastic engineering strategies, whose outcome is otherwise very difficult to predict. (iii) Another modification strategy does not interfere with the sensor or regulator of the TCS. Additional proteins or domains, so-called connectors, interact with either one or both of them, which again modulates the output and performance of the TCS. Starting from a known event (SafA in Escherichia coli), we consider further proteins that could have such connector functions and examine their potential to change TCS function.

Results and Discussion

We screened various databases for TCS and their modifications. Table S1 (Supplementary material) illustrates this for a screen listing the most frequently occurring contexts in which histidine kinase or response regulator domains were found. The databases screened include, among others, the protein family database PFAM,²¹ the protein database UniProt,²² and further repositories such as MiST2,⁴ SENTRA,⁶ and P2CS.⁷ The screens also revealed numerous sensors with periplasmic, membrane-embedded, and cytoplasmic sensor domains and a great diversity of regulator protein contexts.
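Listing the contexts in which HisKA or RR domains occur essentially means tallying domain architectures. The following sketch assumes that each protein's architecture is already available as an ordered list of Pfam domain names (as could be exported from the databases above); the protein IDs and architectures are hypothetical examples, and `HisKA` and `Response_reg` are taken as the Pfam names of the two core domains.

```python
from collections import Counter

# Hypothetical protein IDs with ordered (N- to C-terminal) Pfam domain names.
ARCHITECTURES = {
    "sensor_1": ["HAMP", "HisKA", "HATPase_c"],
    "sensor_2": ["PAS", "HisKA", "HATPase_c"],
    "sensor_3": ["HAMP", "HisKA", "HATPase_c"],
    "regulator_1": ["Response_reg", "Trans_reg_C"],
    "regulator_2": ["Response_reg", "GerE"],
    "regulator_3": ["Response_reg", "Trans_reg_C"],
    "other_1": ["ABC_tran"],
}

# Core TCS domains whose surrounding contexts are counted.
CORE_DOMAINS = {"HisKA", "Response_reg"}


def domain_contexts(architectures: dict) -> Counter:
    """Count each full domain architecture that contains a TCS core domain."""
    counts = Counter()
    for domains in architectures.values():
        if CORE_DOMAINS.intersection(domains):
            counts[tuple(domains)] += 1
    return counts


if __name__ == "__main__":
    for architecture, n in domain_contexts(ARCHITECTURES).most_common():
        print(n, " - ".join(architecture))
```

Counting whole architectures keeps the domain order, so a PAS-containing sensor and a HAMP-containing sensor end up in different context classes.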

TCS Rewiring by Changing Residues in Sequences

Sequence mutations can change sensors and regulators, for instance the specificity of the recognized stimulus or the set of regulated genes. To gain concrete information useful for engineering, we looked closely at sequences from several bacterial model organisms, focusing especially on the recognition site and the DNA and promoter binding sites. Annotated information on these signatures is often unavailable, so their identification relies on detailed manual annotation and sequence comparisons. We revalidated predictions by extensive sequence-structure comparisons (for details, see Supplementary material).

TCS Stimulus Signatures

We annotated several stimulus recognition sites in different model organisms (E. coli 536, E. coli CFT073, E. coli K12 W3110, E. coli O157:H7 EDL933, E. coli K12 MG1655, E. coli O157:H7 Sakai pO157, E. coli UTI89, Salmonella, Bacillus subtilis, Staphylococcus aureus, Legionella pneumophila, Listeria monocytogenes, Pseudomonas aeruginosa, and Mycoplasma pneumoniae) and for different stimuli (Table 1A; phosphor, iron, copper, osmotic, stress, citrate, fumarate, and nitrate/nitrite;^23–25 for sequence, genome, and domain analysis, see Materials and methods). Table 1A shows the best consensus derived. For concrete engineering experiments and for detection in new genomes, however, the signatures themselves are important; they are therefore given in detail, summarizing all investigated sequences, and can be used directly for engineering. Detailed alignments are given in Supplementary material, section 1.2.

Table 1A.

Stimulus recognition consensus sequences for various TCS stimuli.

Stimulus	No. of sequences	Position	Recognition sequence¹
Phosphor	1	29–32	GYLP
Osmotic	4	36–158	NFAILPSLQQFNKVLAYEVRMLMTDKLQLEDGTQLVVPPAFRREIyrelgISLYTNEAAEEAGLRWAQHYEFLSHQMAQQLGGPTEVRVEVNKSSPVVWLKTWLSPNIWVRVPLTEIHQGDFS
Stress	6	25–135	LVYKFTAERAGRQSLDDLMNSSLYLMRSELREIPPHDWGKTLKEmdlnlsfdlrveplskyhlddismhrlrggeivALDDQYTFIQRIPRSHYVLAVGPVPYLYYLHQMr
Iron	6	35–64	HESTEQIQLFEQALRDNRNNDRHIMREIRE
Copper	3	37–86	HSVKVHFAEQDINDLKEISATLERVLNHPDETQARRLMTLEDIVSGYSNVLISLADSHGKTVYHSPGAPDIREFARDAIPDKDARGGEVFLLSGPTMMMPGHGHGHMEHSNWRMISLPVGPLVDGKPIYTLYIALSIDFHLHYINDLMNK
Citrate	4	43–182	asfedyltlhvrdmamnqakiiasndsvisavktrdykrlatianklQRDTDFDYVVIGDRHSIRLYHPNPEKIGYPMQFTKPGALEKGESYFITGKGSMGMAMRAKTPIFDDDGKVIGVVSIGYLVSKIDSWRAEFLLP
Fumarate	4	42–181	SQISDMTRDGLANKALAVARTLADSPEIRQGLQKKPQESGIQAIAEAVRKRNDLLFIVVTDMHSLRYSHPEAQRIGQPFKGDDILKALNGEENVAINRGFLAQALRVFTPIYDENHISKAQIGVVAIGLELSRVtqqindsrw
Nitrate/Nitrite	8	38–151	sslrDAHAINKAGSLRMQSYRLGYDLPSGEPDKNAHRQMFQQAlhspvltnlnvwyvpeavkTRYAHRNANWDGMNNRLQGGDDPWYNENIPNYMNQQDRFTLALDHYQerkqffec

Only the consensus recognition sequences are listed, following UniProt. Well-annotated sensors and organisms were compared as listed in Supplementary material. The composition of the sensor protein recognition site depends on the signal and is independent of the organism. Exact sequences and positions are aligned in Supplementary material. Numbering follows the E. coli proteins and can be transferred to other organisms. Conserved amino acids are labeled in bold print; less conserved amino acids are shown in lowercase.
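The uppercase/lowercase convention of Table 1A can be mimicked with a simple majority-rule consensus over an alignment. The sketch below is an illustration only: the `threshold` parameter and the rule of skipping columns whose most common character is a gap are assumptions, not the exact procedure used to build Table 1A. The example fragments are columns 7 to 23 of the SALTI, VIBCH, and NARX_ECOLI rows of Table 1B.

```python
from collections import Counter


def consensus(alignment, threshold=1.0):
    """Majority-residue consensus of equal-length aligned sequences.

    A column is written in uppercase if its most common residue occurs in at
    least `threshold` (fraction) of the sequences, otherwise in lowercase.
    Columns whose most common character is a gap '-' are skipped (assumption).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        if residue == "-":
            continue
        fraction = count / len(column)
        out.append(residue.upper() if fraction >= threshold else residue.lower())
    return "".join(out)


FRAGMENTS = [
    "AEAINIAGSLRMQSYRL",  # Q8Z4S5_SALTI, alignment columns 7-23
    "AEAVNVSGSMRMQSYRL",  # Q9KLR7_VIBCH
    "AHAINKAGSLRMQSYRL",  # NARX_ECOLI
]

if __name__ == "__main__":
    print(consensus(FRAGMENTS))
```

Lowering `threshold` turns more positions uppercase; with the default of 1.0 only fully conserved positions are capitalized, such as the RMQSYRL stretch shared by NarQ and NarX.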

For rewiring, it should be possible to transfer such consensus sequences between organisms and between proteins sensing the same signal. To test to what extent this is possible, we compared the nitrate/nitrite recognition site in detail (nitrate/nitrite sensor proteins NarX and NarQ; Table 1B). For different sensor proteins in the organisms analyzed above, the structure of the sensor (NarX or NarQ) is accurately known. We compared these sensor sequences in several E. coli, Salmonella, Vibrio, and Haemophilus influenzae strains. The critical sensory region identified by sequence analysis was comparable despite the different organisms and proteins (NARQ_ECOLI periplasmic region: positions 35–146; numbering according to the E. coli UniProt sequences). This supports the hypothesis that the signal is much more important than the organism or even the TCS family. In general, the recognition sites seem to depend strongly on the signal type but remain conserved across the tested species.

Table 1B.

Alignment of the nitrate/nitrite recognition site comparing NarX and NarQ.¹

Protein	Sequence (alignment part 1)
Q8Z4S5_SALTI	-SSLRDAEAINIAGSLRMQSYRLGYDLQSGSPQLNAHRQLFQQALHSPVLTNLN-VWYVPEAVKTRYA
Q8XBE5_ECO57	-SSLRDAEAINIAGSLRMQSYRLGYDLQSGSPQLNAHRQLFQQALHSPVLTNLN-VWYVPEAVKTRYA
Q8ZN78_SALTY	-SSLRDAEAINIAGSLRMQSYRLGYDLQSGSPQLNAHRQLFQQALHSPVLTNLN-VWYVPEAVKTRYA
NARQ_ECOLI	-SSLRDAEAINIAGSLRMQSYRLGYDLQSGSPQLNAHRQLFQQALHSPVLTNLN-VWYVPEAVKTRYA
B5R4I7_SALEP	TSSLRDAEAINIAGSLRMQSYRLGYDLQSRSPQINAHRQLFQHALNSPVLQNLN-AWYVPQAVKTRYA
Q9KLR7_VIBCH	ASSLNDAEAVNVSGSMRMQSYRLAYDIQTQSHDYKAHIFLFENSLYSPSMLALL-DWTVPSDIQQDYY
NARQ_HAEIN	-SNKYDAEAINISGSLRMQSYRLLYEMQEQPESVETNLRRYHISLHSSALLEVQNQFFTPNVLKHSYQ
NARX_ECOLI	QGVQGSAHAINKAGSLRMQSYRL-LAAVPLSEKDKPLIKEMEQTAFSAELTRAA----ERDGQLAQLQ
NARX_ECO57	QGVQGSAHAINKAGSLRMQSYRL-LAAVPLSEKDKPLIKEMEQTAFSAELTRAA----ERDGQLAQLQ
NARX_SHIFL	QGVQGSAHAINKAGSLRMQSYRL-LAAVPLSEKDKPLIKEMEQTAFSAELTRAA----ERDGQLAQLQ
Conservation	. .*.*:* :**:******* . . :. . : *. : .

Protein	Sequence (alignment part 2)	End
Q8Z4S5_SALTI	HLNANWL-EMNNRLSKG-DLPWYQANINNYVNQIDLFVLAL	105
Q8XBE5_ECO57	HLNANWL-EMNNRLSKG-DLPWYQANINNYVNQIDLFVLAL	105
Q8ZN78_SALTY	HLNANWL-EMNNRLSKG-DLPWYQANINNYVNQIDLFVLAL	105
NARQ_ECOLI	HLNANWL-EMNNRLSKG-DLPWYQANINNYVNQIDLFVLAL	105
B5R4I7_SALEP	RLHANWL-EMNSRLQDG-DIAWYQTNINNYVDQIDLFVLAL	119
Q9KLR7_VIBCH	QLIERWH-ELKKVLNSD-QKAQYLDQVAPFVSLVDGFVLKL	115
NARQ_HAEIN	NILQRWT-NMEKYARQQ-DVKNYSKQLTDYVADVDYFVFEL	105
NARX_ECOLI	GLQDYWRNELIPALMRAQNRETVSADVSQFVAGLDQLVSGF	103
NARX_ECO57	GLQDYWRNELIPALMRAQNRETVSADVSQFVAGLDQLVSGF	103
NARX_SHIFL	GLQDYWRNELIPALMRAQNRETVSADVSQFVAGLDQLVSGF	103
Conservation	: * :: : :: :* :* :* :

For the same signal, two different sensors (NarX and NarQ) are compared in several strains of E. coli, Vibrio, Haemophilus influenzae, and Salmonella with regard to the nitrate/nitrite binding site. We identified the critical region for sensing by structure analysis of the periplasmic region (NARQ_ECOLI periplasmic region, positions 35–146). Different protein sequences and organisms were then compared. The completely conserved sequence parts (indicated by stars) support the view that the sensor sequence depends on the signal rather than on the protein or organism type. Colons and periods indicate well and less well conserved amino acid positions, respectively.
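The degree of conservation between NarQ and NarX visible in Table 1B can be quantified as pairwise percent identity. The sketch below assumes one common convention, identity over columns where neither sequence has a gap (other definitions normalize by alignment or sequence length instead); the two strings are the first alignment part of the NARQ_ECOLI and NARX_ECOLI rows of Table 1B, with gaps written as hyphens.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# First part of the Table 1B alignment (68 columns each).
NARQ_ECOLI = "-SSLRDAEAINIAGSLRMQSYRLGYDLQSGSPQLNAHRQLFQQALHSPVLTNLN-VWYVPEAVKTRYA"
NARX_ECOLI = "QGVQGSAHAINKAGSLRMQSYRL-LAAVPLSEKDKPLIKEMEQTAFSAELTRAA----ERDGQLAQLQ"

if __name__ == "__main__":
    print(f"NarQ vs NarX (E. coli): {percent_identity(NARQ_ECOLI, NARX_ECOLI):.1f}% identity")
```

Excluding gap columns avoids penalizing the insertion in NarX twice, but it also means the value is not directly comparable with identities reported by tools that use another denominator.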

Binding Sites on the DNA

Another way to modify TCS functionality is to exchange the cellular response. We therefore analyzed the sites at which regulator proteins bind DNA. Promoter information is usually poorly annotated, so the promoter data required in this study were retrieved manually, by hand curation and direct sequence comparison. DNA binding sites for target genes in E. coli K-12 were first collected from different sources (Prodoric,²⁶ DBTBS,²⁷ TractorDB,²⁸ and PDBSum) and then analyzed with specific Perl scripts, also considering further E. coli strains (E. coli 536, E. coli CFT073, E. coli K-12 W3110, E. coli O157:H7 EDL933, E. coli K-12 MG1655, E. coli O157:H7 Sakai pO157, E. coli UTI89). Conserved motifs of the DNA binding sites were summarized as consensus sequences per TCS family (E. coli, Table 2A; other gram-negative bacteria, Table 2B). Re-annotation using databases and the subsequent sequence analysis tools are described in Materials and methods.

Table 2A.

Specific target gene DNA sequences in E. coli.¹

Regulated gene	Sequence
OmpC	TTTACATTTTGAAACATCT
OmpF	T[GT][GT][TG]TA[CG][AC][TA][AC]TTT[TC]
OmpF/OmpC	TTT[TA]C-TTTT[TG]
NarG1	1 TACCCATTAA 10
NarG2	1 TAACCAT—- 7
NarG3	1 TAATTAT—- 7
NarG4	1 TACTTTA—- 7
NarG5	1 -AGGGGTA— 7
NarG6	1 TAGGAAT—- 7
NarG7	TTTAACCCGAtcggggtatg
NarK	TAC[TC][CG][CA]T
CitB	agtAATTTAATTaatt
LytT	[TA][AC][CA]GTTN[AG][TG]
LytT	taaggAAATAAAACTGATTTTcacgtca
AlgR	aaatGAATATTTATTCAAat
GlnG/GlnK	tgcaCCACCATGGTGCA
Spo1	1 ————-TTTGTCGAATGTAA—————- 14
Spo2	1 —AATTTCATTTTTAGTCGAAAAACAGAGAAAAACAT 35
Spo3	1 AAAAGAAGATTTTTCGACAAATTCA ———————– 25

Profiles of target gene binding sites bound by regulators in E. coli are given. Consensus sequences were derived from detailed multiple alignments (see Supplementary material) built by mining several databases (Prodoric, TractorDB, PDB and PDBSum, PubMed). Sequences and positions were aligned (Supplementary material). The binding sequences were first found in E. coli K-12 strains and then verified for the other E. coli strains (see Supplementary material) using motif-specific scripts (Materials and methods). Less conserved parts are shown in lowercase letters, alternative nucleotides at a single position are given in brackets, and strongly conserved parts are highlighted by black boxes.
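The bracket notation of Table 2A maps directly onto regular-expression character classes, which makes a motif-specific scan straightforward. The following Python sketch is only a stand-in for the motif-specific scripts mentioned above (which were written in Perl); reading `N` as any base, ignoring case, and searching both strands are assumptions of the sketch, and the test sequence is synthetic.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Compile a bracket consensus (e.g. 'T[GT]TA') into a regular expression.

    Bracketed letters are alternative bases at one position, which is already
    valid regex character-class syntax; 'N' is read as any base (assumption).
    Upper/lower case (conservation level in Table 2A) is ignored.
    """
    return re.compile(consensus.upper().replace("N", "[ACGT]"))


def scan_both_strands(sequence, consensus):
    """Return sorted (start, strand) hits as 0-based forward-strand coordinates."""
    seq = sequence.upper()
    # A lookahead with a capture group also reports overlapping matches.
    finder = re.compile(f"(?=({consensus_to_regex(consensus).pattern}))")
    hits = [(m.start(), "+") for m in finder.finditer(seq)]
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    for m in finder.finditer(reverse_complement):
        site_length = len(m.group(1))
        # Map the start on the reverse complement back to forward coordinates.
        hits.append((len(seq) - m.start() - site_length, "-"))
    return sorted(hits)


OMPF = "T[GT][GT][TG]TA[CG][AC][TA][AC]TTT[TC]"  # OmpF consensus from Table 2A

if __name__ == "__main__":
    site = "TGGTTACATATTTC"  # synthetic site matching the OmpF consensus
    print(scan_both_strands("GGGG" + site + "GGGG", OMPF))
```

Reporting minus-strand hits in forward coordinates keeps both strands on one axis, which simplifies counting repeated sites and their spacing afterwards.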

Table 2B.

Specific target gene DNA sequences in other gram-negative bacteria.¹

Family	Regulated gene	Function	Example organism	Sequence
NtrC	GlnH	Transcription factor	Salmonella	gacatTTGCACTTAAATAGTGCACaaccc
NtrC	GlnA	Transcription factor	Salmonella	ttctaTTGCACCAATGTGGTGCTTaatgt
				cattgAAGCACTATTTTGGTGCAAcatag
NtrC	GlnK	Transcription factor	Salmonella	ccattATGCACCGTCGTGGTGCGTttttc
NtrC	GlnA	Transcription factor	Salmonella	ctataATGCACTAAAATGGTGCAAccttt
NarL	NarK	Transcription factor	Salmonella	aatagCCTACTCATTAAGGGTAATaacta
NtrC	GlnG	Transcription factor	Shigella flexneri	ctataATGCACTAAAATGGTGCAAcctgt
ArgR	ArgA	Transcription factor	Salmonella	actaaTTTCGAATAATAATTCACTAgtggg
ArgR	ArgC	Transcription factor	Salmonella	cgttaATGAATAAAAATACATaatta

The table shows TCS target gene promoter sites in Salmonella (two strains) and Shigella. Capital letters indicate similarities within the binding site between the three compared organisms.

In most cases, the identified promoter nucleotide sequences were quite short. As analyzed previously for other promoter sequences,^29,30 we found that the identified TCS promoter sequences have to occur in multiple copies to allow for higher specificity (including different affinities and different functions). Motifs were often repeated, allowing oligomeric binding of the regulator protein.

Our analyses also made it possible to determine the number of binding-site replicates and the distances between them: Table 3 summarizes the regulator proteins, the regulated genes, the numbers of binding-site replicates, and the distances between the replicates.

Table 3.

Promotor binding sites.

Response regulator protein	Regulated gene	Repetition	Distance [NS]
Citrate utilization protein B (CitB)	Citrate lyase (CitC)	6	40
Nitrogen regulation protein (NtrC)	Glutamine synthetase (GlnA)	2	63
Nitrogen regulation protein (NtrC)	Nitrogen regulator protein (GlnK)	7–12	Variable
Nitrate/Nitrite response regulator protein (NarL)	Respiratory nitrate reductase (NarG)	Variable	ca. 6
Nitrate/Nitrite response regulator protein (NarL)	Nitrite extrusion protein (NarK)	Variable	Variable
Osmolarity response regulator (OmpR)	Outer membrane protein C and F (OmpC/OmpF)	3	7

Notes: The table lists each response regulator protein and its regulated gene, together with the number of binding-site replicates and the distance between the binding sites.
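Counting binding-site replicates and the distances between them, as summarized in Table 3, reduces to locating the motif copies and measuring the gaps between consecutive hits. In this minimal sketch the distance is taken as the number of nucleotides lying between the end of one site and the start of the next; whether Table 3 uses this or another convention (e.g. start-to-start) is an assumption, and the example sequence is synthetic.

```python
import re


def find_site_starts(sequence, motif):
    """0-based start positions of (possibly overlapping) copies of a regex motif."""
    return [m.start() for m in re.finditer(f"(?={motif})", sequence.upper())]


def replicate_spacing(starts, site_length):
    """Return (number of copies, gaps in nt between consecutive copies).

    A gap counts the nucleotides between the end of one site and the start of
    the next (assumed convention); negative values indicate overlapping sites.
    """
    ordered = sorted(starts)
    gaps = [nxt - (cur + site_length) for cur, nxt in zip(ordered, ordered[1:])]
    return len(ordered), gaps


if __name__ == "__main__":
    # Synthetic promoter with three copies of a hypothetical 6-nt site.
    promoter = "CC" + "GTTACA" + "CCC" + "GTTACA" + "CCC" + "GTTACA"
    starts = find_site_starts(promoter, "GTTACA")
    print(starts, replicate_spacing(starts, 6))
```

Switching to a start-to-start convention only requires dropping `site_length` from the gap calculation.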

As these results show that the stimulus recognition sites and promoter regions are well conserved, we are confident that the consensus sequences given in Tables 1–3 will be of great help in direct design experiments¹⁷ (see also Supplementary material, Figure S2 and Table S2, for detailed suggestions on HisKA substitution design).

TCS Rewiring by Domain Shuffling and Diverged Domains

The screens furthermore revealed more extensive changes in TCS, such as domain swapping. We identified diverged regulators or sensors in genomes in which only one partner was known (Legionella, Listeria), and we spotted strongly diverged TCS by their conserved domains in a new context (several examples, including M. pneumoniae).

Diverged TCS Domains

Extensive sequence analysis per TCS family, including related organisms, enabled us to better describe and predict the regulatory function of three TCSs in L. pneumophila. New partners could be found for the osmosis-sensing family (OmpR) and the nitrate/nitrite response family (NarL). Table 4A lists the predicted, previously missing partners, the identification methods, and the TCS functions. In L. monocytogenes, three new TCSs within the NarL and OmpR families could be identified (Table 4B).

Table 4.

Recognition of divergent TCS and missing TCS partners.

Family	Identification	Stimulus	Sensor²	Regulator²	Strain	Function
(A) L. pneumophila str. Philadelphia¹
OmpR	Iterative sequence searches with cut-off e-30 using OmpR sequences from Enterobacter cloacae	Mg starvation	QseC GI:52841522 Known/annotated by PMID 15448271	GI:52841523, potentially similar to QseB	Philadelphia 1	Regulated protein FliC; GI: 52841570; flagella regulation
NarL	Iterative sequence searches with cut-off e-30 using NP_288375 E. coli O157:H7 str. EDL933	Carbon	BarA GI: 52842130 Known/annotated by PMID 15448271	GI:52842852, potentially similar to UvrY	Philadelphia 1	Regulated protein CsrA; GI:52841018; carbon storage regulator
NarL	Iterative sequence searches with cut-off e-30 in E. coli ETEC H10407	Pheromone		GI:52840952, potentially similar to EvgA	Philadelphia 1	Regulated protein EmrY; GI:52841684; antibiotic resistance

Family	Identification	Stimulus	Sensor²	Regulator²	Strain	Function
(B) Listeria monocytogenes³
NarL	Iterative sequence searches with cut-off e-30 in E. coli ETEC H10407		Q4EKW8_LISMO, potentially similar to EvgS	GI: 16804553, potentially similar to EvgA	EGD-e	Antibiotic resistance
OmpR	Iterative sequence searches with cut-off e-30 in B. subtilis; the sequences of these proteins were used to search the Listeria genome	Stress	GI: 16804620 GI: 16803101, potentially similar to CSSS_BACSU	GI: 16804621, potentially similar to CSSR_BACSU	EGD-e	Regulated protein HtrA; serine protease
OmpR	PSI-BLAST search in B. subtilis with cut-off e-60; the sequences of these sensors were used to search the Listeria genome	Mg starvation	GI: 16803061, potentially similar to ZP_03239257	PhoP GI: 16804539 Known/annotated by PMID 11679669	EGD-e	Virulence, antimicrobial peptide resistance

New annotated features (interactions or parts of TCS) apparent from sequence searches with various available TCS sequences and domains in the genome sequence (GenBank acc. no.: AE017354, Chien M, et al, 2004). Regulated proteins are given as well as homologous standard TCS. Predicted changes in function for L. pneumophila (inferred mainly from operon context) are indicated on the right. The right-most column summarizes which aspect of the TCS is newly reported here.

Listed are well-characterized homologs from other organisms that have the same function within the same family.

The table contains additional features (interactions or parts of TCS) that extend what is already known in KEGG or annotated in GenBank (Acc. No.: AE017262) or ListiList (http://genolist.pasteur.fr/ListiList/). The TCS family is given on the left. Starting from B. subtilis TCS sequences, we searched for missing sensor and regulator proteins. The right-most column summarizes which aspect of the TCS is newly reported here.

Some of the identified proteins are already known to be involved in TCS, but their connection to a specific family was unknown. The newly identified TCS partners are predicted to be critical for the function of these TCS in Legionella and Listeria, which justifies further analysis and confirmation by direct experiments.
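The iterative searches with an e-value cut-off of e-30 listed in Table 4 can be pictured as repeatedly using newly accepted hits as queries until no new sequences appear. The sketch below is a simplified stand-in, not PSI-BLAST (which iterates with a position-specific scoring matrix): it assumes a precomputed table of pairwise hits, for example parsed from an all-against-all search, and all sequence IDs are hypothetical.

```python
from collections import deque

# Hypothetical precomputed hits: query ID -> list of (subject ID, e-value).
HITS = {
    "seed_regulator": [("candidate_A", 1e-45), ("unrelated_hit", 1e-5)],
    "candidate_A": [("candidate_B", 1e-32), ("seed_regulator", 1e-45)],
    "candidate_B": [],
}


def iterative_search(seeds, hits, cutoff=1e-30):
    """Collect all IDs reachable from `seeds` via hits with e-value <= cutoff.

    Each newly accepted subject becomes a query in a later round; the search
    stops when a round adds nothing new (breadth-first traversal).
    """
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        query = queue.popleft()
        for subject, evalue in hits.get(query, []):
            if evalue <= cutoff and subject not in found:
                found.add(subject)
                queue.append(subject)
    return found


if __name__ == "__main__":
    print(sorted(iterative_search(["seed_regulator"], HITS)))
```

A stricter cut-off, such as the e-60 used for one Listeria search, shrinks the collected set, because weaker links in the chain are no longer followed.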

Extensive TCS Domain Shuffling

Further divergence may lead to the appearance of typical TCS domains in a new context. To detect such domain shuffling events, we applied PROSITE predictions, further sequence analyses, and literature mining. All examples investigated are proteins with either a HisKA or an RR domain; we focused on strongly diverged cases. Four prokaryotic and even three eukaryotic examples of far-diverged proteins with new functional properties are shown (Table 5). Two biotechnologically interesting examples are described in more detail:

Table 5.

Natural examples for domain shuffling in divergent TCS.¹

Domain	Protein	Context	Function
HisKin	Pyruvate dehydrogenase kinase	Glucose metabolism in S. cerevisiae	Inhibits the mitochondrial pyruvate dehydrogenase complex by phosphorylation of the E1 alpha subunit, thus contributing to the regulation of glucose metabolism
HisKin	Adenylate cyclase	Sporulation in some organisms	Stringent response, protein kinases are activated (PKAs)
HisKin	BCKD-kinase	Valine, leucine and isoleucine catabolic pathways in mouse	Catalyzes the phosphorylation and inactivation of the branched-chain alpha-ketoacid dehydrogenase complex, the key regulatory enzyme of the valine, leucine and isoleucine catabolic pathways. Key enzyme that regulates the activity state of the BCKD complex
HisKin	Phytochrome A	Regulatory photoreceptor in Deinococcus	Regulatory photoreceptor which exists in two forms that are reversibly interconvertible by light: the Pr form that absorbs maximally in the red region of the spectrum and the Pfr form that absorbs maximally in the far-red region. Photoconversion of Pr to Pfr induces an array of morphogenic responses, whereas reconversion of Pfr to Pr cancels the induction of those responses. Pfr controls the expression of a number of nuclear genes including those encoding the small subunit of ribulose-bisphosphate carboxylase, chlorophyll A/B binding protein, protochlorophyllide reductase, rRNA, etc. It also controls the expression of its own gene(s) in a negative feedback fashion
Response Reg	Adventurous-gliding motility protein Z	Chemosensory system in Myxococcus	Required for adventurous-gliding motility, in response to environmental signals sensed by the frz chemosensory system. Forms ordered clusters that span the cell length and that remain stationary relative to the surface across which the cells move, serving as anchor points that allow the bacterium to move forward. Clusters disassemble at the lagging cell pole
Response Reg	Adenylate cyclase	Sporulation in some organisms	Stringent response, response regulators are activated
Response Reg	Serine/threonine-protein kinase ppk18	Schizosaccharomyces pombe	Serine/threonine-protein kinase ppk18 plays pivotal roles in cell proliferation and cell growth in response to nutrient status

The table shows natural domain shuffling events in which sensor domains and response regulator domains appear in new contexts. In both the prokaryotic and the eukaryotic examples, only the domains themselves can be recognized, while new functions have been adopted.

Shuffled sensor domain: The kinase of the branched-chain alpha-ketoacid dehydrogenase complex (BCKD kinase, BCKDHK) in mice was considered a strongly diverged example.³¹ BCKD kinase possesses a characteristic nucleotide-binding domain and a four-helix bundle domain similar to a TCS sensor. Binding of ATP induced disorder-to-order transitions in a loop region at the nucleotide-binding site. These structural changes led to the formation of a quadruple aromatic stack at the interface between the nucleotide-binding domain and the four-helix bundle domain, finally resulting in a movement of the top portion of two helices and in modified enzyme activity. Our analysis indicates a diverged TCS with a HisKA domain but without an RR domain, and with a new cellular response, namely the change of enzymatic activities. So far, only the structural similarity to the Bergerat fold family has been demonstrated, by inhibition experiments using radicicol as an autophosphorylation inhibitor for histidine kinases;³² there is no in vivo evidence for BCKDHK acting in a two-component histidine kinase signaling event. In contrast, two-component systems in plants such as maize appear to be widespread across the genome³³ (see Supplementary material, Table S3).

Shuffled regulator domain: If further signaling is mediated by transcription, the trans-activation domain involves a wide range of different DNA-binding motifs. Such domains also appear in new enzyme contexts or activities. One eukaryotic example of natural domain shuffling of an RR domain into a new protein context is the predicted serine/threonine protein kinase ppk18 in the fission yeast Schizosaccharomyces pombe. Ppk18 plays pivotal roles in cell proliferation and cell growth in response to nutrient status.³⁴ An RR domain is located at the C-terminus of the protein (well-conserved PROSITE signature PS50110) and is a target of the TOR (target of rapamycin) kinase. TOR itself activates ppk18 by phosphorylation but does not contain the typical HisKA domain. Consequently, eukaryotes, in particular yeast and plants, can have operational interactions similar to those of typical prokaryotic TCS. Our computational analysis of the available data on this protein suggests a similar mode of operation, in particular through the involvement of its RR domain (see Supplementary material, Table S4).

As these eukaryotic examples show, high divergence is easily achieved when a domain known from prokaryotic TCS acquires new molecular partners. Nevertheless, a certain level of convergent evolution is observable in the examples with regard to their regulatory function and effect.

A Putative New Family of TCS in Mycoplasma pneumoniae

Modification of TCS can even go so far that both TCS partners are strongly diverged and difficult to identify as TCS. Combining bioinformatic sequence and structure analyses offers a chance to identify such strongly degenerated TCS in prokaryotes. Here we suggest a putative new TCS family encoded in the genome of M. pneumoniae, which has so far been described as TCS-free. In particular, MPN013 and MPN014 could form a strongly diverged sensor and regulator pair in M. pneumoniae.

Putative Sensor: These proteins could not be identified by simple sequence searches, since direct sequence similarity searches did not yield significant hits.³⁵ After at least seven PSI-BLAST iterations, the collected alignment included described TCS sensors in addition to members of the uncharacterized protein (UPF) family to which MPN013 was previously known to belong, namely DUF16, a non-annotated protein family found exclusively in Mycoplasma.

To verify MPN013 as a potential sensor protein, we analyzed its primary, secondary, and tertiary structure and established several alignments. A re-check of the prediction by PSI-BLAST identified M. pneumoniae protein MPN013 as a potential sensor protein: its primary sequence was similar to NarX in Psychrobacter arcticus (PSI-BLAST e-value 6 × 10^–13 after 5 iterations).

We then analyzed the secondary and tertiary structure of MPN013. Homology modeling with SWISS-MODEL yielded the template 2ba2A (crystal structure of MPN010, another member of the DUF16 family) for MPN013. 2ba2A is a four-helix bundle corresponding to the HisKA domain of a sensor protein. The MPN013 sequence extends further at the C-terminus and contains an additional, second domain.

Like all sensor proteins, MPN013 starts with an unspecified domain (residues 1–120), probably representing a signal-perception domain. This is followed by an alpha-helical region (residues 130–165). This result was supported by secondary structure prediction (PredictProtein³⁶ and Predator³⁷) and was in line with the homology model. The last part is a mixture of helices, sheets, and loops. The secondary structure predictions were not completely identical; however, secondary structure alignments with the software SSEA³⁸ showed a similarity to alpha/beta sandwiches (z-score 2.28; normalized score 54.5).
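Comparing the outputs of different secondary structure predictors, as done here for MPN013, can be as simple as measuring per-residue agreement and extracting helical segments. The sketch assumes that both predictions have already been reduced to three-state strings (H = helix, E = strand, C = coil); the example strings are made up and do not reflect the actual PredictProtein or Predator output formats, and the minimum helix length is a hypothetical parameter.

```python
import re


def agreement(pred_a: str, pred_b: str) -> float:
    """Fraction of residues at which two 3-state predictions (H/E/C) agree."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must cover the same residues")
    same = sum(1 for a, b in zip(pred_a, pred_b) if a == b)
    return same / len(pred_a)


def helix_segments(pred: str, min_length: int = 4):
    """1-based inclusive (start, end) ranges of 'H' runs at least min_length long."""
    return [
        (m.start() + 1, m.end())
        for m in re.finditer(r"H+", pred)
        if m.end() - m.start() >= min_length
    ]


if __name__ == "__main__":
    tool_1 = "CCHHHHHCCHHC"  # hypothetical prediction from one tool
    tool_2 = "CCHHHHCCCHHC"  # hypothetical prediction from another tool
    print(agreement(tool_1, tool_2), helix_segments(tool_1))
```

Reporting segments in 1-based inclusive coordinates matches the residue ranges quoted in the text (e.g. an alpha-helical region at residues 130 to 165).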

To further verify the features required for a TCS, we show that MPN013 can be aligned in primary and secondary structure with NarX from Psychrobacter arcticus (Fig. 1). The corresponding E. coli NarX sensor was added for comparison. The structure (Fig. 1, top panel) is given according to the structure template 2c2a (HK853 of Thermotoga maritima) from PDB, which should be valid for NarX as well as for HisKA in general. Conserved TCS residues are highlighted (yellow boxes), and the homology model for MPN013 (PDB entry 2ba2_A for MPN010) is shown in green.

Figure 1.

Divergent TCS sensor in M. pneumoniae.

Four conserved amino acid boxes were analyzed next. The first box (Fig. 1, yellow) represents the strongly conserved histidine environment; this histidine is phosphorylated, and the phosphoryl group is then transferred to the RR. This site is situated in the four-helix bundle. The comparison between the E. coli, P. arcticus, and MPN013 sequences already showed that this site is variable with respect to its position and environment. The secondary structure comparison revealed that the histidine lies at the end of an alpha helix; however, the wider environment of the histidine residue in MPN013 is diverged. A second box was found mainly in E. coli and is therefore rarely conserved. The third and fourth conserved boxes comprise the ATP-binding site (Fig. 1). These two sites are more highly conserved, as demonstrated by the conserved PFAM-based patterns Glu/Asn-X-Ile/Leu-X-Asn/Ala-X and Asp/Glu-X-Gly/Ser-X-Gly/Glu-Ile. This secondary structure comparison showed that the structure might be even more flexible than initially assumed.
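The ATP-binding patterns quoted above can be converted into regular expressions to scan candidate sequences such as MPN013. In this sketch `X` stands for any residue, `/` separates alternative residues, and `-` separates positions; the conversion function is a hypothetical helper and the test sequences are synthetic.

```python
import re

# Three-letter amino acid codes mapped to one-letter codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def pattern_to_regex(pattern: str) -> str:
    """Convert e.g. 'Glu/Asn-X-Ile/Leu' into '[EN].[IL]' (X = any residue)."""
    parts = []
    for element in pattern.split("-"):
        if element == "X":
            parts.append(".")
        else:
            letters = "".join(THREE_TO_ONE[aa] for aa in element.split("/"))
            parts.append(f"[{letters}]" if len(letters) > 1 else letters)
    return "".join(parts)


def find_motif(sequence: str, pattern: str):
    """0-based start positions of (possibly overlapping) matches in a protein sequence."""
    finder = re.compile(f"(?={pattern_to_regex(pattern)})")
    return [m.start() for m in finder.finditer(sequence)]


BOX_3 = "Glu/Asn-X-Ile/Leu-X-Asn/Ala-X"
BOX_4 = "Asp/Glu-X-Gly/Ser-X-Gly/Glu-Ile"

if __name__ == "__main__":
    print(pattern_to_regex(BOX_3), pattern_to_regex(BOX_4))
```

Because such short, degenerate patterns match often by chance, a hit is only meaningful in combination with the structural context, as in the secondary structure comparison above.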

Furthermore, regarding a tentative ATPase activity predicted by the sequence analysis, close comparisons with the HisKA subclasses described by Grebe³ showed that the MPN013 histidine environment is new (see Supplementary material). It clearly differs from those described so far; its closest relative is a mixture of the HPK3b and HPK11 environments. An autophosphorylation region was identified that contains the conserved histidine and arginine residues, as in the HPK11 family. Within the ATP-binding site, the MPN013 motif contains the conserved glycine observed in the HPK3b motif.

Consequently, even though the overall structure of the putative sensor did not match perfectly, conservation was apparent both in structure and in key residues. However, other parts of the sequence vary more than in standard TCS, which explains why this TCS was not detected by sequence comparison before. Moreover, although key conserved structure and sequence features point to a diverged TCS in M. pneumoniae, its divergence may also lead to diverged function (see examples above).

Putative Response Regulator: Additional predictive evidence for this diverged TCS was obtained by searching for a corresponding regulator protein.

This search was initiated with an organism-specific iterative BLAST search using NarL from P. arcticus. NarL is the RR corresponding to the HisKA NarX in P. arcticus, which was the HisKA most similar to MPN013. At the primary structure level, NarL is similar to the Mycoplasma protein MPN014. This result was further supported by gene neighborhood considerations,^39,40 which are relevant for TCS because sensor and regulator genes are often situated directly next to each other in different genomes.⁴¹
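The gene neighborhood criterion can be expressed as a simple adjacency test on genome coordinates. A minimal Python sketch, assuming a 300 bp intergenic cutoff and a same-strand requirement (both arbitrary illustrative choices, not values used in this study); the `Gene` class, `neighbor_pairs`, and the coordinates are hypothetical and are not the real MPN013/MPN014 positions.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based coordinate of the first base
    end: int     # 1-based coordinate of the last base
    strand: str  # "+" or "-"


def neighbor_pairs(genes, max_gap=300, same_strand=True):
    """Return consecutive gene pairs separated by at most max_gap bp.

    Overlapping genes give a negative gap and are always kept.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end - 1
        if gap <= max_gap and (not same_strand or left.strand == right.strand):
            pairs.append((left.name, right.name, gap))
    return pairs


# Hypothetical coordinates for illustration only.
genes = [
    Gene("sensor_x", 1000, 2500, "+"),
    Gene("regulator_x", 2520, 3200, "+"),
    Gene("unrelated_y", 5000, 6000, "-"),
]
print(neighbor_pairs(genes))  # [('sensor_x', 'regulator_x', 19)]
```

Adjacency alone is weak evidence; in the text it only supports the sequence-based assignment of MPN014 as the partner of MPN013.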

To test this hypothesis at the secondary structure level, a homology model for MPN014 was calculated. MPN014 is not only located next to MPN013; the secondary structure sequence alignment also showed that it is homologous to NarL from P. arcticus and to the general RR structure template 1p2f (TM_0126 of T. maritima). It has already been noted that MPN014 contains a topoisomerase/primase (“toprim”) domain with a nucleotidyl transferase or hydrolase function according to PFAM.⁴²

For a detailed structure-sequence comparison, the secondary structure (according to PDB entry 1rnl) and the sequence of E. coli NarL are provided. Figure 2 shows a comparison between the MPN014 sequence and NarL of P. arcticus. The sequence comparison displayed good similarity between NarL of P. arcticus, NarL of E. coli, and MPN014 of M. pneumoniae (conserved residues are highlighted).

Figure 2.

Diverged TCS regulator in M. pneumoniae.

The phosphoryl-accepting alpha/beta three-layer sandwich (red letters in the NarL sequence) was apparent, as was the DNA-binding mainly-alpha orthogonal bundle (blue letters). The alignment was good enough to identify all conserved regions (colored boxes). The second part of MPN014 did not display a helix-turn-helix (HTH) motif, but the similarity of MPN014 to the topoisomerase/primase domain, and in particular its relatedness to DNA primase-related proteins (protein cluster CLSK542094), supported the idea that this domain may bind DNA, as many TCS regulators do.

Because the patterns were only partially conserved, it became apparent that this element is probably a strongly diverged RR. (i) The sequence contained only weakly hydrophobic residues in the region corresponding to beta-strand 1. (ii) Immediately following, at the position of the conserved pair of acidic residues that bind the metal ion required for phosphorylation, it contained glutamic acid followed by glutamine. (iii) Hydrophobic residues corresponding to beta-strand 3, immediately followed by the absolutely conserved aspartic acid that is the site of phosphorylation, were observed, as well as some hydrophobic residues corresponding to beta-strand 4; however, the highly conserved serine/threonine that immediately follows, binds the phosphoryl group, and mediates the conformational change was absent and replaced by an asparagine.
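Checks (i) to (iii) amount to reading off which query residue is aligned to each key receiver-domain position of a reference RR. A minimal Python sketch, assuming a pairwise alignment is already available as two gapped strings; the function names, the toy sequences, and the reference positions are hypothetical, chosen only to mimic the substitutions described above (Glu plus Gln at the acidic pair, Asn instead of Ser/Thr).

```python
def residues_at(ref_aln: str, qry_aln: str, ref_positions) -> dict[int, str]:
    """Map 1-based ungapped reference positions to the aligned query residue ('-' = gap)."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(ref_positions)
    found, ref_pos = {}, 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            continue  # query insertion: no reference position to count
        ref_pos += 1
        if ref_pos in wanted:
            found[ref_pos] = q
    return found


def check_receiver(ref_aln, qry_aln, key_sites):
    """key_sites maps a label to (reference position, allowed residues)."""
    found = residues_at(ref_aln, qry_aln, [pos for pos, _ in key_sites.values()])
    report = {}
    for label, (pos, allowed) in key_sites.items():
        residue = found.get(pos, "-")
        report[label] = (residue, residue in allowed)
    return report


# Hypothetical toy alignment and positions, not the real NarL/MPN014 alignment.
KEY_SITES = {
    "acidic_1": (2, "DE"),
    "acidic_2": (3, "DE"),
    "phospho_Asp": (6, "D"),
    "Ser_Thr": (8, "ST"),
}
print(check_receiver("AEDLL-DAST", "AEQLLKDANA", KEY_SITES))
# {'acidic_1': ('E', True), 'acidic_2': ('Q', False), 'phospho_Asp': ('D', True), 'Ser_Thr': ('N', False)}
```

The result is only as reliable as the underlying alignment, which is why the text combines it with secondary structure information.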

Nevertheless, these results show that structure and sequence features are sufficiently conserved to suggest that the pair MPN013/MPN014 could be a strongly diverged TCS. Furthermore, its diverged function is at least used by M. pneumoniae (for expression data, see below).

The entire DUF16 family is specific to M. pneumoniae but contains a number of potential sensor proteins (MPN139, MPN138, MPN137, MPN130, MPN127, MPN104, MPN038, MPN013, MPN010, MPN655, MPN524, MPN504, MPN501, MPN410, MPN368, MPN344, MPN287, MPN283, MPN204), and the two M. pneumoniae proteins related to the DNA primase family could act as regulator proteins (MPN014, MPN353). In M. genitalium we identified a homologous counterpart only for the regulator. The multiple copies are a further indication that the protein family is useful and retained in M. pneumoniae, despite the general genome reduction in parasite genomes. This is further supported by EST expression data for MPN013 and preliminary expression data for MPN014 (see http://coot.embl.de/Annot/MP/).

Highly diverged TCS thus occur in various, quite different instances. Their divergence involves changes of partners as well as of individual residues, and cooperative changes can even lead to the adoption of new functions. Such changes are difficult to design. For design experiments, complex, correlated changes in overall protein structure and function, revealed eg, by statistical coupling analysis,⁴³ have to be taken into account. This method has been shown to work well for studying proteins such as Hsp70 and their allosteric changes.⁴⁴ A key requirement is sufficient statistical sampling, ie, large alignments to study sequence variation in the protein family of interest. Furthermore, extensive structural information is required.⁴⁵ Combining both aspects allows the definition of specific, important regions within the protein where mutations influence each other. For large protein families, these regions predict coordinated or cooperative changes in proteins quite well.⁴³ This can then be exploited for protein design, for instance to design protein chimeras while preserving the functionality of critical domains.⁴⁶ We are confident that this approach will also work for two-component system design and perhaps even for diverged TCS. At least, a sufficient number of TCS sequences is available to provide the statistical power required for reliable predictions, as are known structures to define structural sectors of conserved and cooperatively changing regions in TCS sensor and regulator proteins.
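Statistical coupling analysis weights covariation between alignment positions by conservation and decomposes the resulting matrix into sectors. As a much simpler stand-in that only illustrates the idea of positions whose variation is coupled, the sketch below scores pairs of alignment columns by mutual information. It is not SCA: it omits sequence weighting, gap handling, and corrections for phylogenetic bias, and the toy alignment is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(col_a: str, col_b: str) -> float:
    """Mutual information (bits) between two alignment columns of equal depth."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    mi = 0.0
    for (a, b), c in Counter(zip(col_a, col_b)).items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def top_coupled_pairs(alignment: list[str], k: int = 3):
    """Rank column pairs of an ungapped alignment by mutual information."""
    columns = ["".join(col) for col in zip(*alignment)]
    scored = [
        ((i, j), column_mi(columns[i], columns[j]))
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical four-sequence alignment: columns 0 and 1 co-vary perfectly.
toy = ["AKLD", "AKLD", "GRLD", "GRLE"]
print(top_coupled_pairs(toy, k=1))  # [((0, 1), 1.0)]
```

With only a handful of sequences such scores are dominated by noise, which is why large alignments are the key requirement.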

TCS Rewiring by Additional Components

TCS can furthermore be modified by additional components, so-called connectors. These modify or enhance signal transmission, increase binding to regulator proteins, or act as additional response-modifying proteins within a TCS.^47,48 Such interacting proteins further promote the evolution and adaptation of TCS and are also an interesting option for rewiring them. In general, the connector is present in addition to the sensor and regulator proteins.

Connector family SafA, Sensor-associating factor A: Eguchi et al describe SafA as a small membrane protein that connects the EvgS/EvgA and PhoQ/PhoP TCS in E. coli.⁴⁸ The expression of EmrY is induced by activated EvgA. The activated EvgS/EvgA system activates the PhoQ sensor protein of the PhoQ/PhoP system. SafA thus mediates the interaction between the two TCS.

Using organism-specific alignments, sequence analysis, and gene context analysis, we confirmed that SafA occurs not only in E. coli but also in Shigella and Salmonella. All identified potential SafA proteins are unknown or hypothetical proteins, and STRING predicts interactions with either EvgS or proteins of similar function (see Table 6A and Supplementary material, Table S5).

Table 6A.

SafA containing proteins (potential connector proteins).

Protein	Description	Organism	STRING score
NP_310132	Hypothetical protein ECs2105	E. coli O157	0.9 to EvgS
ZP_02799272	Conserved hypothetical protein	E. coli O157	0.9 to EvgS
YP_540723	Hypothetical protein C1714	E. coli UTI89	0.9 to EvgS
NP_837211	Hypothetical protein S1655	S. flexneri	0.76 to EvgS
NP_458304	Putative phosphodiesterase	S. typhi	0.65 to ygiM (putative signal transduction protein)
NP_462516	Putative phosphodiesterase	S. typhimurium	0.61 to lon

Notes: SafA-like proteins can be found in several organisms. This table lists the proteins of the family, a short description, the organism in which each was detected, and the confidence score, according to the protein interaction database STRING, for interaction with a TCS component, supporting a role as connector.
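Table 6A can be read against STRING's usual confidence levels on the 0 to 1 combined-score scale (low 0.15, medium 0.4, high 0.7, highest 0.9). A minimal Python sketch that bins the listed scores accordingly; the binning is an illustrative reading, not an analysis performed in this study, and the last value assumes the entry for NP_462516 reads 0.61.

```python
# STRING scores from Table 6A (decimal points used instead of commas).
TABLE_6A = [
    ("NP_310132", "EvgS", 0.90),
    ("ZP_02799272", "EvgS", 0.90),
    ("YP_540723", "EvgS", 0.90),
    ("NP_837211", "EvgS", 0.76),
    ("NP_458304", "ygiM", 0.65),
    ("NP_462516", "lon", 0.61),
]

# STRING's conventional confidence cut-offs, checked from highest to lowest.
CUTOFFS = [(0.9, "highest"), (0.7, "high"), (0.4, "medium"), (0.15, "low")]


def confidence_class(score: float) -> str:
    """Return the first (highest) confidence level the score reaches."""
    for cutoff, label in CUTOFFS:
        if score >= cutoff:
            return label
    return "below low"


for accession, partner, score in TABLE_6A:
    print(accession, partner, score, confidence_class(score))
```

On this reading, the three E. coli candidates reach STRING's highest confidence level, the Shigella candidate the high level, and the two Salmonella phosphodiesterases the medium range.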

EAL and GGDEF domains: EAL domains have diguanylate phosphodiesterase activity and are found in diverse bacterial signaling proteins.^49,50 If they interact with a TCS, they may influence it. Such an influence is documented for GGDEF domain-containing regulators in many prokaryotic signaling proteins, as the GGDEF domain synthesizes the second messenger cyclic di-GMP,⁵¹ which EAL domains hydrolyze. We searched for new examples using gene context methods, literature mining, and the STRING database.³⁹ Table 6B displays the predicted interaction partners of several EAL domain-containing proteins. Indeed, EAL proteins were often predicted to interact with known regulator proteins or with partners containing DNA-binding domains (as do most known RR in TCS). Alternatively, they interacted with proteins containing a GGDEF domain. EAL and GGDEF domains are frequently found in proteins that also contain a response regulator domain.

Table 6B.

Putative connector proteins containing an EAL-domain and their interaction partners.

Protein with EAL domain	Interaction partners¹
Q21G90_SACD2 Diguanylate cyclase/phosphodiesterase, Saccharophagus degradans (full protein with two domains)	Sde_3649 GGDEF family protein; Sde_2537 hypothetical protein; Sde_3232 hypothetical protein; Sde_3313 putative diguanylate phosphodiesterase; Sde_1079 putative diguanylate phosphodiesterase; Sde_3648 formamidopyrimidine-DNA glycosylase; Sde_0078 GGDEF domain protein; Sde_3427 putative diguanylate cyclase (GGDEF); Sde_3693 res_reg receiver domain protein (CheY-like); Sde_1063 GGDEF family protein
A6Q1G4_NITSB Signal transduction response regulator, Nitratiruptor sp.	dgkA diacylglycerol kinase; NIS_0211 putative uncharacterized protein; dnaG DNA primase DnaG; NIS_0567 putative uncharacterized protein; NIS_0004 putative uncharacterized protein; NIS_1647 putative uncharacterized protein; NIS_1732 putative uncharacterized protein; NIS_0150 putative uncharacterized protein; NIS_0136 putative uncharacterized protein
A1AD34_ECOK1 Putative uncharacterized protein rtn, E. coli O1	yedQ hypothetical protein; yaiC putative uncharacterized protein; ydeH putative uncharacterized protein ydeH; yeaP putative uncharacterized protein yeaP; ycdT predicted diguanylate cyclase; yfiN putative diguanylate cyclase; yneF putative uncharacterized protein yneF; yeaI putative uncharacterized protein yeaI; yejA putative uncharacterized protein yejA; yejB predicted oligopeptide transporter subunit

Notes: ¹Interaction predictions included sequence and structure analysis and data from public interaction databases such as STRING.

For protein engineering or synthetic biology experiments, connectors could be used to specifically modify a TCS or to connect two TCS. The analyzed examples are known to work in several organisms, but a connector may also be tried on TCS from other species simply by over-expressing them together. Evolution draws on a large pool of potential interacting proteins.^52,53 The same connectors are used only over comparatively short evolutionary distances: in prokaryotes in particular, there is counter-selection, as wrong interactions lead to misregulation. However, just as in eukaryotic evolution, where new protein interactions compensate for random drift in functional complexes,⁵⁴ protein design may adapt connectors for broader use. For instance, the SafA connector protein family efficiently bridges two different TCS. This can be attractive for new designs in synthetic biology such as synthetic circuits.⁵⁵

TCS also occur in eukaryotes such as plants, for instance in maize⁵⁶ and in Arabidopsis, where systems with TCS-like activities are found.^57,81 These could in principle be strongly diverged eukaryotic TCS, similar to the Mycoplasma example, or fairly close to standard TCS. Supplementary material, Table S6 shows that both are true to some extent. In maize, 25 proteins similar to HisKA proteins were found, but only 20 of them are known to be involved in a plant TCS; in Arabidopsis, only 16 of 61 proteins similar to HisKA proteins are known and annotated to participate in a TCS. For response regulators, the differences between identified domains and annotated response regulators are even larger, indicating more divergence. However, this analysis also shows that a considerable number of these TCS are surprisingly well conserved in their domain architecture, and sometimes even in their motifs and signatures. At least these comparatively conserved eukaryotic TCS can be tackled with the strategies and bioinformatics data given here, which are based largely on prokaryotic data. For more diverged eukaryotic TCS, careful and complex calculations as outlined above again remain the only potential strategy. However, the number of available eukaryotic TCS sequences is comparatively low, so the statistical power of sequence-structure correlation algorithms will be limited.
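The maize and Arabidopsis counts translate into simple ratios. A short Python sketch of the arithmetic, using only the numbers quoted above: in maize 20 of 25 HisKA-like proteins (80%) are annotated as TCS components, in Arabidopsis 16 of 61 (about 26%). The differences, 5 and 45 proteins, are only upper bounds on unannotated, possibly diverged sensors, since not every HisKA-like protein need act in a phosphorelay. The function name `summarize` is hypothetical.

```python
# (HisKA-like proteins found, proteins annotated as TCS components), from the text.
COUNTS = {"maize": (25, 20), "Arabidopsis": (61, 16)}


def summarize(found: int, annotated: int) -> dict:
    return {
        "annotated_fraction": annotated / found,
        # Upper bound on HisKA-like proteins that could be unannotated, diverged sensors.
        "max_unannotated": found - annotated,
    }


for species, (found, annotated) in COUNTS.items():
    s = summarize(found, annotated)
    print(f"{species}: {s['annotated_fraction']:.0%} annotated, "
          f"at most {s['max_unannotated']} candidates")
# maize: 80% annotated, at most 5 candidates
# Arabidopsis: 26% annotated, at most 45 candidates
```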

The various examples and the three modification strategies also raise the question of a quantitative estimate of TCS divergence in general. To address this question, we first give an overview and a sequence tree of the species distribution of HisKA and response regulator domains in general (see Supplementary material, Figure S1). Furthermore, we made a detailed quantitative assessment of TCS divergence regarding the HisKA site (see Supplementary material, Figure S2) and performed various analyses of the different contexts in which TCS domains can occur. These analyses included the frequency of different domain family occurrences as well as specific domain combinations (Supplementary material, Table S1 gives a detailed example). For a more general overview, Table S6 also gives an estimate of the occurrence of key TCS domains versus the number of annotated and known TCS in several bacterial genomes, plus the recent data on the maize and Arabidopsis plant genomes. As the data show, the number of domains is in all cases clearly higher than the number of annotated TCS. These new domain contexts for key TCS marker domains give an upper bound on the number of highly diverged TCS in these species; the actual number is lower, depending on how strictly the function of a TCS as a sensor plus phosphorelay system is defined.
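Counting specific domain combinations reduces to tallying co-occurring domain pairs per protein. A minimal Python sketch, assuming each protein's architecture is given as a list of Pfam family names; the proteins are hypothetical and the counts do not reproduce Table S1 or Table S6.

```python
from collections import Counter
from itertools import combinations


def domain_pair_counts(architectures: dict[str, list[str]]) -> Counter:
    """Count unordered pairs of distinct domains co-occurring in the same protein."""
    counts = Counter()
    for domains in architectures.values():
        # set() ignores repeated copies of a domain; sorting makes pairs order-independent.
        for pair in combinations(sorted(set(domains)), 2):
            counts[pair] += 1
    return counts


# Hypothetical architectures using Pfam family names.
toy = {
    "p1": ["Response_reg", "GGDEF", "EAL"],
    "p2": ["Response_reg", "GGDEF"],
    "p3": ["HisKA", "HATPase_c"],
    "p4": ["Response_reg", "Trans_reg_C"],
}
for pair, n in domain_pair_counts(toy).most_common(2):
    print(pair, n)
```

Comparing such counts for marker domains against the number of annotated TCS is what yields an upper-bound estimate of the kind discussed above.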

Conclusions

The plasticity of TCS is of high interest. It has long been studied and is documented in various databases.^4–6 The aim of this study is to identify evolutionary modification scenarios and analyze their use for engineering TCS. Extensive genome comparisons and sequence and structure analyses of natural instances revealed three general rewiring scenarios modifying TCS: (i) exchanges of a few amino acid residues or (ii) of whole domains,⁵⁴ as well as (iii) the use of connector proteins.^47,48,50 For engineering, the accurate and specific binding sites, promoter motifs, and stimulus recognition motifs described here should work best. In contrast, the identified diverged TCS, including potential eukaryotic variants, partners for Listeria and Legionella TCS, and a highly diverged TCS family in Mycoplasma, show that extensive changes in TCS function are possible but involve complex cooperative changes that are not easily predicted or designed. Of the connectors analyzed, the SafA family may be attractive for synthetic circuit design,⁵⁵ as its members efficiently bridge TCS.

Materials and Methods

The identification and analysis of individual TCS components was performed in separate steps and with specific methods for sequence alignment, for the investigation of domain and structural features, for their gene context, as well as for pathway aspects.

Methods for Sequence Analysis

Large-scale screens for diverged TCS were conducted on different databases (PFAM²¹ and the protein database UniProt²²), and we examined further repositories such as MIST2,⁴ SENTRA⁶ and P2CS.⁷ Furthermore, KEGG⁵⁸ databases as well as specific sequence searches were used to collect all known and available TCS in standard model organisms. Iterative sequence searches and domain analyses were conducted as described previously.⁴⁰ We included the following model organisms and strains: the E. coli genomes E. coli 536,⁵⁹ E. coli CFT073,⁶⁰ E. coli K-12 W3110,⁶¹ E. coli O157:H7 EDL933,⁶² E. coli K-12 MG1655,⁶³ E. coli O157:H7 Sakai⁶⁴ and E. coli UTI89,⁶⁵ as well as Shigella flexneri 2a str. 2457T, Salmonella typhi strains CT18⁶⁶ and Ty2⁶⁷ (ATCC 700931), S. typhimurium LT2,⁶⁸ B. subtilis (strain 168), S. aureus (COL),⁶⁹ L. pneumophila (Philadelphia 1),⁷⁰ L. monocytogenes (EGD-e⁷¹/F2365⁷²) and M. pneumoniae (M129),⁷³ and all sequences and organisms available from PFAM. Data on promoter interactions were retrieved from the PRODORIC database,²⁶ which comprises information from exhaustive literature analyses, computational sequence predictions, and DBTBS,²⁷ a reference database of published transcriptional regulation events in B. subtilis. This information was complemented by data from Tractor_DB,²⁸ which contains a collection of computationally predicted transcription factor binding sites in gamma-proteobacterial genomes.

Domains were tested and verified by comparison with known domain families, including data from databases such as SMART,⁷⁴ PFAM,²¹ and UniProt.²² TCS components of various genomes were extensively compared regarding their sequence composition and intrinsic properties as well as amino acid conservation and variation.

To calculate consensus sequences, the COnsensus Biasing By Locally Embedding Residues method (COBBLER) was applied.⁷⁵ A single sequence was selected from a set of blocks and enriched by replacing its conserved regions with consensus residues derived from the blocks. Comprehensive tests have shown that such embedded consensus residues improve the performance of standard sequence database search programs. Further sequence analysis programs included BLAST,³⁵ position-specific iterated BLAST (PSI-BLAST), and ClustalW.⁷⁶ Sequence conservation was visualized with sequence logos, which show the degree of amino acid conservation by letter size or by uppercase and lowercase letters.
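The embedding step can be illustrated on a single block. COBBLER's own choice of consensus residues is not reproduced here; the sketch below substitutes a plain majority vote per column, so it shows the idea of embedding a consensus into one sequence rather than the published method. The block, the sequence, and the function names are hypothetical.

```python
from collections import Counter


def block_consensus(block: list[str]) -> str:
    """Most frequent residue in each column of an ungapped block.

    Ties go to the residue seen first in the column (Counter.most_common order).
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*block))


def embed_consensus(seq: str, start: int, block: list[str]) -> str:
    """Replace the segment of seq aligned to the block (0-based start) by the consensus."""
    consensus = block_consensus(block)
    if start < 0 or start + len(consensus) > len(seq):
        raise ValueError("block does not fit inside the sequence")
    return seq[:start] + consensus + seq[start + len(consensus):]


block = ["ACDE", "ACDF", "GCDE"]  # hypothetical conserved block
print(embed_consensus("MMMMGCDFKK", 4, block))  # MMMMACDEKK
```

The modified sequence, rather than any single natural member, is then used as the query for database searches.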

DNA binding sites in related genomes were identified with Perl scripts that use the fuzznuc program of the EMBOSS package⁷⁷ for pattern searching. A binding site was assigned whenever a sequence matched the pattern. Screening runs allowing mismatches were also conducted, and the results were manually curated, eg, by checking whether the pattern was long enough to tolerate mismatches; symmetry-breaking mismatches were not tolerated. This approach enabled the identification of conserved binding sites with mismatches in related E. coli genomes, starting from E. coli strain K-12.
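A mismatch-tolerant scan of this kind can be sketched in a few lines. The code below is a pure-Python stand-in written for illustration, not the fuzznuc implementation or its command-line options: it scans both strands for an IUPAC-coded pattern and counts mismatches, while the manual checks described above (pattern length, symmetry-breaking mismatches) are left out. Sequence and pattern are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq: str, pattern: str, max_mismatches: int = 0):
    """Report (1-based start, strand, mismatches, site) for windows matching an IUPAC pattern.

    Starts are forward-strand coordinates of the leftmost base of the site.
    """
    seq, pattern = seq.upper(), pattern.upper()
    width, length = len(pattern), len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(length - width + 1):
            window = s[i:i + width]
            mismatches = sum(base not in IUPAC[code] for base, code in zip(window, pattern))
            if mismatches <= max_mismatches:
                start = i + 1 if strand == "+" else length - i - width + 1
                hits.append((start, strand, mismatches, window))
    return hits


# Hypothetical sequence and site, for illustration only.
print(scan("TTGACAAAAAAA", "TTGACA", max_mismatches=1))
# [(1, '+', 0, 'TTGACA'), (2, '-', 1, 'TTGTCA')]
```

Palindromic sites are reported once per strand, so hits at the same position on both strands should be merged before curation.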

Methods for structural analysis

Based on results from PFAM and SMART, a search for essential functional domains in TCS was initiated. Moreover, their subcellular location was analyzed using annotations from the literature and public databases.

To determine domain boundaries, we included functional and structural information. The transfer of domain features to non-annotated proteins was achieved with the help of search patterns (according to PROSITE and PFAM patterns).
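Transferring domain features via search patterns requires turning PROSITE-style patterns into something a program can match. A minimal Python sketch of such a translation, covering only basic PROSITE syntax (residue letters, x, [..], {..}, repeat counts, and terminal anchors at the pattern ends; not, eg, '>' inside brackets). `prosite_to_regex` is a hypothetical helper, not part of any PROSITE tool; the ATP-binding pattern Glu/Asn-X-Ile/Leu-X-Asn/Ala-X discussed for MPN013, written in PROSITE notation, serves as an example.

```python
import re

# One PROSITE element: residue, x, [..] or {..}, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    body = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if body.startswith("<"):
        prefix, body = "^", body[1:]   # anchored at the N-terminus
    if body.endswith(">"):
        suffix, body = "$", body[:-1]  # anchored at the C-terminus
    parts = []
    for element in body.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} means any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix


print(prosite_to_regex("[EN]-x-[IL]-x-[NA]-x."))  # [EN].[IL].[NA].
print(prosite_to_regex("C-x(2,4)-[DE]-{P}."))     # C.{2,4}[DE][^P]
```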

After the domain analyses, individual domain results were assembled into a model of the complete protein structure. Tertiary and secondary structure information was added from PDBsum, AnDOM, SCOP⁷⁸ and CATH.⁷⁹ Homology models were created using SWISS-MODEL.⁸⁰ Further analyses included secondary structure, binding features, function-specific motifs, and key conserved structural residues. The structure of TCS was furthermore analyzed in more detail starting from available PDB structures.⁸¹ We started with well-annotated domains in sensor and regulator proteins and compared these to less well-characterized sequences. Structural or sequence characteristics detected in the analyzed proteins were then transferred to proteins without annotation.

Structure predictions were performed with PredictProtein³⁶ and Predator.³⁷ Secondary structure alignments were derived with the Server for Protein Secondary Structure Alignment (SSEA).³⁸ Protein interactions were predicted using the STRING tool,³⁹ structure analyses, and literature mining.
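The SSEA server aligns secondary structures directly; the sketch below does something much simpler, scoring per-residue agreement of two already aligned secondary structure strings after reducing them to three states (helix, strand, coil), similar in spirit to the Q3 measure. It is not the SSEA algorithm. Assumptions: '-' marks alignment gaps (not coil), the 8-to-3 state mapping is one common convention among several, and the example strings are hypothetical.

```python
# One common 8-to-3 state reduction of DSSP codes; other conventions exist.
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss: str) -> str:
    """Map DSSP-style codes to H/E/C, keeping '-' as an alignment gap."""
    return "".join(c if c == "-" else THREE_STATE.get(c, "C") for c in ss)


def ss_agreement(aln_a: str, aln_b: str) -> float:
    """Fraction of aligned, ungapped positions with the same 3-state assignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(to_three_state(aln_a), to_three_state(aln_b))
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical aligned assignments (e.g. template vs. prediction).
print(ss_agreement("HHHGEE-TT", "HHHHEEESC"))  # 1.0
print(ss_agreement("HHHH", "HHEE"))            # 0.5
```

Such a score summarizes only per-position agreement; it does not capture shifted or split elements, which element-level alignment is designed to handle.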

Author Contributions

BK implemented the process concept and alignments. BK, TF, and JB programmed Perl scripts and calculated all data. JB, RG, FF, TD and BK analyzed data and participated in writing the MS. TD led and guided the study and supervised BK, TF and FF. All authors approved the final version of the MS.

Funding

We thank German Research Foundation (TR 34/A8 in particular as well as TR 34/Z1; Da 208/13-2, SFB 479) for support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests

The authors declare there are no competing interests.

Disclosures and Ethics

As a requirement of publication author(s) have provided to the publisher signed confirmation of compliance with legal and ethical obligations including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that this article is unique and not under consideration or published in any other publication, and that they have permission from rights holders to reproduce any copyrighted material. Any disclosures are made in this section. The external blind peer reviewers report no conflicts of interest.


Acknowledgments

We thank German Research Foundation (SFB 479, Da 208/13-1, TR 34/A5/Z1) for support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Dr. Ulrike Rapp-Galmiche for stylistic and language corrections.

Supplementary Data

Supplementary material contains sequence data and alignments as well as the analyzed HisKA families.

References

1. Stock AM, Robinson VL, Goudreau PN. Two-component signal transduction. Annu Rev Biochem. 2000; 69: 183–215.

2. Yamada, Shiro. Structural basis of the signal transduction in the two-component system. Adv Exp Med Biol. 2008; 631: 22–39.

3. Grebe TW, Stock JB. The histidine protein kinase superfamily. Adv Microb Physiol. 1999; 41: 139–227.

4. Ulrich LE, Zhulin IB. The MiST2 database: a comprehensive genomics resource on microbial signal transduction. Nucleic Acids Res. 2010; 38(Database issue): D401–7.

5. Galperin MY. A census of membrane-bound and intracellular signal transduction proteins in bacteria: bacterial IQ, extroverts and introverts. BMC Microbiol. 14, 2005; 5: 35.

6. D'Souza, Glass EM, Syed MH. Sentra: a database of signal transduction proteins for comparative genome analysis. Nucleic Acids Res. 2007; 35(Database issue): D271–3.

7. Barakat, Ortet, Jourlin-Castelli, Ansaldi, Méjean, Whitworth DE. P2CS: a two-component system resource for prokaryotic signal transduction research. BMC Genomics. 15, 2009; 10: 315.

8. Salis, Kaznessis YN. Computer-aided design of modular protein devices: Boolean AND gene activation. Phys Biol. 2006: 295–310.

9. Robinson VL, Buckler DR, Stock AM. A tale of two components: a novel kinase and a regulatory switch. Nat Struct Biol. 2000; 7: 626–33.

10. Drubin DA, Way JC, Silver PA. Designing biological systems. Genes Dev. 2007; 21: 242–54.

11. Pleiss. The promise of synthetic biology. Appl Microbiol Biotechnol. 2006; 73: 735–9.

12. Levskaya, Chevalier AA, Tabor JJ, Simpson ZB, Lavery LA. Synthetic biology: engineering Escherichia coli to see light. Nature. 2005; 438: 441–2.

13. Ninfa AJ. Using two-component systems and other bacterial regulatory factors for the fabrication of synthetic genetic devices. Methods Enzymol. 2007; 422: 488–512.

14. Kohanski MA, Collins JJ. Rewiring bacteria, two components at a time. Cell. 2008; 133: 947–8.

15. Néron, Ménager, Maufrais. Mobyle: a new full web bioinformatics framework. Bioinformatics. 15, 2009; 25(22): 3005–11.

16. Williams RH, Whitworth DE. The genetic organisation of prokaryotic two-component system signalling pathways. BMC Genomics. Dec 20, 2010; 11: 720.

17. Skerker JM, Perchuk BS, Siryaporn. Rewiring the specificity of two-component signal transduction systems. Cell. 2008; 133: 1043–54.

18. Ninfa AJ. Use of two-component signal transduction systems in the construction of synthetic genetic networks. Curr Opin Microbiol. 2010; 13: 240–5.

19. Morey KJ, Antunes MS, Albrecht KD. Developing a synthetic signal transduction system in plants. Methods Enzymol. 2011; 497: 581–602.

20. Shah NA, Sarkar CA. Robust network topologies for generating switch-like cellular responses. PLoS Comput Biol. Jun 2011; 7(6).

21. Finn RD, Mistry, Schuster-Bockler. Pfam: clans, web tools and services. Nucleic Acids Res. 2006; 34: 247–51.

22. Apweiler, Bairoch, Wu CH, Barker WC, Boeckmann. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004; 32: 115–9.

23. Sevvana, Vijayan, Zweckstetter. A ligand-induced switch in the periplasmic domain of sensor histidine kinase CitA. J Mol Biol. 2008; 377: 512–23.

24. Cheung, Hendrickson WA. Crystal structures of C4-dicarboxylate ligand complexes with sensor domains of histidine kinases DcuS and DctB. J Biol Chem. 2008; 283: 30256–65.

25. Cheung, Hendrickson WA. Structural analysis of ligand stimulation of the histidine kinase NarX. Structure. 2009; 17: 190–201.

26. Munch, Hiller, Barg. PRODORIC: prokaryotic database of gene regulation. Nucleic Acids Res. 2003; 31: 266–9.

27. Makita, Nakao, Ogasawara, Nakai. DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics. Nucleic Acids Res. 2004; 32: 75–7.

28. Perez AG, Angarica VE, Vasconcelos AT, Collado-Vides. Tractor_DB (version 2.0): a database of regulatory interactions in gamma-proteobacterial genomes. Nucleic Acids Res. 2007; 35: D132–6.

29. Huang, Tsui, Freundlich. Positive and negative control of ompB transcription in Escherichia coli by cyclic AMP and the cyclic AMP receptor protein. J Bacteriol. 1992; 174: 664–70.

30. Jubelin, Vianney, Beloin. CpxR/OmpR interplay regulates curli gene expression in response to osmolarity in Escherichia coli. J Bacteriol. 2005; 187: 2038–49.

31. Huang YS, Chuang DT. Regulation of branched-chain alpha-keto acid dehydrogenase kinase gene expression by glucocorticoids in hepatoma cells and rat liver. Methods Enzymol. 2000; 324: 498–511.

32. Besant PG, Attwood PV. Mammalian histidine kinases. Biochim Biophys Acta. 2005; 1754: 281–90.

33. Chu ZX, Ma, Lin YX. Genome-wide identification, classification, and analysis of two-component signal system genes in maize. Genet Mol Res. 2011; 10(4): 3316–30.

34. Nakashima, Sato, Tamanoi. Fission yeast TORC1 regulates phosphorylation of ribosomal S6 proteins in response to nutrients and its activity is inhibited by rapamycin. J Cell Sci. 2010; 123(Pt 5): 777–86.

35. Altschul SF, Madden TL, Schaffer AA. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25: 3389–402.

36. Rost, Yachdav, Liu. The PredictProtein server. Nucleic Acids Res. 2004; 32: W321–6.

37. Pollastri, McLysaght. Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics. 2005; 21: 1719–20.

38. Fontana, Bindewald, Toppo, Velasco, Valle, Tosatto SCE. The SSEA server for protein secondary structure alignment. Bioinformatics. 2005; 21: 393–5.

39. Jensen LJ, Kuhn, Stark. STRING 8: a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009; 37(Database issue): D412–6.

40. Gaudermann, Vogl, Zientz. Analysis of and function predictions for previously conserved hypothetical or putative proteins in Blochmannia floridanus. BMC Microbiol. 2006; 6: 1.

41. Dandekar, Snel, Huynen, Bork. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci. 1998; 23: 324–8.

42. Aravind, Leipe DD, Koonin EV. Toprim: a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic Acids Res. 15, 1998; 26(18): 4205–13.

43. Halabi, Rivoire, Leibler, Ranganathan. Protein sectors: evolutionary units of three-dimensional structure. Cell. 2009; 138: 774–86.

44. Smock RG, Rivoire, Russ WP. An interdomain sector mediating allostery in Hsp70 molecular chaperones. Mol Syst Biol. Sep 21, 2010; 6: 414.

45. Lee, Natarajan, Nashine VC. Surface sites for engineering allosteric control in proteins. Science. Oct 17, 2008; 322(5900): 438–42.

46. Poole AM, Ranganathan. Knowledge-based potentials in protein design. Curr Opin Struct Biol. Aug 2006; 16(4): 508–13.

47. Attila, Ueda, Wood TK. 5-Fluorouracil reduces biofilm formation in Escherichia coli K-12 through global regulator AriR as an antivirulence compound. Appl Microbiol Biotechnol. Mar 2009; 82(3): 525–33.

48. Eguchi, Ishii, Hata, Utsumi. Regulation of acid resistance by connectors of two-component signal transduction systems in Escherichia coli. J Bacteriol. Mar 2011; 193(5): 1222–8.

49. Tchigvintsev, Xu, Singer, Chang. Structural insight into the mechanism of c-di-GMP hydrolysis by EAL domain phosphodiesterases. J Mol Biol. 24, 2010; 402(3): 524–38.

50. Galperin MY, Nikolskaya AN, Koonin EV. Novel domains of the prokaryotic two-component signal transduction systems. FEMS Microbiol Lett. 2001; 203: 11–21.

51. Chan, Paul, Samoray. Structural basis of activity and allosteric control of diguanylate cyclase. Proc Natl Acad Sci USA. 2004; 101: 17084–9.

52. Krause, von Mering, Bork, Dandekar. Shared components of protein complexes—versatile building blocks or biochemical artefacts? Bioessays. 2004; 26(12): 1333–43.

53. Heo, Maslov, Shakhnovich. Topology of protein interaction network shapes protein abundances and strengths of their functional and nonspecific interactions. Proc Natl Acad Sci USA. 2011; 108(10): 4258–63.

54. Fernández, Lynch. Non-adaptive origins of interactome complexity. Nature. 2011; 474(7352): 502–5.

55. Lou, Liu, Ni. Synthesizing a novel genetic sequential logic circuit: a push-on push-off switch. Mol Syst Biol. 2010; 6: 350.

56. Liang, Wang, Hong, Li, Zuo. Deletion of the Initial 45 Residues of ARR18 Induces Cytokinin Response in Arabidopsis. J Genet Genomics. Jan 2012; 39(1): 37–46.

57. Kanehisa, Goto, Kawashima, Okuno, Hattori. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004; 32: D277–80.

58. Brzuszkiewicz, Brüggemann, Liesegang, Emmerth, Olschläger. How to become a uropathogen: comparative genomic analysis of extraintestinal pathogenic Escherichia coli strains. Proc Natl Acad Sci USA. 2006; 103: 12879–84.

59. Welch R.A., Burland, Plunkett, Redford, Roesch. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA. 2002; 99: 17020–4.

60. Mori, Hirai, Morooka, Horiuchi. Escherichia coli str. K12 substr. W3110 DNA, complete genome. Direct submission. 2005.

61. Perna N.T., Plunkett, Burland, Mau, Glasner J.D. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature. 2001; 409: 529–33.

62. Blattner F.R., Plunkett, Bloch C.A., Perna N.T., Burland. The complete genome sequence of Escherichia coli K-12. Science. 1997; 277: 1453–74.

63. Hayashi, Makino, Ohnishi, Kurokawa, Ishii. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res. 2001; 8: 11–22.

64. Chen S.L., Hung C.S., Xu, Reigstad C.S., Magrini. Identification of genes subject to positive selection in uropathogenic strains of Escherichia coli: a comparative genomics approach. Proc Natl Acad Sci USA. 2006; 103: 5977–82.

65. Parkhill, Dougan, James K.D., Thomson N.R., Pickard. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature. 2001; 413: 848–52.

66. Deng, Liou S.R., Plunkett, Mayhew G.F., Rose D.J. Comparative genomics of Salmonella enterica serovar Typhi strains Ty2 and CT18. J Bacteriol. 2003; 185: 2330–7.

67. McClelland, Sanderson K.E., Spieth, Clifton S.W., Latreille. Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature. 2001; 413: 852–6.

68. Gill S.R., Fouts D.E., Archer G.L., Mongodin E.F., Deboy R.T. Insights on Evolution of Virulence and Resistance from the Complete Genome Analysis of an Early Methicillin-Resistant Staphylococcus aureus Strain and a Biofilm-Producing Methicillin-Resistant Staphylococcus epidermidis Strain. J Bacteriol. 2005; 187: 2426–38.

69. Chien, Morozova, Shi, Sheng, Chen. The genomic sequence of the accidental pathogen Legionella pneumophila. Science. 2004; 305: 1966–8.

70. Glaser, Frangeul, Buchrieser, Rusniok, Amend. Comparative genomics of Listeria species. Science. 2001; 294: 849–52.

71. Nelson K.E., Fouts D.E., Mongodin E.F., Ravel, DeBoy R.T. Whole genome comparisons of serotype 4b and 1/2a strains of the food-borne pathogen Listeria monocytogenes reveal new insights into the core genome components of this species. Nucleic Acids Res. 2004; 32: 2386–95.

72. Dandekar, Huynen, Regula J.T., Ueberle, Zimmermann C.U. Re-annotating the Mycoplasma pneumoniae genome sequence: adding value, function and reading frames. Nucleic Acids Res. 2000; 28: 3278–88.

73. Letunic, Copley R.R., Pils, Pinkert, Schultz, Bork. SMART 5: domains in the context of genomes and networks. Nucleic Acids Res. 2006; 34: 257–60.

74. Henikoff, Henikoff J.G. Embedding strategies for effective use of information from multiple sequence alignments. Protein Sci. 1997; 6: 698–705.

75. Thompson J.D., Higgins D.G., Gibson T.J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994; 22: 4673–80.

76. Rice, Longden, Bleasby. EMBOSS: The European Molecular Biology Open Software suite. Trends Genet. 2000; 16: 276–7.

77. Andreeva, Howorth, Chandonia J.M. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 2008; 36: 419–25.

78. Greene L.H., Lewis T.E., Addou, Cuff, Dallman. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res. 2007; 35: 291–7.

79. Arnold, Bordoli, Kopp, Schwede. The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics. 2006; 22: 195–201. http://bioinformatics.oxfordjournals.org/cgi/content/short/22/2/195.

80. Deshpande, Addess K.J., Bluhm W.F., Merino-Ott J.C., Townsend-Merino. The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res. 2005; 33: 233–7.

81. Kyriakidis D.A., Theodorou M.C., Tiligada. Histamine in two component system-mediated bacterial signaling. Front Biosci. Jan 1, 2012; 17: 1108–19.

82. Ulrich L.E., Zhulin I.B. The MiST2 database: a comprehensive genomics resource on microbial signal transduction. Nucleic Acids Res. Jan 2010; 38.

83. Galperin M.Y. A census of membrane-bound and intracellular signal transduction proteins in bacteria: bacterial IQ, extroverts and introverts. BMC Microbiol. Jun 14, 2005; 5: 35.

84. D'Souza, Glass E.M., Syed M.H. Sentra: a database of signal transduction proteins for comparative genome analysis. Nucleic Acids Res. Jan 2007; 35(Database issue): D271–3.

85. Barakat, Ortet, Jourlin-Castelli, Ansaldi, Méjean, Whitworth D.E. P2CS: a two-component system resource for prokaryotic signal transduction research. BMC Genomics. Jul 15, 2009; 10: 315.

86. West A.H., Stock A.M. Histidine kinases and response regulator proteins in two-component signaling systems. Trends Biochem Sci. Jun 2001; 26(6): 369–76.

87. Galperin M.Y. Bacterial signal transduction network in a genomic perspective. Environ Microbiol. Jun 2004; 6(6): 552–67.

88. Galperin M.Y. A census of membrane-bound and intracellular signal transduction proteins in bacteria: bacterial IQ, extroverts and introverts. BMC Microbiol. Jun 14, 2005; 5: 35.

89. Gao, Stock A.M. Biological insights from structures of two-component proteins. Annu Rev Microbiol. 2009; 63: 133–54.

90. Galperin M.Y. Structural classification of bacterial response regulators: diversity of output domains and domain combinations. J Bacteriol. Jun 2006; 188(12): 4169–82.

91. Gao, Mack T.R., Stock A.M. Bacterial response regulators: versatile regulatory strategies from common domains. Trends Biochem Sci. May 2007; 32(5): 225–34.

92. Galperin M.Y. Diversity of structure and function of response regulator output domains. Curr Opin Microbiol. Apr 2010; 13(2): 150–9.

93. Grebe T.W., Stock J.B. The histidine protein kinase superfamily. Adv Microb Physiol. 1999; 41: 139–227. Review.

94. Hakenbeck, Grebe, Zähner, Stock J.B. Beta-lactam resistance in Streptococcus pneumoniae: penicillin-binding proteins and non-penicillin-binding proteins. Mol Microbiol. Aug 1999; 33(4): 673–8.

95. Pratt L.A., Silhavy T.J. Identification of base pairs important for OmpR-DNA interaction. Mol Microbiol. Aug 1995; 17(3): 565–73.

96. Egger L.A., Inouye. Purification and characterization of the periplasmic domain of EnvZ osmosensor in Escherichia coli. Biochem Biophys Res Commun. Feb 3, 1997; 231(1): 68–72.

97. Tanaka, Saha S.K., Tomomori. NMR structure of the histidine kinase domain of the E. coli osmosensor EnvZ. Nature. Nov 5, 1998; 396(6706): 88–92.

Different Evolutionary Modifications as a Guide to Rewire Two-Component Systems