Abstract
Multiple inositol polyphosphate phosphatase 1 (Minpp1) in higher organisms dephosphorylates InsP6, the most abundant inositol phosphate. It also dephosphorylates less phosphorylated InsP5 and InsP4 and more phosphorylated InsP7 or InsP8. Minpp1 is classified as a member of the histidine acid phosphatase super family of proteins with functional resemblance to phytases found in lower organisms. This study took a bioinformatics approach to explore the extent of evolutionary diversification in Minpp1 structure and function in order to understand its physiological relevance in higher organisms. The human Minpp1 amino acid (AA) sequence was BLAST searched against available national protein databases. Phylogenetic analysis revealed that Minpp1 was widely distributed from lower to higher organisms. Further, we have identified that there exist four isoforms of Minpp1. Multiple computational tools were used to identify key functional motifs and their conservation among various species. Analyses showed that certain motifs predominant in higher organisms were absent in lower organisms. Variation in AA sequences within motifs was also analyzed. We found that there is diversification of key motifs and thus their functions present in Minpp1 from lower organisms to higher organisms. Another interesting result of this analysis was the presence of a glucose-1-phosphate interaction site in Minpp1; the functional significance of which has yet to be determined experimentally. The overall findings of our study point to an evolutionary adaptability of Minpp1 functions from lower to higher life forms.
Introduction
Inositol phosphates (InsPs) are a group of vital molecules naturally occurring in both animal and plant cells. They are essential for regulating diverse cellular processes such as calcium mobilization, vesicular trafficking, chromatin remodeling, and apoptosis.1–3 Changes in cellular levels of InsPs have been implicated in the regulation of cell physiology. 4 Among InsPs, inositol hexakisphosphate (InsP6) is well studied for its role in apoptosis and other cellular signaling processes. 5 Plant cells contain more InsP6 than animal cells. 6 In higher animals, the amount of InsP6 ranges from nanomolar to micromolar concentrations. 7
Multiple inositol polyphosphate phosphatase 1 (Minpp1) is an enzyme that removes the 3-phosphate from the most abundant InsP6 8–12 as well as from the less abundant InsP7 and InsP8. Minpp1 is classified as a member of the histidine acid phosphatase superfamily of proteins.
Subcellular localization of Minpp1 was studied in an attempt to understand the physiological role of Minpp1. Previous studies indicate that Minpp1 is localized inside the endoplasmic reticulum (ER) as a soluble luminal protein with limited access to predominantly cytosolic InsPs. 8 This has prompted researchers to search for alternative functions of Minpp1 that are different from its role in InsPs hydrolysis. For example, in human follicular thyroid carcinomas, Minpp1 was shown to have a role in cell differentiation 13 and proliferation. 14 The levels of InsP6 and InsP5 were noticeably higher in Minpp1 gene knockout mice than in wild type. Exogenous reintroduction of Minpp1 into cells resulted in decreased levels of InsP6, which shows its role in maintenance of InsPs homeostasis. Additional studies have shown that there exists a relationship between Minpp1 and insulin-dependent alkaline phosphatase. On deleting the Minpp1 gene, there was a decrease in insulin-dependent alkaline phosphatase levels, which apparently influenced phosphate stability. 15 Subsequently, overexpression of Minpp1 in chondrogenic cells decreased levels of InsP6, which impaired chondrogenesis and cellular differentiation. In a different study conducted in osteoblastic mouse cells, Minpp1 was used as an osteoblastic differentiation marker for bone development. 16 Minpp1 has also been implicated in regulating oxygen binding to hemoglobin in erythrocytes by hydrolysis of 2,3-bisphosphoglycerate (2,3-BPG) to 2-phosphoglycerate (2-PG), bypassing the intermediate mutase reaction in Rapoport–Luebering shunt in the glycolytic pathway. 17 The chick homolog of Minpp1 is referred to as HiPER; it regulates maturation of chondrocytes from the proliferative to the hypertrophic stage in the growth plate region of the bone. 18 Minpp1 also dephosphorylates other phosphorylated organic compounds, eg, p-nitrophenyl phosphate. This demonstrates that Minpp1 may function as a non-specific dephosphorylating enzyme.
Despite several attempts to study Minpp1 structure, function, and its subcellular location, our understanding of its physiological significance remains unclear. There are also a number of other concerns, eg, its occurrence in lower and higher organisms, cellular location, functional diversification, and non-specific dephosphorylation property. It is not clear how Minpp1 functions vary in a phylogenetic context ranging from microorganisms like prokaryotes to lower eukaryotes, plants, and animals. Since there is no distinct cell organelle differentiation in prokaryotes, does subcellular or organelle-specific localization of this protein have any significant impact on its physiological function?
The purpose of our study was to understand the evolutionary significance of Minpp1, ie, whether Minpp1 function is conserved through evolution or whether it diversified from a simple prokaryotic protein into a more complex functional protein in eukaryotes. To accomplish this goal, we took a bioinformatics approach to analyze the distribution of Minpp1 related sequences across taxa. We identified and collected sequences from 40 different species that shared homology with Minpp1 and compared similarity and variability within conserved motifs among these sequences. Phylogenetic analysis was performed to understand the homology among the collected organisms and to study the evolutionary pattern of Minpp1 from lower to higher organisms. Further, we analyzed the amino acid (AA) sequences by multiple sequence alignment, predicted the 3D model of the Minpp1 protein using I-TASSER, and used motif scanning to predict possible functions. This bioinformatics approach allowed us to analyze the active site and ligand interactions at the molecular level, to relate structure to function, and to predict possible interactions with potential substrates.
Methods
Collection of Minpp1 Sequences
Homology modeling of the Minpp1 protein was performed to determine the structural and functional relation that exists between the Minpp1 proteins of other species. The full-length human Minpp1 (hMinpp1) AA sequence, as shown in Figure 1, was obtained from the National Center for Biotechnology Information (NCBI) with gene ID: 9562, NCBI RefSeq: NP_004888_2, UniProt ID: Q9UNW1 (other accession IDs: O05286, Q59EJ2, Q9UGA3), and protein code (E.C 3.1.3.62). This was used to find related sequences using the Basic Local Alignment Search Tool (BLAST) from NCBI (www.blast.ncbi. Blast.cgi) and The European Bioinformatics Institute (EMBL-EBI) (www.ebi.ac.uk/) databases. Multiple sequence alignment of the various sequences was achieved using ClustalX2. We predicted the three-dimensional structure of Minpp1 protein sequence (Fig. 1) using I-TASSER online-Zhang-Server (I-TASSER, http://zhanglab.umich.edu/I-TASSER/). Templates judged to be appropriate were downloaded from the protein database (PDB, http://www.pdb.org) connected to the above server and were regenerated using PyMOL.

AA sequence of hMinpp1.
Phylogenetic Analysis: Evolutionary Relevance of Minpp1
A hierarchical clustering approach was adopted to analyze the relatedness among the AA sequences. Results of the analysis are summarized in a phylogenetic tree constructed by the neighbor joining of a Jones–Taylor–Thornton (JTT) matrix using MEGA6 (M6, http://megasoftware.net/). Sequences used for analyses were cross-checked for their appropriateness to be included by a pairwise distance analysis method; a value of one (1) was considered reliable for further analysis. The analysis did not contain any duplicate sequences. The next step was to confirm the average AA identity to estimate the reliability of the aligned sequences. This was done by estimating the
Prediction of 3D Model for Minpp1
The full-length hMinpp1 AA sequence (NP_004888_2) was used to predict a 3D model using I-TASSER, an online platform that predicts the 3D protein structure from AA sequences. 19 The 3D models are built based on multiple-threading alignment using LOMETS. LOMETS utilizes a number of internal servers such as MUSTER, HHSEARCH, SAM-T02, SPARKS2, SP3, PROSPECT2, and PPA. Each server relies on its own template modeling (TM) score cut-off values, which can be biased toward the individual algorithms.20,21 However, an overall TM score gives the confidence level of the particular server and the sequence identity between the query and the template. This scoring function is termed the confidence score. The top 10 models are selected based on the confidence score. The root mean squared deviation (RMSD) values for the selected top 10 models ranged from 0.91 Å to 3.70 Å. The selected Minpp1 model had an RMSD value of 0.91 Å and a TM score of 0.89. Our top selected models had a unique global fold,20,21 suggesting quality modeling. The PDB file of the predicted Minpp1 model was downloaded and reproduced with the PyMOL Molecular Graphics System. The 3D model was regenerated using Protein Homology/analogY Recognition Engine V2.0 (http://www.sbg.bio.ic.ac.uk/~phyre/). The inositol phosphatase motif RHGxRxP was then highlighted and labeled.
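The two quality measures quoted above can be illustrated with a minimal sketch. It assumes that the two structures are already superimposed and that their residues are paired one to one; real TM score programs additionally search for the optimal superposition, which is not done here. The function names, the coordinate format, and the lower clamp of 0.5 Å on the distance scale d0 are assumptions made for illustration and are not part of the I-TASSER pipeline.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates in angstroms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

def tm_score(coords_a, coords_b, target_length=None):
    """TM score for pre-superimposed, residue-paired coordinates.

    Uses the standard length-dependent scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    The clamp to 0.5 for very short chains is an assumption of this sketch.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    length = target_length or len(coords_a)
    d0 = max(1.24 * max(length - 15, 0) ** (1.0 / 3.0) - 1.8, 0.5)
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        distance = math.dist(a, b)  # per-residue deviation
        total += 1.0 / (1.0 + (distance / d0) ** 2)
    return total / length

# Hypothetical four-residue toy structure and a copy shifted by 1 angstrom along x.
toy_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_shifted = [(x + 1.0, y, z) for x, y, z in toy_model]
```

Because the TM score is normalized by a length-dependent scale, it is less sensitive than RMSD to a few badly placed residues, which is why the two values are reported together.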
Structural Motif Analysis: Biological Relevance of Minpp1
To understand the biological relevance of the protein, online bioinformatic tools were used to first find functional motifs in the hMinpp1 sequence and then to compare these motifs among other related sequences. Tools used included: PROSITE Scan prosite tool (PROSITE, http://prosite.expasy.org), motif scan (SIB myhits, http://myhits.motif_scan), Sanger Pfam (pfam, http://pfam.sanger.ac.uk), Simple Modular Architecture Research Tool (SMART, http://smart.embl-heidelberg.de/), Protein ANalysis THrough Evolutionary Relationships (PANTHER, http://www.pantherdb.org/), CATH: Protein Structure Classification (CATH/Gene3D, http://www.cathdb.info/), and SCOP: Structural Classification of Proteins (superfamily, http://supfam.org/). A total of five unique motifs were selected based on the highest number of hits or scores using the above tools. Selected motifs were ER retention signal, phosphotransferase, phytase, inositol phosphate phosphatase, and pleckstrin homology (PH) domain. The presence of the above motifs was searched among all collected sequences. The sequences of these motifs were then compared for conservation and variation using MEGA 6.
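As a rough illustration of what pattern-based motif scanning involves, the sketch below searches a sequence for some of the motifs named in this study using regular expressions. It is not the algorithm used by PROSITE, Pfam, or the other tools listed above, which rely on curated patterns and profiles. The motif dictionary, the function name, and the toy sequence are hypothetical; the toy sequence is not the hMinpp1 sequence.

```python
import re

# Regular-expression forms of motifs discussed in the text ("." stands for any residue).
MOTIFS = {
    "inositol_phosphate_phosphatase": r"RHG.R.P",   # RHGxRxP
    "phosphotransferase": r"D.D.[TV]",              # DXDX[T/V]
    "n_glycosylation": r"N[^P][ST][^P]",            # N-{P}-[ST]-{P}
    "er_retention": r"[KS]DEL$",                    # K/SDEL at the C-terminus
}

def scan_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [(1-based start, matched text), ...]} for a sequence."""
    hits = {}
    for name, pattern in motifs.items():
        # A lookahead wrapper lets overlapping occurrences be reported as well.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
    return hits

# Hypothetical toy sequence used only to exercise the patterns.
toy_sequence = "MAARHGTRYPNATADIDDTKDEL"
toy_hits = scan_motifs(toy_sequence)
```

Running the same function over every collected sequence and tabulating which motifs are present gives a simple presence/absence matrix of the kind compared across species in this study.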
Ligand-Binding Prediction of Minpp1
Besides predicting the 3D structure of Minpp1, we predicted protein ligand-binding or enzyme active site interactions with possible substrates. We did this in order to find out whether Minpp1 binds to ligands other than InsPs. This was achieved using COACH, a highly rated bioinformatics tool for protein–ligand binding site prediction (COACH, http://zhanglab.umich.edu/COACH/). This is a meta-server approach that combines state-of-the-art tools (COFACTOR, TM-SITE, S-SITE, FINDSITE, and ConCavity) to predict protein–ligand binding. First, the Minpp1 primary sequence was provided as input to generate a 3D structure using I-TASSER. I-TASSER feeds the 3D structure into the COACH pipeline for ligand-binding site prediction. COACH utilizes the BioLiP database, which houses data on known proteins with their specific ligands. The predicted ligands with their binding sites on the protein were selected based on their scoring values.
Results and Discussion
Identification of Functional Domains in Minpp1 Sequence
In order to identify structural domains and functional motifs in the hMinpp1 protein sequence, NCBI RefSeq: NP_004888_2 was used. This accession number is for the full-length hMinpp1 isoform, which contains 487 AAs. The major domains and motifs present in Minpp1, as identified using the above-mentioned bioinformatics tools, are labeled in the AA sequence of the protein as shown in Figure 1 and summarized in Table 1. The use of multiple bioinformatics tools provided us with varied criteria to identify the true motifs within a domain. It also allowed us to minimize prediction errors and to cross-check the results of the various approaches. The motifs with higher TM scores were chosen for further analysis. The TM score is a measure of similarity between two protein structures with different tertiary structures and is considered an accurate reflection of true protein structure.19–21 The overall TM scores were between 0.55 and 0.75, a reliable range for an identified functional motif. 22
Summary of functional domains identified with references.
Most domains and motifs identified in our analysis have also been described in the literature (Table 1). AAs 1–30 at the N-terminal end constitute a signal peptide, 10 AAs 71–429 comprise acid phosphatase-A (AP-A) domain, 9 and AAs 74–207 span a domain for phosphoglyceromutase acid phosphatase (PGAM). 17 These two domains share AAs 71–207, contributing to the complexity of the protein. A protein kinase B domain 23 identified between AAs 401–480 overlaps with AP-A. The most conserved region known for InsPs phosphatase activity is AAs 88–94, with the signature sequence RHGTRYP. 9 This motif is shared between the PGAM and AP-A domains. AAs 242–245 (NATA) and 481–484 (NSTS) are N-glycosylation sites that might be involved in shuttling of proteins between the ER and Golgi complex. However, no evidence of such transportation mechanism is described in the literature. AAs 149–160 (KGRQDMRQLALR) were identified as PH domain motif, AAs 306–309 (DIDD) as glucose-1-phosphatase (G1P)/glucose transferase, and AAs 485–487 (KDEL) as an ER retention signal. 10
Three signature motifs for phytases were also identified in Minpp1: AAs 126–128 (DLG), AAs 103–200 (QJHYH), and AAs 48–53 (GTKTRY). 24 The presence of N-glycosylation sites and an ER retention signal predicts its role in trafficking between ER–Golgi and other endomembrane systems. The tools also gave hits for N-myristoylation sites at three different positions: AAs 107–112, 113–118, and 128–133. All tools predicted and collectively confirmed the presence of acid phosphatase and phosphoglycerate mutase domains and motifs for phytases and InsPs phosphatase. These features qualify Minpp1 as a genuine member of the histidine phosphatase superfamily that possesses the signature RHGxRxP catalytic motif. A considerable number of references for the members of this category are available.2,9,15,25 It is one of the largest functionally diversified families of proteins known. Members of this family also include bisphosphoglycerate mutase, which is known for dual phosphatase and mutase activities, and fructose-2,6-bisphosphatase, which is involved in glycolysis and gluconeogenesis. 26
Identification of Local Similarity between Sequences
Both NCBI and EMBL-EBI databases were searched using the Minpp1 sequence for the presence of homologous sequences. There was a wide range of variation in homology between species given the breadth of taxa examined (eg, yeast, zebrafish, plants, and mammals). The BLAST results produced a total of 286 hits with a significant matching score >100 belonging to varied taxonomic groups like bacteria, metazoa, fungi, and plants. This suggests that Minpp1 is a widely distributed protein through evolution. Such a distribution of Minpp1 has also been addressed by Chi and associates. 10 While collecting data from the two databases, care was taken not to collect duplicate sequences and any putative, hypothetical, or synthetic sequences.
Species selected for phylogenetic analysis were taken based on AA sequence identity. Plant species
Summary of the identified species with their sequence identity ranges.
In our study, the query resulted in 104 and 108 hits in NCBI and EMBL-EBI databases, respectively. The sequences selected from the above categories were based on their presence in both databases and upon elimination of predicted, constructed, and putative sequences. Results show that highly evolved species like
Analysis of hMinpp1 Isoforms
Location of the Minpp1 protein in higher organisms is restricted to the ER because of the presence of the ER retention signal (KDEL), while InsPs are present in the cytosol. Minpp1 must hydrolyze these InsPs outside the ER, as no association of InsPs with the ER has been demonstrated. Until now it was not known how Minpp1 and InsPs interact for enzymatic activity against InsPs. To address this, we examined available databases for the existence of Minpp1 isoforms and their subcellular locations. Four isoforms of hMinpp1 were found during our search using UniProt (www.uniprot.org/uniprot/Q9UNW1). These isoforms are believed to arise by alternative splicing of the gene, although this has not been confirmed experimentally. The first isoform with UniProt ID Q9UNW1 is considered variant 1. It represents the longest isoform encoded. The second isoform (variant 2) of the protein with UniProt ID Q9UNW1-2 is shorter than variant 1. Variant 2 lacks two alternate coding exons. The third isoform (variant 3) of the protein with UniProt ID Q9UNW1-3 is even shorter than variant 2. Variant 3 differs in the 5′ UTR and coding sequence as compared to variant 1. Variant 3 has a shorter and distinct N-terminus compared to variant 1. There were variations in AA sequences 279–284 between variants 1 and 3; AAs 285–487 were missing in variant 3. The fourth isoform (variant 4) with UniProt ID Q9UNW1-4 is the shortest. In this isoform, AAs 1–213 are absent at the N-terminal end of the sequence. Like variant 1, variant 4 has a C-terminal KDEL motif. Functionality of these isoforms is still unknown and no experimental data are available. Sequence alignment was done (Fig. 2) to see the similarity among all four isoforms of hMinpp1. Considerable sequence identity is seen for AAs 213–278; AA residues H231 and H248 are conserved. The NATA N-glycosylation site is shown to be conserved (Fig. 2) in all four human isoforms.
Of the four isoforms, two are predicted to be present as a luminal protein in the ER because of the presence of their KDEL motif. The two other isoforms are perhaps present in the cytosolic location because of the absence of KDEL-like sequence at the C-terminal.
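The reasoning in this paragraph amounts to a simple rule on the last four residues of each isoform, which can be sketched as follows. The set of KDEL-like signals is taken from the variants named in this article; the function name, the location labels, and the toy sequences are hypothetical and do not reproduce the UniProt isoform sequences. The "possibly cytosolic" label reflects the tentative prediction made in the text, not an experimentally verified location.

```python
# KDEL-like C-terminal tetrapeptides mentioned in this article.
KDEL_LIKE = {"KDEL", "SDEL", "RDEL", "QDEL", "KNEL", "KEDL", "DKEL"}

def predicted_location(sequence, signals=KDEL_LIKE):
    """Predict a location from the C-terminal tetrapeptide alone (heuristic)."""
    c_terminal = sequence[-4:].upper()
    return "ER lumen" if c_terminal in signals else "possibly cytosolic"

# Hypothetical toy sequences standing in for four isoforms.
toy_isoforms = {
    "variant_1": "MLRAPGCLLRHGTRYPNSTSDEL",
    "variant_2": "MLRAPGCLLRHGTRYPKDEL",
    "variant_3": "MLRAPGCLLRHGTRYPGAV",
    "variant_4": "MNATAQLR",
}

toy_locations = {name: predicted_location(seq) for name, seq in toy_isoforms.items()}
```

A rule of this kind only flags candidates; signal peptides, membrane anchors, and other targeting signals would also need to be considered before assigning a compartment.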

Multiple sequence alignment of isoforms of hMinpp1
The 3-phosphatase activity of Minpp1 in erythrocyte plasma membrane appears similar to Minpp1 activity found in the ER.17,28 It is likely that the phosphatase activity in lower organisms is associated with the cell membrane.
Evolutionary Relationship of Minpp1 Determined by Phylogenetic Analysis
The phylogenetic tree constructed using all 40 species of Minpp1 is shown in Figure 3. The reliability and reproducibility of the tree were evaluated using a bootstrap procedure. Minimum replication value, a measure of reproducibility, was set at 2000 in this study. The resultant cladogram from all 40 species represents the evolutionary pattern of Minpp1 protein (Fig. 3). The tree provides strong evidence of conservation of Minpp1 protein across a wide range of species ranging from primitive species like

Phylogenetic tree for Minpp1 related sequences.
Since sequence identity varied widely between the two major clusters of the phylogenetic tree, we next examined the degree of identity among organisms within each cluster, ie, whether lower organisms such as bacteria share a greater degree of identity among themselves. The phylogenetic trees generated from this analysis are given as Supplementary Figures 6A and B. Our analysis of the cluster representing 22 lower organisms generated a phylogenetic tree with the highest log likelihood (-9452.8222), which indicates that they have a close evolutionary relatedness to each other. For the cluster of 18 sequences from higher organisms, the tree with the highest log likelihood (-1878.8844) is shown, and branches corresponding to partitions reproduced in less than 50% of bootstrap replicates were collapsed. The overall pairwise alignment value was 0.386, indicating more evolutionary divergence among higher organisms.
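To make the idea of an overall pairwise value concrete, the sketch below computes the mean uncorrected pairwise distance (p-distance) over a set of aligned sequences. This is a simplified stand-in: MEGA6 can report model-corrected distances (for example under the JTT model), and the exact option used to obtain the value quoted above is not stated, so the output of this sketch should not be expected to reproduce it. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations

def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion of gapped columns
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared

def mean_pairwise_distance(aligned_sequences):
    """Average p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aligned_sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy alignment of three short sequences.
toy_alignment = ["ACDE", "ACDF", "ACDE"]
toy_mean_distance = mean_pairwise_distance(toy_alignment)
```

Computing this value separately for each cluster is one way to compare within-cluster divergence between lower and higher organisms.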
Prediction of 3D Structure for Minpp1
The 3D structure of Minpp1 (NP_004888_2) was predicted using I-TASSER. The top 10 models were selected based on the confidence score. The TM score, which gives the confidence level of the sequence identity between the query and the template, was taken into account. Since a crystal structure of Minpp1 is not available, we relied on the predicted structure quality of our 3D modeling. The RMSD values for the top 10 highest scoring models ranged from 0.91 to 3.70 Å. The selected Minpp1 model had an RMSD value of 0.91 Å and a TM score of 0.89, suggesting the high quality of the predicted model. The global folding was also unique among the top scoring models. The predicted PDB structure consisted of a helix represented in green, extended β-sheet and bridged β structures in magenta, and coils in pink (Fig. 4A). The motif RHGxRxP for phosphatase activity is highlighted in Figure 4B.

3D modeling of Minpp1 protein generated using the AA sequence.
Motif Analysis
One of the objectives of our study was to analyze any functional variation in Minpp1 activity that might have occurred through evolution. We examined the conserved functional motifs in the Minpp1 related sequences. We addressed whether basic Minpp1 function was conserved among species or functional complexity developed and new functions were added and adopted as more complex functional needs arose through evolution. The homology modeling tools described in the methods section were used to find the functional motifs in Minpp1 sequences. Motifs that were studied are discussed in the paragraphs that follow.
Inositol Phosphate Phosphatase Motif (RHGXRXP)
Phosphorylation and dephosphorylation are fundamental processes that regulate physiological functions in the cell. The histidine acid phosphatase family of enzymes is characterized by the presence of the conserved RHGXRXP (Arg-His-Gly-X-Arg-X-Pro) motif 25 that is essential for InsPs hydrolysis. The RHGXRXP motif has a nucleophilic histidine, an active arginine, and aspartic or glutamic acid residues. The tripeptide RHG interacts directly with the phosphate group of the substrate, making it more susceptible to nucleophilic attack given the positive charge of the guanidino group of the arginine residue. The conserved arginines in the RHGxRxP motif are important for binding the highly negatively charged phosphate group. During hydrolysis, the aspartic acid residue from the C-terminal motif protonates the substrates and the nucleophilic histidine residue is involved in the formation of a covalent phospho-histidine intermediate. In Minpp1, the phosphatase motif is generally conserved. The organisms that did deviate from the conserved pattern are microbial organisms and insects (Supplementary Fig. 1). The variations in RHG noted were THG, MHG, and VHG.
ER Retention Motif (K/SDEL)
The ER participates in folding, sorting, and transport of proteins to various cellular destinations. Because of a retention motif, the majority of ER-resident proteins are retained in the ER. This motif is composed of four AAs at the C-terminal end of the protein. KDEL (Lys-Asp-Glu-Leu) is a signal for permanent retention of proteins in the ER. 18 The KDEL sequence is recognized by a membrane-bound receptor that continually retrieves proteins from the Golgi compartment of the secretory pathway and returns them to the ER via retrograde transport vesicles. In Minpp1, AAs 485–487 at the C-terminal represent the ER retention signal.
Variations in the KDEL motif include RDEL, QDEL, KNEL, KEDL, DKEL, or KDEL. 29 It is not known whether such variations can lead to sub-ER localization. 30 Our analysis, obtained by pairwise multiple alignment using ClustalX2, shows that KDEL is highly conserved in higher organisms. Exceptions include VKTEL in two of three species; RNTEL in
Phosphotransferase Motif (DXDX[T/V])
Phosphotransferases are a complex group of enzymes that catalyze reactions by phosphorylation. This family is characterized by the presence of a conserved aspartate residue in the amino-terminal region. The first aspartate of the motif DXDX(T/V) (Asp-X-Asp-X[Thr/Val]) is well conserved. 31 Phosphotransferase is considered to be functionally conserved from prokaryotes to higher eukaryotes, and it belongs to a large hydrolase family comprising several phosphatases. The family is slightly larger than the bisphosphoglycerate mutase family, which comprises two mutases and three phosphatases. The phosphatases have the phosphorylated residue that is a histidine in a conserved RHG motif. The conserved DXDX(T/V) motif is considered characteristic of phosphatases/phosphomutases acting on phosphate esters.28,31 The first aspartate in the motif is responsible for phosphorylation and is position sensitive. The second aspartate in the motif is involved in catalysis, which acts as a nucleophile and in some cases as an acid-base catalyst. 32 The conserved motif of phosphotransferase is well conserved in a hierarchical fashion from lower to higher organisms. Some of the AA variations that were observed are depicted in Supplementary Figure 3. The two extremes include variations in insects and a complete absence in bacteria.
PH Domain Motif (K-Xn-[K/R]-X-R)
The PH domain occurs in a wide range of proteins. It was first detected in pleckstrin, the major substrate for protein kinase C in platelets. It is usually made up of AA residues K-X
Phytase Motif (DXG, GDXXY, GNH[E/D], GHXH)
Phytases are a ubiquitous class of proteins. They have broad substrate specificity and have the ability to hydrolyze many phosphorylated compounds that are not structurally similar to phytic acid. Phytases (PAP) and histidine superfamily phytases primarily hydrolyze InsPs. Purple acid phosphatases are commonly present in lower organisms. They have seven conserved residues (bold) in the five conserved motifs –
Active Site Residue–Ligand/Substrate Binding Prediction Analysis
AA residues that are active in ligand binding were predicted using a consensus-based algorithm (COACH). InsP6 or inositol hexakissulfate (InsS6), glucose-1-phosphate (GP), and PO4 ligand binding sites were studied. AAs involved in InsP6/InsS6 binding are T49, K50, R88, H89, R92, T95, K97, R186, F228, Q321, H370, and A371; and in PO4 binding are R88, H89, R92, R186, H370, and A371. This shows that R88 and H89 are actively involved in phosphatase activity. GP ligand binding showed active binding sites at R88, R92, R186, H370, A371, and E372. G1P belongs to the histidine acid phosphatase family and acts primarily as a glucose scavenger. It is well studied in

Potential ligand-binding site prediction by a consensus-based algorithm (COACH).
The AAs R88, R92, R186, H370, and A371 in hMinpp1 are identified as the most active residues for binding to phosphorylated substrates. Our studies revealed that GP is also a potential substrate for this active site. It would be interesting to see if mammalian Minpp1 also utilizes GP as a substrate. This prediction is based on the nature of Minpp1 as a non-InsP-specific phosphatase, but this needs to be experimentally confirmed.
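The shared residues listed above follow directly from intersecting the three predicted binding-site residue sets reported in this section. The sketch below reproduces that step; the residue lists are copied from the text, while the variable and function names are introduced only for illustration.

```python
# Predicted binding residues per ligand, as reported in this section.
BINDING_RESIDUES = {
    "InsP6/InsS6": {"T49", "K50", "R88", "H89", "R92", "T95", "K97",
                    "R186", "F228", "Q321", "H370", "A371"},
    "PO4": {"R88", "H89", "R92", "R186", "H370", "A371"},
    "GP": {"R88", "R92", "R186", "H370", "A371", "E372"},
}

def by_position(residues):
    """Sort residue labels such as 'R88' by their sequence position."""
    return sorted(residues, key=lambda label: int(label[1:]))

# Residues predicted to contact all three ligands.
shared_residues = by_position(set.intersection(*BINDING_RESIDUES.values()))

# Residues predicted for glucose-1-phosphate but for neither of the other ligands.
gp_only = by_position(
    BINDING_RESIDUES["GP"] - BINDING_RESIDUES["InsP6/InsS6"] - BINDING_RESIDUES["PO4"]
)
```

Here `shared_residues` evaluates to R88, R92, R186, H370, and A371, and `gp_only` contains E372, the one residue predicted for GP alone.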
Conclusion
In an attempt to understand the physiological relevance of Minpp1 in mammalian systems, we took a bioinformatics approach to compare the hMinpp1 sequence across a broad range of taxa. Two major databases encompassing 40 species were used to study relatedness, divergence, and complexity adopted in function through evolution.
Phylogenetic analysis showed that Minpp1 related proteins clustered into two major groups, one representing lower and another higher organisms. Since we observed two very distinct clusters with a greater degree of variation in sequence identity, we carried out separate analysis on each cluster to see intra-organism relatedness among lower and higher organisms. While lower organisms were phylogenetically more related to each other, they were distinct from higher organisms (only 20–30% similarity). Additionally, we observed more divergence among higher organisms. It is likely that lower organisms developed a simpler version of Minpp1 that functioned primarily in non-specific dephosphorylation of phosphorylated organic compounds. In higher organisms, this function was adopted by way of natural selection and diversified through evolution. More functional motifs were added over time giving rise to a more complex structure with a higher molecular size of the enzyme in higher organisms. This is evident by the presence of fewer functional motifs in primitive than in higher organisms.
One of the key findings of our analysis was the identification of four splice variants (isoforms) of Minpp1 in humans. These isoforms vary in size and apparent localization in the cell because not all isoforms contain, for example, the KDEL motif at the C-terminal for ER retention. The existence of these isoforms, to our knowledge, has not been documented in the literature. Whether other species also have similar isoforms has not yet been analyzed.
Another key finding of our predictive bioinformatics analysis has been the identification of interaction of GP with Minpp1. It will be interesting to determine the significance of this interaction experimentally. The presence of G1P activity has been documented in lower organisms. However, the presence of G1P motif in Minpp1 in higher organisms is intriguing with regard to its potential role in glucose metabolism. Our future studies are directed to establish experimentally any catalytic activity of Minpp1 against GP or related compounds such as glucose-6-phosphate as potential substrates.
Authors' Contributions
SPK performed bioinformatics analysis and writing of the first draft. AS assisted with bioinformatics tools. WHB guided the appropriateness of the tools and the study. NA conceived the idea, guided the study, and participated in writing. All authors participated in editing of the manuscript. All authors reviewed and approved of the final manuscript.
Supplementary Material
Forty protein sequences collected from the NCBI and EMBL-EBI databases were aligned using ClustalX2 to see similarity and diversification in the conserved catalytic “RHGxRxP” motif. The highlighted (blue) region shows that the motif is conserved from lower to higher organisms, especially Arg, His, and Gly. Out of the 40 species used in the analysis, 34 showed the conserved pattern; exceptions were seen in insects and microorganisms.
ER resident proteins are retrieved and retained in the ER because of a retention motif. In Minpp1, AAs 485–487 (KDEL) at the C-terminal represent the ER retention signal. All collected sequences were aligned using ClustalX2 to see similarity and diversification in the protein sequences. Out of the 40 sequences, 36 species highlighted in blue show that the motif is conserved from lower to higher organisms; in plants it is VKTEL, and in some insects and bacteria it is absent.
The 40 sequences collected from the NCBI and EMBL-EBI databases were aligned using ClustalX2 to see similarity and diversification in the conserved phosphotransferase motif “DXDX(T/V)”. Highlighted are 36 out of 40 species, which represent conservation of aspartic acid in all aligned species of higher organisms with some exceptions in insects. The first aspartic acid is responsible for phosphorylation and is position sensitive.
The PH domain is found in many proteins that are involved in intracellular signaling and as a constituent of the cytoskeleton. The conservation and diversification of the motif were determined using ClustalX2. The motif highlighted in blue for 37 species shows the conservation of the motif; some amino acid residue variations are seen in more complex organisms, and the motif is completely absent in some insects and bacteria.
Phytases are enzymes that catalyze phosphate monoester of phytate into the stepwise formation of
The evolutionary history was inferred by using the Maximum Likelihood method based on the JTT matrix-based model. The tree with the highest log likelihood (-9452.8222) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 22 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 210 positions in the final dataset. Evolutionary analyses were conducted in MEGA6.
The evolutionary history was inferred by using the Maximum Likelihood method based on the JTT matrix-based model. The tree with the highest log likelihood (-1878.8844) is shown. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 18 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 136 positions in the final dataset. Evolutionary analyses were conducted in MEGA6.
Acknowledgments
SPK thanks UALR Tech Launch, University of Arkansas at Little Rock, for a graduate assistantship.
