Abstract
Genomes may now be sequenced in a matter of weeks, leading to an influx of “hypothetical” proteins (HPs) whose functions remain unknown in GenBank. The information contained in these genes has rapidly grown in importance. We therefore chose to examine closely the structure and function of an HP (AFF25514.1; 246 residues) from
Keywords
Introduction
Next-generation sequencing (NGS) has shortened the time it takes researchers to collect massive volumes of data. 1
Assigning functions to genes becomes more difficult as the genomes of more and more species are sequenced. Of all sequenced proteins, more than 30% in various organisms are called “hypothetical proteins” (HPs) because their molecular functions are unknown. 2
The growing number of uncharacterized HPs is compelling researchers to find ways to characterize and use them.
In light of these considerations, the purpose of this study is to define a PM subsp.
Materials and Methods
Retrieval of protein sequence
By searching the NCBI Protein database (https://www.ncbi.nlm.nih.gov/protein/) for the phrase “Hypothetical proteins AND

Flowchart of the overall study design. Cyan, light green, and blue boxes represent the HP’s sequence analysis, structural evaluation, and molecular interaction analyses, respectively.
Physicochemical properties analysis
The physicochemical properties of the selected HP were assessed using the ProtParam tool on the ExPASy website (https://web.expasy.org/protparam/). The tool computes theoretical parameters such as molecular mass, amino acid composition, total numbers of positively and negatively charged residues, extinction coefficient, theoretical isoelectric point (pI), aliphatic index (AI), instability index (II), and grand average of hydropathicity (GRAVY) score. 42
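Three of the ProtParam quantities listed above have simple closed forms: GRAVY is the mean Kyte-Doolittle hydropathy per residue, the AI follows Ikai’s formula on mole percentages, and the extinction coefficient at 280 nm sums fixed contributions from Trp, Tyr, and cystine. The sketch below re-implements those formulas for illustration only; it is not the ExPASy code, and the function names are hypothetical. The II and pI are omitted here because they need larger lookup tables.

```python
# Kyte-Doolittle hydropathy values used for the GRAVY score.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai's aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    with X the mole percent of each residue."""
    seq = seq.upper()
    pct = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


def extinction_coefficients(seq: str) -> tuple[int, int]:
    """Molar extinction coefficient at 280 nm in water (M^-1 cm^-1).

    Returns (all Cys paired as cystines, all Cys reduced), using
    5500 per Trp, 1490 per Tyr and 125 per cystine.
    """
    seq = seq.upper()
    reduced = 5500 * seq.count("W") + 1490 * seq.count("Y")
    return reduced + 125 * (seq.count("C") // 2), reduced


# Hypothetical peptide, not the HP sequence.
peptide = "MKWVTFISLLLLFSSAYSC"
print(gravy(peptide), aliphatic_index(peptide), extinction_coefficients(peptide))
```

A negative GRAVY indicates a hydrophilic sequence, and a higher AI reflects a larger share of Ala, Val, Ile, and Leu.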
Annotation of functional domain
Functional annotation was performed to infer the functions of the HP. Several publicly available tools and databases, including NCBI CDD (https://www.ncbi.nlm.nih.gov/cdd/), 43 InterProScan (https://www.ebi.ac.uk/interpro/search/sequence/), 44 and SUPERFAMILY (https://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/) 45 were used to identify the conserved functional domains within the HP, with default settings in each case. These tools identify conserved domains, which are then used to classify proteins.
Multiple sequence alignment and phylogenetic analysis
Sequences similar to the studied HP were searched using NCBI’s Basic Local Alignment Search Tool (BLAST) (https://blast.ncbi.nlm.nih.gov/Blast.cgi). We used the BLASTp program 46 to search the non-redundant protein database. Homologous protein sequences, all presumed to share the same function, were then retrieved from the NCBI protein database. The Molecular Evolutionary Genetics Analysis X (MEGA X) program was used to perform the multiple sequence alignment (MSA) and phylogenetic analysis of the HP and the retrieved sequences. 47 The MSA was performed with the progressive ClustalW method. 48 To illustrate the evolutionary divergence of the related proteins, a phylogenetic tree was built from the alignment of homologous sequences using the maximum likelihood (ML) method with default settings and 1000 bootstrap replicates. 49
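The 1000 bootstrap replicates mentioned above follow the nonparametric bootstrap: alignment columns are resampled with replacement to build pseudo-alignments, a tree is inferred from each, and a node’s support is the percentage of replicate trees containing that clade. A minimal sketch of the resampling step only, assuming the alignment is held as a name-to-sequence dict; tree inference is left to a program such as MEGA X, and the function name is hypothetical.

```python
import random


def bootstrap_replicates(alignment: dict[str, str], n_reps: int, seed: int = 0):
    """Yield pseudo-alignments built by resampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    for _ in range(n_reps):
        # Draw `length` column indices; the same column may appear several times.
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in cols) for name in names}


aln = {"HP": "MKT-LV", "hom1": "MKTALV", "hom2": "MRT-IV"}  # hypothetical alignment
for rep in bootstrap_replicates(aln, n_reps=2):
    print(rep)
```

Resampling whole columns keeps each site’s residues together, which is what makes the replicates valid pseudo-alignments.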
Secondary structure prediction of selected hypothetical protein
The PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred)50,51 and SOPMA (https://npsa-prabi.ibcp.fr/cgi-bin/npsaautomat.pl?page=/NPSA/npsasopma.html) servers were used to predict the secondary (2D) structure of the HP. PSIPRED applies feed-forward neural networks to the output of the PSI-BLAST algorithm,50,51 whereas SOPMA predicts a protein’s secondary structure by consulting a database of DSSP-assigned structures (“DATABASE.DSSP”). In both cases, the HP’s FASTA sequence was used as input. 52
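Three-state predictions such as PSIPRED’s are commonly compared with DSSP assignments of known structures after reducing the eight DSSP states to helix (H), strand (E), and coil (C). A sketch of one widely used reduction and of the Q3 accuracy score, with hypothetical function names; the exact mapping varies between benchmarks and is not taken from either server.

```python
# One common 8-to-3 state reduction of DSSP codes (conventions vary):
#   H (alpha helix), G (3-10 helix), I (pi helix)  -> H
#   E (extended strand), B (isolated beta bridge)  -> E
#   T, S, and unassigned positions                 -> C (coil)
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states: str) -> str:
    """Map an 8-state DSSP string to the 3-state H/E/C alphabet."""
    return "".join(DSSP_TO_3.get(s, "C") for s in states)


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state prediction matches the reference."""
    if len(predicted) != len(observed):
        raise ValueError("strings must be the same length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


observed = reduce_dssp("HHGGEEBTTS")  # hypothetical DSSP assignment
print(observed, q3("HHHHEECCCC", observed))
```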
Tertiary structure prediction of protein
The tertiary (3D) structure of the HP was predicted with the HHpred (https://toolkit.tuebingen.mpg.de/tools/hhpred)53-56 and I-TASSER (https://zhanggroup.org/I-TASSER/) servers.57-59 HHpred, developed at the Max Planck Institute for Developmental Biology, builds a 3D model of a hitherto uncharacterized protein from its best-matching templates using the MODELLER software.53-56 I-TASSER, starting from the amino acid sequence, generates 3D atomic models through multiple threading alignments and iterative structure assembly simulations. 60 Default parameters were used for homology modeling on both servers. The 3D structures predicted for the HP were refined and energy-minimized using the YASARA energy minimization server 61 (http://www.yasara.org/minimizationserver.htm). GalaxyRefine (https://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE) 62 was then used to further improve the refined structures; of the several models it generates, the one with the best quality scores was selected. PyMOL and BIOVIA Discovery Studio were used to render 3D images of the HP’s structures.
Model quality assessment of studied hypothetical protein
The energy-minimized and fine-tuned 3D structure of the HP was evaluated using the PROCHECK 63 and ERRAT 64 modules of the SAVES server (https://saves.mbi.ucla.edu/). The
Active site prediction of hypothetical protein of Pasteurella multocida strain HN06
The HP’s active site and its residues were predicted with the Computed Atlas of Surface Topography of Proteins (CASTp) (http://sts.bioe.uic.edu/castp/calculation.html) 69 and FTSite (https://ftsite.bu.edu/) 70 servers. The Protein Data Bank (PDB), 71 UniProt, 72 and Structure Integration with Function, Taxonomy, and Sequence (SIFTS) 73 databases were also used. Because CASTp correlates a protein’s structure with its sequence, it enables rapid residue-level annotation. 69 FTSite identifies ligand-binding sites, where small organic compounds of varying size and polarity may bind, using an algorithm validated against experimental data; without using evolutionary or statistical information, it achieves near-experimental accuracy. 70 The predicted active site and residues were further validated by molecular docking (MD) analysis.
Subcellular localization and function prediction of hypothetical protein
A protein’s spatial environment governs its interaction patterns and biological networks and thus influences its ability to function. 74 In this context, the subcellular localization of the HP was predicted with multiple servers, including PSLpred (https://webs.iiitd.edu.in/raghava/pslpred/), 75 SOSUIGramN (https://harrier.nagahama-i-bio.ac.jp/sosui//sosuigramn/sosuigramn_submit.html), 42 Gneg-PLoc (http://www.csbio.sjtu.edu.cn/bioinf/Gneg-multi/),76,77 DeepTMHMM 2.0 (https://dtu.biolib.com/DeepTMHMM), 78 and PSORTb (https://www.psort.org/psortb/). 79
Molecular docking of hypothetical protein with S -adenosylmethionine and S -adenosylhomocysteine
Molecular docking is frequently employed to investigate and evaluate the intermolecular interactions between ligands and macromolecules. 80 Hence, docking experiments were performed on the HP using both SAM and
Molecular dynamic simulation
The stability and function of a protein complex depend on the mobility of its atoms, which can be analyzed computationally by molecular dynamics simulation (MDS).86-88 MDS was therefore performed on the HP-ligand complexes predicted by AutoDock Vina, namely HP-SAM and HP-SAH, using the “WebGRO for Macromolecular Simulations” web server (https://simlab.uams.edu/). 89 The ligand topology files required for the simulation were generated with the GlycoBioChem PRODRG2 server (http://davapc1.bioch.dundee.ac.uk/cgi-bin/prodrg). 90 On the WebGRO server, the GROMOS96 43a1 force field 92 was used with the SPC water model 91 in a triclinic box, and the options “neutralize” and “add 0.15 M salt” were selected. Energy minimization 93 used the steepest descent integrator with 5000 steps. The MDS runs used NVT/NPT equilibration (N, constant number of particles; V, constant volume; T, constant temperature; P, constant pressure), a temperature of 300 K, a pressure of 1 bar, a 50-ns simulation period, and approximately 1000 frames per simulation. 94 Finally, the MDS results were interpreted, and the stability and flexibility of the docked complexes were assessed using the root mean square deviation (RMSD) of the structure over time, the root mean square fluctuation (RMSF) of each residue, the average number of hydrogen bonds per frame over time, the radius of gyration (Rg) as a measure of structural compactness, and the solvent-accessible surface area (SASA). 89
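To illustrate what the steepest-descent minimization setting does, here is a toy one-dimensional minimizer for a single Lennard-Jones pair in reduced units (epsilon = sigma = 1), capped at 5000 steps. The adaptive step (grow by 1.2 after an accepted move, shrink by 0.2 after a rejected one) follows the scheme described in the GROMACS reference manual; the potential, starting distance, and names are assumptions, and this is not the WebGRO protocol. The minimum of this potential lies at r = 2^(1/6) sigma, about 1.122 sigma, with energy −epsilon.

```python
def lj_energy(r: float, eps: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_force(r: float, eps: float = 1.0, sigma: float = 1.0) -> float:
    """Force along r, i.e. -dE/dr."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r


def steepest_descent(r: float, step: float = 0.01, max_steps: int = 5000,
                     f_tol: float = 1e-6) -> tuple[float, float]:
    """Move downhill along the force; adapt the step size on accept/reject."""
    energy = lj_energy(r)
    for _ in range(max_steps):
        force = lj_force(r)
        if abs(force) < f_tol:                 # converged: force is (nearly) zero
            break
        trial = r + step * force / abs(force)  # displacement of length `step`
        trial_energy = lj_energy(trial)
        if trial_energy < energy:              # accept and enlarge the step
            r, energy = trial, trial_energy
            step *= 1.2
        else:                                  # reject and shrink the step
            step *= 0.2
    return r, energy


print(steepest_descent(1.5))
```

Minimization only relaxes the starting structure to a nearby local minimum; it removes steric clashes before equilibration and does not sample dynamics.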
Result and Discussion
Retrieval of protein sequence
The NCBI Protein database was queried, and the HP PMCN06_2293 of PM strain HN06 was selected at random. The retrieved sequence was then searched against UniProt, a free public database of protein sequences and their functional annotations. The HP’s locus, definition, accession, version, total number of amino acids, and FASTA sequence were recorded for analysis. The HP, labeled PMCN06_2293, contains 246 amino acids and has the locus and accession AFF25514 and the version AFF25514.1 (Table 1).
Properties of the HP retrieved from the NCBI Protein database.
Abbreviations: HP, hypothetical protein; NCBI, National Center for Biotechnology Information.
Physicochemical properties analysis
Several physicochemical parameters of the HP PMCN06_2293 were analyzed with the ProtParam tool of the ExPASy server, and the findings are shown in Table 2. The HP comprises 246 amino acids and has a predicted molecular weight of 28 352.60 Da. Its theoretical pI was calculated as 9.18, indicating a basic (alkaline) protein that carries a net positive charge at physiological pH. Protein stability is crucial in many biological processes, and one way to estimate it is the II: a protein with an II below 40 is predicted to be stable, whereas one with an II above 40 is predicted to be unstable. 95 With an II of 56.57, the HP is therefore predicted to be unstable. The AI is the relative volume occupied by the aliphatic side chains (alanine [Ala], valine [Val], isoleucine [Ile], and leucine [Leu]) of a protein. 96 The predicted AI of 84 for the HP suggests thermostability over a wide temperature range. The GRAVY score of a peptide or protein is the sum of the hydropathy values of all its residues divided by the number of residues. The computed value of −0.565 for the HP indicates that it is hydrophilic. In the Beer-Lambert law, the extinction coefficient is the proportionality constant that relates a protein’s absorbance of light at a given wavelength to its concentration and the path length. 97 The extinction coefficient of the HP was calculated to be 25 565 M⁻¹ cm⁻¹ at 280 nm; this high value reflects a relatively high content of tryptophan, tyrosine, and cysteine. 95 Table 2 provides a comprehensive overview of the physicochemical properties of the HP, which will be useful when working with the protein in future studies.
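The theoretical pI is the pH at which a protein’s net charge is zero. A minimal sketch of that calculation sums Henderson-Hasselbalch charges of the ionizable groups and bisects on pH. The pKa values are an approximate EMBOSS-style set chosen here as an assumption; ProtParam uses the Bjellqvist et al. scale, so this sketch will not reproduce its numbers exactly, and the function names are hypothetical.

```python
# Approximate pKa values (EMBOSS-like set, an assumption; ProtParam uses
# the Bjellqvist et al. scale, so its pI values will differ somewhat).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a linear peptide at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; the net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


print(isoelectric_point("MKKRLLSGKDE"))  # hypothetical Lys/Arg-rich peptide
```

A pI above 7 means the protein carries a net positive charge at neutral pH, and a pI below 7 means a net negative charge.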
Physicochemical properties of the HP predicted by the ExPASy ProtParam server.
Abbreviations: HP, hypothetical protein; pI, isoelectric point.
Annotation of functional domain
The servers predicted that the HP contains the well-known conserved domain of tRNA (adenine(37)-N6)-methyltransferase TrmO (Supplementary Table 1).
Multiple sequence alignment and phylogenetic analysis
BLASTp searches of the HP against the NCBI nonredundant protein database returned homologous proteins, from which the organisms with the highest percentage identity, the lowest e-value, and the highest query coverage were identified. These results suggest that the HP and tRNA (N6-threonylcarbamoyladenosine(37)-N6)-methyltransferase TrmO may have comparable functions (Table 3). The MEGA X program was then used for sequence alignment and phylogenetic tree construction, with the ClustalW algorithm for the MSA and the ML method for tree building. The HP and
Proteins homologous to the HP, aligned with the NCBI BLASTp algorithm.
Abbreviations: HP, hypothetical protein; NCBI, National Center for Biotechnology Information.
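The BLASTp hit selection described above can be sketched as follows. Percent identity is computed as BLAST reports it (identical positions over alignment length, gap columns included), query coverage for a single alignment is the covered fraction of the query, and hits are ordered by identity, then e-value, then coverage. That lexicographic ordering, the `Hit` record, and all names are assumptions for illustration; the study does not state how the three criteria were combined.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTp hit (hypothetical record layout)."""
    accession: str
    identity: float        # percent identity
    evalue: float
    query_coverage: float  # percent of the query covered


def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical positions divided by alignment length (gap columns included)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")
    matches = sum(q == s and q != "-" for q, s in zip(aligned_query, aligned_subject))
    return 100.0 * matches / len(aligned_query)


def query_coverage(q_start: int, q_end: int, query_length: int) -> float:
    """Percent of the query spanned by one alignment (1-based, inclusive)."""
    return 100.0 * (q_end - q_start + 1) / query_length


def rank_hits(hits: list[Hit]) -> list[Hit]:
    """Sort by highest identity, then lowest e-value, then highest coverage."""
    return sorted(hits, key=lambda h: (-h.identity, h.evalue, -h.query_coverage))


hits = [Hit("hit_A", 90.0, 1e-10, 95.0), Hit("hit_B", 95.0, 1e-5, 80.0)]  # hypothetical
print([h.accession for h in rank_hits(hits)])
```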

Evolutionary and ancestral relationships of the HP with the top-aligned sequences. The sequence marked in red is the HP, and the tree nodes represent ancestral relationships.
Secondary structure prediction
The HP’s secondary structure was predicted with the PSIPRED and SOPMA servers. In brief, PSIPRED predicted that random coils make up the largest share of the HP structure, followed by extended strands and then alpha-helical regions (Figure 3). SOPMA agreed with PSIPRED that the HP has a greater proportion of random coil than of extended strand or alpha helix (Table 4 and Supplementary Figure 1).
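The proportions compared above are per-state counts over the predicted string. A sketch, assuming a one-letter-per-residue prediction string such as PSIPRED’s H/E/C output (SOPMA also reports beta turns, which would simply appear as an extra state); the function name and example string are hypothetical, not the HP’s actual prediction.

```python
from collections import Counter


def ss_composition(prediction: str) -> dict[str, float]:
    """Percentage of residues in each predicted secondary-structure state."""
    counts = Counter(prediction)
    total = len(prediction)
    return {state: round(100.0 * n / total, 2) for state, n in counts.most_common()}


# Hypothetical 3-state prediction (H = helix, E = strand, C = coil).
print(ss_composition("CCCCCEEEHHCC"))
```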

Secondary structure of the HP predicted by the PSIPRED server. Strand, helix, and coil structures are depicted in yellow, pink, and gray, respectively.
The predicted secondary structure of the HP by SOPMA server.
Abbreviation: HP, hypothetical protein.
Tertiary structure prediction
For accurate HP model prediction, we used the HHpred and I-TASSER servers. The HHpred server determined an optimal 3D model of HP by comparing it to a database of known protein structures and picking a template that best fit the protein’s structure. Using the criteria of a 100% success rate, an

Tertiary structure of the HP predicted by the HHpred (A) and I-TASSER (B) servers. Spiral and arrow ribbons represent alpha-helix and beta-sheet structures, respectively, whereas line ribbons represent the coil structure of the HP.
Model quality assessment
SAVES PROCHECK found that 89.9% of the amino acid residues in the HHpred-predicted model of the HP lie in the most favored regions of the Ramachandran plot, compared with only 84.5% in the I-TASSER-predicted model (Figure 5). The ERRAT score is likewise higher for the HHpred-predicted model (87.5) than for the I-TASSER-predicted model (85.3211) (Table 5 and Supplementary Figure 2). Both HHpred and I-TASSER provide a negative value for the HP model’s projected

Ramachandran plots of the models predicted by the HHpred (A) and I-TASSER (B) servers. The first quadrant represents the beta-sheet region, whereas the second and third quadrants represent the right-handed and left-handed alpha-helix regions, respectively. The red, yellow, gray, and white regions depict residues in the most favored, additional allowed, generously allowed, and disallowed regions, respectively.
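Each point on a Ramachandran plot is a pair of backbone dihedral angles: phi is the dihedral C(i-1), N(i), CA(i), C(i) and psi is N(i), CA(i), C(i), N(i+1). A dependency-free sketch of the signed dihedral calculation follows; the function names are hypothetical, and PROCHECK performs this step and the region classification internally.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees (range -180..180) about the p1-p2 bond."""
    b1 = _sub(p2, p1)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    b0 = _sub(p0, p1)
    b2 = _sub(p3, p2)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Vec, n: Vec, ca: Vec, c: Vec) -> float:
    return dihedral(c_prev, n, ca, c)


def psi(n: Vec, ca: Vec, c: Vec, n_next: Vec) -> float:
    return dihedral(n, ca, c, n_next)
```

With backbone coordinates parsed from the model’s PDB file, calling `phi` and `psi` for every residue yields the points that PROCHECK plots and classifies.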
Model quality assessment of the HP by the SAVES, ProSA, and SWISS-MODEL structure assessment servers.
Abbreviation: HP, hypothetical protein.
Active site prediction
The CASTp server predicted 75 amino acid residues within the active site of the HP, with a pocket surface area of 1811.175 Å² and a volume of 2510.612 Å³ (Figure 6A). FTSite predicted 37 residues within the active site of the HP (Figure 6B). Twenty-seven residues were predicted by both servers: Lys-21, Phe-22, Ser-23, Val-24, Pro-25, Arg-26, Pro-28, Phe-63, Gln-64, Phe-65, Asp-66, Arg-94, Thr-96, Gly-103, Leu-104, Ser-105, Asp-127, Leu-128, Val-129, Thr-132, Gln-195, Asp-196, Pro-197, Arg-198, Pro-199, Ala-200, and Tyr-201 (Figure 6C).
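The shared residues are the intersection of the two servers’ predictions. A sketch, assuming residues are labeled as “Xaa-N” strings as in the text; the function name is hypothetical, and the input lists below are shortened, illustrative subsets rather than the full CASTp (75 residues) and FTSite (37 residues) outputs.

```python
def common_residues(castp: list[str], ftsite: list[str]) -> list[str]:
    """Residues predicted by both servers, ordered by sequence position."""
    shared = set(castp) & set(ftsite)
    # Labels look like "Lys-21": sort numerically on the residue number.
    return sorted(shared, key=lambda label: int(label.rsplit("-", 1)[1]))


castp = ["Lys-21", "Phe-22", "Gly-10", "Tyr-201"]  # hypothetical subset
ftsite = ["Tyr-201", "Lys-21", "Ala-5"]            # hypothetical subset
print(common_residues(castp, ftsite))
```

Sorting on the integer residue number, rather than the label string, keeps Tyr-201 after Lys-21.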

The predicted active sites and active amino acid residues by CASTp (A) and FTSite (B) server and common active residues (C) from these servers. The cyan color denotes the protein, whereas the purple color indicates the active amino acid residues.
Prediction of subcellular localization
Several servers, including PSLpred, SOSUIGramN, Gneg-PLoc, DeepTMHMM 2.0, and PSORTb, were used to predict the subcellular location of the HP. Different cellular locations are linked to different biological processes, 101 so knowing where an HP resides in the cell may shed light on its potential role. This knowledge may also help in designing drugs that inhibit the targeted protein. 101 Based on these predictions, the HP is hypothesized to be a cytoplasmic protein with functions comparable to those of other cytoplasmic proteins (Supplementary Table 2).
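One simple way to combine several localization servers is a majority vote. A minimal sketch with hypothetical server labels and calls; the study’s per-server results are in Supplementary Table 2 and are not reproduced here, and the study does not state that a formal vote was used.

```python
from collections import Counter


def consensus_location(calls: dict[str, str]) -> tuple[str, int]:
    """Return the most frequent location and its vote count.

    Ties go to the location encountered first, following Counter.most_common.
    """
    location, votes = Counter(calls.values()).most_common(1)[0]
    return location, votes


# Hypothetical calls from five servers.
calls = {"server_1": "Cytoplasm", "server_2": "Cytoplasm", "server_3": "Periplasm",
         "server_4": "Cytoplasm", "server_5": "Cytoplasm"}
print(consensus_location(calls))
```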
Molecular docking analysis
The docking analysis revealed several intermolecular interactions between the HP and the ligands SAM and SAH. In site-specific docking (AutoDock Vina), docking scores of −7.4 and −7.5 kcal/mol for SAM and SAH, respectively, indicate that both ligands have a strong affinity for the HP (Table 6 and Figure 7A and B). In blind docking, SAM and SAH both showed strong affinity for the HP, each with a docking score of −7.7 kcal/mol (Table 6 and Figure 7C and D). Site-specific docking shows that the HP-SAM and HP-SAH complexes involve 16 and 19 interacting amino acid residues of the HP, respectively (Table 6 and Figure 8A and B). Each of these complexes forms 6 conventional hydrogen bonds; the HP-SAM complex additionally has 8 van der Waals interactions and 2 carbon-hydrogen bonds, whereas the HP-SAH complex has 9 van der Waals interactions and 3 carbon-hydrogen bonds. Hydrogen bonds are a key determinant of ligand-binding specificity. Blind docking showed 18 interacting residues in the HP-SAM complex and 19 in the HP-SAH complex (Table 6 and Figure 8C and D). Each of these complexes also forms 6 conventional hydrogen bonds; the HP-SAM complex has 11 van der Waals interactions and 1 carbon-hydrogen bond, whereas the HP-SAH complex has 8 van der Waals interactions and 11 carbon-hydrogen bonds. Notably, Lys-21, Phe-22, Val-24, Gln-195, and Asp-196 interact with SAM and SAH in both site-specific and blind docking. The SeamDock server gave docking scores of −7.1 and −7.3 kcal/mol for the HP-SAM and HP-SAH complexes, respectively, consistent with the AutoDock Vina predictions (Table 6 and Supplementary Figure 4). The functional domain and MSA analyses suggested that the HP may be a variant of a SAM-dependent MTase, and the docking results support this hypothesis.
We therefore performed the molecular dynamics simulations on the docked complexes generated by site-specific docking (AutoDock Vina).
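Docking scores such as the −7.4 to −7.7 kcal/mol values above are estimates of binding free energy, and the relation Kd = exp(ΔG/RT) converts them into an approximate dissociation constant. A sketch assuming T = 298.15 K, with a hypothetical function name; Vina scores are empirical and only loosely track measured affinities, so the result indicates an order of magnitude at best.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def kd_from_dg(dg_kcal_per_mol: float, temp_k: float = 298.15) -> float:
    """Dissociation constant (mol/L) implied by a binding free energy dG < 0."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


for score in (-7.4, -7.7):
    print(f"{score} kcal/mol -> Kd ~ {kd_from_dg(score):.1e} M")
```

Under these assumptions, scores in this range correspond to low-micromolar Kd values, and each additional −1.4 kcal/mol lowers Kd roughly tenfold at room temperature.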
Molecular docking (MD) analysis of the HP with SAM and SAH.
Abbreviations: HP, hypothetical protein; MD, molecular docking; RMSD, root mean square deviation; SAH,

Molecular docking analysis of the HP with SAM and SAH, showing site-specific (A and B) and blind (C and D) docking. The ribbon indicates the HP, and the green sticks indicate the ligand.

Interacting amino acid residues of the HP-ligand complexes HP-SAM (A and C; site-specific and blind) and HP-SAH (B and D; site-specific and blind) predicted by AutoDock Vina. Yellow sticks depict the ligands, whereas disks represent the interacting amino acids.
Molecular dynamic simulation
MDS was used to assess the stability and performance of the docked complexes by following the atomic movements within them. The anticipated stability and flexibility of the AutoDock Vina complexes HP-SAM and HP-SAH were assessed with a 50-ns MDS run in GROMACS on the WebGRO server. RMSD and RMSF plots were used to evaluate structural changes and residue fluctuations in the complexes. To assess the equilibration and stability of the HP-SAM and HP-SAH complexes, their average potential energies were calculated: −254 405 kJ/mol for HP-SAM and −254 919 kJ/mol for HP-SAH (Supplementary Figure 5). RMSD, Rg, SASA, kinetic energy, enthalpy, volume, and density were all recorded throughout the simulation. RMSD quantifies structural change by measuring how far the Cα atoms deviate from their positions in the reference structure (Figure 9A to D). The average RMSF of each residue was also calculated to assess the local structural flexibility of the HP-SAM and HP-SAH complexes (Figure 9E and F). Because a protein’s Rg correlates with its SASA, the Rg of the HP-SAM and HP-SAH complexes was determined to evaluate structural compactness (Figures 9C and D and 10A and B). The SASA of the HP-SAM complex remains lower than that of the HP-SAH complex throughout the 50-ns run. Structural stability was also assessed from the average number of intramolecular hydrogen bonds in each complex (Figure 10E and F). The HP-SAM complex has an estimated average kinetic energy of 52 481.3 kJ/mol and an average enthalpy of −609 650 kJ/mol, whereas the HP-SAH complex has an average kinetic energy of 52 575.6 kJ/mol and an average enthalpy of −610 548 kJ/mol.
Analysis of the RMSD, RMSF, Rg, SASA, and hydrogen bonds of the docked complexes HP-SAM and HP-SAH characterized their stability and flexibility. The plots of these parameters indicate that both docked complexes remain stable while retaining flexibility, which supports the prediction that the HP is a SAM-dependent MTase.

Molecular dynamics (MD) simulation of the HP-ligand complexes. RMSD over the 50-ns run and RMSF over the 246 amino acid residues are shown for the HP-SAM (A, B, and C) and HP-SAH (D, E, and F) complexes.

Rg, SASA, and hydrogen bond analyses of the HP-SAM (A, C, and E) and HP-SAH (B, D, and F) docked complexes from the MD simulation.
Conclusions
Our analyses indicate that the HP of PM strain HN06 is a potentially valuable protein that contains the functional domain of tRNA (adenine(37)-N6)-methyltransferase TrmO and forms stable complexes with SAM and SAH. The HP appears to be a modified form of SAM-dependent MTase, namely a Class VIII SAM-dependent MTase, and may therefore be an important component in efforts to prevent the spread of pasteurellosis. The biocomputational analyses, in particular the molecular docking and simulation studies, support classifying the HP as a Class VIII SAM-dependent MTase. We conclude that the HP has great potential to advance research on
Supplemental Material
sj-docx-1-bbi-10.1177_11779322231184024 – Supplemental material for In Silico Functional Characterization of a Hypothetical Protein From Pasteurella Multocida Reveals a Novel S-Adenosylmethionine-Dependent Methyltransferase Activity
Supplemental material, sj-docx-1-bbi-10.1177_11779322231184024 for In Silico Functional Characterization of a Hypothetical Protein From Pasteurella Multocida Reveals a Novel S-Adenosylmethionine-Dependent Methyltransferase Activity by Md. Habib Ullah Masum, Sultana Rajia, Uditi Paul Bristi, Mir Salma Akter, Mohammad Ruhul Amin, Tushar Ahmed Shishir, Jannatul Ferdous, Firoz Ahmed, Md. Mizanur Rahaman and Otun Saha in Bioinformatics and Biology Insights
Footnotes
Acknowledgements
The authors acknowledge the Department of Microbiology, Noakhali Science and Technology University, and the Research Cell, NSTU, for providing the research facilities and funding, respectively.
Funding:
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
MHUM, OS, SR, and UPB carried out the studies (data collection, curation, molecular, and data analysis) and participated in drafting the manuscript. MHUM, MSA, MRA, TAS, JF, FA, and MMR critically reviewed and drafted the manuscript. MHUM and OS visualized figures, interpreted data and results, and critically reviewed and edited the manuscript. OS developed the hypothesis, supervised the whole work, and helped to prepare and critically revise the manuscript. All authors read and approved the final manuscript.
Data Availability
No data were used to support this study.
Supplemental Material
Supplemental material for this article is available online.
References
