In-silico approach to characterize the structure and function of a hypothetical protein of Monkeypox virus exploring Chordopox-A20R domain-containing protein activity

Abstract

Background: Monkeypox has emerged as a noteworthy worldwide issue because its case count has been rising daily. The illness presents diverse symptoms, including skin lesions that can spread through contact. Transmission of the virus is complex, and it passes readily between individuals.

Methods: The hypothetical protein MPXV-SI-2022V502225_00135 of the monkeypox virus underwent structural and functional analysis using NCBI-CD Search, Pfam, and InterProScan. Model quality was assessed with PROCHECK, QMEAN, Verify3D, and ERRAT, followed by protein-ligand docking, visualization, and a 100-nanosecond molecular dynamics simulation in Schrodinger Maestro.

Results: Several physicochemical properties were estimated, including a molecular weight of 49147.14 Da and a theoretical pI of 5.62, and the instability index classified the protein as stable. Functional annotation tools predicted that the target protein contains a Chordopox_A20R domain. In secondary structure analysis, the alpha helix was found to be predominant. The three-dimensional (3D) structure of the protein was built using a template protein (PDB ID: 6zyc.1); the model became more stable after YASARA energy minimization and was validated by quality assessment tools, including PROCHECK, QMEAN, Verify3D, and ERRAT. Protein-ligand docking was conducted using PyRx 0.9 to examine the binding and interactions between ligands and the hypothetical protein, focusing on key amino acid residues. The model structure, active site, and binding site were visualized using the CASTp server, FTsite, and PyMOL. A 100-nanosecond molecular dynamics simulation was performed with ligand CID_16124688 to evaluate the stability of the protein-ligand complex.

Conclusion: The analysis revealed notable binding interactions and improved structural stability, which may aid drug or vaccine design for effective antiviral treatment and patient management.

Keywords

Chordopox_A20R domain-containing protein, hypothetical protein, in-silico characterization, Monkeypox virus

Introduction

Monkeypox is caused by the monkeypox virus, a zoonotic orthopoxvirus of the family Poxviridae.¹ Animals such as rodents and monkeys are the primary carriers, but humans can also be infected.² The virus was first identified in 1958 in laboratory monkeys in Copenhagen, Denmark.³ In 1970, during intensified efforts to eradicate smallpox, the Democratic Republic of the Congo reported the first human case of monkeypox.⁴ Monkeypox has been most prevalent in central and western Africa, particularly among people living near tropical rainforests.⁵ In West and Central Africa, only 50 cases of monkeypox were reported in 1990. In 2020, more than 5000 cases were reported worldwide, whereas monkeypox was previously thought to occur only in Africa. In 2022, it was reported in several non-African countries, including the United States and countries in Europe.⁶ The Centers for Disease Control and Prevention (CDC) estimated that, as of August 23, 2023, there were approximately 89,385 monkeypox cases globally, and 94 nations reported cases of monkeypox in 2022.⁷ Because there is currently no effective treatment for the monkeypox virus, according to the CDC's recommendations, people are becoming increasingly anxious and afraid, which is frequently reflected in opinions expressed on social media.⁸

Rapid information gathering is needed to develop an effective therapeutic treatment for this infectious disease. Next-generation sequencing (NGS) is widely used to generate enormous volumes of data relatively quickly. As more organisms are sequenced, the challenge of assigning functions to genes grows.⁹,¹⁰ These genes are translated into many types of crucial proteins, and the molecular functions of more than 30% of them are unknown; such proteins are termed hypothetical proteins (HPs).¹¹ The monkeypox virus contains many hypothetical proteins of unknown function. Knowing the structural and functional characteristics of a hypothetical protein of the monkeypox virus might therefore guide the development of proper treatment. Structural and functional annotation of HPs may also reveal potential biomarkers and pharmacological targets.¹² Using many up-to-date algorithms and software packages, bioinformatics tools provide a platform for determining the structure and function of hypothetical proteins through homology modeling or domain homology searches.¹³ Thus, the objective of the present study was to use bioinformatics tools based on various algorithms to annotate the structural and functional features of a hypothetical protein (accession no. URK21192.1), modeled on the template structure PDB ID 6zyc.1. The findings suggest that this hypothetical protein may play a key role in viral replication and host-pathogen interaction, and this study may provide insight into new approaches to drug design and discovery.

Materials and methods

Sequence retrieval

The NCBI protein database was searched for the HP (accession no. URK21192.1) of the monkeypox virus, and its amino acid sequence was selected. The sequence was retrieved in FASTA format and submitted to several prediction servers for in-silico characterization.

Physicochemical properties analysis

The physical and chemical properties of the target protein sequence including molecular weight, amino acid composition, theoretical pI, instability index, extinction coefficient, atomic composition, estimated half-life, the total number of positively charged residues (Arg + Lys), the total number of negatively charged residues (Asp + Glu), aliphatic index, and grand average of hydropathicity (GRAVY) were analyzed using ExPASy’s ProtParam tool.¹⁴
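Two of the ProtParam descriptors listed above follow simple published formulas: GRAVY is the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The Python sketch below reimplements these two descriptors and the charged-residue counts for illustration only. It is not ProtParam's own code, the function names are hypothetical, and it assumes a sequence made of the 20 standard one-letter amino acid codes.

```python
"""Illustrative re-implementation of a few ProtParam-style descriptors (not ProtParam itself)."""

# Kyte-Doolittle hydropathy scale, the scale used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def mole_percent(seq: str, aa: str) -> float:
    """Percentage of residues in seq that are amino acid aa."""
    return 100.0 * seq.count(aa) / len(seq)


def aliphatic_index_from_percent(ala: float, val: float, ile: float, leu: float) -> float:
    """Aliphatic index from mole percents: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu))."""
    return ala + 2.9 * val + 3.9 * (ile + leu)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index computed directly from a sequence."""
    ala, val, ile, leu = (mole_percent(seq, aa) for aa in "AVIL")
    return aliphatic_index_from_percent(ala, val, ile, leu)


def charged_counts(seq: str) -> tuple:
    """(positive, negative) residue counts: Arg + Lys and Asp + Glu."""
    return seq.count("R") + seq.count("K"), seq.count("D") + seq.count("E")


if __name__ == "__main__":
    # Hypothetical toy peptide, not the URK21192.1 sequence.
    toy = "MKVLIAEDRS"
    print(round(gravy(toy), 3), round(aliphatic_index(toy), 2), charged_counts(toy))
```

Plugging the rounded composition reported in the Results (Ala 1.9%, Val 8.0%, Ile 9.9%, Leu 7.5%) into `aliphatic_index_from_percent` gives about 92.96, close to the reported 92.77; a small gap is expected because the percentages are rounded.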

Identification of conserved domains, motifs, and super families

The Conserved Domain Database (CDD, available at NCBI),¹⁵ Pfam,¹⁶ and InterProScan¹⁷ were used for domain analysis. The Motif¹⁸ server was used to find protein motifs, and the protein folding pattern was predicted using the PFP-FunDSeqE server.

Multiple sequence alignment and phylogenetic analysis

NCBI BLASTp¹⁹ was used to search the non-redundant (nr) protein database for similar sequences. Multiple sequence alignment and phylogenetic analysis were then carried out using CLC Sequence Viewer 8.0. The tree was constructed with the neighbor-joining method and displayed as an unrooted phylogeny.
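The neighbor-joining method named above repeatedly joins the pair of nodes that minimizes the Q-criterion, Q(i, j) = (n − 2)·d(i, j) − rᵢ − rⱼ, where rᵢ is the sum of distances from node i to all other current nodes, and then replaces the pair with a new internal node. CLC Sequence Viewer's internal implementation is not described in the source, so the Python sketch below is a textbook version of the algorithm (Saitou and Nei) run on a hypothetical five-taxon distance matrix; the labels, distances, and node names are toy values, not data from this study.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Build an unrooted tree from a symmetric distance matrix (needs >= 3 taxa).

    Returns a list of (internal_node, child, branch_length) edges.
    """
    # Store distances keyed by unordered pairs so new internal nodes can be added.
    d = {}
    for i, j in combinations(range(len(labels)), 2):
        d[frozenset((labels[i], labels[j]))] = float(dist[i][j])

    def D(a, b):
        return d[frozenset((a, b))]

    nodes = list(labels)
    edges = []
    next_id = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums r_i = sum_k d(i, k) over the current nodes.
        r = {x: sum(D(x, y) for y in nodes if y != x) for x in nodes}
        # Q-criterion: join the pair minimising (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        dij = D(i, j)
        # Branch lengths from the joined pair to the new node u.
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        next_id += 1
        u = f"node{next_id}"
        edges += [(u, i, li), (u, j, lj)]
        nodes = [x for x in nodes if x not in (i, j)]
        # Distance from the new node to every remaining node.
        for k in nodes:
            d[frozenset((u, k))] = (D(i, k) + D(j, k) - dij) / 2
        nodes.append(u)

    # Connect the last three nodes through one central node.
    a, b, c = nodes
    next_id += 1
    w = f"node{next_id}"
    edges += [
        (w, a, (D(a, b) + D(a, c) - D(b, c)) / 2),
        (w, b, (D(a, b) + D(b, c) - D(a, c)) / 2),
        (w, c, (D(a, c) + D(b, c) - D(a, b)) / 2),
    ]
    return edges


if __name__ == "__main__":
    # Hypothetical toy distances (not from this study).
    labels = ["a", "b", "c", "d", "e"]
    dist = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    for parent, child, length in neighbor_joining(labels, dist):
        print(parent, child, length)
```

For this additive toy matrix the sketch recovers leaf branch lengths a = 2, b = 3, c = 4, d = 2, and e = 1 and a total tree length of 17, so path lengths through the tree reproduce the input distances. Real alignment-derived distances are rarely perfectly additive, in which case neighbor-joining gives an approximation.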

Prediction of secondary structure

The self-optimized prediction method with alignment (SOPMA) was used to predict the secondary structure of the target protein.²⁰ To check the accuracy of the SOPMA results, PSIPRED²¹ was also used. The resulting prediction was further assessed to check its validity and structural conformation.

Three-dimensional (3D) structure prediction

The SWISS-MODEL server²² was used to build the 3D structure of the target protein by homology modeling. The server automatically performs a BLASTp search for potential templates for each protein sequence. Template 6zyc.1.A was chosen from the search results, with 96.75% sequence identity, which is high enough for reliable homology modeling. This template is an X-ray diffraction structure of the C-terminal domain of the vaccinia virus DNA polymerase processivity factor component A20.
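Percent identity, such as the 96.75% reported for template 6zyc.1, is computed over an alignment. Conventions differ (for example, in which columns form the denominator), and SWISS-MODEL's exact definition is not given here, so the Python helper below is a hypothetical sketch that counts identical residues over the columns where neither aligned sequence has a gap.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not the real target/template alignment).
    print(percent_identity("MKVLIAE-DRS", "MKVLVAEQDRS"))  # prints 90.0
```

Counting only gap-free columns means insertions and deletions do not lower the score; other tools divide by the full alignment or the shorter sequence length, so identities from different servers are not always directly comparable.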

Energy minimization of the 3D model

The overall energy of the 3D structure from the SWISS-MODEL server was minimized using the YASARA energy minimization server.²³ A lower-energy 3D structure corresponds to a more stable conformation of the target protein. The YASARA force field was therefore used to reach optimal energy stability, so that subsequent analyses start from a stable conformation.

Model quality assessment

The quality of the three-dimensional structure was analyzed with PROCHECK,²⁴ Verify3D,²⁵ and QMEAN²⁶ through the SWISS-MODEL Workspace on the ExPASy server, and with ERRAT.²⁷ The ProSA server was used to estimate the Z-scores of the template and the HP.²⁸ Together, these programs were used to verify the structural geometry and overall quality of the model.

Active site detection

The protein's active site was identified using the Computed Atlas of Surface Topography of proteins (CASTp) server.²⁹ CASTp provides precise, thorough, and quantitative information on a protein's topographic characteristics, locating and measuring pockets on protein surfaces and voids in the interior of three-dimensional structures. It has therefore become an essential platform for predicting the protein regions that interact with ligands. In addition, FTsite, a freely accessible web server (https://ftsite.bu.edu), was used for high-accuracy detection of ligand binding sites.

Molecular docking analysis

Molecular docking was performed using PyRx software to examine how the ligands bind to the protein. The ligands used for docking were tecovirimat (TPOXX; CID_16124688), cidofovir (CID_60613), and ribavirin (CID_100252). Binding affinities, 3D and 2D interaction structures, and docking results were visualized and analyzed in BIOVIA Discovery Studio.

Analysis of molecular dynamics simulation

Molecular dynamics simulations were run in Schrodinger Maestro version 11.8, which generated trajectory snapshots depicting individual atomic movements. The simulation data were then analyzed using the Simulation Interaction Diagram (SID) tool within the Schrodinger package. In addition to the root mean square deviation (RMSD), the root mean square fluctuation (RMSF), radius of gyration (Rg), and solvent accessible surface area (SASA) were investigated.

Results

Physicochemical properties

Several physicochemical properties of the hypothetical protein (accession no. URK21192.1) were estimated with the ProtParam tool and are shown in Table 1. The protein consists of 426 amino acids, of which isoleucine (9.9%), lysine (9.2%), and serine (8.2%) are the most abundant, followed by asparagine (8.0%), valine (8.0%), leucine (7.5%), aspartic acid (7.0%), glutamic acid (6.8%), phenylalanine (6.6%), threonine (5.2%), tyrosine (5.2%), arginine (3.8%), proline (2.3%), alanine (1.9%), glutamine (1.6%), methionine (1.4%), and cysteine (1.2%). The sequence contains no pyrrolysine or selenocysteine. The protein has a molecular weight of 49147.14 Da, a theoretical pI of 5.62, an aliphatic index of 92.77, and a grand average of hydropathicity (GRAVY) of −0.258. Its instability index (II) is 36.35, below the threshold of 40, which classifies the protein as stable. The numbers of positively charged (Arg + Lys) and negatively charged (Asp + Glu) residues were estimated to be 55 and 59, respectively. The estimated half-life is 30 h in mammalian reticulocytes (in vitro), >20 h in yeast, and >10 h in Escherichia coli. The molecular formula of the HP is C₂₂₃₁H₃₄₇₃N₅₆₁O₆₆₅S₁₁.

Table 1.

Physicochemical properties of the MPXV-SI-2022V502225_00135 estimated by ProtParam tool.

Description	Value
No. of amino acids	426
Molecular weight (Da)	49147.14
Theoretical pI	5.62
No. of positively charged residues (Arg + Lys)	55
No. of negatively charged residues (Asp + Glu)	59
No. of atoms	6941
Instability index	36.35
Aliphatic index	92.77
Grand average of hydropathicity (GRAVY)	−0.258
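The molecular formula reported in the Results can be cross-checked against Table 1. The Python sketch below is an illustrative helper, not ProtParam's code: it parses the formula (written here in ASCII), counts atoms, and estimates the average mass from standard atomic weights (C 12.011, H 1.008, N 14.007, O 15.999, S 32.06). These constants are an assumption, and ProtParam's own values may differ slightly in the last decimals.

```python
import re

# Standard (abridged) atomic weights; ProtParam's internal constants may differ slightly.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula: str) -> dict:
    """Parse a simple formula such as 'C2H6O' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def average_mass(counts: dict) -> float:
    """Average molecular mass in Da from element counts."""
    return sum(ATOMIC_WEIGHT[element] * n for element, n in counts.items())


if __name__ == "__main__":
    counts = parse_formula("C2231H3473N561O665S11")  # formula of URK21192.1 from the Results
    print(sum(counts.values()), round(average_mass(counts), 2))
```

With these constants the formula gives 6941 atoms, matching Table 1, and an average mass of about 49147.2 Da, within about 0.1 Da of the reported 49147.14 Da. The 11 sulfur atoms are also consistent with the reported methionine (1.4%) and cysteine (1.2%) content, which corresponds to roughly 6 + 5 residues out of 426.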

Function prediction by domain and motif analysis

Based on predictions by NCBI-CD Search, Pfam, and InterProScan, the target protein contains a domain of the Chordopox_A20R superfamily (E-value 6.77 × 10⁻¹⁶⁷) and is classified as a Chordopox_A20R domain-containing protein. The A20R protein is required for DNA replication, is associated with the processive form of the viral DNA polymerase, and directly interacts with the viral proteins encoded by the D4R, D5R, and H5R open reading frames. A20R may contribute to the assembly or stability of the multiprotein DNA replication complex. The NCBI-CD server predicted the Chordopox_A20R superfamily domain at amino acid residues 1–332, and Pfam and InterProScan also predicted the Chordopox_A20R superfamily domain (Table 2).

Table 2.

Functional annotation results of different tools.

Tools	Domain	Interval	E-value	Description
NCBI-CDD	Chordopox_A20R superfamily	1–332	6.69 × 10⁻¹⁶⁷	Chordopoxvirus A20R protein; interacts with proteins encoded by the D4R, D5R, and H5R open reading frames
InterProScan	Chordopoxvirus A20R	1–332	6.69 × 10⁻¹⁶⁷
Motif	Chordopoxvirus A20R	1–332	3 × 10⁻¹³³

Multiple sequence alignment and phylogenetic tree

BLASTp searches against the non-redundant (nr) database showed sequence identities of 99.77–100% with known DNA polymerase processivity factor proteins of the monkeypox virus (Table 3). Multiple sequence alignment (MSA) of the top ten proteins retrieved from the BLASTp results was performed to inspect conserved and dissimilar residues among the homologs (Figure 1). The phylogenetic tree showed that the target protein and UVB85072.1, from the same organism, share a common ancestor with YP_010377130.1 (Figure 2). The scale bar (0.002) represents the amount of genetic change, that is, the estimated sequence divergence.

Table 3.

BLASTp hits from the non-redundant (nr) database showing similarity to the target protein.

Accession no	Organism	Protein name	Score	Identity (%)
URK21192.1	Monkeypox virus	Hypothetical protein	855	100
YP_010377130.1	Monkeypox virus	DNA polymerase processivity factor	855	100
UYL69626.1	Monkeypox virus	DNA polymerase processivity factor	855	99.77
WDO28958.1	Monkeypox virus	DNA polymerase processivity factor	855	99.77
UVB85072.1	Monkeypox virus	DNA polymerase processivity factor	855	99.77
UXP70025.1	Monkeypox virus	DNA polymerase processivity factor	854	99.77
UWO30788.1	Monkeypox virus	DNA polymerase processivity factor	854	99.77
UYF07623.1	Monkeypox virus	DNA polymerase processivity factor	854	99.77
UZL93661.1	Monkeypox virus	DNA polymerase processivity factor	854	99.77
WCS76309.1	Monkeypox virus	DNA polymerase processivity factor	854	99.77

Figure 1.

Multiple sequence alignment (MSA) of DNA polymerase processivity factor proteins of the monkeypox virus, generated with CLC Sequence Viewer version 8.

Figure 2.

Phylogenetic tree showing evolutionary relationship of the target protein with DNA polymerase processivity factor proteins.

Prediction of secondary structure

The secondary structure of the protein was predicted with PSIPRED, SOPMA, and the ENDscript server. According to SOPMA, the alpha helix was the predominant element (42.49%), followed by random coil (27.46%), extended strand (22.77%), and beta-turn (7.28%). Similar results were obtained from ENDscript and PSIPRED (alpha helix: 38.26%, random coil: 34.50%, and extended strand: 26.99%). The secondary structure predicted by PSIPRED is shown in Figure 3.
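The SOPMA percentages are fractions of the 426 residues assigned to each state. As a consistency check, the Python sketch below takes a per-residue state string (one letter per residue; the letters H, E, C, and T are an assumed encoding, not necessarily SOPMA's output format) and converts it to percentages. The residue counts in the example (181 helix, 97 strand, 117 coil, 31 turn) are back-calculated from the reported percentages and the 426-residue length, not read from the SOPMA output.

```python
from collections import Counter


def ss_composition(states: str, digits: int = 2) -> dict:
    """Percentage of residues in each secondary-structure state letter."""
    counts = Counter(states)
    return {state: round(100.0 * n / len(states), digits) for state, n in counts.items()}


if __name__ == "__main__":
    # Counts back-calculated from the reported SOPMA percentages (426 residues in total).
    states = "H" * 181 + "E" * 97 + "C" * 117 + "T" * 31
    print(ss_composition(states))
```

These back-calculated counts sum to exactly 426 and reproduce all four SOPMA percentages, so the reported values are internally consistent.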

Figure 3.

Predicted secondary structure of the target protein using PSIPRED server.

Tertiary structure prediction and energy minimization

The tertiary structure of the HP was obtained from the SWISS-MODEL server using the template 6zyc.1, which shares 96.75% sequence identity with the target protein. The structure obtained through SWISS-MODEL is depicted in Figure 4. YASARA energy minimization lowered the energy of the modeled protein from −52684.8 kJ/mol to −72589.0 kJ/mol, and the model's score improved from −1.77 before minimization to 0.39 after, indicating a more stable form.

Figure 4.

Predicted 3-dimensional structure of the target protein through SWISS-MODEL server after YASARA energy minimization.

Model quality assessment

The quality of the modeled 3D structure was assessed with PROCHECK, Verify3D, QMEAN, and ERRAT. According to PROCHECK, 92.9% of amino acid residues fell within the most favored regions of the Ramachandran plot (Table 4 and Figure 5). The model passed the Verify3D check, with 100% of the residues having an averaged 3D-1D score ≥ 0.2. QMEAN placed the model inside the dark gray zone, with a QMEAN4 value reflecting good structural quality (Figure 6). ERRAT also predicted the protein structure to be of good quality, with a quality factor of 100.

Table 4.

Ramachandran plot analysis.

Statistics	Number of A.A residues	Percentage (%)
Residues in the most favored regions [A, B, L]	107	94.7
Residues in the additional allowed regions [a, b, l, p]	5	4.4
Residues in the generously allowed regions [~a, ~b, ~l, ~p]	1	0.9
Residues in disallowed regions	0	0.0
Number of nonglycine and nonproline residues	113	100
Number of end-residues (excl. Gly and Pro)	2
Number of glycine residues (shown as triangles)	6
Number of proline residues	2
Total number of residues	123
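The percentages in Table 4 are relative to the 113 non-glycine, non-proline residues rather than to all 123 residues, because glycine, proline, and chain-end residues are reported separately. The short Python sketch below is a hypothetical helper, not part of PROCHECK, that reproduces the table's arithmetic from the residue counts.

```python
def ramachandran_summary(core, allowed, generous, disallowed, end_residues, glycines, prolines):
    """Reproduce PROCHECK-style Ramachandran percentages from residue counts."""
    regions = {"core": core, "allowed": allowed, "generous": generous, "disallowed": disallowed}
    # Percentages use only the non-glycine, non-proline, non-end residues as the denominator.
    denominator = sum(regions.values())
    percentages = {name: round(100.0 * n / denominator, 1) for name, n in regions.items()}
    total = denominator + end_residues + glycines + prolines
    return denominator, total, percentages


if __name__ == "__main__":
    print(ramachandran_summary(107, 5, 1, 0, end_residues=2, glycines=6, prolines=2))
```

With the Table 4 counts this gives a denominator of 113, a total of 123 residues, and 94.7%, 4.4%, 0.9%, and 0.0% for the four regions, matching the table.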

Figure 5.

Quality assessment of the model of Ramachandran plot of model structure validated by PROCHECK program.

Figure 6.

Graphical representation of QMEAN result (a), Z-score (b), and Verify3D result (c) of the model structure.

Active site determination and molecular docking analysis

The active site was predicted using the CASTp server (Figure 7). CASTp identifies pockets on protein surfaces and voids buried within proteins, delineates their boundaries, and computes their areas and volumes. These measurements are made on both the solvent accessible surface (Richards surface) and the molecular surface (Connolly surface). The most likely active site was predicted to be one of the largest pockets, with a solvent accessible (SA) surface area of 39.371 Å² and a volume of 13.707 Å³. The key active-site residues predicted from this pocket, shown in Figure 7(a), are LYS³⁰⁶, TYR³⁰⁷, PHE³⁰⁸, SER³⁰⁹, GLU³⁸⁰, LYS³⁸³, VAL³⁸⁵, and ASN⁴¹⁵. In addition, ligand binding sites were detected with the web-based FTsite server; these sites help relate the protein's structure to its ligand-binding function.

Figure 7.

Active site predicted by the CASTp server: (a) the active site of the HP shown as red spheres, with active amino acid residues highlighted in gray, and (b) the active site on chain A.

Docking of the hypothetical protein with the ligands tecovirimat, ribavirin, and cidofovir was performed using PyRx 0.9. Their binding affinities for the hypothetical protein were −7.1 kcal/mol, −4.7 kcal/mol, and −4.4 kcal/mol, respectively. Interacting residues in the active site are listed in Table 5, and the docking results are shown in Table 6 and Figure 8.

Table 5.

Binding interactions of tecovirimat (CID_16124688) with the hypothetical protein (URK21192.1).

Protein	Ligand	Van der Waals	Pi-Pi stacked	Alkyl
Hypothetical protein	Tecovirimat (CID_16124688)	ASN373, MET375, ARG376, and PHE410	PHE377	PHE354, PHE414, ILE379, VAL384, and PHE414

Table 6.

The docking score of different ligands with hypothetical protein.

Chemical identifier	Chemical name	Docking score (kcal/mol)
CID_16124688	Tecovirimat	−7.1
CID_100252	Ribavirin monophosphate	−4.7
CID_60613	Cidofovir	−4.4
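Docking scores in kcal/mol are often read as approximate binding free energies. Assuming that interpretation and a temperature of 298.15 K (both are assumptions: PyRx docking scores are empirical estimates, not measured affinities), the relation ΔG = RT ln Kd gives a rough dissociation constant. The Python sketch below shows this conversion; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def kd_from_score(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Rough dissociation constant (mol/L) from a binding free energy via dG = RT ln Kd."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    for name, score in [("tecovirimat", -7.1), ("ribavirin monophosphate", -4.7), ("cidofovir", -4.4)]:
        print(f"{name}: {kd_from_score(score):.2e} M")
```

Under these assumptions, −7.1 kcal/mol corresponds to a Kd of roughly 6 µM and −4.4 kcal/mol to roughly 0.6 mM; at 298.15 K each difference of about 1.36 kcal/mol corresponds to a tenfold change in Kd.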

Figure 8.

(a) Predicted three-dimensional structure of the target protein bound to the ligand, visualized in BIOVIA Discovery Studio, showing ligand CID_16124688 bound to hypothetical protein URK21192.1, and (b) 2D structure of the protein-ligand binding complex.

Analysis of molecular dynamics

Molecular dynamics (MD) simulation plays a crucial role in post-docking analysis, enabling exploration of the time-dependent stability and atomic movements of a complex in a biological environment. Key MD analyses include RMSD, RMSF, SASA, and Rg, which together provide a comprehensive picture of the molecular behavior of protein-ligand complexes. These analyses were conducted on a 100 ns trajectory of the monkeypox protein model (built on template 6zyc.1) in complex with tecovirimat (CID_16124688).

RMSD analysis

The root mean square deviation (RMSD) of the protein backbone atoms tracks the displacement of selected atoms relative to a reference structure over time.³¹ The RMSD of the tecovirimat complex (Figure 9(a)) was largely stable from 0 to 50 ns, apart from a sudden fluctuation of 0.534 Å at 30 ns, and became less stable thereafter. The maximum, minimum, and average RMSD values were 0.855 Å, 0.126 Å, and 0.421 Å, respectively.
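RMSD at time t is √(1/N Σᵢ |rᵢ(t) − rᵢ(ref)|²) over the N selected atoms, usually computed after superposing each frame on the reference. The pure-Python sketch below assumes the frames are already superposed (it performs no fitting) and summarizes a trajectory by its maximum, minimum, and average RMSD, the statistics reported above. The function names and toy coordinates are hypothetical.

```python
import math


def rmsd(ref, frame):
    """RMSD between two equally ordered lists of (x, y, z) coordinates, without fitting."""
    if len(ref) != len(frame):
        raise ValueError("coordinate lists must have the same number of atoms")
    squared = sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
                  for a, b in zip(ref, frame))
    return math.sqrt(squared / len(ref))


def rmsd_summary(ref, frames):
    """Maximum, minimum, and average RMSD of each frame against the reference."""
    values = [rmsd(ref, f) for f in frames]
    return max(values), min(values), sum(values) / len(values)


if __name__ == "__main__":
    # Toy two-atom "trajectory" in Å (hypothetical coordinates).
    reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    frames = [reference, [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0)], [(0.0, 0.3, 0.0), (1.5, 0.3, 0.0)]]
    print(rmsd_summary(reference, frames))
```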

Figure 9.

RMSD (a), RMSF (b), Rg (c), and SASA (d) of the complex of the 6zyc.1-based model with CID_16124688 over the 100 ns simulation.

RMSF analysis

The root mean square fluctuation (RMSF) measures the flexibility of individual residues in a complex, that is, how much each residue moves (fluctuates) during a simulation.³⁰ The RMSF showed continuous minor fluctuations of around 3 Å during the simulation (Figure 9(b)).
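RMSF for atom i is √(1/T Σₜ |rᵢ(t) − ⟨rᵢ⟩|²), the fluctuation about its time-averaged position over T frames. The Python sketch below assumes already-superposed frames and returns one value per atom; obtaining per-residue values from a representative atom such as Cα is a common choice but an assumption here, not something the source specifies. The function name and toy trajectory are hypothetical.

```python
import math


def rmsf(frames):
    """Per-atom RMSF over a trajectory of already-superposed frames.

    frames: list of frames, each a list of (x, y, z) tuples in the same atom order.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared deviation of atom i from that average position.
        msd = sum(sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames) / n_frames
        values.append(math.sqrt(msd))
    return values


if __name__ == "__main__":
    # Toy trajectory: atom 0 moves along x between frames, atom 1 stays fixed.
    traj = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
            [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
    print(rmsf(traj))
```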

Radius of gyration (Rg)

The radius of gyration (Rg) is the root mean square distance of a structure's atoms from its center of mass. Rg reflects the compactness of the protein-ligand complex and hence its mobility and rigidity with the bound ligand.³² The protein-ligand complex remained stable throughout the 100 ns simulation (Figure 9(c)). The minimum, maximum, and average Rg values for tecovirimat (CID_16124688) were 4.689 Å, 4.984 Å, and 4.86 Å, respectively; this narrow range (about 0.3 Å) indicates that the compactness of the complex changed little over the simulation.
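The radius of gyration is Rg = √(Σᵢ mᵢ|rᵢ − r_cm|² / Σᵢ mᵢ), where r_cm is the mass-weighted center. The Python sketch below computes it for a single frame. When no masses are supplied it uses equal weights, which is an assumption (analysis tools may weight by atomic mass); the function name is hypothetical.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one frame of (x, y, z) coordinates."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal weights when masses are not given
    total = sum(masses)
    # Mass-weighted center of mass.
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    # Mass-weighted mean squared distance from the center.
    msd = sum(m * sum((c[k] - center[k]) ** 2 for k in range(3))
              for m, c in zip(masses, coords)) / total
    return math.sqrt(msd)


if __name__ == "__main__":
    print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # prints 1.0
```

Applying this function to every frame of a trajectory and taking the minimum, maximum, and mean yields the kind of Rg summary reported above.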

Solvent accessible surface area (SASA)

The solvent accessible surface area (SASA) of a protein-ligand complex describes its interaction with the solvent (hydrophobic or hydrophilic behavior).³² The SASA of the complex over the simulation is plotted in Figure 9(d) and suggests tight binding of the compound to the hypothetical protein.
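A common way to compute SASA is the Shrake-Rupley method: place test points on a sphere of radius (atomic radius + probe radius) around each atom and count the points that are not buried inside any neighboring expanded sphere. The source does not state which algorithm the Schrodinger analysis uses, so the Python sketch below is a generic illustration with an assumed 1.4 Å probe, an assumed golden-spiral point set, and hypothetical function names. It scales as O(N² × points) and is meant only for small examples.

```python
import math


def sphere_points(n):
    """Approximately uniform points on the unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n  # y runs from near +1 to near -1
        r = math.sqrt(1.0 - y * y)
        theta = golden_angle * i
        points.append((r * math.cos(theta), y, r * math.sin(theta)))
    return points


def shrake_rupley(centers, radii, probe=1.4, n_points=960):
    """Per-atom solvent accessible surface area (Å^2) by the Shrake-Rupley method."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(centers, radii)):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            # Test point on the probe-expanded sphere of atom i.
            px, py, pz = ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz
            buried = any(
                (px - cj[0]) ** 2 + (py - cj[1]) ** 2 + (pz - cj[2]) ** 2 < (rj + probe) ** 2
                for j, (cj, rj) in enumerate(zip(centers, radii)) if j != i
            )
            if not buried:
                exposed += 1
        # Exposed fraction of points times the area of the expanded sphere.
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


if __name__ == "__main__":
    # Two hypothetical atoms of radius 1.7 Å placed 2.0 Å apart.
    per_atom = shrake_rupley([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.7, 1.7])
    print([round(a, 1) for a in per_atom], round(sum(per_atom), 1))
```

For the two-atom example, the analytic exposed area of each expanded sphere is 2π·3.1·(2·3.1 − 2.1) ≈ 79.9 Å², and the sampled result approaches it as `n_points` grows.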

Discussion

Monkeypox (MPX) is a zoonotic disease, but its animal reservoir remains unknown. Various rodent species from Central and West African tropical rainforests, including tree squirrels and Gambian pouched rats, are currently considered strong candidates.³³,³⁴ The monkeypox virus belongs to the family Poxviridae, subfamily Chordopoxvirinae, and genus Orthopoxvirus.³⁵ There are two genetic clades, with genomes differing by less than 1%; the first is endemic in Central Africa and the second in West Africa.³⁶,³⁷ Symptoms of MPX include fever, head and muscle aches, lymphadenopathy, and a characteristic rash that develops into papules, vesicles, and pustules, which eventually scab over and heal. Licensed vaccines for MPX remain limited, and researchers are working to develop monkeypox vaccines. Moreover, over 70% of people living today have never been vaccinated against smallpox.³⁸

Research on hypothetical proteins has yet to keep up with the rapid development of low-cost sequencing technology, despite the enormous amount of genomic and proteomic data that has been generated.³⁹ Characterization of hypothetical proteins can improve knowledge of viral metabolic pathways, disease progression, drug development, and disease control strategies.⁴⁰ In this study, physicochemical analysis estimated that the protein contains 426 amino acids, with a molecular weight of 49147.14 Da, a theoretical pI of 5.62, an aliphatic index of 92.77, a grand average of hydropathicity (GRAVY) of −0.258, and an instability index of 36.35 (Table 1).

In this investigation, all annotation tools predicted with high confidence that the target hypothetical protein is a Chordopox_A20R domain-containing protein (Table 2). BLASTp results against the non-redundant (nr) database showed high homology (above 99% sequence identity) with known proteins from other monkeypox virus isolates, including A22R of Monkeypox virus Zaire-96-I-16, supporting this prediction (Table 3).

Several Chordopoxvirus A20R proteins belong to this family. The A20R protein interacts directly with the viral proteins encoded by the D4R, D5R, and H5R open reading frames and may contribute to the assembly or stability of the multiprotein DNA replication complex.³⁰ It is also thought to be required for DNA replication and is associated with the processive form of the viral DNA polymerase. D4R, D5R, and H5R encode uracil-DNA glycosylase, a nucleoside triphosphatase (NTPase), and the late transcription factor VLTF-4, respectively. It has also been suggested that, unlike VLTF-1, -2, and -3, which are synthesized at elevated levels after viral DNA replication, VLTF-4 is synthesized both before and after viral DNA synthesis. Based on its expression pattern and subcellular distribution, the H5R gene product may play different roles throughout the viral life cycle during productive infection.⁴¹ The poxviral uracil-DNA glycosylase repairs the viral DNA genome by removing uracil from DNA, creating an apyrimidinic site.⁴²

The secondary structure comprises random coils, alpha helices, beta-turns, and extended strands, with alpha helices predominating. The three-dimensional (3D) structure obtained from the SWISS-MODEL server passed all model quality assessment tools, including PROCHECK, Verify3D, QMEAN, and ERRAT, and became more stable after YASARA energy minimization, with its score improving to 0.39. The Z-score was calculated as −6.0, indicating good overall model quality (Figure 6(a)–(c)). The local model quality profile, based on the knowledge-based energy of the residues, was also stable.

The protein's folding arrangement creates a pocket or groove known as the active site. The active-site residues computed by the CASTp server were consistent with the predictions of the functional annotation tools and lie in the Chordopoxvirus A20R superfamily domain region. The function of an active site depends on its three-dimensional structure as well as on the electrical and chemical characteristics of its amino acids and any cofactors. The active site identified here had a solvent accessible surface area (SASA) of 39.371 Å² and a volume of 13.707 Å³ (Figure 7).

Molecular docking was performed with PyRx 0.9 to examine the interactions between the target protein and the ligands tecovirimat, cidofovir, and ribavirin. Tecovirimat showed the strongest binding affinity for the hypothetical protein, with a docking score of −7.1 kcal/mol. The 3D and 2D structures of the protein-ligand (HP and tecovirimat) complex are shown in Figure 8. The binding interactions of tecovirimat (CID_16124688) with the hypothetical protein (URK21192.1) are presented in Table 5, and the ligand binding sites were also identified with FTsite (Figure 10). Identifying binding sites is a classic problem that is important for many applications, including protein engineering, drug design, structure-based prediction of function, and elucidation of functional relationships between proteins.⁴³

Because tecovirimat (CID_16124688) gave the best docking result, it was selected for molecular dynamics simulation. Analysis of the full trajectory showed a moderate level of stability for the complex. Interactions of this kind have previously been reported for various compounds.⁴⁴,⁴⁵

Figure 10.

Identification of the binding site by FTsite, after processing in BIOVIA Discovery Studio.

This hypothetical protein annotation strategy may help in designing effective medicines or vaccines against MPX. Studying individual domains and effectors also helps in understanding antiviral mechanisms. These results underline the importance of continued research into Chordopoxvirus A20R and its effectors, not only in monkeypox but also in other pathogenic microorganisms, to develop future treatment strategies.

Conclusion

This protein has numerous characteristics relevant to a virus that poses a severe threat to human health. In recent years, significant work has been done to understand the functions of the Chordopoxvirus A20R protein. However, many structural and functional aspects of this protein and its effectors are still unknown. Annotation of this hypothetical protein may help in designing effective drugs or vaccines and in understanding antiviral mechanisms. Further research and experimental validation are needed to confirm our findings about this crucial protein.

Footnotes

Acknowledgments

We thank the Department of Biotechnology and Genetic Engineering, Noakhali Science and Technology University, Noakhali-3814, Bangladesh, for the opportunity to use its facilities to conduct this research. We are also very thankful to Md. Aktaruzzaman and the Laboratory of Pharmaceutical Technology, Department of Pharmacy, Jashore University of Science and Technology, Jashore, Bangladesh, for partially providing computational support.

Author contributions

MIH and MNI performed the experiments, interpreted the data, and wrote the draft manuscript. MEM, MNI, ASMF, and SHJ analyzed the data and edited the draft manuscript. MUSK, THE, MAH, FA, and MAK reviewed and edited the manuscript. SHJ formatted the figures and tables. MAM contributed to the study conception and design, data analysis, and review and editing of the manuscript. All authors approved the final version.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Sadia Hossain Jeba

References

1. McCollum, Damon. Human monkeypox. Clin Infect Dis 2014; 58:260–267.

2. Alakunle, Moens, Nchinda, et al. Monkeypox virus in Nigeria: infection biology, epidemiology, and evolution. Viruses 2020; 12:1257.

3. Chavda, Vora, Apostolopoulos. Monkeypox: a new face of outbreak. Expert Rev Vaccines 2022; 21:1537–1540.

4. Nolen, Osadebe, Katomba, et al. Extended human-to-human transmission during a monkeypox outbreak in the Democratic Republic of the Congo. Emerg Infect Dis 2016; 22:1014–1021.

5. Khodakevich, Ježek, Messinger. Monkeypox virus: ecology and public health significance. Bull World Health Organ 1988; 66(6):747–752.

6. World Health Organization. Multi-country monkeypox outbreak in non-endemic countries. Geneva: World Health Organization 2022. https://www.who.int/emergencies/disease-outbreak-news/item/2022-DON385 (accessed 29 October 2023).

7. Centers for Disease Control and Prevention. 2022 Mpox outbreak global map. Atlanta, GA: Centers for Disease Control and Prevention 2023. https://www.cdc.gov/poxvirus/Mpox/response/2022/world-map.html (accessed 29 October 2023).

8. Bragazzi, Kong, Mahroum, et al. Epidemiological trends and clinical features of the ongoing monkeypox epidemic: a preliminary pooled data analysis and literature review. J Med Virol 2023; 95:e27931.

9. Choi H-P, Juarez, Ciordia, et al. Biochemical characterization of hypothetical proteins from Helicobacter pylori. PLoS One 2013; 8:e66605.

10. Morozova, Marra. Applications of next-generation sequencing technologies in functional genomics. Genomics 2008; 92:255–264.

11. Shahbaaz, Bisetty, Ahmad, et al. Current advances in the identification and characterization of putative drug and vaccine targets in the bacterial genomes. Curr Top Med Chem 2016; 16:1040–1069.

12. Lubec, Afjehi-Sadat, Yang J-W, et al. Searching for hypothetical proteins: theory and practice based upon original data and literature. Prog Neurobiol 2005; 77:90–127.

13. Kootery, Sarojini. Structural and functional characterization of a hypothetical protein in the RD7 region in clinical isolates of Mycobacterium tuberculosis: an in silico approach to candidate vaccines. J Genet Eng Biotechnol 2022; 20:55.

14. Gasteiger, Gattiker, Hoogland, et al. ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 2003; 31:3784–3788.

15. Wang, Chitsaz, et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res 2020; 48(D1):D265–D268.

16. Mistry, Chuguransky, Williams, et al. Pfam: the protein families database in 2021. Nucleic Acids Res 2020; 49(D1):D412–D419.

17. Blum, Chang H-Y, Chuguransky, et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 2020; 49(D1):D344–D354.

18. Bateman, Coin, Durbin, et al. The Pfam protein families database. Nucleic Acids Res 2004; 32:D138–D141.

19. Altschul, Gish, Miller, et al. Basic local alignment search tool. J Mol Biol 1990; 215:403–410.

20. Geourjon, Deléage. SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Bioinformatics 1995; 11:681–684.

21. Jones. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999; 292:195–202.

22. Schwede, Kopp, Guex, et al. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res 2003; 31:3381–3385.

23. Krieger, Joo, Lee, et al. Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: four approaches that performed well in CASP8. Proteins 2009; 77:114–122.

24. Laskowski, MacArthur, Moss, et al. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 1993; 26:283–291.

25. Lüthy, Bowie, Eisenberg. Assessment of protein models with three-dimensional profiles. Nature 1992; 356:83–85.

26. Benkert, Biasini, Schwede. Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 2011; 27:343–350.

27. Colovos, Yeates. Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci 1993; 2:1511–1519.

28. Wiederstein, Sippl. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res 2007; 35:W407–W410.

29. Tian, Chen, Lei, et al. CASTp 3.0: computed atlas of surface topography of proteins. Nucleic Acids Res 2018; 46:W363–W367.

30. Fatriansyah, Rizqillah, Yandi, et al. Molecular docking and dynamics studies on propolis sulabiroin-A as a potential inhibitor of SARS-CoV-2. J King Saud Univ Sci 2022; 34(1):101707.

31. Doty, Malekani, Kalemba, et al. Assessing monkeypox virus prevalence in small mammals at the human–animal interface in the Democratic Republic of the Congo. Viruses 2017; 9:283.

32. Tiee, Harrigan, Thomassen, et al. Ghosts of infections past: using archival samples to understand a century of monkeypox virus prevalence among host communities across space and time. R Soc Open Sci 2018; 5:171089.

33. Lamb, Kolakofsky. Paramyxoviridae: the viruses and their replication. In: Fields, Knipe, Howley (eds). Fields Virology. Philadelphia: Lippincott-Raven Press, 1996.

34. Chen, Liszewski, et al. Virulence differences between monkeypox virus isolates from West Africa and the Congo Basin. Virology 2005; 340:46–63.

35. Likos, Sammons, Olson, et al. A tale of two clades: monkeypox viruses. J Gen Virol 2005; 86:2661–2672.

36. Kmiec, Kirchhoff. Monkeypox: a new threat? Int J Mol Sci 2022; 23:7866.

37. Low, Mohtar, Ang, et al. Connecting proteomics to next-generation sequencing: proteogenomics and its current applications in biology. Proteomics 2019; 19:e1800235.

38. Sen, Verma. Functional annotation and curation of hypothetical proteins present in a newly emerged serotype 1c of Shigella flexneri: emphasis on selecting targets for virulence and vaccine design studies. Genes 2020; 11:340.

39. Ishii, Moss. Mapping interaction sites of the A20R protein component of the vaccinia virus DNA replication complex. Virology 2002; 303(2):232–239.

40. Delhon, Tulman, Afonso, et al. Genomes of the parapoxviruses ORF virus and bovine papular stomatitis virus. J Virol 2004; 78:168–177.

41. Upton, Stuart, McFadden. Identification of a poxvirus gene encoding a uracil DNA glycosylase. Proc Natl Acad Sci U S A 1993; 90:4518–4522.

42. Uddin, Niloy, Aktaruzzaman, et al. Neuropharmacological assessment and identification of possible lead compound (apomorphine) from Hygrophila spinosa through in-vivo and in-silico approaches. J Biomol Struct Dyn 2024; 22:1–16.

43. Mahmud, Biswas, Kumar Paul, et al. Antiviral peptides against the main protease of SARS-CoV-2: a molecular docking and dynamics study. Arab J Chem 2021; 14(9):103315.

44. Fakhar, Khan, AlOmar, et al. ABBV-744 as a potential inhibitor of SARS-CoV-2 main protease enzyme against COVID-19. Sci Rep 2021; 11:234.

45. Kumar, Bhardwaj, Kumar, et al. Reprofiling of approved drugs against SARS-CoV-2 main protease: an in-silico study. J Biomol Struct Dyn 2020; 40:1–15.