Abstract
Human papillomavirus 16 (HPV16) is considered to be strongly correlated with the development of cervical cancer. Among the 8 HPV16 genes, the E6 constitutes a remarkable marker to follow the evolutionary history and spatial phylodynamics of HPV16 in the Mediterranean basin. Thus, this work aims to decipher the major evolutionary events and crosstalks in the Mediterranean basin with a focus on Tunisian strains regarding the E6 oncogene. In this study, we first extracted the available and annotated Mediterranean strains of HPV16 E6 gene sequences (n = 155) from the NCBI nucleotide database. These sequences were aligned, edited, and used for the downstream phylogenetic analyses. Finally, a Bayesian Markov Chain Monte Carlo approach was applied to reconstruct the evolutionary history of HPV16 migration. Our results showed that the HPV circulating in Tunisia derived from a Croatian ancestor around the year 1987. This starting point spreads to most European countries to reach northern Africa through the Moroccan gateway in 2004.
Introduction
Human papillomavirus (HPV) is the most common sexually transmitted virus. Around 10% of the infected women will undergo precancerous lesions in their cervical tissue (Yim and Park, 2005) 1 . Depending on their oncogenic potential, HPVs can cause either genital warts when associated to low-risk types (6, 11, 40, 42, 43, 44, and 54) or cervical cancer when it comes to high-risk alpha-HPV types (16, 18, 31, 33, 35, 39, 45, 51, 52, and 58). 2 HPV16 and 18 are the most common high-risk HPV genotypes, accounting for nearly 80% of all cervical cancer cases. 3 This cancer is the fourth most common cancer affecting women worldwide, 4 with an estimation of 604 000 new cases in 2020, mainly affecting women aged from 45 to 59 years. 5 A scopus on the Tunisian prevalence for this cancer positioned it as the fifth cancer affecting women in the country representing 5.74% of total cancer cases in 2020, leading to 185 deaths per year. 6
This double-stranded DNA (dsDNA) virus is organized into 3 genomic regions: the early (E), the long control region (LCR), and the late (L) region. Its genome has shown a slow evolution rate compared with other virus families. Indeed, HPV coding region substitution rates are estimated to be between 2 × 10−8 and 5 × 10−9 substitutions per site per year. 7 Typically occurring in dsDNA viruses, this low rate results from the absence of DNA polymerase, prompting the virus to hijack the host’s cellular repair machinery during replication. The E6 oncogene, encoding 158 amino acids, is involved in cell cycle alteration and cell differentiation inhibition, 8 making it a suitable genetic marker for tracking phylogenetic and phylogeographic evolution of papillomaviruses. Previous phylogenetic analyses focused on the prevalent HPV serotypes in Tunisia and North Africa characterized solely the lineages. 3 To our best knowledge, the transmission routes of this virus were not yet elucidated. This work represents an unprecedented phylogeny and phylogeography analysis of the virus, aiming to explore the evolutionary processes of the HPV16 E6 gene that shape the genetic diversity of the HPV strains occurring in Tunisia and its surrounding Mediterranean areas.
Material and Methods
Dataset
Human papillomavirus 16 (HPV16) E6 gene sequences were extracted from the GenBank database. The sequences used for the phylogenetic analysis belonged to the following countries: Tunisia (n = 15), Croatia (n = 4), Greece (n = 16), Spain (n = 15), Italy (n = 34), Egypt (n = 4), Algeria (n = 20), and Morocco (n = 71). Sequences from Egypt and Algeria were not retained in the phylodynamic analysis, as there is no indication of isolation or sampling dates.
Phylogenetic analyses
A total of 179 sequences were aligned using MUSCLE (Edgar, 2004) 9 . Multiple sequence alignment was manually edited using BioEdit software (Hall, 1999) 10 .
The F81 substitution model was inferred as the most appropriate by applying the jModelTest v2.1.10 program (Posada, 2008) 11 . The phylogenetic analysis was performed using PhyML v3.3.3 (Guindon and Gascuel, 2003) 12 , and the resulting tree was visualized with FigTree v1.4.3. To examine the temporal signal in time-stamped sequences, the maximum likelihood (ML) tree was analyzed with TempEST v.1.5.3. 13
The time of the most recent common ancestor (tMRCA) estimates were calculated as the height 95% highest posterior density (HPD) years before the most recent sampling dates.
Phylogeographic analyses
The BEAST v1.8.4 package was used to construct Bayesian maximum clade credibility (MCC) phylogenetic trees for the E6 gene. To infer the geographic origins of Tunisian HPV16 viruses, the location trait was analyzed as a discrete trait diffusion model. To estimate the transition rates between different locations, a Bayesian stochastic search variable selection (BSSVS), with a symmetrical discrete trait substitution model with a strict clock assumption was used. The Markov chain Monte Carlo (MCMC) was run under an Extended Bayesian Skyline model for 200 million generations, sampling every 1000 states. An effective sample size (ESS > 200) was observed for the major parameters. The ESS values were evaluated with the TRACER v1.8.4 tool. TreeAnnotator v1.8.4 was used to generate the MCC tree.
Ultimately, the MCC tree was visualized using the FigTree v1.4.3 and Nextstrain programs, thus allowing the computing of the tMRCA and the HPD at 95% for Tunisian HPV16 viruses. A posterior probability threshold of (pp) > .95 was used to assess the tree nodes with SPREAD3. 14
To study the spatiotemporal diffusion of HPV, a Keyhole Markup Language (KML) file was generated from the MCC tree with SPREAD3 14 and then visualized with Google Earth Pro.
SPREAD3 was also used to calculate the Bayes factors (BFs) for pairwise diffusion rate between locations. BFs greater than 3 were considered as statistically satisfying.
Results and Discussion
Predominance of European lineage among the Mediterranean HPV16 isolates
The generated phylogenetic tree showed that the isolates were separated into clades corresponding to HPV16 lineages (Figure 1). Most of the isolates formed a cluster with lineage A, except for the Egyptian ones. Among these isolates, Tunisian HPV16 variants belonged to the European German sublineage (A2) while Moroccan variants were represented ubiquitously in all A sublineages (European, German, and Asian). Overall, 12 sequences from Algeria, Egypt, Greece, Italy, and Spain were grouped with the American lineage (D), while 6 Algerian and Spanish sequences were derived from the African lineage 1 (B) sequences. Meanwhile, 3 Greek and Spanish strains were classified with African lineage 2 (C).

Phylogenetic analysis of the E6 gene with selected HPV16 sequences.
These results suggest that the variants circulating in these areas (Algeria, Egypt, Italy, Spain, and Greece) could be recombinant strains. Such results might be explained by the heterogeneous mixture of these populations and by the migration patterns between countries, which led us to perform further phylodynamic analysis.
HPV16 evolves more rapidly than other subtypes
Through the bayesian analysis, we estimated the mean evolutionary rate to 5.532 × 10−4 substitutions/site/year (95% HPD interval 2.4626 × 10−4, 8.8793 × 10−4 substitutions/site/year) This is in line with the findings by Firth et al 15 reporting a value of 4.94 × 10−4 (95% HPD interval 0.7490 × 10−4, 9.812 × 10−4 substitutions/site/year) based on the L1 gene. In contrast, other studies have reported that HPV mutation rates vary in the range of 10−8 to 10−7 substitution/year, 16 thus suggesting that HPV16 evolves more rapidly. Our findings might be explained by the selective force exerted by vaccines on high-risk strains of HPV, thus leading to a competition between strains to be maintained in the niche (Paul A. Orlando et al 2011) 17 .
2005 as the pivotal year of population increase
Herein, we estimated that the population size followed 3 distinct phases (Figure 2A). The initial phase showed that the population size remained constant from the date of our sampling to mid-2004. The second phase showed a log-equivalent increase in population size in 2005. The population size then remained stationary prior to a phase of decrease that occurred in 2012.

Estimated evolution of HPV16 in the Mediterranean countries. (A) Bayesian skyline plot of HPV16. The Bayesian Skyline plot reconstruction of the E6 gene shows the evolution in population size. Median (dark line) and upper and lower 95% HPD (blue region) estimates of effective population size (y-axis) through time in years (x-axis) are shown. (B) Spatiotemporal dynamics of HPV16 circulating in the Mediterranean region between 1987 and 2016. (C) The MCC tree of E6’s gene HPV16 sequences summarizing all of the trees obtained during the MCMC search.
The HPV evolutionary trajectory from Europe to North Africa
As shown in both Figures 2B and C, HPV16 emerged in Croatia in 1987 (Table 1), then spread to Italy in 1994 prior to an appearance in Spain 5 years later in 1999. These transitions were strongly supported with BF = 37.8; pp = .95 and BF = 1419.8; pp = .99, respectively. Interestingly, a transition was also observed between Greece and Spain with BF = 2371.4 and pp = 1. Our analysis also suggests that the observed strains arose in Greece in the early 2000s (BF = 2371.3, pp = .99), potentially explaining the previously observed prevalence increase in Greece in 2008 18 with a peak reached in 2011. 19 Interestingly, the period between 2010 and 2011 coincided with an increase in HIV infections, thus leading to higher prevalence of HPV (Lui, 2018) 20 .
Estimated tMRCA of HPV16 based on E6 gene for all the countries.
Abbreviations: HPD, highest posterior density; HPV16, human papillomavirus 16; tMRCA, the most recent common ancestor.
In our study, the introduction to North Africa was initiated through Morocco in 2004 from Greece (BF = 5.1; pp = .76) and Spain (BF = 0.7; pp = .31). Despite a low BF, the transition between Spain and Morocco remains plausible, given the fluctuations and the very active Spanish–Moroccan cross-border population exchanges.
Since the introduction of the virus in Morocco in 2004 (Figure 2B), the size of the viral population has continued to grow (Figure 2A). Indeed, the prevalence of HPV16 and 18 in Moroccan women with cervical cancer increased from 51.7% in 1999 to 2000 to 89.6% in 2009 to 2012, according to the studies conducted in Rabat and Casablanca. 21 Finally, HPV16 reached Tunisia in 2011 from Morocco. The latter transition is statistically significant (BF = 34.6; pp = .95).
The latter transition should not overshadow other potential transitions. Indeed, an analysis of the phylodynamic patterns of HPV58 showed that the introduction of HPV was initiated from West Africa. 22 A synchronous transition emanating from this part of the continent for the HPV16 remains possible, especially since the migratory flows toward Tunisia have been very important during the last decade.
Sampling bias
Sensitivity analysis was performed using a “randomized tip swap” approach to evaluate the influence of sampling bias for the location trait diffusion model. 23
The main analysis predicted that Croatia was the most likely ancestral root (root state probability). As shown by the comparison of the root state posterior probability (Table 2) between the main analysis and the random “tip swap” analysis, 23 this result was not influenced by sample selection bias (main analysis, pp = .93; tip swap analysis, pp = .02).
Mean transition rates estimation between main and tip swap analysis.
Only 2 identified transitions in spatiotemporal dynamics appeared to be affected by sampling, as they were characterized by a large variation in transition rate between the main and tip swap analysis. These biased transitions included the virus transition event between Italy and Spain and the one between Spain and Greece.
Conclusions
Despite a recent renewed interest, there is an obvious gap in the analysis of sequences originating from North Africa. 24 This is particularly true when analyzing the viral phylodynamics of DNA viruses. Indeed, only a marginal number of analyses of this type have been performed on DNA viruses and most focus on RNA viruses given their higher mutation rates. However, phylodynamic analyses applied to slowly evolving pathogens can help elucidate longer-term evolutionary processes, such as speciation or host-pathogen co-divergence, as in the case of HPV. 25 This neglected aspect in the phylogenetic analyses was solely treated by Li et al 22 who characterized the major routes of the HPV58.
This study reported preliminary results of a retrospective spatiotemporal evolution of HPV16 in the Mediterranean basin countries. Our analysis suggests that HPV16 circulating in North Africa had Croatian ancestral strains and reached North Africa by the Moroccan gateway via various transitions. Nevertheless, due to the lack of information and annotations on HPV16 strains from countries sharing a historical border, the estimation of some transitions might be biased. Thus, incorporating E6 sequence strains from Algeria, France, Turkey, and Egypt would certainly increase the accuracy of the viral spread pathway results.
Despite these limitations, this is the first study describing the phylogeny and phylogeography of HPV16 in the Mediterranean countries, which have long witnessed active exchange, whether through colonization or immigration.
Footnotes
Acknowledgements
The authors thank Marwa Arbi, Mariem Hannachi, and Alia Benkahla for their contribution and help.
Funding:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Laboratory of Bioinformatics, Biomathematics, and Biostatistics (LR16IPT09), Institute Pasteur of Tunis.
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
Conceptualization O.S.; data curation, A.S and O.S ; formal analysis and investigation : A.S and O.S ; methodology A.S and O.S ; supervision OS.; writing—review and editing, AS ans OS All authors have read and agreed to the published version of the manuscript.
Information on Patient Consent
Not applicable.
