Abstract
Background:
G-quadruplexes (G4s) are secondary structures in DNA and RNA that impact various cellular processes, such as transcription, splicing, and translation. Due to their numerous functions, G4s are involved in many diseases, making their study important. Yet, G4 evolution remains largely unknown, due to the low sequence similarity of G4s and the poor quality of their sequence alignments across species. To address this, we designed a strategy that avoids direct G4 alignment to study G4 evolution in the 3 living kingdoms. We also explored the coevolution between RNA-binding proteins (RBPs) and G4s.
Methods:
We retrieved one-to-one orthologous genes from the Ensembl Compara database and computed groups of one-to-one orthologous genes. For each group, we aligned gene sequences and identified G4 families as groups of overlapping G4s in the alignment. We analyzed these G4 families using Count, a tool to infer feature evolution along a gene or species tree. Additionally, we used these G4 families to predict G4s by homology. To establish a control dataset, we performed mono-, di- and tri-nucleotide shuffling.
Results:
Only a few conserved G4s occur among all living kingdoms. In eukaryotes, G4s exhibit slight conservation among vertebrates, and few are conserved between plants. In archaea and bacteria, at most, only 2 G4s are common. The G4 homology-based prediction increases the number of conserved G4s in common ancestors. The coevolution between RNA-binding proteins and G4s was investigated and revealed a modest impact of RNA-binding proteins evolution on G4 evolution. However, the details of this relationship remain unclear.
Conclusion:
Even if G4 evolution still eludes us, the present study provides key information to compute groups of homologous G4 and to reveal the evolution history of G4 families.
Introduction
DNA and RNA can fold onto themselves to form secondary structures. Among these structures, G-quadruplexes (G4s) are stable non-canonical structures, made with Hoogsteen pairings instead of Watson-Crick pairings. In G4s, the Hoogsteen pairings occur between 4 guanines to form a G-quartet. These G-quartets can stack on top of each other to create a G4 structure.1,2
Because they occur in both DNA and RNA, 3 G4 structures impact many biological processes. Indeed, DNA G4s are known to influence telomere homeostasis, epigenetics and transcription.4 -6 RNA G4s (rG4s) have been shown to affect several post-transcriptional regulation mechanisms of messenger RNA (mRNA) and non-coding RNA, such as splicing, polyadenylation, miRNA regulation, translation and RNA transport.7 -12 For more information about G4 and rG4 functions, please refer to the review by Varshney et al. 13
G4 structures are involved in biological mechanisms associated with several pathologies, such as cancer and neurodegenerative diseases. 13 For instance, a G4 can fold at the
All these discoveries stimulated the large-scale prediction and detection of G4s. Initially, G4s were predicted using a canonical motif

Schematic representation of a canonical G4 (A) and 3 non-canonical G4s: 2 G quartet G4 (B), long loop G4 (C) and bulge (D).
These computational tools are employed to identify predicted G4s (pG4s) within the entire genomes and transcriptomes of various species. G4s are enriched in telomeres and promoters within the human genome. Additionally, predictions showed that rG4s are mainly enriched in the 5′UTR of coding transcripts, with a modest enrichment in the 3′UTR.31 -33 High-throughput G4 and rG4 detection corroborates most of these findings.34 -36 Yet, neither the prediction nor the detection is perfect. Indeed, the former yields some false negatives (ie, failing to predict G4s in sequences that fold into G4s), while the latter yields some false positives (ie, detecting G4s where none should be detected). Moreover, it has been shown that G4s in transcriptomes seem to be globally unfolded. 37 This study led to the hypothesis that rG4s have co-evolved with RNA-binding proteins (RBPs), which help rG4s stay unfolded when they are not required. RNA G4s would be globally unfolded to avoid a negative impact on cell transcriptomes and translation, since their DNA counterparts are known as markers of genomic instability.38,39 This hypothesis aligns with the fact that bacteria inhabiting warm environments, where rG4s can freely fold and unfold, possess more rG4s than closely related bacteria in temperate environments. 40
Considering all these elements, studying the evolution of rG4s and their potential coevolution with the RNA binding sites of RBPs might help to understand their distributions and functions. Over the last years, several databases containing information on RNA G4 Binding Proteins (RG4BPs) have been made available. For example, G4IPDB contained over 60 RG4BPs, 41 but is no longer available. More recently, QUADRatlas (https://rg4db.cibio.unitn.it/) was introduced, featuring data on rG4s overlaid with binding sites of RBPs and presenting information on over a thousand RG4BPs. 42 Among these RG4BPs, many are known to have a meaningful impact on biological processes. One well-known example is DHX36, an RBP known to bind rG4s, whose interactions were recently reviewed. 43 This review delves into many functions of the interaction between the helicase DHX36 and its G4 targets, discussing their implications in diseases like cancer, neurodegenerative diseases, and the aging process. Many studies have started to focus on the interaction between RBPs and G4s, and new RG4BPs are continually being discovered, such as G3BP1. 44 This discovery was made by comparing rG4-seq data and eCLIP (enhanced version of the CrossLinking and ImmunoPrecipitation (CLIP) assay) data.36,45 All these studies show that the comparison between RBPs and G4s still has a lot to reveal.
To our knowledge, limited research has so far been dedicated to the evolution of G4s. These studies have highlighted their limited conservation.46 -49 rG4 evolution within transcriptomes remains largely unexplored, apart from studies of the conservation of specific rG4s. For instance, a conserved rG4 within ribosomal RNA in mammals has been reported. 50 The complexity of studying the evolution of G4s and rG4s arises from the challenges of working with low-quality sequence alignments. However, their distribution has been examined across various phyla, including mammals, vertebrates, yeast, eukaryotes, bacteria and archaea.46,51 -54 The most advanced study on G4 evolution used alignments of orthologous genes and suggested that G4s did not seem to be conserved. 47 Nevertheless, recent advances in predicting G4s across entire genomes and transcriptomes have revealed their distribution in every living kingdom and in numerous species.46,51 -54 Since G4s are widespread, a common origin could be possible. We expect to find at least some G4 families shared across a broad spectrum of species, indicating a common evolutionary history.
In the present study, we aim to unravel G4 evolution by first using the core principle of the Frees et al 47 method on orthologous genes of eukaryotes, archaea, and bacteria. The principle of this method is to use gene alignments of closely related species in order to find related G4s. Secondly, given the apparent importance of RBPs in rG4 folding, we assess the abundance of RBPs and the locations of their binding sites in conjunction with predicted rG4s of different species to explore their interactions. This final step aims to confirm the possible coevolution between rG4s and RBPs.
Material and Methods
pG4 family identification
Data were retrieved from the Ensembl Compara database. 55 This dataset comprises information on genes and transcripts, including their genomic fasta sequences, and details all homology relationships (paralogs and orthologs) among coding genes from 60 species (with 25 eukaryotes, 12 archaea and 23 bacteria, as outlined in Supplemental data Table 1). To ensure the quality of our data, we filtered homology relationships to avoid low sequence conservation among homologous genes. Consequently, only homologous genes with one-to-one ortholog relationships were retained (ie, genes that are derived from a speciation event and present one copy in each species). This stringent criterion enabled us to obtain good quality gene alignments from multiple distantly related species. Orthologous gene families were retrieved via the default homologies file from the pan genome Ensembl release 46. In total, we extracted 9094 genes belonging to 4763 gene families from the Ensembl Compara database.
Subsequently, these gene sequences were aligned using kAlign (https://www.ebi.ac.uk/Tools/msa/kalign/), a tool for semi-global multiple sequence alignments able to align distantly related sequences. 56 Alignments were filtered based on their conservation identity and their ratio of nucleotides (number of nucleotides of a column in the alignment divided by the number of sequences), because some alignments mainly included aligned gaps due to the high phylogenetic distance between species. Thus, only alignments with an average nucleotide ratio exceeding 55% were retained. Within these gene alignments, pG4s were positioned. The prediction of G4s was carried out using G4RNA screener (http://scottgroup.med.usherbrooke.ca/G4RNA_screener/), as explained below. Then, overlapping pG4s in the alignment were detected to identify pG4 families, without taking alignment identity into account. The alignments were not manually adjusted to make the pG4s coincide. Figure 2A summarizes the main steps of the pG4 family identification process. We used the G4RNA screener on the sequences of genes to predict G4s. 28 Default parameters were used for the prediction: windows of 60 nucleotides, step of 10 nucleotides between windows, and thresholds of 0.9, 0.5 and 4.5 for G4 hunter, G4NN and cGcC, respectively. The G4 prediction process was previously reported.33,54
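The filtering and grouping steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the function names (`nucleotide_ratio`, `to_alignment_coords`, `group_overlapping`) are hypothetical, alignments are represented as plain lists of gapped strings, and pG4 positions are assumed to be 0-based, end-exclusive intervals on the ungapped gene sequence.

```python
from typing import List, Tuple


def nucleotide_ratio(alignment: List[str]) -> float:
    """Average over columns of (non-gap characters in column / number of sequences)."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    per_column = [
        sum(1 for seq in alignment if seq[col] != "-") / n_seqs
        for col in range(n_cols)
    ]
    return sum(per_column) / n_cols


def to_alignment_coords(aligned_seq: str, start: int, end: int) -> Tuple[int, int]:
    """Map an interval on the ungapped sequence to alignment columns (end-exclusive)."""
    columns = [i for i, char in enumerate(aligned_seq) if char != "-"]
    return columns[start], columns[end - 1] + 1


def group_overlapping(intervals: List[Tuple[str, int, int]]) -> List[List[str]]:
    """Group (gene, start, end) intervals that overlap by at least one column.

    Overlap is chained: if A overlaps B and B overlaps C, all three share a group.
    No minimum overlap and no identity criterion is applied.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_end = -1
    for gene, start, end in sorted(intervals, key=lambda item: item[1]):
        if current and start < current_end:
            current.append(gene)
            current_end = max(current_end, end)
        else:
            if current:
                groups.append(current)
            current = [gene]
            current_end = end
    if current:
        groups.append(current)
    return groups


# Toy alignment of three hypothetical orthologs.
alignment = ["ACGG--GTTA", "AC-GGGGT-A", "--GGGGGTTA"]
keep = nucleotide_ratio(alignment) > 0.55  # 55% threshold from the text

pg4s = [("geneA", 2, 7), ("geneB", 3, 8), ("geneC", 20, 30)]
groups = group_overlapping(pg4s)
families = [g for g in groups if len(g) > 1]  # singletons are pG4s without a family
```

Treating single-member groups as pG4s that belong to no family is an assumption of this sketch; the grouping itself only depends on interval overlap in alignment coordinates.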

Schematic representation of the main methods. (A) The pG4 family retrieval process comprises 3 steps: alignment of orthologous genes, positioning of pG4s, and identification of pG4 families through detection of overlapping pG4s. (B) Gene sequence shuffling used to estimate how many G4s are predicted by chance. (C) Randomly relocated prG4 dataset, built by taking the total number of prG4s in a species and randomly relocating them in the transcripts of that species.
Defining pG4 families as groups of overlapping pG4s within gene alignments also allowed us to predict additional pG4s by homology, thereby increasing the number of pG4s in pG4 families. In the alignment columns where some gene sequences had pG4s and others did not, we used QGRS Mapper (https://bioinformatics.ramapo.edu/QGRS/index.php) 57 and G4RNA screener to find pG4s in the genes with missing pG4s. The G4RNA screener scanning results in this step were similar to the initial ones, but the prediction made using QGRS Mapper with the default parameters yielded more G4s (see Supplemental data Table 1). In total, 36 548 G4s were predicted, with 14 569 identified by G4RNA screener and 22 019 through homology. These computational steps were conducted on a cluster of computers provided by the Digital Research Alliance of Canada.
The pG4 family identification was limited to DNA pG4s due to disparities in transcript annotation across different species. This incongruity currently makes the identification and interpretation of RNA pG4 families too challenging.
pG4s trees computation and comparison with gene trees
pG4 family alignments were computed using Align AI package from the Biopython library, version 1.79. 58 For each pG4 family, we computed a phylogenetic tree with the PhyML option of SeaView (http://pbil.univ-lyon1.fr/software/seaview3) and the default parameters, based on the alignment of pG4 sequences. 59 To facilitate visual comparisons between pG4 family trees and their corresponding gene trees, the branches of pG4 family trees were swapped to closely resemble the gene trees. Next, we used a custom R script to generate mirror trees from the gene trees and pG4 family trees to help the visual comparison. Finally, the Python library ete3 was employed to calculate the normalized Robinson-Foulds distance between pG4 family trees and gene trees.60,61 This metric quantifies the distance between phylogenetic trees, with higher values indicating greater dissimilarity.
In our analysis, species are organized into species trees constructed using super tree methods to combine several species trees from published studies.62 -65 These trees were used as relational indicators between the species. Within these trees, we display pG4 densities, which were computed by normalizing the number of pG4s in a species by the length of its genes and are expressed per kilobase pair (kbp). This normalization helps mitigate the bias toward species or genes with longer sequences, which might have a higher likelihood of containing pG4s.
RBP data retrieval and process
To compare RBP CLIP data and pG4s, RNA pG4s were used. This was achieved by considering transcript locations rather than gene locations. As mentioned previously, G4s are predicted using G4RNA screener, a tool primarily designed for RNA G4 prediction, although it is also capable of predicting DNA G4s. 54 To prevent any confusion, pG4s refers to DNA pG4s, while prG4s denotes RNA pG4s. prG4 data were compared with the RNA binding sites of RBPs obtained from CLIP data. We used the CLIP data from ENCORE (https://www.encodeproject.org) and POSTAR (http://111.198.139.65), which are derived from Cross-Linking Immuno Precipitation experiments focused on the binding of an RBP of interest to RNA 45,66,67 (Supplemental data Tables 2 and 3). In essence, once RBPs are bound to transcripts, a digestion is carried out to remove the unbound RNA. Then, the RBPs are immunoprecipitated, and the recovered RNA sequences are sequenced. This procedure maps the RNA binding sites of RBPs onto the transcriptome. For
Shuffle and random datasets
To ensure the validity of our results, we generated different random or shuffled datasets. A first type of dataset was designed to evaluate the construction of pG4 families. In this dataset, we generated multiple shuffled sequences to establish a control for the normal density of pG4s. This comparison allowed us to check whether there were more or fewer pG4s than expected by chance. Additionally, the shuffled density can be considered as a background that can be subtracted from the normal density. To accomplish this, we used the Python library “ushuffle” to shuffle entire gene sequences. 69 We generated 3 types of shuffles: mononucleotide, dinucleotide and trinucleotide (ie, 1-mer, 2-mer and 3-mer nucleotide shuffling). Each of these shuffling processes was repeated 10 times and only the average is reported. In most cases, this average is therefore not an integer.
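As an illustration of this control, the following Python sketch computes a shuffled background density and subtracts it from the normal density. It makes several assumptions that differ from the study: the predictor is a stand-in based on the widely used canonical motif (three-guanine tracts separated by loops of 1 to 7 nucleotides), not G4RNA screener; only mononucleotide shuffling is shown, using the standard `random` module rather than uShuffle; and all function names are hypothetical. Di- and trinucleotide shuffling require a k-let-preserving algorithm and are not covered here.

```python
import random
import re
from statistics import mean

# Stand-in predictor (assumption): canonical G4 motif, NOT G4RNA screener.
CANONICAL_G4 = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def count_pg4(sequence: str) -> int:
    """Count non-overlapping matches of the canonical motif."""
    return len(CANONICAL_G4.findall(sequence.upper()))


def density_per_kbp(n_pg4: float, gene_length: int) -> float:
    """Number of pG4s normalized by sequence length, per kilobase pair."""
    return n_pg4 * 1000 / gene_length


def mononucleotide_shuffle(sequence: str, rng: random.Random) -> str:
    """Shuffle single nucleotides; base composition is preserved, order is not."""
    letters = list(sequence)
    rng.shuffle(letters)
    return "".join(letters)


def shuffled_background(sequence: str, repeats: int = 10, seed: int = 0) -> float:
    """Average pG4 density over several shuffles of the same sequence."""
    rng = random.Random(seed)
    counts = [count_pg4(mononucleotide_shuffle(sequence, rng)) for _ in range(repeats)]
    return density_per_kbp(mean(counts), len(sequence))


# Toy gene containing one canonical motif.
gene = "ATGGGAGGGTGGGCGGGTTACGATCCA"
normal = density_per_kbp(count_pg4(gene), len(gene))
background = shuffled_background(gene, repeats=10)
normalized = normal - background  # background-subtracted density
```

Because the background is an average over 10 shuffles, it is generally not an integer count, which matches the decimal values mentioned above.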
The second dataset was generated to compare with CLIP data and to evaluate whether RBPs bind rG4s more or less than expected by chance. Hence, the number of prG4s of each species was retrieved from the GAIA database (https://gaia.cobius.usherbrooke.ca/), and these prG4s were randomly relocated in the transcripts of the species, creating a “fake prG4” dataset. 70 Then, the fake prG4s were randomly selected, and their locations were randomly allocated throughout the transcriptome. The random dataset was generated 10 times, and Figure 2C provides a simplified illustration of the concept.
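A minimal Python sketch of such a relocation is given here for illustration. The function name `relocate_prg4s` and the data layout are hypothetical, and the sketch assumes that each prG4 keeps its length, that the target transcript is drawn uniformly among transcripts long enough to hold it, and that the start position is drawn uniformly within that transcript. The study may have used a different sampling scheme.

```python
import random
from typing import Dict, List, Tuple


def relocate_prg4s(
    prg4_lengths: List[int],
    transcript_lengths: Dict[str, int],
    rng: random.Random,
) -> List[Tuple[str, int, int]]:
    """Return one fake (transcript, start, end) location per prG4, end-exclusive."""
    relocated = []
    for length in prg4_lengths:
        # Only transcripts that can contain the whole prG4 are eligible.
        candidates = sorted(t for t, size in transcript_lengths.items() if size >= length)
        if not candidates:
            raise ValueError(f"no transcript can hold a prG4 of length {length}")
        transcript = rng.choice(candidates)
        start = rng.randint(0, transcript_lengths[transcript] - length)
        relocated.append((transcript, start, start + length))
    return relocated


# Toy species with three transcripts and four prG4s (lengths in nucleotides).
transcripts = {"tx1": 1200, "tx2": 300, "tx3": 50}
prg4_lengths = [60, 60, 70, 90]

# Ten independent random datasets, as in the text.
fake_datasets = [
    relocate_prg4s(prg4_lengths, transcripts, random.Random(seed))
    for seed in range(10)
]
```

Each fake dataset contains exactly as many prG4s as the real one, so any difference in overlap with RBP binding sites reflects location rather than number.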
Results
pG4 evolution inside species trees
The initial objective of our study was to get an overview of pG4 evolution. The evolution of G4s and rG4s remains unresolved. Therefore, we decided to initially focus on the gene level, since more information is currently available regarding gene evolution. Our strategy was to identify families of pG4s within orthologous genes. Therefore, pG4 families were computed through multiple sequence alignments of orthologous genes (refer to Figure 2A for the illustrated method). We identified overlapping pG4s in these alignments as pG4 families, without requiring a minimum overlap. Figure 3 illustrates the overall data. When comparing the prediction of G4s in normal and shuffled datasets, more G4s are predicted than expected by chance in eukaryotes. Conversely, for archaea and bacteria there are as many pG4s as expected by chance, or fewer. These observations are in good agreement with a previous study, 54 in which negative selection pressure was observed in prokaryotes and positive selection pressure in eukaryotes. This result is further discussed in the Discussion. Also,

pG4 evolution inferred in a species tree: (A) eukaryote species, (B) archaea species, and (C) bacteria species. Normal densities (pG4/kbp) appear in blue and the average of the mononucleotides shuffled density (pG4/kbp) appears in green. Numbers displayed at the nodes of trees indicate the count of conserved pG4 families, while numbers at the tree leaves indicate the number of predicted G4s within the orthologous groups.
In prokaryotes, pG4 densities in the shuffled dataset are overall higher than in the normal one (see Figure 3B and C). This phenomenon may be attributed to the absence of certain mechanisms in prokaryotes, potentially rendering pG4s a non-advantageous feature for them and leading to negative selection pressure against them. However, in the case of archaea, more pG4 families are conserved in common ancestors in the normal dataset than in the shuffled one. For instance, between
When comparing species groups, we observe that the number of conserved families is higher in eukaryotes. However, in prokaryotes, pG4 families exhibit conservation in more ancient common ancestors. For example, in eukaryotes, the most ancient conserved families are found in plants or vertebrates, but none are shared by all animals or fungi (Figure 3A). In contrast, in bacteria at least one pG4 family is conserved in the most ancient common ancestors. As a reminder, species trees were built using a super tree method (see Material and Methods for more information).
Globally, these results also appeared when comparing the predictions on the normal dataset to the prediction on dinucleotide and trinucleotide shuffling datasets (see Supplemental Figures 1 and 2). Comparing the results for bacteria (Figure 3C, Supplemental Figures 1C and 2C), the number of conserved pG4s families for the common ancestor between
Since the quality of alignments was good for most orthologous gene groups after filtering, we conducted a detailed inspection of the sequence alignments for the 5 most conserved pG4 families in eukaryotes and prokaryotes (for more information on these families, please refer to Supplemental Figure 3). One alignment exhibited numerous insertions and deletions, and this alignment exclusively included eukaryotic genes (Supplemental Figure 3E). This observation can be attributed to the presence of long introns between exons in eukaryotic genes, which are less conserved in sequence. 71 Across all these alignments, we found that the regions of G-tracts were consistently well-aligned and conserved, even in genes where no G4s were initially predicted. As an example, consider Supplemental Figure 3A, where the gene FTT_0154 had a pG4 in the initial prediction, but GSU1819 did not, despite having some conserved G-tracts. However, the regions between the G-tracts, which represent putative loops of pG4s, were less conserved than the G-tracts. Surprisingly, G-tracts were found to be most conserved in the gene group with the fewest predicted G4s (Supplemental Figure 3C). Based on this observation, a G4 prediction was made within these genes at the location of a pG4 family using QGRS Mapper with a wide motif. 57 The results revealed that this homology-based prediction approach helped identify more conserved pG4s (refer to Figure 4). For the expanded pG4 families, the gene tree and the pG4 family tree were compared (see Supplemental Figure 4). In some cases, the topology of the trees was very similar, while in others, the trees were very different. This distinction was confirmed by the mirror tree, which represents the relationship between the pG4 family tree and the gene tree, and by the normalized Robinson-Foulds measure (RF). 61 The RF metric quantifies the difference between the sets of clades of 2 phylogenetic trees, providing an estimate of the distance between them.
For instance, in Supplemental Figure 4A, the trees diverged considerably and exhibited an RF value of 0.79, whereas in Supplemental Figure 4D and 4E, the trees displayed few changes and had an RF value of 0.50. This demonstrates that the evolution of pG4 families and the gene families is not always congruent. Therefore, pG4 families may evolve differently from the genes that contain them.
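To make the RF measure concrete, the following Python sketch computes a normalized, clade-based Robinson-Foulds distance between two rooted trees written as nested tuples. It is a simplified illustration, not the ete3 implementation used in the study: the tree encoding and the function names are assumptions, and the values it returns may differ from those of ete3, for example when trees are treated as unrooted.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def nontrivial_clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Clades with more than one leaf, excluding the clade containing every leaf."""
    all_leaves = leaf_set(tree)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        if 1 < len(leaves) < len(all_leaves):
            found.add(leaves)
        return leaves

    visit(tree)
    return found


def normalized_rf(tree_a: Tree, tree_b: Tree) -> float:
    """Clades found in only one tree, divided by the total number of clades."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("both trees must contain the same leaves")
    clades_a = nontrivial_clades(tree_a)
    clades_b = nontrivial_clades(tree_b)
    max_rf = len(clades_a) + len(clades_b)
    if max_rf == 0:
        return 0.0
    return len(clades_a ^ clades_b) / max_rf


# Toy example: the pG4 family tree swaps species B and C relative to the gene tree.
gene_tree = ((("A", "B"), "C"), ("D", "E"))
pg4_tree = ((("A", "C"), "B"), ("D", "E"))
distance = normalized_rf(gene_tree, pg4_tree)
```

In this toy case, 2 of the 6 clades are specific to one tree, so the distance is one third; identical topologies give 0 and trees sharing no clade give 1.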

pG4 evolution predicted using homology inferred in species tree: (A) eukaryote species, (B) archaea species, and (C) bacteria species. Normal densities (pG4/kbp) appear in blue and the average of the mononucleotides shuffled densities (pG4/kbp) appears in green. Numbers displayed at the nodes of trees indicate the count of conserved pG4 families, while numbers at the tree leaves indicate the number of predicted G4s within the orthologous groups.
Subsequently, the evolution of the expanded pG4 families in species trees was investigated using the same strategy employed for the initial prediction. The pG4 family’s evolution was inferred in species trees using Count, and the results are presented in Figure 4. As anticipated, pG4 densities in the normal dataset were higher compared to the initial prediction without homology-based prediction. However, the number of G4s predicted in this condition for the shuffled dataset remained similar to the initial prediction. Therefore, the normal densities were higher than the shuffled ones or at similar levels for prokaryotes. This was in contrast to the previous observation where pG4s densities were higher in the shuffled dataset than in the normal dataset. Among eukaryotes, pG4s exhibited higher conservation compared to the initial prediction. For example, there were initially 476 pG4 families between
Relationship between prG4s and RBPs
Previous studies highlighted differences in prG4s densities between eukaryotes and prokaryotes,37,54 yet the underlying reasons remain unclear. The prevailing hypothesis suggests that RBPs, particularly helicases, might account for this difference by potentially unfolding rG4s in the human transcriptome. 37 Therefore, our next objective was to investigate potential coevolution between RBPs and prG4s. Our strategy involved analyzing the relationship between prG4s densities obtained here and annotated RBPs from the RBP2GO database (https://rbp2go.dkfz.de). 72 The results suggest a complex relationship between these factors. Figure 5 presents the ratio of the helicases (ie, the number of helicases over the annotated RBPs number) versus the normalized prG4 densities. Since the annotation of RBPs varies among species, normalizing the number of helicases allows for comparisons between species. In Figure 5A, the helicase ratio is plotted against the normalized prG4 density (normal prG4 density minus the shuffled density). Firstly, the ratio of helicases appears higher in eukaryotes than in prokaryotes. Specifically, the helicase ratio ranged from 0.0025 to 0.025 in prokaryotes, while in eukaryotes, it ranged from 0.020 to 0.040. Globally, there seems to be a positive correlation between normalized prG4 densities and the helicase ratio. To confirm this observation, we performed a linear regression on this data, removing outliers using the interquartile range method. The results suggest a significant positive correlation between prG4s densities and helicases ratio (Figure 5B). However, when examining the correlation within each species group separately, the significant positive correlation is not consistent. Specifically, for archaea and bacteria, no significant correlations are found, whereas for eukaryotes, there is a strong significant negative correlation. 
In conclusion, there is a complex relationship between helicases and prG4 densities with significant correlations, but the nature of this relation varies depending on the species group. Moreover, contrary to our expectations, we found a negative correlation for eukaryotes, which contrasts with the hypothesis. This result is further discussed in the discussion, particularly regarding its interpretation and the use of the helicase ratio over RBPs.
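The quantities used in this analysis can be written down as a short Python sketch. It is an illustration under assumptions, not the analysis code of the study: the function names are hypothetical, the outlier rule uses the conventional 1.5 × IQR fences (the text does not state the factor), quartiles are computed with the default method of `statistics.quantiles`, and the input values are invented toy numbers.

```python
from math import sqrt
from statistics import mean, quantiles
from typing import List, Tuple


def helicase_ratio(n_helicases: int, n_rbps: int) -> float:
    """Number of helicases divided by the number of annotated RBPs."""
    return n_helicases / n_rbps


def iqr_bounds(values: List[float], factor: float = 1.5) -> Tuple[float, float]:
    """Lower and upper fences of the interquartile range rule."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def remove_outliers(xs: List[float], ys: List[float]) -> Tuple[List[float], List[float]]:
    """Keep only the points whose x and y both lie inside their IQR fences."""
    x_low, x_high = iqr_bounds(xs)
    y_low, y_high = iqr_bounds(ys)
    kept = [
        (x, y)
        for x, y in zip(xs, ys)
        if x_low <= x <= x_high and y_low <= y <= y_high
    ]
    return [x for x, _ in kept], [y for _, y in kept]


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Invented toy values: normalized prG4 density (normal minus shuffled) per species,
# and the matching helicase ratios. The last species is an obvious outlier.
normalized_density = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
ratios = [0.010, 0.012, 0.015, 0.016, 0.020, 0.021, 0.025, 0.027, 0.030, 0.031]

density_kept, ratio_kept = remove_outliers(normalized_density, ratios)
r_value = pearson_r(density_kept, ratio_kept)
```

Running the same computation separately on each species group is what reveals the change of sign described above, since a pooled correlation can hide opposite trends within groups.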

prG4 densities and helicase ratio. (A) Distribution of species based on their helicase ratio and the normalized prG4 density (calculated by subtracting the shuffled density from the normal density); (B–E) depict correlations between the helicase ratio and prG4 densities for all species, eukaryotes, archaea, and bacteria, respectively. The r-value represents the Pearson correlation coefficient.
CLIP data versus pG4s
To further investigate the relationship between prG4s and RBPs, we examined the relative location of RBPs RNA binding sites and prG4s. This analysis provided an overview of the distribution of these locations across different species. To achieve this, we crossed rG4s prediction data with freely available CLIP data for

Top and bottom 10 proportions of prG4s near RBP binding sites relative to the total number of prG4s present on the transcripts bound by the RBP. (A) to (F) correspond respectively to
We also examined other model organisms using RBPs binding sites data from the POSTAR database for
Overall, the co-location of binding sites and prG4s does appear to support co-evolution. To further investigate the co-evolution between RBPs and prG4s, the effort should concentrate on specific RBPs known to interact with prG4s and study them across many more species.
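The proportion reported for each RBP (prG4s near its binding sites, relative to all prG4s on the transcripts it binds) can be sketched as follows. This is only an illustration: the function name, the dictionary layout and the `max_distance` parameter are hypothetical, and the distance that defines "near" is an assumption of this sketch (0 means strict overlap).

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, end-exclusive


def proportion_near_binding_sites(
    prg4s: Dict[str, List[Interval]],
    binding_sites: Dict[str, List[Interval]],
    max_distance: int = 0,
) -> float:
    """Fraction of prG4s lying within max_distance of a binding site.

    Only prG4s on transcripts bound by the RBP are counted in the denominator.
    """
    total = 0
    near = 0
    for transcript, sites in binding_sites.items():
        if not sites:
            continue
        for g_start, g_end in prg4s.get(transcript, []):
            total += 1
            if any(
                g_start < s_end + max_distance and s_start - max_distance < g_end
                for s_start, s_end in sites
            ):
                near += 1
    return near / total if total else 0.0


# Toy data for one hypothetical RBP: tx3 is not bound, so its prG4 is ignored.
prg4s = {"tx1": [(100, 130), (500, 540)], "tx2": [(10, 40)], "tx3": [(5, 30)]}
sites = {"tx1": [(120, 150)], "tx2": [(200, 230)]}

overlap_only = proportion_near_binding_sites(prg4s, sites)
within_200_nt = proportion_near_binding_sites(prg4s, sites, max_distance=200)
```

Applying the same function to the randomly relocated prG4s gives the value expected by chance, against which the proportion obtained with the predicted prG4s can be compared.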
Discussion
Although pG4s are prevalent in numerous species, only a few pG4 families are conserved. Figure 3 illustrates that few pG4s are conserved (less than half are shared between
The prediction of G4s through homology appears to be a promising approach for discovering additional G4s. However, the viability of this method should first be experimentally confirmed. This method also highlights some limitations of G4RNA screener predictions in some species. For example, the initial prediction identified few G4s in
The comparison of the gene tree with the pG4 family tree within these genes highlights instances where pG4s evolve differently from their host genes. This divergence may be attributed to pG4s experiencing a selection pressure distinct from that acting on the entire gene. It is well-established that different regions within a gene may be under different selection pressures. In Supplemental Figure 4, some pG4 trees closely resemble gene trees, while others do not. The pG4 families that closely mirror the gene tree may not have an important role on their own and thus evolved similarly to their host gene. Conversely, pG4 trees with topologies different from that of their host gene could indicate that these pG4s have important functional roles. These pG4s may be subject to positive or negative selection pressure depending on how they affect the gene function, resulting in an evolution different from that of the gene. This divergence could also be influenced by coevolution with other elements such as RBPs. However, it is essential to interpret these results cautiously, as genes with pG4s (both from the initial prediction and the homology prediction) and those without pG4s were found as neighbors in pG4 family trees. Non-parsimonious gain or loss of pG4s, as well as errors in the tree topology, might explain this phenomenon. Overall, it appears that stable pG4s are less conserved than unstable ones. This observation aligns with the number of G4s predicted through homology, although these predictions require experimental confirmation. The underlying selection pressure driving this evolution remains elusive but points toward coevolution with RBPs. This coevolution could aid in either stabilizing less stable G4s or destabilizing highly stable ones.
Different studies have indicated the existence of differences in prG4 densities between eukaryotes and prokaryotes.37,54 However, the reason remains unknown. The prevailing hypothesis is that prokaryotes possess fewer RBPs and thus, if an rG4 forms in a prokaryotic cell, it is more likely to remain in this state, independently of the cell's needs. In contrast, eukaryotes, which have a greater abundance of RBPs, may have a more dynamic regulation of rG4s. A study showed that bacteria living in hot environments have more pG4s than closely related bacteria living in temperate environments. 40 This suggests that in an environment where prG4s can fold and unfold freely, such as hot environments, more prG4s are present. This aligns with the concept that with more RBPs to help rG4s fold and unfold, there are more prG4s. Figure 5A and B appear to support this hypothesis, but Figure 5C shows a contrary trend in eukaryotes. Several explanations can be considered for these results. First, it is essential to recognize that correlation does not imply causation. The correlation between the helicase ratio and prG4 density might occur by chance, and the strength or direction of this correlation could vary among the different species groups we selected. Another possibility is that we may be examining the wrong parameters to evaluate the coevolution of RBPs with prG4s. Our analysis has focused solely on helicases, while chaperones or other RBPs might have a closer relationship with prG4s. Notably, the fact that there are fewer helicases among RBPs in eukaryotes when the prG4 density becomes high might be due to the total number of RBPs rising, thus lowering the helicase ratio. RBP2GO was used to retrieve the gene ontology associated with RBPs, and only 2 chaperones were identified. 72 This limited availability of chaperone data may have hindered the investigation of the relationship between prG4s and RBPs, although it remains an interesting lead for further research.
Additionally, there could be other unaccounted-for factors influencing these results. In summary, a clear correlation exists between the helicase ratio and prG4 densities across all species, supporting the notion of coevolution between rG4s and RBPs. Yet, for eukaryotes, this correlation is negative, suggesting that higher prG4 densities are associated with a lower ratio of helicases among RBPs.
Based on these results, we looked at the co-location of binding sites (BSs) and prG4s to determine whether most prG4s interacted with RBPs or not. We used eCLIP data to obtain the BSs of 150 RBPs and compared them with normal pG4s and randomly relocated pG4s. Our analysis revealed that RBPs engage with prG4s in different ways, depending on the cell type and on the species. This was expected, considering the distinct cellular environments and functions that necessitate different regulatory mechanisms. Surprisingly, even some known RG4BPs appeared to bind fewer prG4s than anticipated. For instance, HNRNPL, which has demonstrated interactions with G4s, 74 ranked lower in Figure 6. This outcome may be explained by another study indicating that HNRNPL preferentially binds regions rich in CA repeats, 75 which might explain its higher co-location with fake prG4s than with those of the normal dataset. Additionally, certain RBPs, despite their established prG4-binding capabilities, may bind only a specific subset of prG4s. Some other RBPs are expected to appear at the bottom of the list since they are known to interact with short noncoding transcripts where few G4s are predicted (eg, YBX3, LIN28B and LARP7). In Supplemental Figure 7, like some RG4BPs, helicases exhibit diverse binding profiles depending on the RBP and the cell line. Yet these results should be interpreted with caution for several reasons. Firstly, not all binding sites are comprehensively annotated. While eCLIP data for
The identification of G4 families yields a complex view of G4 evolution. Most G4s are not grouped into G4 families; however, specific G4 families are found across the bacteria or archaea species used in this study, indicating shared ancestral origins for these G4s. Additionally, our study reveals an intricate relationship between RBPs and pG4 density.
Supplemental Material
sj-docx-2-evb-10.1177_11769343231212075 – Supplemental material for Toward a Better Understanding of G4 Evolution in the 3 Living Kingdoms
Supplemental material, sj-docx-2-evb-10.1177_11769343231212075 for Toward a Better Understanding of G4 Evolution in the 3 Living Kingdoms by Anaïs Vannutelli, Aïda Ouangraoua and Jean-Pierre Perreault in Evolutionary Bioinformatics
Supplemental Material
sj-xlsx-1-evb-10.1177_11769343231212075 – Supplemental material for Toward a Better Understanding of G4 Evolution in the 3 Living Kingdoms
Supplemental material, sj-xlsx-1-evb-10.1177_11769343231212075 for Toward a Better Understanding of G4 Evolution in the 3 Living Kingdoms by Anaïs Vannutelli, Aïda Ouangraoua and Jean-Pierre Perreault in Evolutionary Bioinformatics
Footnotes
Author Contributions
AV, AO, and JPP conceived the study and its design. AV wrote the program, collected the data, ran the experiments, created the figures, and drafted the manuscript. AO and JPP critically revised the manuscript. All authors read and approved the final manuscript.
Funding:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Natural Sciences and Engineering Research Council of Canada (NSERC) Graduate Scholarship (to J.M.G.); Canada Research Chair in Computational and Biological Complexity (CRC Tier2 Grant 950-230577 to A.O.); Chaire de recherche de l’Université de Sherbrooke en Structure et Génomique de l’ARN (to J.P.P.); Fonds de Recherche du Quebec Nature et Technologies (FRQ-NT); Natural Sciences and Engineering Research Council of Canada (NSERC RGPIN-155219-17 to J.P.P., RGPIN-05552-17 to A.O.); Centre de Recherche du CHUS (to J.P.P.); Université de Sherbrooke.
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability
Supplemental Material
Supplemental material for this article is available online.
References
