Abstract
Telomere length dynamics play a crucial role in the regulation of cellular processes and cell fate. Although epidemiological studies have revealed associations of telomere length with age, age-related diseases, and cancers, the role of telomeres in the regulation of the transcriptome and epigenome and the role of genomic variations in telomere lengthening have not been extensively analyzed. This is because experimental assays for telomere length measurement are resource consuming, and very few studies have coupled high-throughput genomics, transcriptomics, and/or epigenomics experiments with telomere length measurements. The recent development of computational approaches for assessing telomere length from whole genome sequencing data opens a new perspective on the integration of telomeres into the high-throughput systems biology analysis framework. Herein, we review existing methodologies for telomere length measurement and compare them to computational approaches, and discuss their applications in large-scale studies on telomere length dynamics.
Introduction
Telomeres are nucleoprotein complexes located at the ends of eukaryotic chromosomes. The telomeric DNA sequence comprises short consecutive repeats, which in humans are in the form of “TTAGGG” and normally span 2–10 kb, depending on age, tissue type, and cell state. 1 They protect chromosome ends from being recognized as double strand breaks and from forming end-to-end fusions. Because of the “end replication problem”, whereby the replicative machinery of the cell is not able to fully replicate the linear chromosomes, telomeres are gradually shortened after each cell division.2–4 Replicative burden in most somatic cells leads to telomere shortening, and eventually to senescence and apoptosis, 5 whereas stem cells and cancer cells overcome the telomere shortening problem by developing mechanisms for their elongation. 6 Changes in telomere length dynamics have been implicated in physiological aging and in the development of age-related diseases and cancers. 7
Because of the importance of telomere length dynamics in the development of certain diseases, telomeres have been studied for a long time. A number of experiments have been performed to identify the genes affected by the Telomere Position Effect (TPE), 8 long distance effects, 9 and genomic variations affecting telomere length. 10 These studies have been based on the experimental measurement of telomere length. Despite the importance of such studies, data on the association of telomere length with gene regulatory mechanisms are scarce. There are two main reasons for this: (1) some studies are based on low-throughput data, such as measurements of telomeres and a small number of genes in the same experiment, and (2) the others rely on high-throughput data on genotypes, epigenetic modifications, and gene expression for which experimental measurements of telomere length are also available. Unfortunately, the latter cases are extremely rare, which limits the data available for analysis of telomere length regulatory networks in the cell. Finally, the experimental assays for telomere length measurement usually require a large amount of source DNA material or viable dividing cells, which leads to sample availability issues as well. For this reason, most studies on telomere length have been performed on blood leukocytes. 11
Emergence of high-throughput sequencing technologies has provided grounds for the development of novel data analysis methodologies to capture the information on telomere length from the existing DNA sequencing data. This allows for integrating information on telomere length into linked data on gene variants, and possibly gene expression, chromatin immunoprecipitation sequencing (ChIP-seq), and DNA methylation. This new approach performed on high-throughput data should allow for fostering integrative analysis of telomere regulatory networks in health and disease.
The purpose of this review is to highlight the possibility of efficiently obtaining telomere-related information from currently available high-throughput technologies and to show the importance of such computations for fostering telomere biology research. To this end, we summarize the state-of-the-art experimental methods for telomere length measurement and their application in large-scale studies, and then discuss current computational methods for telomere length measurement and existing data on the integration of telomere length into high-throughput data analysis pipelines.
Telomere Length Dynamics in Aging and Disease
Regulation of telomere length dynamics is a complex interplay between telomere attrition and elongation processes. Studies indicate that replicative telomere attrition caused by the end replication problem accounts for only up to 50–100 bp of shortening per cell division. 12 Other factors, such as reactive oxygen species, radiation, and a number of additional stress-associated environmental factors, as well as genetic background, might contribute to faster attrition rates. 13 The main telomere elongation mechanism, which is employed by most cells with high proliferative capacity, depends on the expression of telomerase, an RNA-dependent reverse transcriptase that elongates the telomeres.6,14,15 A subset of cancer cells elongate their telomeres via alternative elongation mechanisms that depend on recombination events between sister chromatids.16–19 These processes are regulated by various players in the telomere elongation and maintenance pathway. 20
Changes in telomere length homeostasis are differently implicated in aging and disease. 7 While telomere lengthening and maintenance machinery are shown to be active in cancer cells, senescence-related diseases are mostly characterized by accelerated telomere shortening. The latter may lead to two main consequences: (1) extremely short telomeres lead to telomere dysfunction and chromosome instability21,22 and (2) changes in telomere length up to critically short levels are associated with changes in the expression and epigenetic modifications of several genes.8,9 This association has been explained by the telomere position effect, that is, reversible silencing of genes located near telomeres, 8 and by long distance interactions based on the physical localization of chromosomes within the nucleus. 9 Alternative direct and indirect effects of telomere length dynamics on the regulation of gene expression are still an open field for investigation.
Telomere shortening is considered one of the factors leading to organismal aging, though it is not known whether this results from replicative arrest in somatic and stem cells, or from activation of telomere shortening-induced signaling cascades.23–25 The onset of many complex human diseases is associated with aging and partly attributed to telomere shortening. A number of studies have implicated leukocyte telomere length (LTL) as a risk and/or prognostic marker for cardiovascular diseases, ischemic strokes, type 2 diabetes, and neurodegenerative diseases; however, the direct link between telomere shortening and disease development is not always clear.26–31 Additionally, telomere length is crucial for preserving the replicative capacity and preventing clonal exhaustion of actively dividing immune cells. 32 Age-dependent telomere shortening in B- and T-lymphocytes has been linked to defective immune responses and the development of many age-related diseases. In some cases, telomere shortening in T-cells has been linked to the development of autoimmune diseases. 33 Individuals with short telomeres or those suffering from premature aging syndromes are susceptible to infectious diseases because of the telomere shortening-associated decline in immune functions. 32 Some diseases are directly linked to telomere dysfunction, such as dyskeratosis congenita, where mutations occur in genes responsible for telomere maintenance. 34
The role of telomeres in development of cancers is more complex. In normal cells, telomere shortening up to critical levels leads to cell growth arrest and apoptosis. Overcoming the cell crisis checkpoint at this stage leads to chromosomal instability in many cancer cells.22,35 While chromosomal instability is considered an important factor for cancer onset, maintenance of telomeres is important for cell survival in cancer progression. 22 Most of the cancer cells express telomerase to elongate the shortened telomeres, 14 while others develop alternative lengthening mechanisms. 17 Depending on the cancer type, both short and long telomeres have been described as markers of risk. 36 In some cases, telomerase expression is considered as a marker of aggressiveness and poor prognosis. 37 Therapies inhibiting the action of telomerase have been shown to be effective in limiting cell growth in some cancers. 38
Telomere Length Measurement
The existing methodologies for telomere length measurement can be roughly divided into experimental and computational approaches. Experimental methods assume direct measurement of telomere lengths, while computational methods assume meta-analysis of the existing sequencing data for length estimation (Table 1).
Table 1. Comparison of experimental and computational methods for telomere length measurement.
Experimental methods
Experimental methods for telomere length measurement include, but are not limited to, terminal restriction fragment (TRF) analysis, quantitative polymerase chain reaction (qPCR), quantitative fluorescence in situ hybridization (qFISH), 39 etc., which measure the mean and/or chromosome-specific telomere lengths.
The first method developed for measuring the mean telomere length (MTL) was TRF analysis. It first performs a test to select intact DNA, and then uses restriction enzymes to digest the whole DNA sequence except for the telomeric regions. The remaining long telomeric fragments are then separated using agarose gel electrophoresis. The isolated fragments are analyzed by Southern blot using hybridization with a telomere probe. 40 Since the fragments are of nonuniform length, they appear as dispersive smears. A crucial factor in the accuracy of the average telomere length calculation is accounting for the smear length and the intensity of probe binding. TRF results obtained from different experiments depend on various factors, such as source DNA quality and quantity, choice of restriction enzymes, gel density, signal intensity calculations, and length adjustment. 41 Moreover, the distance of the furthest subtelomeric restriction site to the telomeres is estimated to be 2.5–4 kb, depending on the chromosome and the restriction enzyme used. Thus, the results of different studies should be compared with caution and with the difference in restriction enzymes taken into account. Finally, TRF requires large amounts of source DNA and is not capable of capturing short telomeres. 39 Because TRF was the first technique for MTL measurement, it has served as a reference for subsequently emerging methods and is thus considered the “gold standard”.
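To illustrate how smear position and probe intensity can be combined into a single number, the following minimal sketch uses one commonly used formula, mean TRF = Σ(OD_i) / Σ(OD_i / L_i), where OD_i is the signal intensity and L_i the fragment length at gel position i. The function name and inputs are hypothetical, and individual laboratories may apply different background corrections or weighting schemes.

```python
def mean_trf_length(intensities, lengths_kb):
    """Intensity-weighted mean TRF length: sum(OD_i) / sum(OD_i / L_i).

    intensities: probe signal (optical density) at each gel position.
    lengths_kb:  fragment length (kb) at the same positions, taken
                 from the size ladder.
    Dividing each OD_i by L_i corrects for the fact that longer
    fragments bind more probe and therefore give a stronger signal.
    """
    if len(intensities) != len(lengths_kb):
        raise ValueError("intensities and lengths_kb must have equal length")
    if any(length <= 0 for length in lengths_kb):
        raise ValueError("fragment lengths must be positive")
    total_signal = sum(intensities)
    length_corrected = sum(od / length
                           for od, length in zip(intensities, lengths_kb))
    if length_corrected == 0:
        raise ValueError("no signal in the analyzed lane")
    return total_signal / length_corrected
```

With equal signal at 2 kb and 4 kb, the sketch returns about 2.67 kb rather than 3 kb, because the same signal at a shorter length corresponds to more fragments.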
The quantitative PCR approach is less elaborate compared to TRF. It is based on the amplification of telomeric regions via telomere-specific primers. The basic assumption is that the longer the telomeric sequence of the source DNA, the more there are places for the primers to attach, and more amplicons will be generated. Comparison of their relative quantity and that of the amplicons generated from single-copy gene PCR gives a telomeric over single-copy gene signals ratio, which is correlated with the overall telomeric content of the cell. One major disadvantage of this method is that it is strictly dependent on the initial calibration steps, and the results derived from different experiments are difficult to compare. 39 Compared to TRF, qPCR is thought to be more prone to measurement errors, which are being addressed by further modifications and amendments. 42 For example, in traditional qPCR, the reactions for the single-copy gene and telomeres are performed in different test tubes, which is a source of potential pipetting error. Thus, a modified version of this approach, such as monochrome multiplex qPCR, has been suggested, where the single-gene and telomere amplifications are performed in the same tube. 43 Besides measurement errors, telomeric over single-copy gene signals ratios obtained with these methods should be treated with caution, accounting for possible copy number and chromosome number variations. 39
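The telomeric over single-copy gene ratio can be sketched with the standard comparative Ct calculation, in which the sample is expressed relative to a reference DNA run in the same experiment. This is an illustration only: the function and parameter names are hypothetical, and the default of perfect doubling per cycle (efficiency of 2.0) is an assumption that real assays replace with values from standard curves.

```python
def relative_ts_ratio(ct_telo, ct_scg, ref_ct_telo, ref_ct_scg,
                      efficiency=2.0):
    """Relative telomere/single-copy gene (T/S) ratio of a sample.

    ct_telo, ct_scg:         threshold cycles of the sample for the
                             telomere and single-copy gene reactions.
    ref_ct_telo, ref_ct_scg: the same for the reference DNA.
    efficiency:              amplification factor per cycle (assumed
                             identical for both reactions).
    """
    delta_sample = ct_telo - ct_scg
    delta_ref = ref_ct_telo - ref_ct_scg
    # A lower telomere Ct relative to the reference means more
    # telomeric template, hence the negative exponent.
    return efficiency ** -(delta_sample - delta_ref)
```

A sample whose telomere reaction crosses the threshold one cycle earlier than the reference, with identical single-copy gene Ct values, gets a T/S ratio of 2 under these assumptions. The result is relative to the chosen reference, which is one reason why ratios from different experiments are difficult to compare.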
Other methods are used less frequently and serve more specific purposes. Single telomere length analysis (STELA) utilizes a 3′ overhang specific linker and a subtelomeric primer to amplify telomeres at specific chromosome ends with PCR. 44 STELA uses subtelomeric primers of known lengths, and their distance to the true telomeric region is known, which makes it more accurate than the TRF. Additionally, it estimates telomere lengths at specific chromosomes and is able to measure short telomeres, which is important in telomere shortening-induced senescence studies and for accounting for the telomere length variability across chromosomes. However, sequence variability at most of the subtelomeric regions allows only for capturing telomeres at chromosome arms Xp, Yp, 2p, 11q, 12q, and 17p. 45 Another limitation of STELA is that it is not able to capture long telomeres (lengths more than 8 kb are usually not captured).39,44
qFISH is used for measuring telomere lengths at metaphase chromosomes via hybridization of telomere-specific fluorescent probes. The signal intensity is then compared to a standard of known telomere length. 46 This method is accurate and has the advantage of providing arm-specific telomere lengths. The main drawback is that the cells must be able to divide, and thus, it is not applicable to cell cycle arrested cells. Also, because the method is hybridization based, it is difficult to quantify extremely short telomeres.
There is a wide variety of other techniques, each aimed at overcoming limitations of the existing ones. 39 However, there have been concerns about the comparability of the results obtained in different experiments, raising the need for proper calibration of the techniques based on the gold standard reference.47,48
Another major limiting factor for understanding the complex picture of the telomere regulatory network is the lack of high-throughput data coupled with experimental results on telomere length. Only a limited number of studies exist in which telomere length measurement experiments have been coupled with high-throughput genome, transcriptome, or epigenome data. Importantly, in order to put the data obtained from these two types of experiments in the same context, they should be performed on the same samples at the same time. Existing data do not allow for appropriate analysis, which creates the need for computational techniques that can obtain telomere length information from genome sequencing data.
Computational methods
The advent of next-generation sequencing (NGS) techniques has allowed for obtaining high-throughput whole genome, transcriptome, and epigenome data. Compared to low-throughput experiments, these techniques allow for performing systems biology analysis with integration of data from different sources into a single outlook. Currently, large amounts of coupled whole genome sequencing (WGS), RNA sequencing, ChIP-seq, and bisulfite NGS data for individual organisms are available. 49 These data are valuable for making systems-level association studies with telomere length dynamics.
Telomere length information can be obtained from WGS data, which are presented in the form of short read fragments (20–150 bp). During the last five years, a few methodologies have been developed for estimating telomere length from these data, all of which are based on capturing “telomeric” reads, which are presumably derived from telomeric regions of the genome. The first attempts were based on simple counts of reads that contain a certain number of intact telomeric TTAGGG repeats.50,51 The read count-based software TelSeq scans for reads containing more than a threshold number of TTAGGG repeats and compares their count to the number of genomic reads with the same GC content. This relative count is then multiplied by a GC-normalized genome length constant, which gives an estimate of absolute MTL. 50 This method provides correlation of telomere length estimates with experimental results in certain settings (for 100 bp long Illumina reads); however, it has several limitations. The output of TelSeq depends on the threshold chosen for telomeric repeat counts, and there is no established way of calibrating this threshold for different sequencing settings, which makes the results of TelSeq-based counts noncomparable across studies. Because of this repeat count limitation, TelSeq performs optimally with a threshold of 7 repeats in 100 bp reads and performs poorly on read lengths less than 50 bp or more than 100 bp. The results are also not robust to sequencing errors, which easily obscure the telomeric repeat pattern and change the length estimate. Finally, TelSeq is hard-coded for GC-normalized lengths and is not suitable for calculations based on nonhuman chromosomes or nondiploid chromosome sets.
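The read counting logic described above can be sketched as follows. This is not the TelSeq implementation: the function names, the GC window bounds, the genome length constant, and the number of chromosome ends are all hypothetical parameters supplied by the caller, and the sketch applies the threshold as "at least k repeats".

```python
MOTIF = "TTAGGG"
MOTIF_RC = "CCCTAA"  # reverse complement, for reads from the opposite strand


def repeat_count(read):
    """Number of non-overlapping motif occurrences on either strand."""
    return max(read.count(MOTIF), read.count(MOTIF_RC))


def gc_fraction(read):
    """Fraction of G and C bases in a read."""
    return (read.count("G") + read.count("C")) / len(read)


def estimate_mtl(reads, k, gc_low, gc_high, gc_window_genome_bp, n_ends):
    """Read count-based mean telomere length estimate (in bp).

    reads:               iterable of read sequences.
    k:                   minimum number of repeats for a telomeric read.
    gc_low, gc_high:     GC window used to select comparable genomic reads.
    gc_window_genome_bp: total genome length falling in that GC window
                         (assumed known for the reference genome).
    n_ends:              number of chromosome ends sharing the telomeric DNA.
    """
    reads = [r.upper() for r in reads if r]
    telomeric = sum(1 for r in reads if repeat_count(r) >= k)
    gc_matched = sum(1 for r in reads if gc_low <= gc_fraction(r) <= gc_high)
    if gc_matched == 0:
        raise ValueError("no reads fall inside the GC window")
    # Telomeric reads relative to GC-matched reads, scaled to base pairs
    # and divided among the chromosome ends.
    return telomeric / gc_matched * gc_window_genome_bp / n_ends
```

The sketch makes the threshold dependence visible: changing `k` changes which reads count as telomeric and therefore the estimate, and a single sequencing error inside a repeat lowers `repeat_count` for that read.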
Another software package, Computel, has recently been developed for MTL estimation from WGS data. 52 It estimates MTL by aligning short reads to a special telomeric reference. The reference is designed to uniquely capture telomeric reads coming from pure telomeric regions of the genome, or from the junctions of telomeres and subtelomeres. It also minimizes the possibility of capturing reads rich in interstitial telomeric repeats, which originate not from telomeres but from other regions of the genome. 53 In their study, the authors conducted a detailed analysis of the performance of Computel on synthetic and experimental reads and compared it with TelSeq. 52 Importantly, Computel is more robust to sequencing errors and is flexible with respect to variations in sequencing platforms, accounting for read length and depth of coverage. 52 Finally, the results of Computel obtained from various settings are reliably comparable, which makes it suitable for integrative analysis in telomere length association studies performed on high-throughput sequencing data across experiments. One major limitation of Computel, as well as TelSeq, is that these programs do not estimate telomere length at individual chromosomes, although the latter is an important marker of genome stability. 54 However, despite this limitation, MTL has proven to be an informative surrogate marker of individual telomere lengths in a large number of association studies.55–57
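How read length and depth of coverage enter an alignment-based estimate can be shown with a generic coverage-ratio calculation. This is a sketch under stated assumptions and not Computel's actual estimator: it assumes the number of reads aligned to a telomeric reference is already known, that base coverage is expressed per haploid genome, and that the telomeric DNA is shared among 46 chromosome ends (23 human chromosomes with two ends each). All names are hypothetical.

```python
def mtl_from_coverage(n_telomeric_reads, read_length, base_coverage,
                      n_ends=46):
    """Generic coverage-based mean telomere length estimate (in bp).

    n_telomeric_reads: reads aligned to the telomeric reference.
    read_length:       read length in bp.
    base_coverage:     mean depth of coverage over the haploid genome.
    n_ends:            chromosome ends per haploid set (assumed 46).
    """
    if base_coverage <= 0 or n_ends <= 0:
        raise ValueError("base_coverage and n_ends must be positive")
    telomeric_bases = n_telomeric_reads * read_length
    # Dividing sequenced telomeric bases by depth gives the telomeric
    # length per haploid genome; dividing by n_ends gives the mean.
    return telomeric_bases / base_coverage / n_ends
```

For example, 46,000 telomeric reads of 100 bp at 10x base coverage correspond to a mean of 10,000 bp per chromosome end under these assumptions. Because depth and read length appear explicitly, estimates from runs with different sequencing settings are placed on the same scale.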
Except for WGS data, other types of -
Application of Telomere Length Measurement Approaches in Large-Scale Studies
Epidemiological studies on telomere association with age and age-related diseases
Epidemiological studies on telomere length and its association with aging and diseases in humans have mostly been performed using the TRF and qPCR techniques discussed above. These assays require a large amount of source DNA, which is easily available from blood leukocytes. For this reason, LTL has been considered a biomarker of biological aging in a considerable number of epidemiological studies. Interestingly, LTL is assumed to be correlated with telomere lengths of other body cells. 5 Most of these studies point to an inverse association of LTL with chronological age, but there are some exceptions to this rule, which are at times hypothesized to be population or ethnicity specific.61–63 Some epidemiological studies indicate that telomere length is also predictive of mortality, but a meta-analysis of these studies indicates that this predictive association diminishes with age. 64 Two studies performed on very old populations measured telomere length in blood cells using qPCR and TRF assays and showed no association between telomere length and mortality in these cohorts.65,66 Some studies indicate that interindividual differences in LTL are largely determined at birth or within the first few years after birth, and that LTL then diminishes at constant rates during life.1,67 As indicated by a recent review, existing data on the association of telomere length with age do not imply causality, and the role of telomeres in the biology of aging still needs further investigation. 68
Based on its association with age in the general population, telomere length has been studied in the context of its association with diseases of the aging population, such as cardiovascular diseases, cerebrovascular diseases, and type 2 diabetes. Moreover, since chronological age is not a perfect marker of organismal aging, telomeres have been considered better indicators of biological aging. 11 The results obtained from various studies clearly indicate that atherosclerosis is associated with shortened telomeres in the cells of the vascular endothelium, as well as in leukocytes. The latter is explained by the correlation of telomere length across different tissues in the body, as well as by clonal exhaustion of immune cells and hematopoietic somatic cells in the presence of chronic inflammation. 69 Myocardial infarction and ischemic stroke, as well as other cardiovascular and cerebrovascular diseases, have also been shown to be mostly associated with shorter LTL. 11 However, there is considerable discrepancy in the associations found, which raises doubts about the comparability, reproducibility, and accuracy of the applied measurement techniques, especially given the differences in technologies and intra-study measurement errors. 70 A meta-analysis of 15 cohort and 12 case-control studies performed before 2013 by D'Mello et al has shown that myocardial infarction, ischemic stroke, and type 2 diabetes, but not coronary artery disease, are significantly associated with shorter LTL. 11 In five of these studies, LTL was measured with the TRF assay, while in the rest it was measured with qPCR. Interestingly, the studies using the TRF assay showed a greater effect size for stroke, which can be explained by measurement bias in qPCR, 11 which reduces the effect of a possible association.
An interesting study has been conducted recently to measure the association of telomere length in the largest cohort to date in a single experimental setting via robotics-based automation of qPCR experimental procedures. 71 The study has confirmed that telomere length negatively correlates with age up to 75 years, while having positive correlation with age in older individuals. It has also confirmed that after the age of 50 years, females have longer telomeres than males. Finally, the study has found that telomere length variance between individuals increases with age. 71 Due to automation, the authors have eliminated the inter-study variation of telomere length measurements and have increased the statistical power of the tests. This partially overcomes the limitations of experimental approaches for telomere length measurement, but stresses the resources required to perform accurate telomere length association studies.
Genome-wide association studies on MTL
Evidence suggests that telomere length at birth and attrition rates may be largely conditioned by genetic background. Thus, genome-wide association studies (GWASs) for MTL are valuable for finding possible causality and revealing the role of the underlying genes in the regulation of telomere length and telomere biology. To date, several large GWASs for MTL have been performed on various populations, mainly via qPCR- or TRF-based measurement of LTL and genotyping arrays. The first GWAS has been performed for mean TRF length of leukocytes and found
In contrast to GWAS, few attempts have been made to link MTL to high-throughput gene expression studies. An interesting study on integration of Hi-C and 3D-FISH along with gene expression microarray analysis has revealed a mechanism for telomere length-regulated expression of three genes, namely,
Use of computational approaches for integration of telomere length into high-throughput data analysis framework
Based on the wide availability of transcriptome and epigenome NGS data coupled with WGS, computational approaches for telomere length measurement will soon fill the gap of knowledge in the biology of telomere length dynamics. One of the pioneering works in this direction has been the study by Parker et al on association of telomere length dynamics in pediatric cancers, where gain and loss of telomeric DNA has been estimated from WGS data by counting the number of short reads containing at least four consecutive telomeric repeats. 51 Although the method they have used for estimation of telomere length was not calibrated to return accurate results, they have shown that it is a promising approach. The number of similar studies started to increase with the introduction of computational pipelines for estimation of telomere length from WGS data (TelSeq 50 and Computel 52 ). These software packages have already been used in a number of studies.62,83–87
Using TelSeq, it has been shown that adversity in major depression is associated with shorter telomeres and increased copies of mitochondrial DNA. 83 Analysis of WGS data in diffuse glioma has revealed that mutations in
Another set of studies has used Computel for measurement of telomere length from the available NGS data. A quantitative trait association study performed on whole genomes of 168 South Asians has revealed
In summary, the applications of computational methods to high-throughput data show the potential of an integrative view on telomere length dynamics and its association with molecular events. Most of the controversies and knowledge gaps in telomere biology stem from the fact that telomeres are rarely viewed at the systems level. Telomeres are peculiar structures that are tightly regulated by complex protein machinery and themselves regulate the expression of certain genes. These relationships should form a complex network including positive and negative feedback loops. Development of appropriate instruments to model this network and analyze its behavior is key to gaining deeper insight into the role that telomeres play in cell development.
Conclusion
The biological role of telomeres in the establishment of cellular processes and cell fate still remains a mystery. A vast number of studies have aimed at finding an association and a causal relationship between telomere length dynamics and the development of cellular senescence and organismal aging, as well as the development and progression of complex human diseases and cancers. However, these studies, although mostly linking telomere length with aging, age-related diseases, and some cancers, fail to establish the causality of these associations and the biological role of telomere length dynamics. The various low-throughput assays for measurement of telomere length that have been used to date, such as TRF, qPCR, and qFISH, have drawbacks and limitations and are resource consuming. While high-throughput sequencing technologies have opened a new era in systems biology research, allowing for integration of data on the genome, transcriptome, and epigenome for thousands of genes simultaneously, only a very limited number of studies have also performed coupled assays for telomere length measurement. Novel computational approaches for measuring telomere length from already available WGS data allow for integrating telomeres into high-throughput analysis frameworks. These approaches have a short history, but have already been used in several studies and will likely contribute to a large number of findings in telomere biology in the future.
Author Contributions
Wrote the manuscript: LN. Agreed with manuscript results and conclusions: LN. Made critical revisions and approved the final version: LN. The author reviewed and approved the final manuscript.
