Abstract
The true importance of cell-free DNA in human biology, together with the potential scale of its clinical utility, is obscured by a lack of understanding of its composition and origin. In investigating the cell-free DNA present in the growth medium of cultured 143B cells, we previously demonstrated that the majority of cell-free DNA is a product of neither apoptosis nor necrosis. In the present study, we investigated the composition and origin of this cell-free DNA population using next-generation sequencing. We found that the cell-free DNA consists mainly of repetitive DNA, including α-satellite DNA, mini satellites, and transposons that are currently active or exhibit the capacity to become reactivated. A significant portion of these cell-free DNA fragments originates from specific chromosomes, especially chromosomes 1 and 9. In healthy adult somatic cells, the centromeric and pericentromeric regions of these chromosomes are normally densely methylated. However, in many cancer types, these regions are preferentially hypomethylated. This can lead to double-stranded DNA breaks or it can directly impair the formation of proper kinetochore structures. This type of chromosomal instability is a precursor to the formation of nuclear anomalies, including lagging chromosomes and anaphase bridges. DNA fragments derived from these structures can recruit their own nuclear envelope and form secondary nuclear structures known as micronuclei, which can localize to the nuclear periphery and bud out from the membrane. We postulate that the majority of cell-free DNA present in the growth medium of cultured 143B cells originates from these micronuclei.
Introduction
Molecular analysis of cell-free DNA (cfDNA) marks a new point of departure in the application of genomic methods and ultra-sensitive technologies for the non-invasive diagnosis, prognosis, and monitoring of a wide range of diseases, especially cancer.1,2 Many studies have demonstrated the translational potential of cfDNA as a powerful and multifaceted biomarker. However, virtually no tests have yet been incorporated into routine clinical testing.3,4 While the difficulty of doing so can, to a certain extent, be attributed to the lack of analytical consensus among research groups,5–8 a growing number of reports indicate that a dearth of knowledge on the origin and molecular properties of cfDNA is another obstacle that substantially hinders the rapid translation of basic research to clinical practice.9–15
Next-generation sequencing (NGS) is one of the most powerful methods for elucidating the molecular properties of cfDNA. While sequencing of cfDNA has increased considerably in recent years, research has been mainly clinically motivated, focusing on size evaluation,16,17 ultra-deep amplicon 18 or exome sequencing, 19 and methylation-specific sequencing.20–22 In contrast, few researchers perform thorough sequence characterization. One group sequenced the cfDNA obtained from the serum of 51 healthy individuals. Although most sequences indicated an apoptotic origin, an uneven distribution of apoptotic and necrotic DNA across the genome, as well as an overrepresentation of Alu and L1 repetitive elements (REs), was observed. 23 This uneven representation is likely due to the complexity of the in vivo setting, that is, both the quantitative and qualitative characteristics of cfDNA in the blood of an individual at any given time are modulated by various internal processes and environmental factors.9,15 Moreover, since most cells release DNA, the aggregate cfDNA profile comprises a muddled blend of mutated and wild-type DNA released by various cells from different tissues and organs by different mechanisms. This makes it very difficult to investigate the biological properties of cfDNA in vivo.
Since cell cultures are insulated from most external elements, we argue that many of the difficulties encountered in in vivo experiments can be circumvented by in vitro models. For example, DNA is released from only a specific cell type in a typical cell culture experiment, rather than the hundreds of different kinds of cells in a whole organism. In a previous study,24,25 we characterized the DNA present in the culture medium of human bone osteosarcoma (143B) cells. After 4 h of incubation, only a small number of 166 bp DNA fragments are present. However, after 24 h, there is a significant increase in the amount and size of DNA (∼2000 bp). Typically, DNA with a size of 166 bp is a product of apoptotic fragmentation, while a size of 2000 bp can be explained by neither apoptosis nor necrosis. These results were confirmed by two flow cytometric assays.
These observations raise the important question: if not apoptosis or necrosis, which mechanisms may give rise to the presence of cfDNA, with an approximate size of 2000 bp, in the growth medium of 143B cells after 24 h of incubation? Therefore, the major aim of the experiments presented in this article was to characterize the nucleotide sequence of the cfDNA that is released by cultured 143B cells after 24 h of incubation. Using this information, we infer a possible origin and function.
Methods
Cell culturing and growth medium processing
The human bone cancer (osteosarcoma) cell line 143B was obtained from the American Type Culture Collection (ATCC® CRL-8303™) and was cultured in accordance with ATCC protocols. Cells were grown in Dulbecco’s modified Eagle’s medium (Hyclone DMEM/high glucose; Thermo Scientific; #SH30243.01) containing 4 mM
Extraction and quantification of cfDNA and genomic DNA
The cfDNA was extracted with the NucleoSpin Gel and PCR Clean-Up kit (Macherey-Nagel, Düren, Germany; #1502/001), according to the manufacturer’s PCR clean-up instructions, with slight modifications. Briefly, samples were thawed at 37°C in a temperature-controlled water bath. After incubation, the samples were vortexed and centrifuged briefly. For each biological replicate, 12 mL of growth medium was mixed with 24 mL of binding buffer NTI. Samples were then vortexed, and the entire volume was added to the spin column in small portions and drawn through the filter using a vacuum manifold and pump. Thereafter, the columns were washed twice, followed by the elution of cfDNA into 50 µL of elution buffer. The duplicate samples were then pooled. Genomic DNA was isolated from 143B cells using the FlexiGene DNA kit (Qiagen), according to the cultured cells protocol. DNA samples were quantified using the Qubit® 2.0 Fluorometer (Invitrogen, Life Technologies) and Qubit dsDNA HS Assay kit, and then stored at 4°C.
DNA library generation and sequencing
Extracted cfDNA and genomic DNA (∼100 ng) were sheared into 100 to 700 bp fragments by sonication with the Bioruptor UCD-200 (Diagenode), by applying three 5-min cycles of 30 s on/30 s off, on the medium setting. After shearing, the fragment ends were end-repaired and polished with the Ion Plus Fragment Library kit. Ion A and P1 adaptors were then ligated onto the polished fragments, and 330 bp fragments were size-selected using 2% agarose gels on the E-Gel® system. These libraries were then amplified using the Ion Plus Fragment Library kit and quantified using the Qubit dsDNA HS Assay kit. The libraries were diluted to 100 pM and a sequencing template for each library was prepared on the Ion OneTouch2 system using the Ion PGM Template OT2 200 kit. Each sample’s templated ISPs (Ion Sphere® Particles) were manually loaded onto an Ion 318v2 chip and sequenced on the Ion PGM using the Ion PGM Sequencing 200 kit v2. Except where indicated, all kits and reagents were obtained from Thermo Fisher Scientific. Default parameters on the Ion Torrent Suite (v4.4) were used to perform base calling, trimming and filtering, and alignment to hg19. Run quality was good, with an ISP loading density above 85% and approximately 6 million reads. Approximately 1.365 GB of data (6,184,354 reads) were generated for cfDNA.
Sequence-analysis pipeline
Binary alignment map (BAM) files, containing raw sequencing reads, were used to perform a strict de novo assembly in the CLC Genomics Workbench (Version 7.0), followed by read-mapping. Selected contigs were converted into FASTA format, after which the files were exported and converted to text-file format for further analysis. To screen for REs and regions of low complexity, we used RepeatMasker, a program utilizing RepBase (a service of the Institute for Systems Biology). Thereafter, where applicable, three consecutive local alignment analyses were conducted using the BLAST (Basic Local Alignment Search Tool) search engine to compare the generated array of cfDNA sequences with known sequences, including the human genome. The first search was against the ENSEMBL human GRCh38.p5 genome database. For the queries that did not return any hits, a second search against the National Center for Biotechnology Information (NCBI) nucleotide (nr/nt) database was performed using the megablast algorithm. For the queries that again returned no results, a third search was performed using the blastn algorithm. In all cases, default search parameters were used. This procedure was repeated for the hits that covered less than 50% of the query sequence length in the ENSEMBL BLAST search. For each sequence, the highest scoring BLAST hit, its physical location, the overlapping gene (where relevant), and genomic location (e.g. in the centromere, in telomeres, within a gene) were recorded. This information was used to categorize the cfDNA and genomic DNA sequences.
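The three-stage search order described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the three search functions are hypothetical callables supplied by the caller (no real BLAST client is invoked), the `Hit` record and the stage labels are invented for illustration, and keeping a partial ENSEMBL hit when the later searches find nothing is an assumption that the text does not specify.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """Best-scoring hit for one query (hypothetical record, for illustration)."""
    subject: str            # identifier or location of the matched sequence
    query_coverage: float   # fraction of the query length covered (0.0 to 1.0)


# A search function takes a query sequence and returns its best hit, or None.
SearchFn = Callable[[str], Optional[Hit]]


def cascade_search(
    query: str,
    ensembl: SearchFn,
    megablast: SearchFn,
    blastn: SearchFn,
    min_coverage: float = 0.5,
) -> Tuple[str, Optional[Hit]]:
    """Run the searches in order and report which stage produced the hit."""
    first = ensembl(query)
    if first is not None and first.query_coverage >= min_coverage:
        return "ensembl", first

    # No ENSEMBL hit, or a hit covering less than 50% of the query:
    # fall through to the NCBI searches, megablast first, then blastn.
    for stage, search in (("megablast", megablast), ("blastn", blastn)):
        hit = search(query)
        if hit is not None:
            return stage, hit

    # Assumption: if nothing else is found, keep the partial ENSEMBL hit.
    if first is not None:
        return "ensembl_partial", first
    return "unaligned", None


if __name__ == "__main__":
    # Stub searches standing in for real database queries.
    partial = Hit("chr9:pericentromere", 0.3)
    better = Hit("nt:example_accession", 0.8)
    stage, hit = cascade_search(
        "ACGTACGT",
        ensembl=lambda q: partial,
        megablast=lambda q: None,
        blastn=lambda q: better,
    )
    print(stage, hit)
```

Passing the search functions as arguments keeps the control flow testable without network access; in practice each callable would wrap whatever BLAST interface is available.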
Results
The spread of cfDNA sequencing data
To investigate the composition of the DNA released by 143B cells after 24 h of incubation, the cfDNA present in the culture medium was isolated and sequenced. First, the distribution of cfDNA sequence coverage was evaluated. This showed that the data are skewed significantly to the right, indicating the presence of at least two distinct cfDNA populations, that is, a large number of sequences with a relatively low coverage and a small number of sequences with a very high coverage. The details of these analyses are described in Supplementary file 1, Section 1.
Masking and representation of REs
To screen for REs and regions of low complexity, we used RepeatMasker Open software (4.0). To characterize the REs of the entire cfDNA population and to account for a non-normal distribution (as indicated in Supplementary Fig. 1), three data subsets were investigated, namely, (a) all contigs with coverage greater than 20, (b) contigs with coverage between 20 and 100, and (c) contigs with coverage greater than 100. The details of this method are described in Supplementary file 1, Section 2. In all subsets investigated, only a very small portion of the cfDNA population consists of unique regions, while the amount of REs notably exceeds any value predicted for the human genome26–28 (Figure 1). Therefore, the DNA released by 143B cells after 24 h of incubation consists mainly of REs. Furthermore, when considering the entire cfDNA population (Subset a), long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), satellites, and simple repeats (mini satellites) make up the majority of the cfDNA population and are overrepresented compared to the human genome (Figure 1(a)). Notably, satellites and mini satellites are significantly overrepresented, while long terminal repeat (LTR) elements and DNA elements are underrepresented. As shown in Subset c (Figure 1(c)), which depicts only the contigs with a coverage greater than 100, satellites, mini satellites, and LINEs are significantly overrepresented in the cfDNA population as a result of a small number of sequences that have a very high coverage. When these contigs (which significantly skew the data) are taken out of consideration (Subset b; Figure 1(b)), it becomes clear that, regardless of the overall masking of each repeat class, specific elements in each class are significantly overrepresented, while others are significantly underrepresented or occur at levels comparable to the human genome.

Representation of repetitive elements in cell-free DNA. Representation of the repetitive elements in each of the cfDNA subsets that were analyzed (as described in the “Masking and representation of repetitive elements” section), including (a) all contigs with coverage greater than 20, (b) contigs with coverage between 20 and 100, and (c) contigs with coverage greater than 100, respectively. Pie graphs illustrate the representation of the different repetitive element classes in the cfDNA population of each subset. Each repeat class is denoted by a different color, as indicated by the legend. In addition, each chart shows the percentage of the cfDNA population that does not contain repetitive elements (denoted by “unique regions”). Values were obtained by dividing the total number of bases masked by each repetitive element by the number of bases that constitute the entire corresponding subset. Bar graphs illustrate the representation of each repeat class, as well as the repetitive element types in each class, expressed as observed/expected ratios (i.e. the percentage of cfDNA masked by each repetitive element was divided by the percentage of the human genome masked by the corresponding element). Gray bars illustrate the different repeat classes, while blue bars indicate the repetitive element types contained within the preceding class. The dashed line denotes the expected ratio of masking for all repeats. Thus, values that extend above this line indicate an overrepresentation of the repetitive element, while values below this line indicate an underrepresentation of the repetitive element.
As a control, genomic DNA isolated from 143B cells was sequenced, screened for REs, and compared to 143B cfDNA (Supplementary file 1, Section 2.1). 143B genomic DNA shows almost normal levels of REs (∼59% of sequences are masked) and normal levels of non-repetitive DNA (gene sequences). In contrast, it is clear that REs are significantly overrepresented in cfDNA (∼82% of sequences are masked) while cfDNA contains virtually no unique gene sequences (Supplementary Fig. 2). This suggests that cfDNA does not represent merely fragmented genomic DNA but is actively released from specific parts of the genome. This is further discussed in the “Possible pathways for the extrusion of cfDNA” section.
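The coverage subsets and the observed/expected ratios plotted in Figure 1 can be sketched as follows. This is an illustrative sketch under stated assumptions: the contig records, the field names, and the genome fraction of 0.125 are placeholders rather than data or reference values from this study, and treating a coverage of exactly 100 as part of Subset b is an assumption, since the boundary is not specified in the text.

```python
def split_by_coverage(contigs, low=20, high=100):
    """Split contigs into the three coverage subsets described in the text."""
    subset_a = [c for c in contigs if c["coverage"] > low]        # all contigs > 20
    subset_b = [c for c in subset_a if c["coverage"] <= high]     # 20 to 100
    subset_c = [c for c in subset_a if c["coverage"] > high]      # > 100
    return {"a": subset_a, "b": subset_b, "c": subset_c}


def masked_fraction(contigs, repeat_class):
    """Bases masked by one repeat class divided by all bases in the subset."""
    masked = sum(c["masked"].get(repeat_class, 0) for c in contigs)
    total = sum(c["length"] for c in contigs)
    return masked / total


def observed_expected(contigs, repeat_class, genome_fraction):
    """Ratio > 1 indicates overrepresentation relative to the genome."""
    return masked_fraction(contigs, repeat_class) / genome_fraction


if __name__ == "__main__":
    # Toy contigs (placeholder values, not study data).
    contigs = [
        {"id": "c1", "length": 400, "coverage": 25, "masked": {"Satellite": 100}},
        {"id": "c2", "length": 600, "coverage": 150,
         "masked": {"Satellite": 150, "LINE": 200}},
        {"id": "c3", "length": 500, "coverage": 10, "masked": {}},
    ]
    subsets = split_by_coverage(contigs)
    # 0.125 is a placeholder genome fraction, not a published estimate.
    ratio = observed_expected(subsets["a"], "Satellite", genome_fraction=0.125)
    print([c["id"] for c in subsets["a"]], ratio)
```

In a real analysis the masked base counts would come from the RepeatMasker summary for each subset, and the expected fraction from masking the reference genome with the same settings.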
Evaluation of individual repeat classes and associated subfamilies
Since the different RE families are not equally represented in the human genome, the coverage distribution of the contigs that comprise each RE type/family was evaluated. The purpose of this was to identify sequences that are significantly overrepresented within the context of each group. To do this, the coverage distribution of the contigs that comprise each RE population was displayed as a box plot, where data points above the 90th percentile are considered to be significantly overrepresented. Figure 2 illustrates the RE populations that were not overrepresented, including mammalian-wide interspersed repeat (MIR; Figure 2(a)), L2 (Figure 2(b)) elements, endogenous retrovirus (ERV) class I (Figure 2(c)), ERV (L) class III (Figure 2(d)), and hAT-Charlie (Figure 2(e)). Although mini satellites are also overrepresented, the sequences of this population are too diverse to be grouped and displayed, and were therefore included in Figure 2(f) and not in Figure 3. Figure 3 illustrates the RE populations that were overrepresented, including ERV (K) class II (Figure 3(a)), mammalian apparent LTR retrotransposons (MaLR; Figure 3(b)), and TcMar-Tigger (Figure 3(c)). Box plots could not satisfactorily illustrate the spread of the data or identify outliers for Alu, L1, and satellites. Instead, these data were displayed as scatterplots, wherein the 5% of contigs with the highest coverage (which are considered to be significantly overrepresented) were distinguished from the remaining 95% of the data (Figure 4). Significantly overrepresented elements are summarized in Supplementary file 2. This includes the contig ID, the matching repeat, the length that it masked, the coverage of the sequence with which it aligned, as well as the total bases masked. In addition, the FASTA sequences of these overrepresented elements are indexed in Supplementary file 3 and can be located by their contig IDs.
To evaluate the composition of each RE type, the representation of its corresponding subfamily was determined. This was done only for the RE types that were shown to be overrepresented in comparison with the human genome (see Figure 1), namely, ERV (K) class II (Figure 3(d)), MaLR (Figure 3(e)), TcMar-Tigger (Figure 3(f)), Alus (Figure 4(a)), L1 (Figure 4(b)), and satellites (Figure 4(c)). Furthermore, regarding the three most abundant REs in the cfDNA population, Alus, L1, and satellites, the representation of their corresponding subfamilies within the top 5% and bottom 95% of the data was determined. This shows that there are individual subfamilies that contain contigs with a significantly high coverage.
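The two cut-offs used in this section (coverage above the 90th percentile for the box plots, and the top 5% of contigs for the scatterplots) can be sketched in a few lines. This is a minimal illustration, assuming a linear-interpolation percentile and a hypothetical mapping of contig IDs to coverage values; the plotting software used in the study may define percentiles differently.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)                           # index of the rank at or below pos
    upper = min(lower + 1, len(ordered) - 1)   # next rank, clamped to the end
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def above_percentile(coverage_by_contig, q=90):
    """Contig IDs whose coverage exceeds the q-th percentile of the group."""
    threshold = percentile(coverage_by_contig.values(), q)
    return {cid for cid, cov in coverage_by_contig.items() if cov > threshold}


def top_fraction(coverage_by_contig, fraction=0.05):
    """Contig IDs making up the highest-coverage fraction of the group."""
    ranked = sorted(coverage_by_contig, key=coverage_by_contig.get, reverse=True)
    count = max(1, int(len(ranked) * fraction))  # keep at least one contig
    return ranked[:count]


if __name__ == "__main__":
    # Hypothetical coverage values for 20 contigs of one repeat family.
    coverage = {f"contig_{i}": i for i in range(1, 21)}
    print(sorted(above_percentile(coverage)))
    print(top_fraction(coverage))
```

Applying the thresholds within each RE family, rather than across all contigs, matches the stated aim of identifying sequences that are overrepresented in the context of their own group.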

Identification of significantly overrepresented repetitive elements—Part I. Coverage distribution of the contigs that constitute each of the different repetitive element populations, including (a) MIRs, (b) LINE2, (c) ERV class I, (d) ERV (L) class III, (e) hAT-Charlie, and (f) simple repeats. Contigs with coverage values above the 90th percentile are significantly overrepresented. Significantly overrepresented elements are summarized in Supplementary file 2.

Identification of significantly overrepresented repetitive elements—Part II. Coverage distribution of the contigs that constitute the repetitive element populations that are significantly overrepresented (see Figure 1), including (a) ERV (K) class II, (b) MaLR, and (c) TcMar-Tigger. Contigs with coverage above the 90th percentile are considered significantly overrepresented (significantly overrepresented repetitive elements are summarized in Supplementary file 2). Furthermore, pie charts illustrate the representation of the subfamilies that comprise each of the aforementioned repetitive element populations, including (d) ERV (K) class II, (e) MaLR, and (f) TcMar-Tigger. Although simple repeats are also overrepresented, they were not included in this figure because the sequences of this population are tremendously diverse.

Identification of significantly overrepresented repetitive elements—Part III. Scatterplots illustrating the coverage distribution of the contigs that constitute each of the repetitive element populations of which significantly overrepresented elements could not be distinguished using box plots, including (a) Alus, (b) LINE1, and (c) satellites. Each dot represents a single contig. The superimposed blue box indicates the contigs that constitute the top 5% of the data when sorted according to increasing coverage, while the pink superimposed box indicates contigs that constitute the bottom 95% of the data. Dots within the blue box, therefore, indicate significantly overrepresented contigs. Significantly overrepresented elements are summarized in Supplementary file 2. Pie charts illustrate the representation of the subfamilies that constitute each of the aforementioned repetitive element populations, including ① the top 5% of the data, ② the bottom 95% of the data, and ③ the entire repeat element population.
Chromosomal distribution
To further investigate the origin of the cfDNA sequences, chromosomal distribution was evaluated for each of the different RE populations that were shown to be significantly overrepresented (Figure 5). The number of bases of each RE type on each chromosome of the human genome differs notably. Therefore, to normalize the data, the total number of bases for each RE type on each chromosome (the number of bases multiplied by coverage) was divided by the number of bases that the corresponding RE type occupies on each chromosome of the human genome. The number of bases that each RE type occupies on each chromosome of the human genome was calculated by subjecting the pre-masked human genome to repeat masking analysis. Furthermore, the chromosomal distribution of 143B cfDNA was compared with the chromosomal distribution of 143B genomic DNA (Figure 5). Interestingly, specific chromosomes were significantly overrepresented, especially chromosomes 1 and 9, while the remaining chromosomes were significantly underrepresented. Moreover, the chromosomal distribution patterns of the cfDNA sequences very closely mirror those of 143B genomic DNA sequences. This observation is further discussed in the “Possible pathways for the extrusion of cfDNA” section.
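The normalization described above can be sketched as follows. This is an illustrative sketch, assuming each contig has already been assigned to a chromosome by alignment; the tuple layout, chromosome names, and base counts are placeholders, not values from this study.

```python
def normalized_representation(contigs, genome_re_bases):
    """Normalize one repeat type per chromosome.

    contigs: iterable of (chromosome, masked_bases, coverage) for one RE type.
    genome_re_bases: chromosome -> bases that this RE type occupies on that
        chromosome in the reference genome.
    Returns chromosome -> (sum of masked_bases * coverage) / reference bases.
    """
    totals = {}
    for chromosome, masked_bases, coverage in contigs:
        # Weight each contig's masked bases by its coverage.
        totals[chromosome] = totals.get(chromosome, 0) + masked_bases * coverage
    return {
        chromosome: total / genome_re_bases[chromosome]
        for chromosome, total in totals.items()
    }


if __name__ == "__main__":
    # Placeholder satellite contigs and reference base counts.
    satellite_contigs = [
        ("chr1", 100, 50),
        ("chr1", 200, 10),
        ("chr2", 100, 5),
    ]
    reference = {"chr1": 7000, "chr2": 1000}
    print(normalized_representation(satellite_contigs, reference))
```

Running the same function on the cfDNA contigs and on the genomic DNA contigs gives two per-chromosome profiles that can be compared directly, as in Figure 5.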

Chromosomal distribution of cell-free DNA. Chromosomal distribution of significantly overrepresented repetitive element groups (see Figure 1), including (a) Alu, (b) LINE1, (c) ERV (K) class II, (d) MaLR, (e) TcMar-Tigger, (f) satellite DNA, and (g) simple repeats. The chromosomal distribution of these elements in 143B cfDNA is plotted on the right y-axis (red) and is compared with 143B genomic DNA, plotted on the left y-axis (blue). Significantly overrepresented chromosomes are indicated by black arrows. Normalized values were obtained by dividing the total number of bases determined for both 143B cfDNA and 143B genomic DNA of each repetitive element group on each chromosome by the actual number of bases that each of the different repetitive element groups occupies on each chromosome of the human genome (hg19).
Additional analyses and supplementary material
Additional analyses that have been performed on the cfDNA sequences are described in detail in supplementary material. This includes (1) local alignment analyses and annotation of unique (unmasked) cfDNA sequences (Supplementary file 1, Section 3 and Supplementary Fig. 3), (2) the assessment of potential sequencing bias and procedural errors (Supplementary file 1, Section 4 and Supplementary Fig. 4), and (3) identification of sequences that did not align with the human genome (Supplementary file 1, Section 5 and Supplementary file 4). All supplementary figures are included in Supplementary file 1, Section 7.
Discussion
In a previous study, we showed that the cfDNA present in the growth medium of cultured 143B cells after 24 h of incubation is mainly a product of an active release mechanism and not a consequence of apoptosis or necrosis. 25 The objective of this study was to investigate the origin and purpose of this specific population of cfDNA by means of sequence analysis. After sequencing, cfDNA sequences were re-assembled and then screened for REs, followed by local alignment analyses and annotation. Initial RE screening results showed that most of the cfDNA consists of repetitive DNA (88%; Figure 1), which exceeds any value predicted for the human genome (typical estimates range between 50% and 66%).26–28 Further analysis of the RE screening data showed that specific RE types are overrepresented (i.e. Alus, L1, ERV (K) class II, MaLR, TcMar-Tigger, satellites, and simple repeats (mini satellites)), while others are underrepresented or occur at levels comparable to the human genome (Figure 1). Moreover, it was demonstrated that different subfamilies within each RE type are significantly overrepresented (Figures 2–4). Local alignment analysis of the overrepresented RE types showed that they originate from specific chromosomes, especially 1 and 9 (Figure 5). Interestingly, while 143B genomic DNA shows normal levels of REs, some chromosomes, especially 1 and 9, appear to harbor extra copies of satellites, mini satellites, and some other REs, mirroring the cfDNA chromosomal distribution profile. This may be the result of a specific subtype of genomic instability (see “Possible pathways for the extrusion of cfDNA” section). In the following sections, we discuss possible reasons why satellites and each of the different REs are over- or underrepresented, and why they originate mainly from specific chromosomes. We also discuss the potential implications of these findings.
The representation of class I transposons correlates with transposition activity
SINEs
Alu elements are notably overrepresented (Figure 1(b)). However, when the different Alu subfamilies are compared, it is clear that the cfDNA population contains nearly 10 times more elements originating collectively from the AluS and AluY subfamilies than from the AluJ family (Figure 4(a)). Interestingly, the former subfamilies harbor more functionally intact elements and are, therefore, more likely to transpose in the human genome, whereas the AluJ lineage is the least active and is considered to be functionally extinct. 29 Like AluJ, MIR elements are also underrepresented in the cfDNA population (Figure 1(b)) and no longer possess any transposition activity in the human genome. 30 In vitro studies indicate that Alu elements are capable of retrotransposing in somewhat differentiated cells,31,32 which suggests the possibility of activity in somatic tissues.
LINEs
The underrepresentation of L2 and L3/CR1 also correlates with their inactivity, 33 while the overrepresentation of L1 elements correlates with their active transposition status. Although there are more than 500,000 L1 copies in the human genome, most are considered to be molecular fossils. 34 It has been estimated that an average human genome contains only 80 to 100 retrotransposition-competent copies, which all belong to the human-specific L1 (L1HS) subfamily. L1HS elements are then further stratified into the L1-Ta subfamily (pre-Ta, Ta-0, Ta-1, Ta1-d, Ta1-nd), 35 which is estimated to account for ∼31.5% of all L1HS elements, 36 and the older L1P1 subfamily (L1PA2 and L1PA3) that comprises the remainder of the L1HS subfamily. 37 As illustrated in Figure 4(b), the L1P1 subfamily constitutes nearly half of the L1 population. Therefore, it is possible that the representation of the different L1 elements is related to their transposition activity. Furthermore, a relatively small number of these L1P1 elements have a very high coverage (Figure 4(b)), which may relate to a regulated process, such as gene amplification (see “Possible pathways for the extrusion of cfDNA” section).
LTRs
To date, 22 different human endogenous retrovirus (HERV) families have been identified; however, most of these have been rendered inert by the accumulation of mutations and incomplete sequences. 38 Those that do possess some activity are, for the most part, transcriptionally repressed by methylation. 39 However, a number of reports highlight cases in which specific ERVs can become transcriptionally active and even transpose. For example, in some cancers (e.g. bladder, ovarian, and testicular), the degree of HERV demethylation has been found to increase concomitantly with malignancy and is often accompanied by transcript expression.40,41 Furthermore, the deletion of DNA methyltransferase 3-like (Dnmt3L), a gene involved in the de novo methylation of retrotransposons in the germ cells of adult male mice, prevents the methylation of both LTR and non-LTR retrotransposons, which results in the reactivation of transposition activity. 42 Other researchers have achieved similar effects by the deletion of lymphoid-specific helicase (Lsh or HELLS). 43
The representation of class II transposons correlates with evolutionary age
As shown in Figure 3(f), TcMar-Tigger elements are overrepresented in the cfDNA population, while hAT-Charlie elements are underrepresented. Most DNA transposons are thought to be fixed within the human genome due to internal deletions and/or end-truncations, 44 and there is no evidence that they are currently active. 45 However, we can suggest one plausible explanation for the differential representation of these elements in the cfDNA. The TcMar-Tigger family comprises several subfamilies (Tigger1, Tigger2, Tigger2a, Tigger3, etc.), where the numerical suffix correlates with the age of the element and refers to the point in time when it diverged structurally from older subfamilies. This divergence then correlates with transposition activity. 46 In other words, the youngest elements are those that have transposed the most recently. When the different TcMar-Tigger subfamilies were further analyzed, it was found that two-thirds consist of Tigger1 (Figure 3(f)), the youngest member of the TcMar-Tigger family. In addition, the representation of the remaining Tigger elements also correlates with age. 46 Therefore, the representation of DNA elements appears to correlate with their capacity to transpose.
Although there is currently no evidence for the movement of DNA transposons in the human genome, there is a possibility that it has been occurring unnoticed. In Supplementary file 1, Section 6, we summarize some points to illustrate that the movement of DNA transposons in the human genome, within and between cells, is possible.
Taken together, the literature discussed above indicates that the cfDNA present in the growth medium of 143B cells after 24 h of incubation is composed primarily of satellites, mini satellites, and transposons that exhibit the capacity to become activated and transpose (i.e. AluS, AluY, L1P1, THE, and Tigger1). This raises the following important questions: why is the cfDNA enriched in satellites, mini satellites, and potentially active transposons? And why do they originate only from specific chromosomes?
The association between hypomethylation of repetitive DNA and cancer
The cancer genome is often characterized by a decrease in global 5-methylcytosine levels. 47 Since single-copy genes account for less than 5% of human DNA, 26 this large decrease of methylcytosine content mainly reflects methylation changes of CpG-rich non-coding DNA, in particular peri- and juxtacentromeric sat2 and sat3 (classical satellites), Alu, L1, and LTR-containing repeats (especially HERVs), but also includes the less CpG-rich centromeric α-satellite DNA. 48 At the base of their q-arms, chromosomes 1, 9, and 16 have large satellite-repeat arrays. In healthy adult somatic cells, these chromosomal regions are normally densely methylated; 49 however, they are preferentially hypomethylated in many cancers, such as ovarian epithelial tumors, 50 breast adenocarcinomas, 49 and Wilms tumors. 51 This same phenomenon occurs in Immunodeficiency, Centromeric instability, and Facial anomalies (ICF) syndrome. ICF is a very rare autosomal recessive disorder characterized by variable immunodeficiency, mild facial anomalies, and centromeric decondensation.52,53 The most consistent molecular observation of this disease is the hypomethylation of the centromeric and pericentromeric regions of chromosomes 1, 9, and 16, while global DNA methylation patterns fall within the normal range.53–55 Typically, hypomethylation of these specific chromosomal loci in lymphocytes from ICF patients is accompanied by significant loss of organization and increased instability, followed by cytogenetic abnormalities, including (1) extension of the juxtacentromeric heterochromatin of metaphase chromosomes 1, 9, and 16; (2) formation of complex multiradiate chromosomes with multiple p- and q-arms derived from chromosomes 1 and 16; (3) breakage and rejoining of Sat2 resulting in the duplication of q-arms on chromosome 1; and (4) extrusion of self-associated Sat2 into secondary nuclear structures called micronuclei or nuclear buds. 56 Similarly, many cancers exhibit a very high frequency of non-random rearrangements in the centromeric and pericentromeric heterochromatin of chromosomes 1, 9, and 16, 57 which constitute chromosomal aberrations that are identical to those observed in ICF syndrome. 56
Further evidence that the centromeric and pericentromeric regions of chromosomes 1, 9, and 16 are predisposed to hypomethylation and chromosomal instability comes from numerous in vitro studies in which the genotoxic effects of various clastogens, in particular base analogs, have been evaluated. In these studies, administration of various compounds such as 5-azacytidine, 5-azadeoxycytidine, 2,6-diaminopurine, idoxuridine, and mitomycin C to cells resulted in increased hypomethylation of the centromeric and pericentromeric regions mainly of chromosomes 1, 9, and 16. Moreover, this hypomethylation was usually accompanied by various chromosomal abnormalities, including heterochromatic decondensation, DNA breaks, and inclusion of DNA fragments into micronuclei (as observed in ICF and cancer).58–62 It is also worth mentioning that the same chromosomal abnormalities arise in cells in which the centromeric and pericentromeric regions of chromosomes 1, 9, and 16 become spontaneously hypomethylated over time. 63
We therefore hypothesize that the cfDNA present in the growth medium of 143B cells after 24 h of incubation is associated with the hypomethylation, and consequently the instability, of the centromeric and pericentromeric regions, mainly of chromosomes 1 and 9 (other chromosomes may also be involved to a lesser extent). This raises the next important question: why and how does hypomethylation of these regions result in the presence of these specific cfDNA fragments in the extracellular environment? A detailed discussion of the mechanisms likely involved is beyond the scope of this article and is the subject of a prospective review paper. Here, we briefly outline possible explanations.
Possible pathways for the extrusion of cfDNA
The structural fidelity of the centromere/kinetochore is crucial for the proper segregation of chromosomes during mitosis. 64 As discussed above, hypomethylation of the centromeric and pericentromeric regions of chromosomes 1 and 9 alters the structure of centromeric DNA and, therefore, directly affects the formation of kinetochores on centromeres, consequently altering their interaction with microtubules. When these components malfunction, various microtubule attachment errors arise.65,66 An important example is merotelic attachment, in which one kinetochore binds to microtubules growing from opposite spindle poles. Although most merotelic chromosomes segregate correctly during anaphase (i.e. move toward the poles), a small fraction may remain at the spindle equator, yielding lagging chromosomes. 65 Depending on where the cleavage furrow ingresses during cytokinesis, the lagging chromosome can be included in either of the daughter cells. Interestingly, when a lagging chromosome is far enough from the remaining chromatin mass, it recruits its own nuclear envelope at the end of mitosis, forming a micronucleus. Consequently, an interphase daughter cell contains two types of nuclei: the primary nucleus, which encapsulates the main chromatin mass, and an indefinite number of micronuclei that contain the mis-segregated chromosomes. While chromosome lagging and micronucleation can occur at low levels in normal cells, increased levels of micronuclei have been found in various pathologies, especially in cancer cells with chromosomal instability.67–70
Micronuclei can also arise during mitosis as a result of mis-repaired double-stranded DNA breaks (DSBs). 71 When a chromosome that has lost its telomere due to a DSB is replicated, the two ends of the sisters fuse, producing a chromosome with two centromeres (dicentric chromosome).72,73 During anaphase, the two centromeres are pulled to opposite poles, generating an anaphase bridge.74–76 When an anaphase bridge breaks irregularly and in a gene region, which is very often the case, the broken chromosomes lose their telomere sequences at the broken end and usually fuse with their copy after replication.77–80 When the centromeres of the fused sister chromatids are pulled apart during the next anaphase, the resulting bridge breaks again. Following the next round of replication, the sister chromatids fuse once again, perpetuating the breakage-fusion-bridge (BFB) cycle through several rounds of mitosis and thus amplifying the DNA located near the break or fusion point.81,82 This process usually proceeds until the marker chromosome acquires a new telomere. This may explain why certain REs are overrepresented in 143B genomic DNA. Interestingly, these amplified genes are eventually eliminated (looped out) from the aberrant chromosome when homotypic sequences within the amplified genes recombine to form mini-circles of acentric and atelomeric DNA (double minutes (DMs)). These DMs can either be replicated or localize to the nuclear periphery, where they are eliminated from the nucleus via budding during S-phase, after which they can be extruded from the cell.80,83 Interestingly, these DMs most frequently consist of amplified copies of oncogenes, which may explain why specific transposons are significantly overrepresented in the cfDNA population. As this process can reduce the dosage of tumor suppressor genes,84–86 it may play a key role in the acquired resistance of cancer cells to different therapies and, therefore, constitute a step in tumorigenesis and the propagation of cancer. Alternatively, the generation of DMs and their extrusion from cells may serve as a pathway for eliminating amplified oncogenes, resulting in the loss of the malignant phenotype.71,87 These nuclear buds generally have the same morphology as micronuclei, except that they are physically connected to the nucleus by a stalk of nucleoplasmic material at some stage of the budding process. 77
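The breakage-fusion-bridge cycle described above can be illustrated with a toy simulation. The sketch below is not part of the analysis reported here; it is a minimal model that treats a chromosome as an ordered list of segments, follows only the daughter that keeps one centromere, and breaks the bridge at a uniformly random position. The segment labels, the function names `bfb_round` and `simulate_bfb`, and the uniform break position are illustrative assumptions.

```python
import random
from collections import Counter


def bfb_round(arm, rng):
    """Run one breakage-fusion-bridge round on a telomere-less chromosome.

    `arm` is a list of segment labels that starts at the centromere ("cen")
    and ends at the broken, telomere-free end.
    """
    # Replication, then fusion of the two sisters at their broken ends,
    # gives a dicentric chromosome: the arm followed by its mirror image.
    dicentric = arm + arm[::-1]
    # Anaphase pulls the two centromeres apart and the bridge breaks at a
    # random position between them (a uniform choice is an assumption).
    cut = rng.randint(1, len(dicentric) - 1)
    # Follow only the daughter that keeps the first centromere.
    return dicentric[:cut]


def simulate_bfb(arm, rounds, seed=0):
    """Repeat bfb_round `rounds` times and return the final chromosome."""
    rng = random.Random(seed)
    for _ in range(rounds):
        arm = bfb_round(arm, rng)
    return arm


if __name__ == "__main__":
    start = ["cen", "A", "B", "C", "D"]  # hypothetical segment labels
    final = simulate_bfb(start, rounds=5, seed=1)
    print(final)
    print(Counter(final))  # copy number per segment after five rounds
```

In this simplified model, segments close to the fusion point can accumulate as inverted duplications over successive rounds, whereas segments distal to a break are lost from the daughter being followed. Telomere acquisition, which ends the cycle in cells, is not modeled.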
A number of studies have reported the occurrence of BFB intermediates in different osteosarcoma cell lines.88–91 By quantifying anaphase bridge configurations and dicentric chromosomes in four different osteosarcoma cell lines, it was demonstrated that BFB cycle events occur at a high enough frequency to serve as a mechanism for increasing the de novo formation of cytogenetic abnormalities in osteosarcoma. In the same study, an increase in the occurrence of micronuclei, chromatin strings, and nuclear blebs, which represent ruptured anaphase bridges, was observed compared to controls.92,93 Another significant finding of this study is the observation of interstitial alpha-satellites in dicentric chromosomes that are preferentially located at chromosome ends. There is evidence that the alpha-satellites found in the centromeric and pericentromeric regions of human chromosomes represent hotspots for translocations and recombination in some solid tumors.94,95 Since these sequences have been observed to be specifically amplified in osteosarcoma, 96 this may explain why rearrangements often occur in the pericentromeric regions in osteosarcoma tumors and cell lines. 97 Moreover, in the SAOS-2 osteosarcoma cell line, chromosome 1 exhibits various complex rearrangements, including one rearrangement in which DNA from 1p35-p36 is duplicated and inverted in a der(9)t(1;9). 92 If some micronuclei are derived from ruptured anaphase bridges in which DNA fragmentation was initiated adjacent to the centromeres of these two chromosomes, it may explain why the cfDNA released by 143B cells is enriched in DNA originating from the centromeric and pericentromeric regions of chromosomes 1 and 9.
As discussed earlier, transposons are normally repressed by methylation. However, in cancer cells, hypomethylation can result in the reactivation of certain transposons, enabling transcription and transposition. There is evidence that transposons can generate DNA breaks. When replication forks traverse a DNA strand with a single-stranded nick, the fork can collapse and generate a DSB.98,99 In addition, it has been demonstrated that the nuclease produced by L1 can also generate DSBs. 100 As mentioned in the “LINEs” section, an average human genome contains about 80 to 100 L1 copies that are retrotransposition competent (or contain intact ORFs). By cloning 82 of these L1 copies, it was demonstrated that 40 of them are active in a 143B thymidine kinase-negative (TK−) cultured cell retrotransposition assay. 101 Of these 40 active L1s, six hot L1s accounted for 84% of the total activity of all 82 L1s tested. Using the sequence divergence and allele frequency of each L1 as surrogate markers for age, it was shown that younger L1s have high activity in cultured cells, while older L1s display virtually no activity. As discussed in the “LINEs” section, cfDNA released by 143B cells is enriched in young L1 elements. Moreover, only four elements are significantly overrepresented, which may represent hot L1s.
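To make the concentration of L1 activity explicit, the quoted figures can be turned into a back-of-the-envelope calculation. The sketch below uses only the numbers given above (82 tested, 40 active, six hot elements carrying 84% of activity); the function name `l1_activity_summary` and the assumption that the remaining 16% is spread evenly over the other 34 active elements are illustrative simplifications, not results from the cited assay.

```python
# Figures quoted in the text for the cited retrotransposition assay.
TESTED = 82       # L1 copies cloned and tested
ACTIVE = 40       # copies that were active in the assay
HOT = 6           # "hot" L1s
HOT_SHARE = 0.84  # share of total activity carried by the hot L1s


def l1_activity_summary(tested=TESTED, active=ACTIVE, hot=HOT,
                        hot_share=HOT_SHARE):
    """Return simple ratios describing how concentrated L1 activity is."""
    other_active = active - hot
    mean_hot = hot_share / hot
    # Assumption: the remaining activity is shared evenly by the other
    # active elements (inactive elements contribute nothing).
    mean_other = (1 - hot_share) / other_active
    return {
        "fraction_active": active / tested,
        "fraction_hot": hot / tested,
        "mean_share_per_hot": mean_hot,
        "mean_share_per_other_active": mean_other,
        "hot_to_other_ratio": mean_hot / mean_other,
    }


if __name__ == "__main__":
    for key, value in l1_activity_summary().items():
        print(f"{key}: {value:.4f}")
```

Under these assumptions, roughly half of the tested elements are active, fewer than one in ten are hot, and an average hot L1 carries about 30 times the activity of an average non-hot active L1.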
As another example, there is extensive sequence similarity between centromere protein B (CENP-B), a protein associated with the centromeres of most human chromosomes, and the transposase encoded by the human Tigger DNA transposon. In addition, the terminal inverted repeats (TIRs) of the Tigger2 elements contain a near-perfect match to the CENP-B binding site (or CENP-B box), 102 which consists of a 17-bp alpha satDNA motif. 103 These structural similarities suggest two intriguing possibilities. First, CENP-B, akin to transposases, may also possess the ability to cause single-stranded breaks (SSBs). 102 Second, transposases could induce SSBs adjacent to the CENP-B box. There is strong evidence that both of these phenomena occur in higher organisms.104–106 It is also possible that larger DNA fragments could be excised by this mechanism.107–110 Thus, the generation of DSBs by transposons may play a role in the instability of the centromeric and pericentromeric regions of specific chromosomes, especially chromosomes 1 and 9, and result in the formation of dicentric chromosomes and anaphase bridges, and eventually in the formation and extrusion of micronuclei that harbor the satellites, mini satellites, and transposons enriched in the affected region.
An important question is whether the observations made in vitro are also reflective of the in vivo setting, that is, whether micronuclei are actually released into the bloodstream. As far as we know, there have been no attempts to isolate micronuclei directly from human blood. However, there is some indirect evidence that micronuclei, or at least the contents of micronuclei, are present in the extracellular environment: DNA fragments with a size of ∼2000 bp, such as those investigated in this study, and which are also present in the growth medium of several cultured cell lines,25,111 have been detected in human blood. However, such fragments are readily dismissed as mere cellular DNA contamination (refer to application note 112 ). In vitro studies nevertheless provide compelling evidence that this may be an erroneous assumption. Interestingly, it has recently been demonstrated that automated cfDNA isolation systems (e.g. KingFisher), which are the most commonly used in clinical assays, are tailored for short fragments and typically fail to isolate DNA fragments in the range of ∼2000 bp (refer to aforementioned application note 112 ). Since the majority of cfDNA research is clinically motivated (and focused on short cfDNA fragments), this may explain why the larger ∼2000 bp cfDNA population is not often encountered. Furthermore, a recent paper reports for the first time the presence of extrachromosomal circular DNA (whose cellular extrusion is facilitated by inclusion into DMs) in human blood. 113 This finding has been corroborated by a different research group, who demonstrated the presence of a heterogeneous population of extrachromosomal circular DNA, with a size range between 30 and 20,000 bp, in human blood. 114 Since these DNA fragments are larger than apoptotic DNA fragments, it is unlikely that this pool of cfDNA molecules has yet been investigated for clinical purposes. In this regard, the commonly held assumption that apoptosis is the main origin and most relevant fraction of cfDNA in human blood may be incorrect and restrictive, and should be reconsidered. It may be a major breakthrough in the field of clinical diagnostics if the DNA actively released by cultured cells bears any resemblance to the DNA originating from their in vivo counterparts. This warrants the implementation of comparative studies. Further investigation of the molecular characteristics of cfDNA, and inquiry into the mechanisms involved in its release, may provide deeper insight into the correlations observed between the properties of cfDNA and clinicopathological data, and may also expedite the search for different diagnostic, prognostic, and theranostic cfDNA biomarkers.
Conclusion
The cfDNA present in the culture medium of 143B cells after 24 h of incubation is neither a consequence of apoptosis nor necrosis.24,25 Herein, we have shown that this DNA consists mainly of satellites, mini satellites, and transposable elements (TEs) that are either currently active in the human genome or show the capacity to become reactivated. We have also shown that these cfDNA fragments originate mainly from chromosomes 1 and 9. We propose the following hypothesis for the origin of these cfDNA fragments: the centromeric and pericentromeric regions of chromosomes 1 and 9 are predisposed to increased hypomethylation. On the one hand, this can lead to the activation of transposons that are capable of inducing DSBs. On the other hand, structural changes within the centromeric and pericentromeric regions can directly impair the formation of proper kinetochores and subsequently result in chromosome mis-segregation. The type of instability caused by these changes serves as a precursor to the formation of various nuclear anomalies, including lagging whole or fragmented chromosomes and different types of anaphase bridges. While the exact sequence of events following the formation of these nuclear anomalies remains to be elucidated, numerous studies indicate that DNA fragments derived from these abnormal nuclear structures eventually recruit their own nuclear envelope, forming secondary nuclear structures such as micronuclei and DMs (which contain extra-chromosomally amplified DNA). These structures are often localized to the nuclear periphery, after which they form protrusions and can be eliminated from the cell. Therefore, we hypothesize that the cfDNA present in the growth medium of 143B cells originates from these secondary nuclear structures. A summary of this hypothesis is illustrated in Figure 6.

Figure 6. A provisional hypothesis for the origin of actively released cell-free DNA in cancer. In normal cells, the repetitive elements in the centromeric and pericentromeric regions of chromosomes 1 and 9 are densely methylated. ① In cancer cells, these regions are predisposed to increased hypomethylation. ② This can result in the derepression of transposons, followed by aberrant translocations and the induction of DSBs. Alternatively, hypomethylation can impair the formation of proper kinetochores and centromeres. ③ When a chromosome that has lost its telomere is replicated, the two ends of the sisters fuse, producing a dicentric chromosome. ④ During anaphase, the two centromeres are pulled to opposite poles, generating an anaphase bridge. Anaphase bridges can break in different regions, generating DNA fragments of various sizes and content. ⑤ These DNA fragments can recruit their own nuclear envelope, forming secondary nuclear structures known as micronuclei. ⑥ Micronuclei are then localized to the nuclear periphery, after which they form protrusions and can be eliminated from the cell. Once in the extracellular space, the DNA in these micronuclei can be referred to as cell-free DNA. Because the mechanism is complex, the generation of double minutes via anaphase breakage-fusion-bridge (BFB) cycles is not illustrated in this figure; it is shown in Supplementary Fig. 5.
If it turns out that we can in fact measure and distinguish between the different kinds of micronuclei present in the circulation of a given individual, this will give us a unique window through which we can non-invasively, and more effectively, monitor genome stability and peer into the inner workings of cancer (and other genotoxic diseases). While observations of chromosome 1 and 9 centromeric and pericentromeric instability, anaphase bridging, and micronucleus formation in osteosarcoma tumors and cell lines lend some credibility to our hypothesis, it currently lacks direct experimental evidence. The hypothesis can be tested directly by correlating the characteristics of the cfDNA population with the biology of cells that are investigated under various conditions, including (1) application of direct force to kinetochores using micromanipulation, (2) targeted chemical destabilization of kinetochore-microtubule attachments, (3) genetic knock-down of key mitotic machinery proteins and methylation enzymes, (4) chemical- or irradiation-induced DSBs, (5) deprivation of folate or similar metabolic precursors, and (6) prevention of normal methylation using base analogs (e.g. 5-azacytidine, 2,6-diaminopurine, idoxuridine).
The limitations of this study are the low number of replicates that were sequenced (n = 2) and the investigation of only one cell line. It will be of particular interest to compare the cfDNA sequence profiles of different cell lines, especially between healthy and malignant cell lines. One caveat of using cell cultures to elucidate the origin of cfDNA is that slight differences in culture conditions, such as the flask, medium composition, and temperature fluctuations, can result in significant changes in the biology of these cells (even if they are clones). Culturing a large panel of cell lines under identical settings may overcome this problem. 115 Nevertheless, the information revealed in this study should contribute to our understanding of the origin of cfDNA and the extent of its biological significance. It may also serve as an entry point for bridging in vitro and in vivo studies.
Supplemental Material
TUB_Supplementary_file_1 – Supplemental material for Sequence analysis of cell-free DNA derived from cultured human bone osteosarcoma (143B) cells
Supplemental material, TUB_Supplementary_file_1 for Sequence analysis of cell-free DNA derived from cultured human bone osteosarcoma (143B) cells by Abel Jacobus Bronkhorst, Johannes F Wentzel, Vida Ungerer, Dimetrie L Peters, Janine Aucamp, Etienne Pierre de Villiers, Stefan Holdenrieder and Piet J Pretorius in Tumor Biology
TUB_Supplementary_file_2 – Supplemental material for Sequence analysis of cell-free DNA derived from cultured human bone osteosarcoma (143B) cells
Supplemental material, TUB_Supplementary_file_2 for Sequence analysis of cell-free DNA derived from cultured human bone osteosarcoma (143B) cells by Abel Jacobus Bronkhorst, Johannes F Wentzel, Vida Ungerer, Dimetrie L Peters, Janine Aucamp, Etienne Pierre de Villiers, Stefan Holdenrieder and Piet J Pretorius in Tumor Biology
TUB_Supplementary_file_3 – Supplemental material for Sequence analysis of cell-free DNA derived from cultured human bone osteosarcoma (143B) cells
Supplemental material, TUB_Supplementary_file_3 for Sequence analysis of cell-free DNA derived from cultured human bone osteosarcoma (143B) cells by Abel Jacobus Bronkhorst, Johannes F Wentzel, Vida Ungerer, Dimetrie L Peters, Janine Aucamp, Etienne Pierre de Villiers, Stefan Holdenrieder and Piet J Pretorius in Tumor Biology
TUB_Supplementary_file_4 – Supplemental material for Sequence analysis of cell-free DNA derived from cultured human bone osteosarcoma (143B) cells
Supplemental material, TUB_Supplementary_file_4 for Sequence analysis of cell-free DNA derived from cultured human bone osteosarcoma (143B) cells by Abel Jacobus Bronkhorst, Johannes F Wentzel, Vida Ungerer, Dimetrie L Peters, Janine Aucamp, Etienne Pierre de Villiers, Stefan Holdenrieder and Piet J Pretorius in Tumor Biology
Footnotes
Acknowledgements
We would like to thank Dr E van Dyk (Thermo Fisher Scientific) for performing the sequencing of cell-free DNA.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: A.J.B. (Grant ID: SFH13092447078) and J.A. (Grant ID: SFH14061869958) were supported by post-graduate scholarships from the National Research Foundation (NRF), South Africa. The financial assistance of the NRF is hereby acknowledged. Opinions expressed and conclusions arrived at are those of the authors and are not to be attributed to the NRF. A.J.B. also thanks the African-German Network of Excellence in Science (AGNES) for granting a Mobility Grant in 2016; the Grant is generously sponsored by the German Federal Ministry of Education and Research and supported by the Alexander von Humboldt Foundation.
References
