<italic>In Silico</italic> Separation of <italic>in Vitro</italic> Transcription-Derived Duplicates From PCR Duplicates to Enhance Sequence Data Utilization

The polymerase chain reaction (PCR) amplification process of deoxyribonucleic acid (DNA) libraries can introduce bias in the sequence ratios. Consequently, several recent genomic and transcriptomic methods employing next-generation sequencing (NGS) utilize in vitro transcription (IVT) to amplify template polynucleotide chains. IVT amplifies nucleic acid sequences linearly, making it less susceptible to bias than the exponential amplification of PCR. Chromatin integration labeling sequencing (ChIL-seq), a tool for analyzing transcription factor binding and histone modifications, has incorporated IVT by replacing PCR in the DNA amplification step, enabling the analysis of small sample sizes, including single cells. In this study, we discovered that many of the excluded sequences known as PCR duplicates during the pre-processing step of ChIL-seq data analysis contain amplification products derived from IVT. Furthermore, we developed an in silico method to selectively eliminate PCR duplicates from NGS data while retaining IVT-derived amplification products. The method prevents excessive data reduction and significantly improves the utilization efficiency of NGS data.

Keywords

transcription, PCR duplicate, data pre-processing, amplification bias, chromatin

Introduction

Next-generation sequencing (NGS) enables large-scale omics analyses. In some experiments, such as chromatin immunoprecipitation (ChIP),¹ only a very small amount of deoxyribonucleic acid (DNA) can be recovered as a template for the NGS library. In such cases, amplification of nucleic acid chains in vitro is necessary for NGS analysis. Typically, polymerase chain reaction (PCR)² is employed to attach common sequences for sequencing reactions to NGS libraries and to amplify the DNA library. However, PCR amplification can introduce sequence-dependent bias during the exponential amplification step.³ Therefore, an in silico pre-processing step removing duplicated sequences (hereafter referred to as “PCR duplicates”)⁴ from the raw data is often introduced. Recent omics techniques using NGS employ in vitro transcription (IVT) in place of or addition to PCR as the amplification step.⁵ IVT is a linear amplification method for polynucleotide chains; therefore, it has been developed to experimentally avoid PCR-derived bias.⁶ Therefore, in principle, NGS libraries amplified by IVT do not require in silico removal of duplicates as a pre-processing step. In the chromatin integration labeling sequencing (ChIL-seq) technique, an alternative to ChIP-seq, the IVT method using the bacteriophage-derived T7 promoter is combined with a low-cycle PCR reaction when constructing NGS libraries.⁷ However, we found that DNA sequences derived from IVT amplification were eliminated along with PCR duplicates in the current data analysis pipeline for ChIL-seq.⁷ In this study, we present a pre-processing method that distinguishes and selectively retains IVT-derived duplicate sequences from PCR-derived duplicate sequences using paired-end NGS data to identify the duplicates to be removed. Finally, we confirmed that the method prevented excessive and unnecessary data reduction, and significantly improved the utilization efficiency of NGS data.

Methods

Isolation of mouse embryonic fibroblasts

Mouse embryonic fibroblasts (MEFs) were prepared from C57BL6 female mouse embryos at 13.5 days post-coitum and cultured in MEF medium as described previously.⁸

Retrovirus production

As described previously,⁸ mouse Hnf4α and Foxa2 cDNAs were obtained by reverse transcription PCR and subcloned into pGCDNsam (a gift from M. Onodera). To produce recombinant retroviruses, plasmid DNA was transfected into Platinum-E (Plat-E) cells using linear polyethylenimine (PEI; Polysciences, Warrington, PA) following the method described previously.⁹ Three days before transfection, Plat-E cells (1.6 × 10⁶) were plated on poly L-lysine-coated 10 cm dishes. For transfection, 36 μL of 1 mg/mL PEI and 12 μg of retroviral plasmid DNA were diluted in 1 mL of Dulbecco’s modified Eagle medium (DMEM) and incubated for 15 minutes at room temperature. The mixture was then added dropwise to plated Plat-E cells. After 6 hours of incubation at 37°C under 5% CO₂, the medium was replaced with fresh MEF medium, and the culture was continued. Supernatants from the transfected cells were collected at 24 hours after medium was replaced, filtered through 0.2 μm cellulose acetate filters (Sartorius), and concentrated by centrifugation at 10,000 × g for 16 hours at 4°C. Viral pellets were resuspended in Hanks’ balanced salt solution (Nissui, Tokyo, Japan) at 1/140 of the initial supernatant volume.

Transduction of cells and generation of induced hepatic progenitor cells

As described previously,^9,10 MEFs were grown on gelatin-coated 12-well plates until they reached 20%–30% confluency. The cells were then incubated in MEF medium containing the concentrated viral supernatants and 5 μg/mL protamine sulfate (Nacalai Tesque, Kyoto, Japan) for 6 hours. Viral infection was performed twice using 20 uL virus solutions per factor. Two days after the final infection, the cells were re-plated in 6-well plates coated with type I collagen (Iwaki, Tokyo, Japan) and cultured in a hepato-medium. The hepato-medium consisted of a 1:1 mixture of DMEM and Ham’s F-12 medium (Nacalai Tesque), supplemented with 10% fetal bovine serum, 1 μg/mL insulin (Wako, Tokyo, Japan), 10⁻⁷ M dexamethasone (Sigma-Aldrich, St. Louis, MO), 10 mM nicotinamide (Sigma-Aldrich), 2 mM L-glutamine, 50 mM β-mercaptoethanol (Nacalai Tesque), penicillin/streptomycin 20 ng/mL hepatocyte growth factor (PeproTech, Cranbury, NJ), and 20 ng/mL epidermal growth factor (Sigma-Aldrich). After reaching confluence, cells were passaged at a density of 1 × 10⁵ cells/passage.

ChIL-seq

ChIL-seq was performed after confirming the appearance of induced hepatic progenitor cells (iHepCs), which were defined as albumin/E-cadherin double-positive cells. A total of 1 × 10⁴ iHepCs were re-plated in 96–well plates coated with type I collagen (Iwaki) and cultured in a hepato-medium. After 16 hours, cell fixation and subsequent ChIL-seq were performed as described previously.⁷

Quality check and detection of duplicates

Raw reads were trimmed and duplicates were removed using fastp (v0.23.4) with the parameters -D –dup_calc_accuracy 6 (related to Figure 1B). The ratio of unique reads was calculated as the proportion of raw reads retained before and after duplicate removal, using fastp (related to Figure 3). The reads were counted using the fqtools count program (v2.1).

Figure 1. Comparison of duplication rates between IVT-based and non-IVT-based methods: (A) Schematic representation of ChIL-seq library preparation. (B) Duplication rates of IVT-based methods (ChIL-seq and TIP-seq) and non-IVT-based methods (ChIP-seq and CUT&RUN). The effect size of the difference in the duplication rates between Read1 and Read2 was evaluated using Cohen’s h. Schematic of ChIL-seq library prep, Comparison of duplication rates for IVT-based methods vs. non-IVT for ChIP-seq, ChIL-seq, CUT&RUN, TIP-seq. Cohen's h indicates effect size of duplication rate differences.

Mapping to the reference genome

Duplicate-removed reads were mapped to the reference genome (GRCm38/mm10 for mice and GRCh38/hg38 for humans) using HISAT2 (v2.0.5) with the parameters -k 1 –no-spliced-alignment. After mapping, reads aligned to the mitochondrial genome (chrM) were excluded from subsequent analyses using grep with the parameter -v chrM. SAM files with the mitochondrial genome removed were converted to BAM using SAMtools (v1.9) with the -F 4 option to remove unmapped reads. The BAM files were sorted and indexed. The strand-specific BAM files were filtered using the parameter f = 16.

Peak call

Peak calling was performed using Read1 processed in the single-end mode. The transcription factors (TFs) peaks were identified using MACS (v2.2.9.1; https://github.com/taoliu/MACS/) with the options: -g mm/hs -q 0.01 –nomodel –shift 0 –extsize 200 –keep-dup all (related to Figure 2), or -g mm -q 0.000000001 –nomodel –shift 0 –extsize 200 –keep-dup all (related to Figure 4).

Figure 2. Distribution of Read1 and Read2 data. Distribution of Read1 and Read2 in IVT-based (ChIL-seq and TIP-seq) and non-IVT-based methods (ChIP-seq and CUT&RUN). Genome strand-specific Read1 and Read2 coverage around ± 0.5 kb of TF peaks is plotted. The Read1 and Read2 data were used after removing duplicates, respectively. The correlation between the two aggregation plots was calculated using Spearman’s rank correlation coefficient (ρ). Distribution of Read1 and Read2 data.Roles of ρ of Read1 and Read2 in non-IVT and IVT-based methods.

Signal quantification

Signal coverage was calculated using deepTools (v3.5.4) in the bamCoverage mode with the following parameters: –normalizeUsing CPM –smoothLength 100 –minMappingQuality 60 –binSize 10.

Signal aggregation around peaks

Aggregation plots were generated using the deepTools (v3.5.4) reference-point mode with the following parameters: –referencePoint center –upstream 500 –downstream 500 –skipZeros –missingDataAsZero. Strand-specific signals around the TF peaks were plotted using Read1 processed in single-end mode (related to Figure 2).

De novo motif detection and comparison with known motif datasets

The summits data obtained from MACS were extended by 1 kb on both sides using the bedtools (v2.31.1) and categorized based on their overlaps between the datasets. Motif analysis was performed for each category using the findMotifsGenome.pl program in the HOMER package (v4.11) with the following parameters: -size 500 -mask -p 16 S 50.

Data and code availability

The deep-sequencing data generated in this study, ie, ChIL-seq data, were deposited in the Gene Expression Omnibus (GEO) under the accession code GSE282869. Previously published datasets reanalyzed in this study, including ChIP-seq, CUT&RUN, and targeted insertion of promoter sequencing (TIP-seq)¹¹ data, are available under the following accession codes: GSE213460 (ChIP-seq, mouse liver, Hnf4α), GSE84474 (CUT&RUN, K562 cells, CTCF), and GSE188512 (TIP-seq, HCT116 cells, CTCF). The shell scripts used for the analyses in this study are available from the following GitHub repository: https://github.com/kenichihorisawa/pipeline_for_IVT_duplicates.

Result

In ChIL-seq, Tn5 transposase inserts the T7 promoter sequence and Read1 adaptor sequence, which enables the use of NGS primer,¹² into the genomic DNA around the target proteins. T7 RNA polymerase, which is added to the sample, binds to the T7 promoter sequence and initiates the IVT reaction by transcribing RNAs including the Read1 adaptor sequence and genomic sequences around the target proteins. The IVT reaction products are then reverse transcribed into single-stranded DNAs using reverse transcriptase and a 9-mer random primer containing the Read2 adaptor sequence. Subsequently, the DNAs were added index sequences and converted into double-stranded DNAs through a low-cycle PCR reaction. Finally, the resulting NGS library is analyzed using NGS instruments (Figure 1A). The developers of ChIL-seq used Read1 sequences as single-end analysis data because the Read1 sequences were closer to the target sites than the Read2 sequences. Thus, they did not read or discard Read2 sequence data. In the series of library preparations, the Read1 sequence was considered to be dependent on the position of the transcription start site (TSS) of the IVT reaction, whereas the Read2 sequence was considered to be dependent on the termination point of the IVT reaction and/or the position where the random primer anneals (Figure 1A).¹³ Therefore, it can be hypothesized that Read1 sequences have a high degree of overlap, whereas Read2 sequences have a high degree of diversity and a low degree of overlap. To test this model, we examined the proportion of overlap in the Read1 and Read2 sequences for IVT-based methods (ChIL-seq and TIP-seq) and non-IVT-based methods (ChIP-seq and CUT&RUN¹⁴). Non-IVT-based methods showed very little difference in the duplication rates between Read1 and Read2 (Cohen’s h = 0.0052–0.0117). However, as expected, the IVT-based methods revealed a higher degree of overlapping sequences in Read1 compared than in Read2 (Cohen’s h = 0.3012–0.5767) (Figures 1B and S1). These results suggest that, in addition to PCR duplicates, a significant number of duplicate sequences derived from IVT amplification were present in the library, as confirmed by Read2. According to the principles of the IVT, this phenomenon is likely to be common in IVT-based methods. We define “IVT duplicates” as duplicates where only Read1 matches and Read2 does not.

During the IVT reaction, the T7 RNA polymerase binds to the T7 promoter and then transcribes the genome-integrated adaptor and genomic sequences in the 5′ to 3′ direction. Therefore, if Read1 sequences are prone to duplication, as hypothesized, and Read2 sequences exhibit greater diversity and are less susceptible to duplication, we would expect Read2 to be widely distributed and detected along the direction of IVT progression. To examine this, a DNA strand-specific aggregation plot was created around the ChIL-seq peak detected in the Read1 data, and the distributions of Read1 and Read2 were analyzed (Figures 2 and S2). The results showed that in the non-IVT-based method, the Read1 and Read2 sequences were distributed at almost the same position [Spearman’s rank correlation coefficient (ρ) = 0.9910–0.9992], whereas in the IVT-based method, the Read2 sequences were broadly distributed along the direction of IVT progression, which consequently resulted in a relatively low correlation between the two plots, as expected (ρ = 0.3508–0.8880). This result supported our hypothesis that duplicate IVT sequences were present in the resulting NGS data.

In the pre-processing of the analysis, PCR duplicates are often removed as they can lead to incorrect interpretations, also in the conventional ChIL-seq data pipeline, duplicates were removed from single-end data (Read1). However, the algorithm for removing duplicate sequences cannot distinguish IVT duplicates from PCR duplicates, which may lead to the excessive removal of sequences from raw data, including IVT-derived production. To resolve this issue, we acquired paired-end ChIL-seq data and used both sets of paired-end data (Read1 and Read2) to distinguish IVT and PCR duplicates. Consequently, a significantly larger number of sequences remained than when only Read1 data were used to remove PCR duplicates (Figures 3 and S3). Therefore, we concluded that we could selectively eliminate only PCR duplicates without losing IVT duplicates by using paired-end data and paired-end data-based pre-processing for PCR duplicate removal.

Figure 3. Removal of PCR duplicates in paired-end analysis significantly increases the number of unique reads. Ratio of unique reads in the IVT-based (ChIL-seq and TIP-seq) and non-IVT-based methods (ChIP-seq and CUT&RUN). “Single” refers to data in which duplicates were removed using only Read1. “Pair” refers to data in which duplicates were removed using both Read1 and Read2. The effect size of the difference in unique read counts between Single and Pair deduplication methods was evaluated using Cohen’s h. Comparison of unique read counts in non-IVT-based and IVT-based methods: ChIP-seq, ChIL-seq, CUT&RUN, and TIP-seq, based on Single or Pair deduplication. The h values indicate the effect size.

Furthermore, we examined whether differences in duplicate removal methods lead to qualitative differences in analysis outcomes. Using the same raw ChIL-seq data for Hnf4α, we applied two duplicate removal strategies: one based on Read1 alone and another using both Read1 and Read2, as described earlier. After the removal of the duplicates, only the retained Read1 data were then mapped to the genome, followed by peak calling. Read1 data were mapped into the reference genome sequence. As a result, duplicate removal using both Read1 and Read2 identified more peaks than the method using Read1 alone. Moreover, nearly all peaks detected with Read1-only deduplication were included in those obtained using both Read1 and Read2 (Figure 4). In addition, de novo motif analysis of sequences within the overlapping peaks and those uniquely detected by the method using both Read1 and Read2 revealed significant enrichment of the Hnf4α binding motif [HNF4a(NR), DR1/HepG2-HNF4a-ChIP-Seq(GSE25021)/Homer] in both cases. These findings suggest that duplicate removal using both Read1 and Read2 enables more comprehensive biological data extraction from the same dataset compared to the conventional pre-processing method (Figure 4).

Figure 4. Results of peak calling and motif analysis from pre-processed datasets with two different methods. The workflow for mapping and peak calling using two pre-processed datasets (Single and Pair) derived from the same raw data is presented. It also shows the number of detected peaks and their overlap. The detected motifs in the datasets were compared with the known Hnf4α binding motif. The p-value is from de novo motif search, and the similarity score is from motif comparison analysis with known motif datasets. Cite this image in a science article

Discussion

In the present study, we found that the NGS data obtained from IVT-based methods contained a large number of DNA fragments that completely match the Read1 data but not the Read2 data (IVT duplicates). Based on Cohen’s h, the IVT-based methods demonstrated effect sizes around the medium range (h = 0.5) in the difference in duplicate rates between Read1 and Read2, whereas the non-IVT-based methods exhibited effect sizes well below the threshold for a small effect (h = 0.2). Furthermore, we found that only PCR duplicates could be selectively eliminated using paired-end data and pre-processing pipelines. This improvement can be achieved with previously unused Read2 sequencing data and making only minor changes in the in silico pipeline, and without requiring any changes to the experimental protocol. As a result, this method prevents the unintended removal of sequences and significantly increases the number of unique reads available for analysis. Upon evaluation of the proportion of remaining unique reads after deduplication, the two types of approaches produced widely different effect sizes. This indicates an improved data usage efficiency. However, after duplicate removal in paired-end mode, single-ended analyses using Read1 may preferentially focus on regions near the target protein’s binding sites. IVT duplicates show different characteristics in Read1 and Read2 data. In Read1, the TSSs of the IVT reaction (5′ end of the IVT duplicates) are constrained by the T7 promoters which are integrated into the genome during the experimental procedure, whereas the termination of the IVT reaction is stochastic (3′ end of the IVT duplicates). In addition, it is assumed that there is diversity in the annealing positions of the random primers used for reverse transcription. A combination of these factors may be responsible for the diversity of IVT duplicates in Read2. The analysis reveals that the aggregation plots for Read1 and Read2 are nearly identical in non-IVT-based methods. Conversely, in IVT-based methods, the distributions show a lesser degree of overlap, even though they maintain a positive correlation. This observation is consistent with the above discussion.

During nucleic acid chain amplification, IVT amplifies nucleic acid chains linearly, unlike PCR, which amplifies DNA fragments exponentially.^2,4,15 As a result, the artificial bias due to sequence differences is theoretically very low in IVT. Therefore, distinguishing IVT duplicates from PCR duplicates and selectively removing PCR duplicates should retain more unique sequence reads, thereby improving data usage efficiency. Indeed, ChIL-seq data processed while retaining IVT duplicates showed a significantly higher number of peaks compared to data processed with the conventional pre-processing protocol using the same raw data. Furthermore, the additionally detected peaks exhibited a strong concentration of Hnf4α-specific binding sequences, similar to the intersected peaks between the datasets from the two pre-processing methods. These findings suggest that the conventional pre-processing method using single-end data has excessively and unnecessarily reduced the data, and the deduplication method with paired-end data may provide deeper biological insights than the conventional method.

Although many IVT-based NGS analysis techniques have been developed, a standardized method for data processing has yet to be established. For example, some studies analyzed only Read1 data,^7,11,16,17 whereas others utilized both Read1 and Read2 data.¹⁸ Understanding the duplication bias present in the data and performing appropriate pre-processing are essential. However, the characteristics and effects of IVT duplicates on data analysis have not been fully elucidated; therefore, IVT duplicates may introduce unexpected biases or misinterpretations. Further validation and evaluation of the characteristics of IVT duplicates are required. If IVT duplicates are determined to be artifacts that should be removed, the appropriate use of IVT during pre-analytical processes and NGS library preparation should be reconsidered. These efforts are expected to improve the quality and productivity of NGS analysis by avoiding the erroneous removal of valid duplicates.

Unique molecular identifiers (UMIs) represent a modern approach to removing duplicates from NGS data. Multiplexed indexed unique molecule T7 amplification end-to-end sequencing (MINUTE-ChIP), an enhanced ChIP-seq protocol using both IVT and UMI technologies,^19,20 exemplifies this by reducing the excessive filtering of reads compared to in silico deduplication via Picard—a process technically identical to our proposed paired-end deduplication.¹⁹ This finding suggests the potential benefits of integrating UMIs into IVT-based protocols such as ChIL-seq and TIP-seq. Nevertheless, such an approach requires modifications to the library construction protocol. For this reason, and due to significant advantages like its compatibility with existing public data, we argue that our proposed analysis pipeline is also a valuable tool.

Conclusion

Our study presents an in silico method to distinguish and selectively retain IVT-derived duplicates while eliminating only PCR duplicates in NGS data generated by IVT-based methods. By leveraging paired-end sequencing, the proposed approach prevents unnecessary data loss, significantly increasing the number of usable reads without altering the experimental protocol. Our findings demonstrate that conventional single-end deduplication pipelines may overlook valuable IVT-derived reads, thereby underestimating true biological information. The enhanced data utilization achieved by this method enables more comprehensive analysis from the same dataset, offering deeper biological insights and improving the overall quality of IVT-based NGS investigations.

Supplemental Material

sj-docx-1-bbi-10.1177_11779322251365042 – Supplemental material for In Silico Separation of in Vitro Transcription-Derived Duplicates From PCR Duplicates to Enhance Sequence Data Utilization Supplemental material, sj-docx-1-bbi-10.1177_11779322251365042 for In Silico Separation of in Vitro Transcription-Derived Duplicates From PCR Duplicates to Enhance Sequence Data Utilization by Ryoga Suzuki, Kenichi Horisawa, Kazumitsu Maehara, Yasuyuki Ohkawa and Atsushi Suzuki in Bioinformatics and Biology Insights

Supplemental Material

sj-pdf-2-bbi-10.1177_11779322251365042 – Supplemental material for In Silico Separation of in Vitro Transcription-Derived Duplicates From PCR Duplicates to Enhance Sequence Data Utilization Supplemental material, sj-pdf-2-bbi-10.1177_11779322251365042 for In Silico Separation of in Vitro Transcription-Derived Duplicates From PCR Duplicates to Enhance Sequence Data Utilization by Ryoga Suzuki, Kenichi Horisawa, Kazumitsu Maehara, Yasuyuki Ohkawa and Atsushi Suzuki in Bioinformatics and Biology Insights

We thank Mariko Tasai, Yoshimi Iwasaki, Mitsuhiro Kurata, Yuuki Honda, Kanako Ichikawa, Ryo Ugawa, and Emiko Koba for excellent technical assistance. This study was performed on the infrastructure of Omics Science Center Secure Information Analysis System, Medical Institute of Bioregulation at Kyushu University.

References

1 Solomon MJ, Larsen PL, Varshavsky A., Mapping proteinDNA interactions in vivo with formaldehyde: evidence that histone H4 is retained on a highly transcribed gene. Cell. 1988;53:937-947. doi:10.1016/S0092-8674(88)90469-2
2 Mullis K, Faloona F, Scharf S, Saiki R, Horn G, Erlich H., Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harb Symp Quant Biol. 1986;51:263-273. doi:10.1101/SQB.1986.051.01.032
3 Smith EN, Jepsen K, Khosroheidari M, , et al. Biased estimates of clonal evolution and subclonal heterogeneity can arise from PCR duplicates in deep sequencing experiments. Genome Biol. 2014;15:420. doi:10.1186/s13059-014-0420-4
4 Rochette NC, Rivera-Colón AG, Walsh J, Sanger TJ, Campbell-Staton SC, Catchen JM., On the causes, consequences, and avoidance of PCR duplicates: towards a theory of library complexity. Mol Ecol Resour. 2023;23:1299-1318. doi:10.1111/1755-0998.13800
5 Milligan JF, Uhlenbeck OC, . [5] Synthesis of small RNAs using T7 RNA polymerase. In: Dahlberg JE, Abelson JY, , eds. Methods in Enzymology. Vol 180: RNA Processing Part A: General Methods. Academic Press; 1989:51-62. doi:10.1016/0076-6879(89)80091-6
6 Hoeijmakers WAM, Bártfai R, Françoijs KJ, Stunnenberg HG, . Linear amplification for deep sequencing. Nat Protoc. 2011;6:1026-1036. doi:10.1038/nprot.2011.345
7 Harada A, Maehara K, Handa T, , et al. A chromatin integration labelling method enables epigenomic profiling with lower input. Nat Cell Biol. 2019;21:287-296. doi:10.1038/s41556-018-0248-3
8 Sekiya S, Suzuki A., Direct conversion of mouse fibroblasts to hepatocyte-like cells by defined factors. Nature. 2011;475:390-393. doi:10.1038/nature10263
9 Miura S, Suzuki A., Generation of mouse and human organoid-forming intestinal progenitor cells by direct lineage reprogramming. Cell Stem Cell. 2017;21:456-471.e5. doi:10.1016/j.stem.2017.08.020
10 Horisawa K, Udono M, Ueno K, , et al. The dynamics of transcriptional activation by hepatic reprogramming factors. Mol Cell. 2020;79:660-676.e8. doi:10.1016/j.molcel.2020.07.012
11 Bartlett DA, Dileep V, Handa T, , et al. High-throughput single-cell epigenomic profiling by targeted insertion of promoters (TIP-seq). J Cell Biol. 2021;220:e202103078. doi:10.1083/jcb.202103078
12 Launen AL, . Illumina sequencing (for Dummies) -An overview on how our samples are sequenced. kscbioinformatics. Published February 13, 2017. Accessed July 11, 2024. https://kscbioinformatics.wordpress.com/2017/02/13/illumina-sequencing-for-dummies-samples-are-sequenced/
13 Stangegaard M, Høgh Dufva I, Dufva M., Reverse transcription using random pentadecamer primers increases yield and quality of resulting cDNA. Biotechniques. 2006;40:649-657. doi:10.2144/000112153
14 Skene PJ, Henikoff S., An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. Elife. 2017;6:e21856. doi:10.7554/eLife.21856
15 Kebschull JM, Zador AM., Sources of PCR-induced distortions in high-throughput sequencing data sets. Nucleic Acids Res. 2015;43:e143. doi:10.1093/nar/gkv717
16 Zhao L, Xing P, Polavarapu VK, , et al. FACT-seq: profiling histone modifications in formalin-fixed paraffin-embedded samples with low cell numbers. Nucleic Acids Res. 2021;49:e125. doi:10.1093/nar/gkab813
17 van Galen P, Viny AD, Ram O, , et al. A multiplexed system for quantitative comparisons of chromatin landscapes. Mol Cell. 2016;61:170-180. doi:10.1016/j.molcel.2015.11.003
18 Chen C, Xing D, Tan L, , et al. Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI). Science. 2017;356:189-194. doi:10.1126/science.aak9787
19 Kumar B, Elsässer SJ., Quantitative multiplexed ChIP reveals global alterations that shape promoter bivalency in ground state embryonic stem cells. Cell Rep. 2019;28:3274-3284.e5. doi:10.1016/j.celrep.2019.08.046
20 Kumar B, Navarro C, Yung PYK, , et al. Multiplexed chromatin immunoprecipitation sequencing for quantitative study of histone modifications and chromatin factors. Nat Protoc. 2025;20:779-809. doi:10.1038/s41596-024-01058-z

Bioinformatics and Biology Insights

In Silico Separation of in Vitro Transcription-Derived Duplicates From PCR Duplicates to Enhance Sequence Data Utilization

Article Information

Abstract

Keywords

Introduction

Methods

Result

Discussion

Conclusion

Supplemental Material

Supplemental Material

References

In Silico Separation of in Vitro Transcription-Derived Duplicates From PCR Duplicates to Enhance Sequence Data Utilization

Sign in to my Account

Login with

Register for an Account

Bioinformatics and Biology Insights

In Silico Separation of in Vitro Transcription-Derived Duplicates From PCR Duplicates to Enhance Sequence Data Utilization

Article Information

Abstract

Keywords

Introduction

Methods

Result

Discussion

Conclusion

Supplemental Material

Supplemental Material

References

In Silico Separation of in Vitro Transcription-Derived Duplicates From PCR Duplicates to Enhance Sequence Data Utilization