Comparative RNA-Seq Analysis of Colon Spheroids and Patient-derived Tissues Identifies Non-Canonical Transcript Isoforms of Protein-Coding Genes Implicated in Colon Carcinogenesis

Abstract

Objectives:

This study aimed to identify transcript isoforms of protein-coding genes with potential relevance to the malignant transformation of gut mucosa.

Methods:

Colon cancer cell lines (HCT116, DLD1, SW620) and immortalized cells derived from healthy gut epithelium (HCEC-1CT) were cultured as spheroids and subjected to RNA sequencing to profile both canonical and non-canonical transcripts. The resulting data were compared with the findings of a prior bioinformatics study that analyzed RNA-seq datasets from 473 patient-derived tumor and 417 non-tumor colon tissue samples.

Results:

Among 375 transcripts previously reported as significantly dysregulated in colon cancer (39 up-regulated and 336 down-regulated), 32 transcripts displayed expression patterns in colon cell lines consistent with those observed in patient tissues (4 up-regulated and 28 down-regulated). In silico characterization of these molecules revealed that all of them exhibited at least 1 feature commonly associated with RNAs possessing regulatory functions, such as encoding a truncated protein isoform, exosomal localization, or enrichment in repetitive elements. The most prominently dysregulated transcripts with consistent expression profiles across both datasets were NTMT1-204 (up-regulated in cancer) and BLOC1S6-218 and DCTN1-205 (both down-regulated in cancer). The remaining 343 transcripts did not show consistent expression patterns in the cell lines, suggesting that their dysregulation in patient-derived tissues may be due to stromal or microenvironmental factors absent in vitro.

Conclusion:

In summary, this comparative transcriptomic analysis identified 32 transcript isoforms, comprising 2 canonical and 30 non-canonical transcripts, that may play regulatory roles in colon carcinogenesis and warrant further investigation in the context of gut epithelial cell biology.

Keywords

transcript isoforms, regulatory RNAs, colon cancer, spheroids, transcriptomics

Introduction

Colorectal cancer (CRC) is the third most commonly diagnosed malignancy worldwide and remains a leading cause of cancer-related mortality.¹ Key molecular hallmarks of CRC include aberrant activation of the WNT/β-catenin pathway, mutations in KRAS and TP53, and widespread dysregulation of gene expression and alternative splicing.² Tumors primarily originate from specific genetic lesions, yet their progression and phenotypic diversity are shaped by a complex interplay of molecular alterations, many of which are reflected in gene expression profiles. A global disruption of transcriptional regulation is now recognized as a hallmark of cancer, reinforcing the concept that malignancy represents a disease of aberrant gene expression.³ Recent pan-cancer studies have shown that changes in promoter activity, rather than overall gene expression, may more accurately capture the transcriptomic reprograming of malignant cells.⁴ Most mammalian protein-coding genes are transcribed from multiple promoters, giving rise to a spectrum of transcript isoforms.⁵ Although 1 dominant transcript typically prevails at each protein-coding locus, many alternative isoforms contribute to transcriptomic diversity without necessarily expanding proteomic output.⁶ Importantly, shifts in transcript isoform usage, independent of total gene expression, have been associated with distinct tumor phenotypes, disease progression, and prognosis across multiple cancer types.^7-9 While promoter regulation has traditionally been studied through the lens of DNA methylation, emerging evidence highlights additional mechanisms, including G-quadruplex motifs and signatures of accelerated somatic evolution, that influence promoter selection and transcription initiation.^10,11 The transcriptional patterns generated by different promoters and their functional relevance are poorly explored, but it is clear that the relationship between transcript isoform usage and gene expression is far from straightforward.¹²

The non-coding transcriptome has gained prominence as a key regulatory layer in both normal physiology and disease states, including cancer.¹³ The generation of multiple transcript isoforms from a single gene adds regulatory and functional complexity to the genome, with implications for development, stress response, and pathological transformation. Isoforms produced by alternative promoter usage, splicing, and polyadenylation in protein-coding genes are of particular interest, as they may be utilized as biomarkers for cancer detection and prognosis, or as therapeutic targets, especially once their oncogenic or tumor-suppressive properties are validated. These molecular mechanisms not only generate multiple protein-coding isoforms but also contribute to the expression of long non-coding RNAs (lncRNAs) and dual-function coding/non-coding RNAs (cncRNAs). These RNA isoforms, characterized by rapid evolutionary divergence and cell type specificity, regulate diverse biological processes including differentiation, development, and tissue homeostasis.^5,14 Their potential regulatory roles are supported by characteristics including translation into truncated proteins, predicted exosomal localization, and enrichment in repetitive elements.^15-18 According to Ensembl annotation, each protein-coding gene is represented by 1 canonical transcript isoform, designated as the Ensembl canonical or MANE (Matched Annotation from NCBI and EMBL-EBI) Select transcript, which corresponds to the principal RNA species most consistently expressed across tissues and best supported by experimental evidence. All other isoforms are considered non-canonical transcripts, differing from the canonical form in their primary sequence due to alternative promoter usage, splicing, or polyadenylation. Based on coding potential, Ensembl classifies transcripts as coding if they give rise to a translated protein product, and non-coding if no translation occurs. Proteins translated from alternative transcript isoforms may be full-length or truncated relative to the canonical protein.¹⁹

Despite advances in RNA sequencing technologies, most transcriptomic analyses have focused on gene-level expression, often overlooking isoform-specific regulation. Consequently, a gap remains in our understanding of how transcript-level dynamics contribute to cancer biology. To address this gap, the present study compared transcriptomic profiles from spheroid-cultured colon cancer cell lines and patient-derived tumor tissues. The primary aim was to identify transcript isoforms derived from protein-coding genes that may have a role in the malignant transformation of gut epithelium and in the initiation and progression of colon cancer.

Methods

The reporting of this study conforms to the STROBE statement (Supplemental File 1).²⁰

Cell Lines

Several human cell lines originating from colon tissue were used in this study: the immortalized epithelial cell line HCEC-1CT (Evercyte, CkHT-039-0229) and malignant cell lines representing different stages of colon cancer: HCT116 (ATCC, CCL-247; early-stage primary tumor), DLD-1 (ATCC, CCL-221; late-stage primary tumor), and SW620 (ATCC, CCL-227; model of metastatic colon cancer). All cell lines were maintained at 37°C and 5% CO₂ in Dulbecco’s Modified Eagle’s Medium (DMEM; Capricorn Scientific, Germany) supplemented with 10% fetal bovine serum (FBS; Capricorn Scientific, Germany) and 1% antibiotic/antimycotic solution (Capricorn Scientific, Germany). Cell morphology was routinely monitored by microscopy, and mycoplasma contamination was evaluated by polymerase chain reaction (PCR). Additionally, cell lines were cultured as three-dimensional (3D) spheroids. For spheroid generation, adherent cells were detached using 1× trypsin/EDTA (Capricorn Scientific, Germany) and counted with a standard hemocytometer. Approximately 2 × 10⁵ cells per well were seeded into 24-well Nunclon™ Sphera™ plates (Thermo Fisher Scientific, USA), which feature a low-attachment surface, in 1 mL of DMEM supplemented with 10% FBS and 1% antibiotic/antimycotic solution. Spheroids were maintained for 7 days in a humidified incubator at 37°C with 5% CO₂. Compact, debris-free spheroids were collected under a microscope for subsequent total RNA extraction.

RNA Extraction and Sequencing

Total RNA was isolated from spheroids cultured in two 24-well plates using the PureLink™ RNA Mini Kit (Thermo Fisher Scientific, USA) according to the manufacturer’s protocol. RNA concentration and purity were assessed by absorbance at 260 and 280 nm using a BioSpec-nano spectrophotometer (Shimadzu, Japan).

High-throughput next-generation RNA sequencing was performed by Novogene (UK) Company Limited (Cambridge, United Kingdom). Prior to library preparation, total RNA was subjected to quality control (QC) using a combination of 1% agarose gel electrophoresis (for RNA integrity), NanoDrop spectrophotometry (for RNA quantity and purity), and the Agilent 2100 Bioanalyzer (for RNA Integrity Number, RIN).

Ribosomal RNA was depleted during library preparation to enrich for both coding and non-coding transcripts. Sequencing was performed on the Illumina NovaSeq 6000 platform, generating 150 bp paired-end reads.

Bioinformatics and In Silico Analyses

RNA-seq data were processed using Novogene’s validated analysis pipeline, which included read quality filtering, alignment to the human reference genome, and transcript quantification. Final gene expression levels were reported as FPKM (Fragments Per Kilobase of transcript per Million mapped reads). Transcripts with FPKM values below 0.3 were considered not expressed.²¹ Post-NGS analysis was performed by an in-house Python script, and file preparation was conducted by filtration in Excel using built-in functions. To compare our RNA-seq results from colon spheroids with patient-derived colon tissue samples, we analyzed the published data from a study by Demircioğlu et al.⁴
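The FPKM definition and the 0.3 expression cutoff described above can be illustrated with a short Python sketch. It is not the in-house script or the Novogene pipeline, neither of which is described in detail here; the function names and the example values are hypothetical.

```python
EXPRESSION_THRESHOLD = 0.3  # FPKM cutoff stated in the Methods


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def expressed_transcripts(fpkm_by_transcript, threshold=EXPRESSION_THRESHOLD):
    """Return IDs of transcripts that are NOT below the cutoff (FPKM >= threshold)."""
    return {tid for tid, value in fpkm_by_transcript.items() if value >= threshold}


# Illustrative values only: 500 fragments on a 2 kb transcript, 25 million mapped fragments.
example_fpkm = fpkm(500, 2000, 25_000_000)
detected = expressed_transcripts({"tx_a": 0.29, "tx_b": 0.3, "tx_c": 12.0})
```

Counting the members of `detected` for each cell line would give per-line transcript totals of the kind reported in the Results, assuming the same cutoff is applied to every sample.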

Transcript expression levels in colon samples were retrieved from the UCSC Xena Functional Genomics Explorer using the TCGA TARGET GTEx dataset (Toil recompute; dataset ID: TcgaTargetGtex_rsem_transcript_fpkm), while colon cancer stage data were retrieved from the TCGA Pan-Cancer (PANCAN) dataset (TcgaPancancer_rsem_transcript_fpkm).²² Transcript expression levels were quantified in FPKM units, and statistical comparisons were performed between TCGA primary tumor (288 samples), TCGA adjacent non-tumor tissue (41 samples), and GTEx normal tissue (308 samples) using the Kruskal-Wallis test followed by Dunn’s multiple comparisons test. For colon cancer stages, statistical comparison was performed between stage I (44 samples), stage II (109 samples), stage III (80 samples), and stage IV (40 samples) using the Kruskal-Wallis test followed by Dunn’s multiple comparisons test. Differences were considered statistically significant if P < .05.
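For readers unfamiliar with the omnibus test named above, the following Python sketch computes the Kruskal-Wallis H statistic with tie correction and a chi-square approximation for the P value, using only the standard library. The software actually used for the study's statistics is not stated, so this is an assumption-laden illustration; the function names are hypothetical, and Dunn's post hoc test is not shown.

```python
import math
from collections import Counter


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed-form recurrence)."""
    if x <= 0:
        return 1.0
    half = x / 2
    q, d = (math.exp(-half), 2) if df % 2 == 0 else (math.erfc(math.sqrt(half)), 1)
    while d < df:
        q += half ** (d / 2) * math.exp(-half) / math.gamma(d / 2 + 1)
        d += 2
    return q


def kruskal_wallis(*groups):
    """Return (H, P) for two or more groups of expression values."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

With three groups (tumor, adjacent non-tumor, GTEx normal) the test has 2 degrees of freedom, and with four stages it has 3. In routine work an established implementation such as `scipy.stats.kruskal` would normally be preferred over a hand-written version.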

To further characterize dysregulated candidate transcripts and predict their potential roles in malignant transformation, several in silico tools were utilized. The predicted 3D structures of the truncated and full-length protein isoforms translated from candidate transcripts were analyzed using UCSF ChimeraX (version 1.10.1).²³ Protein Data Bank (PDB) files were retrieved from AlphaFold for every canonical/non-canonical isoform pair, the structures were aligned, and root mean square deviation (RMSD) values were computed.²⁴ A Sashimi plot of the Binary Alignment Map (BAM) files was generated using the Integrative Genomics Viewer (IGV). The presence of repeat elements was analyzed with the AnnoLnc2 and Censor tools, while subcellular localization was predicted with the lncLocator and iLoc-LncRNA tools.^25-28 Internal Ribosome Entry Sites (IRES) within the 5′ untranslated regions (5′UTRs) were predicted using IRESite.²⁹ RNA secondary structure stability was estimated using RNAfold by calculating the Minimum Free Energy (MFE).³⁰
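The RMSD values referred to above follow the standard definition: the square root of the mean squared distance between paired atoms. A minimal Python sketch is given below, assuming the two coordinate lists are already superposed and paired residue by residue (in the study, ChimeraX handled the alignment and fitting); the function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as the input) between two equally long lists of (x, y, z) points."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Illustrative Cα coordinates (Å), not taken from any AlphaFold model.
canonical_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
truncated_ca = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
pair_rmsd = rmsd(canonical_ca, truncated_ca)
```

Because only paired atoms enter the sum, residues missing from a truncated isoform do not contribute to the value at all.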

To evaluate the suitability of dysregulated transcripts as candidate biomarkers or therapeutic targets, expression thresholds were applied to identify cancer-specific expression profiles. Transcripts were classified as strong biomarker candidates if their expression was undetectable in the non-malignant colon epithelial cell line HCEC-1CT (FPKM < 0.3) and markedly elevated in colon cancer cell lines (FPKM > 5). Conversely, transcripts that may exert therapeutic effects were required to exhibit robust expression in HCEC-1CT (FPKM > 5) and low or undetectable expression in cancer cell lines (FPKM < 0.3).
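The two threshold rules above can be written as a small Python function. This is a sketch rather than the study's script: it assumes that every cancer cell line must satisfy the cutoff (the text does not say whether all lines or any line is required), and the function name, labels, and example values are hypothetical.

```python
NOT_EXPRESSED = 0.3  # FPKM below this counts as undetectable (from the text)
ROBUST = 5.0         # FPKM above this counts as robust expression (from the text)


def classify_transcript(hcec_fpkm, cancer_fpkms):
    """Label one transcript from its FPKM in HCEC-1CT and in each cancer cell line."""
    if not cancer_fpkms:
        raise ValueError("at least one cancer cell line value is required")
    # Assumption: the criterion must hold in ALL cancer lines.
    if hcec_fpkm < NOT_EXPRESSED and all(v > ROBUST for v in cancer_fpkms):
        return "biomarker candidate"
    if hcec_fpkm > ROBUST and all(v < NOT_EXPRESSED for v in cancer_fpkms):
        return "therapeutic candidate"
    return "unclassified"


# Illustrative values only.
up_example = classify_transcript(0.0, [6.1, 7.4, 9.0])
down_example = classify_transcript(20.0, [0.1, 0.2, 0.0])
mixed_example = classify_transcript(3.0, [4.0, 8.0, 1.0])
```

Relaxing `all` to `any` would give a more permissive screen, at the cost of admitting transcripts whose behavior differs between cell lines.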

Results

Transcriptomic data obtained by the analysis of spheroid-cultured colon cell lines were deposited in the GEO database (accession number GSE291181). A total of 53 434 transcripts were detected in the immortalized cell line from the gut mucosa, HCEC-1CT (Figure 1A). More transcripts were detected in the cell lines that served as models of early- and late-stage primary tumors: 55 095 in HCT116 (primary tumor, Dukes’ A) and 61 626 in DLD1 (primary tumor, Dukes’ C). A total of 53 876 transcripts were detected in the cell line SW620, which served as a model of metastatic colon cancer. The number of detected transcripts in SW620 was similar to that in HCEC-1CT, but approximately one-third of the transcripts detected in the metastatic cell line were distinct from those detected in the normal gut mucosa cell line. To better visualize transcriptome-wide expression patterns, we compared FPKM distributions across the 4 analyzed cell lines (Figure 1B). The tumor cell lines DLD1, HCT116, and SW620 had a slightly higher fraction of transcripts expressed at moderate-to-high levels (log10(FPKM + 1) > 1) than HCEC-1CT (HCEC-1CT: 2.49%, HCT116: 2.99%, DLD1: 3.05%, SW620: 3.03%).

Figure 1.

(A) Distribution of expressed transcripts across the analyzed colon cancer cell lines and the non-malignant cell line. (B) Scatter plot showing log10(FPKM + 1) values for all detected transcripts in the analyzed cell lines. DLD1: primary tumor, Dukes’ C; HCEC-1CT: immortalized cell line from the gut mucosa; HCT116: primary tumor, Dukes’ A; SW620: cell line that served as a model of metastatic colon cancer.

We compared transcriptomic profiles from colon cell lines cultured as spheroids with RNA sequencing data from patient-derived colon tissue samples. This analysis focused on 375 transcripts previously identified as significantly dysregulated in colon cancer (2). Only transcripts exhibiting an opposing expression pattern between the non-malignant colon epithelial cell line (HCEC-1CT) and cancer cell lines (DLD1, SW620, HCT116) were considered dysregulated in colon cancer. This comparative analysis identified 32 transcripts, annotated in the GRCh38 genome assembly, with expression patterns in spheroid-grown cell lines consistent with those observed in patient-derived tissues (Table 1). Among these, 2 transcripts were annotated by Ensembl as canonical isoforms, while the remaining 30 were classified as non-canonical and exhibited differences in their primary structure compared to the canonical forms.

Table 1.

Dysregulated Transcripts in Colon Cancer with Corresponding Expression Profiles in Colon Cell Lines and Patient-derived Tissues. TOP CANDIDATES are Bolded.

ID	Name	Transcript structure and features	Encoded protein	Status in cancer
ENST00000309048	CENPS-201	Canonical isoform	Full-length protein	Up-regulated
ENST00000372486	NTMT1-204	Non-canonical isoform; Contains unique 5′UTR	Full-length protein	Up-regulated
ENST00000422839	SMTN-206	Non-canonical isoform; Contains unique 5′UTR	Truncated protein (37aa vs 915aa canonical)	Up-regulated
ENST00000476901	TRMU-210	Non-canonical isoform; Unique non-coding exons 1 and 2	No protein	Up-regulated
ENST00000330232	ADA2-202	Non-canonical isoform; Unique 5′UTR/exon 1 junction	Truncated protein (270aa vs 511aa canonical)	Down-regulated
ENST00000395499	AGPAT1-206	Non-canonical isoform; Unique 5′UTR	Full-length protein	Down-regulated
ENST00000312033	AKNA-203	Non-canonical isoform; Unique 5′UTR and 3′UTR	Truncated protein (831aa vs 1439aa canonical)	Down-regulated
ENST00000374088	AKNA-206	Canonical isoform	Full-length protein	Down-regulated
ENST00000309334	ARID5B-202	Non-canonical isoform; Unique 5′UTR	Truncated protein (945aa vs 1188aa canonical)	Down-regulated
ENST00000568816	BLOC1S6-218	Non-canonical isoform; Unique 5′UTR	Truncated protein (75aa vs 172aa canonical)	Down-regulated
ENST00000371834	BRD3-202	Non-canonical isoform; Unique 5′UTR and 3′UTR	Truncated protein (556aa vs 726aa canonical)	Down-regulated
ENST00000546079	CLPTM1-203	Non-canonical isoform; Unique 5′UTR	Truncated protein (567aa vs 669aa canonical)	Down-regulated
ENST00000361740	CYB5R3-202	Non-canonical isoform; Unique coding exon 6	Extended protein (345aa vs 301aa canonical)	Down-regulated
ENST00000409567	DCTN1-205	Non-canonical isoform; Unique 3′ transcript end	Truncated protein (1253aa vs 1278aa canonical)	Down-regulated
ENST00000429013	DNPEP-206	Non-canonical isoform; Unique 5′UTR	Truncated protein (273aa vs 485aa canonical)	Down-regulated
ENST00000430206	DNPEP-207	Non-canonical isoform; Unique 5′UTR	Truncated protein (117aa vs 485aa canonical)	Down-regulated
ENST00000596558	FKBP8-206	Non-canonical isoform; Unique 5′UTR	Full-length protein	Down-regulated
ENST00000399494	HMGB1-205	Non-canonical isoform; Unique 5′UTR	Full-length protein	Down-regulated
ENST00000533926	NUCB2-217	Non-canonical isoform; Unique 3′UTR	Truncated protein (90aa vs 420aa canonical)	Down-regulated
ENST00000547014	NUDT4-204	Non-canonical isoform; Unique 5′UTR and 3′UTR	Truncated protein (129aa vs 180aa canonical)	Down-regulated
ENST00000418087	PLD1-203	Non-canonical isoform; Unique 5′UTR and 5′UTR/exon 1 junction	Truncated protein (141aa vs 1074aa canonical)	Down-regulated
ENST00000457586	PRNP-204	Non-canonical isoform; Unique 5′UTR	Full-length protein	Down-regulated
ENST00000423034	SEPTIN9-202	Non-canonical isoform; Unique 5′UTR	Truncated protein (579aa vs 586aa canonical)	Down-regulated
ENST00000533263	SLC43A1-208	Non-canonical isoform; Unique 5′UTR and 3′UTR	Truncated protein (110aa vs 559aa canonical)	Down-regulated
ENST00000510283	SPAG9-209	Non-canonical isoform; Unique 5′UTR	Truncated protein (1177aa vs 1321aa canonical)	Down-regulated
ENST00000502404	UBE2D3-211	Non-canonical isoform; Unique 5′UTR and 3′UTR	Truncated protein (118aa vs 147aa canonical)	Down-regulated
ENST00000473669	ADD3-208	Non-canonical isoform; Unique non-coding exon 1/exon 2 junction	No protein	Down-regulated
ENST00000522343	ASPH-222	Non-canonical isoform; Unique non-coding exon 2	No protein	Down-regulated
ENST00000528748	CDC27-206	Non-canonical isoform; Unique non-coding exon 1	No protein	Down-regulated
ENST00000488058	RNF38-208	Non-canonical isoform; Unique non-coding exon 1 and exon 2	No protein	Down-regulated
ENST00000549713	SLC41A2-210	Non-canonical isoform; Unique non-coding exon 1	No protein	Down-regulated
ENST00000534647	Novel transcript	Non-canonical isoform; Unique non-coding exons 1,2,4,6-11	No protein	Down-regulated

Further in silico analyses were performed to investigate the potential regulatory functions of the 32 transcripts, revealing that all exhibit at least 1 characteristic feature associated with regulatory RNA molecules, such as encoding truncated proteins, localization to exosomes, or the presence of repetitive elements (Tables 1 and 2). Of these, only 2 transcripts (CENPS-201 and AKNA-206) are annotated as canonical isoforms by Ensembl and encode full-length proteins. CENPS-201 is predicted by lncLocator to localize to exosomes, while AKNA-206 harbors repetitive elements predicted by Censor, both traits commonly linked to regulatory functions of RNA molecules. The remaining 30 transcripts are classified as non-canonical isoforms, distinguished by unique transcript features including alternative 5′UTRs or 3′UTRs and alternatively spliced exons. These structural variations contribute to distinct primary transcript architectures that may influence transcript stability, localization, and functional output. Among the non-canonical isoforms, 5 are predicted to encode full-length proteins, albeit with altered 5′UTRs compared to their canonical counterparts, which could affect their translational regulation. The other 25 non-canonical isoforms either encode truncated proteins (or, in the case of CYB5R3-202, an extended protein) or are not translated into protein.

Table 2.

Results of In silico Analysis of Repeat Elements and Subcellular Localization Performed for Transcripts Dysregulated in Colon Cancer Cell Lines and Patient-derived tissues. TOP CANDIDATES are Bolded.

Name	Predicted repeat elements by AnnoLnc2/Censor	Predicted localization by lncLocator/iLoc-LncRNA
CENPS-201	None/None	Exosome/Cytoplasm
NTMT1-204	SINE-MIR/None	Exosome/Exosome
SMTN-206	None/SINE-MIR3	Cytoplasm/Cytoplasm
TRMU-210	LTR-ERV1/LTR-ERV1	Cytosol/Nucleus
ADA2-202	LINE-L2/SINE-Alu; LINE-L2	Cytosol/Ribosome
AGPAT1-206	None/None	Cytoplasm/Cytoplasm
AKNA-203	None/SINE-Alu	Cytosol/Cytosol
AKNA-206	None/SINE-Alu; LTR-ERV3	Cytosol/Cytosol
ARID5B-202	None/None	Cytoplasm/Nucleus
BLOC1S6-218	SINE-Alu; hAT-Charlie; Simple_repeat ((TTTGC)n)/ SINE-Alu; LINE-L1	Nucleus/Nucleus
BRD3-202	None/None	Exosome/Cytoplasm
CLPTM1-203	None/hAT-MER97A	Cytosol/Exosome
CYB5R3-202	None/SINE-Alu; LINE-L2	Cytosol/Cytosol
DCTN1-205	LINE-L2/None	Cytosol/Cytosol
DNPEP-206	None/None	Cytoplasm/Nucleus
DNPEP-207	None/None	Cytoplasm/Exosome
FKBP8-206	None/None	Exosome/Nucleus
HMGB1-205	None/None	Nucleus/Cytoplasm
NUCB2-217	SINE-MIR/SINE-MIR	Cytoplasm/Cytoplasm
NUDT4-204	Simple_repeat ((CCCT)n)/None	Cytosol/Cytosol
PLD1-203	None/None	Cytoplasm/Nucleus
PRNP-204	Simple_repeat ((TA)n)/None	Cytosol/Exosome
SEPTIN9-202	Simple_repeat ((GTGT)n and (GCCCGG)n); Low_complexity (G-rich)/None	Cytosol/Nucleus
SLC43A1-208	None/None	Exosome/Ribosome
SPAG9-209	None/None	Cytoplasm/Cytoplasm
UBE2D3-211	None/None	Cytoplasm/Cytoplasm
ADD3-208	None/None	Cytoplasm/Nucleus
ASPH-222	None/None	Cytoplasm/Ribosome
CDC27-206	None/None	Nucleus/Nucleus
RNF38-208	None/None	Cytoplasm/Ribosome
SLC41A2-210	None/None	Nucleus/Cytoplasm
Novel transcript ENST00000534647	LINE-L1/None	Cytosol/Cytosol

Predicted 3D structures of full-length and truncated protein isoforms, together with their structural superpositions, are presented in Figure 2. To assess the structural similarity between canonical and truncated isoforms, we performed sequence-guided structural alignments and calculated RMSD values for each pair (Figure 2D). Several truncated isoforms showed strong overlap with the corresponding regions of their full-length proteins, including SMTN-201 versus SMTN-206 (RMSD = 1.17 Å), DNPEP-201 versus DNPEP-206 (0.49 Å), and UBE2D3-210 versus UBE2D3-211 (0.24 Å). Similarly, BLOC1S6-201 versus BLOC1S6-218 (2.17 Å) and DNPEP-201 versus DNPEP-207 (0.37 Å) also displayed low RMSD values, consistent with preservation of local folding. However, even in these cases, the similarity applies only to the overlapping regions: truncated isoforms often lack entire domains that are present in the canonical protein, resulting in proteins that are structurally incomplete despite retaining a nearly identical fold in the preserved segment. In contrast, other isoforms exhibited markedly higher RMSD values, such as NUCB2-207 versus NUCB2-217 (31.46 Å), PLD1-201 versus PLD1-203 (23.67 Å), and SLC43A1-201 versus SLC43A1-208 (11.48 Å), reflecting substantial conformational divergence relative to the canonical isoforms.

Figure 2.

Structural comparison of full-length and truncated protein isoforms: (A) full-length protein isoforms, (B) truncated protein isoforms, (C) aligned superpositions, and (D) RMSD values (Å) between Cα atoms of full-length and truncated isoform pairs.

To identify cancer-specific expression profiles and evaluate the potential of dysregulated transcripts as biomarkers or therapeutic targets, we applied expression thresholds, designating transcripts as strong biomarker candidates if they were undetectable in the non-malignant HCEC-1CT cell line (FPKM < 0.3) and highly expressed in colon cancer cell lines (FPKM > 5), whereas transcripts with potential therapeutic effects were defined by robust expression in HCEC-1CT (FPKM > 5) and low or absent expression in cancer cell lines (FPKM < 0.3). Based on these criteria, NTMT1-204 (up-regulated), and BLOC1S6-218 and DCTN1-205 (down-regulated), emerged as the most promising candidates. Expression patterns of the 32 transcript candidates were further assessed using publicly available patient-derived RNA-seq data from the UCSC Xena Browser, which integrates TCGA colon tumor tissues, TCGA matched adjacent non-tumor tissues, and GTEx normal colon samples (Supplemental File 2).

NTMT1-204 was exclusively expressed in malignant cell lines (FPKM = 0 in HCEC-1CT; FPKM > 5 in DLD1, SW620, and HCT116). This observation was corroborated by publicly available patient data: in the UCSC Xena Browser, NTMT1-204 was expressed approximately sixfold higher in colon tumors compared to matched adjacent tissue in the TCGA dataset (P = 5.203 × 10⁻¹⁹), and more than 250-fold higher relative to healthy colon tissues from the GTEx dataset (P = 4.734 × 10⁻¹⁰²; Figure 3A). In addition, comparison of NTMT1-204 expression across the 4 tumor stages revealed that transcript expression remained elevated throughout disease progression, with a significant difference observed between stage II and stage IV (P = .04; Figure 3B). This transcript encodes a full-length protein isoform (UniProt ID: Q9BV86-1) and features a unique 5′UTR composed of a non-coding exon 1 and a partially coding exon 2 (Figure 4A), suggesting the use of tumor-specific splicing events (Figure 5A). RNA secondary structure prediction using RNAfold revealed that the 5′UTR of NTMT1-204 has a minimum free energy (MFE) 68 kcal/mol lower than that of the major transcript NTMT1-203, indicating a more thermodynamically stable structure. A subsequent BLASTN search against experimentally validated internal ribosome entry sites yielded only short (11-12 nucleotide) matches with high E-values, suggesting no significant IRES elements in the 5′UTR of NTMT1-204.

Figure 3.

(A) Expression level of selected dysregulated transcripts NTMT1-204, BLOC1S6-218 and DCTN1-205 using patient data retrieved from UCSC Xena browser (n TCGA Tumor = 288, n TCGA Adjacent Non-Tumor Tissue = 41, n GTEx = 308). Data are presented as mean ± standard deviation (SD). ****P < .0001, n-number of samples. (B) Expression level of NTMT1-204 across colorectal cancer stages I-IV using patient data retrieved from UCSC Xena browser (n Stage I = 44, n Stage II = 109, n Stage III = 80, n Stage IV = 40). **P < .01, n-number of samples. Data are derived from TCGA bulk RNA-seq, and observed expression patterns may be influenced by tumor purity and the presence of non-epithelial cell populations.

Figure 4.

Transcript structures of selected differentially expressed transcripts. (A) NTMT1-204, (B) BLOC1S6-218, and (C) DCTN1-205. Transparent boxes represent non-coding exons/part of the exon, colored boxes represent coding exons and lines connecting boxes represent introns. Red arrows point to characteristic transcript isoform sequences. Modified from Ensembl Genome Browser (release 114).

Figure 5.

Sashimi plots showing alternative splicing patterns in the genomic regions of the selected dysregulated transcripts. (A) NTMT1 gene loci, (B) BLOC1S6 gene loci, and (C) DCTN1 gene loci. Exon expression is shown by read coverage histograms, while splice junction support is indicated by arcs annotated with junction read counts. Alternative exon junctions have been marked with red asterisk (*).

BLOC1S6-218, in contrast, showed strong expression in the non-malignant HCEC-1CT cell line (FPKM = 20) while being present in trace amounts in cancer cell lines (FPKM < 0.3). This expression pattern was not confirmed using patient data retrieved from the UCSC Xena browser (Figure 3A). It contains a unique 5′UTR due to the inclusion of a non-coding exon 3 (Figures 4B and 5B) and encodes a truncated 75 aa isoform, substantially shorter than the canonical 172 aa BLOC1S6 protein. The altered 5′UTR, low coding potential, nuclear localization, and presence of SINE/Alu elements suggest potential regulatory roles for this isoform.

DCTN1-205 was robustly expressed in HCEC-1CT (FPKM = 22) but was not detected in any malignant cell line (FPKM = 0). This expression pattern was not confirmed using patient data retrieved from the UCSC Xena browser, although there was a trend toward higher expression in non-malignant tissue (Figure 3A). Its structure diverges from canonical DCTN1 transcripts due to exon skipping and use of alternative splice sites, producing a distinct exon combination (Figures 4C and 5C). It also lacks a 3′UTR compared to the canonical isoform, making the transcript end a unique feature of this isoform. This isoform encodes a 1253 amino acid protein, slightly shorter than the canonical 1278 aa isoform. According to UniProt, the DCTN1-205-derived protein lacks amino acid segments 132 to 151 and 1066 to 1070 relative to the full-length isoform, changes that could affect its structural integrity or interaction with components of the dynactin-dynein complex.

Building on previous patient-based studies that inferred promoter activity from transcript-level expression, we investigated whether similar transcript isoform shifts, potentially reflecting alternative promoter usage, occur between non-malignant and malignant colon cell lines. In our study, we analyzed transcriptomic data from a non-malignant colon mucosa cell line (HCEC-1CT) and 3 colon cancer cell lines (HCT116, DLD1, and SW620) to identify analogous patterns of transcript-level regulation. Specifically, we filtered for transcripts that showed at least a threefold change in expression between cancer cell lines and the non-malignant cell line. We focused on genes exhibiting at least 1 up-regulated and 1 down-regulated transcript isoform, consistent with a potential switch in promoter usage. This approach identified 2768 up-regulated and 2003 down-regulated transcripts arising from a total of 1338 genes, each exhibiting multiple transcript isoforms with differential expression between malignant and non-malignant conditions. However, among the genes previously reported to exhibit promoter switching in tumor and non-tumor tissues from colon cancer patients (including PHF19, PRKAR1B, CD81, and MCF2), such isoform switching patterns were not recapitulated in the transcriptomic profiles of the analyzed colon cancer cell lines.⁴
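The filtering logic described in this paragraph (at least a threefold change, and at least 1 up-regulated plus 1 down-regulated isoform per gene) can be sketched in Python as follows. The study's own script is not shown here, so several details are assumptions labeled in the code: a single cancer value per transcript, a small pseudocount to avoid division by zero, and hypothetical gene and transcript names.

```python
from collections import defaultdict

FOLD = 3.0          # minimum fold change stated in the text
PSEUDOCOUNT = 0.01  # assumption: added to both values so zero FPKM does not divide by zero


def direction(normal_fpkm, cancer_fpkm, fold=FOLD, pseudo=PSEUDOCOUNT):
    """Return 'up', 'down', or None for one transcript (cancer relative to normal)."""
    ratio = (cancer_fpkm + pseudo) / (normal_fpkm + pseudo)
    if ratio >= fold:
        return "up"
    if ratio <= 1 / fold:
        return "down"
    return None


def switching_genes(records):
    """records: iterable of (gene, transcript, normal_fpkm, cancer_fpkm).

    Keep only genes with at least one up- and one down-regulated isoform.
    """
    by_gene = defaultdict(lambda: {"up": [], "down": []})
    for gene, transcript, normal, cancer in records:
        change = direction(normal, cancer)
        if change:
            by_gene[gene][change].append(transcript)
    return {g: v for g, v in by_gene.items() if v["up"] and v["down"]}


# Hypothetical records for illustration.
example_records = [
    ("GENE1", "GENE1-201", 10.0, 1.0),
    ("GENE1", "GENE1-202", 0.5, 6.0),
    ("GENE2", "GENE2-201", 2.0, 8.0),
    ("GENE2", "GENE2-202", 4.0, 5.0),
]
switches = switching_genes(example_records)
```

How the three cancer cell lines were combined into one comparison is not specified in the text, so the single `cancer_fpkm` value per transcript is a simplification.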

Discussion

To comprehensively understand the molecular complexity underlying colon cancer, it is essential to move beyond gene-level analyses and interrogate transcript-level regulatory events, including alternative promoter usage and non-canonical splicing. Building on the growing evidence that such mechanisms generate functionally distinct RNA isoforms with roles in tumor biology, this study set out to determine whether the transcriptomic features previously identified in patient-derived colon tumors, particularly those involving coding–noncoding duality and isoform-specific regulation, could be mirrored in spheroid-cultured colon cancer cell lines. Through direct comparison with non-malignant gut epithelial cells, we aimed to identify transcript isoforms derived from protein-coding genes that may exert regulatory functions during the malignant transformation of gut epithelium, offering insight into transcriptome remodeling events that drive colon cancer onset and progression.

The number of expressed transcripts progressively increased from non-malignant to malignant cell lines, consistent with previous observations that cancer cells often exhibit a more complex and dysregulated transcriptome.^31,32 In particular, the increase from HCEC-1CT (53 434 transcripts) to the early-stage HCT116 (55 095 transcripts) and late-stage DLD1 (61 626 transcripts) models suggests that transcriptomic complexity correlates with tumor stage. In line with this, a recent analysis of single-cell RNA sequencing data from esophageal cancer and glioblastoma tissues found a significant number of novel transcripts, including splicing variants and non-coding RNAs, that were present in cancer cells but not in normal tissues.³³ Interestingly, the metastatic SW620 cell line (53 876 transcripts), despite having a transcript count similar to that of the non-malignant HCEC-1CT, displayed a markedly different transcriptome composition. Approximately one-third of the transcripts expressed in SW620 were distinct from those in HCEC-1CT, indicating a shift in transcriptional programs associated with metastasis. Such a shift is consistent with the finding that liver-metastatic CRC cells acquire a reshaped epigenetic landscape, resulting in reprogramed, tissue-specific transcription.³⁴ These findings support the hypothesis that transcriptome diversification is a hallmark of tumor progression, driven largely by alternative transcription initiation and aberrant splicing, which generate cancer-specific transcript isoforms.^35,36 The fact that significant transcriptomic shifts were detected in the absence of changes in total transcript count (as seen in SW620) emphasizes the importance of isoform-level analysis.³⁷ This supports the growing recognition of non-canonical transcript isoforms as key players in tumor biology, including colon cancer, with potential implications for biomarker discovery and therapeutic targeting.^38,39
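The SW620 observation depends on set membership rather than on transcript counts: two cell lines can express the same number of transcripts while sharing only part of them. A minimal sketch of that comparison, assuming each cell line's expressed transcripts are available as a set of identifiers (the function name and the toy identifiers are hypothetical and are not taken from the study's pipeline):

```python
def fraction_distinct(query: set[str], reference: set[str]) -> float:
    """Fraction of transcripts in `query` that are absent from `reference`."""
    if not query:
        raise ValueError("query set is empty")
    return len(query - reference) / len(query)


# Toy sets (invented identifiers): equal sizes, different composition.
hcec = {"T1", "T2", "T3", "T4", "T5", "T6"}
sw620 = {"T1", "T2", "T3", "T4", "T7", "T8"}

print(len(hcec) == len(sw620))          # same number of expressed transcripts
print(fraction_distinct(sw620, hcec))   # share of SW620 transcripts not in HCEC-1CT
```

In this toy case two of the six transcripts in the query set are absent from the reference, so a count-based comparison would report no difference while the composition differs by one-third.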

The comparative transcriptomic analysis between spheroid-cultured colon cancer cell lines and patient-derived tumor tissues highlights a reproducible set of dysregulated transcript isoforms with potential roles in colon carcinogenesis. Of the 375 transcripts previously identified as significantly dysregulated in colon cancer,⁴ 32 showed consistent expression patterns across both patient tissues and cell line models, reinforcing their possible biological relevance and robustness across experimental systems. Most of these transcripts (30 of 32) were classified as non-canonical isoforms, highlighting the transcriptomic complexity underlying cancer-specific gene regulation. These isoforms are characterized by unique 5′ and/or 3′UTRs and alternative splicing events, which can profoundly alter RNA fate, affecting stability, localization, and translational efficiency.^40-42 Among these, 5 non-canonical transcripts retain full-length coding potential but differ from their canonical counterparts in their 5′ UTRs, potentially modulating translation initiation and interaction with RNA-binding proteins.⁴³ In contrast, the remaining 25 non-canonical isoforms either encode truncated proteins or lack coding potential altogether, suggesting that differential promoter usage and splicing may produce functional diversity or regulatory decoys during tumor development.^44,45 The comparative structural analysis highlights 2 distinct classes of protein isoforms encoded by dysregulated transcript isoforms. Protein isoforms with very low RMSD (<2 Å), such as DNPEP, SMTN, UBE2D3, and BLOC1S6, demonstrate that truncation can preserve the secondary and tertiary structure of the aligned region. However, these proteins remain incomplete: the absence of domains present in the canonical proteins indicates that, despite local structural similarity, truncated isoforms may lose essential binding motifs or regulatory elements. This raises the possibility that they could function in dominant-negative or regulatory roles by partially retaining fold integrity while lacking full-length functionality. By contrast, isoforms with high RMSD (>10 Å), such as NCUB2-217, PLD1-203, and SLC43A1-208, show extensive divergence from canonical folding, consistent with more profound alterations of protein architecture and potentially novel functions. Together, these findings emphasize that RMSD must be interpreted in the context of truncation: small values reflect similarity in preserved segments, but cannot capture the biological consequences of missing domains. Truncated proteins may exert dominant-negative effects or participate in signaling networks with altered specificity. For example, a recent study showed that alternative splicing of ASPP2 generates a truncated isoform (ASPP2κ), which lacks the C-terminal p53-binding domain and acts in a dominant-negative manner, promoting cellular migration and therapy resistance in soft tissue sarcoma.⁴⁶
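The caveat about RMSD and truncation can be made concrete. The sketch below is illustrative only: it assumes the two structures are already superposed and that aligned residues are given as paired coordinates, and it is not the procedure used by the structure-analysis software cited in the study. The `coverage` helper and the toy coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """Root-mean-square deviation between paired, already-superposed coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def coverage(n_aligned: int, canonical_length: int) -> float:
    """Share of the canonical protein that the aligned region represents."""
    return n_aligned / canonical_length


# Toy canonical chain of 10 residues; the truncated isoform keeps the first 4.
canonical = [(float(i), 0.0, 0.0) for i in range(10)]
truncated = canonical[:4]

print(rmsd(truncated, canonical[:4]))            # aligned region is identical
print(coverage(len(truncated), len(canonical)))  # but most of the protein is missing

# A uniformly displaced copy, for contrast: every atom is 3 units away.
shifted = [(x + 3.0, y, z) for x, y, z in truncated]
print(rmsd(shifted, truncated))
```

Reporting coverage next to RMSD keeps a perfectly matching but heavily truncated isoform from being read as structurally equivalent to the canonical protein.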

Collectively, these observations suggest that isoforms previously classified based solely on coding potential may engage in diverse functional roles, including those traditionally ascribed to non-coding RNAs, thereby underscoring the blurred boundaries between protein-coding and regulatory transcript functions. Many transcripts previously annotated as non-coding contain short open reading frames (sORFs), which have been shown to encode functional micropeptides, a class of small proteins typically fewer than 100 amino acids in length. These micropeptides have been implicated in various cellular processes, including signal transduction, cytoskeletal organization, and modulation of the tumor microenvironment, and are increasingly recognized for their roles in cancer development and progression.⁴⁷ Conversely, transcripts annotated as protein-coding may serve non-coding regulatory functions, such as scaffolding RNA-binding proteins and sequestering microRNAs (acting as competing endogenous RNAs, or “sponges”), functions that may be enabled by structural alterations and may facilitate roles in malignant transformation.^48,49 Recent studies have also demonstrated that mRNAs can act as decoys for transcription factors, modulate translation independently of their coding sequence, or influence RNA granule dynamics.^50-52 These observations have contributed to the emergence of a new conceptual framework of coding-noncoding duality, in which a single RNA molecule may serve both protein-coding and regulatory roles.⁵³ This paradigm challenges the classical binary distinction between coding and non-coding RNAs, emphasizing the need for functional validation beyond computational annotation.

The convergence of the transcriptomic alterations, such as truncation, retained introns, altered UTRs, and divergence from coding annotations, aligns with the growing recognition that cancer involves not only gene-level dysregulation but also isoform-specific reprograming. Transcript variants derived from alternative promoter usage or non-canonical splicing can modulate gene output and functional phenotype without necessarily altering total gene expression levels, a phenomenon increasingly linked to tumor progression and therapeutic resistance.⁵⁴

Based on our transcriptomic data, NTMT1-204 emerged as the most promising biomarker candidate for colon cancer detection, a finding that was further supported by patient-derived data available through the UCSC Xena platform. The consistent upregulation of NTMT1-204 from early to advanced CRC stages, with the highest expression at stage IV, indicates that this isoform may be involved in tumor development and could warrant further evaluation as a potential biomarker of disease progression. Structurally, this isoform features a unique 5′UTR pointing to tumor-specific splicing and/or alternative promoter usage.^44,55,56 The RNA secondary structure analysis revealed that the 5′UTR of NTMT1-204 is considerably more thermodynamically stable than that of the major NTMT1-203 isoform. Increased structural stability of the 5′UTR may influence translational efficiency, RNA half-life, subcellular localization, or interaction with RNA-binding proteins and microRNAs.^42,57,58 Further evidence supporting the potential regulatory role of NTMT1-204 includes its predicted localization to exosomes, as well as the presence of repetitive SINE/MIR elements within its sequence. Exosomal localization is a hallmark of many regulatory RNAs, which are selectively packaged and secreted via extracellular vesicles to mediate intercellular communication and modulate gene expression in recipient cells. 
It has been shown that M2 tumor-associated macrophage–derived exosomes promote gastric cancer progression by transferring MALAT1, which enhances aerobic glycolysis, metastasis, and chemoresistance in gastric cancer.⁵⁹ Exosomal RNAs, due to their remarkable stability in bodily fluids and selective enrichment in tumor-derived vesicles, hold strong potential as non-invasive biomarkers for early cancer detection, prognosis, and treatment monitoring.⁶⁰ Repetitive elements, such as SINE sequences, play a pivotal role in transcriptional regulation by serving as cis-regulatory elements, influencing promoter activity, alternative splicing, and epigenetic modifications, thereby modulating gene expression programs in both physiological and pathological contexts. SINE/MIR elements have been implicated in the regulation of transcript expression by functioning as enhancers and providing transcription factor binding sites, with their presence in active regulatory regions across tissues suggesting a role in maintaining normal gene expression; when embedded in tumor-suppressive transcripts, MIRs may help sustain their expression, and their loss or silencing in tumors could disrupt these regulatory inputs, contributing to tumorigenesis.^61,62 Conversely, BLOC1S6-218 and DCTN1-205 exhibited the opposite expression pattern, with high expression in the non-malignant HCEC-1CT cell line and minimal or undetectable levels in cancer cell lines, suggesting their strong potential as therapeutic targets. 
BLOC1S6-218 contains an additional spliced exon in its 5′UTR, which could enable tumor-suppressive regulatory functions such as impairing translation initiation through complex RNA secondary structure, sequestering oncogenic RNA-binding proteins or microRNAs, or directing subcellular localization to sites where it may act as a decoy RNA or modulator of tumor-associated signaling pathways.⁴² An additional indication of BLOC1S6-218’s potential regulatory role is the presence of repeat elements, including SINE/Alu elements. A previous study demonstrated that Alu retrotransposons can be transcribed into non-coding RNAs that repress expression of the pluripotency genes NANOG and OCT4, suggesting a regulatory mechanism with potential relevance to both normal and tumor cell growth.⁶³ Alu-containing transcripts may therefore give rise to non-coding RNAs with distinct tumor-suppressive functions that regulate key signaling pathways involved in cancer development. In addition, the nuclear localization of BLOC1S6-218 suggests a predominantly regulatory rather than translational function. These two features of the transcript, Alu content and nuclear localization, may be connected, as the SIRLOIN motif, a sequence element derived from SINE/Alu repeats, has been identified as a signal for the nuclear retention of lncRNAs.⁶⁴ DCTN1-205 harbors both a unique 5′UTR and a distinct exon composition resulting from exon skipping, which leads to the loss of specific amino acid segments. These deletions may alter the isoform’s ability to associate with the dynactin-dynein complex, potentially impairing its function in intracellular transport and cytoskeletal organization. 
A previous study demonstrated that the DCTN1B protein isoform, which lacks certain amino acid sequences, shows diminished microtubule-binding capacity and impaired dynein-driven motility, indicating that alternative splicing events removing key structural domains can profoundly affect the structural and functional integrity of the dynactin-dynein complex.⁶⁵ The apparent discrepancy between cell line and patient-derived expression data for BLOC1S6-218 and DCTN1-205 can likely be attributed to several biological and technical factors. First, cell lines and patient tissues represent inherently different biological contexts. Cell lines, even when cultured as spheroids, lack the stromal, immune, and microenvironmental components that are integral to the in vivo tumor niche. Such factors can strongly influence transcript expression patterns, particularly for non-canonical isoforms with putative regulatory functions. The lack of confirmation in patient data from UCSC Xena may therefore reflect the contribution of additional cellular compartments or regulatory interactions present in vivo but absent in monocultures of epithelial cells. Second, data type and sequencing depth may play a role. Cell line spheroid RNA-seq was generated at high resolution and may capture rare or cell type–restricted isoforms, whereas patient bulk RNA-seq data represent heterogeneous mixtures of malignant epithelial, stromal, and immune cells. This heterogeneity can obscure the detection of low-abundance isoforms such as BLOC1S6-218 or DCTN1-205, leading to apparent discrepancies. Nevertheless, DCTN1-205 showed a trend toward higher expression in non-malignant tissue, although this trend did not reach strong statistical support in Xena. 
For BLOC1S6-218, the strong expression in non-malignant HCEC-1CT cells but not in tumors may indicate that its loss is a feature of malignant transformation, while residual expression in heterogeneous patient samples could be masked by dilution from non-epithelial cell populations. It is important to acknowledge that spheroid models, while more physiologically relevant than monolayer cultures, still represent a simplified system composed exclusively of epithelial cells. They lack the stromal, microenvironmental, and immune infiltrating cells that are present in patient-derived tissues and known to strongly influence gene expression programs. This limitation may explain why 343 of the transcripts identified as dysregulated in patient samples did not display concordant expression patterns in colon epithelial spheroids. The absence of these additional cellular compartments could mask or eliminate context-dependent regulation, leading to discrepancies between in vitro and in vivo datasets.
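The split of the 375 previously reported transcripts into 32 concordant and 343 non-concordant ones amounts to comparing the direction of change in two datasets. A minimal sketch of such a comparison, assuming a log2 fold change (tumor versus non-tumor) is available per transcript in each dataset; the function names, identifiers, and values below are invented for illustration and do not reproduce the study's statistical criteria:

```python
def sign(x: float) -> int:
    """Return 1 for positive, -1 for negative, 0 for zero."""
    return (x > 0) - (x < 0)


def split_by_concordance(
    tissue: dict[str, float], cells: dict[str, float]
) -> tuple[list[str], list[str]]:
    """Split tissue transcripts by whether the cell lines change in the same direction."""
    concordant, discordant = [], []
    for transcript, tissue_fc in tissue.items():
        cell_fc = cells.get(transcript)  # None if not detected in the cell lines
        same = (
            cell_fc is not None
            and sign(tissue_fc) != 0
            and sign(tissue_fc) == sign(cell_fc)
        )
        (concordant if same else discordant).append(transcript)
    return concordant, discordant


# Invented log2 fold changes for four placeholder isoforms.
tissue = {"iso_A": 2.1, "iso_B": -1.8, "iso_C": -2.4, "iso_D": 1.5}
cells = {"iso_A": 1.2, "iso_B": -3.0, "iso_C": 0.9}

concordant, discordant = split_by_concordance(tissue, cells)
print(concordant)   # same direction in both datasets
print(discordant)   # opposite direction, or not detected in cell lines

# Bookkeeping for the counts reported in the text.
assert 39 + 336 == 375 and 4 + 28 == 32 and 375 - 32 == 343
```

Transcripts that are absent from the cell line data are counted as non-concordant here, which is one possible convention; a stricter analysis would also require statistical significance in both datasets.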

Transcript isoform switching is increasingly recognized as a hallmark of cancer-associated transcriptomic reprograming, often reflecting alternative promoter usage or splicing events that can modulate gene output without altering total gene expression. Our previous analysis revealed differential promoter activity and isoform expression, highlighting the role of promoter and transcript switching in malignancy. The non-canonical isoform SMAD4-209 encodes a full-length protein that may support SMAD4 function and cellular homeostasis, while SMAD4-213 encodes a truncated protein potentially interfering with TGF-β signaling. Additionally, SMAD4-213 may act at the regulatory level through miRNA and RBP interactions, pointing to a possible dual role (coding/non-coding).⁶⁶

Building upon prior studies that inferred promoter activity in colon cancer from transcript expression patterns in patient-derived tissues, we investigated whether similar regulatory dynamics could be recapitulated in cell line models of malignant transformation. Our approach, focusing on genes with transcript isoforms displaying opposing expression trends between malignant and non-malignant colon epithelial cell lines, revealed extensive isoform-level dysregulation. However, isoform switching events previously identified in colon cancer patient datasets for genes such as PHF19, PRKAR1B, CD81, and MCF2 were not mirrored in our colon cancer spheroid models. One likely explanation is that promoter switching may be strongly influenced by the tumor microenvironment, including stromal interactions, extracellular matrix cues, and infiltrating immune cells, which are absent in simplified epithelial spheroid cultures. In vivo, these additional cellular compartments can contribute to chromatin remodeling, transcription factor availability, or cytokine-driven signaling, all of which may affect promoter choice and isoform usage.⁶⁷ Additionally, epigenetic regulation, such as DNA methylation or histone modification at alternative promoters, may differ between cultured cells and primary tumors, potentially explaining the observed differences.⁶⁸ Also, tumor heterogeneity in patient tissues may drive context-specific promoter usage that cannot be captured by a limited number of cell line models. Collectively, our findings highlight both the promise and limitations of using cancer cell lines to model transcriptomic phenomena observed in patient tissues. While robust isoform-level dysregulation was evident in cell lines, the lack of congruence with promoter-switching events reported in vivo underscores the need for integrative studies combining cell lines, organoids, and patient-derived materials to comprehensively decode transcriptional regulation in colon cancer.
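The selection of genes whose isoforms show opposing expression trends can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: it assumes a log2 fold change (malignant versus non-malignant) per transcript, Ensembl-style transcript names of the form GENE-NNN, and an arbitrary threshold `min_abs`; the gene names and values are placeholders.

```python
from collections import defaultdict


def switching_genes(log2fc: dict[str, float], min_abs: float = 1.0) -> list[str]:
    """Genes with at least one isoform up and one isoform down beyond `min_abs`."""
    trends: dict[str, set[str]] = defaultdict(set)
    for transcript, fc in log2fc.items():
        gene = transcript.rsplit("-", 1)[0]  # "GENE1-201" -> "GENE1"
        if fc >= min_abs:
            trends[gene].add("up")
        elif fc <= -min_abs:
            trends[gene].add("down")
    return sorted(g for g, seen in trends.items() if seen == {"up", "down"})


# Placeholder values: GENE1 has opposing isoforms, GENE2 does not.
example = {
    "GENE1-201": 2.0,
    "GENE1-202": -1.5,
    "GENE2-201": 1.2,
    "GENE2-202": 0.3,
}
print(switching_genes(example))
```

Because the criterion looks only at transcript-level direction, a gene can be flagged even when its total expression is unchanged, which is the situation the discussion of isoform switching describes.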

While this study provides an integrative bioinformatics framework for identifying transcript isoforms associated with colon carcinogenesis, experimental validation remains an important next step. In particular, loss- and gain-of-function studies using approaches such as CRISPR knock-out, CRISPR activation, or siRNA-mediated silencing will be essential to elucidate the causal contribution of key isoforms such as NTMT1-204, BLOC1S6-218, and DCTN1-205 to cellular phenotypes including proliferation, apoptosis, migration, and spheroid integrity. Furthermore, functional assays in 3-dimensional spheroid models and, ultimately, in vivo systems will help clarify whether these dysregulated isoforms act as oncogenic drivers, tumor suppressors, or modulators of the tumor microenvironment. Such studies will also be critical to determine their potential as biomarkers or therapeutic targets. Our current findings therefore provide a prioritized list of candidate isoforms and a strong rationale for future mechanistic investigations.

Conclusion

In summary, our transcriptomic analysis revealed substantial isoform-level dysregulation between malignant and non-malignant colon cell lines, echoing patterns observed in colon tumors and underscoring the role of alternative transcription initiation and splicing in cancer progression. Notably, transcript variants NTMT1-204, BLOC1S6-218, and DCTN1-205 exhibit distinct regulatory features and expression profiles, highlighting their potential as tumor biomarkers or therapeutic targets. While certain isoform switching events identified in patient-derived datasets were not fully recapitulated in cell line models, this divergence reinforces the importance of integrating in vitro and in vivo systems to capture the full spectrum of transcriptional alterations in cancer. Our findings emphasize the necessity of isoform-level analyses for advancing precision oncology and suggest that non-canonical transcript isoforms with potential regulatory function, including those with dual coding and non-coding potential, represent a rich yet underexplored layer of gene expression regulation in colon cancer. Further experimental validation of the identified dysregulated transcripts with proposed regulatory roles, particularly their interaction partners, subcellular localization, and functional impact on tumorigenic pathways, will be essential to elucidate their mechanistic roles in colon cancer and to assess their potential utility as biomarkers or therapeutic targets.

Supplemental Material

Supplemental material, sj-doc-1-cix-10.1177_11769351251396250 for Comparative RNA-Seq Analysis of Colon Spheroids and Patient-derived Tissues Identifies Non-Canonical Transcript Isoforms of Protein-Coding Genes Implicated in Colon Carcinogenesis by Tamara Babic, Bojana Banovic Djeri, Dunja Pavlovic, Sandra Dragicevic, Jovana Despotovic, Jelena Karanovic and Aleksandra Nikolic in Cancer Informatics

Supplemental material, sj-docx-2-cix-10.1177_11769351251396250 for Comparative RNA-Seq Analysis of Colon Spheroids and Patient-derived Tissues Identifies Non-Canonical Transcript Isoforms of Protein-Coding Genes Implicated in Colon Carcinogenesis by Tamara Babic, Bojana Banovic Djeri, Dunja Pavlovic, Sandra Dragicevic, Jovana Despotovic, Jelena Karanovic and Aleksandra Nikolic in Cancer Informatics

Supplemental material, sj-xlsx-3-cix-10.1177_11769351251396250 for Comparative RNA-Seq Analysis of Colon Spheroids and Patient-derived Tissues Identifies Non-Canonical Transcript Isoforms of Protein-Coding Genes Implicated in Colon Carcinogenesis by Tamara Babic, Bojana Banovic Djeri, Dunja Pavlovic, Sandra Dragicevic, Jovana Despotovic, Jelena Karanovic and Aleksandra Nikolic in Cancer Informatics

Footnotes

Acknowledgements

The authors used ChatGPT as an AI tool for language refinement. We confirm that no scientific data have been generated or modified using AI.

ORCID iD

Tamara Babic

Ethical Considerations

Not applicable.

Consent to Participate

Not applicable.

Author Contributions

Tamara Babic: conceptualization, investigation, visualization, writing – original draft, writing – review & editing. Bojana Banovic Djeri: data curation, software, writing – review & editing. Dunja Pavlovic: investigation, visualization, writing – review & editing. Sandra Dragicevic: investigation, writing – review & editing. Jovana Despotovic: investigation, writing – review & editing. Jelena Karanovic: investigation, writing – review & editing. Aleksandra Nikolic: conceptualization, funding acquisition, project administration, supervision, writing – original draft, writing – review & editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Science Fund of the Republic of Serbia, PROMIS, #6052315, SENSOGENE and IMGGE Annual Research Program for 2025, Ministry of Science, Technological Development and Innovation of the Republic of Serbia, 451-03-136/2025-03/200042.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

References

1. Siegel, Miller, Wagle, Jemal. Cancer statistics, 2023. CA Cancer J Clin. 2023;73(1):17-48.

2. Markowitz, Bertagnolli MM. Molecular origins of cancer: molecular basis of colorectal cancer. N Engl J Med. 2009;361(25):2449-2460.

3. Bradner, Hnisz, Young RA. Transcriptional addiction in cancer. Cell. 2017;168(4):629-643.

4. Demircioğlu, Cukuroglu, Kindermans, et al. A pan-cancer transcriptome analysis reveals pervasive regulation through alternative promoters. Cell. 2019;178(6):1465-1477.e17.

5. Dhamija, Menon MB. Non-coding transcript variants of protein-coding genes - what are they good for? RNA Biol. 2018;15(8):1025-1031.

6. Gonzàlez-Porta, Frankish, Rung, Harrow, Brazma. Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene. Genome Biol. 2013;14(7):R70.

7. Trincado, Sebestyén, Pagés, Eyras. The prognostic potential of alternative transcript isoforms across human tumors. Genome Med. 2016;8(1):85.

8. Sebestyén, Zawisza, Eyras. Detection of recurrent alternative splicing switches in tumor samples reveals novel signatures of cancer. Nucleic Acids Res. 2015;43(3):1345-1356.

9. Erho, Buerki, Triche, Davicioni, Vergara IA. Transcriptome-wide detection of differentially expressed coding and non-coding transcripts and their clinical significance in prostate cancer. J Oncol. 2012;2012:541353.

10. Romano, Di Porzio, Iaccarino, et al. G-quadruplexes in cancer-related gene promoters: from identification to therapeutic targeting. Expert Opin Ther Pat. 2023;33(11):745-773.

11. Smith, Yadav, Pedersen, et al. Signatures of accelerated somatic evolution in gene promoters in multiple cancer types. Nucleic Acids Res. 2015;43(11):5307-5317.

12. Dolgalev, Poverennaya. Quantitative analysis of isoform switching in cancer. Int J Mol Sci. 2023;24(12):10065. doi: 10.3390/ijms241210065

13. Diamantopoulos, Tsiakanikas, Scorilas. Non-coding RNAs: the riddle of the transcriptome and their perspectives in cancer. Ann Transl Med. 2018;6(12):241.

14. Mattick, Amaral, Carninci, et al. Long non-coding RNAs: definitions, functions, challenges and recommendations. Nat Rev Mol Cell Biol. 2023;24(6):430-447.

15. Kim TK. Regulatory RNA: from molecular insights to therapeutic frontiers. Exp Mol Med. 2024;56(6):1233-1234.

16. Tants, Schlundt. The role of structure in regulatory RNA elements. Biosci Rep. 2024;44(10):BSR20240139. doi: 10.1042/BSR20240139

17. Trigiante, Blanes Ruiz, Cerase. Emerging roles of repetitive and repeat-containing RNA in nuclear and chromatin organization and gene expression. Front Cell Dev Biol. 2021;9:735527.

18. Wang, Horlacher, Cheng, Winther. RNA trafficking and subcellular localization-a review of mechanisms, experimental and predictive methodologies. Brief Bioinform. 2023;24(5):bbad249. doi: 10.1093/bib/bbad249

19. Harrison, Amode, Austine-Orimoloye, et al. Ensembl 2024. Nucleic Acids Res. 2024;52(D1):D891-D899.

20. von Elm, Altman, Egger, Pocock, Gøtzsche, Vandenbroucke JP. The strengthening the reporting of observational studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008;61(4):344-349.

21. Mortazavi, Williams, McCue, Schaeffer, Wold. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621-628.

22. Goldman, Craft, Hastie, et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol. 2020;38(6):675-678.

23. Meng, Goddard, Pettersen, et al. UCSF ChimeraX: tools for structure building and analysis. Protein Sci. 2023;32(11):e4792.

24. Jumper, Evans, Pritzel, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583-589.

25. Yang, Wang, Ding, Gao. AnnoLnc2: the one-stop portal to systematically annotate novel lncRNAs for human and mouse. Nucleic Acids Res. 2020;48(W1):W230-W238.

26. Lin, Pan, Shen HB. lncLocator 2.0: a cell-line-specific subcellular localization predictor for long non-coding RNAs with interpretable deep learning. Bioinformatics. 2021;37(16):2308-2316.

27. Kohany, Gentles, Hankus, Jurka. Annotation, submission and screening of repetitive elements in repbase: RepbaseSubmitter and censor. BMC Bioinformatics. 2006;7:474.

28. Huang, Zhang, et al. ILoc-lncRNA: predict the subcellular location of lncRNAs by incorporating octamer composition into general PseKNC. Bioinformatics. 2018;34(24):4196-4204.

29. Mokrejs, Masek, Vopálensky, Hlubucek, Delbos, Pospísek. IRESite–a tool for the examination of viral and cellular internal ribosome entry sites. Nucleic Acids Res. 2010;38(Database issue):D131-D136.

30. Hofacker, Stadler PF. Memory efficient folding algorithms for circular RNA secondary structures. Bioinformatics. 2006;22(10):1172-1176.

31. Jha, Quesnel-Vallières, Wang, Thomas-Tikhonenko, Lynch, Barash. Identifying common transcriptome signatures of cancer by interpreting deep learning models. Genome Biol. 2022;23(1):117.

32. Kildisiute, Kalyva, Elmentaite, et al. Transcriptional signals of transformation in human cancer. Genome Med. 2024;16(1):8.

33. Eralp, Sefer. Reference-free inferring of transcriptomic events in cancer cells on single-cell data. BMC Cancer. 2024;24(1):607.

34. Teng, Yang, et al. Tissue-specific transcription reprogramming promotes liver metastasis of colorectal cancer. Cell Res. 2020;30(1):34-49.

35. Zhao, Chen, Zou, et al. Pan-cancer transcriptome analysis reveals widespread regulation through alternative tandem transcription initiation. Sci Adv. 2024;10(28):eadl5606.

36. Sveen, Kilpinen, Ruusulehto, Lothe, Skotheim RI. Aberrant RNA splicing in cancer; expression changes and driver mutations of splicing factor genes. Oncogene. 2016;35(19):2413-2427.

37. Sharma, Paul, et al. Importance of transcript variants in transcriptome analyses. Cells. 2024;13(17):1502.

38. Hossam Abdelmonem, Kamal, Wardy, et al. Non-coding RNAs: emerging biomarkers and therapeutic targets in cancer and inflammatory diseases. Front Oncol. 2025;15:1534862.

39. Sun, Han, et al. Long-read sequencing reveals the landscape of aberrant alternative splicing and novel therapeutic target in colorectal cancer. Genome Med. 2023;15(1):76.

40. Mayr. What are 3’ UTRs doing? Cold Spring Harb Perspect Biol. 2019;11(10):a034728. doi:10.1101/cshperspect.a034728

41. Aspden, Wallace EWJ, Whiffin. Not all exons are protein coding: addressing a common misconception. Cell Genom. 2023;3(4):100296.

42. Hinnebusch, Ivanov, Sonenberg. Translational control by 5’-untranslated regions of eukaryotic mRNAs. Science. 2016;352(6292):1413-1416.

43. Abdallah KS. Uncovering mRNA sequences that control translation initiation. Nat Rev Mol Cell Biol. 2025;26:645-645.

44. Zhang, Qian, Yang. Alternative splicing and cancer: a systematic review. Signal Transduct Target Ther. 2021;6(1):78.

45. Huang, JKL, et al. Long-read transcriptome sequencing reveals abundant promoter diversity in distinct molecular subtypes of gastric cancer. Genome Biol. 2021;22(1):44.

46. Tsintari, Walter, Fend, et al. Alternative splicing of apoptosis stimulating protein of TP53-2 (ASPP2) results in an oncogenic isoform promoting migration and therapy resistance in soft tissue sarcoma (STS). BMC Cancer. 2022;22(1):725.

47. Leong AZX, Lee, Mohtar, Syafruddin, Pung, Low. Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures. J Biomed Sci. 2022;29(1):19.

48. Chen, Fansler, Janjoš, Ule, Mayr. The FXR1 network acts as a signaling scaffold for actomyosin remodeling. Cell. 2024;187(18):5048-5063.e25.

49. KKW, Zhang, Cho. Competing endogenous RNAs (ceRNAs) and drug resistance to cancer therapy. Cancer Drug Resist. 2024;7:37.

50. Oksuz, Henninger, Warneford-Thomson, et al. Transcription factors interact with RNA to regulate genes. Mol Cell. 2023;83(14):2449-2463.e13.

51. Jagodnik, Chiaruttini, Guillier. Stem-loop structures within mRNA coding sequences activate translation initiation and mediate control by small regulatory RNAs. Mol Cell. 2017;68(1):158-170.e3.

52. Rhine, Al-Azzam, Yeo GW. Aging RNA granule dynamics in neurodegeneration. Front Mol Biosci. 2022;9:991641.

53. Liu. Coding or noncoding, the converging concepts of RNAs. Front Genet. 2019;10:496.

54. Davuluri, Suzuki, Sugano, Plass, Huang TH. The functional consequences of alternative promoter use in mammalian genomes. Trends Genet. 2008;24(4):167-177.

55. Zhang, Sjöström, Cui, et al. Integrative analysis of ultra-deep RNA-seq reveals alternative promoter usage as a mechanism of activating oncogenic programmes during prostate cancer progression. Nat Cell Biol. 2024;26(7):1176-1186.

56. Wang, Liu, Zhang, et al. TFE3 and HIF1α regulates the expression of SHMT2 isoforms via alternative promoter utilization in ovarian cancer cells. Cell Death Discov. 2025;16(1):178.

57. Maltby, Schofield JPR, Houghton, et al. A 5’ UTR GGN repeat controls localisation and translation of a potassium leak channel mRNA through G-quadruplex formation. Nucleic Acids Res. 2020;48(17):9822-9839.

58. Da Sacco, Masotti. Recent insights and novel bioinformatics tools to understand the role of microRNAs binding to 5’ untranslated region. Int J Mol Sci. 2012;14(1):480-495.

59. Wang, Zhang, Shi, et al. M2 tumor-associated macrophages-derived exosomal MALAT1 promotes glycolysis and gastric cancer progression. Adv Sci. 2024;11(24):e2309298.

60. Kalluri, LeBleu VS. The biology, function, and biomedical applications of exosomes. Science. 2020;367(6478):eaau6977. doi:10.1126/science.aau6977

61. Cipta, Zeng, Wong, et al. Rewiring of SINE-MIR enhancer topology and esrrb modulation in expanded and naive pluripotency. Genome Biol. 2025;26(1):107.

62. Nishihara. Retrotransposons spread potential cis-regulatory elements during mammary gland evolution. Nucleic Acids Res. 2019;47(22):11551-11562.

63. Morales-Hernández, González-Rico, Román, et al. Alu retrotransposons promote differentiation of human carcinoma cells through the aryl hydrocarbon receptor. Nucleic Acids Res. 2016;44(10):4665-4683.

64. Fort, Khelifi, Hussein SMI. Long non-coding RNAs and transposable elements: a functional relationship. Biochim Biophys Acta Mol Cell Res. 2021;1868(1):118837.

65. Kobayashi, Miyashita, Murayama, Toyoshima YY. Dynactin has two antagonistic regulatory domains and exerts opposing effects on dynein motility. PLoS One. 2017;12(8):e0183672.

66. Babic, Ugrin, Jeremic, et al. Dysregulation of transcripts SMAD4-209 and SMAD4-213 and their respective promoters in colon cancer cell lines. J Cancer. 2024;15(15):5118-5131.

67. Crouigneau, Auxillos, et al. Mimicking and analyzing the tumor microenvironment. Cell Rep Methods. 2024;4(10):100866.

68. Nestor, Ottaviano, Reinhardt, et al. Rapid reprogramming of epigenetic and transcriptional profiles in mammalian culture systems. Genome Biol. 2015;16(1):11.
