Abstract
Objectives:
This study aimed to identify transcript isoforms of protein-coding genes with potential relevance to the malignant transformation of gut mucosa.
Methods:
Colon cancer cell lines (HCT116, DLD1, SW620) and immortalized cells derived from healthy gut epithelium (HCEC-1CT) were cultured as spheroids and subjected to RNA sequencing to profile both canonical and non-canonical transcripts. The resulting data were compared with the findings of a prior bioinformatics study that analyzed RNA-seq datasets from 473 patient-derived tumor and 417 non-tumor colon tissue samples.
Results:
Among 375 transcripts previously reported as significantly dysregulated in colon cancer (39 up-regulated and 336 down-regulated), 32 showed expression patterns in colon cell lines consistent with those observed in patient tissues (4 up-regulated and 28 down-regulated). In silico characterization revealed that all 32 exhibited at least 1 feature commonly associated with regulatory RNAs, such as encoding a truncated protein isoform, exosomal localization, or enrichment in repetitive elements. The most prominently dysregulated transcripts with consistent expression profiles across both datasets were NTMT1-204 (up-regulated in cancer) and BLOC1S6-218 and DCTN1-205 (both down-regulated in cancer). The remaining 343 transcripts did not show consistent expression patterns in the cell lines, suggesting that their dysregulation in patient-derived tissues may be due to stromal or microenvironmental factors absent in vitro.
Conclusion:
In summary, this comparative transcriptomic analysis identified 32 transcript isoforms, comprising 2 canonical and 30 non-canonical transcripts, that may play regulatory roles in colon carcinogenesis and warrant further investigation in the context of gut epithelial cell biology.
Introduction
Colorectal cancer (CRC) is the third most commonly diagnosed malignancy worldwide and remains a leading cause of cancer-related mortality. 1 Key molecular hallmarks of CRC include aberrant activation of the WNT/β-catenin pathway, mutations in KRAS and TP53, and widespread dysregulation of gene expression and alternative splicing. 2 Although tumors originate primarily from specific genetic lesions, their progression and phenotypic diversity are shaped by a complex interplay of molecular alterations, many of which are reflected in gene expression profiles. A global disruption of transcriptional regulation is now recognized as a hallmark of cancer, reinforcing the concept that malignancy is a disease of aberrant gene expression. 3 Recent pan-cancer studies have shown that changes in promoter activity, rather than overall gene expression, may more accurately capture the transcriptomic reprogramming of malignant cells. 4 Most mammalian protein-coding genes are transcribed from multiple promoters, giving rise to a spectrum of transcript isoforms. 5 Although 1 dominant transcript typically prevails at each protein-coding locus, many alternative isoforms contribute to transcriptomic diversity without necessarily expanding proteomic output. 6 Importantly, shifts in transcript isoform usage, independent of total gene expression, have been associated with distinct tumor phenotypes, disease progression, and prognosis across multiple cancer types. 7-9 While promoter regulation has traditionally been studied through the lens of DNA methylation, emerging evidence highlights additional mechanisms, including G-quadruplex motifs and signatures of accelerated somatic evolution, that influence promoter selection and transcription initiation. 10,11 The transcriptional patterns generated by different promoters and their functional relevance remain poorly explored, but the relationship between transcript isoform usage and gene expression is clearly far from straightforward. 12
The non-coding transcriptome has gained prominence as a key regulatory layer in both normal physiology and disease states, including cancer. 13 The generation of multiple transcript isoforms from a single gene adds regulatory and functional complexity to the genome, with implications for development, stress response, and pathological transformation. Isoforms produced by alternative promoter usage, splicing, and polyadenylation of protein-coding genes are of particular interest, as they may serve as biomarkers for cancer detection and prognosis or as therapeutic targets, especially once their oncogenic or tumor-suppressive properties are validated. These mechanisms not only generate multiple protein-coding isoforms but also contribute to the expression of long non-coding RNAs (lncRNAs) and dual-function coding/non-coding RNAs (cncRNAs). These RNA isoforms, characterized by rapid evolutionary divergence and cell type specificity, regulate diverse biological processes including differentiation, development, and tissue homeostasis. 5,14 Features such as translation into truncated proteins, predicted exosomal localization, and enrichment in repetitive elements have been proposed as indicators of potential regulatory roles for these transcripts. 15-18 According to Ensembl annotation, each protein-coding gene is represented by 1 canonical transcript isoform, designated as the Ensembl canonical or MANE (Matched Annotation from NCBI and EMBL-EBI) Select transcript, which corresponds to the principal RNA species most consistently expressed across tissues and best supported by experimental evidence. All other isoforms are considered non-canonical transcripts, differing from the canonical form in their primary sequence due to alternative promoter usage, splicing, or polyadenylation. Based on coding potential, Ensembl classifies transcripts as coding if they give rise to a translated protein product, and non-coding if no translation occurs.
Proteins translated from alternative transcript isoforms may be full-length or truncated relative to the canonical protein. 19
Despite advances in RNA sequencing technologies, most transcriptomic analyses have focused predominantly on gene-level expression, often overlooking isoform-specific regulation. Consequently, our understanding of how transcript-level dynamics contribute to cancer biology remains limited. To address this gap, the present study compared transcriptomic profiles from spheroid-cultured colon cancer cell lines and patient-derived tumor tissues. The primary aim was to identify transcript isoforms derived from protein-coding genes that may have a role in the malignant transformation of gut epithelium and in the initiation and progression of colon cancer.
Methods
The reporting of this study conforms to the STROBE statement (Supplemental File 1). 20
Cell Lines
Several immortalized human cell lines derived from colon tissue were used in this study: the immortalized epithelial cell line HCEC-1CT (Evercyte, CkHT-039-0229) and malignant cell lines representing different stages of colon cancer, namely HCT116 (ATCC, CCL-247; early-stage primary tumor), DLD-1 (ATCC, CCL-221; late-stage primary tumor), and SW620 (ATCC, CCL-227; model of metastatic colon cancer). All cell lines were maintained at 37°C and 5% CO₂ in Dulbecco’s Modified Eagle’s Medium (DMEM; Capricorn Scientific, Germany) supplemented with 10% fetal bovine serum (FBS; Capricorn Scientific, Germany) and 1% antibiotic/antimycotic solution (Capricorn Scientific, Germany). Cell morphology was routinely monitored by microscopy, and mycoplasma contamination was assessed by polymerase chain reaction (PCR). In addition, the cell lines were cultured as three-dimensional (3D) spheroids. For spheroid generation, adherent cells were detached using 1× trypsin/EDTA (Capricorn Scientific, Germany) and counted with a standard hemocytometer. Approximately 2 × 10⁵ cells per well were seeded in 1 mL of DMEM supplemented with 10% FBS and 1% antibiotic/antimycotic solution into 24-well Nunclon™ Sphera™ plates (Thermo Fisher Scientific, USA), which have a low-attachment surface. Spheroids were maintained for 7 days in a humidified incubator at 37°C with 5% CO₂. Compact, debris-free spheroids were collected under a microscope for subsequent total RNA extraction.
RNA Extraction and Sequencing
Total RNA was isolated from spheroids cultured in two 24-well plates using the PureLink™ RNA Mini Kit (Thermo Fisher Scientific, USA) according to the manufacturer’s protocol. RNA concentration and purity were assessed by absorbance at 260 and 280 nm using a BioSpec-nano spectrophotometer (Shimadzu, Japan).
High-throughput next-generation RNA sequencing was performed by Novogene (UK) Company Limited (Cambridge, United Kingdom). Prior to library preparation, total RNA underwent quality control (QC) using 1% agarose gel electrophoresis (for RNA integrity), NanoDrop spectrophotometry (for RNA quantity and purity), and an Agilent 2100 Bioanalyzer (for RNA Integrity Number, RIN).
Ribosomal RNA was depleted during library preparation to enrich for both coding and non-coding transcripts. Sequencing was performed on the Illumina NovaSeq 6000 platform, generating 150 bp paired-end reads.
Bioinformatics and In Silico Analyses
RNA-seq data were processed using Novogene’s validated analysis pipeline, which included read quality filtering, alignment to the human reference genome, and transcript quantification. Expression levels were reported as FPKM (Fragments Per Kilobase of transcript per Million mapped reads), and transcripts with FPKM values below 0.3 were considered not expressed. 21 Downstream analysis was performed with an in-house Python script, and input files were prepared by filtering in Excel with built-in functions. To compare our RNA-seq results from colon spheroids with patient-derived colon tissue samples, we analyzed published data from the study by Demircioğlu et al. 4
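For readers less familiar with the unit, FPKM normalizes the number of fragments assigned to a transcript by transcript length (in kilobases) and by library size (in millions of mapped fragments). The sketch below illustrates that standard formula together with the 0.3 FPKM expression cutoff. It is a minimal example under stated assumptions: it uses the raw transcript length, whereas production pipelines such as Novogene’s may use effective lengths, and the function names are hypothetical rather than taken from the study’s in-house script.

```python
# Minimal FPKM sketch; function names are illustrative, not from the study's script.
EXPRESSION_CUTOFF = 0.3  # FPKM below this value is treated as "not expressed"


def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Uses the raw transcript length; real pipelines often use an effective length.
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    length_kb = length_bp / 1_000
    library_millions = total_mapped_fragments / 1_000_000
    return fragments / (length_kb * library_millions)


def is_expressed(value: float, cutoff: float = EXPRESSION_CUTOFF) -> bool:
    """Apply the 0.3 FPKM detection cutoff described in the text."""
    return value >= cutoff


if __name__ == "__main__":
    # Hypothetical counts: 500 fragments on a 2,000 bp transcript, 25 million mapped fragments.
    value = fpkm(500, 2_000, 25_000_000)
    print(f"FPKM = {value:.2f}, expressed: {is_expressed(value)}")
```

Because the cutoff is applied after normalization, the same fragment count can fall above or below 0.3 FPKM depending on transcript length, which matters for short non-canonical isoforms.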
Transcript expression levels in colon samples were retrieved from the UCSC Xena Functional Genomics Explorer using the TCGA TARGET GTEx dataset (Toil recompute; dataset ID: TcgaTargetGtex_rsem_transcript_fpkm), while colon cancer stage data were retrieved from the TCGA Pan-Cancer (PANCAN) dataset (TcgaPancancer_rsem_transcript_fpkm). 22 Transcript expression levels were quantified in FPKM units, and statistical comparisons between TCGA primary tumor (288 samples), TCGA adjacent non-tumor tissue (41 samples), and GTEx normal tissue (308 samples) were performed using the Kruskal-Wallis test followed by Dunn’s multiple comparisons test. The same tests were used to compare stage I (44 samples), stage II (109 samples), stage III (80 samples), and stage IV (40 samples) colon cancer. Differences were considered statistically significant at P < .05.
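The group comparison described above can be sketched in plain Python. This is a minimal illustration, not the implementation used for the analyses (the statistical software is not stated here). It assumes average ranks for ties, the standard tie corrections for both the H statistic and Dunn’s z, and a Bonferroni-style adjustment of Dunn’s two-sided P values by the number of pairwise comparisons. The chi-square tail uses closed forms valid for integer degrees of freedom, which covers the 3-group and 4-group comparisons. All function names and the demo values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed forms)."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def kruskal_dunn(groups):
    """Kruskal-Wallis H with tie correction, then Dunn's pairwise z tests.

    groups: mapping of group label -> list of values (e.g. FPKM).
    Returns (H, P, {(a, b): (z, Bonferroni-adjusted P)}).
    """
    labels = list(groups)
    if len(labels) < 2 or any(len(groups[label]) == 0 for label in labels):
        raise ValueError("need at least 2 non-empty groups")
    pooled = [v for label in labels for v in groups[label]]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    mean_rank, sum_sq, offset = {}, 0.0, 0
    for label in labels:
        n = len(groups[label])
        rank_sum = sum(ranks[offset:offset + n])
        offset += n
        mean_rank[label] = rank_sum / n
        sum_sq += rank_sum ** 2 / n

    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical")
    h = (12 / (n_total * (n_total + 1)) * sum_sq - 3 * (n_total + 1)) / correction
    p_value = chi2_sf(h, len(labels) - 1)

    pairs = list(combinations(labels, 2))
    variance = n_total * (n_total + 1) / 12 - ties / (12 * (n_total - 1))
    dunn = {}
    for a, b in pairs:
        se = math.sqrt(variance * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_two_sided = math.erfc(abs(z) / math.sqrt(2))
        dunn[(a, b)] = (z, min(1.0, p_two_sided * len(pairs)))
    return h, p_value, dunn


if __name__ == "__main__":
    # Hypothetical FPKM values, not study data
    demo = {
        "tumor": [5.1, 6.3, 4.8, 7.0],
        "adjacent": [1.2, 0.9, 1.5],
        "gtex": [0.1, 0.0, 0.2, 0.05],
    }
    h, p, pairs = kruskal_dunn(demo)
    print(f"H = {h:.2f}, P = {p:.4f}")
    for (a, b), (z, p_adj) in pairs.items():
        print(f"{a} vs {b}: z = {z:.2f}, adjusted P = {p_adj:.4f}")
```

Working on ranks rather than raw FPKM makes the comparison robust to the heavy right skew typical of transcript-level expression data.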
To further characterize dysregulated candidate transcripts and predict their potential roles in malignant transformation, several in silico tools were used. Predicted 3D structures of the truncated and full-length protein isoforms translated from candidate transcripts were analyzed using UCSF ChimeraX (version 1.10.1). 23 For each canonical/non-canonical isoform pair, Protein Data Bank (PDB) files were retrieved from AlphaFold, the structures were aligned, and root mean square deviation (RMSD) values were computed. 24 Sashimi plots of the Binary Alignment Map (BAM) files were generated using the Integrative Genomics Viewer (IGV). Repeat elements were identified with the AnnoLnc2 and Censor tools, and subcellular localization was predicted with the lncLocator and iLoc-LncRNA tools. 25-28 Internal Ribosome Entry Sites (IRES) within the 5′ untranslated regions (5′UTRs) were predicted using IRESite. 29 RNA secondary structure stability was estimated with RNAfold by calculating the Minimum Free Energy (MFE). 30
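RMSD is the square root of the mean squared distance between paired Cα atoms. The sketch below assumes the two structures have already been superposed (for example, by ChimeraX’s matchmaker) and that a residue mapping from the sequence alignment is available; the dictionary-based coordinate format, the mapping, and the function names are hypothetical. It makes explicit that only residues present in both isoforms contribute, so domains missing from a truncated isoform never enter the value.

```python
import math

Coord = tuple[float, float, float]


def paired_ca_coordinates(full_ca, truncated_ca, residue_map):
    """Collect Cα pairs for residues aligned between two isoforms.

    full_ca / truncated_ca: residue number -> (x, y, z) in Å, already superposed.
    residue_map: truncated residue number -> full-length residue number.
    Full-length residues with no aligned partner (missing domains) are skipped.
    """
    full_pts: list[Coord] = []
    trunc_pts: list[Coord] = []
    for trunc_res, full_res in residue_map.items():
        if trunc_res in truncated_ca and full_res in full_ca:
            full_pts.append(full_ca[full_res])
            trunc_pts.append(truncated_ca[trunc_res])
    return full_pts, trunc_pts


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates: a 4-residue "full-length" chain and a 2-residue fragment
    full = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0), 4: (11.4, 0.0, 0.0)}
    truncated = {1: (7.6, 0.0, 0.5), 2: (11.4, 0.0, 0.5)}
    a, b = paired_ca_coordinates(full, truncated, {1: 3, 2: 4})
    print(f"{len(a)} aligned pairs, RMSD = {rmsd(a, b):.2f} Å")
```

In the demo, half of the full-length chain has no partner, yet the RMSD is small, which is exactly why a low value for a truncated isoform says nothing about the domains it lacks.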
To evaluate the suitability of dysregulated transcripts as candidate biomarkers or therapeutic targets, expression thresholds were applied to identify cancer-specific expression profiles. Transcripts were classified as strong biomarker candidates if their expression was undetectable in the non-malignant colon epithelial cell line HCEC-1CT (FPKM < 0.3) and markedly elevated in colon cancer cell lines (FPKM > 5). Conversely, transcripts considered potential therapeutic candidates were required to show robust expression in HCEC-1CT (FPKM > 5) and low or undetectable expression in cancer cell lines (FPKM < 0.3).
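The two threshold rules can be expressed as a small classification function. This sketch assumes that every cancer cell line must satisfy the cancer-side threshold (the text says "in colon cancer cell lines" without specifying whether all or some must qualify); the function name, labels, and transcript identifiers are hypothetical, and the values are illustrative, not study data.

```python
NOT_EXPRESSED = 0.3  # FPKM below this is "undetectable"
HIGHLY_EXPRESSED = 5.0  # FPKM above this is "markedly elevated" / "robust"


def classify_candidate(hcec_fpkm: float, cancer_fpkms: list[float]) -> str:
    """Apply the biomarker / therapeutic thresholds; assumes every cancer line must qualify."""
    if not cancer_fpkms:
        raise ValueError("at least one cancer cell line value is required")
    if hcec_fpkm < NOT_EXPRESSED and all(v > HIGHLY_EXPRESSED for v in cancer_fpkms):
        return "biomarker candidate"
    if hcec_fpkm > HIGHLY_EXPRESSED and all(v < NOT_EXPRESSED for v in cancer_fpkms):
        return "therapeutic candidate"
    return "not prioritized"


if __name__ == "__main__":
    # Hypothetical values for HCEC-1CT and three cancer lines (not the study's data)
    examples = {
        "TX_UP": (0.0, [6.2, 8.9, 5.5]),
        "TX_DOWN": (20.0, [0.1, 0.0, 0.2]),
        "TX_MIXED": (2.0, [6.0, 0.1, 9.0]),
    }
    for name, (hcec, cancer) in examples.items():
        print(name, classify_candidate(hcec, cancer))
```

Requiring all lines to pass is the stricter reading; relaxing it to "any" would admit more candidates but weaken the claim of a consistent cancer-specific profile.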
Results
Transcriptomic data from spheroid-cultured colon cell lines were deposited in the GEO database (accession number GSE291181). A total of 53 434 transcripts were detected in HCEC-1CT, the immortalized cell line derived from gut mucosa (Figure 1A). More transcripts were detected in the cell lines modeling early- and late-stage primary tumors: 55 095 in HCT116 (primary tumor, Dukes’ A) and 61 626 in DLD1 (primary tumor, Dukes’ C). A total of 53 876 transcripts were detected in SW620, which served as a model of metastatic colon cancer. Although the number of detected transcripts in SW620 was similar to that in HCEC-1CT, approximately one-third of the detected transcripts differed between the metastatic cell line and the non-malignant gut mucosa cell line. To visualize transcriptome-wide expression patterns, we compared FPKM distributions across the 4 analyzed cell lines (Figure 1B). The tumor cell lines DLD1, HCT116, and SW620 had a slightly higher fraction of transcripts expressed at moderate-to-high levels (log10(FPKM + 1) > 1) than HCEC-1CT (HCEC-1CT: 2.49%, HCT116: 2.99%, DLD1: 3.05%, SW620: 3.03%).
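The moderate-to-high expression fraction can be computed directly from a list of FPKM values; note that log10(FPKM + 1) > 1 is equivalent to FPKM > 9. The text does not state the denominator, so this sketch assumes it is all transcripts supplied for a given cell line (for example, all detected transcripts). The function name and input values are hypothetical.

```python
import math


def fraction_above(fpkms: list[float], log_cutoff: float = 1.0) -> float:
    """Share of transcripts with log10(FPKM + 1) above log_cutoff.

    With the default cutoff of 1, this counts transcripts with FPKM > 9.
    """
    if not fpkms:
        raise ValueError("no FPKM values supplied")
    above = sum(1 for v in fpkms if math.log10(v + 1) > log_cutoff)
    return above / len(fpkms)


if __name__ == "__main__":
    # Hypothetical FPKM values for one cell line
    values = [0.0, 0.4, 2.5, 12.0, 150.0]
    print(f"{100 * fraction_above(values):.2f}% of transcripts above the cutoff")
```

The +1 offset keeps zero-FPKM transcripts defined on the log scale, which is also why the same transform is convenient for the scatter plot.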

(A) Distribution of expressed transcripts across the analyzed colon cancer cell lines and the non-malignant cell line; (B) scatter plot of log10(FPKM + 1) values for all detected transcripts in the analyzed cell lines. DLD1, primary tumor, Dukes’ C; HCEC-1CT, immortalized cell line from the gut mucosa; HCT116, primary tumor, Dukes’ A; SW620, cell line that served as a model of metastatic colon cancer.
We compared transcriptomic profiles from colon cell lines cultured as spheroids with RNA sequencing data from patient-derived colon tissue samples. This analysis focused on 375 transcripts previously identified as significantly dysregulated in colon cancer. 4 Only transcripts showing an opposing expression pattern between the non-malignant colon epithelial cell line (HCEC-1CT) and the cancer cell lines (DLD1, SW620, HCT116) were considered dysregulated in colon cancer. This comparative analysis identified 32 transcripts, annotated in the GRCh38 genome assembly, whose expression patterns in spheroid-grown cell lines were consistent with those observed in patient-derived tissues (Table 1). Of these, 2 transcripts were annotated by Ensembl as canonical isoforms, while the remaining 30 were classified as non-canonical and differed in primary structure from the canonical forms.
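The text does not spell out how an "opposing expression pattern" was operationalized. One plausible reading, assumed here, is that a transcript is consistent with patient data when HCEC-1CT lies below every cancer line for transcripts reported as up-regulated in tumors, and above every cancer line for those reported as down-regulated. The sketch implements that assumption with hypothetical transcript identifiers and values.

```python
def consistent_with_patients(direction: str, hcec: float, cancer: list[float]) -> bool:
    """True if cell-line FPKMs follow the patient-tissue direction ('up' or 'down')."""
    if direction == "up":
        return all(v > hcec for v in cancer)
    if direction == "down":
        return all(v < hcec for v in cancer)
    raise ValueError(f"unknown direction: {direction!r}")


def filter_consistent(patient_calls, cell_line_fpkm):
    """Keep transcripts whose cell-line pattern matches the patient call.

    patient_calls: transcript -> 'up' / 'down' (tumor vs non-tumor tissue).
    cell_line_fpkm: transcript -> (hcec_fpkm, [cancer_fpkm, ...]).
    Transcripts absent from the cell-line data are skipped.
    """
    kept = []
    for tx, direction in patient_calls.items():
        if tx not in cell_line_fpkm:
            continue
        hcec, cancer = cell_line_fpkm[tx]
        if consistent_with_patients(direction, hcec, cancer):
            kept.append(tx)
    return kept


if __name__ == "__main__":
    # Hypothetical inputs, not study data
    calls = {"TX_A": "up", "TX_B": "down", "TX_C": "up"}
    fpkm = {
        "TX_A": (0.0, [6.0, 7.5, 5.2]),
        "TX_B": (20.0, [0.1, 0.0, 0.2]),
        "TX_C": (3.0, [4.0, 1.0, 5.0]),
    }
    print(filter_consistent(calls, fpkm))
```

A stricter variant could additionally require a minimum fold difference so that small FPKM fluctuations do not count as concordant.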
Dysregulated Transcripts in Colon Cancer with Corresponding Expression Profiles in Colon Cell Lines and Patient-Derived Tissues. Top candidates are shown in bold.
Further in silico analyses of the 32 transcripts revealed that all exhibit at least 1 feature associated with regulatory RNA molecules, such as encoding truncated proteins, localization to exosomes, or the presence of repetitive elements (Tables 1 and 2). Only 2 of these transcripts (CENPS-201 and AKNA-206) are annotated as canonical isoforms by Ensembl and encode full-length proteins. CENPS-201 is predicted by lncLocator to localize to exosomes, while AKNA-206 harbors repetitive elements predicted by Censor, both traits commonly linked to regulatory functions of RNA molecules. The remaining 30 transcripts are classified as non-canonical isoforms, distinguished by unique features including alternative 5′UTRs or 3′UTRs and alternatively spliced exons. These structural variations produce distinct primary transcript architectures that may influence transcript stability, localization, and functional output. Among the non-canonical isoforms, 5 are predicted to encode full-length proteins, albeit with 5′UTRs that differ from those of their canonical counterparts, which could affect their translational regulation. The other 25 non-canonical isoforms either encode truncated proteins or are not translated.
Results of In Silico Analysis of Repeat Elements and Subcellular Localization for Transcripts Dysregulated in Colon Cancer Cell Lines and Patient-Derived Tissues. Top candidates are shown in bold.
Predicted 3D structures of full-length and truncated protein isoforms, together with their structural superpositions, are presented in Figure 2. To assess structural similarity between canonical and truncated isoforms, we performed sequence-guided structural alignments and calculated RMSD values for each pair (Figure 2D). Several truncated isoforms showed strong overlap with the corresponding regions of their full-length proteins, including SMTN-201 versus SMTN-206 (RMSD = 1.17 Å), DNPEP-201 versus DNPEP-206 (0.49 Å), and UBE2D3-210 versus UBE2D3-211 (0.24 Å). Similarly, BLOC1S6-201 versus BLOC1S6-218 (2.17 Å) and DNPEP-201 versus DNPEP-207 (0.37 Å) displayed low RMSD values, consistent with preservation of local folding. Even in these cases, however, the similarity applies only to the overlapping regions: truncated isoforms often lack entire domains present in the canonical protein, yielding proteins that are structurally incomplete despite a nearly identical fold in the preserved segment. In contrast, other isoforms exhibited markedly higher RMSD values, such as NCUB2-207 versus NCUB2-217 (31.46 Å), PLD1-201 versus PLD1-203 (23.67 Å), and SLC43A1-201 versus SLC43A1-208 (11.48 Å), reflecting substantial conformational divergence from the canonical isoforms.

Structural comparison of full-length and truncated protein isoforms: (A) full-length protein isoforms, (B) truncated protein isoforms, (C) aligned superpositions, and (D) RMSD values (Å) between Cα atoms of full-length and truncated isoform pairs.
To identify cancer-specific expression profiles and evaluate the potential of dysregulated transcripts as biomarkers or therapeutic targets, we applied expression thresholds: transcripts were designated strong biomarker candidates if they were undetectable in the non-malignant HCEC-1CT cell line (FPKM < 0.3) and highly expressed in colon cancer cell lines (FPKM > 5), and potential therapeutic candidates if they showed robust expression in HCEC-1CT (FPKM > 5) and low or absent expression in cancer cell lines (FPKM < 0.3). Based on these criteria, NTMT1-204 (up-regulated) and BLOC1S6-218 and DCTN1-205 (down-regulated) emerged as the most promising candidates. Expression patterns of the 32 candidate transcripts were further assessed using publicly available patient-derived RNA-seq data from the UCSC Xena Browser, which integrates TCGA colon tumor tissues, TCGA matched adjacent non-tumor tissues, and GTEx normal colon samples (Supplemental File 2).
NTMT1-204 was expressed exclusively in the malignant cell lines (FPKM = 0 in HCEC-1CT; FPKM > 5 in DLD1, SW620, and HCT116). This observation was corroborated by publicly available patient data: in the UCSC Xena Browser, NTMT1-204 expression was approximately sixfold higher in colon tumors than in matched adjacent tissue in the TCGA dataset (P = 5.203 × 10⁻¹⁹) and more than 250-fold higher than in healthy colon tissue from the GTEx dataset (P = 4.734 × 10⁻¹⁰²; Figure 3A). In addition, comparison of NTMT1-204 expression across the 4 tumor stages showed that expression remained elevated throughout disease progression, with a significant difference between stage II and stage IV (P = .04; Figure 3B). This transcript encodes a full-length protein isoform (UniProt ID: Q9BV86-1) and features a unique 5′UTR composed of a non-coding exon 1 and a partially coding exon 2 (Figure 4A), suggesting the use of tumor-specific splicing events (Figure 5A). RNA secondary structure prediction with RNAfold showed that the 5′UTR of NTMT1-204 has a minimum free energy (MFE) 68 kcal/mol lower than that of the major transcript NTMT1-203, indicating a more thermodynamically stable structure. A subsequent BLASTN search against experimentally validated internal ribosome entry sites yielded only short (11-12 nucleotide) matches with high E-values, suggesting that the 5′UTR of NTMT1-204 contains no significant IRES elements.

(A) Expression levels of the selected dysregulated transcripts NTMT1-204, BLOC1S6-218, and DCTN1-205 in patient data retrieved from the UCSC Xena browser (n(TCGA tumor) = 288, n(TCGA adjacent non-tumor tissue) = 41, n(GTEx) = 308). Data are presented as mean ± standard deviation (SD). ****P < .0001; n, number of samples. (B) Expression levels of NTMT1-204 across colorectal cancer stages I-IV in patient data retrieved from the UCSC Xena browser (n(stage I) = 44, n(stage II) = 109, n(stage III) = 80, n(stage IV) = 40). **P < .01; n, number of samples. Data are derived from TCGA bulk RNA-seq, and the observed expression patterns may be influenced by tumor purity and the presence of non-epithelial cell populations.

Transcript structures of selected differentially expressed transcripts. (A) NTMT1-204, (B) BLOC1S6-218, and (C) DCTN1-205. Transparent boxes represent non-coding exons/part of the exon, colored boxes represent coding exons and lines connecting boxes represent introns. Red arrows point to characteristic transcript isoform sequences. Modified from Ensembl Genome Browser (release 114).

Sashimi plots showing alternative splicing patterns in the genomic regions of the selected dysregulated transcripts: (A) NTMT1 locus, (B) BLOC1S6 locus, and (C) DCTN1 locus. Exon expression is shown by read coverage histograms, and splice junction support is indicated by arcs annotated with junction read counts. Alternative exon junctions are marked with a red asterisk (*).
BLOC1S6-218, in contrast, was strongly expressed in the non-malignant HCEC-1CT cell line (FPKM = 20) but present only in trace amounts in the cancer cell lines (FPKM < 0.3). This expression pattern was not confirmed in patient data retrieved from the UCSC Xena browser (Figure 3A). The transcript contains a unique 5′UTR due to the inclusion of a non-coding exon 3 (Figures 4B and 5B) and encodes a truncated 75 aa isoform, substantially shorter than the canonical 172 aa BLOC1S6 protein. The altered 5′UTR, low coding potential, predicted nuclear localization, and presence of SINE/Alu elements suggest potential regulatory roles for this isoform.
DCTN1-205 was robustly expressed in HCEC-1CT (FPKM = 22) but was not detected in any of the malignant cell lines (FPKM = 0). This expression pattern was not confirmed in patient data retrieved from the UCSC Xena browser, although expression tended to be higher in non-malignant tissue (Figure 3A). Its structure diverges from canonical DCTN1 transcripts through exon skipping and the use of alternative splice sites, producing a distinct exon combination (Figures 4C and 5C). In addition, unlike the canonical isoform, it lacks a 3′UTR, so its transcript end is a unique feature of this isoform. This isoform encodes a 1253 amino acid protein, slightly shorter than the canonical 1278 aa isoform. According to UniProt, the DCTN1-205-derived protein lacks amino acid segments 132 to 151 and 1066 to 1070 relative to the full-length isoform, changes that could affect its structural integrity or its interaction with components of the dynactin-dynein complex.
Building on previous patient-based studies that inferred promoter activity from transcript-level expression, we investigated whether similar transcript isoform shifts, potentially reflecting alternative promoter usage, occur between non-malignant and malignant colon cell lines. In our study, we analyzed transcriptomic data from a non-malignant colon mucosa cell line (HCEC-1CT) and 3 colon cancer cell lines (HCT116, DLD1, and SW620) to identify analogous patterns of transcript-level regulation. Specifically, we filtered for transcripts that showed at least a threefold change in expression between cancer cell lines and the non-malignant cell line. We focused on genes exhibiting at least 1 up-regulated and 1 down-regulated transcript isoform, consistent with a potential switch in promoter usage. This approach identified 2768 up-regulated and 2003 down-regulated transcripts arising from a total of 1338 genes, each exhibiting multiple transcript isoforms with differential expression between malignant and non-malignant conditions. However, among the genes previously reported to exhibit promoter switching in tumor and non-tumor tissues from colon cancer patients (including PHF19, PRKAR1B, CD81, and MCF2), such isoform switching patterns were not recapitulated in the transcriptomic profiles of the analyzed colon cancer cell lines. 4
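The isoform-switch filter described above can be sketched as follows. The text specifies a threefold change and the requirement for at least 1 up-regulated and 1 down-regulated isoform per gene, but not how the fold change was computed; this sketch assumes the mean FPKM across the 3 cancer lines is compared with HCEC-1CT after adding a hypothetical pseudocount of 1 to avoid division by zero. Gene and transcript identifiers are placeholders, not study data.

```python
from statistics import mean

PSEUDOCOUNT = 1.0  # hypothetical; keeps zero-FPKM transcripts from dividing by zero
MIN_FOLD = 3.0  # at least a threefold change, as stated in the text


def fold_change(hcec_fpkm: float, cancer_fpkms: list[float]) -> float:
    """Mean cancer-line FPKM over HCEC-1CT FPKM, both offset by a pseudocount."""
    return (mean(cancer_fpkms) + PSEUDOCOUNT) / (hcec_fpkm + PSEUDOCOUNT)


def isoform_switch_genes(expression):
    """Return genes with >=1 up-regulated and >=1 down-regulated isoform.

    expression: gene -> {transcript_id: (hcec_fpkm, [cancer_fpkm, ...])}
    """
    switches = {}
    for gene, transcripts in expression.items():
        changes = {tx: fold_change(h, c) for tx, (h, c) in transcripts.items()}
        up = sorted(tx for tx, fc in changes.items() if fc >= MIN_FOLD)
        down = sorted(tx for tx, fc in changes.items() if fc <= 1 / MIN_FOLD)
        if up and down:
            switches[gene] = (up, down)
    return switches


if __name__ == "__main__":
    # Placeholder genes and values, not study data
    demo = {
        "GENE_A": {"GENE_A-T1": (0.0, [10.0, 12.0, 8.0]), "GENE_A-T2": (20.0, [1.0, 2.0, 0.0])},
        "GENE_B": {"GENE_B-T1": (5.0, [5.0, 5.0, 5.0]), "GENE_B-T2": (0.0, [9.0, 9.0, 9.0])},
    }
    print(isoform_switch_genes(demo))
```

GENE_B is not reported because it has an up-regulated isoform but no down-regulated one; the choice of pseudocount strongly affects fold changes for lowly expressed isoforms, so it would need to be fixed before comparing gene counts across analyses.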
Discussion
To comprehensively understand the molecular complexity underlying colon cancer, it is essential to move beyond gene-level analyses and interrogate transcript-level regulatory events, including alternative promoter usage and non-canonical splicing. Building on the growing evidence that such mechanisms generate functionally distinct RNA isoforms with roles in tumor biology, this study set out to determine whether the transcriptomic features previously identified in patient-derived colon tumors, particularly those involving coding–noncoding duality and isoform-specific regulation, could be mirrored in spheroid-cultured colon cancer cell lines. Through direct comparison with non-malignant gut epithelial cells, we aimed to identify transcript isoforms derived from protein-coding genes that may exert regulatory functions during the malignant transformation of gut epithelium, offering insight into transcriptome remodeling events that drive colon cancer onset and progression.
The number of expressed transcripts was higher in the primary tumor cell lines than in the non-malignant cell line, consistent with previous observations that cancer cells often exhibit a more complex and dysregulated transcriptome. 31,32 In particular, the increase from HCEC-1CT (53 434 transcripts) to early-stage HCT116 (55 095 transcripts) and late-stage DLD1 (61 626 transcripts) suggests that transcriptomic complexity may correlate with tumor stage. A recent study analyzed single-cell RNA sequencing data from esophageal cancer and glioblastoma tissues and, in comparison with normal tissues, found a substantial number of novel transcripts, including splicing variants and non-coding RNAs, that were present in cancer cells but not in normal tissues. 33 Interestingly, the metastatic SW620 cell line, despite a transcript count similar to that of the non-malignant HCEC-1CT (53 876 transcripts), displayed a markedly different transcriptome composition. Approximately one-third of the transcripts expressed in SW620 were distinct from those in HCEC-1CT, indicating a shift in transcriptional programs that may reflect the rewiring of gene expression during metastasis. For example, liver-metastatic CRC cells have been shown to acquire a reshaped epigenetic landscape that results in reprogrammed, tissue-specific transcription. 34 These findings support the hypothesis that transcriptome diversification is a hallmark of tumor progression, driven mostly by mechanisms such as alternative transcription initiation and aberrant splicing, which generate cancer-specific transcript isoforms. 35,36 The detection of substantial transcriptomic shifts without a change in total transcript count (as in SW620) emphasizes the importance of isoform-level analysis. 37 This supports the growing recognition of non-canonical transcript isoforms as key players in tumor biology, including colon cancer, with potential implications for biomarker discovery and therapeutic targeting. 38,39
The comparative transcriptomic analysis of spheroid-cultured colon cancer cell lines and patient-derived tumor tissues highlights a reproducible set of dysregulated transcript isoforms with potential roles in colon carcinogenesis. A previous study reported 375 transcripts as significantly dysregulated in colon cancer. 4 Of these, 32 showed consistent expression patterns across patient tissues and cell line models, reinforcing their possible biological relevance and robustness across experimental systems. Most of the analyzed transcripts (30 of 32) were classified as non-canonical isoforms, highlighting the transcriptomic complexity underlying cancer-specific gene regulation. These isoforms are characterized by unique 5′ and/or 3′UTRs and alternative splicing events, which can profoundly alter RNA fate by affecting stability, localization, and translational efficiency. 40-42 Among them, 5 non-canonical transcripts retain full-length coding potential but differ from their canonical counterparts in their 5′UTRs, potentially modulating translation initiation and interactions with RNA-binding proteins. 43 In contrast, the remaining 25 non-canonical isoforms either encode truncated proteins or lack coding potential altogether, suggesting that differential promoter usage and splicing may generate functional diversity or regulatory decoys during tumor development. 44,45 The comparative structural analysis highlights 2 distinct classes of protein isoforms encoded by dysregulated transcript isoforms. Protein isoforms with low RMSD values (2.17 Å or less), such as those of DNPEP, SMTN, UBE2D3, and BLOC1S6, demonstrate that truncation can preserve the secondary and tertiary structure of the aligned region. However, these proteins remain incomplete: the absence of domains present in the canonical proteins indicates that, despite local structural similarity, truncated isoforms may lose essential binding motifs or regulatory elements.
This raises the possibility that they could function in dominant-negative or regulatory roles by partially retaining fold integrity while lacking full-length functionality. By contrast, isoforms with high RMSD (>10 Å) such as NCUB2-217, PLD1-203, and SLC43A1-208 show extensive divergence from canonical folding, consistent with more profound alterations of protein architecture and potentially novel functions. Together, these findings emphasize that RMSD must be interpreted in the context of truncation: small values reflect similarity in preserved segments, but cannot capture the biological consequences of missing domains. Truncated proteins may exert dominant-negative effects or participate in signaling networks with altered specificity. A recent study showed that alternative splicing of ASPP2 leads to the generation of a truncated isoform (ASPP2κ), which lacks the C-terminal p53-binding domain and acts in a dominant-negative manner, promoting cellular migration and therapy resistance in soft tissue sarcoma. 46
Collectively, these observations suggest that isoforms classified solely on the basis of coding potential may have diverse functional roles, including those traditionally ascribed to non-coding RNAs, underscoring the blurred boundaries between protein-coding and regulatory transcript functions. Many transcripts previously annotated as non-coding contain short open reading frames (sORFs), which have been shown to encode functional micropeptides, a class of small proteins typically fewer than 100 amino acids in length. These micropeptides have been implicated in various cellular processes, including signal transduction, cytoskeletal organization, and modulation of the tumor microenvironment, and are increasingly recognized for their roles in cancer development and progression. 47 Conversely, transcripts annotated as protein-coding may serve non-coding regulatory functions, such as scaffolding RNA-binding proteins or sequestering microRNAs (acting as competing endogenous RNAs or “sponges”); such functions may be enabled by structural alterations and could contribute to malignant transformation. 48,49 Recent studies have also shown that mRNAs can act as decoys for transcription factors, modulate translation independently of their coding sequence, or influence RNA granule dynamics. 50-52 These observations have contributed to the emergence of a conceptual framework of coding-noncoding duality, in which a single RNA molecule may serve both protein-coding and regulatory roles. 53 This paradigm challenges the classical binary distinction between coding and non-coding RNAs, emphasizing the need for functional validation beyond computational annotation.
The convergence of these transcriptomic alterations, including truncation, retained introns, altered UTRs, and divergence from coding annotations, aligns with the growing recognition that cancer involves not only gene-level dysregulation but also isoform-specific reprogramming. Transcript variants derived from alternative promoter usage or non-canonical splicing can modulate gene output and functional phenotype without necessarily altering total gene expression levels, a phenomenon increasingly linked to tumor progression and therapeutic resistance. 54
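The claim that isoform usage can shift while total gene expression stays constant is easiest to see with numbers. The sketch below uses invented TPM values for a hypothetical two-isoform gene (`GENE-X`, not a gene or data from this study) and converts per-isoform expression into each isoform's fraction of the gene total, a common way of quantifying isoform usage.

```python
def isoform_fractions(tpm_by_isoform):
    """Convert per-isoform TPM values into fractions of total gene expression."""
    total = sum(tpm_by_isoform.values())
    if total == 0:
        return {iso: 0.0 for iso in tpm_by_isoform}
    return {iso: tpm / total for iso, tpm in tpm_by_isoform.items()}


# Invented TPM values for a hypothetical gene with two isoforms
normal = {"GENE-X-201": 40.0, "GENE-X-202": 10.0}
tumor = {"GENE-X-201": 10.0, "GENE-X-202": 40.0}

for label, sample in (("normal", normal), ("tumor", tumor)):
    fractions = isoform_fractions(sample)
    print(label, "gene total =", sum(sample.values()),
          {iso: round(f, 2) for iso, f in fractions.items()})
# normal gene total = 50.0 {'GENE-X-201': 0.8, 'GENE-X-202': 0.2}
# tumor gene total = 50.0 {'GENE-X-201': 0.2, 'GENE-X-202': 0.8}

# Change in usage (tumor minus normal) for each isoform
delta = {iso: isoform_fractions(tumor)[iso] - isoform_fractions(normal)[iso]
         for iso in normal}
```

A gene-level test would see no change here (50 TPM in both conditions), whereas the isoform fractions reveal a complete switch in the dominant transcript.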
Based on our transcriptomic data, NTMT1-204 emerged as the most promising biomarker candidate for colon cancer detection, a finding further supported by patient-derived data available through the UCSC Xena platform. The consistent upregulation of NTMT1-204 from early to advanced CRC stages, with the highest expression at stage IV, suggests that this isoform may be involved in tumor development and warrants further evaluation as a potential biomarker of disease progression. Structurally, this isoform features a unique 5′UTR, pointing to tumor-specific splicing and/or alternative promoter usage.44,55,56 RNA secondary structure analysis revealed that the 5′UTR of NTMT1-204 is considerably more thermodynamically stable than that of the major NTMT1-203 isoform. Increased structural stability of the 5′UTR may influence translational efficiency, RNA half-life, subcellular localization, or interactions with RNA-binding proteins and microRNAs.42,57,58 Further evidence for a potential regulatory role of NTMT1-204 includes its predicted localization to exosomes and the presence of repetitive SINE/MIR elements within its sequence. Exosomal localization is a hallmark of many regulatory RNAs, which are selectively packaged and secreted via extracellular vesicles to mediate intercellular communication and modulate gene expression in recipient cells. For example, exosomes derived from M2 tumor-associated macrophages have been shown to promote gastric cancer progression by transferring MALAT1, which enhances aerobic glycolysis, metastasis, and chemoresistance. 59 Owing to their remarkable stability in bodily fluids and selective enrichment in tumor-derived vesicles, exosomal RNAs hold strong potential as non-invasive biomarkers for early cancer detection, prognosis, and treatment monitoring. 60 Repetitive elements such as SINE sequences play a pivotal role in transcriptional regulation by serving as cis-regulatory elements that influence promoter activity, alternative splicing, and epigenetic modifications, thereby modulating gene expression programs in both physiological and pathological contexts. SINE/MIR elements have been implicated in regulating transcript expression by functioning as enhancers and providing transcription factor binding sites, and their presence in active regulatory regions across tissues suggests a role in maintaining normal gene expression. When embedded in tumor-suppressive transcripts, MIRs may help sustain their expression, and their loss or silencing in tumors could disrupt these regulatory inputs, contributing to tumorigenesis.61,62
Conversely, BLOC1S6-218 and DCTN1-205 exhibited the opposite expression pattern, with high expression in the non-malignant HCEC-1CT cell line and minimal or undetectable levels in cancer cell lines, suggesting their potential as therapeutic targets. BLOC1S6-218 contains an additional spliced exon in its 5′UTR, which could enable tumor-suppressive regulatory functions such as impairing translation initiation through complex RNA secondary structure, sequestering oncogenic RNA-binding proteins or microRNAs, or directing subcellular localization in which it may act as a decoy RNA or as a modulator of tumor-associated signaling pathways. 42 A further indication of a regulatory role for BLOC1S6-218 is the presence of repeat elements, including SINE/Alu elements. A recent study demonstrated that Alu retrotransposons can be transcribed into non-coding RNAs that repress expression of the pluripotency genes NANOG and OCT4, suggesting a regulatory mechanism with potential relevance to both normal and tumor cell growth. 63 Alu-containing transcripts might therefore give rise to non-coding RNAs with distinct tumor-suppressive functions that regulate key signaling pathways involved in cancer development. In addition, the nuclear localization of BLOC1S6-218 suggests a predominantly regulatory rather than translational function. These 2 features of the transcript may be connected: the SIRLOIN motif, a sequence element derived from SINE/Alu repeats, has been identified as a signal for the nuclear retention of lncRNAs. 64 DCTN1-205 harbors both a unique 5′UTR and a distinct exon composition resulting from exon skipping, which leads to the loss of specific amino acid segments. These deletions may alter the isoform’s ability to associate with the dynactin-dynein complex, potentially impairing its function in intracellular transport and cytoskeletal organization. A previous study demonstrated that the DCTN1B protein isoform, which lacks certain amino acid sequences, shows diminished microtubule-binding capacity and impaired dynein-driven motility, indicating that alternative splicing events that remove key structural domains can profoundly affect the structural and functional integrity of the dynactin-dynein complex. 65
The apparent discrepancy between cell line and patient-derived expression data for BLOC1S6-218 and DCTN1-205 is likely attributable to several biological and technical factors. First, cell lines and patient tissues represent inherently different biological contexts. Cell lines, even when cultured as spheroids, lack the stromal, immune, and microenvironmental components that are integral to the in vivo tumor niche. Such factors can strongly influence transcript expression patterns, particularly for non-canonical isoforms with putative regulatory functions.
The lack of confirmation in patient data from UCSC Xena may therefore reflect the contribution of additional cellular compartments or regulatory interactions present in vivo but absent in monocultures of epithelial cells. Second, data type and sequencing depth may play a role. Cell line spheroid RNA-seq was generated at high resolution and may capture rare or cell type–restricted isoforms, whereas patient bulk RNA-seq data represent heterogeneous mixtures of malignant epithelial, stromal, and immune cells. This heterogeneity can obscure the detection of low-abundance isoforms such as BLOC1S6-218 or DCTN1-205, leading to apparent discrepancies. Nevertheless, DCTN1-205 showed a trend toward higher expression in non-malignant tissue in the Xena data, although this trend did not reach strong statistical support. For BLOC1S6-218, strong expression in non-malignant HCEC-1CT cells but not in tumor cells may indicate that its loss is a feature of malignant transformation, whereas residual expression in heterogeneous patient samples could be masked by dilution from non-epithelial cell populations. It is important to acknowledge that spheroid models, while more physiologically relevant than monolayer cultures, still represent a simplified system composed exclusively of epithelial cells. They lack the stromal, microenvironmental, and immune infiltrating cells that are present in patient-derived tissues and are known to strongly influence gene expression programs. This limitation may explain why 343 of the transcripts identified as dysregulated in patient samples did not display concordant expression patterns in colon epithelial spheroids: the absence of these additional cellular compartments could mask or eliminate context-dependent regulation, leading to discrepancies between in vitro and in vivo datasets.
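The dilution argument can be illustrated with a simple linear mixing model, in which bulk expression is the weighted sum of cell-type-specific expression, E_bulk = Σ f_k × E_k, where f_k is the fraction contributed by cell type k. The sketch below is a hypothetical illustration, not an analysis performed in this study: the isoform, its expression level, and the tissue compositions are all invented, and real deconvolution methods must additionally account for cell-type differences in total RNA content.

```python
def bulk_expression(fractions, expression):
    """Linear mixing model: bulk = sum over cell types of fraction * expression.

    fractions: dict cell type -> fractional contribution (must sum to 1).
    expression: dict cell type -> expression of the isoform in that cell type
    (cell types missing from this dict are treated as not expressing it).
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("cell-type fractions must sum to 1")
    return sum(fractions[ct] * expression.get(ct, 0.0) for ct in fractions)


# Hypothetical isoform expressed only in epithelial cells (arbitrary units)
isoform_expr = {"epithelial": 20.0, "stromal": 0.0, "immune": 0.0}

# Invented compositions: an epithelium-rich sample vs. one with heavy
# stromal and immune infiltration
epithelium_rich = {"epithelial": 0.8, "stromal": 0.1, "immune": 0.1}
infiltrated = {"epithelial": 0.3, "stromal": 0.4, "immune": 0.3}

print(bulk_expression(epithelium_rich, isoform_expr))  # 16.0
print(bulk_expression(infiltrated, isoform_expr))      # 6.0
```

With identical per-cell expression, the bulk signal drops more than 2.5-fold purely because of tissue composition, which is the kind of effect that can blur epithelial isoform differences in bulk patient RNA-seq.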
Transcript isoform switching is increasingly recognized as a hallmark of cancer-associated transcriptomic reprogramming, often reflecting alternative promoter usage or splicing events that can modulate gene output without altering total gene expression. Our previous analysis revealed differential promoter activity and isoform expression, highlighting the role of promoter and transcript switching in malignancy. For example, the non-canonical isoform SMAD4-209 encodes a full-length protein that may support SMAD4 function and cellular homeostasis, whereas SMAD4-213 encodes a truncated protein that could interfere with TGF-β signaling. SMAD4-213 may additionally act at the regulatory level through interactions with miRNAs and RNA-binding proteins, pointing to a possible dual (coding/non-coding) role. 66
Building upon prior studies that inferred promoter activity in colon cancer from transcript expression patterns in patient-derived tissues, we investigated whether similar regulatory dynamics could be recapitulated in cell line models of malignant transformation. Our approach, focusing on genes with transcript isoforms displaying opposing expression trends between malignant and non-malignant colon epithelial cell lines, revealed extensive isoform-level dysregulation. However, isoform switching events previously identified in colon cancer patient datasets for genes such as PHF19, PRKAR1B, CD81, and MCF2 were not mirrored in our colon cancer spheroid models. One likely explanation is that promoter switching may be strongly influenced by the tumor microenvironment, including stromal interactions, extracellular matrix cues, and infiltrating immune cells, all of which are absent in simplified epithelial spheroid cultures. In vivo, these additional cellular compartments can contribute to chromatin remodeling, transcription factor availability, or cytokine-driven signaling, each of which may affect promoter choice and isoform usage. 67 Additionally, epigenetic regulation, such as DNA methylation or histone modification at alternative promoters, may differ between cultured cells and primary tumors, potentially explaining the observed differences. 68 Finally, tumor heterogeneity in patient tissues may drive context-specific promoter usage that cannot be captured by a limited number of cell line models. Collectively, our findings highlight both the promise and the limitations of using cancer cell lines to model transcriptomic phenomena observed in patient tissues. While robust isoform-level dysregulation was evident in cell lines, the lack of congruence with promoter-switching events reported in vivo underscores the need for integrative studies combining cell lines, organoids, and patient-derived materials to comprehensively decode transcriptional regulation in colon cancer.
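The gene selection criterion described above, isoforms of the same gene moving in opposite directions between malignant and non-malignant cells, can be expressed as a simple filter. The sketch below is a hypothetical reconstruction rather than the authors' pipeline: it assumes a table of per-isoform log2 fold changes (cancer vs. non-malignant), uses an arbitrary threshold of |log2FC| ≥ 1, and flags a gene when at least one isoform passes the threshold upward and another downward. A real analysis would also require statistical significance (for example, adjusted p-values), which is omitted here; the gene and isoform names are invented.

```python
from collections import defaultdict


def opposing_isoform_genes(log2fc_by_isoform, threshold=1.0):
    """Flag genes with at least one up- and one down-regulated isoform.

    log2fc_by_isoform: dict mapping (gene, isoform) -> log2 fold change
    (cancer vs. non-malignant). threshold is a hypothetical cut-off.
    Returns {gene: (up_isoforms, down_isoforms)} for qualifying genes.
    """
    up = defaultdict(list)
    down = defaultdict(list)
    for (gene, isoform), lfc in log2fc_by_isoform.items():
        if lfc >= threshold:
            up[gene].append(isoform)
        elif lfc <= -threshold:
            down[gene].append(isoform)
    # A gene qualifies only if both directions are represented among its isoforms
    return {gene: (sorted(up[gene]), sorted(down[gene]))
            for gene in sorted(set(up) & set(down))}


# Invented fold changes for two hypothetical genes
example = {
    ("GENE-A", "GENE-A-201"): 2.3,
    ("GENE-A", "GENE-A-204"): -1.8,
    ("GENE-B", "GENE-B-201"): 1.5,
    ("GENE-B", "GENE-B-202"): 0.4,
}
print(opposing_isoform_genes(example))
# {'GENE-A': (['GENE-A-201'], ['GENE-A-204'])}
```

GENE-B is not flagged because its second isoform changes only modestly, so it represents ordinary gene-level up-regulation rather than a switch between isoforms.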
While this study provides an integrative bioinformatics framework for identifying transcript isoforms associated with colon carcinogenesis, experimental validation remains an important next step. In particular, loss- and gain-of-function studies using approaches such as CRISPR knock-out, CRISPR activation, or siRNA-mediated silencing will be essential to elucidate the causal contribution of key isoforms such as NTMT1-204, BLOC1S6-218, and DCTN1-205 to cellular phenotypes including proliferation, apoptosis, migration, and spheroid integrity. Furthermore, functional assays in 3-dimensional spheroid models and, ultimately, in vivo systems will help clarify whether these dysregulated isoforms act as oncogenic drivers, tumor suppressors, or modulators of the tumor microenvironment. Such studies will also be critical to determine their potential as biomarkers or therapeutic targets. Our current findings therefore provide a prioritized list of candidate isoforms and a strong rationale for future mechanistic investigations.
Conclusion
In summary, our transcriptomic analysis revealed substantial isoform-level dysregulation between malignant and non-malignant colon cell lines, echoing patterns observed in colon tumors and underscoring the role of alternative transcription initiation and splicing in cancer progression. Notably, transcript variants NTMT1-204, BLOC1S6-218, and DCTN1-205 exhibit distinct regulatory features and expression profiles, highlighting their potential as tumor biomarkers or therapeutic targets. While certain isoform switching events identified in patient-derived datasets were not fully recapitulated in cell line models, this divergence reinforces the importance of integrating in vitro and in vivo systems to capture the full spectrum of transcriptional alterations in cancer. Our findings emphasize the necessity of isoform-level analyses for advancing precision oncology and suggest that non-canonical transcript isoforms with potential regulatory function, including those with dual coding and non-coding potential, represent a rich yet underexplored layer of gene expression regulation in colon cancer. Further experimental validation of the identified dysregulated transcripts with proposed regulatory roles, particularly their interaction partners, subcellular localization, and functional impact on tumorigenic pathways, will be essential to elucidate their mechanistic roles in colon cancer and to assess their potential utility as biomarkers or therapeutic targets.
Supplemental Material
Supplemental material for Comparative RNA-Seq Analysis of Colon Spheroids and Patient-derived Tissues Identifies Non-Canonical Transcript Isoforms of Protein-Coding Genes Implicated in Colon Carcinogenesis by Tamara Babic, Bojana Banovic Djeri, Dunja Pavlovic, Sandra Dragicevic, Jovana Despotovic, Jelena Karanovic and Aleksandra Nikolic in Cancer Informatics:
sj-doc-1-cix-10.1177_11769351251396250
sj-docx-2-cix-10.1177_11769351251396250
sj-xlsx-3-cix-10.1177_11769351251396250
Footnotes
Acknowledgements
The authors used ChatGPT as an AI tool for language refinement. We confirm that no scientific data were generated or modified using AI.
Ethical Considerations
Not applicable.
Consent to Participate
Not applicable.
Author Contributions
Tamara Babic: conceptualization, investigation, visualization, writing – original draft, writing – review & editing. Bojana Banovic Djeri: data curation, software, writing – review & editing. Dunja Pavlovic: investigation, visualization, writing – review & editing. Sandra Dragicevic: investigation, writing – review & editing. Jovana Despotovic: investigation, writing – review & editing. Jelena Karanovic: investigation, writing – review & editing. Aleksandra Nikolic: conceptualization, funding acquisition, project administration, supervision, writing – original draft, writing – review & editing.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Science Fund of the Republic of Serbia, PROMIS, #6052315, SENSOGENE and IMGGE Annual Research Program for 2025, Ministry of Science, Technological Development and Innovation of the Republic of Serbia, 451-03-136/2025-03/200042.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
