Abstract
Three decades since the Human Genome Project began, scientists have now identified more than 25,000 protein coding genes in the human genome. The vast majority of the protein coding genes (> 90%) are multi-exonic, with the coding DNA interrupted by intronic sequences that are removed from the pre-mRNA transcripts before translation into proteins, a process called splicing. Variations in this process, for example through exon skipping, intron retention, or alternative usage of 5’ splice sites (5’ss), 3’ splice sites (3’ss) or polyadenylation sites, lead to remarkable transcriptome and proteome diversity in human tissues. Given its critical biological importance, alternative splicing is tightly regulated in a tissue- and developmental stage-specific manner. The central nervous system and skeletal muscle are amongst the tissues with the highest number of differentially expressed alternative exons, revealing a remarkable degree of transcriptome complexity. It is therefore not surprising that splicing mis-regulation is causally associated with a myriad of neuromuscular diseases, including but not limited to amyotrophic lateral sclerosis (ALS), spinal muscular atrophy (SMA), Duchenne muscular dystrophy (DMD), and myotonic dystrophy type 1 and 2 (DM1, DM2). A gene’s transcript diversity has since become an integral and important consideration for drug design, development and therapy. In this review, we will discuss transcript diversity in the context of neuromuscular diseases and current approaches to address splicing mis-regulation.
MECHANISMS OF TRANSCRIPT DIVERSITY REGULATION
Splicing is mediated by a dynamic small nuclear ribonucleoprotein machinery called the spliceosome [1, 2], which undergoes step-wise assembly, starting from recognition of the cis-elements 5’ss and 3’ss on the pre-mRNA. Splice site choice is also modulated by additional splicing regulatory cis-elements throughout the pre-mRNA, known as exonic or intronic splicing enhancers and exonic or intronic splicing silencers depending on whether they enhance or silence splicing activity, which dictate the recruitment of different trans-acting RNA binding splicing factors. These cis-elements and trans-acting splicing factors act in a coordinated manner to orchestrate alternative splicing, accounting for the tissue- and stage-specific regulation of transcript expression. One well-studied example is the recruitment of muscleblind-like proteins (MBNL) to the “UGCU” motif to regulate alternative splicing events in muscle [3, 4]. Weyn-Vanhentenryck et al. reported temporal regulation of splice switching during neural development and maturation in different neuronal subtypes that involves coordinated regulation by RNA binding proteins including neuro-oncological ventral antigen protein (NOVA), RNA binding Fox-1 (RBFOX), MBNL and polypyrimidine tract binding protein (PTBP) [5]. The neural-specific SR-related protein of 100 kDa (nSR100/Srrm4) is also important in regulating alternative splicing events in genes involved in neural functions, as exemplified by abnormalities in the branching of motor neurons innervating the diaphragm, defects in axonal midline crossing in the corpus callosum and altered cortical layers of the forebrain observed in nSR100 knockout mice [6]. Since the majority of splicing events occur co-transcriptionally, epigenetic factors that regulate the rate of transcription, transcription initiation and elongation can also modulate splicing.
These include histone modifications and DNA methylation within the gene body or promoter regions that affect splicing factor recruitment and alter the rate of transcription elongation, thereby indirectly influencing splice site choice and recognition (reviewed in [7, 8]). In addition, several RNA modifications have been reported to have a role in alternative RNA processing and alternative splicing, including the highly prevalent modifications pseudouridine (Ψ) and internal N6-methyladenosine (m6A) [9], through recruitment of RNA binding proteins specific to the modifications. m6A RNA methylation occurs at the N-6 position of the adenosine residue in the RRACH (R = A/G, H = A/C/U) consensus motif. Depletion of m6A writers [10, 11], erasers [12, 13] and readers [14, 15] has been shown to change alternative splicing patterns in mammalian systems. Recently, Martinez et al. reported that pre-mRNA is pseudouridylated co-transcriptionally, with specific enrichment of Ψ near alternative splice sites, splicing regulatory elements and splicing factor binding motifs, and that installation of a single Ψ is sufficient to alter the splicing outcome in vitro, suggesting a regulatory potential of Ψ in alternative splicing [16]. Interestingly, similar to m6A, Ψ is also significantly enriched in the 3’ UTRs of pre-mRNAs, suggesting a likely role for both modifications in alternative polyadenylation as well [16–18]. Lastly, adenosine-to-inosine (A-to-I) RNA editing mediated by Adenosine Deaminases Acting on RNA (ADAR) enzymes also plays a role in RNA processing and splicing. This is supported by the finding that more than 95% of A-to-I RNA editing occurs co-transcriptionally in nascent RNAs prior to polyadenylation and splicing events.
RNA editing can modulate splicing events in three ways: by creating or eliminating splice sites and branch points; by altering RNA secondary structure, which may affect the accessibility of splice sites; or through the ADAR proteins themselves, which can promote or preclude binding of the splicing machinery or splicing regulators to the RNA [19–22].
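As an illustration of how the RRACH consensus described above translates into a sequence search, the following minimal Python sketch scans an RNA string for candidate m6A motifs. The function name and the toy sequence are hypothetical and chosen only for illustration; a motif match marks a candidate site and says nothing about whether the adenosine is actually methylated.

```python
import re

# RRACH consensus as stated in the text: R = A/G, H = A/C/U.
# The zero-width lookahead allows overlapping motif matches to be reported.
RRACH = re.compile(r"(?=([AG][AG]AC[ACU]))")


def find_rrach_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based index of the central adenosine, matched 5-mer) pairs."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    # The candidate m6A adenosine is the third base of the 5-mer.
    return [(m.start() + 2, m.group(1)) for m in RRACH.finditer(rna)]


# Hypothetical toy sequence, not taken from any real transcript.
toy_rna = "CCGGACUUUAGACAGGGAACCUU"
sites = find_rrach_sites(toy_rna)
for position, motif in sites:
    print(position, motif)
```

Transcriptome-wide m6A mapping relies on experimental data; a motif scan of this kind would at most be used to narrow down candidate positions.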
ALTERNATIVE SPLICING IN NEUROMUSCULAR DISEASES

Fig. 1. The DMD gene encodes several dystrophin proteins (Dp) or isoforms, which are named based on their length in kilodaltons: Dp427 (c, cortical; m, muscle; p, Purkinje cells), Dp260, Dp140, Dp116 and Dp71. Alternative promoters are depicted by the arrows.
Studies indicate that 95% of human genes are alternatively spliced, representing a fundamental mechanism of spatiotemporal gene regulation [23, 24]. With the progressive maturation of methods for detecting RNA isoforms, their contribution to human homeostasis is only starting to emerge [25]. Alternative splicing appears to be particularly prevalent in muscle and brain [26–29], where it plays a key role in numerous functions, including driving the processes of development and aging [30–32]. In skeletal muscle, tropomyosin isoforms display different localization patterns along actomyosin bundles and are functionally non-redundant [33]. The transcription factor myocyte-specific enhancer factor 2D (Mef2D), a member of the Mef2 family and a key signal-dependent regulator of developmental processes such as differentiation [34], undergoes a major isoform switch during myogenesis. Alternate use of mutually exclusive exons generates a muscle-specific isoform, Mef2Dα2, which is resistant to protein kinase A (PKA) phosphorylation, allowing transcriptional activation of pro-myogenic target genes in the presence of inhibitory PKA signalling [35, 36]. Alterations of tissue-specific transcript isoforms can account for the phenotypic variability observed in many neuromuscular conditions. The dystrophin (DMD) gene contains at least seven tissue-specific promoters and two alternative polyadenylation sites, producing several tissue- and developmental stage-specific transcripts, such as the Dp427 variants Dp427c and Dp427p, which are predominantly expressed in neurons [37], and the shorter Dp140 isoform, which is predominantly expressed across the brain during foetal life [38]. DMD mutations resulting in absent or non-functional dystrophin protein cause a severe phenotype mainly characterised by progressive muscle wasting and weakness, which is frequently also associated with cognitive delay and autism spectrum disorders (ASD) [39, 40].
The risk of cognitive impairment has been linked to DMD mutations downstream of intron 44, which affect not only the full-length isoform but also the shorter neuronal isoforms, with a pattern of worse cognitive performance on all neuropsychological tests in these patients [41–43] (Fig. 1). Consistently, restoration of Dp140 via mRNA-mediated overexpression improves ASD-like behaviour in a dystrophic mouse model with a mutation in exon 52 [44], further confirming the importance of these neuronal isoforms in the pathophysiology of DMD. Several variants associated with Congenital Myasthenic Syndrome (CMS) fall in the exonic/intronic splicing regions of genes essential for neuromuscular synaptogenesis, such as cholinergic receptor nicotinic epsilon subunit (CHRNE), docking protein 7 (DOK7), and receptor associated protein of the synapse (RAPSN), compromising the binding affinity for trans-acting proteins [45–48] and highlighting the important role of splicing in neuromuscular junction formation, maintenance and function. For instance, the IVS3-8G>A change in CHRNA1, a gene encoding the muscle nicotinic acetylcholine receptor α subunit, disrupts an intronic splicing silencer and results in exclusive inclusion of the downstream P3A exon, generating an acetylcholine receptor (AChR) subunit that fails to be incorporated at the motor end plate [49]. Similarly, alterations in trans-acting factors, such as serine arginine-rich (SR) proteins and heterogeneous nuclear ribonucleoproteins (hnRNPs), in the chromatin landscape and in RNA structure can also result in neuromuscular diseases by global perturbation of alternative splicing patterns [50–61].
In DM1, expanded microsatellite repeats in the DM1 protein kinase (DMPK) gene lead to the formation of RNA secondary structures that sequester RNA binding proteins, including the splicing regulator MBNL, altering the alternative splicing patterns of key genes in skeletal muscle and other affected tissues [62]. ALS-causing mutations in several RNA binding proteins, including Transactive-response DNA-binding Protein, 43 kDa (TDP-43) and Fused in Sarcoma (FUS), have been causally linked with aberrant alternative splicing in diseased motor neurons [63–70]. Loss of nuclear TDP-43 function in frontotemporal dementia (FTD)/ALS has been reported to induce inclusion of cryptic exons, leading to nonsense-mediated decay and loss of the Unc-13 Homolog A (UNC13A) and Stathmin-2 (STMN2) proteins, which are critical for synapse function [71, 72]. Intron retention, a dominant feature of the splicing programme during early motor neuron differentiation, occurs prematurely in iPS-derived motor neurons from ALS patients, with Splicing Factor Proline and Glutamine rich (SFPQ) being the most significant intron-retaining transcript across several ALS-causing mutations and representing a hallmark of both familial and sporadic ALS [73]. Overall, these examples highlight the critical importance of alternative isoform regulation in both healthy and diseased motor units.
SPLICING MODULATION AS A THERAPEUTIC TARGET FOR NEUROMUSCULAR DISEASES
Strategies to manipulate RNA splicing have gained traction in recent years, resulting in several approved drugs and many more showing promising results at preclinical stages in the field of neuromuscular diseases and beyond [74–78]. Therapeutic modulation of RNA splicing has been proposed for two types of application: 1) to correct mutation-induced abnormal splicing patterns, and 2) to selectively modulate isoform expression. One such strategy entails the use of splice switching oligonucleotides (SSOs) to promote exon inclusion or exon skipping by blocking RNA-RNA base pairing or protein-RNA binding interactions, in order to modulate the ratio of splicing variants or correct splicing defects. To date, five SSOs have been approved for clinical use in neuromuscular diseases: eteplirsen [79–81], golodirsen [82], viltolarsen [83, 84] and casimersen [85] for DMD, and nusinersen [86, 87] for SMA. In the case of DMD, SSOs are designed to skip one or multiple mutation-containing exons to restore the reading frame of dystrophin transcripts, giving rise to a truncated but functional dystrophin protein [88], while for SMA, SSOs promote inclusion of exon 7 in survival of motor neuron 2 (SMN2) transcripts, thereby rescuing the expression of full length SMN protein [87, 89–91], which is lacking in this disease. These oligonucleotides are chemically modified not only to increase their stability and affinity for the target RNA and to protect them from nuclease activity, but also to prevent activation of RNase H degradation mechanisms and to allow access to the target pre-mRNAs located in the nuclei of cells [92, 93]. In addition to SSOs, other oligonucleotide therapies are also being developed to restore the activity of RNA binding splicing factors.
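The reading-frame logic behind exon skipping in DMD can be sketched with simple arithmetic: a deletion leaves the transcript in frame only if the total length removed is a multiple of three, so skipping a neighbouring exon can help when it brings the removed length back to such a multiple. The Python sketch below illustrates this check. The exon numbering and lengths are hypothetical toy values, not real DMD exon sizes, and the function name is an assumption made for this example.

```python
def frame_restoring_skips(exon_lengths: dict[int, int], deleted: list[int]) -> list[int]:
    """Return the flanking exons whose additional skipping restores the reading frame.

    exon_lengths maps exon number -> length in nucleotides.
    deleted lists the contiguous exons removed by the genomic mutation.
    """
    removed = sum(exon_lengths[e] for e in deleted)
    if removed % 3 == 0:
        return []  # the deletion is already in frame; no skip is needed
    flanking = [min(deleted) - 1, max(deleted) + 1]
    return [
        exon
        for exon in flanking
        if exon in exon_lengths and (removed + exon_lengths[exon]) % 3 == 0
    ]


# Hypothetical toy gene with six exons (lengths in nucleotides).
toy_exons = {1: 120, 2: 109, 3: 233, 4: 118, 5: 212, 6: 150}

print(frame_restoring_skips(toy_exons, deleted=[4]))  # either neighbour works here
print(frame_restoring_skips(toy_exons, deleted=[2]))  # only the downstream neighbour works
```

Restoring the frame is a necessary condition only; whether the resulting truncated protein is functional depends on which domains are lost, which this arithmetic does not capture.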
Two antisense oligonucleotide (ASO)-based strategies have been proposed to treat DM1: a steric-block strategy that uses an ASO to block the binding of MBNL1 to the hairpins, and an RNase H-active ASO that targets the CUG-expanded transcripts for degradation. Both approaches release the sequestered MBNL1 and restore its splicing regulatory activity, and both have shown promise when tested in DM1 preclinical models [94–100]. ASOs have also been used to redirect the usage of alternative translation start sites and alternative polyadenylation sites, thereby altering the expression of transcript isoforms [101, 102]. Splicing modulation can also be achieved by small molecules, whose discovery has been greatly accelerated by the development of high-throughput screening techniques and in silico splice site prediction tools [103–106]. Risdiplam (PTC-Roche) was the first small molecule splicing modifier approved by the U.S. Food and Drug Administration, in 2020, for SMA [107, 108]. While the exact mechanism of action of risdiplam is not yet completely understood, studies suggest that this class of compounds binds the exonic splicing enhancer 2 and the 5’ss of SMN2 exon 7 pre-mRNA, ultimately leading to exon 7 inclusion in the SMN2 transcript and increased expression of functional full length SMN protein [106, 109]. PK4C9, another RNA splicing modulator, is currently under preclinical investigation; it enhances SMN2 exon 7 inclusion by binding to and remodelling the terminal stem-loop 2 RNA structure at the 5’ss of SMN2 exon 7, improving accessibility of the 5’ss to splicing factors [110]. RNA splice modulating small molecules have also been used to reduce the production of toxic proteins by inducing inclusion of a pseudoexon containing a premature stop codon [111, 112]. For DMD, strategies deploying Clustered Regularly-Interspaced Short Palindromic Repeats (CRISPR)/Cas to restore dystrophin expression have been tested in preclinical models with varying levels of success.
Olson and colleagues systemically delivered two adeno-associated virus serotype 9 vectors, one encoding Streptococcus pyogenes Cas9 (SpCas9) driven by a muscle-specific creatine kinase promoter and another expressing an sgRNA targeting a region adjacent to the exon 51 splice acceptor site, to 1-month-old puppies. This strategy generated a single cut at the target site and created various INDELs, in particular a single adenosine (A) insertion immediately 3’ to the Cas9 cut site, which restored the dystrophin open reading frame and dystrophin protein expression to up to 70% of wild type levels in skeletal muscles and 92% in heart [113]. In a separate study, Kupatt et al. delivered two nanoparticle-coated adeno-associated virus serotype 9 vectors, each carrying half of Cas9 fused to a split-intein moiety that self-assembles upon expression, together with a pair of single guide RNAs targeting sequences flanking exon 51, to 10–14-day-old piglets. A high dose treatment (4×10^14 vp/kg) restored dystrophin protein expression (54% and 34% in quadriceps and diaphragm, respectively) at ∼70 days post treatment, and improved muscle and cardiac function [114]. CRISPR/Cas strategies relying on the repair of a double-stranded DNA break, however, may cause unwanted large deletions and, in some cases, DNA rearrangements. To circumvent this, two new classes of genome editing technology, termed ‘base editor’ and ‘prime editor’ [115], were developed. Base editors, through the fusion of a Cas9 nickase with a nucleobase deaminase (cytidine or adenine deaminase), catalyse the conversion of one base to another (adenine base editors, ABEs: A-to-G; cytosine base editors, CBEs: C-to-T), thereby directly and precisely creating point mutations in the DNA without making double-stranded breaks.
There has been substantial interest in using CBEs and ABEs to modulate splicing, either to promote skipping of exons bearing pathogenic mutations [116, 117] or to induce functional alternative splicing patterns [118]. ABEs, in particular, show great promise, since nearly half of human disease-causing point mutations are G-to-A or C-to-T. Using mouse models of DMD carrying a nonsense point mutation in the Dmd gene, Ryu et al. and Xu et al. successfully corrected the mutations using ABEs and observed widespread dystrophin rescue and functional improvement in dystrophic mice [119, 120]. Base editors have also been applied to target splice sites for gene knockout. Interestingly, compared to base editor-mediated induction of premature STOP codons, targeting splice sites, in particular splice donors, produces more robust gene disruption, reflecting the critical role of splice sites in controlling splicing and gene expression [121]. While base editors are limited to transitions (A:T to G:C or C:G to T:A), prime editing, which relies on a catalytically impaired Cas9 endonuclease fused to an engineered reverse transcriptase and a guide RNA specifying both the target site and the desired edit, can theoretically be used for any type of splice correction. Using prime editors to restore the open reading frame, Chemello et al. rescued dystrophin expression and corrected contractile abnormalities in human DMD cardiomyocytes [122]. While these new generation editors exhibit huge potential for splicing correction following a single treatment, the large size of base editing and prime editing constructs precludes single-vector adeno-associated virus packaging. Delivery strategies, longevity of the rescue and potential consequences of persistent in vivo expression of the genome editors are questions still to be addressed.
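To make the splice-site targeting logic concrete, the sketch below enumerates which dinucleotides a base editor could produce from a splice site. It assumes the canonical GT donor and AG acceptor dinucleotides (general background knowledge, not stated in the text above) and uses the A-to-G and C-to-T conversions described for ABEs and CBEs. Editing of the antisense strand is read out as the complementary change on the sense strand. The function name is hypothetical, and practical constraints such as PAM availability and the editing window are deliberately ignored.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Conversion made by each editor class on the strand it deaminates (from the text).
EDITOR_CHANGE = {"ABE": ("A", "G"), "CBE": ("C", "T")}


def edited_outcomes(dinucleotide: str, editor: str) -> list[str]:
    """Sense-strand dinucleotides reachable by a single base edit on either strand."""
    source, product = EDITOR_CHANGE[editor]
    outcomes = set()
    for i, base in enumerate(dinucleotide):
        if base == source:
            # The editor acts directly on the sense strand.
            outcomes.add(dinucleotide[:i] + product + dinucleotide[i + 1:])
        if COMPLEMENT[base] == source:
            # The editor acts on the antisense strand; report the sense-strand result.
            outcomes.add(dinucleotide[:i] + COMPLEMENT[product] + dinucleotide[i + 1:])
    return sorted(outcomes)


# Canonical splice donor (GT) and acceptor (AG), written as sense-strand DNA.
for site_name, site in (("donor", "GT"), ("acceptor", "AG")):
    for editor in ("ABE", "CBE"):
        print(site_name, site, editor, edited_outcomes(site, editor))
```

Under these simplifying assumptions, every combination of editor and splice site yields exactly one altered dinucleotide, which is consistent with both editor classes being usable for splice-site disruption; whether a given site is actually editable depends on the sequence context.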
SELECTIVE TRANSCRIPT ISOFORM MODULATION
Isoforms often exhibit complementary, unique or even opposing functions to the canonical variant, representing a largely unexplored area of therapeutic opportunity. A role for isoforms has been widely demonstrated in tumorigenesis [123–129], neurological disorders [130, 131] and viral infections [132]. In several types of cancer, toxic isoforms arise as a result of aberrant proteolytic processes when selective pressure is exerted by therapy, forming a pool of escape variants. Targeted degradation of the toxic isoforms via RNA or protein targeting strategies may therefore improve treatment sensitivity and disease prognosis [129, 133, 134]. Overexpression of therapeutically beneficial isoforms has been proposed for the treatment of many neuromuscular conditions. In ALS, gene therapy overexpressing the trophic factor Neuregulin 1 isoform 1 (NRG1-I) in skeletal muscle and/or Neuregulin 1 isoform 3 (NRG1-III) in the central nervous system is effective in preserving motor neuron function [135, 136]. Recently, we showed that specific overexpression of a naturally occurring dominant negative isoform of the androgen receptor (AR isoform 2) ameliorates the disease phenotype in a mouse model of spinal and bulbar muscular atrophy by modulating the activity of the disease-causing mutant androgen receptor protein [137]. Another gene therapy, which uses a recombinant adeno-associated virus serotype 1 to deliver the alternatively spliced follistatin isoform FS344 in order to avoid potential binding to off-target sites, is currently under clinical investigation for Becker muscular dystrophy, a milder form of dystrophin-deficient muscular dystrophy [138]. This strategy is also under investigation for other indications, including sporadic inclusion body myositis [139] and facioscapulohumeral muscular dystrophy (FSHD), and as a combinatorial therapy for DMD [140].
COMPUTATIONAL TOOLS FOR ISOFORM QUANTIFICATION
Traditional methods for transcriptome-wide identification of alternative splicing events rely on sequencing technologies producing reads ranging from 50 to 150 bp in length, which, combined with various computational tools, have already allowed the identification of thousands of transcript isoforms in human tissues [141, 142]. The methods for isoform analysis can be broadly divided based on whether they use a reference genome or a de novo assembly approach [143], and whether alternative splicing is estimated from isoform quantification or directly from exon inclusion ratios [144], with exon-based approaches being generally more sensitive for known transcripts. Many of these tools, such as Cufflinks [145], StringTie [146], systems-level interactive data exploration (SLIDE) [147], and IsoLasso [148], are able to perform transcript discovery based on existing annotations; nevertheless, the results are often inaccurate and contradictory, mainly because the detection is biased by the process itself [149, 150]. The use of de novo transcript assembly packages such as Trinity [151], Trans-ABySS [152], and Oases [153] may help mitigate the issue, but such methods have failed to provide a complete assessment of isoform diversity in human tissues. Recently developed long-read sequencing technologies, such as single molecule real time sequencing from Pacific Biosciences and nanopore sequencing from Oxford Nanopore Technologies, remove the challenging task of reconstructing transcript isoforms from fragmented short reads and therefore hold great potential for improving our understanding of the plethora of alternatively spliced isoforms in human and non-human tissues [154, 155].
Various computational programs have been developed to produce high-confidence isoforms from long-read sequencing data, such as full-length alternative isoform analysis of RNA (FLAIR) [156], full-length analysis of mutations and splicing (FLAMES) [157, 158], Structural and Quality Annotation of Novel Transcript Isoforms (SQANTI) [159], and Technology-Agnostic Long-Read Analysis (TALON) [160]. These methods have been employed to identify novel alternatively spliced isoforms [155, 161–164] and to characterise changes in isoform profiles in disease states [156, 165]. Expanding on these advancements, a number of studies have successfully coupled long-read sequencing with targeted RNA capture, accelerating annotation of lowly expressed genes including long non-coding RNAs [161, 162] and allowing deep profiling of tissue-specific isoforms [155, 166, 167]. Recently, a long-read RNA sequencing approach enabled identification of differential exon usage and phasing of structural genes producing large transcripts in cardiac muscle and in fast and slow skeletal muscles, which could have a direct effect on the interpretation of clinical sequencing data [168]. As the throughput and accuracy of current long-read sequencing platforms improve at a fast pace and more sophisticated data analysis pipelines are developed, this technology is rapidly maturing for deployment in single-cell and spatially resolved transcriptomic applications, allowing deep characterization of isoform diversity in human tissues with a further level of complexity and accuracy.
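The exon inclusion ratio mentioned above is commonly reported as "percent spliced in" (PSI): the fraction of informative reads that support inclusion of an exon. The Python sketch below shows this calculation in its simplest form, with a difference in PSI between two conditions. The read counts are hypothetical, and the sketch deliberately omits the normalisation for junction number or effective length that dedicated tools may apply.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """PSI = inclusion / (inclusion + exclusion); None when no reads are informative."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def delta_psi(control: tuple[int, int], disease: tuple[int, int]) -> Optional[float]:
    """Change in PSI (disease minus control) for one exon; None if either PSI is undefined."""
    psi_control = percent_spliced_in(*control)
    psi_disease = percent_spliced_in(*disease)
    if psi_control is None or psi_disease is None:
        return None
    return psi_disease - psi_control


# Hypothetical (inclusion, exclusion) read counts for a single cassette exon.
control_counts = (30, 10)
disease_counts = (10, 30)

print(percent_spliced_in(*control_counts))
print(delta_psi(control_counts, disease_counts))
```

A negative value here indicates reduced exon inclusion in the second condition; in practice, such differences would be accompanied by a statistical test across replicates.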
CONCLUSIONS
It has been more than four decades since the concept of alternative splicing was first proposed [169]. Alternative splicing is a highly regulated and sophisticated process of gene regulation that ensures the proteome plasticity and diversity necessary for biological functions. However, it is prone to errors, which lead to a wide range of human diseases. The motor unit has proven to be heavily reliant on correct splicing regulation and therefore appears to be exceptionally sensitive to its perturbation. With the advent of new high throughput sequencing technologies and bioinformatics tools, our understanding of alternative splicing is rapidly advancing, and opportunities for further characterization at single-cell and spatial resolution are starting to emerge. This body of knowledge is quickly showing that RNA diversity must be accounted for in any scientific endeavour, from the choice of an appropriate disease model, to the mechanistic understanding of physiological and pathological processes, to the design and development of treatment strategies that can make a difference for patients. It is time for modern biomedical scientists to embrace it.
