Recent Progress in Sequence Optimization of mRNA Vaccine: Biological Mechanism, Quantitative Metrics, and Computational Model

Abstract

As a groundbreaking advancement in vaccinology, messenger RNA (mRNA) vaccines have transformed the field by offering rapid, flexible, and scalable solutions for combating infectious diseases. However, the efficacy, stability, and immunogenicity of mRNA vaccines depend strongly on the optimization of their sequences. Recent progress in synthetic biology and computational methods has enabled the optimization of mRNA sequences to enhance these properties and promises deeper insight into the design principles of effective mRNA vaccines. Nevertheless, determining how best to optimize mRNA sequences for diverse biological contexts and therapeutic applications remains a major challenge. In this review, we analyze current advances in mRNA vaccine sequence optimization and give an overview of the latest computational and biological approaches in this field. We focus on the biological mechanisms underlying mRNA translation efficiency and stability, highlight several quantitative metrics that may affect vaccine performance, and summarize algorithmic methods for optimizing mRNA vaccine sequences. We also discuss the limitations of current models and the need for further research to address the complexity of biological systems.

Keywords

mRNA vaccine, sequence optimization, 5′ UTR, deep learning, mean ribosome load

Introduction

The advent of messenger RNA (mRNA) vaccines has revolutionized the field of medical therapeutics, offering a versatile and rapidly deployable solution for addressing a wide range of diseases. mRNA vaccines function by delivering genetic instructions to host cells, prompting the production of specific antigens that elicit a targeted immune response.¹ This innovative approach has garnered significant attention due to its flexibility, scalability, and rapid development timeline.² The successful deployment of COVID-19 mRNA vaccines has demonstrated the immense potential of this technology, highlighting its ability to respond rapidly to emerging pathogens. Beyond infectious diseases, mRNA vaccines are also being explored for cancer immunotherapy, personalized medicine, and the treatment of chronic conditions.³,⁴ Their capability to induce robust humoral and cellular immune responses makes mRNA a powerful tool in both prophylactic and therapeutic vaccine development. However, the efficacy and stability of mRNA vaccines are highly dependent on the optimization of mRNA sequences, which directly impacts antigen production and immune response.⁵

Recent advancements in sequence optimization have markedly enhanced the performance of mRNA vaccines. The untranslated regions (UTRs) and coding sequence (CDS) are critical elements that regulate translation efficiency (TE) and mRNA stability.⁶ Advances in understanding the biological mechanisms underlying mRNA translation and stability have led to the development of quantitative metrics such as mean ribosome load (MRL), codon adaptation index (CAI), and GC content. These metrics provide valuable insights into optimizing mRNA sequences for enhanced performance. For instance, optimizing the 5′UTR sequence can significantly enhance MRL, a key metric for TE. In addition, chemically modified nucleotides such as pseudouridine (Ψ) and N1-methylpseudouridine (m1Ψ) have been shown to improve mRNA TE and stability and to reduce immunogenicity.⁷ These nucleotide modifications, along with optimized codon usage and secondary-structure refinement, have been demonstrated to enhance protein expression and overall vaccine efficacy. Despite these achievements, the complexity of biological systems and the variability in cellular environments present ongoing challenges for sequence optimization.

Concurrently, the rapid advancement of artificial intelligence (AI) and machine learning (ML) has garnered significant attention, offering new solutions to these challenges and transforming numerous fields, including biotechnology and medicine.⁸ AI algorithms, particularly deep learning models, have shown remarkable capabilities in handling complex biological data and predicting outcomes with high accuracy. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been particularly effective in analyzing large datasets and identifying patterns that are not easily perceptible through traditional methods.³ The ability of these algorithms to process and integrate vast amounts of data has made them invaluable tools for understanding and optimizing biological sequences in a biologically meaningful manner. In the context of mRNA vaccines, AI algorithms can predict ribosome loading, mRNA stability, and TE, thereby facilitating the design of mRNA sequences with improved performance. The integration of AI with experimental biology has opened new avenues for optimizing mRNA vaccines, offering a more systematic and efficient approach to sequence design.⁹

These advances have been borne out in practice by specific AI models, illustrating the considerable potential of AI for optimizing biological systems. AI models such as Optimus 5-Prime have demonstrated the ability to accurately predict ribosome loading and mRNA stability, outperforming traditional optimization methods.¹⁰ The use of AI algorithms offers several advantages, including the ability to handle complex biological systems, identify subtle sequence variations, and provide rapid and accurate predictions.⁸ In addition, AI models can be trained on diverse datasets, enabling them to generalize across different cell types and conditions. This flexibility is crucial for developing mRNA vaccines that are effective in various biological contexts.

This review aims to provide a comprehensive summary of the recent progress in sequence optimization of mRNA vaccines, focusing on the integration of AI algorithms with biological mechanisms and quantitative metrics. We have reviewed numerous studies that use AI models to facilitate mRNA sequence design or optimization, highlighting their achievements and limitations. By summarizing and synthesizing these studies, we aim to provide a clear understanding of the current state of the field and identify future directions for research. This review underscores the importance of AI in optimizing mRNA vaccine sequences and highlights the potential of these algorithms in enhancing vaccine efficacy and stability. We hope that this analysis will serve as a valuable resource for researchers and developers working in the field of mRNA vaccines, guiding future efforts in sequence optimization and vaccine design.

Biological Mechanisms

Several biological mechanisms can affect the effectiveness of mRNA vaccines. They are discussed in the following sections.

Untranslated regions

UTRs are non-coding sequences flanking the protein-coding region of mRNA and are integral to post-transcriptional regulation.¹¹ Located at the 5′ and 3′ ends of the mRNA, these regions do not encode amino acids but contain critical regulatory elements that govern TE, mRNA stability, and subcellular localization. The 5′ UTR, preceding the start codon, facilitates ribosome recruitment and modulates translation initiation through secondary structures or upstream open reading frames (uORFs). The 3′ UTR, following the stop codon, harbors motifs such as microRNA (miRNA) binding sites, AU-rich elements (AREs), and polyadenylation signals, which influence mRNA decay, translational repression, and interactions with RNA-binding proteins. UTRs thus act as dynamic platforms for coordinating gene expression in response to cellular signals, developmental cues, or stress, underscoring their importance in both normal physiology and disease. Computational optimization of UTRs integrates sequence and structure features (such as local secondary structure, start-codon accessibility, upstream AUGs and uORFs, known regulatory motifs, and RNA–protein binding predictions) using thermodynamic models and machine-learning approaches to design UTR variants that maximize TE and stability under defined cellular contexts while avoiding inhibitory elements.

Upstream ORF

uORFs are short sequences in mRNA 5′UTRs that transiently initiate translation, primarily suppressing main ORF expression by sequestering ribosomes or inducing premature termination.¹² They reduce TE by 30%–80%, depending on Kozak sequence strength, uORF length (optimal ⩽ 30 codons for ribosome reinitiation), and termination codon proximity. Under stress, certain uORFs enhance translation. Plant uORFs often trigger nonsense-mediated decay (NMD), while human uORFs rarely activate this pathway. Optimization of uORFs involves identifying and scoring uORF features using tools like uORFscan and then redesigning the 5′ UTR to remove inhibitory uORFs, weaken their initiation signals, or engineer conditional uORFs.
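The first step described above, locating upstream AUGs and the ORFs they open, can be illustrated with a minimal sketch. It is not the algorithm of uORFscan or any published tool. The function name `find_uorfs`, the returned field names, and the toy sequence are assumptions made for illustration only.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(utr5, cds=""):
    """List ORFs that start at an AUG inside the 5' UTR (hypothetical helper).

    Positions are 0-based. 'stop' is the index just past the stop codon, or
    None when no in-frame stop exists in utr5 + cds. 'codons' counts the
    start codon but not the stop codon.
    """
    utr5 = utr5.upper().replace("T", "U")
    full = utr5 + cds.upper().replace("T", "U")
    hits = []
    for start in range(len(utr5) - 2):
        if full[start:start + 3] != "AUG":
            continue
        stop = None
        # Walk codon by codon in the frame set by this upstream AUG.
        for pos in range(start + 3, len(full) - 2, 3):
            if full[pos:pos + 3] in STOP_CODONS:
                stop = pos + 3
                break
        hits.append({
            "start": start,
            "stop": stop,
            "codons": None if stop is None else (stop - 3 - start) // 3,
            # True when the uORF runs into the sequence supplied after the UTR.
            "extends_past_utr": stop is None or stop > len(utr5),
        })
    return hits


hits = find_uorfs("GGAUGGCCUAAGG")
```

A redesign loop could use such hits to decide which upstream AUGs to remove or weaken, subject to experimental confirmation.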

Translation initiation factor binding

Eukaryotic translation initiation factors (eIFs) are pivotal for ribosome recruitment and decoding mRNA into proteins.¹³ The eIF4F complex, comprising eIF4E (cap-binding protein), eIF4A (RNA helicase), and eIF4G (scaffold protein), binds the 5′ cap (m7G) of mRNA to unwind secondary structures and assemble the 48S preinitiation complex. eIF4E specifically recognizes the cap structure, while eIF4A resolves inhibitory RNA folds in the 5′UTR to facilitate ribosome scanning. Poly(A)-binding proteins (PABPs) at the 3′ tail further enhance initiation by bridging eIF4G and forming a closed-loop mRNA structure, synergistically boosting TE. Computational optimization targets these determinants by modeling eIF–mRNA interactions using features such as start-codon accessibility, local secondary-structure stability, Kozak context strength, eIF4F recruitment propensity, and scanning barrier scores, and then redesigning the 5′ UTR and proximal CDS to maximize predicted initiation efficiency while avoiding inhibitory structures or competing elements such as strong uORFs.

5′ terminal oligopyrimidine tract (5′TOP)

The 5′TOP is a critical regulatory element found in the 5′ UTRs of mRNAs encoding ribosomal proteins and translation factors.¹⁴ This motif, characterized by a cytosine at position +1 followed by a sequence of 4 to 15 pyrimidines, plays a pivotal role in translational control, particularly in response to cellular growth signals and nutrient availability. The 5′TOP motif is known to mediate growth-dependent translational regulation, with its activity modulated by pathways such as the mechanistic target of rapamycin (mTOR) signaling cascade, which is sensitive to rapamycin inhibition. In mRNA design, optimization of the 5′TOP feature involves computational tuning of the length, pyrimidine composition, and structural context of the 5′ terminal motif, combined with modeling of mTOR responsiveness, cap-proximal secondary structure, and ribosome accessibility, to either enhance conditional translational control or deliberately avoid TOP-like elements for constitutive expression, depending on the therapeutic or experimental objective.
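The motif definition above (a cytosine at +1 followed by 4 to 15 pyrimidines) translates directly into a pattern match at the 5′ end of the transcript. The sketch below is a minimal illustration. The helper name `top_motif` and the toy sequences are hypothetical, and the check ignores structural context and mTOR responsiveness.

```python
import re

# C at position +1, then 4 to 15 pyrimidines (C or U), anchored at the 5' end.
TOP_PATTERN = re.compile(r"^C[CU]{4,15}")


def top_motif(mrna):
    """Return the TOP-like prefix of an mRNA, or None if there is none."""
    seq = mrna.upper().replace("T", "U")
    match = TOP_PATTERN.match(seq)
    return match.group(0) if match else None


with_top = top_motif("CUUCUCUGAAG")
without_top = top_motif("GCUUCUCUGAAG")
```

For a construct intended for constitutive expression, a non-`None` result would flag a 5′ end worth redesigning.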

Kozak sequence

The Kozak sequence is a conserved nucleotide motif (5′-GCC(A/G)CCAUGG-3′) flanking the start codon (AUG) in eukaryotic mRNA, critical for efficient translation initiation.¹⁵ Discovered by Marilyn Kozak in the 1980s, its −3 position (A/G) and +4 position (G) are key determinants: a purine at −3 enhances ribosomal recognition by 4-fold, while a pyrimidine (C/U) reduces initiation efficiency and increases sensitivity to adjacent sequence variations. This sequence acts as a ribosomal “landing pad,” directing the 40S subunit to unwind 5′UTR secondary structures via eIF4A helicase activity and to position the AUG in the ribosomal P-site for accurate methionine-tRNA loading. Optimization of the Kozak sequence involves engineering Kozak variants that maximize TE without introducing unintended upstream initiation.
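The two key positions named above can be checked programmatically. The following sketch classifies the context of a given AUG from the −3 and +4 nucleotides only. The three labels (strong, adequate, weak) follow a common convention but are an assumption here, as are the function name `kozak_context` and the example sequences.

```python
def kozak_context(mrna, aug_index):
    """Classify the Kozak context of the AUG whose A sits at aug_index (0-based).

    With the A of AUG numbered +1, position -3 is three nucleotides upstream
    of the A and position +4 is the nucleotide right after the G.
    """
    seq = mrna.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    minus3 = seq[aug_index - 3] if aug_index >= 3 else None
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else None
    purine_at_minus3 = minus3 in ("A", "G")
    g_at_plus4 = plus4 == "G"
    if purine_at_minus3 and g_at_plus4:
        label = "strong"
    elif purine_at_minus3 or g_at_plus4:
        label = "adequate"
    else:
        label = "weak"
    return {"minus3": minus3, "plus4": plus4, "label": label}


consensus = kozak_context("GCCACCAUGG", 6)
poor = kozak_context("GCCUCCAUGC", 6)
```

The same function can be applied to upstream AUGs to judge how strongly a uORF is likely to compete with the main start codon.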

MicroRNA binding

miRNAs are short non-coding RNAs (~22 nucleotides) that post-transcriptionally regulate gene expression by binding to complementary sequences in target mRNA 3′ UTRs, triggering translational repression or mRNA degradation.¹⁶ This interaction relies on a “seed sequence” (nucleotides 2–8 of the miRNA) pairing with mRNA targets, often amplified by auxiliary binding sites in coding regions. For example, miR-155 and miR-150 modulate immune responses by targeting transcripts encoding cytokines (eg, IL-12) and co-stimulatory molecules (eg, CD40) in dendritic cells (DCs). In mRNA vaccine design, optimization involves computational identification and elimination of high-affinity miRNA binding sites, evaluation of site accessibility using RNA folding models, and incorporation of synonymous mutations or UTR redesign to reduce unintended miRNA interactions, thereby enhancing stability and translation while preserving the protein-coding sequence.
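Seed-based site identification can be sketched as a reverse-complement search. The example below considers only perfect Watson-Crick pairing to nucleotides 2–8 and ignores G:U wobble, site accessibility, and auxiliary pairing. The miRNA and UTR sequences are toy inputs, not real miRNAs, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_match_sites(utr3, mirna, seed_start=2, seed_end=8):
    """Return the target motif for the miRNA seed and its 0-based positions."""
    mirna = mirna.upper().replace("T", "U")
    utr3 = utr3.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]  # nucleotides 2-8, 1-based inclusive
    # The mRNA site is the reverse complement of the seed.
    site = "".join(COMPLEMENT[base] for base in reversed(seed))
    positions = []
    index = utr3.find(site)
    while index != -1:
        positions.append(index)
        index = utr3.find(site, index + 1)
    return site, positions


site, positions = seed_match_sites("GGGAACCGGUCCC", "AACCGGUUAACCGGUUAACCGG")
```

Positions returned by such a scan are candidates for removal by UTR redesign or, inside the CDS, by synonymous substitution.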

CDS: sequence and structure

The CDS is the central functional region of mRNA, spanning from the start codon (AUG) to the stop codon, and directly determines the amino acid sequence of the translated protein.¹⁷ Beyond this primary role, the nucleotide composition and structural features of the CDS itself are critical determinants of mRNA stability, TE, and even immunogenicity. Optimization of the CDS is therefore a cornerstone of mRNA vaccine design. Computational optimization of the CDS typically involves synonymous recoding to balance codon usage and RNA structure, using metrics such as CAI/tAI, minimum free energy (MFE) or base-pairing probability profiles, codon pair bias, and ribosome accessibility windows, often guided by multi-objective optimization or machine-learning models to maximize TE while preserving protein sequence and regulatory constraints.

Preferred codon usage

Codon usage, which refers to the choice among synonymous codons, is a key factor in optimizing mRNA sequences.¹⁸ Codons that are more frequently used in the target organism can enhance TE without altering the protein sequence. For example, replacing rarely used codons with more frequently occurring ones can increase protein expression levels. This optimization strategy helps ensure that the antigen is produced at optimal levels, which is critical for eliciting a strong immune response. However, it must be used judiciously, as some rare codons may be necessary for proper protein folding. Codon usage optimization is quantitatively described using CAI, tAI, relative synonymous codon usage (RSCU), codon pair bias scores, codon harmonization metrics, and host-specific codon frequency distributions that are directly computable from sequence data.
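As an example of a metric that is directly computable from sequence data, the sketch below computes RSCU: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard genetic code. The function name `rscu` and the toy CDS are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, with codons ordered by BASES (third base fastest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Relative synonymous codon usage for every amino acid present in cds."""
    seq = cds.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(codons)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids absent from cds
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


usage = rscu("GCUGCUGCUGCC")  # four alanine codons: 3 x GCU, 1 x GCC
```

An RSCU of 1 means no bias. Values above 1 mark codons used more often than expected under uniform synonymous usage.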

GC content

GC content, the percentage of guanine (G) and cytosine (C) nucleotides in a nucleic acid sequence, profoundly influences mRNA stability, secondary structure, and translational efficiency.¹⁸ Elevated GC content (>50%) strengthens mRNA stability through enhanced base-pairing (three hydrogen bonds for G:C vs two for A:U), reducing degradation by nucleases. However, excessive GC-rich regions (>70%) risk forming stable secondary structures (eg, hairpins), which impede ribosome scanning and reduce TE. Uracil (U) is often replaced with guanine (G) or cytosine (C) through synonymous codon substitution to reduce the likelihood of mRNA degradation by nucleases. In addition, optimizing GC content can help in reducing the immunogenicity of the mRNA, which is important for minimizing adverse immune responses. Computational optimization of GC content is performed by recoding synonymous codons to tune local and global GC levels while monitoring other quantitative metrics, enabling systematic identification of GC profiles that maximize translational output without inducing excessive secondary structure.
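Global and local GC levels can be monitored with a few lines of code. The sketch below computes the overall GC fraction and a sliding-window profile, and flags windows above the 70% level mentioned above. The window size, step, and function names are arbitrary illustrative choices rather than recommended settings.

```python
def gc_fraction(seq):
    """Fraction of G and C nucleotides in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def windowed_gc(seq, window=30, step=10):
    """(start, GC fraction) for each window. Short inputs give one window."""
    last_start = max(len(seq) - window + 1, 1)
    return [(i, gc_fraction(seq[i:i + window])) for i in range(0, last_start, step)]


def flag_gc_windows(seq, window=30, step=10, upper=0.70):
    """Start positions of windows whose GC fraction exceeds the upper bound."""
    return [i for i, gc in windowed_gc(seq, window, step) if gc > upper]


profile = windowed_gc("GGGGAAAA", window=4, step=4)
flagged = flag_gc_windows("GGGGAAAA", window=4, step=4)
```

Flagged windows are natural targets for synonymous recoding toward lower local GC.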

Secondary structure

The secondary structure of mRNA refers to the specific folding patterns formed by the mRNA molecule through intramolecular base-pairing interactions.¹⁹ These structures, such as hairpin loops and stem-loops, are stabilized by hydrogen bonds between complementary nucleotides. More stable global structures can protect the mRNA from enzymatic degradation by ribonucleases, thereby increasing its intracellular half-life. However, strong stem-loops or hairpins within CDS can cause ribosomes to pause or even dissociate during elongation, leading to truncated protein products or reduced overall protein synthesis. Optimization of secondary structure focuses on structure-aware metrics—such as MFE, base-pairing probability, accessibility of the 5′ UTR and start codon, and local folding profiles—often optimized in a position-specific and windowed manner rather than globally.

G-quadruplex

G-quadruplexes (G4s) are non-canonical nucleic acid structures formed by guanine-rich sequences, where four guanines arrange into planar tetrads stabilized by Hoogsteen hydrogen bonds and monovalent cations (eg, K⁺ or Na⁺).²⁰ These structures exhibit functional duality: in mRNA 5′UTRs, they often act as translational roadblocks by stalling ribosome scanning. Computational tools can be used to predict and eliminate potential G-quadruplex-forming sequences, thereby optimizing the mRNA sequence for enhanced TE and stability.
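A common first-pass screen for putative G-quadruplex-forming sequences is a pattern search for four runs of at least three guanines separated by short loops. The sketch below uses loops of 1 to 7 nucleotides, a widely used heuristic that is an assumption here rather than a setting taken from a specific tool. Matches are only candidates, since the pattern ignores thermodynamics and competing structures.

```python
import re

# Four G-tracts (>= 3 G) separated by loops of 1-7 nt. The lookahead allows
# overlapping candidates to be reported.
G4_PATTERN = re.compile(
    r"(?=(G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}))"
)


def find_putative_g4(seq):
    """Return (start, motif) for each putative G-quadruplex motif in seq."""
    seq = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in G4_PATTERN.finditer(seq)]


candidates = find_putative_g4("AAGGGAGGGAGGGAGGGAA")
```

A candidate in the 5′ UTR could then be disrupted by breaking one of the G-tracts, ideally followed by re-checking the other design metrics.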

Chemical modification

Chemical modifications of mRNA are pivotal for enhancing stability, reducing immunogenicity, and improving translational efficiency.²¹ Unmodified ssRNA and dsRNA contaminants in in vitro transcribed (IVT) mRNA can be recognized by cellular pattern-recognition receptors (PRRs) like Toll-like receptors (TLRs), RIG-I-like receptors (RLRs), and protein kinase R (PKR), triggering the production of type I interferons and other pro-inflammatory cytokines. Key strategies include nucleotide substitutions (eg, pseudouridine (Ψ), N1-methylpseudouridine (m1Ψ), 5-methylcytidine (m5C)), which minimize recognition by innate immune sensors and suppress interferon responses. Chemical modifications in mRNA (such as Ψ, m1Ψ, m6A, or m5C) alter base-pairing energetics, stacking interactions, and local flexibility, thereby reshaping secondary structure by stabilizing or destabilizing specific stems and loops and changing folding kinetics. As a result, modified mRNAs often adopt structural ensembles that differ from their unmodified counterparts. Secondary-structure prediction for chemically modified mRNA is therefore difficult because most algorithms and energy models are trained on unmodified nucleotides, lack modification-specific free-energy parameters, and cannot accurately account for heterogeneous, position-specific modifications or altered co-transcriptional folding behavior.

Internal ribosome entry sites

Internal ribosome entry sites (IRESs) are structured RNA elements in mRNA that enable cap-independent translation initiation.²² IRES elements can directly recruit the 40S ribosomal subunit to an internal location on the mRNA, via conserved secondary/tertiary structures and interactions with trans-acting factors like polypyrimidine tract-binding protein (PTB) or eIFs, often in close proximity to the translation start codon. IRESs are crucial for the translation of circular RNAs (circRNAs), which inherently lack the 5′ cap structure required for canonical, cap-dependent translation initiation. IRES function is tightly coupled to higher-order RNA secondary and tertiary structures and to specific sequence–structure motifs that control ribosome landing, initiation efficiency, and cell-type specificity.

The biological mechanisms affecting mRNA vaccine design are summarized in Figure 1.

Figure 1. Biological mechanisms affecting mRNA vaccine design.

Quantitative Metrics

The design and refinement of mRNA sequences for vaccines and therapeutics rely heavily on quantitative metrics that serve as proxies for desirable biological outcomes. These metrics allow computational models to evaluate and compare different sequence candidates, guiding the optimization process toward enhanced stability, translational efficiency, and reduced immunogenicity. Understanding these metrics is crucial for interpreting the outputs of design algorithms and for appreciating the multifaceted nature of mRNA optimization.

Minimum free energy

MFE is a thermodynamic measure representing the stability of the most stable secondary structure an RNA molecule is predicted to adopt. It is expressed in kcal/mol, with a lower (more negative) MFE value indicating a more stable and more extensively base-paired structure, as more energy would be required to unfold it. The principle underlying MFE-based optimization is that more stable overall mRNA structures are generally more resistant to enzymatic degradation within the cell, thus correlating with increased mRNA half-life.²³ Consequently, MFE is a primary target for many structural optimization algorithms, such as CDSfold.²⁴ MFE is typically calculated using dynamic programming algorithms, like those implemented in widely used software packages such as RNAfold (part of the ViennaRNA package) or mfold, which employ nearest-neighbor thermodynamic parameters to estimate the energy contributions of various structural motifs (eg, stems, loops, bulges).²⁵

While a lower MFE is often sought, it is important to recognize its limitations. MFE represents a global property of the most stable predicted structure and does not fully capture the dynamic ensemble of structures an RNA molecule might adopt in vivo. Furthermore, excessive structural stability, particularly in regions critical for translation (like the start codon or ribosome binding site), can actually impede ribosome scanning or elongation, thereby reducing protein expression. Overly stable, long dsRNA regions can also trigger innate immune responses. Thus, MFE optimization must be balanced with other functional considerations.²⁶
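The dynamic-programming idea behind structure prediction can be illustrated with a deliberately simplified stand-in. The sketch below implements Nussinov-style base-pair maximization, which counts pairs instead of summing nearest-neighbor free energies. It is therefore not an MFE calculation and does not reproduce RNAfold or mfold. The minimum hairpin loop of three nucleotides and the inclusion of G:U wobble pairs are assumptions of this sketch.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style recursion)."""
    seq = seq.upper().replace("T", "U")
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: nucleotide j stays unpaired
            # case 2: j pairs with k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


hairpin_pairs = max_base_pairs("GGGAAAUCCC")
```

Energy-based predictors follow the same fill-the-table strategy but score stacks, loops, and bulges with experimentally derived parameters.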

Codon Adaptation Index

The CAI is a widely used metric to quantify the extent to which the codon usage in a given gene conforms to the codon usage bias observed in a reference set of highly expressed genes from a specific organism or cell type.²⁷ The CAI value ranges from 0 to 1, where a value of 1.0 indicates that the gene exclusively uses the most frequently occurring synonymous codons for each amino acid. CAI is calculated using the following equation:

$$\mathrm{CAI} = \left(\prod_{i=1}^{N} w_{i}\right)^{\frac{1}{N}} \tag{1}$$

where $N$ is the total number of codons in the sequence, and $w_{i}$ is the relative adaptiveness of the $i$-th codon, computed as:

$$w_{i} = \frac{f_{i}}{A_{f_{\max}}} \tag{2}$$

where $f_{i}$ is the number of occurrences of codon $i$ in the reference set and $A_{f_{\max}}$ is the number of occurrences of the most frequently used synonymous codon for the amino acid encoded by codon $i$. A higher CAI is generally correlated with higher rates of translation elongation and, consequently, increased protein expression levels.²⁷ This is based on the hypothesis that preferred codons are recognized by more abundant tRNA species, allowing for faster and more efficient decoding by the ribosome. In other words, CAI is the geometric mean of the relative adaptiveness values (weights) of all codons in the CDS, where the relative adaptiveness of a codon is its frequency relative to the most frequent synonymous codon for the same amino acid in the reference set.²⁸
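The definition above can be turned into a short calculation. The sketch below derives relative adaptiveness weights from a reference set and then takes the geometric mean over the codons of a query CDS. Three choices are assumptions of this sketch rather than statements from the text: Met, Trp, and stop codons are excluded from the mean, codons absent from the reference set receive a pseudocount of 0.5 to avoid zero weights, and the one-sequence reference set is a toy input.

```python
import math
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, with codons ordered by BASES (third base fastest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split a CDS into in-frame codons, dropping any trailing partial codon."""
    seq = cds.upper().replace("T", "U")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_cds, pseudocount=0.5):
    """w = codon count / count of the most used synonymous codon."""
    counts = Counter()
    for cds in reference_cds:
        counts.update(split_codons(cds))
    adjusted = {c: max(counts[c], pseudocount) for c in CODON_TABLE}
    best = {}
    for codon, aa in CODON_TABLE.items():
        best[aa] = max(best.get(aa, 0.0), adjusted[codon])
    return {c: adjusted[c] / best[CODON_TABLE[c]] for c in CODON_TABLE}


def cai(cds, weights):
    """Geometric mean of the weights, computed in log space."""
    scored = [c for c in split_codons(cds) if CODON_TABLE.get(c, "*") not in "MW*"]
    if not scored:
        return float("nan")
    return math.exp(sum(math.log(weights[c]) for c in scored) / len(scored))


weights = relative_adaptiveness(["GCUGCUGCUGCC"])  # toy reference set
score = cai("GCUGCC", weights)  # sqrt(1 * 1/3)
```

Working in log space avoids numerical underflow when the product in Equation (1) runs over hundreds of codons.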

Codon optimization based on maximizing CAI has been a standard strategy in recombinant protein production and mRNA therapeutic design.²⁹ However, solely maximizing CAI can have drawbacks. It may lead to the creation of unintended RNA secondary structures or alter GC content suboptimally. Moreover, extremely rapid translation might compromise the correct co-translational folding of some proteins.³⁰ In addition, standard CAI calculations often use genome-wide codon usage tables, which may not reflect tissue-specific or condition-dependent variations in tRNA availability.³¹ Therefore, CAI is often used as one of several objectives in multi-parameter optimization algorithms.

Mean ribosome load

MRL is a measure of the average number of ribosomes actively translating a given mRNA molecule at a specific point in time. It is often determined experimentally using techniques like ribosome profiling (Ribo-seq), which involves sequencing the mRNA fragments protected by ribosomes.³² A higher MRL generally indicates a higher rate of translation initiation and/or efficient elongation, leading to increased protein synthesis from that mRNA species. MRL can serve as a more direct readout of translational activity than purely sequence-derived metrics like CAI. Computational models, such as UTR-LM, are being developed to predict MRL based on sequence features, particularly those within the 5′ UTR, thereby providing an indirect quantification of TE to guide UTR optimization.³³
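As a worked illustration of the definition, MRL can be written as a weighted average over the number of ribosomes bound. The sketch below assumes a hypothetical input: for one mRNA species, the abundance of transcripts found with 0, 1, 2, ... ribosomes, for instance from fractionated polysome measurements. This input format is an assumption of the example and is not prescribed by the text.

```python
def mean_ribosome_load(abundance_by_ribosomes):
    """Weighted mean ribosome number.

    abundance_by_ribosomes maps a ribosome count k to the abundance of the
    mRNA observed with k ribosomes (any consistent unit, eg read counts).
    """
    total = sum(abundance_by_ribosomes.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    weighted = sum(k * amount for k, amount in abundance_by_ribosomes.items())
    return weighted / total


mrl = mean_ribosome_load({0: 10, 1: 20, 2: 30, 3: 40})
```

Here the load is (0·10 + 1·20 + 2·30 + 3·40) / 100 = 2.0 ribosomes per transcript.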

mRNA half-life (t1/2)

The mRNA half-life (t1/2) is the time it takes for 50% of a specific mRNA population in a cell to be degraded. It is a critical determinant of the overall amount and duration of protein expression from an mRNA therapeutic or vaccine.³⁴ A longer half-life provides more opportunities for the mRNA to be translated, thus increasing protein yield. mRNA half-life is influenced by a multitude of factors, including:

Sequence elements within the UTRs (eg, stabilizing elements or destabilizing AREs in the 3′ UTR).³⁵

The length and integrity of the poly(A) tail³⁶ (eg, segmented poly(A) tails or circularization).

Codon optimality within the CDS.¹⁸

The overall secondary structure and GC content of the mRNA.³⁷

The presence of chemical modifications (eg, m1Ψ).³⁸
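In practice, half-life is usually estimated from a decay time course. The sketch below assumes first-order decay, $N(t) = N_0 e^{-kt}$, so that $t_{1/2} = \ln 2 / k$, and estimates $k$ by least-squares regression of log abundance on time. The first-order assumption, the function name, and the toy data are assumptions of this example.

```python
import math


def half_life_from_timecourse(times, abundances):
    """Estimate t1/2 by fitting ln(abundance) = ln(N0) - k * t."""
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    decay_rate = -sxy / sxx  # k, in 1 / time unit
    if decay_rate <= 0:
        raise ValueError("abundance does not decay over the time course")
    return math.log(2) / decay_rate


# Toy data: abundance halves every 2 hours.
t_half = half_life_from_timecourse([0, 2, 4, 6], [100.0, 50.0, 25.0, 12.5])
```

Real time courses often deviate from a single exponential, so the fit should be inspected before the estimate is used to rank designs.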

Unpaired probability/average unpaired probability

The unpaired probability of a nucleotide refers to the likelihood that this specific nucleotide is in a single-stranded (unpaired) state when considering the ensemble of all possible secondary structures the RNA molecule can adopt. The average unpaired probability (AUP) is the sum of these unpaired probabilities across all nucleotides in the sequence, often normalized by the length of the RNA.³⁷ The rationale behind using AUP as an optimization metric is that single-stranded regions of RNA are generally more susceptible to hydrolytic cleavage and enzymatic degradation than base-paired regions. Therefore, a lower AUP, indicating a more structured RNA with fewer accessible unpaired regions, is hypothesized to correlate with increased mRNA stability and a longer half-life.²⁹ Algorithms like RiboTree aim to optimize sequences by reducing AUP, thereby minimizing multi-loop structures and indirectly enhancing mRNA longevity.³⁷ AUP can be calculated from base-pairing probability matrices, which are standard outputs of many RNA folding prediction packages like ViennaRNA’s RNAfold.
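Given a base-pairing probability matrix, AUP follows from row sums. The sketch below assumes a full symmetric matrix `bpp` in which `bpp[i][j]` is the probability that nucleotides `i` and `j` pair. Folding packages may return other layouts, such as an upper triangle only, which would need converting first. The matrix values are toy numbers.

```python
def average_unpaired_probability(bpp):
    """Return (AUP, per-nucleotide unpaired probabilities).

    The probability that nucleotide i is unpaired is 1 minus the sum of its
    pairing probabilities with every other nucleotide.
    """
    if not bpp:
        raise ValueError("empty base-pairing probability matrix")
    unpaired = [1.0 - sum(row) for row in bpp]
    return sum(unpaired) / len(unpaired), unpaired


# Toy 4-nt example: 0 pairs with 3 (p = 0.8) and 1 pairs with 2 (p = 0.5).
bpp = [
    [0.0, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0],
]
aup, per_nt = average_unpaired_probability(bpp)
```

The per-nucleotide values also show where the molecule is most exposed, which can guide position-specific redesign.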

Many of these quantitative metrics serve as valuable, albeit imperfect, proxies for complex in vivo biological processes. MFE, for instance, reflects in vitro thermodynamic stability, while CAI is based on statistical codon preferences. Their predictive power for actual in vivo performance is not absolute. A significant challenge is that optimizing one metric can sometimes negatively affect another; for example, maximizing structural stability (low MFE) through high GC content might create overly rigid structures that impede ribosome movement or cause issues in PCR amplification.²⁹ This highlights the critical need for multi-objective optimization strategies that explicitly manage these trade-offs. The field is also witnessing an evolution in metrics themselves, with AI and machine learning contributing to the development of more sophisticated, data-driven predictors that aim to capture functional outcomes more directly, such as MRL predicted by models like UTR-LM³³ or overall protein expression predicted by frameworks like RiboDecode.³⁹ These advanced metrics move beyond simple statistical correlations toward more integrative and functionally relevant assessments of mRNA sequence quality.

We summarize in Table 1 the quantitative metrics discussed above.

Table 1. Quantitative metrics for evaluating mRNA sequence optimization.

Metric	Definition/calculation basis	Significance for mRNA performance	Typical tools/methods for assessment
Minimum Free Energy (MFE)	Thermodynamic stability of the most stable predicted secondary structure (kcal/mol)	Lower MFE generally correlates with increased resistance to degradation and longer mRNA half-life	RNAfold, mfold, CDSfold
Codon Adaptation Index (CAI)	Geometric mean of relative adaptiveness of codons, based on usage in highly expressed genes	Higher CAI generally correlates with increased translation elongation efficiency and protein yield	JCat, GeneOptimizer
Mean Ribosome Load (MRL)	Average number of ribosomes bound per mRNA molecule	Higher MRL indicates efficient translation initiation and/or elongation, leading to more protein yield	Ribosome profiling, UTR-LM, RiboDecode
mRNA Half-life (t1/2)	Time taken for 50% of mRNA molecules to be degraded in the cell	Critical determinant of the duration and total amount of protein expression	Pulse labeling experiment, transcriptional shutoff experiment⁴⁰,⁴¹
Average Unpaired Probability (AUP)	Average probability of nucleotides being in an unpaired state across the structural ensemble	Lower AUP (more structured) is hypothesized to correlate with increased stability against hydrolysis	ViennaRNA, RiboTree

Computational Model

The immense complexity of mRNA sequence design, characterized by a vast search space and multiple interacting biological parameters, necessitates the use of sophisticated computational models. These models aim to predict the performance of mRNA sequences and to guide their optimization toward desired therapeutic outcomes. Over recent years, a diverse array of algorithms and software tools has been developed, ranging from those focused on single objectives like structural stability to advanced AI-driven platforms capable of multi-objective, full-length mRNA optimization.
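The size of the search space can be made concrete. For a fixed protein, the number of synonymous coding sequences is the product of the codon degeneracies of its residues. The sketch below computes the base-10 logarithm of that number under the standard genetic code. The function name and the example peptide are illustrative.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2,
    "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1,
    "Y": 2, "V": 4,
}


def log10_synonymous_space(protein):
    """log10 of the number of distinct CDSs encoding the given protein."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein.upper())


small = log10_synonymous_space("AL")  # 4 * 6 = 24 sequences
```

Because the count grows exponentially with protein length, exhaustive enumeration is infeasible for realistic antigens, which is why the tools below rely on dynamic programming, stochastic search, or learned models.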

Tool choice should be driven by the primary optimization goal and the mRNA region under design. If the priority is structural stability/persistence, begin with structure-centric predictors and optimizers and evaluate impact on t1/2. If the goal is maximizing protein output, prioritize methods targeting translation-related readouts: 5′UTR-focused predictors/generators (eg, UTR-LM, UTRGAN, Smart5UTR) guided by MRL/TE, and CDS-focused codon tools (eg, JCat/GeneOptimizer) guided by CAI. When multiple objectives must be balanced (eg, stability vs translation), adopt multi-objective frameworks (eg, LinearDesign, mRNA-LM, integrated tools such as mRNAdesigner) and report trade-offs across MFE/AUP, CAI, MRL, and t1/2. Finally, because in silico scores are imperfect proxies, candidate sequences should be shortlisted by constraints (motifs, GC/structure limits) and then validated experimentally, enabling iterative refinement. The advantages and limitations of each tool are listed in Supplemental Table 1.
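One simple way to report trade-offs across several metrics is to keep only the candidates that are not dominated by any other candidate (a Pareto front). The sketch below is a generic illustration and not the procedure of any tool named above. Objectives are expressed so that larger is better, for example CAI and the negated MFE, and the candidate names and scores are made up.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of the non-dominated candidates.

    candidates maps a name to a tuple of objectives (larger is better).
    """
    front = []
    for name, objectives in candidates.items():
        beaten = any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not beaten:
            front.append(name)
    return sorted(front)


# Hypothetical scores: (CAI, -MFE in kcal/mol).
scores = {"seq_a": (0.90, 10.0), "seq_b": (0.80, 12.0), "seq_c": (0.70, 9.0)}
front = pareto_front(scores)
```

In this toy case `seq_c` is dropped because `seq_a` is better on both objectives, while `seq_a` and `seq_b` represent a genuine trade-off to be resolved experimentally.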

Structure optimization

A primary goal in mRNA design is to enhance its stability, thereby prolonging its intracellular half-life and increasing the potential for protein production. Structure optimization models typically focus on modulating the mRNA’s secondary structure to achieve this.

CDSfold: This algorithm was an important development for optimizing mRNA structure by minimizing the MFE of the CDS while respecting the constraints imposed by the genetic code (ie, ensuring synonymous codon substitutions). However, CDSfold primarily focuses on MFE optimization and does not inherently co-optimize for translational efficiency metrics like CAI.²⁴

RiboTree: This approach innovatively targets mRNA stability by aiming to reduce the AUP of nucleotides within the sequence. The rationale is that regions with higher unpaired probability are more susceptible to degradation. By minimizing AUP, RiboTree seeks to reduce multi-branch loop structures and thereby indirectly improve mRNA half-life. The Monte Carlo Tree Search (MCTS) algorithm, which balances exploration of new sequences with exploitation of promising known sequences, as used in the Eterna platform and later in tools like mRNAdesigner, was inspired by or related to RiboTree’s development for AUP optimization.³⁷

RNAfold (ViennaRNA Package) and mfold: These are not optimization algorithms, but rather foundational tools for RNA secondary-structure prediction. They calculate the MFE structure and base-pairing probabilities for a given RNA sequence using thermodynamic models (eg, the nearest-neighbor model). Their outputs are crucial for assessing the structural properties of designed sequences and are often integrated as components within larger optimization frameworks.^42,43
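The dynamic-programming idea behind such predictors can be sketched with the classic Nussinov recursion, which maximizes the number of nested base pairs. This is a deliberately simplified stand-in: RNAfold and mfold minimize free energy under the nearest-neighbor model rather than counting pairs, and the function names and the minimum hairpin-loop length of 3 used here are assumptions for illustration.

```python
def can_pair(a, b):
    """Watson-Crick and G-U wobble pairs."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov recursion).

    min_loop is the minimum number of unpaired nucleotides enclosed by a
    hairpin. This counts pairs only; it does not compute a free energy.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: nucleotide j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))  # small hairpin with a 3-nt loop
```

Energy-based predictors follow the same bottom-up filling of subsequence tables but replace the pair count with loop-dependent free-energy terms.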

IPknot: This tool is designed to predict RNA secondary structures, including more complex non-canonical structures like pseudoknots, using integer programming combined with thermodynamic principles.⁴⁴

Protein production optimization

While structure influences stability, the ultimate goal for most mRNA therapeutics is efficient protein production. This category of models focuses on optimizing sequences, primarily the CDS, to maximize the yield of the encoded protein.

RiboDecode: RiboDecode’s key advances include its ability to learn directly from large-scale ribosome profiling (Ribo-seq) datasets, enabling it to capture complex, context-aware relationships between codon sequences and their translation levels. It can generatively explore a vast sequence space. In vitro experiments have demonstrated substantial improvements in protein expression (up to 72-fold increases) compared to previous methods, and in vivo studies have shown that RiboDecode-optimized mRNAs can induce significantly stronger neutralizing antibody responses (for an influenza vaccine candidate) and achieve equivalent therapeutic effects at lower doses (for a nerve growth factor mRNA).³⁹

GeneOptimizer: This is a commercially available tool that optimizes DNA sequences (which are then transcribed into mRNA) to enhance expression in a target host. It employs a sliding-window approach to adjust multiple parameters, including codon usage (to match host preferences), GC content, and the removal of detrimental sequence motifs, with the aim of improving mRNA stability and translational efficiency.⁴⁵
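One of the parameters mentioned above, local GC content, can be checked with a simple window scan. The sketch below only illustrates scanning a finished sequence for windows outside a GC range; it is not the GeneOptimizer algorithm, and the window size, step, and thresholds are hypothetical defaults chosen for the example.

```python
def gc_fraction(seq):
    """Fraction of G and C nucleotides in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def windowed_gc(seq, window=30, step=10):
    """Return (start, GC fraction) for each window along the sequence."""
    last_start = max(len(seq) - window, 0)
    return [(s, gc_fraction(seq[s:s + window]))
            for s in range(0, last_start + 1, step)]


def flag_windows(seq, lo=0.3, hi=0.7, window=30, step=10):
    """Start positions of windows whose GC fraction falls outside [lo, hi]."""
    return [s for s, gc in windowed_gc(seq, window, step) if gc < lo or gc > hi]


if __name__ == "__main__":
    toy = "GC" * 10 + "AU" * 10  # GC-rich half followed by AU-rich half
    print(windowed_gc(toy, window=20, step=20))
    print(flag_windows(toy, window=20, step=20))
```

A global GC value can hide local extremes, which is why window-based checks are typically applied alongside whole-sequence averages.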

JCat (Java Codon Adaptation Tool): This tool focuses on improving heterologous protein production by optimizing codon usage based on metrics like the Codon Adaptation Index (CAI). It also considers practical aspects like avoiding undesirable restriction enzyme cleavage sites in the optimized sequence.⁴⁶
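For readers unfamiliar with how CAI is computed, the sketch below follows the standard definition: each codon receives a relative adaptiveness (its usage divided by that of the most-used synonymous codon), and CAI is the geometric mean of these weights over the CDS. The codon counts are hypothetical toy values covering only three amino acids, not a real host usage table, and the function names are illustrative rather than JCat's interface.

```python
from math import exp, log

# Hypothetical codon usage counts from a reference gene set (toy values).
USAGE = {
    "K": {"AAA": 40, "AAG": 60},
    "E": {"GAA": 30, "GAG": 70},
    "F": {"UUU": 50, "UUC": 50},
}


def relative_adaptiveness(usage):
    """w(codon) = count / max count among synonymous codons (counts must be > 0)."""
    weights = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / top
    return weights


def cai(cds, weights):
    """Geometric mean of w over the codons of cds that appear in weights."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(log(w) for w in scored) / len(scored))


if __name__ == "__main__":
    w = relative_adaptiveness(USAGE)
    print(cai("AAGGAGUUC", w))            # every codon is a preferred codon
    print(round(cai("AAAGAA", w), 3))     # two less-preferred codons
```

Because the score is a geometric mean, a few strongly disfavored codons lower CAI more than an arithmetic average would suggest.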

Multi-objective optimization

Recognizing that optimal mRNA performance depends on a balance of multiple factors, multi-objective optimization models aim to simultaneously improve several, often conflicting, properties such as stability, translational efficiency, and immunogenicity. This reflects a more mature understanding of mRNA biology, moving away from single-parameter optimization.
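When objectives conflict, a common way to summarize candidates is the Pareto front: the set of designs that no other design beats on every objective at once. The following minimal sketch applies this to two of the metrics discussed in this review, treating lower MFE and higher CAI as better. The candidate names and scores are hypothetical toy values, not results from any tool.

```python
def pareto_front(candidates):
    """Return names of non-dominated candidates.

    candidates: list of (name, mfe, cai) tuples.
    Lower MFE (more stable) and higher CAI are both treated as better.
    A candidate is dominated if another is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for name, mfe, cai in candidates:
        dominated = any(
            (m <= mfe and c >= cai) and (m < mfe or c > cai)
            for _, m, c in candidates
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    # Hypothetical (name, MFE in kcal/mol, CAI) values for four designs.
    toy = [
        ("A", -50.0, 0.70),
        ("B", -40.0, 0.90),
        ("C", -35.0, 0.80),  # worse than B on both objectives
        ("D", -50.0, 0.65),  # same MFE as A but lower CAI
    ]
    print(pareto_front(toy))
```

Reporting the front, rather than a single winner, leaves the final choice of trade-off to the application at hand.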

LinearDesign: This algorithm stands out for its novel application of concepts from computational linguistics, specifically Deterministic Finite Automata (DFA) and lattice parsing, to concurrently optimize mRNA structural stability (MFE) and codon optimality (CAI). It represents the vast mRNA design space as a DFA and then uses lattice parsing to efficiently find sequences that minimize a combined objective function incorporating MFE and CAI. LinearDesign has demonstrated the ability to generate mRNA sequences with significantly improved half-life, protein expression, and, importantly for vaccines, enhanced immunogenicity (eg, up to 128-fold increase in antibody titers in mice for a COVID-19 vaccine candidate compared to a codon-optimized benchmark).⁸ Its efficiency allows for the optimization of long sequences, like the SARS-CoV-2 spike protein, in minutes.⁸
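For orientation, a combined objective of this kind can be written as a weighted sum. The notation below is assumed here rather than taken from the text above (s is a candidate mRNA drawn from the set of sequences encoding protein p, c_i is its i-th codon, w is the relative adaptiveness of a codon, and λ is a user-chosen trade-off weight), and the exact form should be checked against the original publication.⁸

```latex
\min_{s \,\in\, \mathcal{S}(p)} \; \mathrm{MFE}(s) \;-\; \lambda \, |p| \, \log \mathrm{CAI}(s),
\qquad
\log \mathrm{CAI}(s) \;=\; \frac{1}{|p|} \sum_{i=1}^{|p|} \log w(c_i)
```

Under this form, the term |p| log CAI(s) reduces to a sum of per-codon contributions, so the CAI part of the score is additive over codons and can be accumulated during the same search that evaluates structure. Setting λ = 0 recovers pure MFE minimization, while larger λ favors codon optimality.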

CodonBERT: This is an example of an advanced language model trained on large datasets of mRNA CDS.⁴⁷ It is designed to predict various mRNA properties, such as half-life or protein expression levels, from the codon sequence. This predictive capability can then be used to guide the design of codon sequences for improved recombinant protein production or mRNA-based therapeutics.⁴⁸

mRNAdesigner: This is an integrated web server designed for the comprehensive optimization of full-length mRNA sequences, encompassing the CDS, 5′ UTR, and 3′ UTR. For CDS optimization, it employs an MCTS algorithm to explore the sequence space, aiming to reduce unpaired regions (enhancing stability), minimize complex stem-loop structures (reducing potential translational inhibition and immunogenicity), and mitigate the use of rare codons, all while adhering to user-defined GC content preferences. It also includes modules for selecting optimal 5′ UTRs (using predictions from UTR-LM to maximize MRL) and 3′ UTRs (by filtering for or against elements like AREs and CUREs) to further enhance translation and stability.²⁹
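The 3′ UTR filtering step can be pictured as a motif scan over a candidate library. The sketch below counts overlapping occurrences of the AUUUA pentamer, a widely used core motif for AU-rich elements (AREs), and ranks candidates by that count. It is an assumption-laden illustration rather than mRNAdesigner's implementation: the function names and the toy library are hypothetical, and real ARE or CURE definitions are broader than a single pentamer.

```python
import re


def count_motif(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    return len(re.findall(f"(?={re.escape(motif)})", seq))


def rank_by_motif(library, motif="AUUUA"):
    """Sort UTR names by ascending motif count (fewest occurrences first)."""
    return sorted(library, key=lambda name: count_motif(library[name], motif))


if __name__ == "__main__":
    # Hypothetical 3' UTR fragments, for illustration only.
    toy_library = {
        "utr_a": "GCCAUUUAGCAUUUAGG",
        "utr_b": "GCCGCCGCC",
        "utr_c": "AAUUUAGC",
    }
    print(rank_by_motif(toy_library))
```

Whether such elements should be avoided or retained depends on the desired expression kinetics, so the ranking direction is a design choice rather than a fixed rule.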

mRNA-LM: This is an advanced integrated small language model (SLM) built by combining three separate BERT-based models, each specialized for one of the three main mRNA regions: 5′ UTR (5UTRBERT), CDS (CodonBERT), and 3′ UTR (3UTRBERT). It utilizes contrastive learning (specifically, the contrastive language–image pretraining [CLIP] methodology) to learn a joint representation of these segments from the full-length mRNA. Trained on millions of diverse mRNA sequences, mRNA-LM can be fine-tuned to predict a range of mRNA properties, including transcript stability, mRNA abundance, translation rate, and protein expression levels. This predictive power for multiple properties makes it a valuable tool for guiding multi-objective optimization efforts for full-length mRNA design.⁴⁹

These multi-objective models represent the cutting edge of mRNA design, employing diverse algorithmic strategies, from dynamic programming and graph theory to genetic algorithms and sophisticated machine-learning architectures, to tackle the inherent trade-offs in optimizing mRNA for therapeutic use.

UTR optimization

Given the critical role of UTRs in modulating mRNA stability and translation, specialized computational models have been developed to focus specifically on the design and selection of optimal 5′ and/or 3′ UTR sequences. The optimization of the entire mRNA, including both UTRs and the CDS, is increasingly recognized as crucial.

UTRGAN: This model utilizes a generative adversarial network (GAN) architecture to generate novel human 5′ UTR sequences de novo.⁵⁰ Beyond generation, UTRGAN incorporates an optimization procedure to refine these sequences, aiming to achieve higher MRL, enhanced TE, and increased overall gene expression. It has reported significant improvements, such as up to 5-fold higher average expression and 34-fold higher average TE for optimized UTRs compared to initial sequences.⁵⁰

UTR-LM: This language model, pretrained on a vast collection of endogenous 5′ UTR sequences from multiple species, is designed to predict functional properties of 5′ UTRs.³³ It can be fine-tuned for tasks such as predicting MRL, TE, and mRNA expression levels. Within tools like mRNAdesigner, UTR-LM is used to screen and select 5′ UTRs from a library that are predicted to yield high translational performance when combined with a specific CDS.²⁹

Smart5UTR: This is another deep generative model specifically developed to identify superior m1Ψ-modified 5′ UTRs in silico. It features a tailored loss function and network architecture designed to overcome limitations of existing models in the context of modified UTRs.⁵¹

These UTR-focused models often leverage machine-learning techniques, particularly deep learning and generative models, trained on large datasets of known UTR sequences and their associated functional data (eg, MRL from Ribo-seq, expression levels from reporter assays). This allows them to learn complex sequence-function relationships and either predict the performance of given UTRs or generate entirely new UTR sequences with desired characteristics.

The evolution of computational models for mRNA optimization clearly shows a trend from single-objective, often rule-based approaches toward more sophisticated, AI-driven multi-objective strategies. Machine learning, particularly deep learning and large language models (LLMs), is playing an increasingly central role, enabling the capture of complex biological patterns from vast datasets that elude simpler methods. There is also a growing emphasis on optimizing the entire full-length mRNA (5′UTR-CDS-3′UTR) holistically, recognizing the synergistic interactions between these regions. Furthermore, the advent of generative AI models marks a significant shift, empowering researchers to design entirely novel sequences with desired properties, thereby vastly expanding the explorable design space beyond mere selection or modification of existing sequences.

We summarize in Table 2 the algorithms discussed above.

Table 2.

Overview of computational models for mRNA sequence optimization.

Model/tool name	Primary optimization goal(s)	Algorithmic approach	Key features/outputs
CDSfold	MFE minimization (structural stability) of CDS	Dynamic programming, graph-based representation of codons	Optimized CDS for low MFE
RiboTree	AUP reduction	Monte Carlo tree search, AUP minimization	CDS with low Average Unpaired Probability
RNAfold	Structure prediction	Thermodynamic model, dynamic programming	Predicted MFE secondary structure, base-pairing probabilities
IPknot	RNA secondary structure prediction	Integer programming, thermodynamic principles	Handles complex non-canonical structures such as pseudoknots
RiboDecode	Protein yield maximization	Deep learning (trained on Ribo-seq data), generative exploration	Optimized CDS for enhanced protein production, context-aware, cell-type-specific expression
GeneOptimizer	mRNA expression improvement in target host	Sliding-window multi-parameter optimization	Optimized codon usage, GC content, removes detrimental motifs; commercial tool for DNA/mRNA design
JCat	Heterologous protein production improvement	Codon adaptation index calculation	Avoided restriction sites; focuses on translational efficiency
CodonBERT	mRNA properties prediction	Advanced language models (BERT-based)	Guided codon design for protein production or mRNA therapeutics; trained on large mRNA dataset
LinearDesign	MFE & CAI co-optimization (stability & translation efficiency)	Deterministic Finite-state Automaton (DFA), lattice parsing, dynamic programming	Optimized CDS for balanced MFE and CAI, improved half-life, protein expression
mRNAdesigner	Full-length mRNA stability & translation efficiency	Monte Carlo tree search (for CDS), UTR-LM integration (for 5′ UTR), UTR library filtering (for 3′ UTR)	Optimized full mRNA (CDS, 5′UTR, 3′UTR) for stability, GC content, translation efficiency
mRNA-LM	Full-length mRNA property prediction for multi-objective opt.	Integrated language models (BERT for UTRs/CDS), contrastive learning (CLIP)	Predicts stability, expression, translation rate for full mRNA; enables informed multi-objective design
UTRGAN	5′ UTR optimization for MRL, Translation Efficiency (TE), gene expression	Generative Adversarial Network (GAN)	Generates and optimizes novel 5′ UTR sequences for enhanced translational output
UTR-LM	Prediction of functional properties of 5′ UTRs	Transformer	MRL, translation efficiency, IRES identification
Smart5UTR	Identification of superior m1Ψ-modified 5′ UTRs in silico	Multi-task autoencoder (MTAE) framework	Generates m1Ψ-modified 5′ UTRs

Discussion

The rapid ascent of mRNA vaccine technology, particularly highlighted during the COVID-19 pandemic, has underscored the critical importance of rational sequence design. Significant progress has been made in understanding the complex interplay between mRNA sequence elements, their corresponding biological functions, and the ultimate efficacy of the encoded protein, whether it be a vaccine antigen or a therapeutic protein. This journey has seen a shift from relatively simple, single-parameter optimization strategies, such as focusing solely on codon adaptation, to more sophisticated, multi-objective computational approaches. The development and application of AI and machine-learning algorithms have been pivotal in this evolution, providing powerful tools to navigate the astronomically vast sequence space and to begin deciphering the intricate sequence-to-function relationships that govern mRNA stability, translation, and immunogenicity. This review has summarized key biological mechanisms, the quantitative metrics used to guide optimization, and the diverse array of computational models currently employed in this dynamic field.

The journey of mRNA vaccine technology has been transformative, moving from foundational research to global therapeutic impact. This progress has been fueled by an enhanced understanding of RNA biology (from the roles of UTRs and codon usage to the impact of secondary structures and chemical modifications) and by the concurrent development of powerful computational tools. Sequence optimization has matured significantly. Early efforts tended to focus on individual parameters, such as maximizing CAI or achieving a target GC content. However, the field now widely recognizes that mRNA performance is the result of many interacting factors. Consequently, the focus has shifted toward multi-objective optimization strategies that aim to simultaneously balance stability, translational efficiency, and immunogenicity. AI and ML have become indispensable in this endeavor, capable of learning from large datasets to identify complex patterns and guide the design of sequences with superior characteristics, a task far beyond the reach of manual or simple algorithmic approaches.

Despite the remarkable advancements, current computational models for mRNA sequence optimization are not without limitations. Bridging the gap between in silico design and in vivo performance remains a significant hurdle.

The In Silico to In Vivo Gap: A primary challenge is that many models are trained or validated using in vitro data or simplified cellular systems, which may not fully capture the complexities of the in vivo environment within a human or animal host.⁴⁷ Several factors are difficult to model comprehensively and are often not fully integrated into sequence optimization algorithms: tissue-specific tRNA availabilities, the dynamic landscape of RNA-binding proteins in different cell types, the precise interactions with the host immune system, and the influence of the delivery vehicle, such as lipid nanoparticles (LNPs).²⁹ For LNPs in particular, key sequence features like GC content and secondary structure may directly influence the physicochemical interactions during encapsulation.⁵²

Dataset Heterogeneity and Bias: The performance of AI and ML models is heavily dependent on the quality, quantity, and diversity of the data used for their training. For mRNA optimization, this includes data from ribosome profiling, mRNA stability assays, immunogenicity studies, and protein expression measurements. The availability of such data can be limited, and existing datasets may be heterogeneous or contain biases. Models trained on such data may not generalize well to new or diverse mRNA targets.

Model Interpretability: Many advanced AI models, particularly deep learning networks, function as “black boxes,” making it challenging to understand the specific sequence features or biological rules they have learned to make their predictions or design choices. This lack of interpretability can hinder scientific understanding, limit trust in the models, and make it difficult to rationally improve upon their designs.

Limitations of Single-Metric Approaches: As highlighted by several studies, relying on the optimization of a single metric (eg, CAI alone) can be misleading and insufficient, as it fails to capture the holistic requirements for optimal mRNA performance.²⁷

The field of computational mRNA optimization is rapidly evolving, with several exciting avenues for future development:

Advanced AI/ML Techniques: The broader adoption of cutting-edge AI methodologies is anticipated. This includes generative AI models like GANs, variational autoencoders (VAEs), and advanced LLMs (or transformers) for the de novo design of highly optimized sequences, potentially uncovering novel design principles by exploring previously uncharted regions of the vast sequence space.³⁹ Reinforcement learning could also be applied, where models learn to make sequential design choices that maximize a long-term objective, such as overall vaccine efficacy. Although models like Smart5UTR achieve high prediction accuracy, they remain black-box models, highlighting the need for Explainable AI (XAI) methods to uncover which sequence motifs or structural features contribute most to high MRL, thereby enhancing biological understanding and model trustworthiness.⁵¹

Modeling Novel Chemical Modifications: As new chemical modifications are developed to further enhance mRNA properties, computational models will need to be adapted or retrained to accurately predict their impact on mRNA structure, stability, translation, and immunogenicity.

Improved Prediction and Modulation of Immunogenicity: Moving beyond simple motif avoidance, future models will aim for a more sophisticated understanding and prediction of how mRNA sequences interact with the full spectrum of innate immune sensors. This includes not only minimizing unwanted innate responses but also potentially designing sequences that elicit optimal and tailored adaptive immune responses.

Standardized Benchmarking and Open Data Sharing: To foster robust progress and enable fair comparison of different computational approaches, the establishment of standardized benchmark datasets and community-wide challenges will be invaluable. Promoting open sharing of curated datasets and model architectures will accelerate development and validation across the field.

Building on these directions, three particularly actionable priorities stand out for the next stage of the field. First, progress will depend on standardized, high-quality experimental datasets with harmonized readouts (eg, matched measurements of stability/half-life, translation outputs such as MRL/TE, and innate immune activation markers), ideally generated under comparable cell types, delivery conditions, and chemical-modification settings, to enable robust model training and benchmarking. Second, future model development should increasingly prioritize explainability (eg, interpretable motif/structure attribution and mechanistic hypothesis generation), so that AI systems can move beyond “black-box” prediction to reveal actionable sequence–function rules. Finally, optimization objectives should be explicitly application-dependent: infectious-disease vaccines may prioritize rapid, high peak antigen expression with controlled innate sensing, whereas cancer vaccines and therapeutic protein replacement may require different trade-offs in durability, dosing frequency, and safety—arguing for tunable, context-aware multi-objective frameworks and benchmarks tailored to each use case.

The “no free lunch” theorem seems applicable to mRNA optimization; there is unlikely to be a single, universally perfect mRNA sequence or a one-size-fits-all optimization algorithm. Different therapeutic applications (eg, a vaccine requiring transient but very high antigen expression versus a protein-replacement therapy needing sustained, moderate expression) will likely demand distinct optimization strategies and involve different trade-offs. This necessitates the development of tunable, context-aware computational design tools. Moreover, the critical feedback loop between computational prediction and experimental validation will continue to evolve. Tighter, more rapid iterations, where high-throughput experimental data continuously refines AI models, will be key to enhancing their predictive accuracy and reliability, transforming AI from a mere design tool into an active partner in the scientific discovery process. Ultimately, the field is moving from “sequence optimization” toward “therapeutic optimization,” where the goal is not just an optimal mRNA molecule in isolation but an optimal therapeutic outcome in patients. This will require future models to integrate a broader range of factors, including delivery system characteristics, route of administration, patient-specific variables, and the underlying biology of the target disease.

Conclusion

The optimization of mRNA sequences stands as a critical frontier in the development of next-generation vaccines and therapeutics. The journey from basic RNA biology to clinically successful mRNA products has been accelerated by the advent of powerful computational tools, particularly those driven by artificial intelligence. These approaches are beginning to unravel the complex “RNA code” that governs an mRNA’s fate and function, enabling the design of molecules with enhanced stability, higher TE, and improved safety profiles. While significant challenges remain in bridging the in silico-in vivo gap and in fully modeling the dynamic nature of RNA regulation, the pace of innovation is rapid. Continued interdisciplinary collaboration among molecular biologists, immunologists, chemists, computational scientists, and clinicians will be paramount. By harnessing the combined power of deep biological understanding and sophisticated computational strategies, the immense potential of mRNA technology to address a wide spectrum of human diseases is poised to be realized, heralding a new era of precision medicine.

Supplemental Material

sj-docx-1-bbi-10.1177_11779322261431255 – Supplemental material for Recent Progress in Sequence Optimization of mRNA Vaccine: Biological Mechanism, Quantitative Metrics, and Computational Model

Supplemental material, sj-docx-1-bbi-10.1177_11779322261431255 for Recent Progress in Sequence Optimization of mRNA Vaccine: Biological Mechanism, Quantitative Metrics, and Computational Model by Yunwei Wang, Yuheng Cai, Zhixing Wu, Jingming Zhang, Lance Turtle and Jia Meng in Bioinformatics and Biology Insights

Footnotes

ORCID iDs

Yunwei Wang

Jia Meng

Ethical Considerations

Not applicable. This article does not report any studies with human participants or animals performed by any of the authors.

Author Contributions

Yunwei Wang: Conceptualization; Investigation; Writing—original draft; Writing—review & editing; Methodology; Visualization; Formal analysis; Data curation; Resources.

Yuheng Cai: Resources; Supervision; Project administration; Writing—review & editing.

Zhixing Wu: Visualization; Writing—review & editing; Supervision.

Jingming Zhang: Writing—review & editing; Supervision.

Lance Turtle: Supervision; Writing—review & editing.

Jia Meng: Conceptualization; Supervision; Resources; Project administration; Writing—review & editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: National Natural Science Foundation of China [31671373]; Scientific Research Foundation of Nanjing University of Chinese Medicine [013038030001]; XJTLU Key Program Special Fund [KSF-E-51 and KSF-P-02]. This work was supported by the Supercomputing Platform of Xi’an Jiaotong-Liverpool University.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Not applicable. This article is a narrative review and did not generate or analyze new datasets.

AI Usage Declaration

The authors used an AI language model (ChatGPT, OpenAI) to refine and enhance the clarity and readability of the manuscript text. No scientific data were generated, analyzed, or modified using AI tools. The authors take full responsibility for the accuracy, originality, and integrity of the submitted work.

Supplemental Material

Supplemental material for this article is available online.

References

1. Pardi, Hogan, Porter, Weissman. mRNA vaccines - a new era in vaccinology. Nat Rev Drug Discov. 2018;17:261-279. doi:10.1038/nrd.2017.243

2. Jackson, Anderson, Rouphael, et al. An mRNA vaccine against SARS-CoV-2 - preliminary report. N Engl J Med. 2020;383:1920-1931. doi:10.1056/NEJMoa2022483

3. Castillo-Hair, Seelig. Machine learning for designing next-generation mRNA therapeutics. Acc Chem Res. 2022;55:24-34. doi:10.1021/acs.accounts.1c00621

4. Thess, Grund, Mui, et al. Sequence-engineered mRNA without chemical nucleoside modifications enables an effective protein therapy in large animals. Mol Ther. 2015;23:1456-1464. doi:10.1038/mt.2015.103

5. Kudla, Murray, Tollervey, Plotkin JB. Coding-sequence determinants of gene expression in Escherichia coli. Science. 2009;324:255-258. doi:10.1126/science.1170160

6. Karollus, Avsec, Gagneur. Predicting mean ribosome load for 5′UTR of any length using deep learning. PLoS Comput Biol. 2021;17:e1008982. doi:10.1371/journal.pcbi.1008982

7. Leppek, Byeon, Kladwang, et al. Combinatorial optimization of mRNA structure, stability, and translation for RNA-based therapeutics. Nat Commun. 2022;13:1536. doi:10.1038/s41467-022-28776-w

8. Zhang, Lin, et al. Algorithm for optimized mRNA design improves stability and immunogenicity. Nature. 2023;621:396-403. doi:10.1038/s41586-023-06127-z

9. Cetnar, Hossain, Vezeau, Salis HM. Predicting synthetic mRNA stability using massively parallel kinetic measurements, biophysical modeling, and machine learning. Nat Commun. 2024;15:9601. doi:10.1038/s41467-024-54059-7

10. Castillo-Hair, Fedak, Wang, et al. Optimizing 5′UTRs for mRNA-delivered gene editing using deep learning. Nat Commun. 2024;15:5284. doi:10.1038/s41467-024-49508-2

11. Mignone, Gissi, Liuni, Pesole. Untranslated regions of mRNAs. Genome Biol. 2002;3:REVIEWS0004. doi:10.1186/gb-2002-3-3-reviews0004

12. Young, Wek RC. Upstream open reading frames differentially regulate gene-specific translation in the integrated stress response. J Biol Chem. 2016;291:16927-16935. doi:10.1074/jbc.R116.733899

13. Sonenberg, Hinnebusch AG. Regulation of translation initiation in eukaryotes: mechanisms and biological targets. Cell. 2009;136:731-745. doi:10.1016/j.cell.2009.01.042

14. Cockman, Anderson, Ivanov. TOP mRNPs: molecular mechanisms and principles of regulation. Biomolecules. 2020;10:969. doi:10.3390/biom10070969

15. Ambrosini, Destefanis, Kheir, et al. Translational enhancement by base editing of the Kozak sequence rescues haploinsufficiency. Nucleic Acids Res. 2022;50:10756-10771. doi:10.1093/nar/gkac799

16. Cai. A brief review on the mechanisms of miRNA regulation. Genomics Proteomics Bioinformatics. 2009;7:147-154. doi:10.1016/s1672-0229(08)60044-3

17. Bicknell, Reid, Licata, et al. Attenuating ribosome load improves protein output from mRNA by limiting translation-dependent mRNA decay. Cell Rep. 2024;43:114098. doi:10.1016/j.celrep.2024.114098

18. Hia, Takeuchi. The effects of codon bias and optimality on mRNA and protein regulation. Cell Mol Life Sci. 2021;78:1909-1928. doi:10.1007/s00018-020-03685-7

19. Bao, Loerch, Ling, Korostelev, Grigorieff, Ermolenko DN. mRNA stem-loops can pause the ribosome by hindering A-site tRNA binding. eLife. 2020;9:e55799. doi:10.7554/eLife.55799

20. Varshney, Spiegel, Zyner, Tannahill, Balasubramanian. The regulation and functions of DNA and RNA G-quadruplexes. Nat Rev Mol Cell Biol. 2020;21:459-474. doi:10.1038/s41580-020-0236-x

21. Lee, Lin, Kuo RL. Race with virus evolution: the development and application of mRNA vaccines against SARS-CoV-2. Biomed J. 2023;46:70-80. doi:10.1016/j.bj.2023.01.002

22. Marques, Lacerda, Romão. Internal Ribosome Entry Site (IRES)-mediated translation and its potential for novel mRNA-based therapy development. Biomedicines. 2022;10:1865. doi:10.3390/biomedicines10081865

23. Trotta. On the normalization of the minimum free energy of RNAs by sequence length. PLoS ONE. 2014;9:e113380. doi:10.1371/journal.pone.0113380

24. Terai, Kamegai, Asai. CDSfold: an algorithm for designing a protein-coding sequence with the most stable secondary structure. Bioinformatics. 2015;32:828-834. doi:10.1093/bioinformatics/btv678

25. Imani, Chen, et al. Computational biology and artificial intelligence in mRNA vaccine design for cancer immunotherapy. Front Cell Infect Microbiol. 2024;14:1501010. doi:10.3389/fcimb.2024.1501010

26. Qi, El-Kebir. Balancing minimum free energy and codon adaptation index for Pareto optimal RNA design. Leibniz Int Proc Inform. 2023;273:1-21. doi:10.4230/LIPIcs.WABI.2023.21

27. Demissie, Park S-Y, Moon, Lee D-Y. Comparative analysis of codon optimization tools: advancing toward a multi-criteria framework for synthetic gene design. J Microbiol Biotechnol. 2025;35:1-11. doi:10.4014/jmb.2411.11066

28. Lee, Weon, Lee, Kang. Relative codon adaptation index, a sensitive measure of codon usage bias. Evol Bioinform Online. 2010;6:47-55. doi:10.4137/ebo.s4608

29. Zhang, Cheng, et al. mRNAdesigner: an integrated web server for optimizing mRNA design and protein translation in eukaryotes. Nucleic Acids Res. 2025;53:W415-W426. doi:10.1093/nar/gkaf410

30. O’Brien, Ciryam, Vendruscolo, Dobson CM. Understanding the influence of codon translation rates on cotranslational protein folding. Acc Chem Res. 2014;47:1536-1544. doi:10.1021/ar5000117

31. Ando, Rashad, Begley, et al. Decoding codon bias: the role of tRNA modifications in tissue-specific translation. Int J Mol Sci. 2025;26:706

32. Limbu, Xiong, Wang. A review of ribosome profiling and tools used in Ribo-seq data analysis. Comput Struct Biotechnol J. 2024;23:1912-1918. doi:10.1016/j.csbj.2024.04.051

33. Chu, et al. A 5′ UTR language model for decoding untranslated regions of mRNA and function predictions. Nat Mach Intell. 2024;6:449-460. doi:10.1038/s42256-024-00823-9

34. Castruita JAS, Schneider, Mollerup, et al. SARS-CoV-2 spike mRNA vaccine sequences circulate in blood up to 28 days after COVID-19 vaccination. APMIS. 2023;131:128-132. doi:10.1111/apm.13294

35. Liu, Zhao. Optimizing mRNA translation efficiency through rational 5′UTR and 3′UTR combinatorial design. Gene. 2025;942:149254. doi:10.1016/j.gene.2025.149254

36. Moqtaderi, Geisberg, Struhl. Secondary structures involving the poly(A) tail and other 3′ sequences are major determinants of mRNA isoform stability in yeast. Microb Cell. 2014;1:137-139. doi:10.15698/mic2014.04.140

37. Wayment-Steele, Kim, Choe, et al. Theoretical basis for stabilizing messenger RNA through secondary structure design. Nucleic Acids Res. 2021;49:10604-10617. doi:10.1093/nar/gkab764

38. LLY, Schiess GHA, Miranda, Weber, Astakhova. Pseudouridine and N1-methylpseudouridine as potent nucleotide analogues for RNA therapy and vaccine development. RSC Chem Biol. 2024;5:418-425. doi:10.1039/d4cb00022f

39. Wang, Yang, et al. Deep generative optimization of mRNA codon sequences for enhanced protein production and therapeutic efficacy. bioRxiv [Preprint]. 2024. doi:10.1101/2024.09.06.611590

40. Agarwal, Kelley DR. The genetic and biochemical determinants of mRNA degradation rates in mammals. Genome Biol. 2022;23:245. doi:10.1186/s13059-022-02811-x

41. Chen, Ezzeddine, Shyu AB. Messenger RNA half-life measurements in mammalian cells. Methods Enzymol. 2008;448:335-357. doi:10.1016/s0076-6879(08)02617-7

42. Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31:3429-3431. doi:10.1093/nar/gkg599

43. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406-3415. doi:10.1093/nar/gkg595

44. Sato, Kato, Hamada, Akutsu, Asai. IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming. Bioinformatics. 2011;27:i85-i93. doi:10.1093/bioinformatics/btr215

45. Raab, Graf, Notka, Schödl, Wagner. The GeneOptimizer algorithm: using a sliding window approach to cope with the vast sequence space in multiparameter DNA sequence optimization. Syst Synth Biol. 2010;4:215-225. doi:10.1007/s11693-010-9062-3

46. Grote, Hiller, Scheer, et al. JCat: a novel tool to adapt codon usage of a target gene to its potential expression host. Nucleic Acids Res. 2005;33:W526-W531. doi:10.1093/nar/gki376

47. Faizi, Sakharova, Lareau LF. A generative language model decodes contextual constraints on codon choice for mRNA design. bioRxiv [Preprint]. 2025. doi:10.1101/2025.05.13.653614

48. Moayedpour, et al. CodonBERT large language model for mRNA vaccines. Genome Res. 2024;34:1027-1035. doi:10.1101/gr.278870.123

49. Noroozizadeh, Moayedpour, et al. mRNA-LM: full-length integrated SLM for mRNA analysis. Nucleic Acids Res. 2025;53:gkaf044. doi:10.1093/nar/gkaf044

50. Barazandeh, Ozden, Hincer, Seker UOS, Cicek AE. UTRGAN: learning to generate 5′ UTR sequences for optimized translation efficiency and gene expression. Bioinform Adv. 2025;5:vbaf134. doi:10.1093/bioadv/vbaf134

51. Tang, Huo, Chen, et al. A novel deep generative model for mRNA vaccine development: designing 5′ UTRs with N1-methyl-pseudouridine modification. Acta Pharm Sin B. 2024;14:1814-1826. doi:10.1016/j.apsb.2023.11.003

52. Zhao, Chen, Dai, et al. Harnessing computational strategies to overcome challenges in mRNA vaccines. Physiology. 2025;40:487-501. doi:10.1152/physiol.00047.2024
