Nucleic acid modifications in DNA and RNA exist ubiquitously across all three domains of life. This trait significantly broadens genome diversity and serves as an important means of regulating gene transcription. Although mammalian systems have limited types of DNA modifications, over 150 different RNA modification types have been identified, with wide chemical diversity. Most modifications occur on transfer RNA and ribosomal RNA; however, many also occur on other RNA species, including mammalian mRNA and small nuclear RNA, where they are essential for many biological roles, including developmental processes and stem cell differentiation. These post-transcriptional modifications are enzymatically installed and removed in a site-specific manner by writer and eraser proteins, respectively, while reader proteins can interpret modifications and transduce the signal for downstream functions. Dysregulation of mRNA modifications manifests as disease states, including multiple types of human cancer. In this review, we will introduce the chemical features and biological functions of these modifications in coding and non-coding RNA species.
Enzyme-mediated chemical modifications on RNA species across all domains of life have been documented for more than 50 years. A large repertoire of over 100 distinct modifications has since been described, with the first report of modified nucleotides from highly abundant RNAs dating back to 1960 [1,2]. These post-transcriptional modifications are enzymatically installed and removed in a site-specific manner by writer and eraser proteins, respectively, while reader proteins can interpret modifications and transduce the signal for downstream functions. These enzymes utilize myriad reactions, such as methylations, deaminations, thiolations, glycosylations, and isomerizations. Advances in mass spectrometry and high-throughput sequencing have enabled the detection and characterization of modifications in relatively lowly expressed RNA species, revitalizing the study of RNA modifications and shaping the field of epitranscriptomics. In this review, we will introduce the chemical features and biological functions of these modifications in both coding and non-coding RNA species.
Chemical properties and biological functions of transfer RNA modifications
Transfer RNA structure
Across all organisms, transfer RNAs (tRNAs) undergo the most diverse chemical modifications as part of their maturation and function. Many unique modifications have only been described in the context of tRNA and ribosomal RNA (rRNA), both of which have historically served as the frontier for the discovery and characterization of RNA modifications. As such, there is no better place to begin exploring the variety of RNA modifications than in the world of tRNA. We will first explore tRNA extensively to cover a large swath of chemical diversity, and then focus on the biological implications of modifications of mRNA and other RNA species that encode dynamic regulatory information.
The canonical and most common tRNA consists of 76 nucleotides and is conserved across all domains of life. It adopts a clover-leaf secondary structure that terminates with a CCA at the 3′ end, with the 2′- or 3′-OH of the terminal adenosine acting as the site of aminoacylation during tRNA charging. The arms of the clover-leaf structure are termed the acceptor stem, the dihydrouridine stem–loop (D-loop), the anticodon stem–loop, and the thymine–pseudouridine–cytosine stem–loop (TΨC-loop) (Figure 1A). Once the tertiary L-shaped structure is adopted (Figure 1B), the TΨC-loop and the acceptor stem stack to form the 12 base pair (bp) acceptor-TΨC minihelix, while the stacking of the anticodon stem–loop with the D stem–loop forms the 10 bp anticodon D-loop dumbbell. The tertiary right-angle L-shape is achieved largely through interactions between conserved nucleotides of the TΨC and D stem–loops, which stabilize the structure. True to its name, the variable loop (V-loop) is where any additional nucleotides beyond the canonical 76 are most often found; it tends to bulge outward, avoiding steric hindrance.
Figure 1. Transfer RNA secondary and tertiary structure.
In addition to cytoplasmic tRNAs, animal cells also host a pool of mitochondrial tRNAs, which are usually shorter than their cytosolic counterparts. Interestingly, these tRNAs often have smaller loop structures and can even be missing entire domains, highlighting the versatility of tRNAs and their high capacity to function after various processing events. It is also noteworthy that mitochondrial tRNAs are enriched in adenosine and uracil, resulting in reduced stability that is thought to promote base pairing misalignments that are more susceptible to base substitutions, resulting in several mitochondrial tRNA-associated human diseases [5–7].
Isoacceptors and isodecoders
tRNA is the second most abundant species of RNA in cells, following rRNA. There are roughly 300 tRNA genes in the human genome that have been validated for expression, and tRNA gene copy number and sequence diversity vary within the human population [8,9]. Because tRNAs far outnumber the 20 canonical amino acids used for polypeptide synthesis, there are tRNA isoacceptors: tRNAs with distinct sequences, including different anticodon sequences, that are charged with the same amino acid. In higher eukaryotes, tRNAs with matching anticodons and differing body sequences are known as isodecoders. As an example of the increased complexity of tRNAs brought about by these sequence variations, consider the large group of alanine-specific tRNAs (tRNAAla). There are 39 distinct tRNAAla in humans, split into three isoacceptor families with multiple isodecoders present in each family [9,10]. This seemingly unnecessary degree of complexity in the world of tRNAs underscores the importance of these molecules beyond their known functions in translation and is a trend that is taken further by tRNA modifications.
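The distinction between isoacceptors and isodecoders can be made concrete with a small grouping sketch. The records below are hypothetical placeholders (invented names and short invented body sequences, not real tRNA genes); the point is only the two-level grouping, first by amino acid and anticodon, then by body sequence.

```python
from collections import defaultdict

# Hypothetical toy records: (name, amino acid, anticodon, body sequence).
# Names and sequences are placeholders for illustration, not real tRNA genes.
TOY_TRNAS = [
    ("toy-Ala-AGC-1", "Ala", "AGC", "GGGGAUGUAGC"),
    ("toy-Ala-AGC-2", "Ala", "AGC", "GGGGGUGUAGC"),
    ("toy-Ala-CGC-1", "Ala", "CGC", "GGGGAUGUAGC"),
    ("toy-Ala-UGC-1", "Ala", "UGC", "GGGGAUGUAGC"),
    ("toy-Lys-UUU-1", "Lys", "UUU", "GCCCGGAUAGC"),
]


def group_trnas(records):
    """Return {amino acid: {anticodon: set of distinct body sequences}}."""
    families = defaultdict(lambda: defaultdict(set))
    for _name, amino_acid, anticodon, body in records:
        families[amino_acid][anticodon].add(body)
    return families


def summarize(records):
    """Count isoacceptor families and isodecoders per family for each amino acid."""
    summary = {}
    for amino_acid, by_anticodon in group_trnas(records).items():
        summary[amino_acid] = {
            # Distinct anticodons charged with the same amino acid.
            "isoacceptor_families": len(by_anticodon),
            # Distinct body sequences sharing one anticodon.
            "isodecoders_per_family": {
                anticodon: len(bodies) for anticodon, bodies in by_anticodon.items()
            },
        }
    return summary


SUMMARY = summarize(TOY_TRNAS)
```

In this toy set, the alanine records fall into three isoacceptor families, one of which contains two isodecoders.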
tRNA modifications and their chemical properties
The overwhelming majority of the more than 100 unique annotated modified nucleotides have been found on tRNAs [1,11]. Some modifications, such as isopentenyl adenosine (Figure 2A) or wybutosine (Figure 2B), are infrequent and chemically complex, while other modifications, such as dihydrouridine or pseudouridine (Figure 2C), are present in nearly all tRNAs. These are only a small fraction of the vast array of modifications found on tRNAs; human tRNAs on average harbor 11–13 modifications in a variety of combinations [12,13]. These numerous unique tRNA modifications provide a model to explore the chemical diversity of modified RNA. Detailed below is the current physicochemical and biological understanding of several chemical modifications to specific tRNA nucleosides.
Figure 2. The chemical structures of nucleoside and ribose modifications addressed in this review.
N1-methyladenosine in tRNA
In the two-dimensional clover-leaf structure of tRNAs, nucleoside 9 has been shown to be critical for proper folding, most notably in the case of smaller mitochondrial tRNAs. N1-methyladenosine (Figure 2A) at A9 (m1A9) can disrupt the Watson–Crick base pairing of an intra-stem interaction during maturation of mitochondrial tRNALys, favoring the functional, canonical clover-leaf secondary structure [14–16]. Lack of the modification favors the intra-stem interaction, shifting the equilibrium toward a stable alternative extended hairpin structure that is not suitable for aminoacylation or translation. During tRNALys maturation, m1A9 modification confers 0.7–1 kcal/mol of stability, depending on Mg2+ concentration. This structural dependence on m1A9, and indeed on many other base modifications, is a common theme in animal mitochondrial tRNA biogenesis.
As noted above, methylation of nucleosides can play an enormous role in the maturation of tRNAs. One of the critical steps of tRNA maturation is the convergence of the inward ends of each stem on the clover-leaf structure, forming the characteristic clover-leaf junction in the secondary structure and leading to the formation of an internal loop characteristic of tRNA's L-shaped tertiary structure. Methylations within this junction are crucial in preventing each stem from forming a spatially tight interaction, keeping the core open enough to allow proper folding. The family of 2-methylated guanosines, N2-methylguanosine (m2G), N2,N2-dimethylguanosine (m22G), and N2,N2,2′-O-trimethylguanosine (m22Gm), is highly conserved at tRNA positions 10 and 26 across all domains of life, and several other tRNA positions also bear methylguanosine modifications (Figure 2B). The methyl groups are located on the Watson–Crick face, disrupting proper base pairing and, in the case of positions 10 and 26, terminating the duplexes in tRNA. It is noteworthy that m2G has been shown to be energetically identical with guanosine in the context of a G to C pairing in a synthesized RNA duplex, but it likely has a more disruptive role in the context of tRNA. Additionally, m22G has an undisputed effect on duplex stability and base pairing properties, most notably that m22G stably forms a pseudo-Watson–Crick pair with adenosine (see below). Interestingly, substitution experiments replacing guanosine with inosine indicate that the 2-methylamino group of guanosines at tRNA positions 2, 3, and 10 plays an important role in tRNA-protein recognition. These studies highlight how conserved methylated guanosine nucleosides can act as critical points of identification and recognition during aminoacylation.
Methylation of guanosine 26 occurs in roughly 80% of eukaryotic tRNAs that contain guanosine at position 26. However, there are documented unmodified G26 nucleosides, such as in yeast tRNAAsp. A crystal structure of an RNA duplex containing two m22G:A pairs has shown that the modified guanosine adopts an imino-hydrogen bond with adenosine, which explains the mismatch pairing of G26 and A44 in these tRNAs [4,18,23]. A study on the biological effects of tRNA m22G26 showed that a reduction in RNA polymerase III production of tRNAs correlates with an increasing trend of m22G tRNA modification in yeast and human cells. It is known that m22G is important for decoding during translation, acting as an ‘activator’ of tRNAs and increasing tRNA usage in the ribosome. It is thought that there is an inverse link between the rate of RNA polymerase III activity and m22G26-linked activation of tRNA, perhaps as a compensatory mechanism for the slowdown in tRNA production [24,25].
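To give a sense of scale for the 0.7–1 kcal/mol figure, a free energy difference can be converted into a fold change in an equilibrium ratio using K = exp(ΔΔG/RT). The sketch below assumes a simple two-state equilibrium between the clover-leaf and the extended hairpin and a temperature of 37 °C; both are illustrative assumptions, not conditions taken from the cited measurements.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_shift(ddg_kcal_per_mol, temp_celsius=37.0):
    """Fold change in a two-state equilibrium ratio for a stabilization of
    ddg_kcal_per_mol (positive = more stable), at the given temperature."""
    temp_kelvin = temp_celsius + 273.15
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temp_kelvin))


# Assumed 37 degrees C; the source does not state the temperature used.
low = fold_shift(0.7)
high = fold_shift(1.0)
print(f"0.7 kcal/mol -> {low:.1f}-fold; 1.0 kcal/mol -> {high:.1f}-fold")
```

Under these assumptions, a stabilization of this size corresponds to roughly a three- to five-fold shift in the folded population ratio: modest per molecule, but enough to change which conformer dominates.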
Pseudouridine in tRNA
Pseudouridine (Ψ) is one of the most abundant modifications of RNA and is found in a variety of different RNA species such as tRNA, rRNA, mRNA, and some small RNA species [26–28]. Ψ is an isomer of uridine in which the N1-C1′ glycosidic bond between the nucleobase and the sugar is replaced with a C5-C1′ bond (Figure 2C). This slight alteration of uridine to Ψ can induce drastic differences in the structure and function of RNAs. The unique C–C bond between the sugar and base affords greater rotational freedom to Ψ, and the free N1-H imino group can potentially act as an additional hydrogen bond donor outside of the Watson–Crick face. In the context of tRNAs, Ψ can be found on all loops. However, it most frequently plays a structural role as part of the TΨC or anticodon loop due to the striking effect on RNA stability conferred by the sugar of Ψ through a property known as sugar puckering.
The helix of an RNA containing a Ψ can be thermodynamically stabilized by the C3′-endo sugar pucker (Figure 2F) [26,29,30]. The puckered sugar intrinsically has increased stacking ability and can induce a localized increase in helix stability. This sugar conformation restricts the base to an axial anti conformation which also confers additional rigidity and stability to local helices, resulting in an additive, doubly stabilized Ψ residue [31–33]. Computational and NMR studies indicate that Ψ may fluidly adopt several different puckered conformations, but it is apparent that the context of Ψ in an RNA sequence contributes to the particular bond conformations that appear at a given residue [34–36].
Ψ in tRNA generally does not affect the overall three-dimensional structure and has been shown to be non-essential for cell viability and aminoacylation. However, Ψ’s stabilizing effects on the local structure are critical for the maintenance of the anticodon loop and, in turn, proper binding and recognition of tRNAs by the ribosome. It is believed that the stability conferred to the anticodon loop by Ψ may induce a tighter binding of the appropriate tRNA to the ribosome and facilitate codon–anticodon interactions [31,33,37,38]. This less remarkable role, while not essential, may nonetheless have a meaningful impact on maintaining proper, homeostatic levels of translational accuracy by slowing the rate of peptide synthesis and facilitating the rejection of improper tRNAs.
Dihydrouridine in tRNA
As the name may suggest, the most common dihydrouridine (D) modification (Figure 2C) in tRNAs is found within the D-loop and is conserved in nearly all life. Reduction of the double bond between C5 and C6 of uridine generates D, and the presence of this modification in the D-loop is known to promote tRNA secondary and tertiary structure [40–42]. The nucleoside structure of D indicates that the loss of aromaticity prevents D from making base stacking interactions. Additionally, the ribose sugar of D prefers the C2′-endo conformation over C3′-endo (Figure 2F), unlike uridine, which does not favor either of these two conformations. This conformational preference is due to the loss of a pi orbital interaction between uridine and its cognate sugar, which is no longer possible after conversion to D [44,45]. This induction of the ribose C2′-endo conformation by D can ultimately have a dynamic effect on tRNA structure and stability. Dihydrouridine has been shown to both destabilize RNA helices and decrease melting temperatures in vitro, but it is also capable of inducing stable hairpin loop structures within the D-loop. The destabilizing nature of D proves critical in mediating the dynamic, flexible sugar arrangements necessary for proper D-loop folding.
5-methylcytosine in tRNA
The distribution of 5-methylcytosine (m5C) (Figure 2D) across the tree of life is more varied than that of some other modifications. It is found in the tRNA and rRNA of eukaryotes and archaea but is absent from Escherichia coli tRNAs, despite being present in E. coli rRNA. In tRNAs, m5C is most consistently found at positions 48 and 49 as part of the junction between the variable loop and the TΨC loop and is known to promote tRNA stability and protein synthesis in eukaryotes and prokaryotes [48,49]. The methyl groups of m5C48 and m5C49 are not located on the Watson–Crick interface of the nucleobase and thus do not interfere with canonical base pairing interactions. A ubiquitous component of all tRNAs is a non-canonical base pairing between nucleotides 15 and 48 known as the Levitt pair (Figure 3). This pair assumes a reverse Watson–Crick base pairing geometry, or trans arrangement, that helps to join the D-loop and the variable loop. Nucleosides 48 and 49 are modified to m5C in 26% and 80% of tRNAs, respectively, but what this modification contributes to this system is not known [51,52]. A study of the thermodynamic parameters of m5C40 in yeast tRNAPhe noted an increase in the melting temperature to a similar extent as Ψ in the same tRNA, although the contribution of m5C to the free energy of the tRNA was negligible. When investigating the biological functions of m5C in tRNAs, one group has shown that the growth and development of mice and flies are disturbed following the loss of tRNA m5C modifications conferred by two methyltransferases, DNA methyltransferase 2 (Dnmt2) and NOP2/Sun RNA methyltransferase family member 2 (Nsun2) [48,53]. The biological roles of m5C are more apparent in the realm of mRNA, as will be discussed.
Figure 3. The chemical structure of Watson–Crick base pairing and Levitt base pairing between G and C.
The chemical variety of tRNA anticodon loop
The anticodon loop, responsible for decoding during translation, is the most chemically varied portion of tRNA, especially positions 34 and 37 [4,11,47]. Chemical modifications at these two positions have implications in a diverse set of roles related to RNA biology such as the maturation, stability, and conformational variety of tRNAs [54–58], enhancement of ribosomal A-site entry and binding [56,58–64], enhancement of mRNA translocation during translation, and modulation of mRNA decoding, frame maintenance, and frameshifting [65–67].
Position 34 of tRNA
Position 34 is often called the wobble position due to Francis Crick's Wobble Hypothesis, which states that a guanosine, uridine, or inosine at position 34 can pair with two or three different nucleosides. This idea has since been updated to account for the dynamic changes in codon recognition conferred by chemical modifications at the wobble position. Modifications at position 34, especially in the case of uridines, can both expand and contract the decoding range of a given tRNA, and over 80% of cytosolic tRNAs across species carry a modified nucleoside at position 34 [56,70,71]. Surprisingly, the majority of modifications to nucleosides at position 34 occur at position 5 of pyrimidines, opposite the Watson–Crick face, indicating that these modifications have an indirect effect on the chemistry of the anticodon–codon interface [11,68].
A common effect of modifications to position 5 of a pyrimidine is the induction of either a C3′- or C2′-endo ribose sugar pucker conformation, which has been shown to modulate the decoding capabilities of tRNAs (Figure 2F) [69,72–74]. An example of a modification that usually restricts decoding is the 2-thio-5-methyluridine family of modifications at position 34. The van der Waals radius of the sulfur atom sterically clashes with the ribose 2′-oxygen atom, favoring the C3′-endo conformation and ultimately restricting decoding of the third position of the codon to adenosine. Conversely, modification families that lack the additional sulfur group, such as 5-hydroxyuridine or 5-methyluridine, can adopt both sugar pucker forms and therefore have an expanded wobble recognition repertoire that includes adenosine, guanosine, and uridine. The 5-taurinomethyluridine modification at position 34 (τ5mU34) is a considerably larger modification that is common to several mitochondrial tRNAs (Figure 2C). Through crystallographic and computational studies, τ5mU34 in mitochondrial tRNALeuUAA has been shown to stabilize hydrogen bonding with adjacent nucleosides and to contribute to the wobble pairing of U34 and G3 via increased stacking interactions [62,75]. Similarly, bacterial geranylated 2-thiouridine (ges2U34) (Figure 2C) facilitates this same wobble pairing due to the large hydrophobic geranyl group attached at the 2-thio position [76–78].
An additional layer of mismatch pairing at the wobble position is conferred by modifications that allow for tautomerization of the nucleobase. Modification of uridine 34 to 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U34) or 5-methylaminomethyl-2-thiouridine (mnm5s2U34) permits both human and E. coli tRNALysUUU to recognize the wobble codon AAG (Figure 2C). Crystal structures have shown that these modifications can shift the keto–enol equilibrium to favor the enol form, which permits a guanosine–uridine base pairing that maintains Watson–Crick geometry within the ribosome [79,80]. Maintaining Watson–Crick geometry despite a mismatched base pair is thought to help such mismatches avoid rejection by the translational fidelity machinery and therefore allows tRNAs to utilize an expanded wobble recognition [79,81].
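The restriction and expansion of decoding described above can be summarized as a lookup from the modification class at U34 to the set of third-codon bases it reads. The sketch below is a deliberate simplification: the class labels are illustrative, and the assumption that the taurinomethyl and geranyl classes read A in addition to the G they are described as stabilizing is ours, not a statement from the cited studies.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Third-codon-base sets read by a modified U34, simplified from the text.
# Class labels are illustrative; reading of A by the last two classes is assumed.
U34_DECODING = {
    "2-thio-5-methyl": {"A"},            # C3'-endo favored, restricted decoding
    "5-hydroxy": {"A", "G", "U"},        # no sulfur, both puckers, expanded decoding
    "5-taurinomethyl": {"A", "G"},       # stabilizes the U34-G3 wobble pair
    "geranyl-2-thio": {"A", "G"},        # facilitates the same wobble pair
}


def codons_read(anticodon, u34_class):
    """Return the sorted codons (5'->3') read by an anticodon (5'->3')
    whose first base, position 34, is a uridine of the given class."""
    if len(anticodon) != 3 or anticodon[0] != "U":
        raise ValueError("expected a three-base anticodon starting with U34")
    # Pairing is antiparallel: codon bases 1 and 2 pair with positions 36 and 35.
    first = COMPLEMENT[anticodon[2]]
    second = COMPLEMENT[anticodon[1]]
    return sorted(first + second + third for third in U34_DECODING[u34_class])


print(codons_read("UUU", "2-thio-5-methyl"))   # ['AAA']
print(codons_read("UAA", "5-taurinomethyl"))   # ['UUA', 'UUG']
```

The table captures only the general rule for each class; context-dependent exceptions, such as tautomer-mediated reading of G by 2-thiolated uridines, are not modeled.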
Position 37 of tRNA
Modifications at purine-invariant position 37, directly 3′-adjacent to the anticodon, can have varying effects on translational accuracy and efficiency [59,64,82,83]. Generally speaking, modifications at position 37 most often form a hydrophobic platform that supports the presentation of the anticodon to the codon and also proactively structures the anticodon loop for chemically favorable recycling [80,84–86]. When the first base of an mRNA codon is a uridine or an adenosine, position 37 of the tRNA is nearly always a modified adenosine, which can often sport complex chemistry, such as N6-isopentenyladenosine (i6A) or N6-threonylcarbamoyladenosine (t6A) (Figure 2A). Phenylalanine tRNAs from archaea and eukarya have a G37 in place of A37, which can respond to codons starting with U. In the case of archaeal, eukaryotic, and mitochondrial tRNAs containing a G37 in place of A37, tricyclic wyosine (imG) or its derivative wybutosine (yW) is often found (Figure 2B) [87,88]. Each of these tRNAs also carries an invariant U33 that would form a stable cross-loop H-bond with position 37 if not for the elaborate modifications of A37 and G37 [56,84]. The rotation of t6A and i6A modifications induces steric hindrance that prevents the nucleoside from slipping into the helix, effectively disrupting the hydrogen bond between A37 and U33. Modification of G37 to imG37 or yW37 can provide a large platform above the first codon–anticodon interaction. Altogether, the complex structures of these modifications can provide considerable energetic stability and accuracy to the codon–anticodon interaction through van der Waals forces, hydrophobic effects, and solvent displacement [80,90,91].
There are numerous other modifications to tRNA, more than can be feasibly discussed here in any significant detail. Even so, the variety and plasticity of modifications in tRNA make apparent the critical and dynamic role they play in RNA structure and biology. From here, we will discuss the variety and current understanding of modifications to mRNA.
Chemical properties and biological effects of mRNA modifications
The discovery of reversible mRNA modifications, together with improvements in transcriptome-wide high-throughput sequencing, has generated a wave of interest and research towards understanding the role these modifications play in a wide range of cellular processes. Chemical modifications to mRNA may interact directly with specific reader proteins to recruit machinery for cellular processes such as translation, decay, or localization. In addition to direct reader protein recognition, these modifications can also affect the secondary structure of mRNA transcripts and can be interpreted via indirect reader protein–RNA substrate recognition. Of the various modifications, mRNAs especially utilize methylation, which provides unique regulatory signaling capacities while minimally perturbing mRNA nucleosides, thus maintaining translational efficiency. The use of the S-adenosylmethionine (SAM) cofactor, whose synthesis requires the net investment of 12–13 ATP, supports the importance of this modification in cell function. The catalog of mRNA modification-specific reader, writer, and eraser proteins has continued to expand (Figure 4), yet there are still several important modifications that have either partial or no known enzymes associated with their deposition and regulation. However, we now know that the intricate and expanding network of readers, writers, and erasers that regulate the modification of mRNA confers a dynamic, combinatorial layer of complexity on what was once thought of as solely a gene expression intermediate.
Figure 4. Modifications and proteins related to mRNA modifications.
N6-methyladenosine in mRNA
N6-methyladenosine (m6A) (Figure 2) is the most prevalent internal modification in eukaryotic mRNA, occurring at an average of three sites per mRNA molecule [93,94]. The methyl group of m6A can adopt either the syn or the anti conformation depending on the structural context of the given RNA. In single-stranded RNA (ssRNA), the syn conformation is the most energetically favorable geometry adopted by the nucleotide [95,96]. In double-stranded RNA (dsRNA), the methyl group must adopt the anti conformation to accommodate A to U Watson–Crick base pairing, positioning the methyl group in the major groove of an RNA helix and resulting in a 0.5–1.7 kcal/mol destabilization compared with an unmodified dsRNA [96,97]. This property has proved important as a means for m6A to induce localized structural switches from dsRNA to ssRNA [92,98]. These m6A switches allow for the indirect reading of m6A by readers that recognize not the modification itself but the primary sequence exposed or otherwise modulated by the structural switch. This includes the indirect reader ribonucleoproteins HNRNPA2B1 and HNRNPC [92,99]. The orientation of m6A towards the Watson–Crick geometry also explains why most reverse transcriptase enzymes are unimpeded by m6A modifications, as many polymerases can likely accommodate the anti methyl group in the major groove.
The heterodimeric enzyme complex of METTL3 and METTL14 is considered the canonical N6-adenosine methyltransferase complex responsible for the majority of mRNA m6A modifications [100–102]. Notably, crystallographic and biochemical studies have shown that METTL3 and METTL14 function cooperatively, with the enzymatic activity greatly enhanced by the structural support of METTL14. Additionally, WTAP serves as an associated subunit of METTL3/METTL14 that increases localization of the complex into nuclear speckles, while KIAA1429 has been shown to be critical in maintaining complex activity and cellular m6A profiles [103,104]. Other interactors of the complex include RBM15/RBM15B, which bind U-rich regions near DRACH, the consensus sequence of m6A, to recruit the m6A writer complex.
m6A is known to play an essential role in the processing and maturation of pre-mRNAs in the nucleus. An essential step of the mRNA maturation process is splicing, in which the excision of introns and the retention and ligation of exons from pre-mRNA transcripts is enzymatically driven and combined with additional processing mechanisms to form mature mRNAs. Additionally, alternative splicing can differentially include or exclude exons from a given mRNA. The mixture of splicing events allows for the generation of different mature mRNAs that encode different protein isoforms. This process allows for additional levels of regulation and diversity within the proteome. Several m6A motifs have been found within introns, and studies on several writers and erasers of the m6A modification have shown effects on alternative splicing [106,107]. Knockout of METTL3 in mouse embryonic stem cells has shown trends of intron retention and exon skipping in transcripts, while depletion of the putative m6A eraser enzyme Fat mass and obesity-associated protein (FTO) results in increased levels of m6A near 5′ and 3′ splicing sites as well as enhanced recruitment of the serine-/arginine-rich splicing factor 2 (SRSF2) and increased exon inclusion [108,109]. The m6A reader protein YTHDC1 has been shown to associate with SRSF3 to inhibit binding of SRSF10 and promote exon inclusion rather than skipping.
A similar process to alternative splicing is alternative polyadenylation (APA), in which APA sites within coding or non-coding regions of transcripts can be cleaved and subsequently polyadenylated, effectively modulating transcript length and protein-coding potential. Transcript isoforms enriched with m6A are correlated with generally shorter 3′ UTRs, although what this means functionally, and whether m6A is a causative agent in APA-dependent 3′ UTR shortening, is unclear.
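The DRACH consensus can be written out with the standard IUPAC nucleotide codes (D = A, G, or U; R = A or G; H = A, C, or U), with the methylated adenosine at the central position. The following sketch scans a sequence for candidate sites; the function name and the example sequences are hypothetical, and a motif match marks only a potential m6A site, not an actual modification.

```python
import re

# IUPAC codes in RNA letters: D = A/G/U, R = A/G, H = A/C/U.
# The lookahead allows overlapping motif occurrences to be reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def find_drach_sites(rna):
    """Return (0-based index of the central A, motif) for each DRACH match.
    DNA input is accepted by converting T to U."""
    rna = rna.upper().replace("T", "U")
    return [(m.start() + 2, m.group(1)) for m in DRACH.finditer(rna)]


# Hypothetical example sequence, not taken from a real transcript.
print(find_drach_sites("CCGGACUCC"))  # [(4, 'GGACU')]
```

Because the motif is short and degenerate, it occurs far more often than m6A itself, which is consistent with the need for recruiting factors such as those described above.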
Along with the maturation of pre-mRNA into mRNA, m6A is also involved in nuclear export. Studies have suggested that depletion of m6A results in loss of nuclear export of some mRNAs. Knockdown of METTL3 was shown to inhibit mRNA export and knockdown of the eraser enzyme ALKBH5 conversely enhanced mRNA export [112,113]. This evidence suggests reader proteins play a crucial role in identifying specific m6A motifs on mRNAs and facilitate their export into the cytosol.
Once exported to the cytosol, m6A plays further roles in enhancing the translation of mRNAs. The m6A reader protein YTHDF1 has shown the ability to enhance the translation efficiency of m6A-containing transcripts by binding m6A motifs and recruiting translation initiation machinery such as eIF3. The IGF2BP class of m6A reader proteins also promotes translation and increases the stability and storage of target mRNAs. In another, completely independent pathway, METTL3 has also been shown to recruit eIF3 to enhance eIF4E-dependent translation.
In addition to the effective translation of transcripts, m6A promotes the destabilization and degradation of mRNA. It has been shown that, in addition to the recruitment of YTHDF1 for translation efficiency, m6A is also recognized by YTHDF2, which targets transcripts for degradation. Though not entirely understood, evidence suggests that YTHDF3 also promotes translation efficiency and degradation by cooperating with YTHDF1 and YTHDF2. This quality of m6A allows for swift and short bursts of translation and dynamic changes in gene expression in which modified transcripts are quickly translated and degraded. A recent study demonstrated that YTHDC2 might be capable of chaperoning both aspects of m6A bursts by interacting with both XRN exoribonuclease and the small subunit of the ribosome. This property also provides additional levels of control and regulation so that cells only invest energy in translating crucial transcripts, and it is often employed during dynamic cellular physiological events such as differentiation.
N1-methyladenosine (m1A) in mRNA
m1A was initially found only in tRNA and rRNA. It long went undetected in mRNA because of its extremely low abundance there. In fact, a single transcript may possess only a single m1A modification, which usually lies near the translation start site and the first splicing site [120,121].
The chemistry of m1A endows a positive charge, which could play an essential role in reader protein interaction and potentially affect base pairing and secondary structure. As was discussed for tRNAs, m1A can disrupt canonical Watson–Crick base pairing and induce local RNA duplex melting. Little is known about the biological role of m1A, although it is well known for its inherent ability to stall reverse transcription. mRNA and tRNA share the same m1A writer, the TRMT6/61 complex, which acts upon the same consensus motif in both RNA species. ALKBH3 is responsible for m1A demethylation on both RNA and DNA species. Transcriptome-wide mapping of m1A allowed for the identification of several dynamic sites of ALKBH3 demethylation. Also, gene ontology analysis using the transcriptome mapping data indicated that the m1A modification is involved in a wide variety of cell functions and is mainly enriched in the 5′ UTR or near the start codon. Currently, m1A is believed most likely to play a role in promoting translation, but more evidence is needed, such as the identification of additional m1A-specific writers, readers, and erasers.
5-methylcytosine (m5C) in mRNA
m5C is best known as a DNA modification and for its role in epigenetics. However, this modification exists in mRNA as well, although its biological role was not well studied until recent years. The classic bisulfite treatment, which has been used to characterize the distribution and function of this modification in DNA, has also been used to characterize m5C's role in RNAs. The tRNA methyltransferase Nsun2 (Figure 4) was shown to methylate cytosine in mRNAs and non-coding RNAs in addition to tRNA [125–127]. A few studies identified oxidative derivatives of m5C mediated by TET eraser enzymes (Figure 4) [128,129]. The identification of these derivatives suggests that the nucleotide could be involved in several signaling pathways related to its sequential modification and breakdown. There is also evidence that m5C plays a role in promoting the nuclear export of target transcripts. The m5C reader and nucleocytoplasmic shuttle ALYREF (Figure 4) has been shown to work with Nsun2 to promote the export of m5C transcripts. Knockdown of ALYREF was shown to result in an accumulation of nuclear m5C transcripts, which could be rescued by reconstitution of wild-type ALYREF.
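The logic behind bisulfite-based detection is that unmethylated cytosine is deaminated to uracil while m5C is protected, so any cytosine that survives treatment is a candidate m5C site. A minimal sketch of that logic follows; it assumes complete conversion, error-free reads, and reads already aligned to the reference at equal length, none of which holds strictly in practice, and the function names and toy sequence are hypothetical.

```python
def bisulfite_convert(rna, methylated_positions=()):
    """Simulate idealized bisulfite treatment: every C becomes U unless its
    0-based position is listed as methylated (m5C is protected)."""
    protected = set(methylated_positions)
    return "".join(
        "U" if base == "C" and i not in protected else base
        for i, base in enumerate(rna)
    )


def call_m5c(reference, converted_read):
    """Return 0-based positions where a reference C is still read as C."""
    if len(reference) != len(converted_read):
        raise ValueError("reference and read must be the same length")
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference, converted_read))
        if ref == "C" and obs == "C"
    ]


# Hypothetical toy sequence with one protected cytosine at index 3.
reference = "ACGCUCA"
read = bisulfite_convert(reference, methylated_positions=[3])
print(read, call_m5c(reference, read))  # AUGCUUA [3]
```

In real data, incomplete conversion produces false positives, which is one reason candidate sites are usually supported by multiple reads.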
Pseudouridine (Ψ) in mRNA
Ψ is the most abundant of all RNA modifications. Though mainly found in rRNA and tRNA, this modification has also been detected in mRNA transcripts. Mapping of Ψ in mRNA has been accomplished at single-base resolution, and hundreds of mammalian and yeast transcripts with this modification have been identified [27,131,132]. In particular, it was shown that pseudouridylation is dynamic and can change in response to several factors including stress and cell growth state. Analysis of yeast modifications at log phase and post-diauxic growth demonstrated significant changes in identified pseudouridylation sites in RBS28B, MRPS12, CDC33, U5 snRNA, RNase MRP, and snR37. HeLa cells grown in the presence or absence of serum showed dynamic changes in known modification sites of RPL19, ATP5E, MALAT1, and RN7SK. It is postulated that, due to the ability of Ψ to stabilize RNA structure, this modification could serve a role in several biological processes including enhancement of translation initiation efficiency, ribosome pausing, RNA localization, and RNA interference. Several tRNA Ψ writer proteins have been shown to modify mRNAs. For some of these writers, this suggests that secondary structure rather than a specific consensus motif determines the pseudouridylation sites in mRNA. Ψ tRNA writers such as Pus1 and Pus2 show no sequence similarity among their mRNA targets, while other writer enzymes like Pus4 and Pus7 show highly defined consensus motifs. Several target RNAs were also identified for Pus3, Pus6, and Pus9.
2′-O-methylation in mRNA
Little is known about the biological effects of 2′-O-methylation (2′-O-me) in the internal region of mRNA. In vitro studies have shown that 2′-O-me of RNA can inhibit A to I editing. It is also suggested that small nucleolar RNAs (snoRNAs) play a role in site-directed 2′-O-me modification of specific mRNA transcripts [135,136]. 2′-O-me is involved in the discrimination of self and non-self mRNA. Daffis et al. demonstrated that West Nile virus mutants lacking 2′-O-methyltransferase activity were attenuated but retained pathogenicity in cells lacking type I interferon. Similarly, Case et al. observed that when mutations are inserted into the SAM-binding region of the N7-methyltransferase of coronavirus, there is a loss of viral replication and enhanced sensitivity to the immune response. Taken together, these findings indicate that viruses may have adapted 2′-O-methyltransferase activity as a mechanism of evading host recognition systems. Given these findings, it is understandable why incorporation of 2′-O-me into small interfering RNA (siRNA) optimizes stability and immunogenic properties.
Modifications in the 5′ cap
All eukaryotic mRNA contains N7-methylguanosine (m7G) (Figure 2B) linked by a reverse 5′ to 5′ triphosphate linkage. This capping is carried out co-transcriptionally by RNA triphosphatase, RNA guanylyltransferase, and guanine-N7 methyltransferase. The 5′ cap of mRNA plays essential roles in nearly every step of the mRNA life cycle including splicing, export, protection from degradation, and translation initiation [141,142]. The 5′ cap protects transcripts from degradation by 5′ to 3′ exonucleases because its conformation is similar to that of a 3′ end. Additionally, proteins recruited to the 5′ cap such as CBC and eIF4E/eIF4G block decapping enzymes from acting on the cap. These factors increase the half-life of mRNA and are essential to allow for proper processing and translation to occur. The 5′ cap works synergistically with the poly(A) tail to stimulate translation by recruitment of translation initiation machinery. eIF4G, bound to the 5′ cap, interacts with PABP1, bound to the poly(A) tail, to create a pseudocircular structure which is thought to ensure that mature mRNAs are translated and to assist in ribosome advancement along the transcript [143,144]. Neither the poly(A) tail nor the 5′ cap alone has been shown to be sufficient to drive efficient translation. This synchrony between the poly(A) tail and 5′ cap is especially important in development, where translation is known to be regulated by the lengthening and shortening of the poly(A) tail.
In addition to m7G, several other variations of 5′ caps have been found. 5′-NAD+ capping is one unique pathway by which RNAs may be capped [146,147]. It was recently shown that eukaryotes may possess this type of cap and that these RNAs are unstable and inefficiently translated. The NAD+ cap plays a role in RNA degradation through recognition by the DXO protein. It was demonstrated that DXO preferentially removes NAD+ caps rather than m7G caps, whose removal was previously thought to be the primary function of DXO. Another unique capping mechanism is 2′-O-me of the ribose at the +1 and +2 positions. These reactions are carried out in humans by hMTr1 and hMTr2, respectively [149,150]. Details of the biological role of these modifications have yet to be elucidated, but they likely serve to modify the efficiency of transcript processing, translation, and stability. It is also possible that 2′-O-me at the +1 and +2 positions of the cap serves a role in viral vs. host mRNA recognition, a possibility discussed earlier in this review. RNA capping could potentially be a new target for antiviral therapies as we continue to discover more about the nature of host vs. pathogen recognition.
Ribosomes consist of both protein and RNA components whose subunits come together to form a functional complex responsible for mRNA translation and polypeptide production. Mapping has shown that rRNA modifications are mainly concentrated in functional regions of the ribosome such as the peptidyl transferase center (PTC). In total, eukaryotic rRNA has been mapped to have 91 Ψ, 105 2′-O-me of backbone sugars, and ten methylated bases. The role and function of these modifications in rRNA are still not well understood, but some studies, together with what we know about the chemistry of these modifications, provide insight into their activity. The modifications most likely fine-tune the structure of the ribosome and contribute to the processing of rRNA. The chemical activity of the ribosome is classically understood to be the responsibility of the protein components which make up the functional regions of the ribosome. There has been some controversy surrounding the possibility that rRNA modifications may be involved in the peptidyl transferase activity. This possibility is based on the evidence of modification mapping; however, there is little evidence supporting any actual role for the modifications in the catalysis of the reaction.
2′-O-methylation in rRNA
Modifications of rRNA are proposed to play a role in modifying ribosome structure and function by altering the structure or the molecular interactions taking place within functionally relevant regions such as the PTC. Based on our understanding of the chemistry, 2′-O-me prevents hydrolysis of the phosphate backbone and causes the ribose sugar to favor the 3′ endo conformation [26,29,30]. Williams et al. demonstrated the effects of 2′-O-me modifications in synthetic UUGC tetraloops and found changes in the stability and flexibility of the stem–loops. The observed effects of the substitutions were due to differences in hydrogen bonding, solvation effects, and intrinsic puckering of the hairpins. This evidence further supports the idea that 2′-O-me in rRNA may play a role in shaping the secondary and tertiary structures necessary for proper ribosome function.
Pseudouridine in rRNA
In rRNA, Ψ modifications have been mapped and found clustered in functionally important regions of the large subunit [152,154]. Whether the Ψ residues clustered within domain V, which constitutes the PTC, play a catalytic role in the peptidyl transferase activity of the ribosome has been debated. It is hypothesized that the N1 position of Ψ can allow for high group transfer potential for acyl moieties [152,155]. Currently, there is no evidence supporting a direct role for Ψ in peptidyl transferase activity. Mutational studies in yeast have shown that deletions of the eight residues which reside in the PTC do not individually affect growth or viability. However, these modifications are collectively essential, as introducing several mutations in the yeast ribosome PTC results in changes in tRNA binding and peptidyl transferase rates [156,157]. It was also shown that the collective loss of these modifications results in several defects, including in cell growth and translation. Ψ residues have been found in the small subunit, but none as of yet have been seen in functionally important regions [156,158]. It is classically accepted that this modification mainly aids in forming the secondary and tertiary structure required for ribosome biogenesis and proper formation of protein–RNA interactions.
rRNA biogenesis and processing
Modification of rRNA is involved not only in the function and structure of ribosomal subunits but also in rRNA processing and cleavage. As with mRNA, the modifications of rRNA are performed by snoRNA–protein complexes known as small nucleolar ribonucleoproteins (snoRNPs), in which the snoRNAs act as guides that target specific sequences on the rRNA and ensure that the exact target base gets modified. The Ψ residues of eukaryotic rRNA are synthesized by guiding H/ACA snoRNP complexes. Researchers can study the loss of Ψ and modifications of other RNA species by disrupting these specific snoRNPs and observing the outcome in the transcriptome. Several snoRNPs are involved in RNA processing and cleavage. The modifications which these essential snoRNPs make on pre-rRNA serve as markers for cleavage sites. These cleavages occur within the nucleus before export to the cytosol, except for the 18S rRNA in Saccharomyces, which usually requires an additional cytosolic cleavage step of the 20S precursor. These structural rearrangements and cleavages mediated by rRNA modification give rise to mature ribosomal subunits. One particularly interesting snoRNA is snR35, which is involved in the modification of position 1191 of 18S rRNA. The modification at this site is a hypermodified Ψ, m1acp3Ψ1191, whose formation involves several steps. Loss of this P-site modification through disruption of snR35 results in defects in pre-rRNA processing and translation due to loss of D-site cleavage during processing. Several nucleolar enzymes which make up the snoRNPs have been identified and associated with their target modification. In some cases, the exact catalytic mechanism of the modification is not well understood due to the lack of sequence similarity to other well-characterized modifying enzymes [162–164].
Compared with our understanding and characterization of snoRNA, there is still much left to be discovered about the protein factors which make up pre-ribosomal complexes including the protein components of the snoRNPs, trafficking machinery [165–168], helicases [169,170], and GTPases [171–173].
snoRNAs can be separated into two classes based on their sequence elements: C/D box snoRNAs and H/ACA box snoRNAs [174,175]. These two classes are responsible for most rRNA modifications, and some snoRNAs may have several target sites. The C/D box family is responsible for 2′-O-methylation, while H/ACA box snoRNAs are responsible for pseudouridylation. The nucleolytic cleavages and nucleolar structures of pre-rRNA processing are entirely dependent on proper modifications by the snoRNP complexes, and it is widely believed that these steps in rRNA processing serve as complex levels of quality control to ensure proper ribosome formation and healthy cell function. Future studies should attempt to characterize the complex dynamics and machinery involved in this quality control and in rRNA processing.
Modification of other non-coding RNAs
It is now widely appreciated that eukaryotic cells transcribe a wide variety of RNA species with various regulatory functions, such as micro-RNA (miRNA), small interfering RNA (siRNA), P-element-induced wimpy testis (PIWI) interacting RNA (piRNA), small nuclear RNA (snRNA), long non-coding RNA (lncRNA), and tRNA-derived small RNA (tsRNA). What has more recently emerged is the importance of modifications to these RNA species and the information they carry. Here, we will briefly describe some known modifications that modulate the functions of several unique RNA species.
miRNAs are critical components in RNA silencing pathways that induce mRNA degradation and inhibit translation. It has been found that 2′-O-me at the 3′ end of plant miRNAs prevents poly-3′-uridylation, a known marker for small RNA degradation pathways [178–180]. The same protective mechanism using 2′-O-me has also been described in several mammalian small RNAs such as siRNAs and piRNAs.
The snRNAs that make up the family of spliceosomal RNAs are extensively modified, mainly bearing Ψ and 2′-O-me [181,182]. Modifications in snRNAs can influence several structural and mechanistic aspects of pre-mRNA splicing, and thus can influence the expression and function of many genes. snRNA modifications can modulate RNA–RNA interactions, the interaction of spliceosomal snRNAs with spliceosomal proteins, and the direct catalysis of the splicing reactions. Although the exact role of all snRNA modifications has not been fully dissected, it is thought that the ten 2′-O-me and 13 Ψ present in the U2 snRNA can influence spliceosome assembly and splicing efficiency via the mechanisms just described.
As was stated in the tRNA section, Dnmt2- and Nsun2-mediated methylation of cytosine supports tRNA stability. Additionally, loss of m5C can increase angiogenin-mediated cleavage of tRNAs, resulting in tsRNAs [48,185,186]. Accumulation of tsRNAs combined with m5C-deficient tRNAs alters protein expression due to codon mistranslation, resulting in disrupted body growth and neuronal cell dysfunction [48,185–187]. These studies hint at the complex interplay between the levels of methylated tRNAs, production of tsRNAs, and protein synthesis. Finally, lncRNAs are known to have a diverse and expanding array of functions in mRNA processing, transcription regulation, and chromatin remodeling [188–190]. Several lncRNAs, such as MALAT1, TUG1, XIST, and NEAT1, have been shown to contain multiple m6A sites [102,105,191,192]. It is known that XIST-mediated X-inactivation is, in part, accomplished by YTHDC1 recognition of m6A residues; however, which RNA modifications contribute to the functional role of lncRNAs remains unclear.
The chemical diversity of RNA modifications found throughout the tree of life is staggeringly complex, and how these modifications are regulated and interpreted is even more daunting. In the early years of modification research, most believed that RNA modifications served a relatively static role in much the same way the scientific community once viewed tRNA or ribosomes. Recent discoveries of the dynamic and reversible nature of RNA modifications across many RNA species have caused a resurgence of interest and appreciation for these chemical variants. There remains much to be done, as more quantitative sequencing technologies will be required to precisely map and understand the roles of modifications to different RNA species. That being said, we now know that the chemical structure, occupancy, reversibility, and enzyme selectivity associated with different RNA modifications all play key roles in determining the combinatorial output that ultimately results in exquisite fine-tuning of gene expression and organismal physiology.
FTO, fat mass and obesity
lncRNA, long non-coding RNA
PIWI, P-element-induced wimpy testis
PTC, peptidyl transferase center
siRNA, small interfering RNA
snoRNAs, small nucleolar RNAs
snoRNPs, small nucleolar ribonucleoproteins
snRNA, small nuclear RNA
SRSF2, serine/arginine-rich splicing factor 2
tsRNA, tRNA-derived small RNA
The Authors declare that there are no competing interests associated with the manuscript.
These authors contributed equally to this work.