Structured cis-regulatory RNAs have evolved across all domains of life, highlighting the utility and plasticity of RNA as a regulatory molecule. Homologous RNA sequences and structures often have similar functions, but homology may also be deceiving. The challenges that derive from trying to assign function to structure and vice versa are not trivial. Bacterial riboswitches, viral and eukaryotic IRESes, CITEs, and 3′ UTR elements employ an array of mechanisms to exert their effects. Bioinformatic searches coupled with biochemical and functional validation have elucidated some shared and many unique ways cis-regulators are employed in mRNA transcripts. As cis-regulatory RNAs are resolved in greater detail, it is increasingly apparent that shared homology can mask the full spectrum of mRNA cis-regulator functional diversity. Furthermore, similar functions may be obscured by lack of obvious sequence similarity. Thus looking beyond homology is crucial for furthering our understanding of RNA-based regulation.
Cis-regulatory RNA elements are structured regions of an mRNA that regulate the transcription, translational efficiency, or stability of the mRNA and are found throughout all domains of life. While most frequently found in the 5′ untranslated region (UTR), structured RNA cis-regulatory elements may be found throughout the transcript with diverse mechanisms in prokaryotic (Figure 1A,B) and eukaryotic systems (Figure 1C–E). RNA cis-regulators frequently control functions that require swift changes such as cellular stress response or, in the case of pathogens, host switching. In contrast with DNA or protein-coding elements, the primary sequence conservation of cis-regulatory RNA elements can be extremely low because the secondary structure or folded structure is often more highly conserved than the primary sequence. However, just as with DNA or protein-coding elements, identifying apparent homology is still an integral component of the process for connecting growing sequence databases with biological functions. In recent years, improvements to computational methods have made identifying new and homologous cis-regulatory RNA elements easier. Yet, due to the unique properties of structured RNA, the use of homology still has some serious limitations. In this review, we highlight a series of vignettes that illustrate limitations of using homology to find or predict cis-regulatory RNA element function and discuss recent insights into the evolution and function of riboswitches, internal ribosome entry sites (IRESes) and 3′ UTR elements such as cap-independent translation elements (CITEs).
Figure 1. Cis-regulatory RNAs in Bacteria and Eukaryotes.
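The point that folded structure can be conserved where primary sequence is not can be made concrete with a small sketch. The two hairpin sequences and the consensus structure below are invented for illustration and are not taken from any riboswitch discussed here; the helper names are likewise hypothetical. The sketch only checks whether each sequence can form the base pairs of a shared dot-bracket structure, which is a much cruder test than the covariation statistics used by real homology search tools.

```python
# Canonical and G-U wobble pairs that can form an RNA helix.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(dot_bracket):
    """Return (i, j) index pairs for a dot-bracket structure without pseudoknots."""
    stack, pairs = [], []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def sequence_identity(a, b):
    """Fraction of aligned positions carrying the same nucleotide."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def structure_support(seq, dot_bracket):
    """Fraction of the proposed base pairs that the sequence can form."""
    pairs = paired_positions(dot_bracket)
    return sum((seq[i], seq[j]) in PAIRS for i, j in pairs) / len(pairs)


# Hypothetical data: two invented hairpins aligned to one consensus structure.
structure = "((((....))))"
seq_a = "GGCAGAAAUGCC"
seq_b = "CAUGGCAACAUG"

identity = sequence_identity(seq_a, seq_b)
support_a = structure_support(seq_a, structure)
support_b = structure_support(seq_b, structure)
```

In this toy case every stem position differs between the two sequences, yet compensatory changes keep all four base pairs intact in both, so a sequence-only comparison would score the pair as poorly related while a structure-aware comparison would not.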
A riboswitch is a cis-regulatory noncoding RNA element typically found in the 5′ UTR that bacteria employ to control stress response as well as flux through essential biosynthetic pathways. Consisting of a ligand-binding aptamer domain that controls a downstream expression platform, riboswitches comprise a diverse collection of sequences and structures that interact with a host of different cellular ligands and employ diverse mechanisms of action such as transcription termination and translation inhibition (Figure 1A) (for review see [1,2]). Most riboswitches were discovered through comparative genomic analyses to identify structured noncoding RNAs. The majority of motifs defined by such searches have properties suggesting regulatory activity, but as yet have no identified triggering ligand. Determining the correct ligand for a riboswitch candidate is typically aided by the functional characterization of the genes under its regulatory control. Riboswitches whose ligands were straightforward to identify have largely been associated with well-characterized metabolic pathways, such as coenzyme or amino acid biosynthesis. Riboswitch candidates whose ligands resist identification, collectively known as orphan riboswitches, are often associated with genes coding for proteins of unknown function, or genes for various proteins with no established link to one another. Even following ligand identification, homologous aptamer examples may utilize different mechanisms of regulation, or even have distinct ligand-binding preferences. However, experimental validation and structural determination for such examples have yielded important insights into the structural and mechanistic diversity employed by RNA to regulate gene expression.
Sequence and secondary structure similarity often suggest common ligands for homologous riboswitch aptamers, but detailed biochemical characterization and subsequent three-dimensional structures can reveal minor sequence changes that lead to differences in ligand specificity. This phenomenon is exemplified by the ykkC riboswitches (Figure 2). Originally classified as a single type of riboswitch, the original ykkC aptamer evaded characterization for over a decade. Furthermore, the discovery of non-homologous elements regulating similar sets of genes (mini-ykkC and ykkC-III) [5,6] only served to increase interest in these elements. Eventually, biochemical characterization subdivided the original ykkC aptamer into multiple sub-classes that together recognize several distinct ligands, including guanidine (ykkC subtype 1), guanosine-3′,5′-bisdiphosphate (ppGpp) (ykkC subtype 2a), and phosphoribosyl pyrophosphate (PRPP) (ykkC subtype 2b) [8–11]. Further validation of the ykkC subtype 2c expanded the ligands bound by this motif to include adenosine and cytidine 5′-diphosphates (in either their deoxyribose or ribose forms), while subtype 2d remains an orphan riboswitch whose ligand is unknown.
Figure 2. ykkC subtype 1, 2a, and 2b riboswitch comparison.
Subsequent structural data have demonstrated that the guanidine type I, ppGpp, and PRPP-binding riboswitches, which appear homologous based on sequence and secondary structure, do share a highly conserved structural core, yet distinguish specifically between their ligands. In these riboswitches, the guanidine-binding site is above the site of PRPP and ppGpp binding (Figure 2A). Furthermore, ppGpp and PRPP are distinguished by a helical element revealed by structure determination. The shared structural domain makes switching ligand specificity facile through changes to the ligand-binding helix. A single G96A point mutation switches the ligand affinity for the PRPP-binding aptamer to ppGpp with a 40 000-fold change in selectivity (Figure 2B). A single mutation to the structural core along with swapping the ligand-binding helix converts the PRPP aptamer to a guanidine aptamer. The selective pressures driving the evolution of the guanidine, ppGpp, and PRPP-sensing ykkC riboswitch classes thus show how subtle sequence changes and modularity can lead to the adaptability of RNA-based regulation from a shared core structure.
Conversely, a common ligand may be recognized by aptamers with no obvious homology. Following the characterization of ykkC subtype 1, the mini-ykkC and ykkC-III elements were demonstrated to also be guanidine binders [14,15]. Recent crystal structures of the three non-homologous guanidine-binding riboswitches (ykkC, mini-ykkC, and ykkC-III) revealed some common elements but overall distinct structures that specifically interact with the same ligand, suggesting independent evolution of these elements [7,16–20]. These aptamers, and other similar examples such as the PreQ1 [21–23] and S-adenosyl methionine (SAM) binding aptamers [24–28] highlight the flexibility and plasticity of RNA as a regulatory molecule.
While the ykkC aptamers make clear that RNA regulators preceding diverse sets of genes can evolve different specificities through subtle changes to shared core structures, divergent gene sets can also promote the selection of more drastic changes in aptamer structure that may not ultimately affect biological function significantly. This is best exemplified by the glycine riboswitch, which exists in both singlet and tandem conformations and can function as an ON or OFF switch, primarily regulating either glycine cleavage metabolism (GCV) or glycine transport (TP), respectively (Figure 3). A library of point mutations to the Bacillus subtilis glycine riboswitch showed that the first aptamer of the tandem ON switch (regulating glycine cleavage) is more essential for ligand binding and regulation [29,30]. Yet, extensive biochemical analysis of the tandem OFF switch (regulating transport) from Vibrio cholerae showed that ligand binding by the second aptamer is more important. A recent computational analysis of glycine aptamers suggested that most singlet glycine riboswitches may derive from these tandem riboswitches. In this scenario, the ‘ghost’ aptamer associated with most singlet riboswitches is a degraded form of the less critical aptamer within a tandem aptamer pair. Singlet glycine riboswitches are thus divided into type-1 switches if the ghost aptamer follows the ligand-binding aptamer (these most frequently precede glycine cleavage operons) or type-2 switches if the ghost aptamer precedes it (these most frequently precede transporters) [31,32]. The regulated genes appear to have driven whether the first or second aptamer is conserved, as the other aptamer degrades to a stem–loop ghost aptamer. These studies of glycine riboswitch homologs give insights into the context-dependent selective pressures driving the evolution of the glycine riboswitch that likely apply to riboswitches more generally.
Figure 3. Genomic context drives conservation of the glycine riboswitches.
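The type-1/type-2 distinction for singlet glycine riboswitches described above reduces to the relative order of the ligand-binding aptamer and the ghost aptamer, which can be sketched as a small classifier. The function name, the coordinate convention (start positions along the transcript, 5′ to 3′), and the `SingletCall` record are assumptions made for illustration. The "typical context" field reflects only the most frequent association reported in the text, not a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingletCall:
    switch_type: str      # "type-1" or "type-2"
    typical_context: str  # most frequent downstream genes, not a guarantee


def classify_singlet(aptamer_start, ghost_start):
    """Classify a singlet glycine riboswitch by ghost-aptamer position.

    Both arguments are start coordinates on the transcript (5' to 3');
    this coordinate convention is an assumption of the sketch.
    """
    if ghost_start > aptamer_start:
        # Ghost aptamer follows the ligand-binding aptamer.
        return SingletCall("type-1", "glycine cleavage operon")
    if ghost_start < aptamer_start:
        # Ghost aptamer precedes the ligand-binding aptamer.
        return SingletCall("type-2", "glycine transporter")
    raise ValueError("aptamer and ghost aptamer cannot share a start position")


# Hypothetical coordinates for two singlet riboswitches.
call_a = classify_singlet(aptamer_start=100, ghost_start=220)
call_b = classify_singlet(aptamer_start=220, ghost_start=100)
```

A real annotation pipeline would first need to detect the degraded ghost stem–loop, which is the hard part; the sketch assumes both elements have already been located.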
Although the vast majority of predicted cis-regulatory RNA structures remain uncharacterized, the ykkC and glycine riboswitches highlight the deeper understanding we can gain from detailed mechanistic and structural studies. Furthermore, we are now beginning to appreciate that a single riboswitch example does not necessarily represent the whole picture. Even aptamers binding the same ligand can operate co-transcriptionally or translationally and be driven by kinetic or thermodynamic mechanisms. While homology is a good place to start, evolution yields multiple solutions to the same problem that may not share homology, and effective regulatory solutions are frequently adapted for other purposes.
Internal ribosome entry sites (IRESes)
Unlike riboswitches, where close structural homologs may not have the same regulatory activity or bind the same ligand, internal ribosome entry sites (IRESes) are all functionally similar, but share little recognizable homology. IRESes are cis-regulatory RNA elements that recruit cellular translation machinery such as ribosomal-binding proteins to mRNAs for cap-independent translation (Figure 1C–E) and are found in eukaryotic and viral mRNA. In eukaryotes, although functional IRES sequences have been found and experimentally validated, there is very little primary sequence or secondary structure conservation between these elements, and it is unclear how many transcripts have functional IRESes [34,35]. In contrast, viral IRESes do display some homology and are organized into types based on structural similarities; each type is often associated with specific viral clades. However, some viruses contain a different IRES than other members of their clade, allowing IRES sequences to provide insight into viral evolution.
Lack of homology has made computational identification and comparison of eukaryotic IRESes difficult and controversial. A new method using comparative genomics and machine learning identified over 6 000 predicted IRESes in 20 fungal genomes. Analysis of associated GO-terms showed IRESes predicted near genes involved in cell stress response. The conservation and distribution of the predicted IRESes support the idea that cap-independent translation is an important part of the cell stress response. Another study used a fluorescent reporter method to identify thousands of viral and human sequences that could cause cap-independent translation, suggesting that this method of translation could be more widespread than previously thought. Further investigation into methods to study eukaryotic IRESes that do not rely on sequence homology is a worthwhile investment and will likely lead to new insights into the evolutionary history of eukaryotic viruses.
In the past few years, crystal structures of picornavirus IRESes (which are generally categorized by the presence of a conserved tertiary structural core) have revealed surprising relationships between the different IRES types. Comparisons between the crystal structures of a type II and type III IRES found a conserved three-way junction critical for IRES-mediated translation. This three-way junction may also be conserved in type I IRESes and bears similarity to a three-way junction in 3′ CITEs, another viral cis-regulator that allows for cap-independent translation. These examples reflect a potential role for the three-way junction in the recruitment of translation machinery. Similar studies of IRES crystal structures have also uncovered an alternative IRES strategy: tRNA mimicry. Type IV picornavirus IRESes and the flavivirus hepatitis C IRES mimic the structure of the tRNA acceptor stem to directly bind the 40S subunit [40,41]. These recent studies of IRES structure highlight tertiary structure commonalities between IRESes lacking apparent primary sequence and secondary structural homology, and provide a foundation for further understanding the molecular mechanisms of these curiously diverse cis-regulators and for developing improved tools to identify them.
While the type of IRES carried by a virus generally varies by clade, some viruses do not share the same type of IRES as the rest of their clade (Figure 4) and recent work suggests that IRESes are more frequently exchanged in comparison with other viral elements. In picornaviruses, a recent analysis of type IV IRESes found greater structural diversity outside of the core elements involved in ribosome binding and subdomain orientation. Additionally, IRESes are often more varied between viruses that are otherwise more similar, suggesting that the IRESes spread through horizontal gene transfer and are further altered through recombination outside of the core subdomains required for function. For example, pasiviruses (infecting swine) show strong similarity over the entire genome to parechoviruses (infecting humans), supporting a common origin, and both belong to the picornavirus family. However, despite their resemblance, these viruses display significant differences in their IRES sequences, as the pasivirus IRES is the same type employed by swine pestiviruses (family: Flaviviridae). A second example is illuminated by the discovery of a novel picornavirus clade that infects birds rather than mammals, Falcovirus A1 (Harkavirus). This virus contains a type of IRES previously only found in enteroviruses, suggesting that shuffling of IRESes between viruses may be facile. These findings suggest that the 5′ region of the virus containing the IRES could be prone to recombination, perhaps reflecting adaptation to different hosts. In a practical application of IRES heterogeneity, pestivirus D (also known as border disease virus), which causes significant disease in livestock around the world, is differentiated from similar viruses and classified into different types based on the IRES secondary structure. Characterization of additional strains of the virus has expanded the number of known types from 8 to 10, and phenotypic analysis showed that types were separated geographically. Work to determine functional differences between IRES types may provide further insight into the evolutionary pressures driving IRES diversity.
Figure 4. Diversity of IRES type within Picornavirus clades.
Aside from a potential connection to altered host-specificity, cis-regulatory elements including IRESes can have important clinical implications for viruses causing disease. A recent phylogenetic study of Enterovirus D68 (a picornavirus) found that the 5′ end of the virus containing the IRES mutated over time. Experimentally, these IRESes were found to have increased activity in neuronal cell types, perhaps explaining an uptick in neurological symptoms of patients during this time. This parallels findings that another cis-regulatory element, the Rev response element (RRE) of HIV, evolved over time to become more active through small changes to the RNA secondary structure. These studies demonstrate the impact the evolution of cis-regulatory elements has on viral behavior in a clinical setting.
IRESes provide an example of structurally diverse cis-regulators that have largely the same function with similar mechanisms. Despite the similar function of all IRESes, structural and sequence heterogeneity has hindered identification efforts. Eukaryotic IRESes remain somewhat controversial because they lack a degree of similarity that is shared among even the divergent viral IRESes. Investigations into viral IRESes bring to light the importance of the IRES for successful host infection and suggest that changes to the IRES through horizontal transfer or nucleotide substitution may be driven by adaptation to new cell types or species. Increased understanding of the evolutionary drivers of IRESes in viruses and eukaryotes will continue to lead to better methods to identify IRESes and improve our understanding of the biological mechanisms of individual IRESes and IRES classes.
3′ Untranslated regions (UTRs)
Cis-regulatory elements in the 3′ UTR may have a profound effect on the translation of an mRNA, but the types of elements and mechanisms of action are heterogeneous and diverse. In eukaryotes, the 3′ UTR is well known to be involved in mRNA post-transcriptional regulation, stability, and localization through binding regulatory proteins or microRNAs. Viral 3′ UTRs also regulate translation and stability, but can include additional regulators for specific viral functions like recoding and host switching. In contrast, the 3′ UTRs of bacterial transcripts have not been as widely explored. However, recent work suggests the bacterial 3′ UTR could contain a wealth of information about evolutionary history as well as underappreciated regulatory mechanisms.
Viral 3′ UTR elements affect viral replication and are often critical for successful proliferation inside a host, including host-specificity and host-switching. RNA thermometers (RNATs) involved in host-switching are well known in bacteria, but an RNAT with a similar host-switching function was recently identified in a flavivirus. West Nile virus has been reported to carry a 3′ RNAT that aids in switching between cold-blooded and warm-blooded hosts. This viral RNAT was found to alter the circularization rate, and thus the replication rate of the virus, allowing for persistent, low-level infection in insects and acute, high-level infection in warm-blooded hosts. To replicate effectively, plant viruses frequently use CITEs to initiate translation through binding host eIF4E (Figure 1E). Like IRESes, CITEs are organized into different types defined by the RNA secondary structure, but there is little obvious homology between these types. Despite the lack of homology, CITEs appear to function similarly, as each class requires the formation of a kissing loop between the 3′ CITE and a stem–loop in the 5′ UTR as well as binding of eIF4F for efficient translation. To test the robustness of these elements to diverse cellular backgrounds, researchers tested CITEs from nine different viruses for their ability to initiate translation in both plant and mammalian cell lines and for eIF4E binding, since eIF4E is conserved between these kingdoms. Results showed that while many of the CITEs did not initiate translation in mammalian cells, one CITE from thin paspalum asymptomatic virus could initiate translation in both plant and mammalian cells. Further results showed that binding between host eIF4E and the CITE, and subsequent translation, were determined by the presence of a guanosine-rich domain in the CITE pseudoknot. This finding suggests the potential for a similar element currently undiscovered in mammalian viruses. Indeed, a 3′ RNA element with structural similarity to putative plant virus CITEs was recently discovered in Sindbis virus, a virus infecting insects. While this element did not cause cap-independent translation, the 3′ element did confer host-specificity to the virus, allowing for translation of viral proteins in insect cells, but not mammalian cells. When this 3′ element was added onto a mammalian virus not known to infect insect cells, the virus was able to translate proteins and replicate within insect cells. This example demonstrates that homology may not be completely predictive of function, and that the addition of a single RNA element can be enough to alter the range of cells in which a virus replicates.
Like IRES sequences, the use of 3′ elements to regulate cap-independent translation is by no means restricted to viruses. One example of native 3′ UTR impact on translation efficiency in eukaryotes is the mRNA of the transcription factor c-myc, which includes many cis-regulatory elements. c-myc mRNA is known to be translated in a cap-independent manner using an IRES if cap-dependent translation is inhibited. A recent study found that the activity of both cap-dependent and cap-independent translation is increased by a 3′ element. Thus, cis-regulatory RNA elements in eukaryotic 3′ UTRs may tune translation to adjust to diverse cellular conditions.
The post-transcriptional regulatory role of 3′ UTRs in bacterial mRNAs is far less clear due to the tight coupling of transcription and translation. However, the 3′ UTR is known to affect mRNA stability, and changes in the sequence potentially lead to changes in expression. Comparative analysis of the 3′ UTRs of orthologous genes across Staphylococcus species showed that conservation is lost downstream of the coding sequence, and that chimeric mRNAs with 3′ UTRs from orthologous genes showed different expression levels. Despite lack of homology across these elements, there are clearly important determinants for RNA stability conferred by these sequence changes. This suggests that divergent 3′ UTRs significantly alter gene expression levels and could thus be important for generating bacterial diversity. Furthermore, technologies such as Term-Seq have yielded insights into evolutionary differences between and within bacterial species affecting mRNA stability and transcript processing. An in vivo mapping of Rho-dependent transcripts in E. coli showed that the 3′ termini of essentially all protein-coding transcripts include stable structured RNA elements, protecting them from 3′–5′ exonucleases. These 3′ UTR stem–loops resemble those classically associated with Rho-independent termination, but lack a polyU tract. This finding suggests that switching between termination mechanisms may occur readily, depending only on the presence or absence of polyU sequences.
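The distinction drawn above, a terminal stem–loop with or without a following polyU tract, can be expressed as a simple sequence check. This is a minimal sketch under stated assumptions: the window length, the uridine threshold, the function names, and both example sequences are invented for illustration, and the position of the stem–loop is taken as given rather than predicted. It is not the method used in the mapping study described in the text.

```python
def has_u_tract(seq, hairpin_end, window=8, min_u=6):
    """Check for a U-rich stretch immediately 3' of a stem-loop.

    seq         : transcript sequence, 5' to 3' (DNA 'T' is read as 'U')
    hairpin_end : index of the last nucleotide of the stem-loop
    window, min_u : illustrative thresholds, not published cutoffs
    """
    tail = seq[hairpin_end + 1 : hairpin_end + 1 + window].upper().replace("T", "U")
    return tail.count("U") >= min_u


def describe_terminus(seq, hairpin_end):
    """Label a structured 3' terminus by the presence of a polyU tract."""
    if has_u_tract(seq, hairpin_end):
        return "stem-loop with polyU tract (resembles a Rho-independent terminator)"
    return "stem-loop without polyU tract"


# Hypothetical 3' ends: the same 14-nt hairpin followed by different tails.
hairpin = "GCCGCGAAAGCGGC"
with_tract = hairpin + "UUUUUUUUAC"
without_tract = hairpin + "ACGAUCAGGA"

label_a = describe_terminus(with_tract, hairpin_end=len(hairpin) - 1)
label_b = describe_terminus(without_tract, hairpin_end=len(hairpin) - 1)
```

Because the two invented sequences differ only in the tail, the sketch mirrors the suggestion in the text that gaining or losing a polyU sequence alone could change how a stem–loop terminus is classified.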
Structured cis-regulatory RNAs are found across mRNA transcripts and across all domains of life. While homology-based tools make it relatively easy to identify putative RNA elements based on sequence and structure, assigning biological function is much more laborious. Mechanistic and validation studies have greatly expanded our understanding of the novel roles cis-regulators play. However, we are now beginning to appreciate that one example may not be representative of the whole. Cis-regulators are increasingly found to be more distinctive than initially presumed, with diverse properties driven by a host of selective pressures. Only following characterization of homologous RNAs can we begin to distinguish them as siblings, doppelgängers, or distant relatives. The examples outlined in this review emphasize that while homology can be a promising starting point, it can also be a restrictive paradigm. Growing beyond homology in our understanding of and search for new regulators will likely reveal yet unexplored aspects of RNA cis-regulatory elements.
Structured cis-regulatory RNAs are important regulators found throughout the domains of life, but identification efforts have been stymied by lack of primary sequence and sometimes secondary structure homology.
Cis-regulatory RNAs have been classically identified through homology-based bioinformatic searches and validation, but new discoveries from structural and mechanistic studies highlight the sometimes unexpected evolutionary relationships between seemingly homologous and divergent regulators.
Recent advances have led to novel experimental approaches that identify RNA regulators lacking obvious homology. Continuing technical developments will allow researchers to better explore the evolutionary origins of such regulators in the absence of significant sequence similarity.
The authors declare that there are no competing interests associated with the manuscript.
This work is supported by the NSF grant MCB 1715440 to M.M.M.
E.C.G., D.M.B., and M.M.M. conceptualized the manuscript, E.C.G. and D.M.B. drafted the manuscript. E.C.G., D.M.B., and M.M.M. revised the manuscript.
We thank Matthew Crum for his work generating secondary structures.
These authors contributed equally to this work.