Splicing is a post-transcriptional modification of RNA during which introns are removed and exons are joined. Most mammalian genes undergo constitutive and alternative splicing events. In addition to the strong signals of the splice sites, splicing is influenced at a distance by a range of trans factors that interact with cis regulatory elements and influence the spliceosome. The intention of the present mini-review is to give some insights into the complexity of this interaction and to introduce the consequences of some kinds of detrimental genetic variation on alternative splicing and disease.
Mammalian genes are complex in their make-up, composed of eight exons on average, interspersed between much larger introns. Splicing is the highly regulated and precise process by which the introns are removed from the pre-mRNA and the exons are linked together. Within a given gene, constitutive splicing includes exons that are essential and common to the various transcripts of a gene, whereas alternative splicing allows flexibility by including additional exons, lengthening or shortening exons, retaining intronic sequence or excluding (skipping) exons, often in a tissue-specific or developmentally discrete manner. Alternative splicing in humans occurs in up to two thirds of all genes, allowing a greatly expanded repertoire of transcripts and, accordingly, proteins, thereby increasing proteomic diversity. Crucial to the processes of splicing and alternative splicing is the accurate recognition of exon–intron boundaries and, in consideration of the vast intronic regions between the comparatively smaller exons of some mammalian genes, a fundamental question that remains is how this process is regulated.
Basic splicing machinery
The general mechanism of splicing is carried out by the spliceosome, a large RNP (ribonucleoprotein) complex that is composed of five snRNP (small nuclear RNP) particles, U1, U2, U4, U5 and U6, each approx. 200 nt in length, and up to 300 structural proteins. The stepwise initiation of splicing and the assembly of the spliceosome begin with the recognition of the 5′ and 3′ intronic ss (splice sites) of the transcript by the spliceosome components U1 and U2AF (U2 snRNP auxiliary factor) respectively, and of the branch point by SF1 (splicing factor 1)/mBBP (mammalian branch-point-binding protein); this protein–RNA structure is called the E complex (early complex). Subsequently, the E complex recruits the branch-point-binding factor U2 snRNP and is converted into the A complex. The B complex is formed as the tri-snRNP particle containing U4, U5 and U6 joins the spliceosome, which then undergoes a structural transformation and becomes the catalytic C complex. Splicing is finally accomplished with ligation of the exons (reviewed in ). Far less frequently, minor U12-mediated splicing can also occur.
cis splicing motifs
Much of the specificity and activity of the spliceosome complex relies on the recognition of cis genetic (RNA sequence) motifs in the transcript that control the activity of constitutive and alternative splicing. The most important splicing motifs are the 5′ ss and 3′ ss, named in relation to their respective intronic positions. The 5′ ss has as its core the conserved motif of an exonic AG followed by an intronic GURAGU. The 3′ ss is composed of the branch point (YNYURAC), the polypyrimidine tract and the conserved splice site itself, YAG, adjacent to the exon boundary. These motifs are essential to splicing and determine the general ‘strength’ of intronic splice signals. The strength of these signals can determine whether an intron is constitutively or alternatively spliced. Despite their important function as splice-site recognition motifs, these sequences are quite degenerate, and many sequences in the genome match these motifs without marking an exon boundary.
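The degeneracy of these consensus motifs can be illustrated with a short Python sketch. It expands the IUPAC codes used above (R, purine; Y, pyrimidine; N, any base) into a pattern, scans an RNA string for matches, and estimates how often each motif would occur by chance. The function names `find_motif` and `expected_frequency` are hypothetical, and the chance estimate assumes equal, independent base frequencies, which real genomes do not have.

```python
import re

# IUPAC RNA codes that appear in the consensus motifs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "AG", "Y": "CU", "N": "ACGU"}

FIVE_SS = "AGGURAGU"      # exonic AG followed by intronic GURAGU
BRANCH_POINT = "YNYURAC"  # branch-point consensus
THREE_SS = "YAG"          # conserved 3' splice site


def find_motif(motif, rna):
    """Return the 0-based start of every match, including overlapping ones."""
    body = "".join("[" + IUPAC[code] + "]" for code in motif)
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(rna)]


def expected_frequency(motif):
    """Chance of a match at one position, assuming uniform base composition."""
    freq = 1.0
    for code in motif:
        freq *= len(IUPAC[code]) / 4
    return freq


if __name__ == "__main__":
    for name, motif in [("5' ss", FIVE_SS), ("branch point", BRANCH_POINT),
                        ("3' ss", THREE_SS)]:
        print(name, motif, expected_frequency(motif))
```

Under the uniform-composition assumption, YAG is expected about once every 32 positions, so the 3′ ss core alone cannot identify an exon boundary; this is the sense in which the motifs are degenerate.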
In addition to the degenerate 5′ ss and 3′ ss, there are four additional important classes of splicing motifs. These are ESEs (exonic splicing enhancers), ESSs (exonic splicing silencers), ISEs (intronic splicing enhancers) and ISSs (intronic splicing silencers) (Figure 1). The ESEs and ISEs provide binding sites for a handful of trans factors that promote exon inclusion, with the predominant ones being the SR family of proteins [2,3], so named for their high serine and arginine content. ESSs and ISSs provide binding sites for splicing suppressors, which include members of the hnRNP (heterogeneous nuclear RNP) family of proteins.
Key pre-mRNA splicing regulatory elements
ESEs are the best defined of these genetic splicing motifs [4,5]; however, in general, ESE motifs are quite degenerate and often overlap. For a review of the various ESE sequences, see . The genetic context of these motifs, as well as their distribution, is of great importance to splicing control, in particular the location of these motifs in relation to the exons (reviewed in ). For example, ESEs tend to be clustered around splice sites and are of particularly high density within constitutive exons, with different distributions for each of the SR protein motifs. These motifs have been utilized to predict the presence of ESEs by programs such as ESEfinder.
Similarly, ESS motifs have also been quite well defined . Precise differences in the location of ESSs also help define exon boundaries and strongly affect splice site choice when there are cryptic or multiple splice sites of similar strength . When located within an exon, ESSs can inhibit exon inclusion and when located between two 5′ or 3′ alternative splice sites of similar strength, ESSs will inhibit the intron-proximal splice site, resulting in the shorter exon. Additionally, traditional ESS sequences, when located within an intron, can promote intron excision .
In contrast with exonic regulatory motifs, and at least partly because their localization is much less predictable, the sequences of ISEs and ISSs are less well defined. Many examples of each have been demonstrated, however, and a previous report has shown high conservation of the intronic sequence 100 bp adjacent to both 5′ ss and 3′ ss of alternatively spliced exons . Recently, computational conservation studies have begun to predict other, previously undescribed motifs , based on sequence and position.
For both the exonic and the intronic elements, the genetic context is of prime importance in determining the effect of the regulatory motif. The ‘G-run’, a repeat of three or more G nucleotides, has roles as both an ESS and an ISE, since it can enhance exon inclusion if located downstream of the exon, or cause exon skipping if in an exonic location [14,15]. CA repeats can support correct 5′ ss identification and enhance or suppress binding of upstream exons. The CUCUCU binding sites for the hnRNP family also function from within the intron. This is also true of tissue-specific binding motifs such as the YCAY motif clusters recognized by the Nova family proteins or the UGCAUG motifs found downstream of neural and muscle-specific exons that are targets for the Fox (feminizing locus on X) proteins.
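The position dependence described for the G-run can be sketched in Python as follows. The function `classify_g_runs` and its role labels are hypothetical, and the rule encodes only what is stated above: a G-run inside the exon is treated as silencer-like and one downstream of the exon as enhancer-like. Runs upstream of the exon, or spanning an exon boundary, are left unclassified because the text makes no claim about them.

```python
import re

G_RUN = re.compile(r"G{3,}")  # three or more consecutive G nucleotides


def classify_g_runs(pre_mrna, exon_start, exon_end):
    """Label each G-run by its position relative to one exon.

    The exon is pre_mrna[exon_start:exon_end] (0-based, end-exclusive).
    Returns a list of (start, run, role) tuples.
    """
    calls = []
    for match in G_RUN.finditer(pre_mrna):
        if exon_start <= match.start() and match.end() <= exon_end:
            role = "ESS-like"      # exonic: associated with exon skipping
        elif match.start() >= exon_end:
            role = "ISE-like"      # downstream intron: enhances inclusion
        else:
            role = "unclassified"  # upstream or boundary-spanning
        calls.append((match.start(), match.group(), role))
    return calls


if __name__ == "__main__":
    # Toy transcript (an assumption): upstream intron, exon, downstream intron.
    toy = "UUUCAG" + "AUGGGGCUAG" + "GUAAGUCGGGAUC"
    for call in classify_g_runs(toy, exon_start=6, exon_end=16):
        print(call)
```

The same sequence element receives opposite labels purely because of where it sits, which is the point made in the paragraph above; a realistic predictor would also need to weigh neighbouring motifs and the trans factors expressed in the tissue.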
Trans splicing factors
Several families of trans splicing factors are known to bind to the above mentioned cis genetic motifs, with many factors exhibiting tissue-specific expression patterns, and others having ubiquitous expression.
The SR protein family includes a growing number of ubiquitously expressed splicing factors. They are characterized by having one or more RNA recognition motifs as well as an RS (arginine/serine-rich) domain for protein–protein interactions. Family members have slightly different sequence specificities and bind predominantly to ESEs, where they influence the splicing of the exon in which they are located; however, in different genetic contexts, SR proteins bound within introns can instead act as splicing silencers. SR proteins are concentrated within nuclear speckles of cells and shuttle to sites of active transcription to initiate their co-transcriptional splicing activity. The activity and subcellular localization of these splicing factors are highly regulated through phosphorylation by SRPKs (SR protein kinases), many of which are located in the cytoplasm and can themselves be regulated by intracellular signalling. In this way, alternative splicing can be regulated by extracellular signals [23–26]. SR proteins also have other functions such as mRNA nuclear transport.
The hnRNPs are a diverse family of approx. 20 ubiquitously expressed major proteins including the important members hnRNP A1 and PTB (polypyrimidine tract-binding protein)/hnRNP I. This protein family can be characterized by containing an RNA recognition motif and a KH (hnRNP K homology) domain or an arginine-glycine box . The hnRNPs operate predominantly as negative regulators of splicing, often in direct competition with SR proteins, or other splicing factors for binding sites. Like SR proteins, hnRNP activity is regulated in part by intracellular location and by interaction with other proteins . As well as their role in alternative splicing, hnRNPs have been reported to be involved in mRNA transport, stability and translation initiation.
Finally, shuttling of both SR proteins  and hnRNP proteins between the nucleus and the cytoplasm  (reviewed in ) is an important mechanism for the regulation and specificity of splicing by these factors. It is in the nucleus that these two antagonistic protein families have their effect on alternative splicing with the ratio of these proteins within the nucleus predetermining their functional roles in splicing.
Despite recent advances, tissue-specific splicing is still not well understood. One hypothesis is that differential expression of SR proteins and their negative regulators, the hnRNPs, brings about tissue-specific splicing in vivo. This review has so far discussed specific and localized interactions of SR proteins and their interacting partners from the perspective of local nuclear splicing; what this ignores is that the abundance and localization of these ubiquitous splicing factors are regulated at the level of the target tissue. Binding motifs for splicing factors are often overlapping and degenerate, and thus the splicing factors that bind may vary with genetic context, affinity and the ratios of the different protein family members. This can determine tissue-specific alternative splicing. For example, artificially altering the ratio of the SR protein SF2/ASF to hnRNP A1 can lead to dramatic changes in alternative splicing [32–34].
In addition to tissue-specific splicing that is regulated by constitutive splicing factors, there are tissue-specific trans splicing factors that are expressed only in their target tissues and that recognize their targets by their cis genetic motifs. These factors are fairly rare. One such factor is Nova, a splicing factor whose expression and action are restricted to neurons. Nova's binding sites, clusters of YCAY motifs, tend to be located within a network of transcripts that have related functions in synaptic development [35,36]. Nova1 has been shown to work by blocking the binding sites of the U1 snRNP at the 5′ ss, and the brain-enriched PTB-related protein, brPTB, locally antagonizes the effect of Nova. Other examples of tissue-specific splicing factors are the Fox, CELF [CUG-BP (CUG-binding protein) and ETR3-like factor], MBNL (muscleblind-like), TIA (T-cell-restricted intracellular antigen) and HuR proteins [38–42].
That many of the tissue-specific splicing factors act in the brain is consistent with the observation that the brain (along with the testes) has the highest proportion of alternatively spliced and novel transcripts [43,44]. The high degree of alternative splicing may also be related to the high variation in constitutive splicing factor expression in these two tissues .
cis genetic motifs
The coding sequence of genes is under high conservation pressure in order to maintain the integrity of (i) the protein coding sequence, (ii) ESEs and ESSs for the constitutive and alternative splicing proteins in the SR and hnRNP protein families, and (iii) exonically bound tissue-specific factors, and also of (iv) translational modifiers such as microRNA networks, which are not discussed here . Perhaps the degeneracy and flexibility of the genetic and splicing codes are to some extent explained by the complexity of these requirements? Intronic motifs, which are particularly important in alternative splicing, also show a high degree of conservation (77%) near alternatively spliced exons within 100 bp from the 3′ ss and 5′ ss. This contrasts with the otherwise low (17%) conservation of intronic sequence between mammals, and around constitutively spliced exons which have conserved regions . The binding site for the tissue-specific factor FOX-1 is conserved in its position, sequence and frequency downstream of tissue-specific alternatively spliced exons in mammals and birds .
Genetic variation in cis motifs and disease
An early estimate suggested that 15% of disease-causing point mutations caused aberrant splicing. This estimate was based on mutations at highly conserved splice sites; however, a growing understanding of cis splicing codes at locations other than splice sites has led to the suggestion that that figure may be as high as 60% .
Changes in splicing regulatory elements can lead to exon skipping, intron retention, and the creation of ectopic splice sites or the activation of cryptic sites. Aberrantly spliced transcripts may lead to protein isoforms with altered properties or included exons with PTCs (premature termination codons), a signal that activates NMD (nonsense-mediated decay, a mechanism in which transcripts with PTCs are identified and efficiently degraded), with the effect of preventing translation of truncated proteins. Alternative splicing of the untranslated regions of transcripts may lead to changes in mRNA stability or localization or control of translation. Mutations in cis acting splicing motifs have been unambiguously linked to diseases such as spinal muscular atrophy (the SMN gene), hypercholesterolaemia (LDLR) and tauopathies (MAPT) among others. Genetic variation that affects trans factors has also been observed. One example is the generation of toxic RNA species by microsatellite expansion in which the presence of expanded CUG repeats in non-coding RNA of the DMPK (dystrophia myotonica protein kinase) or ZNF9 (zinc-finger protein 9) genes is sufficient to alter the activity of both CELF and MBNL proteins and lead to myotonic dystrophy. The mechanisms by which genetic variation in cis motifs leads to disease have been reviewed comprehensively [48,49].
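How an aberrantly included exon can introduce a PTC and mark a transcript for NMD can be sketched in Python. The functions `first_stop` and `is_nmd_candidate` are hypothetical. The decision rule used here, that a stop codon lying more than about 50 nt upstream of the last exon–exon junction triggers NMD, is a widely quoted heuristic rather than something stated in this review, and the threshold is exposed as a parameter for that reason. All sequences are invented toy data.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop(mrna, cds_start=0):
    """Return the index of the first in-frame stop codon, or None."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def is_nmd_candidate(exons, cds_start=0, min_distance=50):
    """Apply the assumed '50 nt rule' to a spliced transcript.

    exons is the ordered list of exon sequences in the mature mRNA.
    """
    if len(exons) < 2:
        return False  # no exon-exon junction, so the rule does not apply
    mrna = "".join(exons)
    stop = first_stop(mrna, cds_start)
    if stop is None:
        return False
    last_junction = len(mrna) - len(exons[-1])
    return last_junction - stop > min_distance


if __name__ == "__main__":
    exon1 = "AUG" + "GCU" * 10               # start codon plus in-frame codons
    exon3 = "GCU" * 5 + "UAA"                # normal stop in the last exon
    poison = "GCU" * 2 + "UAG" + "GCU" * 20  # toy exon carrying a PTC
    print(is_nmd_candidate([exon1, exon3]))
    print(is_nmd_candidate([exon1, poison, exon3]))
```

In the toy data the normal transcript terminates in its last exon and is not flagged, whereas inclusion of the extra exon places a stop codon well upstream of the final junction and the transcript is flagged as a likely NMD target.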
The relationship between alternative splicing and cancer is also of growing interest, as both cis-mutations and changes in splicing factor expression can cause disease. Mutations in cis-acting motifs, unless they are within splicing factor genes, usually cause discrete changes in splicing, and this mechanism has been observed in tumour-suppressor genes such as BRCA1 (breast-cancer susceptibility gene 1). At a more global level, relative SR and hnRNP protein expression often changes in cancer, which can lead to splicing changes. This is seen in the splicing of the extracellular domain of CD44, a protein involved in cell–cell and cell–extracellular matrix interactions. SR protein-associated changes in CD44's splicing have been linked to cellular invasiveness. Specific kinds of cancer may have similar changes in alternative splicing, leading to ‘signatures’ of splice variants compared with normal tissue. In breast and ovarian cancer, this has been related to the reduced expression of active Fox-2.
The genetic variations discussed above could be described as severe and have a major impact on splicing. Such severe changes are either frank pathology or are associated mainly with Mendelian diseases that have high penetrance. Recent studies have shown that complex diseases are associated with genetic variation that confers only a modest effect on disease risk. It is plausible that some of these disease-associated variations are in cis regulatory elements and that they lead to subtle changes to the overall splice variant pool, which may in turn influence disease susceptibility.
The control of splicing and alternative splicing is a sensitive and precise system under the control of many interacting factors that rely on genetic sequence elements. Normal and pathological genetic variation within these control elements can affect gene splicing. Understanding the myriad ways in which genetic variation can affect gene expression, in addition to protein coding changes, is important if we are to make further progress in the genetic dissection of both simple and complex human diseases.
Gene Expression in Neuronal Disease: Biochemical Society Focused Meeting held at University of Cardiff, Cardiff, U.K., 16–18 July 2009. Organized and Edited by Nicola Gray (MRC Human Reproductive Sciences Unit, Edinburgh, U.K.), Lesley Jones (Cardiff, U.K.) and Ian Wood (Leeds, U.K.).
Abbreviations
CELF, CUG-BP (CUG-binding protein), ETR3-like factor
E complex, early complex
ESE, exonic splicing enhancer
ESS, exonic splicing silencer
hnRNP, heterogeneous nuclear RNP
ISE, intronic splicing enhancer
ISS, intronic splicing silencer
PTB, polypyrimidine tract-binding protein
PTC, premature termination codon
SF1, splicing factor 1
SR protein, serine/arginine-rich protein
snRNP, small nuclear RNP
U2AF, U2 snRNP auxiliary factor
We thank Chelsea Hewitt, Helmut Butzkeuven and Julian Heng for a helpful reading of the paper.
C.J.J. is supported by a Monash Department Scholarship (Physiology), B.J.O. is a National Health and Medical Research Council of Australia Principal Research Fellow, and J.P.R. is supported by a National Health and Medical Research Council of Australia Biomedical Career Development Award.