STAR (signal transduction and activation of RNA) proteins are a family of RNA-binding proteins that control post-transcriptional gene regulation at several levels, including pre-mRNA alternative splicing, RNA export, translation and stability. Most of these proteins are regulated by signalling pathways through post-translational modifications such as phosphorylation and arginine methylation. They share a highly conserved RNA-binding domain, denoted the STAR domain. Structural investigations of the STAR domain in complex with RNA have highlighted how a subset of STAR proteins specifically recognizes its RNA targets. The present review focuses on the structural basis of RNA recognition by this family of proteins.
The STAR (signal transduction and activation of RNA) family of proteins
Throughout the 1990s, a novel family of proteins that functions in RNA metabolism and is regulated through signal transduction pathways was discovered and named the STAR family. Sam68 (Src-associated in mitosis of 68 kDa) was initially identified as a 62 kDa protein binding to Ras-GTP and was later found to be a downstream target of Src and Fyn [2,3]. Sam68 is now the best-characterized member of the STAR family, and two mammalian orthologues have also been identified, SLM1 and SLM2/T-STAR (Sam68-like mammalian proteins 1 and 2/testis STAR) [4,5]. Shortly after the discovery of Sam68, a novel subfamily of evolutionarily conserved proteins that share sequence homology with Sam68 was described. These proteins are Caenorhabditis elegans GLD-1 (defective in germline development protein 1), mouse QKI (quaking) and Drosophila HOW (held out of wings) [8,9]; they play roles in post-transcriptional gene regulation and in developmental processes including muscle development, oogenesis and spermatogenesis. Finally, another member, SF1 (splicing factor 1), was identified in 1996.
Domain organization of STAR proteins
Together, the STAR proteins are related by an evolutionarily conserved domain of approximately 200 amino acids, which is responsible for RNA binding. This domain, denoted the STAR domain, can be divided into three subdomains: a central hnRNP (heterogeneous nuclear ribonucleoprotein) KH (K homology) RNA-binding domain and two flanking regions, QUA1 and QUA2, which are involved in dimerization and RNA binding respectively (Figure 1). In addition to the STAR domain, most STAR proteins contain other domains or regions important for their proper function (Figure 1). Sam68 possesses a nuclear localization signal and a tyrosine-rich domain at the C-terminus, which is phosphorylated by tyrosine kinases such as Src and Brk [2,11]. Sam68 also contains six proline-rich motifs, three N-terminal and three C-terminal to the STAR domain, which bind SH3 and WW domains to facilitate tyrosine phosphorylation, and two RG-rich (arginine–glycine-rich) motifs that are target sites for arginine methylation by PRMT1 (protein arginine methyltransferase 1). Finally, Sam68 is also subject to lysine acetylation, SUMOylation and serine/threonine phosphorylation by kinases such as Cdc2 (cell division control protein 2 homologue), ERK1/2 (extracellular-signal-regulated kinase 1/2) and Nek2 [NIMA (never in mitosis kinase)-related kinase 2]. The domain architectures of SLM1 and SLM2/T-STAR are very similar to that of Sam68, except that they lack the first 100 amino acids that form the N-terminus of Sam68 [4,5]. In addition to its STAR domain, QKI also contains a nuclear localization signal, a tyrosine-rich region and a proline-rich region. QKI has also been shown to be tyrosine phosphorylated by Src and Fyn. In contrast, GLD-1 lacks such domains but has been shown to be phosphorylated by the serine/threonine kinase complex cyclin E/CDK2 (cyclin-dependent kinase 2). SF1 is the most divergent member of the STAR family. Its STAR domain contains the KH and QUA2 regions but lacks the QUA1 dimerization domain. In addition, SF1 contains an N-terminal nuclear localization signal and C-terminal zinc-knuckle and proline-rich regions.
Figure 1. Domain organization of STAR proteins
Function of STAR proteins in RNA metabolism
All identified STAR proteins have been shown to bind RNA and to be involved in post-transcriptional gene regulation. Sam68 affects various cellular processes, such as differentiation, cell cycle progression and apoptosis, through its direct involvement in alternative splicing. Indeed, many pre-mRNA targets of Sam68 have been identified that encode proteins spanning a variety of cellular functions: mTOR (mammalian target of rapamycin) in adipogenesis, neurexin 1 in synapse function, cyclin D1 in cell cycle progression, SRSF1 (serine/arginine-rich splicing factor 1) (formerly ASF/SF2) in splicing, CD44 in cell migration and Bcl-x (B-cell lymphoma X) in apoptosis. In addition, Sam68 is involved in the nuclear export of the unspliced HIV RNA through direct binding to the viral Rev protein and the viral RNA. Whereas Sam68 is ubiquitously expressed, SLM1 and SLM2/T-STAR are preferentially expressed in the brain, where they act specifically as alternative splicing regulators of the neurexin 1–3 genes [27,28]. QKI is involved in the regulation of alternative splicing, RNA export and mRNA stability of myelin-associated pre-mRNAs [29,30]. GLD-1 is a cytoplasmic protein that functions as a translational repressor [31,32]. SF1 is an essential splicing factor that binds the pre-mRNA branchpoint sequence during the early stage of spliceosome formation.
Specific RNA recognition by STAR proteins
Several studies have investigated the RNA-binding specificity of STAR proteins. First, SELEX (systematic evolution of ligands by exponential enrichment) experiments identified purine-rich RNA sequences specifically bound by Sam68. Most of these sequences contained a conserved UAAA motif. This consensus sequence was later confirmed and extended to SLM2/T-STAR, which was found, like Sam68, to bind specifically to a U(A/U)AA motif. The RNA motifs specifically bound by QKI and GLD-1 have also been identified [36,37]. Both proteins recognize very similar motifs, with the consensus sequences being UACU(C/A)A for GLD-1 and NA(A/C)UAA for QKI. These motifs have been confirmed in vitro by SELEX, and in vivo by RIP (RNP immunoprecipitation)–ChIP microarray for GLD-1 and by PAR–CLIP (photoactivatable-ribonucleoside-enhanced cross-linking and immunoprecipitation) for QKI. The RNA motif recognized by QKI and GLD-1 is very similar to the highly conserved branchpoint sequence (UACUAAC) specifically bound by SF1. STAR proteins can therefore be divided into two subfamilies based on their RNA-binding specificities: the QKI/GLD-1/SF1 subfamily, which recognizes a six-nucleotide UACUAA motif, and the Sam68/SLM1/SLM2/T-STAR subfamily, which specifically recognizes AU-rich RNAs containing the four-nucleotide UAAA motif.
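As an illustration of how these consensus motifs can be read, the following Python sketch translates them into regular expressions and scans an RNA sequence for matches. It is only a minimal sketch: the function name `find_motifs`, the dictionary layout and the choice to report overlapping matches are assumptions made for this example, and a sequence match does not by itself imply binding in vivo.

```python
import re

# Consensus motifs as quoted in the text, written as regular expressions.
# N = any nucleotide; (X/Y) = either X or Y. The dictionary keys are labels
# chosen for this example only.
MOTIFS = {
    "GLD-1": "UACU[CA]A",        # UACU(C/A)A
    "QKI": "[ACGU]A[AC]UAA",     # NA(A/C)UAA
    "Sam68/SLM2": "U[AU]AA",     # U(A/U)AA
}


def find_motifs(rna):
    """Return {label: [0-based start positions]} for each motif, overlaps included."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = {}
    for label, pattern in MOTIFS.items():
        # A zero-width lookahead reports overlapping occurrences too.
        hits[label] = [m.start() for m in re.finditer(f"(?={pattern})", rna)]
    return hits


if __name__ == "__main__":
    # SF1 branchpoint RNA used in the structural work discussed in this review.
    print(find_motifs("UAUACUAACAA"))
```

On the branchpoint RNA, the GLD-1 and QKI patterns both match at the same position whereas the Sam68/SLM2 pattern does not, which mirrors the two subfamilies described above.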
Structural investigations of RNA binding by STAR proteins
The RNA-binding STAR domain is very well conserved within the STAR family (Figure 2) and is crucial for the proper function of these proteins in RNA metabolism. Structural studies of STAR proteins have so far focused mainly on the STAR domain, both free and in complex with RNA. These structures, as detailed below, explain the specificity of RNA recognition by certain members of the STAR family.
Figure 2. Amino acid sequence alignment of the STAR domain of Sam68, SLM2/T-STAR, QKI and GLD-1
The first structure of a KH domain, from the human FMR1 (fragile X mental retardation 1) protein, was solved by NMR in 1996. This structure showed that the KH domain adopts a βααββα fold and demonstrated that a single point mutation in this protein (I304N), which leads to the fragile X syndrome phenotype, causes the KH domain to unfold. Later, the structure of NOVA-2 (neuro-oncological ventral antigen 2) KH3 was solved in complex with an RNA hairpin. This structure revealed how KH domains specifically bind RNA molecules and highlighted the importance of a highly conserved GXXG motif in RNA binding.
In 2001, the first structure of a KH domain from a STAR protein, SF1, was solved by NMR in complex with the canonical branchpoint sequence, UAUACUAACAA (Figure 3a). In contrast with the KH domains of FMR1 and NOVA, the KH domain of SF1 contains an additional flexible loop located between the second and third β-strands. Interestingly, this additional loop is conserved among the STAR family of proteins (Figure 2). Although this loop is not involved in RNA binding by SF1, its deletion in Sam68 disrupted RNA binding. RNA recognition by SF1 is driven by a mixture of hydrophobic, electrostatic and hydrogen-bond contacts. With the RNA numbered from U1 to A11 (U1AUACUAACAA11), the KH domain of SF1 specifically recognizes the U6AAC9 sequence. The GXXG motif (GPRG in SF1) specifically recognizes the base of U6 and contacts the RNA backbone. Specific hydrogen bonds to the bases of the RNA are formed in SF1 between Glu149 and A7, Ile177 and A8, and Ser182 and A11 (Figure 3a). In addition, Asn151 specifically contacts the base of C5 via a hydrogen bond. Interestingly, this structure also revealed that the QUA2 region, which is conserved among the STAR family (Figure 2), adopts a helical conformation, packs against the KH domain and is also involved in RNA binding (Figure 3a). Indeed, the QUA2 domain recognizes four nucleotides of the RNA, A2UAC5. Specifically, Thr253 contacts A2 through specific hydrogen bonds, and Arg255, Ala248, Leu244 and Leu247 make hydrophobic contacts with A4 and C5. This study also demonstrated that, although QUA2 is flexible in the absence of RNA, its α-helix is already pre-formed in solution rather than forming upon RNA binding. This was confirmed later by an NMR dynamics study of the KH–QUA2 domain of the Xenopus QKI protein, which showed that the QUA2 region is helical but highly flexible in the absence of RNA and becomes rigid in the presence of RNA.
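The nucleotide numbering used for the SF1 complex (U1 to A11, with the KH domain contacting U6AAC9 and the QUA2 region contacting A2UAC5) can be made explicit with a short Python sketch. The helper name `segment` is an assumption introduced for this illustration only.

```python
# Branchpoint RNA from the SF1 complex, numbered from 1 as in the text.
RNA = "UAUACUAACAA"


def segment(rna, first, last):
    """Return nucleotides first..last (1-based, inclusive) as labels such as 'U6'."""
    return [f"{rna[i - 1]}{i}" for i in range(first, last + 1)]


kh_site = segment(RNA, 6, 9)    # U6AAC9, recognized by the KH domain
qua2_site = segment(RNA, 2, 5)  # A2UAC5, recognized by the QUA2 region

print(kh_site)    # ['U6', 'A7', 'A8', 'C9']
print(qua2_site)  # ['A2', 'U3', 'A4', 'C5']
```

Read this way, the two segments are adjacent and non-overlapping, so KH and QUA2 together cover nucleotides 2 to 9 of the 11-nucleotide RNA.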
Figure 3. Structures of members of the STAR family of proteins with their RNA
More recently, the structures of the QUA1 domains of Sam68, GLD-1 and QKI have been solved [45–47]. These structures showed that the QUA1 region of these proteins consists of a helix–turn–helix motif stabilized by a network of hydrophobic contacts. The dimer is formed by an almost perpendicular stacking of the two monomers and is stabilized by a network of hydrophobic contacts and hydrogen bonds involving mainly residues of the turn between the two helices. The structures demonstrated that the QUA1 domain alone is sufficient for STAR protein dimerization. In the case of Sam68, it has been proposed that phosphorylation of one tyrosine residue (Tyr103), located in the QUA1 domain, could be involved in the dissociation of the homodimer, and mutation of this residue leads to a loss of splicing activity. This suggests that dimerization of the protein is required for its function in alternative splicing and that tyrosine phosphorylation could regulate Sam68 function by disrupting its ability to dimerize.
In 2013, the structures of the full STAR domains of QKI and GLD-1 in complex with their target RNAs were solved by X-ray crystallography (Figures 3b and 3c). These structures revealed that the RNA is bound only by the KH and QUA2 regions, whereas the QUA1 region is involved in dimerization and in a protein–protein interaction with the QUA2 region. The RNA sequences bound by QKI and GLD-1 are very similar to the one bound by SF1 (CUAAC) and, consequently, RNA recognition by GLD-1 and QKI involves the same conserved residues of KH and QUA2 as in SF1. Specifically, the base of C1 is recognized by Lys190 of QKI and Lys313 of GLD-1. The UAA motif, which superimposes perfectly in all the structures, contacts Gln193 (QKI) and Gln316 (GLD-1) at U2, and Asn97 (QKI) at A3. The last adenine forms two hydrogen bonds with the main chain of the protein, at a valine residue in both QKI and GLD-1. The last cytosine (C5) is recognized by an arginine residue (Arg124 in QKI and Arg247 in GLD-1). As in the structure of SF1, the QUA2 domain of QKI and GLD-1 is strongly involved in the interaction with the RNA. These structures also show that the QUA1 domain interacts with the QUA2 region, forming a three-helix bundle. This interaction induces a precise orientation of the KH–QUA2 domains in the dimer and positions the two RNA-binding surfaces at opposite ends of the dimer. This orientation suggests that one dimer can bind a single RNA molecule only if the two binding motifs (CUAAC) are separated by more than ten nucleotides. In addition to the QUA1–QUA2 interaction, the structure of the GLD-1 STAR domain shows that the QUA1–KH linker contacts the KH domain through hydrogen bonds and van der Waals contacts. It was proposed that these contacts facilitate the orientation of the QUA1, KH and QUA2 regions within the STAR domain. It should be noted, however, that this linker is clearly visible in the GLD-1 structure but not in the QKI structure, suggesting that the linker is more flexible in QKI. This flexibility could be important for STAR proteins to bind two RNA motifs that are relatively close together (fewer than ten nucleotides apart on the same RNA). Further structural studies will be needed to address this issue.
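The spacing argument can be illustrated with a small Python sketch that locates copies of the CUAAC motif in an RNA and reports the number of nucleotides between each pair. This is a sketch under stated assumptions: the gap is counted from the end of the first motif to the start of the second, the threshold `MIN_GAP` encodes "more than ten nucleotides" literally, and the function names are hypothetical. The structures themselves do not define how the spacing should be measured.

```python
import re

MOTIF = "CUAAC"  # core motif quoted in the text
MIN_GAP = 10     # assumption: the gap must be strictly greater than this value


def motif_pairs(rna, motif=MOTIF):
    """Return (start1, start2, gap) for each pair of non-overlapping motif copies.

    Starts are 0-based; gap is the number of nucleotides between the two copies.
    """
    starts = [m.start() for m in re.finditer(f"(?={motif})", rna)]
    pairs = []
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            gap = second - (first + len(motif))
            if gap >= 0:  # ignore overlapping copies such as CUAACUAAC
                pairs.append((first, second, gap))
    return pairs


def dimer_compatible(rna):
    """True if at least one motif pair is separated by more than MIN_GAP nucleotides."""
    return any(gap > MIN_GAP for _, _, gap in motif_pairs(rna))


if __name__ == "__main__":
    far = "CUAAC" + "G" * 12 + "CUAAC"
    near = "CUAAC" + "G" * 5 + "CUAAC"
    print(motif_pairs(far), dimer_compatible(far))
    print(motif_pairs(near), dimer_compatible(near))
```

Under these assumptions, the closely spaced pair is the case for which linker flexibility, as discussed above, might matter.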
In contrast with SF1, QKI and GLD-1, which specifically bind a CUAAC RNA motif, Sam68, SLM1 and SLM2/T-STAR specifically bind AU-rich RNA sequences (see above). The molecular details of this AU-rich RNA recognition remain unknown. We have therefore initiated a structural study of Sam68 and SLM2/T-STAR in complex with AU-rich RNA. Using NMR spectroscopy, we have identified the optimal protein constructs of Sam68 and SLM2/T-STAR and, through an AU-rich RNA NMR screen, the optimal RNA sequence for the structure determination of these complexes.
RNA UK 2014: An Independent Meeting held at Low Wood Hotel, Windermere, U.K., 24–26 January 2014. Organized and Edited by Niki Gray, Gracjan Michlewski and Steve West (University of Edinburgh, U.K.).
Abbreviations: FMR1, fragile X mental retardation 1; GLD-1, germline development protein 1; NOVA, neuro-oncological ventral antigen; SELEX, systematic evolution of ligands by exponential enrichment; Sam68, Src-associated in mitosis of 68 kDa; SF1, splicing factor 1; SLM, Sam68-like mammalian protein; STAR, signal transduction and activation of RNA.
This work was supported by a Medical Research Council Career Development Award to C.D. [grant number G1000526] and by a College of Medicine, Biological Sciences and Psychology, University of Leicester, studentship to J.F.