Using genetically incorporated unnatural amino acids to control protein functions in mammalian cells

Genetic code expansion allows unnatural (non-canonical) amino acid incorporation into proteins of interest by repurposing the cellular translation machinery. The development of this technique has enabled site-specific incorporation of many structurally and chemically diverse amino acids, facilitating a plethora of applications, including protein imaging, engineering, mechanistic and structural investigations, and functional regulation. In particular, genetic code expansion provides powerful tools to study mammalian proteins, whose dysregulation often has important implications for health. In recent years, a series of methods has been developed to modulate protein function through genetically incorporated unnatural amino acids. In this review, we first discuss the basic concept of genetic code expansion and give an up-to-date list of amino acids that can be incorporated into proteins in mammalian cells. We then focus on the use of unnatural amino acids to activate, inhibit, or reversibly modulate protein function by translational, optical or chemical control. The features of each approach are also highlighted.


Introduction
Knowledge of protein function is of pivotal importance to life science research. It can guide conventional drug development programmes and lead to novel strategies to address currently non-targetable systems [1][2][3]. In order to understand the precise role and interacting network of a protein, it is essential to analyse it within its native environment. For a mammalian protein, its function often also depends on its host cell (e.g. cell type and cell cycle stage), specific subcellular location and post-translational modifications. In addition, a protein of interest often exists in the presence of other closely related homologues (e.g. proteins within the same family), making it difficult to decipher the precise function of a specific protein in cells. Targeting the protein by small-molecule inhibition is often not possible in these cases, as protein homologues will also be affected. To tackle this problem, over the last two decades there has been a drive to develop and refine the technique of genetic code expansion which allows researchers to exploit the cellular translation machinery for site-specific incorporation of unnatural (non-canonical) amino acids into target proteins [4][5][6][7][8][9][10][11][12][13][14]. Consequently, this enables the use of building blocks beyond the 20 canonical amino acids and incorporation of unnatural amino acids with unprecedented functionality into target proteins in live cells. The repurposing of the translational machinery by this approach has paved the way for revealing the functions of proteins under physiological conditions [15][16][17][18][19]. For example, the technique can be used to site-specifically introduce an unnatural amino acid into the homologue of interest, whereby unique functionality (on the unnatural amino acid) can be used for selective activation, inhibition, or reversible regulation of the target homologue [7].
At the molecular level, the mechanism of protein translation is highly conserved in all organisms: the cellular machinery reads consecutive nucleotide triplets (codons) on the mRNA and 'translates' each into the corresponding amino acid. In nature, the endogenous aminoacyl-tRNA synthetase (aaRS)/tRNA pairs within the cell decode 61 of the total 64 codons to 20 canonical amino acids. The remaining three codons (UAG, UGA and UAA) are used for translation termination, and hence they are also known as 'stop' codons. In order to achieve site-specific incorporation of an unnatural amino acid, an orthogonal aaRS/tRNA pair is needed, which must decode a codon that does not correspond to any canonical amino acid, a so-called blank codon (Figure 1). Stop codons are most commonly used as blank codons in genetic code expansion, and decoding of a stop codon is known as 'suppression' because it suppresses translation termination. The amber stop codon (UAG) is often used as the blank codon due to its minimal occurrence in most organisms.
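The decoding rules described above can be illustrated with a minimal Python sketch. It builds the standard codon table, confirms the split into 61 sense codons and 3 stop codons, and shows how reassigning UAG turns a termination signal into an incorporation site. The function name `translate`, the placeholder letter `X` for the unnatural amino acid, and the example mRNA are illustrative assumptions rather than anything taken from the reviewed studies; the toy model also assumes perfectly efficient suppression, which is not the case in real cells.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
SENSE_CODON_COUNT = sum(1 for aa in CODON_TABLE.values() if aa != "*")
CANONICAL_COUNT = len(set(CODON_TABLE.values()) - {"*"})


def translate(mrna, suppressed=None):
    """Translate an mRNA string codon by codon.

    `suppressed` maps a blank codon (e.g. 'UAG') to the one-letter symbol
    of the amino acid delivered by a hypothetical orthogonal aaRS/tRNA pair.
    Translation stops at the first stop codon that is not suppressed.
    """
    suppressed = suppressed or {}
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in suppressed:
            protein.append(suppressed[codon])  # stop codon read as sense codon
            continue
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break  # termination: release the (possibly truncated) chain
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative mRNA with an in-frame amber codon (UAG) at position 3.
example_mrna = "AUGGCUUAGAAAUAA"
truncated = translate(example_mrna)                # no suppressor present
full_length = translate(example_mrna, {"UAG": "X"})  # amber suppression
```

Without the suppressor the chain terminates at UAG; with it, the unnatural residue `X` is inserted and translation continues to the next stop codon (UAA).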
Within the concept of genetic code expansion, 'orthogonality' refers to the non-reactivity of the orthogonal aaRS/tRNA pair with the endogenous pairs and canonical amino acids in the host cell. The orthogonal synthetase must acylate only the orthogonal tRNA, and only with the designated unnatural amino acid; neither canonical amino acids nor endogenous tRNAs are substrates of the orthogonal synthetase. Similarly, neither the unnatural amino acid nor the orthogonal tRNA is a substrate of the endogenous synthetases (Figure 2).
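The orthogonality conditions above can be written as a simple set-based check. The sketch below is a conceptual model only: the `Synthetase` record, the function `is_orthogonal`, and all names in the example (`UAA-X`, `tRNA-Pyl-CUA`, `PylRS-mutant`) are hypothetical, and real synthetases show graded rather than all-or-nothing substrate recognition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synthetase:
    """Toy model of an aaRS: which amino acids and tRNAs it accepts."""
    name: str
    amino_acids: frozenset
    trnas: frozenset


def is_orthogonal(o_rs, o_trna, unnatural_aa, endogenous_rs,
                  canonical_aas, endogenous_trnas):
    """Return True if the pair satisfies the orthogonality conditions."""
    # (i) The orthogonal synthetase charges the orthogonal tRNA
    #     with the designated unnatural amino acid.
    charges_own_pair = (unnatural_aa in o_rs.amino_acids
                        and o_trna in o_rs.trnas)
    # (ii) It accepts neither canonical amino acids nor endogenous tRNAs.
    ignores_host = (not (o_rs.amino_acids & canonical_aas)
                    and not (o_rs.trnas & endogenous_trnas))
    # (iii) No endogenous synthetase accepts the unnatural amino acid
    #       or the orthogonal tRNA.
    ignored_by_host = all(
        unnatural_aa not in rs.amino_acids and o_trna not in rs.trnas
        for rs in endogenous_rs
    )
    return charges_own_pair and ignores_host and ignored_by_host


# Hypothetical host with two endogenous synthetases.
canonical = frozenset({"Tyr", "Leu"})
host_trnas = frozenset({"tRNA-Tyr", "tRNA-Leu"})
host_rs = [
    Synthetase("TyrRS", frozenset({"Tyr"}), frozenset({"tRNA-Tyr"})),
    Synthetase("LeuRS", frozenset({"Leu"}), frozenset({"tRNA-Leu"})),
]

clean_rs = Synthetase("PylRS-mutant", frozenset({"UAA-X"}),
                      frozenset({"tRNA-Pyl-CUA"}))
leaky_rs = Synthetase("leaky-mutant", frozenset({"UAA-X", "Tyr"}),
                      frozenset({"tRNA-Pyl-CUA"}))

clean_ok = is_orthogonal(clean_rs, "tRNA-Pyl-CUA", "UAA-X",
                         host_rs, canonical, host_trnas)
leaky_ok = is_orthogonal(leaky_rs, "tRNA-Pyl-CUA", "UAA-X",
                         host_rs, canonical, host_trnas)
```

The second synthetase fails the check because it still accepts a canonical amino acid, which in practice would lead to misincorporation at the blank codon.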
Besides the amber codon, other stop codons [20][21][22][23][24][25][26] and different four-nucleotide codons [27,28] have been used as blank codons. The use of four-nucleotide codons increases the theoretical number of codons from 4³ (64) to 4⁴ (256) so that multiple different unnatural amino acids can be incorporated at the same time. However, decoding a four-nucleotide codon by the ribosome is less efficient than decoding the normal three-nucleotide codons. Although this issue has been addressed in Escherichia coli through ribosome engineering [29][30][31], the lower efficiency in decoding four-nucleotide codons remains an issue in mammalian systems [27,28].
To date, many unnatural amino acids (1-110, Table 1) can be site-specifically incorporated into proteins produced by mammalian cells using genetic code expansion [5,32]. While the amino acids are structurally diverse, the majority of them can be incorporated through only a few orthogonal synthetases and their mutants. In particular, pyrrolysyl-tRNA synthetase (PylRS)/tRNA Pyl pairs, such as that from Methanosarcina mazei (Mm), have proven to be extraordinarily useful [4]. The tRNA Pyl naturally recognises the UAG codon and thus engineering of this tRNA is not needed. In addition, this pair is orthogonal in both E. coli and mammalian cells; hence, PylRS can be engineered in E. coli and the engineered mutant subsequently used for incorporation of the designated unnatural amino acid in mammalian systems. As shown in Table 1, a wide range of amino acids has been incorporated into proteins in mammalian cells through only a few point mutations in the PylRS gene.
Many engineered E. coli aaRS/tRNA pairs have also been used as orthogonal pairs in mammalian cells. The most successful ones are the E. coli tyrosine, leucine and tryptophan pairs [33]. However, as all these synthetases naturally recognise a canonical amino acid, it is necessary to abolish their natural activity towards the canonical amino acid and to engineer them to recognise only the designated unnatural amino acid. As it is technically difficult to perform directed evolution in mammalian cells due to low efficiency in transfection and screening, synthetase engineering is normally carried out in E. coli [34,35] or yeast [15,36,37] so that large mutant libraries can be easily screened. It is also necessary to modify the E. coli tRNA so that it decodes a blank codon instead of a codon corresponding to a canonical amino acid.
Based on the simplicity of the established methodology [38][39][40] and the promiscuity of many orthogonal synthetases towards different unnatural amino acids (vide infra) [41], the number of genetically incorporable unnatural amino acids has steadily increased. In addition, some orthogonal aaRS/tRNA pairs are mutually orthogonal [21,24,25,26,42] and can be used at the same time to incorporate multiple different unnatural amino acids into a protein of interest.
As recent reviews cover fundamental aspects of genetic code expansion [4,6,9,13], the engineering of new orthogonal synthetases [8], and general [5,10,11,14] or specific [7,12] applications of genetic code expansion in eukaryotic systems, we will focus on recent advances and applications of genetic code expansion for controlling protein function in mammalian cells through translational, optical or chemical means.

[Table 1, continued (fragment): Mx1201 tRNA Pyl CUA C41CA; MmPylRS [124]; wt MmtRNA; Method development [140]; Photocrosslinking [132,160]; Protein engineering [132]]
[...] minimised by including multiple amber codons into the gene of interest. Translational control by amber suppression is fully complementary to conventional genetic approaches (e.g. knockout, knockdown) that deplete a protein in cells to 'switch off' its function. In addition, the unnatural amino acid approach is reversible, as removing the unnatural amino acid from the growth medium will 'switch off' the translation of the protein of interest. The translational switch-on process has been widely employed as a reporter system to test incorporation of new unnatural amino acids by using light-emitting reporter proteins such as green fluorescent protein [46] or luciferase [20]. Upon successful incorporation, cells emit light (fluorescence or luminescence) whose intensity correlates directly with the unnatural amino acid incorporation efficiency. Apart from the reporter strategy, this approach has also been used to regulate the function of other proteins, such as Cas9 for controllable gene editing in mouse embryos [47].
Besides the general use of the 'translational activation' approach to study protein function, this principle has proven to be powerful in controlling virus replication [43,48,49]. By introducing TAG codons within the virus genes, viruses can only be generated using cell lines containing an orthogonal synthetase/tRNA CUA pair, and the resulting viruses are replication-incompetent in normal cells due to the lack of amber suppressor tRNA (Figure 4A) [43,48]. Such replication-incompetent viruses offer an additional tier of control for live-attenuated vaccines and significantly increase their safety. This concept has been further developed by including the genes encoding the orthogonal aaRS/tRNA pair into the viral genome (Figure 4B) [49]. In this case, viruses can be replicated in wild-type cells and the native hosts, as long as the unnatural amino acid is supplemented. Here, spatial control can also be achieved by local administration of the unnatural amino acid as demonstrated in examples of mice with an expanded genetic code [50,51]. Thus, the approach can be used for controlling viral vectors in gene therapy, where spatiotemporal virus replication and gene editing are highly desirable.
While the translational control approach is quite simple, the response is not instantaneous. There is always a lag time from when the unnatural amino acid is added to the culture medium until the full-length protein is produced and folded. Similarly, depleting the unnatural amino acid from the growth medium will stop production of new protein, but the protein function will only be completely switched off when all previously produced protein has been degraded in the cells. Thus, the kinetics of the switching-off process largely depend on the half-life of the protein, so the response rate is comparable to that of genetic knockdown.
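The dependence of the switch-off kinetics on protein half-life can be made concrete with a short calculation. The sketch below assumes simple first-order degradation, an instantaneous stop of synthesis when the unnatural amino acid is withdrawn, and no dilution by cell division; these are simplifying assumptions, and the function names and the 8 h half-life are purely illustrative.

```python
import math


def remaining_fraction(hours_since_withdrawal, half_life_hours):
    """Fraction of pre-existing protein left after withdrawal.

    Assumes first-order decay: N(t) / N(0) = 0.5 ** (t / t_half).
    """
    if half_life_hours <= 0 or hours_since_withdrawal < 0:
        raise ValueError("half-life must be > 0 and time must be >= 0")
    return 0.5 ** (hours_since_withdrawal / half_life_hours)


def time_to_fraction(target_fraction, half_life_hours):
    """Hours needed for the protein pool to fall to `target_fraction`."""
    if not 0 < target_fraction <= 1 or half_life_hours <= 0:
        raise ValueError("fraction must be in (0, 1], half-life must be > 0")
    return half_life_hours * math.log2(1.0 / target_fraction)


# Illustrative protein with an assumed half-life of 8 h.
left_after_one_half_life = remaining_fraction(8.0, 8.0)
hours_to_quarter = time_to_fraction(0.25, 8.0)
hours_to_one_percent = time_to_fraction(0.01, 8.0)
```

Under these assumptions, reducing the protein pool to 1% takes between six and seven half-lives, which is why long-lived proteins respond slowly to withdrawal of the unnatural amino acid.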
light irradiation, the photocage is removed, thereby restoring the protein's function. To date, photocaged cysteines (17-21), lysines (67, 72-74) and tyrosines (102-105) have been used to control enzyme function, intein splicing, protein subcellular localisation, virus-host interactions, and cell signalling cascades [13]. The light-activation approach is particularly useful for kinetic studies as it provides very high spatiotemporal resolution. Spatial control can be achieved down to subcellular locations using focused light beams, which is virtually impossible when using the translational control approach. Theoretically, it is also possible to incorporate photocaged serine in mammalian cells, as this has been demonstrated in yeast [53]. Therefore, the light-activation approach is applicable to any protein that has a functionally critical cysteine, lysine, tyrosine, or serine residue in mammalian cells, including but not limited to kinases, DNA- and RNA-binding proteins, proteases, phosphatases, oxidoreductases, isomerases, and ubiquitin-modifying enzymes [54]. In contrast, the incorporation of a photocrosslinking amino acid can be used to inhibit protein function upon light irradiation [55]. In this case, a photocrosslinking amino acid is placed in the interior of the target protein. Upon light irradiation, a highly reactive functionality (e.g. radical, nitrene, carbene) is generated and reacts non-specifically with a nearby amino acid residue, causing cross-linking of the protein and subsequent loss of the protein's activity. The feasibility of this approach has been demonstrated with the use of p-benzoylphenylalanine (41) in the study of the glutamate receptors GluA1 and GluA2 [55]. When compared with the use of a photocaged amino acid, inhibiting a protein by photocrosslinking does not rely on the existence of a functionally critical residue, and thus theoretically, it can be used to investigate any protein in mammalian cells.
Nevertheless, for each protein target it is necessary to screen for a suitable site at which to place the photocrosslinking amino acid. Protein variants containing the photocrosslinking amino acid must (i) behave in the same way as the wild-type protein (i.e. be phenotypically silent) before light irradiation; and (ii) be fully inhibited after light irradiation triggers the photocrosslinking. Due to these criteria, the screening process can be laborious and time-consuming.
In addition to light-induced activation and inhibition, reversible regulation of protein function can be achieved through incorporation of a photoswitchable amino acid (Table 3). For example, 48, which contains an azobenzene functionality that undergoes reversible cis-trans isomerisation upon irradiation with blue and UV light, has been used to control the activity of a glutamate receptor [56]. However, the general applicability of this approach suffers from constraints similar to those of inhibition by photocrosslinking. Extensive screening is often needed to identify a suitable site for incorporation, such that the resulting protein variant is fully active or inactive upon irradiation with light of a specific wavelength. At the current state of the art, there is no guarantee that such a site can be found in the target protein. Overall, the use of light-responsive amino acids offers superior temporal control of protein function as the response is significantly faster (seconds) than translational control by amber suppression (minutes to hours). Additionally, spatial control can be achieved at the subcellular level, which is not possible with the translational approach. Generally, UV light at approximately 360 nm (i.e. UVA) is required (Tables 2 and 3) to induce the change (i.e. decaging, cross-linking, or isomerisation). However, UVA light has been shown to alter cellular signalling processes [57] or to affect proper cellular function if high-intensity irradiation is applied (i.e. 50 J cm⁻²) [58]. Though not necessarily problematic, this has to be considered when planning to apply light-responsive unnatural amino acids. Thus, there is a trend to develop new functionalities that can be modulated by light of longer wavelengths [52]. In particular, red and near-infrared light (650-750 nm) is appealing because it causes no harm to cells even under excessive exposure and can penetrate tissues for in vivo applications.
To date, coumarin-caged lysines (73 and 74) are the only genetically incorporable unnatural amino acids that can be decaged within this wavelength range, although only by a two-photon approach that requires a specialised multiphoton laser setup [52]. Nevertheless, with the continuous advances in light-responsive chemical functionalities and orthogonal aaRS engineering, it is expected that more unnatural amino acids with the desired photophysical properties can be incorporated through genetic code expansion.
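The wavelength considerations above follow from the relation E = hc/λ. The following Python sketch compares the energy of a single UVA photon with that of two near-infrared photons, and converts a light dose into an exposure time. The 720 nm wavelength and the 0.1 W cm⁻² irradiance are illustrative assumptions, not values from the cited studies; the calculation addresses energy only and says nothing about two-photon absorption cross-sections or decaging quantum yields.

```python
# Exact SI defining constants.
PLANCK_J_S = 6.62607015e-34       # Planck constant, J s
LIGHT_SPEED_M_S = 299_792_458.0   # speed of light in vacuum, m/s
JOULE_PER_EV = 1.602176634e-19    # J per electronvolt


def photon_energy_ev(wavelength_nm):
    """Energy of one photon in eV, from E = h * c / wavelength."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / JOULE_PER_EV


def exposure_time_s(dose_j_per_cm2, irradiance_w_per_cm2):
    """Seconds of illumination needed to deliver a given dose."""
    if irradiance_w_per_cm2 <= 0:
        raise ValueError("irradiance must be positive")
    return dose_j_per_cm2 / irradiance_w_per_cm2


uva_photon = photon_energy_ev(360.0)          # one UVA photon
nir_photon = photon_energy_ev(720.0)          # one near-infrared photon
two_nir_photons = 2 * nir_photon              # two-photon excitation budget

# Time to reach the 50 J cm^-2 dose mentioned in the text at an
# assumed (hypothetical) irradiance of 0.1 W cm^-2.
seconds_to_high_dose = exposure_time_s(50.0, 0.1)
```

Two photons at twice the wavelength together carry the same energy as one UVA photon, which is the energetic basis for two-photon decaging with red or near-infrared light.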

Small-molecule induced activation or inhibition
In addition to light, small molecules can also be used to unmask or modify unnatural amino acids and subsequently regulate protein function with a prompt response. For example, several protecting groups can be removed bioorthogonally inside live mammalian cells, and these chemistries have been used to switch on protein function by genetic code expansion. Intracellular bioorthogonal reactions that have been used for this purpose include inverse electron demand Diels-Alder reactions [18,59,60,61], 1,3-dipolar cycloadditions [62], Staudinger reactions [63], and palladium-catalysed propargyl removal (Table 4) [57]. Currently, all of these have only been demonstrated with caged lysines (61, 70, 71, 85) in a number of examples, including activation of luciferases, kinases and nucleases. Theoretically, all these protecting groups can be applied to other nucleophilic amino acids (e.g. cysteine, serine, threonine, tyrosine), subject to successful engineering of the corresponding orthogonal synthetases.
[Table 4 (fragment): HEK293T; GFP [61], luciferase [59,61], MEK1 [18,60], MEK2 [18], FAK [60] and Src [60] kinases; inverse electron demand Diels-Alder reactions; tetrazines [18,59,60,61]]
On the other hand, bioorthogonal amino acids (e.g. 77 and 92, Table 5) have been used for rapid and selective inhibition of a specific enzyme in live mammalian cells [64]. In this case, a bioorthogonal amino acid is placed into the target enzyme without affecting the enzyme function. Upon contact with an inhibitor conjugate bearing the complementary bioorthogonal group, the enzyme variant is tethered to the conjugate and thus the enzyme activity is inhibited (Figure 5). The inhibition is exquisitely selective and can even discriminate between isoforms that differ by a single amino acid residue. Using this approach, selective inhibition of an intracellular kinase for which no selective small-molecule inhibitor exists was achieved. In addition, placing a photoswitchable moiety (i.e. azobenzene) into the inhibitor conjugate enables reversible modulation of enzyme activity by light.
In comparison with light-induced activation or inhibition, small molecules can be used to activate or inhibit the target protein in deep animal tissue or intact animals, which are not easily accessible by light. However, as mentioned above, only a few reactions have so far been shown to be bioorthogonal with reaction rates high enough to allow a fast response [65]. The extension of this methodology is therefore tied to the development of novel bioorthogonal reactions. In contrast to the use of light for activation or inhibition, the small-molecule approach, similar to the translational approach, only allows spatial control by using cell-compartment selective compounds or reactions, or local injections.

Conclusion
Genetic code expansion has matured into a technique that can be routinely used in mammalian systems. Controlling protein function is currently mostly achieved by translation, light, and small molecules. These methods have been summarised and discussed, and their features have been compared (Table 6). Translational control is arguably the easiest to perform but suffers from a longer response time (up to several hours). On the other hand, both light and small-molecule induced methods have a faster response (seconds to minutes). The light approach is particularly appealing where subcellular spatial resolution is needed. While all three approaches have shown promise, most of the reviewed applications are so far proof-of-principle studies. The dissemination of this technique could be enhanced by the community simplifying access to plasmids (e.g. through a plasmid repository), standardising the reporting format of aaRS mutants with full sequencing information, and providing protocols with extensive details. Only general implementation of protein control by genetic code expansion in the wider scientific community to unravel new biological insights will truly demonstrate the power of these techniques [55,56,66,67,68]. Since genetic code expansion has also been recently demonstrated in mice [47,50,51,60,69,70], we foresee that optimisation will be tailored to target cells, tissues, and mammalian models and many of the aforementioned approaches will be applied in vivo, providing the complete native environment to study the function of mammalian proteins. With further promotion and adaptation of genetic code expansion, this is to be expected.

Summary
• Genetic code expansion can now be routinely used for incorporation of more than 100 different unnatural amino acids in mammalian cells using only four orthogonal aaRS/tRNA pairs and their mutants as shown in Table 1.
• Protein function can be temporally regulated (activation or inhibition) by simple translational control, i.e. supplementation of the desired unnatural amino acid to allow full-length, functional protein production.
• More rapid control can be achieved by incorporating stimuli-responsive amino acids, which allow activation, inhibition, or reversible regulation of protein function by light or small molecules.
• Most of the approaches have been demonstrated in proof-of-principle studies, but are ready for adaptation by the broader scientific community.