The race to identify a successful treatment for COVID-19 will be defined by fundamental research into the replication cycle of the SARS-CoV-2 virus. This research has identified five distinct stages, from which numerous vaccination and clinical trials have emerged alongside a large number of drug discovery studies currently in development for disease intervention. Informing every step of the viral replication cycle has been an unprecedented ‘call-to-arms’ by the global structural biology community. Of the 20 main SARS-CoV-2 proteins, 13 have been resolved structurally, with most having a related SARS-CoV or MERS-CoV structural homologue, totalling some 300 structures currently available in public repositories. Herein, we review the contribution of structural studies to our understanding of the virus and their role in the structure-based development of therapeutics.
Introduction
The causative agent of the coronavirus disease 2019 (COVID-19) pandemic is the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a betacoronavirus closely related to SARS-CoV and MERS-CoV. SARS-CoV-2 is a positive-strand RNA virus with a long (∼30 kb) single-stranded RNA genome that encodes up to 14 open reading frames (ORFs) [1]. Two large polyproteins are translated from the 5′ ORF1a and ORF1b, and subsequently processed into 16 viral non-structural proteins (Nsp1–16) via auto-proteolysis. Several ORFs at the 3′ end encode the four main structural proteins of the virus, namely the spike (S), envelope (E), membrane (M) and nucleocapsid (N) proteins, which associate with lipids to form the virion (Figure 1A).
Schematic representation of the SARS-CoV-2 genome and viral replication cycle.
(A) schematic representation of the SARS-CoV-2 RNA genome, ORF1a and 1b and the component proteins. (B) cartoon of the SARS-CoV-2 viral entry, genomic transcription & translation, viral replication, host mediation, packaging and budding stages within the viral replication pathway. Credit to Maria Voigt/RCSB PDB for the Virus rendering.
A concerted global scientific effort to characterise the SARS-CoV-2 proteome has been undertaken on an unprecedented scale. Such work falls into two broad categories: (i) understanding the immune response to the virus, such as the adaptive immune response, and (ii) defining the functional roles of the SARS-CoV-2 proteins. The former informs vaccination and immunisation strategies whereas the latter serves to establish targets for therapeutic intervention at key viral replication stages. There are five main arenas where therapeutic interventions are gathering momentum, namely (i) viral entry, (ii) RNA transcription and polypeptide processing, (iii) viral replication and viral assembly, (iv) host-mediation and (v) budding (Figure 1B). Here we summarise the structural information on the SARS-CoV-2 proteome available to date and highlight its impact on understanding the SARS-CoV-2 replication cycle and its attendant role in developing strategies for disease intervention, including drug repurposing, antiviral development and vaccine development. This review (outlined in Figure 1C) first describes the structural proteins that comprise the viral envelope and their targeting for immunisation and neutralisation of the virus. It then progresses to the non-structural proteins that play fundamental roles in the replication of the virus, and to how disrupting these processes with antivirals offers targets for COVID-19 treatment.
Viral entry and the role of the structural proteins
The envelope of the SARS-CoV-2 virus has been an area of much research and interest owing to its pivotal role in host-cell recognition and viral invasion. As such the proteins at the virion surface have been intensively studied in hopes of informing therapeutic targeting. No protein illustrates the rapid and prolific structural characterisation efforts into SARS-CoV-2 more than the Spike (S) protein, a heavily glycosylated trimeric class-I fusion protein.
Spike, S protein
SARS-CoV-2 S proteins are the virion surface proteins that provide the ‘corona’ evident on these viruses. The S protein comprises an S1 and an S2 subunit: host cell entry is mediated by the interaction of S1 with the angiotensin-converting enzyme 2 (ACE2) receptor, which enables proteolytic processing of the S2 subunit to facilitate membrane fusion [1]. The S1 subunit comprises a signal peptide and four domains: an N-terminal domain, a receptor-binding domain (RBD), and two C-terminal domains (Figure 2A). The S2 subunit includes a fusion peptide, heptad repeats 1 and 2, a transmembrane domain and a cytoplasmic domain. Viral adhesion to a target cell is mediated through a trimeric association of the spike protein in a metastable pre-fusion form. The structure of the pre-fusion core was one of the first SARS-CoV-2 structures solved and described an asymmetric trimer with a trigonal shape that closely resembled that of SARS-CoV (Figure 2A) [2]. The lower apex portion of the assembly comprised a trimeric arrangement of the S2 subunits, with the S1 subunits wrapping around the 3-fold axis and the RBDs sitting atop. One of the RBDs was in an upright conformation which, along with breathing of the S1 subunit, indicated plasticity of this domain. Indeed, this up-down alternation of the RBD has since been corroborated, and a fully down conformation of the RBDs has also been observed (Figure 2C) [3].
Overview of the SARS-CoV-2 structural protein structures.
(A, B) Cryo-electron microscopy structures of the pre-fusion (A) and post-fusion (B) spike protein (PDB: 6VSB & 6XRA), with the N-terminal domain (NTD), receptor-binding domain (RBD), C-terminal domains (CTD) 1 & 2, central helix (CH), fusion peptide proximal region (FPPR) and heptad repeats (HR) 1 & 2 coloured pale blue, pale pink, forest green, pale green, wheat, blue, pale yellow and raspberry, respectively. (C) Expanded view of the homotrimeric arrangement of the RBDs, which can move from an open to a closed conformation. (D) Antibody-derived fragment antigen-binding (Fab, shown grey) complexes of the S-protein RBDs in both the open and closed conformations (PDB: 6ZDH & 6WPS). (E) Structure of the RBD bound to the full-length homodimeric human ACE2 receptor, including the peptidase domain (PD) and collectrin-like domain (CLD), in complex with B0AT1 shown wheat and pale yellow, respectively (PDB: 6M1D). (F) Structural comparison of three structures of the nucleocapsid protein RNA-binding domain illustrating the movement of the β-hairpin loop (PDB: 6VYO, 6WKP & 6M3M). (G) The obligate dimer of the nucleocapsid protein dimerisation domain (PDB: 6WZO).
In this closed conformation, the ACE2-recognition interface is buried and interacts with the neighbouring protomer. An extensive conformational characterisation of S-protein RBD variants recently built upon these observations and examined the influence of environmental factors on the dynamics of the S-protein [4]. In addition, a structure of the full-length S-protein including the transmembrane region offered further clarity on the N-terminus and how it is thought to clamp the RBDs in a closed conformation [5]. Recently the structure of the post-fusion S2 trimer was solved and highlighted the extension of the central triple coiled-coil (Figure 2B) and a triple β-sheet of the S1/S2–S2′ that projects heptad repeat 2 toward the viral membrane [5]. Given the therapeutic potential of disrupting the S-protein interactions, structures of a multitude of neutralising antibodies and analogous modalities bound to cryptic and conserved epitopes of the S-protein have also been determined. These include complexes targeting the open form of the spike protein, thought to be the state immediately prior to host-cell binding and viral entry, as well as complexes targeting the closed conformation, an approach which is progressing to clinical trials (Figure 2D) [6,7].
Besides the pre- and post-fusion structures, a myriad of complexes of the RBD and the human ACE2 (hACE2) receptor have also been determined. Two of the first such structures include a chimeric SARS-CoV-2 RBD and a truncated RBD in complex with the peptidase domain of ACE2 [8,9]. The RBD has a twisted 5-stranded antiparallel β-sheet core. Sandwiched between β4 & 7 is an extension comprising the short β5 & 6 and α4 & 5 and connecting loops that comprise the receptor-binding motif, which cradles an N-terminal helix of the ACE2 peptidase domain [8,9]. These structures highlight the interactions with ACE2 and give detailed insight into the molecular determinants of the interaction which, in concert with biophysical data, provides a molecular understanding of viral cell-binding and host cell-entry via ACE2 binding. More recently an RBD complex with full-length ACE2 showed the architecture of the natively dimerised hACE2 in complex with the sodium-dependent neutral amino acid transporter B0AT1 in a new light (Figure 2E) [10]. ACE2 dimerises via a C-terminal collectrin-like domain and a single-pass transmembrane helix that orchestrates a peripheral interaction with the amino acid transporter B0AT1 on either side of the ACE2 dimer. The structure of the higher-order assembly showed that the closed state of ACE2 was bound by the RBD, in contrast with the open variant observed in the unidentified ACE2–B0AT1 ternary complex [10].
Nucleocapsid, N protein
Whereas the S protein is involved in viral entry, another key structural component of the virion is the nucleocapsid protein (N protein), which is involved in nucleocapsid formation. The SARS-CoV-2 N protein is an RNA-binding protein involved in viral RNA replication and processing [11]. More specifically, N protein binds and packages viral RNA into long helical ribonuclear core complexes, facilitating viral assembly and budding, RNA synthesis, and host-cell interactions [12]. N protein exhibits a conserved architecture across betacoronaviruses, comprising an N-terminal RNA-binding domain, a central serine-arginine rich linker and a C-terminal dimerisation domain. The structures of the SARS-CoV-2 N protein RNA-binding domain highlight the conserved architecture comprising a β-sheet core of 5 antiparallel β-strands, an extended β3-4 hairpin and an acidic loop containing a 3₁₀ helix. Overall, the N protein RNA-binding domain is largely basic and often described as a right hand-like structure with a protruding basic finger, basic palm and acidic wrist (Figure 2F). The molecular determinants of RNA binding by N protein remain opaque [13], although structures have suggested a potential nucleotide-binding pocket [14].
Comparison to the homologous human betacoronavirus-OC43 N protein identified a hydrophobic pocket that bound adenosine monophosphate [15]. Notwithstanding a pronounced movement of the N-terminus, the hydrophobic pocket persisted in SARS-CoV-2 N protein with comparative coordination available for nucleotide binding, imparting N protein with a higher binding affinity to viral RNA [14]. Indeed, a nuclear magnetic resonance study of the SARS-CoV-2 N protein binding to an RNA duplex implicated these and neighbouring residues in RNA binding, although atomic detail is yet to be determined [16].
Numerous N protein dimerisation domains have recently been determined, highlighting the obligate dimeric nature of this domain across coronaviruses [17]. Indeed, such dimerisation and further oligomerisation are implicated in directing the helical assembly of the ribonuclear core complexes. N protein dimerisation is orchestrated by a protruding β-hairpin that intercalates with the corresponding hairpin of a second N protein to form extensive main chain contacts that form the core of the dimeric interface (Figure 2G) [18]. Furthermore, biophysical evidence is emerging that the dimerisation domain C-terminal tail directs further ribonuclear oligomerisation [17]. Thus, the N protein structures have revealed potential sites for drug targeting, namely the nucleotide-binding sites within the RNA-binding domain which are already being utilised for in silico drug screening, as well as broad drug repurposing studies to inhibit N protein function [19]. In addition, the SARS-CoV-2 N protein is highly immunogenic and elicits a potent, and potentially protective response from B and T cells [20–22]. As such N protein represents a promising target for vaccination strategies and its importance in viral genome packaging means it is also of interest for therapeutic development [23].
In addition to the structural proteins that make up its virion, the SARS-CoV-2 genome encodes 16 non-structural proteins (Nsps) that play fundamental roles in the formation of the viral replicase complex, the creation of the double-membrane vesicle and the modulation of host cellular pathways.
Transcription and translation
The SARS-CoV-2 RNA dependent RNA polymerase (RdRp) is a multi-protein complex that is central to the replication and transcription of the SARS-CoV-2 genome. As such the SARS-CoV-2 RdRp is a major target for antiviral development.
RNA dependent RNA polymerase, Nsp7, 8 and 12
Coronaviruses utilise a highly conserved RNA dependent RNA polymerase pathway for replication and transcription of the viral RNA genome. The SARS-CoV-2 RNA polymerase is a multimeric complex that encompasses the core catalytic subunit Nsp12, which has little activity until complexed with the Nsp7 and Nsp8 co-factors [24]. The recent structure of the SARS-CoV RdRp complex was closely followed by that of SARS-CoV-2, where both complexes included a single copy each of Nsp7 and Nsp12 and two copies of Nsp8 [25–27]. Nsp12 comprises a β-hairpin, an extended nidovirus RdRp-associated nucleotidyltransferase (NiRAN) domain, and an interface domain that links the NiRAN and RdRp domains [26]. The SARS-CoV-2 RdRp displayed the classical right-handed architecture, which includes finger, palm and thumb subdomains [26,27]. The catalytic core of the RdRp comprises seven conserved polymerase motifs, termed A to G, which reside in the palm domain and closely resemble those of the SARS-CoV RdRp [25]. Furthermore, as with other RdRps, the primer-template entry, nucleotide triphosphate (NTP) entry, and nascent strand exit channels are positively charged, solvent accessible and converge on the catalytic core [26,27].
Given its pivotal role in viral genome replication, the RdRp is a top antiviral target, with nucleotide analogues including Remdesivir, Favipiravir, Ribavirin and Galidesivir all in various stages of development [28,29]. Indeed, structures of the SARS-CoV-2 RdRp in various states have been determined, including (i) in complex with RNA, (ii) in complex with a modified base derived from the Remdesivir protide, (iii) in the pretranslocated state and (iv) in the posttranslocated state (Figure 3A,B) [26,27,30]. The template-Remdesivir triphosphate bound RdRp structure comprises a stoichiometric complex of Nsp7, 8 & 12 [27], whereas the pre- and posttranslocated states had dual Nsp8 occupancy, with the pretranslocated complex including a helical Nsp8 extension (Figure 3B) [27,30]. The template complex included a 14-base RNA in the template strand and an 11-base RNA in the primer strand incorporating the modified RNA base at the 3′ end, and required only modest subdomain movements [27]. The double-stranded helix formed by the union of the template and primer RNAs was bound within the finger, palm and thumb subdomains [27]. Although numerous Nsp12 contacts to the RNA helix were evident, many were coordinating contacts to the 2′-OH groups of the backbone and thus sequence-independent [27]. The Remdesivir derivative is incorporated at the 3′ end of the primer strand, forming base stacking interactions with an upstream primer base and hydrogen bonds with the template uridine base, with conserved coordination within the pretranslocated RdRp complex at residue positions K545 and R555 of motif F of the finger subdomain (Figure 3C) [27,30]. In addition, two magnesium ions and pyrophosphate were evident at the catalytic site in one structure and served to coordinate the phosphodiester backbone and block nucleotide triphosphate entry, respectively [27]. Thus, Remdesivir is implicated in RNA chain termination. Indeed, additional nucleotides of the template sequence were evident within the posttranslocated structure [27]. Targeting of the RdRp represents one of the major research avenues for SARS-CoV-2, with in vivo and clinical trials currently in progress.
Overview of the structures involved in viral RNA replication and posttranscriptional processing.
(A, B) Cryo-electron microscopy structures of the pre- and post-translocated RNA polymerase complex (PDB: 7BZF & 7C2K) encompassing Nsp7, Nsp8 and Nsp12, coloured aqua, pale green and pale yellow, with the NiRAN domain, ‘fingers’, ‘palm’, ‘thumb’ and RNA coloured pink, raspberry, forest green, wheat and light grey, respectively. (C) Coordination and incorporation of Remdesivir-monophosphate (Remdesivir-RMP) in the pretranslocated complex (yellow sticks). (D) Nsp13 helicase bound RdRp structure with the Nsp13 chains shown with the zinc-binding domain (ZBD), 1B domain and RecA-like helicase (RECA) domains 1 & 2 coloured red, blue, salmon and purple, respectively (PDB: 6XEZ). (E) The Nsp9 homodimer, coloured pale blue and pale green, highlighting the cavities about the GxxxG helical dimer interface (wheat) and the peptide bound within this cavity (red) (PDB: 6WC1). (F) The Nsp10–16 complex, shown as a semi-transparent surface over a cartoon representation in pale blue and green, respectively, with the bound S-adenosyl-L-methionine (SAM), RNA cap and adenosine shown as blue, yellow and white sticks, respectively. (G) Electrostatic surface representation of Nsp16 bound to the RNA cap, in comparison with (H) the binding modes of SAM (blue sticks) and the pan-inhibitor sinefungin (pink sticks) (PDB: 6WKS).
Helicase, Nsp13
Forming an adjunct of the RdRp complex, Nsp13 functions as a helicase that utilises NTP hydrolysis to unwind double-stranded RNA into two single-stranded chains and to translocate along single-stranded RNA [31,32]. As such, Nsp13 precedes the replicative machinery to enable genome replication and transcription of the 8 coding mRNAs. Nsp13 is highly conserved amongst nidoviruses (differing from SARS-CoV by only 1 amino acid substitution), which, together with its fundamental role in viral replication, makes it an attractive target for broad-spectrum therapeutics against this viral family [33].
The apo-structures of the MERS-CoV and SARS-CoV Nsp13s are highly similar to Nsp13 from the SARS-CoV-2 holo-complex and provided the first look at the unique domain architecture of the nidoviral helicases [34–36]. Comprising 2 tandem C-terminal RecA-like helicase domains (RecA1 and RecA2) and bridging stalk and 1B domains, Nsp13 also has a unique N-terminal zinc-binding domain containing 3 zinc-finger motifs (Figure 3D) [34,35]. The RecA-like domains catalyse the unwinding of the double-stranded RNA and translocation of the complex via NTP hydrolysis, with the single-stranded RNA passing at the junction of the tandem RecA-like domains and the 1B domain, with the 3′ end towards the RecA1 domain and the 5′ end towards the RecA2 domain [34]. The stalk domain serves as a link between the RecA-like/1B domains and the zinc-binding domain, which in turn serves as an interface with other components of the replicative machinery (Figure 3D).
The structure of Nsp13 in complex with the holo-RdRp/RNA assembly was recently determined by cryo-electron microscopy and revealed the assembly with the multi-subunit RdRp machine and the interactions between Nsp13 and the Nsp7, Nsp8 and Nsp12 components (Figure 3D) [35]. The RdRp complex comprised 2 copies of Nsp13 (Nsp13-1 and Nsp13-2). The zinc-binding domains of both Nsp13 copies interact with the Nsp8 N-terminal extension, whereas the other contacts are asymmetric between Nsp13-1 and Nsp13-2. Specifically, the Nsp13-1 zinc-binding domain bound the thumb domain of Nsp12 whilst the Nsp13-1 RecA1 domain bound Nsp7 and the head of Nsp8 (Figure 3D). In contrast, Nsp13-2 makes no additional contacts beyond the Nsp8 N-terminal domain and Nsp13-1 within the holo-RdRp complex (Figure 3D) [35]. Accordingly, the architecture of Nsp13 within the larger assembly provides insight into the role of the protein in RNA replication, suggests a potential mechanism for template switching, and delineates targets for antiviral therapeutic discovery [35].
As proof-of-concept, small molecule drugs that inhibit viral helicases have previously been developed for other viruses including herpes simplex virus and hepatitis C [37,38]. As such, identifying small-molecule Nsp13 inhibitors provides a pathway for therapeutic intervention. Target areas include: (i) blocking the interaction and assembly of Nsp13 with Nsp7, Nsp8 and Nsp12; (ii) inhibiting NTP hydrolysis with active-site binding analogues; (iii) inhibiting the double-stranded RNA unwinding activity of the helicase; (iv) inhibiting nucleic acid binding; and (v) sterically blocking the movement of the RecA-like domains relative to each other [39]. Of these approaches, RNA aptamers containing adenosine/guanine-rich sequences of ∼15 nucleotides have been shown to inhibit the unwinding activity of Nsp13 [40]. Similarly, drugs that target the NTPase active site have shown efficacy in vitro [41,42].
Single-stranded RNA binder, Nsp9
In addition to the central role of RdRp, Nsp9 is thought to play a similarly essential role in viral RNA replication as a single-stranded RNA binder in SARS-CoV [43]. This small single-domain protein is described as a replicase and is thought to associate with the host nuclear pore [44], but its precise role in viral replication remains to be fully elucidated. Nsp9 adopts a coronaviral-unique fold consisting of a 6-stranded β-barrel that forms obligate dimers through a C-terminal α-helical element (Figure 3E) [45]. The Nsp9 dimerisation interface is essential for viral replication and is created through a mini coiled-coil interaction of the C-terminal helices overlaid by an N-terminal strand-like structure exchanged across the interface [45]. A conserved GxxxG interaction motif within the α-helix facilitates a mini coiled-coil dimerisation interaction and may also provide an extraneous peptide-binding site within this domain (Figure 3E) [45]. In other coronaviruses, Nsp9 analogues have been reported to bind long viral RNAs with weak affinity but the mode of binding is yet to be determined; several positively charged loops protruding from the β-barrel domain offer theoretical possibilities [46,47]. The RNA-binding activity of alphacoronaviral Nsp9 proteins is thought to depend upon a redox-induced parallel to antiparallel oligomerisation switch, however it is not clear if a similar mechanism is conserved within the betacoronaviruses [48]. The function of Nsp9 within the viral life-cycle may need to be better characterised before small molecule inhibitors can be effectively designed to target its activity, although it is noteworthy that it is one of the more conserved coronaviral proteins; it has been hypothesised that compounds that disrupt the Nsp9 dimer interface may have some potential as antivirals.
RNA CAP methyltransferase, Nsp10 and Nsp16
Capping of the viral RNA genome is critical for enhancing its stability inside the host cell while also serving to mediate polyprotein translation [49]. For SARS-CoV-2, RNA capping is achieved through the sequential action of a number of the non-structural proteins. First, Nsp13 (the helicase described above) and an as-yet-unidentified guanylyltransferase generate the guanylate cap on the 5′ end of the nascent mRNA. Nsp14 (a guanine N7-methyltransferase) then methylates the cap guanylate to yield Cap-0 (me7GopppA1), and finally Nsp16 (a 2′-O-methyltransferase) methylates the ribose 2′-O position of the first nucleotide to generate the completed RNA Cap-1 (me7GopppA1m) [49–51].
Nsp14 and Nsp16 each form a complex with Nsp10 and use S-adenosyl-L-methionine as the methyl donor to methylate the RNA cap [52]. The Nsp10/Nsp16 interaction is an attractive therapeutic target, with Nsp10-derived peptide inhibitors specifically inhibiting 2′-O-methyltransferase activity and impairing viral replication in SARS-CoV replicons in culture [53,54]. To aid inhibitor design, the X-ray crystal structure of the obligate Nsp10–16 complex has been solved both with and without the RNA cap substrate (me7GopppA1), with the overall complex displaying a high degree of similarity with the SARS-CoV Nsp10/16 complex (∼0.3 Å across 322–331 Cα atoms) (Figure 3F) [55–58]. As for SARS-CoV, Nsp16 is composed of twelve β-strands, seven α-helices and five 3₁₀ helices, whereas Nsp10 possesses a central antiparallel pair of β-strands and a helical domain with two zinc fingers. The Nsp10 zinc-coordinating residues are stringently conserved across betacoronaviruses, highlighting the requirement for zinc coordination. Along with the Nsp10/Nsp16 interface, the X-ray crystal structures highlight the S-adenosyl-L-methionine and RNA cap substrate binding pockets as promising sites for the development of antiviral therapeutics. The positively charged RNA cap-binding cavity envelops the me7GopppA1, whereas the adjacent, highly conserved S-adenosyl-L-methionine donor binding site is formed by a deep negatively charged groove, exploited by the non-specific pan-methyltransferase inhibitor sinefungin (Figure 3G,H) [57]. Nsp16 also features a novel binding pocket ∼25 Å away from the catalytic pocket, with various ligands crystallised at this site (including adenosine), although the function of this site requires further investigation.
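The ∼0.3 Å figure quoted for the Nsp10/16 comparison is presumably a root-mean-square deviation (RMSD) over matched Cα atoms; the text does not name the metric, so this reading is an assumption. As a minimal sketch, the function below computes an RMSD for two equally long lists of Cα coordinates that are assumed to be already superposed and paired residue by residue. The function name `ca_rmsd` and the toy coordinates are hypothetical and are not taken from any deposited structure.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (in the units of the input, e.g. angstroms) between paired points.

    Assumes both coordinate sets are already superposed and listed in the
    same residue order; no fitting or alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: two atoms, each displaced by 0.3 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 0.3), (1.0, 0.0, 0.3)]
print(round(ca_rmsd(reference, shifted), 3))
```

In practice the superposition step matters as much as the distance calculation, and the range of 322–331 atoms quoted above suggests that the number of matched Cα atoms varied between the pairwise comparisons.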
Posttranslational processing
Following RNA replication and translation, the viral polyproteins 1a and 1b require posttranslational processing, achieved via two viral proteases, Nsp3 and Nsp5. The SARS-CoV-2 papain-like protease complex comprises a multimeric assembly of Nsp3, 4 & 6, the only non-structural proteins that contain transmembrane domains. Nsp3 is the largest protein (1945 amino acids) in the SARS-CoV-2 proteome and contains 16 domains [59]. Nsp3 has broad functions in coronaviruses, including single-stranded RNA and nucleocapsid binding, and is essential to SARS-CoV replication, pathogenesis, virulence and immunoevasion [59]. To date, two domains of SARS-CoV-2 Nsp3 have been described structurally, as detailed below.
Papain-like protease, Nsp3
Nsp3 contains a papain-like protease domain (PLpro) which cleaves polyprotein 1a(b) to release Nsp1, 2 & 3 in SARS-CoV-2 and related SARS-CoVs [60–62]. PLpro includes a deubiquitinase domain that modulates host proteins by reversing ISGylation to control host innate immunity. Its deubiquitylation activity acts directly or indirectly upon proteins central to type I interferon and NFκB inflammatory signalling [63–65], and this function is retained in SARS-CoV-2 [66]. PLpro hydrolyses target polypeptides via recognition of an LXGG motif, such as those between Nsp1-2, 2-3 & 3-4 (LNGG, LKGG & LKGG in SARS-CoV-2, respectively), whilst a similar sequence motif is found at the C-terminus of the host ubiquitin and ISG15 modifiers [60,67–70]. Structures of SARS-CoV-2 PLpro in complex with ubiquitin-propargylamide and ISG15 C-terminal domain-propargylamide have revealed how PLpro accommodates human ubiquitin and ISG15 on the PLpro ‘palm’ domain which, via either the ‘finger’ (for ubiquitin; Figure 4A) or ‘thumb’ (for ISG15; Figure 4B) domains, channels the ubiquitin/ISG15 C-terminal domains into the catalytic site for cleavage [71].
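The LXGG recognition motif described above lends itself to a simple sequence scan. The following is a minimal sketch that treats X as any single residue in one-letter code and reports where candidate motifs occur; the function name `find_lxgg_sites` and the toy sequence are hypothetical, and a motif match alone does not establish that a site is cleaved by PLpro.

```python
import re

# L, any one residue, then GG. The lookahead lets overlapping motifs be found.
LXGG_PATTERN = re.compile(r"(?=(L[A-Z]GG))")


def find_lxgg_sites(sequence):
    """Return (1-based position of the leucine, motif) for each LXGG match."""
    return [
        (match.start() + 1, match.group(1))
        for match in LXGG_PATTERN.finditer(sequence.upper())
    ]


# Toy sequence for illustration only (not a real polyprotein fragment).
toy_sequence = "MAELNGGAYTRSSLKGGAPTK"
print(find_lxgg_sites(toy_sequence))  # [(4, 'LNGG'), (14, 'LKGG')]
```

The same scan finds the LXGG-like C-terminal tails of ubiquitin and ISG15 when given those sequences, which is consistent with the dual viral and host substrate specificity described in the text.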
Structural overview of the SARS-CoV-2 proteins involved in posttranslational processing and host immune subversion.
Structural overview of the Nsp3 PL-pro domain in complex with ubiquitin-PA (PDB: 6XAA) (A) and ISG15-PA (PDB: 6XA9) (B). In both, Nsp3 PL-pro is coloured pale green and ubiquitin-PA or ISG15-PA pale blue. (C), Expanded view of the coordination of the VIR250 and VIR251 inhibitor bound Nsp3 PL-pro structures shown as yellow sticks (PDB: 6WUU & 6WX4). (D) Structures of Macrodomain X (MacroX) domain of Nsp3, coloured pale blue; in apo form (PDB: 6WEY) and bound to AMP (PDB: 6W02) or ADP-ribose (6W6Y), shown as yellow sticks. (E) Expanded view of AMP and ADP-ribose binding to Nsp3 MacroX. (F) Overview of the Nsp5 monomer, dimer and octamer with domains I, II & III coloured pale blue, pale green and wheat, respectively with domain III of the octameric neighbour shown in rust (PDB: 6LU7 and 3IWM). Expanded view of the peptide inhibitor bound Nsp5 (PDB: 2Q6G) (G) hexapeptidyl chloromethyl ketone (CMK) inhibitor (PDB: 1UK4) (H) and α-ketoamide inhibitor with a modified P1-γ-lactam bound (PDB: 6Y2G) (I) (drugs shown as yellow sticks).
Nsp3 is a major antiviral target, with inhibitors of this multi-faceted proteolytic activity including disulfiram, mycophenolic acid, thiopurine analogues (6-mercaptopurine and 6-thioguanine) [72,73], pyrimidine-derived ‘compound 6’ [74] and tanshinones derived from Salvia miltiorrhiza [75]. Tetrapeptide-derived inhibitors represent a further class of antivirals specifically targeting PLpro [60]; these incorporate unnatural amino acids that mimic the LXGG motif and a vinylmethyl ester group that acts as a reactive warhead towards the catalytic C111. Structures of the PLpro domain bound to VIR250 and VIR251 established a mechanistic basis whereby the vinylmethyl ester reacts to covalently crosslink VIR250/VIR251 to C111 via a thioether bond, effectively blocking the substrate-binding site and catalytic triad (C111, H272 & D286 in SARS-CoV-2) of PLpro [60] (Figure 4C). Such inhibitors have the potential to block this crucial multi-functional catalytic domain, inhibiting both viral replication and host cell evasion mechanisms of SARS-CoV-2.
The Nsp3 macrodomain X, also termed the ADP-ribose phosphatase domain, consists of an α/β/α sandwich fold which binds ADP-ribose [76]. In a process analogous to that of PLpro, viral macrodomains bind, and potentially reverse, host ADP-ribosylation, a post-translational modification used to modulate protein fate, signalling and DNA repair [77]. Whilst macrodomain X is not essential, its mutation in SARS-CoV Nsp3 attenuated viral loads and led to enhanced interferon-α and -β production, interferon-stimulated gene activation and pro-inflammatory IL-6 and TNF signals in mice, implicating macrodomain X in dampening host innate immunity [76]. Binding of ADP-ribose and other related adenosine derivatives is conserved in SARS-CoV-2 Nsp3-macrodomain X [78,79], with structures revealing high structural homology to the SARS-CoV Nsp3-macrodomain X (Figure 4D) [78,80,81]. Further, ADP-ribose and AMP bound Nsp3-macrodomain X structures revealed how a hydrophobic substrate-binding pocket (comprising I23, V49, P125, V155 & F156) anchored adenine moieties whilst a conserved acidic D22 formed a high-enthalpy interaction with the adenine amine group (Figure 4E) [78]. Few interactions anchored the first ribose ring, whereas a series of water-mediated interactions coordinated the ADP-ribose diphosphate, with the second ribose ring sequestered within a second hydrophobic cavity (F131 & I132) and anchored by hydrogen bonding to N40. Such binding induced conformational changes around the second ribose ring relative to the apo-Nsp3-macrodomain X. Overall, SARS-CoV-2 Nsp3-macrodomain X shows high conservation of the key residues important to ADP-ribose binding. At present, there are no documented therapies targeting the Nsp3-macrodomain X of SARS-CoVs, likely due to the broad functionality and phylogenetic conservation between viral and human macrodomain proteins.
3C-like main protease, Nsp5
In concert with Nsp3 processing, the two SARS-CoV-2 polyproteins, 1a and 1ab, require further proteolytic cleavage to release additional functional non-structural proteins. Nsp5, also referred to as 3C-like main protease (3CLpro) and main protease (Mpro), is a pro-enzyme that auto-processes and releases Nsp4 to Nsp16 [82,83]. Nsp5 is critical to virus replication and its specificity does not overlap with that of mammalian proteases; hence, the protein is a central antiviral drug target. The structure of Nsp5 comprises three domains: domains I and II, which form a β-barrel chymotrypsin-like fold that catalyses the proteolytic function of the protein, and domain III, which mediates oligomerisation (Figure 4F) [84]. Although predominantly observed as a dimer in crystal structures to date, Nsp5 from SARS-CoV has also been observed as a super-catalytically active octamer mediated through domain swapping of domain III (Figure 4F) [85].
The model proposed for Nsp5 auto-processing is based on that from SARS-CoV and suggests that Nsp5 monomers first form an ‘intermediate dimer’ via their domain III, which enables their N-termini to fill the binding pocket of the adjacent monomer to form the S1 pocket of the active site and allow processing of the N-terminus. This ‘mature dimer’ is then processed to remove the C-terminus by another mature dimer in trans [86]. Consequently, the monomer appears to exist in both catalytically active and inactive forms; dimerisation and auto-processing of the N-termini form a catalytically competent dimer in which the N-terminus of the adjacent monomer completes the active site. Further processing of the C-termini yields a dimer that is 4-fold more active still. Higher-order oligomers, such as the octamer, may produce a super-active molecule that may be involved in auto-processing [82,85,86]. Thus, the oligomerisation and catalytic function of the protein are intertwined.
The active site of Nsp5 contains a catalytic dyad of H41 (domain I) and C145 (domain II). Peptide substrates with the motif A/S/T-X-L-Q-A/S (from P4 to P1′) are bound within the active site via a conserved Y161–M162–H163 pocket that stabilises the P1-Q (Figure 4G) [87]. Peptides are subsequently hydrolysed via nucleophilic attack on the carbonyl carbon of the P1-P1′ (Q-A/S) bond by the C145 Sγ nucleophile, leading to the formation of a covalent thioester enzyme-substrate complex and the release of the C-terminal portion of the peptide. The catalytic cysteine is regenerated when H41 activates a water molecule to a nucleophilic OH−, which attacks the carbonyl carbon of the thioester, thus releasing the N-terminal portion of the peptide [87–89] (Figure 4G). The active site topology of Nsp5 is similar to that of other viral chymotrypsin-like proteases, including the 3C proteases of picornaviruses as well as the more divergent flaviviruses and enteroviruses. This led to the clinical testing of FDA-approved protease inhibitors as ‘rapid-response’ therapeutics, including lopinavir/ritonavir, developed against the human immunodeficiency virus protease, which appears ineffective against SARS-CoV-2 [90], and danoprevir, developed against the hepatitis C virus protease [91], which shows therapeutic promise [92]. The promise such trials hold has encouraged the search for more potent Nsp5-specific inhibitors.
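The substrate motif described above can be illustrated with a short sequence scan. The following is a minimal sketch only: it treats the A/S/T-X-L-Q-A/S consensus as a strict pattern, whereas real recognition by Nsp5 also depends on structural context. The function names and the input sequence are hypothetical and chosen purely for illustration.

```python
import re

# Consensus from the text, P4 to P1': A/S/T - X - L - Q - A/S.
# A lookahead is used so that overlapping candidate sites are all reported.
NSP5_MOTIF = re.compile(r"(?=([AST][A-Z]LQ[AS]))")


def find_cleavage_sites(seq):
    """Return (cut_index, motif) pairs for each candidate site.

    cut_index is the 0-based index of the P1' residue, i.e. the scissile
    bond lies between seq[cut_index - 1] (P1-Q) and seq[cut_index] (P1').
    """
    sites = []
    for match in NSP5_MOTIF.finditer(seq.upper()):
        # The motif spans P4..P1', so P1' sits four residues after P4.
        sites.append((match.start() + 4, match.group(1)))
    return sites


def cleave(seq):
    """Split a sequence at every candidate site (idealised, complete digestion)."""
    fragments, previous = [], 0
    for cut, _motif in find_cleavage_sites(seq):
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


if __name__ == "__main__":
    toy = "MKTAVLQSGFRKSTLQAPW"  # hypothetical toy sequence, not a viral polyprotein
    print(find_cleavage_sites(toy))
    print(cleave(toy))
```

A pattern match of this kind only flags candidate sites; it says nothing about cleavage efficiency or accessibility of the site in the folded polyprotein.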
Of the drugs and fragments directed towards the two linked actions of the protein, proteolysis and oligomerisation, ∼85 that target the active site and ∼3 (fragments) that target the dimerisation domain are currently annotated in the Protein Data Bank [93]. Development of molecules that inhibit the dimerisation of Nsp5 remains in its infancy. Pioneered by Hilgenfeld et al. (reviewed in [94]), structural approaches to inhibit Nsp5 were initially based on porcine transmissible gastroenteritis virus Mpro and SARS-CoV Nsp5 in complex with hexapeptidyl chloromethyl ketone (CMK) inhibitors [95,96] (Figure 4H). The irreversible covalent attachment of reactive groups like CMK on P1 to the catalytic C145 has defined much of the development of further peptide analogues. These include vinylogous alkyl ester derivatives [96] and α,β-epoxyketones [97] (reviewed in [94]), though fragment-based approaches have identified a number of other potential reactive groups for further investigation [93]. Nevertheless, the most successful iteration to date is the class of α-ketoamide inhibitors with a modified P1-γ-lactam group, which show significant therapeutic promise (inhibition of replication of SARS-CoV replicon, EC50 = ∼1.75 µM) [84,98,99] (Figure 4I). Though no drug targeting Nsp5 has yet been approved for therapeutic use, the speed with which drugs are being developed is a testament to the early work on picornavirus 3C proteases and the SARS-CoV and MERS-CoV homologues, and provides hope for a broad-spectrum small-molecule solution to current and emerging SARS threats.
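To give a sense of what an EC50 of ∼1.75 µM implies, the sketch below evaluates a simple Hill-type dose-response curve. This is an illustration under stated assumptions, not the analysis used in the cited studies: the Hill coefficient of 1, the assumption of complete inhibition at saturating concentrations, and the function name are all assumptions introduced here.

```python
def fractional_inhibition(conc_um, ec50_um, hill=1.0):
    """Fraction of the maximal effect at a given concentration (Hill equation).

    conc_um and ec50_um are in micromolar; hill=1.0 is an assumed slope.
    """
    if conc_um < 0 or ec50_um <= 0:
        raise ValueError("concentration must be >= 0 and EC50 must be > 0")
    if conc_um == 0:
        return 0.0
    return conc_um**hill / (conc_um**hill + ec50_um**hill)


if __name__ == "__main__":
    EC50_UM = 1.75  # approximate value quoted in the text
    for conc in (0.175, 1.75, 17.5):
        effect = fractional_inhibition(conc, EC50_UM)
        print(f"{conc:g} uM -> {effect:.1%} of maximal inhibition")
```

By definition, the effect is half-maximal when the concentration equals the EC50; under the assumed slope, a tenfold higher concentration gives roughly 91% of the maximal effect.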
Host mediation and immune suppression
In addition to their roles in viral entry and replication, numerous SARS-CoV-2 proteins play essential roles in the suppression of host immune responses, thereby facilitating viral progression and spread.
Translation inhibition, Nsp1
Nsp1 is a highly conserved protein that impairs host translation by two distinct mechanisms: first, by binding the 40S ribosomal subunit to stall mRNA translation and, second, through endonucleolytic cleavage and subsequent degradation of host mRNA [100]. A C-terminal motif of Nsp1 in SARS-CoV is conserved across betacoronaviruses and implicated in 40S binding, with mutations ablating binding and allowing translation. Thus, one impact of Nsp1 is to disrupt host-cell protein production, including the generation of innate antiviral immune responses [100]. The structure of Nsp1 bound to the human 40S ribosomal subunit was solved via cryo-electron microscopy, resulting in nine compositionally heterogeneous 40S and 80S ribosomal complexes (Figure 5A) [101]. In each structure, two C-terminal α-helices were found to bind inside the ribosomal mRNA entry channel between rRNA helix h16 and the uS3 and uS10 ribosomal proteins [102].
Structural overview of the SARS-CoV-2 proteins involved in host-cell mediation.
(A) Nsp1 in complex with the human 40S ribosome, shown pink and pale yellow, respectively. The N-terminal domain was evident in the reconstruction, whereas only the C-terminal helices could be built (PDB: 6ZLW). (B) Molecular contacts of Nsp1 with uS3, uS5 and h18 of the 40S ribosome, coloured pink, aqua, pale green, pale grey and pale yellow, respectively. (C) Superposition of the active 40S ribosome showing the way Nsp1 plugs the mRNA path (PDB: 6HCJ). (D) Crystal structure of the Nsp15–RNA complex with the N-terminal, middle and C-terminal catalytic domains coloured wheat, pale yellow and grey, respectively, with (E) an expanded view of the RNA bound within the catalytic domain.
The Nsp1 C-terminus binding site was proximal to the ‘latch’ between ribosomal RNA helices h18 and h34 of the body and head, respectively, and would block mRNA binding (Figure 5A). The first helix interacted with uS3 and uS5, with a subsequent KH motif-containing loop known to interact with an h18 loop that is implicated in ribosomal decoding [102]. The second, and larger, α-helix of Nsp1 also interacted with the h18 ribosomal RNA and contacted uS5 via C-terminal interactions. Cooperatively, the two helices stabilised each other and formed a charge and shape complementary to the mRNA channel, closely mimicking the mRNA path [102]. More specifically, K164 of the Nsp1 KH motif bound to a negatively charged ribosomal RNA pocket within h18, formed mainly by the phosphate backbone of bases G625 and U630, whereas H165 stacked between U607 and U630 (Figure 5B) [102]. Indeed, Nsp1 was recently shown to directly inhibit 40S translation of both native and viral mRNA; the 5′ untranslated region of SARS-CoV-2 seems optimised for translation under these conditions, although the mechanism remains unknown [103]. The Nsp1-mediated disruption to translation was thought to result from steric occlusion, with superposition of an ‘active’ ribosome [104] showing complete occlusion of the mRNA entry channel (Figure 5C). Taken together, these specific molecular contacts rigidly anchor Nsp1 in place and thereby obstruct the mRNA entry channel, inhibiting ribosomal function. The implications of this structure are yet to be fully realised, although the ability of SARS-CoV-2 to hinder host immune responses by impeding translation is likely an intriguing avenue for therapeutic intervention.
Uridine-specific endonuclease, Nsp15
Viruses within the Nidovirales order, including coronaviruses, all contain a cluster of conserved enzymes as part of their ORF1a/b polyproteins. The nidoviral RNA uridylate-specific endonuclease (NendoU) is one such protein, whose activity degrades polyuridine RNA extensions [105]. EndoU proteins are a common class of enzymes that are also found in humans [106]; they generally utilise a catalytic triad of two histidines and a lysine to cleave RNA via a mechanism that produces a terminal 2′–3′ cyclic phosphodiester.
The structure of the SARS-CoV-2 NendoU protein Nsp15 has been elucidated in its apo- and product-bound states [107,108]. Although the catalytic activity of Nsp15 is thought to be manganese ion-dependent, the location of an ion binding site is as yet unknown. In its different states, Nsp15 adopts a relatively static L-shape (Figure 5D): the N-terminal and middle domains facilitate higher-order hexamerisation, while the C-terminal domain contains the catalytic residues (H235, H250, K290) at the base of a V-shaped cleft formed by two β-sheets. Substrate RNA associates via aromatic-stacking interactions within the cleft, with backbone interactions and a hydrogen bond to S294 providing uridine cleavage specificity (Figure 5E). The hexamer is thought to be the functionally active assembly of Nsp15, with three active sites available at both the top and the bottom of the head-to-head arrangement of a dimer of trimers [108].
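The uridine specificity described above can be pictured with a toy digestion of an RNA string. This is a deliberately idealised sketch: it assumes cleavage immediately 3′ of every uridine and complete digestion, ignores structural context and metal dependence, and uses a hypothetical function name and input sequence.

```python
def cleave_after_uridines(rna):
    """Split an RNA string immediately 3' of every U (assumed cleavage position).

    In this idealised picture, each fragment ending in U corresponds to a
    product carrying the terminal 2'-3' cyclic phosphodiester.
    """
    rna = rna.upper()
    unexpected = set(rna) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected characters in RNA: {sorted(unexpected)}")

    fragments, start = [], 0
    for index, base in enumerate(rna):
        if base == "U":
            fragments.append(rna[start:index + 1])
            start = index + 1
    if start < len(rna):
        # Trailing fragment without a 3'-terminal uridine.
        fragments.append(rna[start:])
    return fragments


if __name__ == "__main__":
    toy_rna = "GAUCCUUAG"  # hypothetical toy sequence
    print(cleave_after_uridines(toy_rna))
```

A polyuridine stretch is reduced to single-nucleotide fragments in this model, which is one way to picture how such an activity could degrade polyuridine extensions.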
It has previously been hypothesised that viral NendoU proteins may be potential therapeutic targets. The repurposing of the uracil derivative tipiracil has been investigated as a potential scaffold for drug development, as it has been shown to bind within the SARS-CoV-2 Nsp15 active site and inhibit its enzymatic activity [107].
Outlook
Currently, no effective antiviral treatment for SARS-CoV-2 is available. The traditional time frame for drug development is ∼15 years from bench to market. Thus, many of the current SARS-CoV-2 advances are built on the foundations of pioneering work from the previous 20 years on SARS-CoV and MERS-CoV. In addition, the structural repertoire of SARS-CoV-2 has expanded rapidly since the start of the year, with >300 structures solved to date. Despite the advancements reviewed here, gaps in our knowledge remain. These include information on the multimeric replication and budding complexes, and limited insight into SARS-CoV-2-host protein interactions. Also, individual structures of the membrane and envelope proteins and of Nsp2, Nsp4, Nsp6, Nsp11 and Nsp14 are yet to be determined, although some homologous structures have been determined for SARS-CoV. A number of the enigmatic ORF proteins also remain to be fully understood. As already illustrated within this review, the ‘resolution revolution’ in cryo-electron microscopy will likely aid in the characterisation of some of these structures.
Nonetheless, the structures identified to date have already informed therapeutic development. The Spike and nucleocapsid proteins have been thoroughly studied as antigenic targets for vaccination trials and antibody-based therapeutics. Some of the most impactful SARS-CoV-2 structures solved to date include the early spike structures [2,3], which have paved the way for studies into optimised vaccination variants of the spike [109]. Furthermore, the antiviral development of the Nsp proteins has produced a swathe of ligand-bound structures, with an emphasis on the Nsp3 and Nsp5 proteases. Structural biology has also greatly aided our understanding of nucleotide-derived antivirals such as Remdesivir [26,27,30]; indeed, targeting of the RNA replication machinery will only increase in scope as our understanding of this pathway becomes more complete [35]. Thus, structural biology has already validated immunogenic epitopes destined for vaccinations and informed drug development. Further structural endeavours will no doubt build on the data summarised here and inform the next steps in therapeutic development.
Perspectives
Structural biology has proven to be an invaluable technique in understanding SARS-CoV-2. Such insight has already aided the generation of therapies to ameliorate COVID19.
Of the 20 main SARS-CoV-2 proteins, 13 have been structurally resolved, with >300 structures determined thus far. The remarkable contribution of structural biology has aided the rapid understanding of this virus. The functions of many of these proteins have been revealed in a new light, aiding structure-based drug development.
Determination of the remaining SARS-CoV-2 structures that have evaded characterisation and the multimeric structures of the replicase complex could revolutionise our understanding of the virus and inform therapeutic development. In addition, a concerted effort to understand the intricacies of viral-host interactions including the SARS-CoV-2 accessory proteins would be of great interest.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
Funding for the work originated from the Australian Research Council Centre of Excellence for Advanced Molecular Imaging.
Open Access
Open access for this article was enabled by the participation of Monash University in an all-inclusive Read & Publish pilot with Portland Press and the Biochemical Society under a transformative agreement with CAUL.
Author Contributions
D.R.L., B.J.M., G.M.W., J.P.V. & B.S.G. reviewed the literature, wrote the manuscript and created the figures.
Acknowledgements
We thank the Biochemical Society editors for the invitation to contribute.
Abbreviations
- ACE2
angiotensin-converting enzyme 2 receptor
- ADP
adenosine diphosphate
- AMP
adenosine monophosphate
- COVID19
coronavirus disease 2019
- C-terminus
carboxy terminus
- DNA
deoxyribonucleic acid
- MERS-CoV
Middle East respiratory syndrome coronavirus
- mRNA
messenger ribonucleic acid
- N protein
nucleocapsid protein
- NendoU
Nidoviral RNA uridylate-specific endonuclease
- NiRAN
nidovirus RdRp-associated nucleotidyltransferase domain
- Nsp
non-structural protein
- N-terminus
amino terminus
- NTP
nucleotide triphosphate
- ORF
open reading frame
- PDB
Protein Data Bank
- PLpro
papain-like protease domain
- RBD
receptor-binding domain
- RdRp
RNA-dependent RNA polymerase
- RNA
ribonucleic acid
- SARS-CoV
severe acute respiratory syndrome coronavirus