N-terminal phosphorylation regulates the activity of glycogen synthase kinase 3 from Plasmodium falciparum

As the decline of malaria cases stalled over the last five years, novel targets in Plasmodium falciparum are necessary for the development of new drugs. Glycogen Synthase Kinase (PfGSK3) has been identified as a potential target, since its selective inhibitors were shown to disrupt the parasitès life cycle. In the uncanonical N-terminal region of the parasite enzyme, we identified several autophosphorylation sites and probed their role in activity regulation of PfGSK3. By combining molecular modeling with experimental small-angle X-ray scattering data, we show that increased PfGSK3 activity is promoted by conformational changes in the PfGSK3 N-terminus, triggered by N-terminal phosphorylation. Our work provides novel insights into the structure and regulation of the malarial PfGSK3.


Introduction
Plasmodium falciparum is an intracellular eukaryotic parasite that causes the most severe form of malaria in humans, infecting over 200 million people every year [1]. The emergence of malaria parasites resistant to all currently used antimalarial drugs imposes a serious threat to public health [2]. Identification of new drug targets and drugs is therefore crucial for future handling of the malaria pandemics. Kinases comprise up to 10% of all drug targets [3] and a number of potent inhibitors were discovered that are directed against P. falciparum kinases, such as glycogen synthase kinase 3 (PfGSK3) [4]. Human GSK3ß plays a crucial role in a number of signaling processes, regulating glycogen metabolism [5][6][7], cell cycle and growth [8][9][10], translation [7], embryonic development [9,11] or differentiation of neurons [12][13][14][15][16]. Consequently, the inhibitors of GSK3 are investigated as potential treatment options for neurodegenerative and psychiatric diseases [17][18][19][20][21][22][23] and its structure, catalytic mechanism and regulation are well studied [24][25][26]. GSK3β is thought to be constitutively activated by autophosphorylation at Y216 [27][28][29] and inactivated by phosphorylation of N-terminal S9 mediated by several protein kinases, such as PKA and PKB [6,30]. The inactivation is achieved as the phosphorylated S9 enables binding of GSK3 N-terminus to a pocket for a priming phosphorylation in the substrate-binding region and thus reduces the binding affinity for substrates [24,25].
In P. falciparum, two genes encode GSK3 (PF3D7_0312400 and PF3D7_1316000) and these enzymes are much less understood in their physiological role and in the molecular mechanism. GSK3ß (PF3D7_0312400, further as PfGSK3) is the only isoform that is expressed in blood stages [31] of the parasite and is involved in the regulation of red blood cell invasion [32]. This essential process is parasite-driven and relies on the orchestrated interplay of multiple protein-protein interactions [33,34]. PfGSK3 is involved in activation of apical membrane antigen 1 (AMA1) that subsequently mediates the formation of the so called 'tight junction' with the host cell prior to the invasion [32,[35][36][37][38][39][40]. In this two-step phosphorylation event, protein kinase A (PfPKA) first phosphorylates S610 of AMA1 [41][42][43], which in turn enables subsequent phosphorylation of AMA1 residue T613 by PfGSK3 [32]. Thus, PfGSK3 serves as a secondary kinase, although both phosphorylation sites are equally important for AMA1 function and P. falciparum invasion [32,36,40].
PfGSK3 inhibitors, built on a thieno [2,3-b]pyridine or benzofuran scaffold, have been developed that display selective antiplasmodial activity with IC 50 values in the low micromolar to sub-micromolar range [44][45][46] and one of these inhibitors has been shown to abrogate red blood cell invasion [32]. Further improvements of these inhibitors could be guided by a three-dimensional structure of PfGSK3 that has, however, not been determined so far. Based on sequence homology across GSK3 proteins (Supplementary Figure S1A), PfGSK3 consists of two conserved structural domains: a β-sheet-rich lobe with an ATP binding pocket and an α-helical lobe with the activation loop and a substrate binding site. These domains are typically preceded by an N-terminal domain that is predicted to be disordered [25]. In PfGSK3, the important residues in the ATP binding pocket and in the activation loop remain conserved, however, its N-terminus displays an unique extension and its function remains unknown [47].
To analyze this in more detail, we performed thorough structural and functional characterization of PfGSK3 in vitro. We provide evidence that PfGSK3 exhibits autophosphorylation of the uncanonical N-terminal domain. We show that phosphorylation of the N-terminal residues induces conformational changes of the protein that lead to an increase in the enzymatic activity of PfGSK3. Furthermore, we reveal that binding of heavy metal ions such as zinc cations to PfGSK3 induce a reversible formation of large complexes that result in a complete loss of its enzymatic activity.

Cloning and mutagenesis
The vector with N-terminally His-tagged PfGSK3 was generated by PCR amplification of the GSK3 coding sequence from P. falciparum cDNA followed by Ligation Independent Cloning into HindIII/KpnI-cleaved plasmid pOPIN F [48] using the In-Fusion HD EcoDry Cloning Kit (Takara Clontech) according to the manufacturer's instructions. The mutants S226A, Y229A and S226A/Y229A were generated by overlap extension PCR amplification from the original vector and Ligation Independent Cloning as described above. The wildtype protein and the mutant K96A cloned in pET28a vector were ordered from GenScript. The N-terminally truncated constructs were cloned by amplifying the sequence from the original vector and subcloning into BsaI-cleaved plasmid pNIC28_Bsa4 by SLiCE cloning [49].

PfGSK3 expression and purification
E. coli C41(DE3) culture transformed with the PfGSK3 coding vector was grown in TB medium supplemented with 0.1 mg/ml ampicillin at 37°C, induced at OD 600nm = 0.7 with 0.5 mM IPTG and harvested after 4 h. The pellets were resuspended in 5 ml of lysis buffer (20 mM NaP pH 7.5, 300 mM NaCl, 15 mM imidazole, 5% glycerol, 0.5 mM TCEP, 1 mg/ml lysozyme, 5 U/ml DNase, 1 Roche protease inhibitor tablet/100 ml) per 1 g of wet weight culture. The suspension was passed three times through an emulsifier at 15 000 psi, centrifuged at 19 000g and the supernatant was loaded on a pre-equilibrated His-Trap column. The column was washed with lysis buffer supplemented with 40 mM imidazole and the protein was eluted at increasing imidazole concentrations. The protein containing fractions were concentrated (10 kDa cut-off concentrator) with the addition of 1 mg of 3C protease per 3 L of culture volume and further separated on a Superdex 200 HiLoad column (GE Healthcare) in 50 mM Tris pH 8.0, 20 mM NaCl, 0.5 mM TCEP. The protein containing fractions were concentrated again, loaded on ResourceQ anion exchange column (GE Healthcare) and eluted with increasing concentration of NaCl. Fractions representing different PfGSK3 species were pooled together, concentrated, dialyzed against the analysis buffer (typically 20 mM Tris pH 8.0, 100 mM NaCl, 0.5 mM TCEP) and flash-frozen in liquid nitrogen and stored at −80°C. All purification steps were performed at 4°C.

Expression and purification of insoluble PfGSK3 constructs
The insoluble PfGSK3 constructs (K96A, S226A/Y229A, ΔN64) were expressed as described for the wild-type PfGSK3. The pellets of harvested bacteria were resuspended in 5 ml of lysis buffer (20 mM NaP pH 7.5, 300 mM NaCl, 15 mM imidazole, 5% glycerol, 0.5 mM TCEP, 1 mg/ml lysozyme, 5 U/ml DNase, 1 Roche protease inhibitor tablet/100 ml) per 1 g of wet weight culture. The suspension was passed three times through an emulsifier at 15 000 psi, centrifuged at 19 000g and the pellet was dissolved in a denaturating buffer (20 mM NaP ( pH 7.5), 300 mM NaCl, 25 mM imidazole, 5% glycerol, 6 M urea, 0.5 mM TCEP) and incubated with 1 ml NiNTA bead slurry (Sigma) for 1 h, 4°C. The beads were subsequently washed twice with 10 ml of the same buffer and subsequently eluted with 5 ml of the same buffer containing extra 250 mM imidazole. The protein was concentrated (10 kDa c/o) and the sample for mass spectrometry was separated by SDS-PAGE gel.

Small-scale expression tests
An amount of 4 ml of TB supplemented with ampicillin was inoculated wit pre-cultured E. coli C41(DE3) that had been transformed with the tested plasmids. The cultures were grown at 37°C until the OD 600nm = 0.6 and further either induced with 0.5 mM IPTG and grown for 4 h at 37°C or 25°C, or induced with 0.1 mM IPTG and grown for 16 h at 18°C. OD 600 nm was measured and culture volume corresponding to OD 600 nm of 2.0/ml was harvested. The pellets were resuspended in 400 ml of lysis buffer (lysate fractions), then lysed by 10 min of vortexing with 100 ml glass beads and the cell debris were removed by centrifugation (soluble fractions). An amount of 5 ml of each fraction was mixed with 10 ml of loading dye and run on SDS-PAGE in duplicates. One gel of each set of samples was blotted on PVDF membrane (Bio-Rad). The membranes were blocked by 1% BSA in TBS-T, washed, incubated for 1 h with a HisProbe-HRP conjugated antibody (ThermoFisher), washed with TBS-T and the blot was developed using Super Signal West Pico Substrate (ThermoFisher) and Super Signal West Femto Substrate (ThermoFisher) in a 1 : 10 ratio.

Glutaraldehyde cross-linking
An amount of 1 mg/ml PfGSK3 in PBS and 0.5 mM TCEP was mixed with 8× molar excess of either EDTA or ZnCl 2 and 0.01% glutaraldehyde. The reaction was quenched with 330 mM Tris (final concentration) and SDS-PAGE loading dye and SDS-PAGE was run for 150 min at 150 V. The gels were stained with Direct Blue stain.

Thermal shift assay
The stability of PfGSK3 under different buffer conditions and additives was measured by nanoDSF (Prometheus NT.48, NanoTemper Technologies, GmbH). The protein concentration was adjusted to 1 mg/ml. 10 ml of samples were loaded in the glass capillaries and heated from 20°C to 90°C with a heating rate of 1°C/ min. The fluorescence signals with excitation wavelength of 280 nm and emission wavelengths of 330 and 350 nm were recorded and the melting temperature was calculated as maximum of the derivative of the fluorescence at 330 nm.

Circular dichroism
Circular dichroism was measured on a Chirascan CD spectrometer (Applied Photophysics). The protein concentration was adjusted to 1 mM by 2 mM NaP ( pH 7.5), 4 mM NaCl, 0.05 mM TCEP prior to the measurement. The circular dichroism spectrum was measured 10× between 185 nm and 260 nm with 1 nm step in 1 mm quartz cuvette and analyzed by the CDSSTR algorithm [50,51] using DichroWeb [52].

Analytical size exclusion chromatography
The analytical size exclusion chromatography was performed on an Agilent Bio-LC system using a homepacked Superose 5/150 column (25 ml sample). GSK3 was mixed with zinc chloride to achieve molar ratios ranging from 1 : 8 to 8 : 1 at fixed final GSK3 concentration of 1 mg/ml. PBS supplemented with 0.5 mM TCEP was used as a mobile phase and the resulting chromatograms were recorded using the as fluorescence signal with excitation wavelength of 280 nm and emission wavelength of 350 nm as readout.

Autophosphorylation IEX assay
An amount of 0.5 mg/ml of PfGSK3 was incubated at 37°C with 2 mM ATP or cAMP and 10 mM MgCl 2 overnight. The samples were subsequently dialyzed at 4°C overnight in 50 mM Tris pH 8.0, 20 mM NaCl, 0.5 mM TCEP. The samples were analyzed by analytical ion exchange chromatography using a Resource Q column (GE Healthcare) and the 1260 Infinity Bio-inert high-performance liquid chromatography system (Agilent Technologies) at 10°C. The system and column were equilibrated in 50 mM Tris pH 8.0, 20 mM NaCl, 0.5 mM TCEP. 100 ml of sample was injected and eluted with an increasing concentration of NaCl. The system was run at 0.2 ml/min ad the elution profile was analyzed by UV fluorescence detector with absorbance at 280 nm and emission at 350 nm.

Autophosphorylation gel assay
The protein concentration of separated IEX peak fractions was adjusted to 0.25 mg/ml and 20 ml of each fraction was mixed with 10 ml of 3× loading dye. An amount of 10 ml of samples were run on SDS-PAGE next to each other twice. One gel was stained with Direct Blue stain (C.B.S. Scientific) and the other gel was stained with Pro-Q Diamond stain (Thermofisher). The stained gels were imaged with Imager Gel Doc XR+ System (Bio-Rad) and the band intensity estimated with Image Lab software (Bio-Rad). The relative intensity of the bands was first estimated for each gel separately (relative to the band in lane 1) and the relative phosphorylation was calculated as the ratio between relative intensity of Direct Blue-stained bands and relative intensity of Pro-Q Diamond-stained bands.

Dynamic light scattering
Dynamic light scattering was measured 30× on Wyatt Dynapro Plate Reader II with 1 mg/ml GSK3 in 20 mM Tris ( pH 8.0), 100 mM NaCl, 0.5 mM TCEP. The molar mass was assessed from hydrodynamic radius (calculated from cumulants fit from autocorrelation functions averaged over 30 measurements), particle density and globular particle conformation model in DYNAMICS software (Wyatt Technology).

Small angle X-ray scattering
The SEC-SAXS data were collected at the P12 BioSAXS Beamline at the PETRA III storage ring (c/o DESY, Hamburg, Germany) [53]. The concentrated samples were dialyzed against 20 mM Tris pH 8.0, 100 mM NaCl, 0.5 mM TCEP overnight at 4°C and the concentration was adjusted to 10 mg/ml. The X-ray scattering data were measured in an on-line SEC-SAXS mode, using a SD200 Increase 5/150 column (GE Healthcare) at 0.4 ml/min with 1 frame recorded per second ( Table 1). The data were automatically processed using the SASFLOW pipeline [54] and further analyzed using the ATSAS suite [55] of programs to determine the molar mass, radius of gyration and real-space distance distribution. Scattering curves were computed from models using CRYSOL [56] with a constant subtraction and maximum order of harmonics set to 50. The rigid body modeling performed using CORAL [57], setting different parts of models yielding best CRYSOL fits as disordered. These were residues 1-65 and 403-440 in conformation A, resides 47-65 and 403-440 in conformation B and residues 403-440 in conformation C. The raw data were uploaded to SASBDB under accession numbers SASDL77 and SASDL87.

Kinase assay
To investigate enzymatic activity of recombinant PfGSK3 a commercial luminescence-based kinase assay (KinaseGlo Plus, Promega) was used as previously described [58]. Briefly, 20 ng PfGSK3, 12 mM GS-1 peptide substrate (residues 636-661 of the human glycogen synthase 1; sequence YRRAAVPPSPSLSRHSSPHQ( pS) EDEEE; pS = prephosphorylated serine, Promega) and 6 mM ATP (UltraPure, Promega) in the kinase reaction buffer (40 mM Tris-HCl pH 7.5; 20 mM MgCl 2 ; 0.1 mg/ml BSA) were used as standard reaction mix (final reaction volume 5 ml). The kinase reaction was performed for 30 min at 30°C followed by incubation with KinaseGlo reagent according to the manufacturer's instructions. The reaction was transferred to a solid white 384-well plate (NUNC, ThermoFisher) and the luminescence signal was measured in an EnVision Multilabel Plate Reader (PerkinElmer, Integration time 0.5 sec/well). To investigate reaction kinetics, kinase and substrate concentrations were varied between 0 and 100 ng or 0 and 15 mM, respectively. To investigate autophosphorylation, reactions were performed with kinase concentrations between 0 and 500 ng, 30 mM ATP and without a substrate.

Sample preparation for mass spectrometry
To assess the phosphorylation status of the recombinant PfGSK3 ( purified protein, separated IEX fractions, insoluble constructs), each sample was separated on SDS-PAGE gel. Coomassie-stained bands were excised, cut into small pieces and transferred to 0.5 ml Eppendorf tubes. For all following steps, buffers were exchanged by two consecutive 15   The values for the best model for each respective dataset are shown.
resuspension buffer (Promega, V542A) was diluted with ice-cold 50 mM AmBiC buffer to achieve a final concentration of 1 ng/ml. 50 ml thereof were added to gel pieces, which were incubated first for 30 min on ice and then over night at 37°C. Gel pieces were sonicated for 15 min, spun down and the supernatant was transferred into a glass vial (VDS Optilab, 93908556). Remaining gel pieces were washed with 50 ml of an aqueous solution of 50% ACN and 1% formic acid and sonicated for 15 min. The combined supernatants were dried in a speedvac and reconstituted in 10 ml of an aqueous solution of 0.1% (v/v) formic acid.

Mass spectrometry data acquisition
The mass spectra of the initial phosphorylation site assessment of PfGSK3 (Supplementary Figure S4B) were acquired as follows: An UltiMate 3000 RSLC nano LC system (Dionex) fitted with a trapping cartridge (Precolumn C18 PepMap100, 5 mm, 300 mm i.d., 5 mm, 100 Å) and an analytical column (Acclaim PepMap 100. 75 × 50 cm C18, 3 mm, 100 Å) was used. Trapping was carried out with a constant flow of trapping solution (0.05% trifluoroacetic acid in water) at 30 ml/min onto the trapping column for 6 min. Subsequently, peptides were eluted via the analytical column running solvent A (0.1% formic acid in water) with a constant flow of 0.3 ml/min, with increasing percentage of solvent B (0.1% formic acid in acetonitrile) from 2% to 4% in 4 min, from 4% to 8% in 2 min, then 8% to 28% for a further 37 min, in another 9 min. from 28%-40%, and finally 40%-80% for 3 min followed by re-equilibration back to 2% B in 5 min. The outlet of the analytical column was coupled directly to an Orbitrap QExactive™ plus Mass Spectrometer (Thermo) using the Nanospray Flex™ ion source in positive ion mode. The QExactive plus was operated in positive ion mode with a spray voltage of 2.2 kV and capillary temperature of 275°C. Full scan MS spectra with a mass range of 350-1400 m/z were acquired in profile mode using a resolution of 70 000 (maximum injections time of 100 ms, AGC target was set to 3 × 10 6 ions). Precursors were isolated using the quadrupole with a window of 1.7 m/z and fragmentation was triggered by HCD in fixed collision energy mode with fixed collision energy of 28%. MS2 spectra were acquired with the Orbitrap with a resolution of 17.500 and a max injection time of 70 ms.
The mass spectra of all other experiments were recorded as follows: Peptides were analyzed by LC-MS/MS on an Orbitrap Fusion Lumos mass spectrometer (Thermo Scentific) as previously described [59]. To this end, peptides were separated using an Ultimate 3000 nano RSLC system (Dionex) equipped with a trapping cartridge (Precolumn C18 PepMap100, 5 mm, 300 mm i.d., 5 mm, 100 Å) and an analytical column (Acclaim PepMap 100. 75 × 50 cm C18, 3 mm, 100 Å) connected to a nanospray-Flex ion source. For the detection of posttranslational modified peptides, peptides were loaded onto the trap column at 30 ml per min using solvent A (0.1% formic acid) and peptides were eluted using a gradient from 2 to 80% Solvent B (0.1% formic acid in acetonitrile) over 60 min at 0.3 ml per min (all solvents were of LC-MS grade). The Orbitrap Fusion Lumos was operated in positive ion mode with a spray voltage of 2.4 kV and capillary temperature of 275°C. Full scan MS spectra with a mass range of 375-1200 m/z were acquired in profile mode using a resolution of 120 000 (maximum injections time of 50 ms, AGC target was set to Standard with a max injection time of 50 ms. Precursors were isolated using the quadrupole with a window of 1.2 m/z and fragmentation was triggered by HCD in fixed collision energy mode with fixed collision energy of 34%. MS2 spectra were acquired with the Orbitrap with a resolution of 15.000 and a max injection time of 54 ms.

Mass spectrometry data analysis
Acquired data were analyzed using IsobarQuant [60] and Mascot V2.4 (Matrix Science) using a reverse UniProt FASTA database from E. coli (UP000000625) or Plasmodium falciparum (UP000001450) including common contaminants. The following modifications were taken into account: Carbamidomethyl (C, fixed), Acetyl (K, variable), Acetyl (Protein N-term, variable), Oxidation (M, variable), as well as the variable modification Phospho (ST) and Phospho (Y). The mass error tolerance for full scan MS spectra was set to 10 ppm and to 0.02 Da for MS/MS spectra. A maximum of two missed cleavages were allowed. A minimum of two unique peptides with a peptide length of at least seven amino acids and a false discovery rate below 0.01 were required on the peptide and protein level [61]. Prior to performing the search, the protein sequences corresponding to the expressed PfGSK3 constructs were added to the database for the expression host. The raw data have been uploaded to the PRIDE database with the dataset identifier PXD026387.

Correlation analysis
Each biological replicate of mass spectrometry experiment was paired with the corresponding biological replicate of the activity measurement. The relative phosphorylation of a residue was calculated as a ratio of number of spectra in which the residue was identified as phosphorylated (Mascot score >32 and MD score ≥5) to number of spectra in which the residue was identified in the biological sample. The relative activity was calculated as buffer-subtracted readout of the luciferase assay relative to control. The strength of the linear association was calculated as Pearson correlation coefficient using R ( packages stats) [62].

Negative stain electron microscopy
An amount of 4 ml of 0.01 mg/ml protein solution was applied to carbon-coated Cu/Pd grids and incubated for 1 min. Excess liquid was removed with the filter paper (Whatman), and the grid was washed twice with water for 5 s and 30 s. An amount of 4 ml of 2% uranyl acetate were applied twice for 5 s and 30 s. The micrographs were collected on a Talos L120C (CSSB Cryo-EM multiuser facility) with 70 000× magnification.

Bioinformatics
The homologous protein sequences were aligned with the program MAFFT [63] and depicted in Jalview. The secondary structure was predicted by Jpred [64]. The PfGSK3 structures were predicted using the Robetta server and the TrRefineRosetta method [65]. To simulate the different N-terminally phosphorylated species present in the protein fractions, we generated five models for each of the following PfGSK3 sequences: wildtype sequence, three sequences with a single phosphomimetic mutation (S40E, S42E, S43E) and one with a triple mutation S40E/S42E/S43E (abbreviated SE3). Per-residue local error estimates of the models calculated by Robetta [66] are written in the B-factor column of the PDB files. For phylogenetic analysis, protein sequences with highest identity scores were identified for individual organisms with the blastp suite of NCBI [67], the sequences were aligned with the ClustalW method and the R package msa [68], and the phylogenetic tree was calculated in PhyML with LG + G+I + F method [69]. The phylogenetic tree with multiple sequence alignment was plotted using R package ggtree [70].

Results
High yield expression and purification of enzymatically active PfGSK3 from E. coli To obtain recombinant PfGSK3 in high purity and homogeneity that allow functional and structural analysis, we extended the previously established PfGSK3 expression and purification protocols [32,44,45,71]. N-terminally tagged PfGSK3 ( Figure 1A) was purified by immobilized metal affinity chromatography (IMAC, Figure 1B) and subsequent size exclusion chromatography (SEC, Figure 1C). We deliberately excluded a negative IMAC step as PfGSK3 becomes sensitive to NiNTA beads upon the removal of the histidine tag and has a high tendency to precipitate. To remove impurities, we included an additional ion exchange chromatography (IEC) step ( Figure 1D). This yields highly pure PfGSK3 as judged by SDS-PAGE, but the protein elutes in several partially overlapping peaks with increasing concentration of sodium chloride. Such heterogeneity of PfGSK3 might be caused by different posttranslational modifications. To assess the quality of the purified protein and to characterize it further, we applied several biophysical methods. Dynamic light scattering data confirm high homogeneity of the sample with a calculated molecular weight of 52 kDa, in agreement with the expected mass for a monomer of recombinant PfGSK3 (52.05 kDa, Supplementary Figure S2A). The secondary structure content derived from circular dichroism data is 22% α helix, 26% β sheet, 24% turns and 28% disorder. This corresponds to a typical domain organization of homologous GSK3 proteins (such as human GSK3β) with an N-terminal β-sheet-rich domain and an α-helical C-terminal domain that are flanked by disordered N-terminal and C-terminal regions (Supplementary Figure S1A, Figure 2A) [24,72]. To verify that the protein retains its three-dimensional structure and kinase properties, we tested whether its thermal stability changes in the presence of ATP and ATP analogs. Indeed, the stability of PfGSK3 markedly increased in a concentration dependent manner from 45°C to over 50°C with increasing concentration of dATP or ATP ( Figure 2B and Supplementary Figure S2B). Moreover, the protein could also be stabilized by several nonhydrolysable ATP analogs (Supplementary Figure S2C). Additionally, we observed a concentration-dependent increase in the thermal stability of PfGSK3 with increasing concentrations of sulfate ions (Supplementary Figure S2D,E). Sulfate ions are bound in the activation loop of several human GSK3 structures [73][74][75][76][77][78][79], suggesting that the structural elements in this region are also conserved. Finally, we measured the PfGSK3 activity with a luciferase assay, using a peptide derived from human glycogen synthase 1 (GS-1) as a substrate, resulting in a catalytic rate constant of k cat = 9.5 s −1 , which is in a similar range as previously reported (between 0.7 and (A) Circular dichroism shows secondary structure composition similar to what is expected for a GSK3 protein, with 22% α helix, 26% β sheet, 24% turns and 29% disordered content. The circular dichroism was measured 10× and the data were averaged, buffer subtracted, and analyzed by 3.7 s −1 ) for human GSK3 ( Figure 2C and Supplementary Figure S2F) [72]. Thus, purified PfGSK3 is folded, stable, enzymatically active and retains the structure that is expected to be similar to human GSK3.

Divalent ions of heavy metals induce reversible formation of PfGSK3 oligomers
The observation that PfGSK3 after affinity tag cleavage has a strong tendency to precipitate upon interaction with NiNTA beads prompted us to investigate the impact of various heavy metals on PfGSK3. To this end, we measured the thermal unfolding profiles of the protein in the presence of different metal ions. Under standard conditions, PfGSK3 displays a clear transition midpoint at 45°C. However, in the presence of divalent ions, such as zinc, cobalt, nickel and copper cations, the typical unfolding transition profiles completely disappear, indicating structural changes (Supplementary Figure S3A). We selected zinc to investigate further the effect of divalent heavy metal ions on PfGSK3 because it showed the most pronounced effect on PfGSK3 unfolding. Interestingly, the secondary structure content measured by circular dichroism did not notably change upon addition of zinc, demonstrating that the secondary structural elements of the protein remain folded (Supplementary Figure S3C). However, analytical size exclusion chromatography revealed that the zinc cations induce the formation of high molecular weight (high-MW) protein species that elute earlier from the column than the metal-free protein ( Figure 3A) and this effect is dependent on the concentration of the zinc cations (Supplementary Figure S3B). Indeed, in a cross-linking experiment with glutaraldehyde, the high-MW species appear predominantly in the presence of zinc (Supplementary Figure S3D). The high-MW species can revert to monomers by the addition of the divalent metal chelator EDTA ( Figure 3A). To study the impact of heavy metals on the PfGSK3 function, we measured its activity in presence of zinc at different concentrations. The metal cations cause a dramatic decrease in the PfGSK3 activity ( Figure 3B). However, the activity is completely restored after EDTA treatment. Finally, we used small-angle X-ray scattering (SAXS, Figure 3C and Supplementary Figure S3E) and negative stain EM (Supplementary Figure S3F) to characterize the high-MW species. The distance distribution derived from the SAXS data demonstrates high anisometry of the particles ( peaking at 20 nm but with D max = 90 nm). In agreement, the cross-linked high-MW species visualized by negative stain EM vary in their shape and size (Supplementary Figure S3F). In summary, heavy metal ions, such as zinc cations, induce a reversible formation of large PfGSK3 particles. While the protein maintains its secondary structure, it completely loses its enzymatic activity. Importantly, the process is reversible and the restored PfGSK3 monomers fully regain their kinase activity.

PfGSK3 exhibits autophosphorylation
PfGSK3 elutes from the IEX chromatography column as at least four different species with distinct elution volumes that are of comparable purity ( Figure 1D). Such behavior could be explained by a heterogeneous A B C phosphorylation pattern. To investigate this possibility, we separately analyzed the four top peak fractions (F1-F4) from the IEX elution profile of the wild-type PfGSK3 preparation ( Figure 1D). SDS-PAGE stained with the Pro-Q Diamond stain, which specifically binds to phosphorylated proteins, shows that the amount of total phosphorylation increases towards later elution volumes (Supplementary Figure S4A). Mass spectrometry analysis revealed that the protein is partially phosphorylated at multiple sites. We identified a number of phosphorylated residues at the N-terminus (residues 13-62, see Supplementary Figure S4B). Additionally, several residues in the activation loop, primarily S226 and Y229, but also S228 and S232 were found to be phosphorylated (Supplementary Figure S4B). The phosphorylation of the activation loop residues and two N-terminal serine residues (S31, S62) in the recombinant protein mirrors previously identified phosphorylation sites in P. falciparum blood-stage parasites [38][39][40][80][81][82][83], while the phosphorylation of other N-terminal residues, predominantly Y39, S40, S42 and S43, has not been detected in parasite lysate-based phospho-fingerprinting. We further probed into the ability of PfGSK3 for autophosphorylation and its physiological role. First, we incubated recombinant protein with ATP/MgCl 2 and subsequently analysed the ion exchange chromatography profiles. Indeed, the treatment with ATP/MgCl 2 led to a drop in the fluorescence signal from the earlier eluting peaks and gain in the fluorescence signal from later eluting peaks, indicating that PfGSK3 is able to exhibit autophosphorylation in vitro ( Figure 4A). Second, we employed the luciferase activity assay to examine PfGSK3 activity in the absence of a substrate peptide. The assay showed the consumption of ATP even in the absence of the substrate peptide that was dependent on the concentration of PfGSK3, confirming that PfGSK3 exhibits autophosphorylation ( Figure 4B).

N-terminal domain is indispensable for PfGSK3 stability
To investigate the role of phosphorylation in PfGSK3 further, we attempted to express inactive point mutants to prevent autophosphorylation. Therefore, we changed the essential residue in the ATP binding pocket K96 or residues from the activation loop S226 and Y229 to alanine. The small-scale expression screen showed that the inactive mutants of PfGSK3 did not express in a soluble form but instead formed inclusion bodies ( Figure 5A). We then isolated the inclusion bodies of the PfGSK3 K96A and PfGSK3 S226A/Y229A mutants and examined their phosphorylation states by mass spectrometry. These mutants display the full length sequence but are devoid of any phosphorylation (ATP binding pocket mutant K96A) or exhibit markedly reduced phosphorylation (activation loop mutant S226A/Y229A) compared with the wild-type protein ( Figure 5C). As expected, these mutations drastically impair PfGSK3 function and indicate that PfGSK3 phosphorylation is important for folding A B The colored points (red, green and blue) represent individual replicates, while the large black points are their average. and solubility. The latter is further supported by a co-expression experiment of wild-type PfGSK3 with the dephosphorylating lambda phosphatase, which results in insoluble material only (Supplementary Figure S5C). As we found the N-terminal extension of PfGSK3 to carry several phosphorylation sites that might play a role in enzyme active regulation, we cloned a series of N-terminally truncated protein constructs (Supplementary Figure S1A). None of these constructs expressed in a soluble form ( Figure 5B) and mass spectrometry revealed a complete lack of phosphorylation in the PfGSK3 activation loop (ΔN64 in Figure 5C). Expression tests at different temperatures indicate that not only the aforementioned mutants and constructs, but also wild-type PfGSK3 is less expressed and mainly insoluble at lower temperatures. In fact, constructs with truncated N-termini did not express at 18°C at all, and even the wild-type protein forms only inclusion bodies under such experimental settings ( Figure 5A,B). These data support the role of the PfGSK3 N-terminus and its phosphorylation for folding and function of PfGSK3.

N-terminal phosphorylation fine-tunes the activity of PfGSK3
To further probe into the role of PfGSK3 N-terminus that might represent, like its human counterpart [24,25], the domain for enzyme activity regulation, we analyzed the individual separated fractions from IEX chromatography ( Figure 1D) by LC/MS-MS. Here, we observed that the extent of phosphorylation of residues in the activation loop of PfGSK3 is comparable between the four fractions, but large differences in the N-terminal phosphorylation patterns are obvious ( Figure 6A). Specifically, all N-terminal residues are significantly less phosphorylated in the first fraction (F1), while in the second fraction (F2), the extent of phosphorylation of residues Y39 and S40 is strongly increased and the third and fourth fractions (F3 and F4) have generally higher level of N-terminal phosphorylation, which is most pronounced on residues Y39, S40, S42 and S43. Of note, only residue S40 is well conserved across Plasmodium species, but all other Plasmodium species contain multiple nonconserved N-terminal residues that are potential phosphorylation sites (Supplementary Figure S6A,G).
To assess the relationship between the activity of PfGSK3 and its phosphorylation pattern, we made use of the differences in N-terminal phosphorylation between the individual IEX fractions and measured their kinase activity separately ( Figure 6B). PfGSK3 activity rises with an increase in N-terminal phosphorylation levels (from F1 to F4), supporting that phosphorylation is crucial for activity. Phosphorylation levels of residues S40, S42 and S43 significantly correlate with the PfGSK3 activity (Pearson R = 0.6, P < 0.05 in all cases, Figure 6C-E), while this correlation is absent or insignificant for the other residues (Supplementary Figure S6B-F). These data support the hypothesis that phosphorylation of N-terminal PfGSK3 residues enhances PfGSK3 activity.

N-terminal phosphorylation affects the PfGSK3 structure
The phosphorylated residues at the PfGSK3 N-terminus are, based on structural homologs [24,25], expected to be distant from the active site of the protein and it is therefore unclear how N-terminal phosphorylation regulates the protein activity. In the human counterpart, the phosphorylation of S9 prompts the binding of the N-terminus to a pocket for a priming phosphorylation in the substrate-binding region and reduces the binding affinity for substrates [24,25]. Hence, we investigated structural consequences of N-terminal phosphorylation in PfGSK3. Circular dichroism data on the least (F1) and most (F4) phosphorylated fractions show no pronounced differences in their secondary structure content (Supplementary Figure S7A). Thus, to investigate the impact of the N-terminal phosphorylation on the tertiary structure of PfGSK3, we used the de novo structure prediction server Robetta and the TrRefineRosetta method [65] to generate initial PfGSK3 models that were further fitted against the SAXS data measured on separated PfGSK3 IEX fractions F1 and F4 (Table 1, see Figure 1D). Using the wild-type PfGSK3 sequence as well as PfGSK3 sequences with phosphomimetic mutations (see Methods), we generated 25 models ( Figure 7A) that display similar structure of the core domains but are structurally diverse in both termini ( Figure 7A). We identified a subset of models that provide the best fits to the experimental SAXS data (Table 1, Supplementary Figure S2B) of fractions F1 or F4 ( Figure 7D). Overall, the models with a larger radius of gyration (see the point sizes in Figure 7D) display lower χ 2 (goodness-of-fit) values. Further refinement of these best-fitting extended models was conducted in the program CORAL [57] that enables assigning additional flexibility to existing models ( Figure 7E,H). This approach led to poorer fits when only the C-terminus (residues 403-440) or both N-terminus and C-terminus (residues 1-64 and 403-440) were assigned flexibility ( Figure 7E,H). However, conducting refinement with a disordered C-terminus (residues 403-440) and a flexible stretch of residues between the N-terminus and the PfGSK3 core (residues 47-63, conformation B, Figure 7E,H) led to two refined models displaying lower χ 2 values compared with the SAXS data from fraction F4. Ultimately, for both fractions, the models with the lowest χ 2 values display α-helical structure at the N-terminus. This α-helix is extended in the less phosphorylated F1 fraction ( Figure 7B) and folds back in the more phosphorylated F4 fraction ( Figure 7C). The raw SAXS data support such conformational differences, as both the calculated radius of gyration R g and maximum intra-particle distance D max decrease from F1 to F4 fraction (R g 3.4 vs 3.2 nm; D max 13 vs 12 nm, respectively, see Table 1). The dimensionless Kratky plot data representation also suggests a more extended conformation for the F1 fraction (Supplementary Figure S8C), as the F1 curve broadens and shifts to higher sR g values relative to F4. The refined models fit the SAXS data well (χ 2 values of 1.19 and 1.06, respectively, Figure 7F,G), but should be viewed cautiously as an average representation of the conformations present in solution that contains a mixture of differently phosphorylated species. In conclusion, the combinatorial approach indeed shows that upon phosphorylation, PfGSK3 becomes more flexible, enabling a conformational change in the N-terminus. This change presumably mediates the increased PfGSK3 activity upon phosphorylation.

Discussion
Identification and characterization of novel drug targets in malaria remains an important goal as drug-resistant parasites are a persisting threat in the fight against malaria [2]. PfGSK3 has been identified as a potential drug target since its inhibitors also possess potent and selective anti-plasmodial activity [32,[44][45][46]. Understanding its function is therefore important for further improvement of these inhibitors.
Here, we have shown that PfGSK3 is expressed in E. coli as a folded and functional protein with a heterogeneous phosphorylation pattern. We assume that the heterogeneity originating from this diverse phosphorylation hampers its detailed structural characterization by X-ray crystallography [84]. As the expression of inactive PfGSK3 protein (K96A, S226A, Y229A) as well as the removal of the heavily phosphorylated N-terminus result in insoluble protein that lacks phosphorylation, other protein constructs and strategies will likely need to be considered for detailed structural investigation of full-length PfGSK3 in the future. At the same time, these results highlight the importance of both PfGSK3 phosphorylation and the necessity of an intact N-terminus for protein stability.
Our mass spectrometry analysis of the phosphorylation pattern on recombinant PfGSK3 confirmed a number of PfGSK3 phosphorylation sites known from previous phosphoproteomics studies (S31, S62, S226, S228, Y229, S232) [38][39][40][80][81][82][83]. The additional phosphorylation at the N-terminus (Y39, S40, S42 and S43) described here could have occurred due to the heterologous expression, but might represent an accessory regulation site if present in vivo. In hGSK3β, the autophosphorylation of Y216 (homologous to Y229 in P. falciparum) in the activation loop is thought to maintain the constitutive activity of the protein [27,29], while phosphorylation of N-terminal S9 by other kinases, such as PKA and PKB, inhibits the enzyme [6,30]. Although the role of the activation loop autophosphorylation is likely conserved between the human and P. falciparum proteins, the role of N-terminal phosphorylation seems to be markedly different.
First, the N-terminal sequence is not conserved between the two homologous proteins, with Plasmodium species displaying an unusual extension with a number of potential phosphorylation sites. Second, whereas N-terminal phosphorylation of S9 in human GSK3 is mediated by other kinases, the phosphorylation of N-terminal residues in PfGSK3 is the result of autophosphorylation, as our data implies. Last, while phosphorylation of S9 in human GSK3 inhibits its activity, our work illustrates that the increasing amount of phosphorylation on the PfGSK3 N-terminus results in an increased enzymatic activitypointing towards a different mode of kinase activity regulation.
Our structural investigation using the combination of structure prediction and its validation using experimental SAXS data provides further insights into how the N-terminal phosphorylation affects PfGSK3 structure. Two SAXS datasets originating from two differently phosphorylated IEX fractions of PfGSK3 directed the modeling towards PfGSK3 conformations with an α-helical N-terminus in both cases. However, this helix extends away from the active side of PfGSK3 in the less phosphorylated fraction, or folds back towards the center of the protein in the more phosphorylated fraction. The change in the conformation is allowed because the phosphorylation loosens the α-helical structure at the site of phosphorylation ( Figure 7B) that consequently enables formation of a loop and bending of the N-terminus towards the center of the structure ( Figure 7C). It is possible that the 'folded back' conformation ( Figure 7C) is further stabilized by an N-terminal intramolecular coiledcoil interaction, as the probability of coiled-coil formation increases with the increasing number of N-terminal PfGSK3 phosphomimetic mutations in silico (Supplementary Figure S7D). Interestingly, human GSK3β undergoes a similar conformational change, where upon phosphorylation of S9, its N-terminus binds in its substrate-  binding region and competitively inhibits its activity [24,25]. However, N-terminal phosphorylation of PfGSK3 leads to an increase in enzymatic activity, suggesting a different regulatory mechanism. The conformational change upon phosphorylation might position the very N-terminus of PfGSK3 close to its core domains, potentially enabling intramolecular interactions that could enhance its activity. It should be noted that the N-termini of PfGSK3 homologs are not well conserved even among the representatives of Apicomplexa; however, a number of residues that can be phosphorylated could still enable a similar mode of regulation as we have described here (Supplementary Figure S7A,F). Indeed, S33 of Plasmodium berghei GSK3 (PBANKA_0410400), equivalent to S40 in PfGSK3, was found to be phosphorylated in early gametogenesis [85]. Residues equivalent to S40 of PfGSK3 are conserved across all Plasmodium species and residues that can be phosphorylated are also present in the vicinity of S40 in most Plasmodium species (equivalent to residues 38, 39 and 42 in P. falciparum), suggesting that the mechanism of regulation via N-terminal phosphorylation is preserved in all Plasmodium GSK3 proteins.
We have shown that this N-terminal phosphorylation, as well as the phosphorylation of the activation loop, are mediated by autophosphorylation. Autophosphorylation of human GSK3β occurs directly after protein synthesis while the protein is associated with chaperones, such as HSP90 [27]. Interestingly, a major contaminant during our PfGSK3 purification process was the E. coli chaperone DnaK. This could indicate that this autophosphorylation also takes place directly after protein synthesis, supporting correct PfGSK3 folding.
Additionally, we have revealed another potential means of PfGSK3 regulation: divalent heavy metal ions, such as zinc, inhibit the activity of PfGSK3 by inducing the formation of heterogeneous high-MW particles. The ionic radii of these cations are in a small range of 1.09-1.21 Å, indicating that the effect is mediated through a specific binding site. The fact that the protein becomes more sensitive to the metal ions only after the cleavage of the N-terminal affinity tag indicates that this potential binding site could be located in close proximity to the N-terminus of PfGSK3. Indeed, a partially overlapping set of divalent cations inhibiting human GSK3β can be found in the literature, including lithium, beryllium, zinc, copper and mercury [17,86,87]. Lithium and beryllium ions inhibit GSK3 by competing with magnesium ions that mediate ATP binding [87][88][89], but to our knowledge, they have not been shown to induce the formation of high-MW particles. To this date, there has been no structure of a GSK3 homolog with a bound heavy metal ion deposited in the PDB database, and thus the inhibition mechanism of heavy metal cations remains unknown. Importantly, as the high-MW particles can dissociate back into active PfGSK3 monomers, it cannot be excluded that this type of regulation is applied in vivo. We have only observed incomplete dissociation of high-MW particles upon addition of excess EDTA ( Figure 3A), which could indicate that the effect is partially irreversible or that the dynamics of the dissociation of high-MW particles is slow under the experimental conditions. On the other hand, metal chelating proteins and transporters were shown to affect the levels of heavy metal ions in both prokaryotes and eukaryotes [90][91][92] and could be utilized for regulation by Plasmodium falciparum in vivo. The concentration of zinc in the cytosol of infected erythrocytes peaks at the late blood stages and the intracellular levels of free zinc ions in parasites reach concentration of 1-5 mM [93], which is one order of magnitude higher than the highest concentration used in our SEC experiments (160 mM, Figure 3A) and four orders more than the highest concentration used in the kinase assay (100 nM, Figure 3B). Moreover, the concept of 'autoinhibitory polymerization' has already been described in detail for human CK2 [94], a kinase that is homologous to GSK3 [95], also exhibits autophosphorylation [96] and its P. falciparum counterpart displays a similar expression pattern [31], subcellular localization and function [97]. It remains to be evaluated experimentally in vivo whether the PfGSK3 high-MW particles play a role in its native environment.