Structural and functional characterisation of the entry point to pyocyanin biosynthesis in Pseudomonas aeruginosa defines a new 3-deoxy-d-arabino-heptulosonate 7-phosphate synthase subclass

In Pseudomonas aeruginosa (Pae), the shikimate pathway end product, chorismate, serves as the last common precursor for the biosynthesis of both primary aromatic metabolites, including phenylalanine, tyrosine and tryptophan, and secondary aromatic metabolites, including phenazine-1-carboxylic acid (PCA) and pyocyanin (PYO). The enzyme 3-deoxy-d-arabino-heptulosonate 7-phosphate synthase (DAH7PS) catalyses the first committed step of the shikimate pathway, en route to chorismate. P. aeruginosa expresses multiple, distinct DAH7PSs that are associated with either primary or secondary aromatic compound biosynthesis. Here we report the structure of a type II DAH7PS, encoded by phzC as part of the duplicated phenazine biosynthetic cluster, from P. aeruginosa (PAO1) revealing for the first time the structure of a type II DAH7PS involved in secondary metabolism. The omission of the structural elements α2a and α2b, relative to other characterised type II DAH7PSs, leads to the formation of an alternative, dimeric, solution-state structure for this type II DAH7PS with an oligomeric interface that has not previously been characterised and that does not facilitate the formation of aromatic amino acid allosteric binding sites. The sequence similarity and, in particular, the common N-terminal extension suggest a common origin for the type II DAH7PSs from P. aeruginosa. The results described in the present study support an expanded classification of the type II DAH7PSs as type IIA and type IIB based on sequence characteristics, structure and function of the resultant proteins, and on defined physiological roles within primary or secondary metabolism.


Introduction
Pseudomonas aeruginosa (Pae) is an opportunistic human pathogen often associated with the chronic infection of patients suffering from cystic fibrosis [1]. P. aeruginosa produces a number of virulence factors [2] that are involved in bacterial quorum sensing [3] and establishing long-term infections, particularly of the lungs. Pyocyanin (PYO) is a secondary metabolite derived from phenazine-1-carboxylic acid (PCA) that has been shown to interfere with a number of cellular processes [4][5][6][7] and is an essential virulence factor required for pathogenic infection [8].

Figure 1. The shikimate pathway is responsible for the formation of aromatic compounds in microorganisms and plants
In the case of P. aeruginosa, the pathway end product, chorismate, is the last common precursor for the biosynthesis of both primary and secondary aromatic metabolites.
In P. aeruginosa, the shikimate pathway end product, chorismate, acts as the last common precursor for both primary aromatic metabolism (the biosynthesis of the aromatic amino acids phenylalanine (Phe), tyrosine (Tyr) and tryptophan (Trp)) and secondary aromatic metabolism, including the biosynthesis of PCA and PYO. Starting from chorismate, the first steps of PYO biosynthesis are carried out by the duplicated seven-gene operon phzABCDEFG [9], leading to the formation of PCA. Expression of the phzA-G operon is under genetic control by the LysR-like transcriptional regulator mvfR [10,11] as part of the P. aeruginosa quorum-sensing circuitry. PCA is converted into PYO through two additional steps (phzM and phzS) (Figure 1) [12].
DAH7PSs have been classified into three broad groupings based on enzyme sequence: type Iα, type Iβ and type II [20,21]. Although less than 10% sequence identity exists between the type I and II DAH7PS groupings, all characterised examples of DAH7PSs share a common (β/α)8-barrel fold, a common divalent metal-ion binding site and conservation of almost all of the residues involved with E4P and PEP binding [22][23][24][25][26][27][28][29][30][31][32][33]. Various structural elements, additional to the core catalytic barrel, are associated with a diverse set of allosteric responses and the formation of alternate quaternary assemblies. The nature and location of these additional structural elements within the core catalytic barrel are characteristic of each group of DAH7PS enzymes.
While the characteristics of many examples of type I DAH7PSs have been reported, characterisation of the type II DAH7PSs has focused primarily on a group of type II enzymes that, relative to the minimalist type Iβ unadorned catalytic barrels such as Pyrococcus furiosus DAH7PS [25], contain both an approximately 75-residue N-terminal extension (typically providing elements β0, α0a, α0b and α0c) and an approximately 60-residue extension to loop α2β3 (typically providing elements α2a and α2b). For example, Mycobacterium tuberculosis (Mtu) expresses a single type II DAH7PS (MtuDAH7PS), which contains these accessory structural elements. The extra-barrel elements in MtuDAH7PS provide three distinct allosteric binding sites, on the single enzyme, that are each selective for either Trp, Tyr or Phe, and together they contribute towards a complex allosteric regulatory mechanism where binary or ternary combinations of aromatic amino acids that include Trp act synergistically to inhibit the enzyme [34][35][36]. These extensions are also responsible for the formation of the oligomeric interfaces that are present in the homotetrameric assemblies of the characterised type II enzymes. The allosteric functionality of either MtuDAH7PS or the type II DAH7PS from Corynebacterium glutamicum (CglDAH7PS) is extended by the formation of a non-covalent complex with the AroQδ subclass of chorismate mutase (MtuCM or CglCM respectively). The formation of this non-covalent complex results in an activity boost for the CM while allowing the CM to access and utilise the allosteric machinery located on the DAH7PS [32,37,38].
In comparison, P. aeruginosa expresses two type Iα and two type II DAH7PSs. The type II DAH7PSs are encoded by the ORFs PA1901 (and duplicated as PA4212) and PA2843 (PaeDAH7PS PA1901 and PaeDAH7PS PA2843 respectively). The structure and properties of PaeDAH7PS PA2843 have recently been reported [33] and show that PaeDAH7PS PA2843 contains an N-terminal extension that is 19 residues shorter in sequence length and has similar inserted α2a and α2b helices, as compared with MtuDAH7PS or CglDAH7PS. Although the quaternary assemblies of MtuDAH7PS and PaeDAH7PS PA2843 resemble each other, there are some differences in orientation of the extra-barrel elements within the tetramer, such that PaeDAH7PS PA2843 is inhibited by only Trp and is unaffected by combinations of Trp with Phe and Tyr due to the abbreviated N-terminal extension.
However, based on sequence alignments, there appears to be a second distinct grouping of type II DAH7PSs [39]. These DAH7PSs appear to contain an N-terminal extension to the core (β/α)8-barrel, comparable with that observed in PaeDAH7PS PA2843, but lack sequence corresponding to the inserted helices α2a and α2b for PaeDAH7PS PA2843. This difference in accessory structural elements may therefore have consequences for the formation of the quaternary assembly and hence for allosteric functionality. For example, PaeDAH7PS PA1901, found as a part of the phzA-G PCA biosynthetic cluster, is 43 amino acids shorter in sequence length relative to PaeDAH7PS PA2843, which is not found within the phzA-G PCA biosynthetic cluster. A primary structure alignment among type II DAH7PSs (Supplementary Figure S1) indicates that PaeDAH7PS PA1901 appears to lack much of the sequence that corresponds to the α2a and α2b helices of PaeDAH7PS PA2843, MtuDAH7PS or CglDAH7PS, and hence belongs to this structurally and functionally uncharacterised second group of type II DAH7PSs.
Here we report the structure of PaeDAH7PS PA1901, characterising for the first time a short-form type II DAH7PS enzyme that is involved in secondary metabolism. The absence of the α2a and α2b helices relates directly to the formation of a novel, dimeric, solution-state structure for this type II DAH7PS with altered allosteric functionality, in line with the enzyme's role within secondary metabolism. The structure and properties of PaeDAH7PS PA1901, in combination with those recently reported for PaeDAH7PS PA2843 [33], indicate that the evolutionary trajectory for the type II DAH7PSs may have diverged to deliver type II enzymes that function within either primary or secondary metabolism. The distinct structural and functional properties of PaeDAH7PS PA1901, in combination with sequence characteristics of the type II DAH7PSs, suggest that the type II DAH7PSs be further classified into two groups noted here as type IIA and type IIB.

Sequence clustering analysis
Sequences of type II DAH7PSs were extracted from the Pfam database [40] (seed sequences of DAHP synthetase II, PF01474), aligned with Clustal [41] and submitted to jackhmmer (part of the HMMER web server [42]) to create a hidden Markov model (HMM) profile used to scan, using an E-value sequence cut-off of 10⁻¹⁵, four different databases: UniProt [43], RefSeq [44], Pfamseq [40] and NR [44]. The E-value cut-off was carefully chosen to ensure that no type I sequences were retrieved but all type II sequences would be. This was possible due to the low sequence similarity between type I and II DAH7PSs, which is much lower than the sequence similarity between the two identified clusters of type II DAH7PSs. Therefore, selecting an E-value cut-off at the limit of inclusion of type I sequences enabled us to retrieve all the type II sequences available in those databases at the time of the analysis. For each database, the scan was iterated until convergence (i.e., no new sequences identified), the results were grouped together and duplicates were removed using Jalview [45], leading to 2678 non-redundant type II DAH7PS sequences, which were used as input for the clustering method implemented in CLANS [46]. Specifically, after an all-against-all BLAST search of the sequences, a force-directed pairwise similarities clustering algorithm was run for more than 500 iteration cycles at a P-value of 10⁻¹⁵.
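The pooling step described above (apply the E-value cut-off, group the hits from the four database scans, remove duplicates) can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the study used jackhmmer and Jalview for these steps, and the `Hit` record, the `merge_hits` function and the toy sequences below are hypothetical names and data introduced here, not part of the original workflow.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one database hit (not a jackhmmer output format)."""
    accession: str
    sequence: str
    evalue: float


E_VALUE_CUTOFF = 1e-15  # sequence E-value cut-off quoted in the text


def merge_hits(hit_lists: Iterable[Iterable[Hit]],
               cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Pool hits from several scans, drop those above the cut-off,
    and keep one entry (the lowest E-value) per identical sequence."""
    unique = {}
    for hits in hit_lists:
        for hit in hits:
            if hit.evalue > cutoff:
                continue  # too weak: would risk admitting type I sequences
            key = hit.sequence.upper()
            best = unique.get(key)
            if best is None or hit.evalue < best.evalue:
                unique[key] = hit
    return list(unique.values())


# Toy example with made-up accessions and sequences.
uniprot = [Hit("U1", "MKTAYIAK", 1e-40), Hit("U2", "MSLLNQ", 1e-5)]
refseq = [Hit("R1", "mktayiak", 1e-42), Hit("R2", "MTTPLV", 1e-20)]
merged = merge_hits([uniprot, refseq])
```

In this toy run the two identical sequences collapse to the RefSeq entry with the lower E-value, and the weak hit `U2` is discarded, leaving two non-redundant sequences.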
sequence-verified (Macrogen), transformed into Escherichia coli BL21*(DE3) cells and co-expressed with the chaperonins pGroES and pGroEL. Expression was achieved following the addition of 1 mM IPTG and subsequent incubation at 23 °C for 16 h. Cells were harvested by centrifugation (12000 g, 15 min). Cell lysis was achieved in lysis buffer (10 mM bis-tris propane pH 8.0, 200 mM KCl, 1 mM tris(2-carboxyethyl)phosphine hydrochloride, 200 μM PEP, 10 mM imidazole) by sonication (4 × 5-min cycles at 80% power). Cellular DNA was degraded by the addition of benzonase before the removal of cellular debris by centrifugation (40000 g, 30 min). Purification was carried out using Co²⁺ affinity chromatography, incubation with TEV protease (4 °C, 3 h), and size-exclusion chromatography. In brief, the soluble fraction of the cell lysate (containing PaeDAH7PS PA1901) was loaded on to a talon trap column pre-equilibrated with lysis buffer. Contaminating E. coli proteins were washed through the column before isocratic elution of PaeDAH7PS PA1901 in buffer containing 10 mM bis-tris propane pH 8.0, 200 mM KCl, 1 mM tris(2-carboxyethyl)phosphine hydrochloride, 200 μM PEP, 100 mM imidazole. Protein samples were diluted (1:1) with lysis buffer immediately after elution from the column. The His6 purification tag was cleaved by incubation with TEV protease (2 mg, 4 °C, 3 h) before the cleaved tag was removed from the protein sample by a second round of affinity purification. Protein samples were concentrated and loaded on to a Hiload™ 26/30 Superdex™ 200 column pre-equilibrated with buffer containing 10 mM bis-tris propane pH 8.0, 200 mM KCl, 1 mM tris(2-carboxyethyl)phosphine hydrochloride, 200 μM PEP. Protein concentrations were determined using a Nanodrop ND-1000 spectrophotometer, at 280 nm, using the molar extinction coefficient (54430 M⁻¹.cm⁻¹) calculated for the protein using ProtParam. Purified protein samples were flash frozen in liquid nitrogen and stored at −80 °C.
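The concentration determination at 280 nm follows the Beer-Lambert law, A = ε·c·l. A minimal sketch of that arithmetic is given below, using the extinction coefficient quoted above and the theoretical chain mass of 44468 Da reported in the MS section. The function names are illustrative, and the sketch assumes that the absorbance reading has already been normalised to a 1 cm path length; that normalisation is an assumption, not something stated in the text.

```python
EPSILON_280 = 54430.0   # M^-1 cm^-1, ProtParam value quoted in the text
MOLAR_MASS_DA = 44468.0  # g/mol, theoretical single-chain mass (MS section)


def molar_concentration(a280: float, epsilon: float = EPSILON_280,
                        path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


def mass_concentration(a280: float, molar_mass: float = MOLAR_MASS_DA,
                       path_cm: float = 1.0) -> float:
    """Concentration in mg/ml (numerically equal to g/L)."""
    return molar_concentration(a280, path_cm=path_cm) * molar_mass


# Hypothetical reading: an absorbance of 0.5443 at 1 cm path length.
conc_molar = molar_concentration(0.5443)   # 1.0e-5 M, i.e. 10 uM
conc_mg_ml = mass_concentration(0.5443)    # about 0.445 mg/ml
```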

MS
The molecular weight of PaeDAH7PS PA1901 was determined by ESI MS (Bruker maXis 3G). Protein samples were dialysed into Milli-Q water and diluted to a concentration of 0.3 mg.ml⁻¹ prior to analysis. The molecular mass of a single chain of PaeDAH7PS PA1901 was found to be 44470 Da compared with the calculated theoretical mass of 44468 Da (ProtParam).

Enzyme kinetic assays
The activity of PaeDAH7PS PA1901 was monitored over a range of temperatures (from 35 to 50 °C) and a range of pHs (pH 6.5-8.5) based on methods previously described [26] using a Varian Cary 300 UV-Vis spectrophotometer. Metal ion dependency was determined by monitoring the activity of PaeDAH7PS PA1901 in the presence of 100 μM of various divalent metal cations. The enzyme was pre-treated with EDTA (0.5 mM, 2 h) to remove background metal ions before being buffer-exchanged into assay buffer that had been pre-treated with Chelex (Bio-Rad). PEP (Sigma) and E4P (Sigma) concentrations were held constant at 150 μM, except when determining the respective KM values, determined by monitoring the activity of PaeDAH7PS PA1901 in the presence of 10-200 μM (E4P) or 10-400 μM (PEP) of the substrate for which KM was being measured. For the inhibition studies, stock solutions of either Trp, Tyr or Phe were prepared in ultrapure water. Stock solutions of phenazine or PCA were prepared in DMSO and activity was compared with controls where phenazine or PCA was substituted for an equivalent amount of DMSO. All reactions were carried out in the presence of 100 μM Co²⁺, except when determining metal ion preference, and the reaction was initiated by the addition of purified PaeDAH7PS PA1901. Initial reaction rates were determined using a least-squares fit of the data.
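As a rough illustration of the last step, an initial rate can be obtained from the slope of a least-squares line through the early, linear part of an absorbance trace. The sketch below also shows one common way of extracting KM from a set of initial rates, the Hanes-Woolf linearisation ([S]/v = [S]/Vmax + KM/Vmax). The text does not say how KM was fitted, so the Hanes-Woolf step is a stand-in chosen for simplicity, and the extinction coefficient is a parameter the reader must supply; all function names and the numerical data are hypothetical.

```python
from typing import Sequence, Tuple


def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def initial_rate(times_s: Sequence[float], absorbances: Sequence[float],
                 epsilon_m_cm: float, path_cm: float = 1.0) -> float:
    """Rate of substrate consumption (M/s) from a falling absorbance trace."""
    slope, _ = least_squares_line(times_s, absorbances)
    return -slope / (epsilon_m_cm * path_cm)


def hanes_woolf(substrate: Sequence[float], rates: Sequence[float]) -> Tuple[float, float]:
    """Return (KM, Vmax) from a Hanes-Woolf plot of [S]/v against [S]."""
    ratios = [s / v for s, v in zip(substrate, rates)]
    slope, intercept = least_squares_line(substrate, ratios)
    vmax = 1.0 / slope
    return intercept * vmax, vmax


# Synthetic, noise-free data generated with KM = 50 and Vmax = 2 (arbitrary units).
s_values = [10.0, 25.0, 50.0, 100.0, 200.0]
v_values = [2.0 * s / (50.0 + s) for s in s_values]
km_fit, vmax_fit = hanes_woolf(s_values, v_values)
```

With real, noisy data a direct non-linear fit of the Michaelis-Menten equation is generally preferable, because linearised plots distort the error structure.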

Analytical ultracentrifugation
Sedimentation velocity experiments were performed in a Beckman Coulter Model XL-I analytical ultracentrifuge equipped with UV/Vis scanning optics. Reference buffer solution (50 mM bis-tris propane, pH 7.5, 200 mM KCl, 100 μM cobalt chloride, 200 μM PEP) and sample solutions (including reference buffer solution with PaeDAH7PS PA1901 at three concentrations: 0.34 mg.ml⁻¹ (8 μM), 1.0 mg.ml⁻¹ (23 μM), and 1.35 mg.ml⁻¹ (30 μM)) were loaded into 12-mm double-sector cells with standard Epon 2-channel centerpieces and sapphire windows. For the two higher concentrations (23 and 30 μM), cells were mounted in an eight-hole An-50 Ti rotor and centrifuged at 50000 rpm at 20 °C, with absorbance measurements at a wavelength of 295 nm (collected in intensity mode) recorded over a radial position range of 5.8-7.3 cm within the cells taken at sediment boundary intervals of 0.003 cm. In order to gain a more optimal signal-to-noise ratio for the lowest concentration (8 μM) and buffer without protein present, cells were mounted in a four-hole An-60 Ti rotor and centrifuged at 40000 rpm at 20 °C, with absorbance measurements at a wavelength of 240 nm (collected in intensity mode) recorded over a radial position range of 5.8-7.3 cm within the cells taken at sediment boundary intervals of 0.003 cm. Further sedimentation velocity experiments, utilising protein at 17 μM, in the presence or absence of 200 μM of either PYO, Phe, Tyr or Trp, were carried out using an eight-hole An-50 Ti rotor and centrifuged at 35000 rpm at 20 °C, with absorbance measurements at a wavelength of 290 nm recorded over a radial position range of 5.8-7.3 cm within the cell taken at sediment boundary intervals of 0.003 cm. Buffer density (1.0129 g/ml) and buffer viscosity (1.050 cP) were experimentally measured with an Anton Paar DMA4100M density meter and Anton Paar Lovis 2000 ME microviscometer respectively.
The 2DSA-Monte Carlo, van Holde-Weischet, and Discrete Model Genetic Algorithm (DMGA) analyses were performed using UltraScan III [47][48][49][50]. Bead modelling and hydrodynamic calculations were performed using UltraScan Solution Modeller (US-SOMO) [51,52].
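The loading concentrations in this section are quoted both in mg.ml⁻¹ and in μM. The short sketch below shows the conversion on a per-chain (monomer) basis, assuming the theoretical chain mass of 44468 Da from the MS section; the function name is illustrative. The results (about 7.6, 22.5 and 30.4 μM) land close to the rounded 8, 23 and 30 μM quoted above.

```python
MOLAR_MASS_DA = 44468.0  # g/mol, theoretical mass of one PaeDAH7PS PA1901 chain


def mg_ml_to_micromolar(conc_mg_ml: float,
                        molar_mass: float = MOLAR_MASS_DA) -> float:
    """Convert mg/ml (equal to g/L) to micromolar, per protein chain."""
    if molar_mass <= 0:
        raise ValueError("molar mass must be positive")
    return conc_mg_ml / molar_mass * 1e6


# The three sedimentation velocity loading concentrations.
auc_micromolar = [mg_ml_to_micromolar(c) for c in (0.34, 1.0, 1.35)]
```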

Small angle X-ray scattering data collection and analysis
Size-exclusion chromatography coupled small angle X-ray scattering (SEC-SAXS) data were collected at the SAXS/WAXS beamline at the Australian Synchrotron [53] using a sheath flow sample environment [54] at 12 keV (1.0332 Å), using a detector distance of 1600 mm, and at a temperature of 293 K. Data were collected immediately after elution from a Superdex S200 (5 × 150 mm) column at a flow rate of 0.2 ml.min⁻¹ [55]. Samples were loaded on to the column at protein concentrations of 8.0, 5.0 and 1.0 mg.ml⁻¹ in buffer containing 50 mM bis-tris propane pH 7.5, 100 μM cobalt chloride, 200 μM PEP, 5% glycerol.
Data were processed using the reduction software ScatterBrain 2.83, developed at the Australian Synchrotron. Scattering intensity (I) was plotted versus q, as a log-linear plot, and analysed using the ATSAS package [56]. Deconvolution of the data was achieved using the HPLC module of the SOMO package [52,57] by fitting two pure Gaussian functions to each SEC-SAXS dataset. GASBOR [58] was used to generate ab initio dummy residue models from the P(r) obtained from the deconvoluted data for peaks A and B, which were overlaid with the crystal structure of PaeDAH7PS PA1901 (Protein Data Bank (PDB): 6BMC).
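The beam energy and wavelength quoted for the SEC-SAXS experiment are related by λ = hc/E, and the scattering vector against which intensity is plotted is conventionally defined as q = 4π sin(θ)/λ, where 2θ is the scattering angle. The sketch below shows both relations; the function names are illustrative and the q convention is the standard one, which is assumed here rather than stated in the text.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # Planck constant x speed of light, in keV.Angstrom


def wavelength_angstrom(energy_kev: float) -> float:
    """X-ray wavelength (Angstrom) for a given photon energy (keV)."""
    if energy_kev <= 0:
        raise ValueError("energy must be positive")
    return HC_KEV_ANGSTROM / energy_kev


def scattering_vector(two_theta_deg: float, wavelength: float) -> float:
    """q = 4*pi*sin(theta)/lambda, in inverse Angstrom; 2*theta is the scattering angle."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


saxs_wavelength = wavelength_angstrom(12.0)  # close to the 1.0332 Angstrom quoted above
```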

Crystallography and structure determination
Protein crystals were prepared, by microbatch crystallisation [59], by mixing equal volumes of purified protein (final protein concentration 3-5 mg.ml⁻¹ (67-112 μM)) with reservoir solution (0.2 M sodium fluoride, 1 mM cobalt chloride, 1 mM PEP, 18% PEG 3350) and incubating at 278 K for 1-2 days. Crystals were flash frozen at 110 K in cryoprotectant containing 25% glycerol and mother liquor. X-ray diffraction data were collected at the Australian Synchrotron using the MX2 beamline [60], equipped with an Eiger 16M detector, at a wavelength of 0.9536 Å. Diffraction data were processed using XDS [61] and AIMLESS [62], and the structure of PaeDAH7PS PA1901 was solved by molecular replacement (MOLREP) [63] using a single chain of PaeDAH7PS PA2843 (PDB: 5UXM) [33] as the search model. All ligands and waters were removed from the search model prior to molecular replacement, as were residues corresponding to the inserted helices α2a and α2b. The sequence identity between the search model and the target protein was 43%. The model was built using COOT [64] and refined with REFMAC [65].

Interface analysis
PISA [66] was used to visualise and examine the residues involved in interface formation. LSQKAB [67] was used to superpose and compare the structures.

PDB accession codes
Atomic co-ordinates and structure factors for the structure described in this work have been deposited in the PDB with the accession code 6BMC.

Clustering of type II DAH7PS sequences reveals an uncharacterised subgroup of type II enzymes
Clustering of type II DAH7PSs, based on pairwise sequence similarity, enables the identification of two main clusters of sequences presenting high intra- and low inter-cluster sequence similarity (Figure 2). The main cluster contains sequences corresponding to full-length type II DAH7PSs (including PaeDAH7PS PA2843, MtuDAH7PS and CglDAH7PS) that contain both an N-terminal extension and the α2a and α2b inserted helices. However, a second distinct group of sequences, which are distant from the main cluster, is also evident. Sequences from this second grouping (of which PaeDAH7PS PA1901 is a member) are shorter in sequence length, relative to those found in the main type II DAH7PS cluster, due to the predicted omission of the sequence corresponding to the α2a and α2b helices. Although there is high sequence homology among members of each subgrouping (for example, PaeDAH7PS

Figure 2. CLANS clustering analysis of type II DAH7PS sequences reveals two distinct groups of type II DAH7PSs
Each dot represents a type II DAH7PS sequence. The main group of type II DAH7PSs (1) is indicated by the red dots. The second group of type II DAH7PSs (2) is indicated by the blue dots. Lines connecting the dots indicate the sequence similarity relationship at the BLAST P-value cut-off of 10⁻⁵⁰; the darker the colour, the higher the sequence similarity. Crosses marked (a-e) correspond to the sequences of PaeDAH7PS PA1901, PaeDAH7PS PA2843, MtuDAH7PS, CglDAH7PS and Helicobacter pylori DAH7PS (HpyDAH7PS) respectively.

PaeDAH7PS PA1901 is insensitive to aromatic amino acids or PCA
The purified recombinant PaeDAH7PS PA1901 was found to be catalytically active over a range of temperatures between 35 and 50 °C and over a range of pH between pH 6.5 and 7.5 (Supplementary Figure S2), in contrast with PaeDAH7PS PA2843 where maximal activity is observed over a narrow range of temperatures and pH [33]. Maximal PaeDAH7PS PA1901 activity was observed at pH 7.5 and 45 °C. Metal ion preference was investigated by monitoring the activity of PaeDAH7PS PA1901 in the presence of various divalent metal cations, and it was found that Mn²⁺ was the most activating (Figure 3A). Subsequent assays were carried out at pH 7.5, 37 °C in the presence of Co²⁺ in order to provide a comparison with PaeDAH7PS PA2843, which exhibits maximal activity under these conditions [33]. PaeDAH7PS PA2843, and the turnover number, kcat, for PaeDAH7PS PA1901 was determined to be 19.8
The activity of PaeDAH7PS PA1901 was monitored in the presence of increasing concentrations of the aromatic amino acids Trp, Tyr, Phe or the secondary metabolites phenazine or PCA. At concentrations up to 200 μM Trp, Tyr, Phe, phenazine or PCA, PaeDAH7PS PA1901 activity was found to be comparable with that observed in the absence of aromatic amino acids or secondary metabolites, analogous to the allosteric behaviour of the unregulated type Iβ DAH7PSs [69] (Figure 3B,C). Combinations of aromatic amino acids also appear to have no inhibitory effect on PaeDAH7PS PA1901: activity remained similar to that observed in the absence of aromatic amino acids (Supplementary Figure S3). The observed absence of allosteric sensitivity in PaeDAH7PS PA1901 is in contrast with MtuDAH7PS or PaeDAH7PS PA2843 where allosteric inhibition was observed under the same conditions that were used to evaluate the allosteric properties of PaeDAH7PS PA1901.
In particular, in MtuDAH7PS, any binary or ternary combination of aromatic amino acids that includes Trp acts to synergistically inhibit the enzyme [34][35][36], whereas in PaeDAH7PS PA2843, sensitivity to Trp alone was observed, but this sensitivity was diminished in comparison with that observed for MtuDAH7PS [33].

The crystal structure of PaeDAH7PS PA1901 reveals novel quaternary assembly
The crystal structure of PaeDAH7PS PA1901 (phzC) was solved (resolution 2.70 Å, Rfree = 0.280) in complex with the substrate PEP and a Co²⁺ ion, with attached water molecule, bound at the active site, revealing for the first time the structure of a short-form type II DAH7PS that is involved in secondary (here phenazine) metabolism. PaeDAH7PS PA1901 crystallised in the space group C222₁, with two DAH7PS chains present in the asymmetric unit. Application of a two-fold crystallographic symmetry operation results in the assembly of a homotetrameric species, which comprises both major and minor interfaces. Chain A residues 119-123, 172-177 and 389-405, and chain B residues 121-123, 170-177 and 389-405 are not resolved in this structure and were therefore not included in the final model (Figure 4). Data collection and refinement statistics are shown in Table 2.
The active site for PaeDAH7PS PA1901 is located at the C-terminal end of the core (βα)8 catalytic barrel and is comparable with that observed among the type II DAH7PSs in terms of residue identity. The PEP phosphate group is co-ordinated by atoms Glu217 N, Arg218 NH1, Arg271 NE, Arg271 NH2 and Lys240 NZ whereas the carboxylate group of PEP is co-ordinated by atoms Arg106 NH1 and Lys240 NZ (Figure 5 and Supplementary Figure S4).
The minor interface (indicated by the cyan dashed line) is formed by residues primarily located on helix α2b.
The major interface is formed by residues primarily located on helices α4 and α5 (and β-strands β5a and β5b) for the type Iα DAH7PSs [22,30,70] or by residues primarily located on helices α6 and α7 for the type Iβ DAH7PSs [25,31] or by residues primarily located on helices α0c and α2 for the type II DAH7PSs [26,32,33]. Based on inspection of the protein sequence, and comparison with that of PaeDAH7PS PA2843 or MtuDAH7PS, we naïvely anticipated that the interface involving helices α0c and α2 in PaeDAH7PS PA2843 or MtuDAH7PS would be preserved in PaeDAH7PS PA1901. Surprisingly, this is not the case (Figure 6). The major interface for PaeDAH7PS PA1901 is instead formed through interactions between chains A and B (or chains C and D) primarily involving residues located on helices α1 and α8, assembling a completely distinct major interface compared with that observed for any of the DAH7PSs characterised to date. For PaeDAH7PS PA1901, a total of 22 residues from chain A and 23 residues from chain B are involved in the formation of this major interface, burying 840 Å² (5.5%) or 819 Å² (5.3%) of the surface area of each chain respectively, comparable with the surface area involved in the formation of the major interface observed for either PaeDAH7PS PA2843 or MtuDAH7PS [26,32,33].
Four equivalent pairs of salt bridges were identified to form between chain A atoms Glu77 OE1 (and Glu77 OE2), Glu87 OE1 (and Glu87 OE2), Arg94 NH1 (and Arg94 NH2), Arg103 NH1 (and Arg103 NH2) and chain B atoms Arg94 NH2, Arg103 NH2, Glu77 OE1, and Glu87 OE1 respectively (Figure 7). A pair of equivalent hydrophobic contacts are made between chain A residue Leu95 and chain B residues Pro348 and Val85 (and vice versa), and between chain A residue Ala92 and chain B residue Leu88 (and vice versa). Hydrophobic contacts are also found between chain A residue Leu88 and two residues from chain B: Leu and Leu88 (and vice versa). Further hydrophobic contacts are found between chain A residues Trp379, Leu382 and chain B residues Met386 and Leu382 respectively. Unexpectedly, despite the absence of the α2a and α2b helices in PaeDAH7PS PA1901, which are associated with the formation of the minor interface in either PaeDAH7PS PA2843 or MtuDAH7PS, an alternative interaction to deliver a homotetrameric species is observed in PaeDAH7PS PA1901 (Supplementary Table S3). This alternative minor interface for PaeDAH7PS PA1901 is formed through interaction between chains A and D (or chains B and C) by residues located primarily on loop α3β4 and is distinct from the minor interface observed for all characterised DAH7PSs (both type I and II). In comparison, the minor interface is formed by residues primarily located on helix α0a for the type Iα DAH7PSs [22,30,70] or by residues located on helices α4 and α5 for the type Iβ DAH7PSs [25,31] or by residues primarily located on helices α2a and α2b for the type II DAH7PSs [26,32,33]. For PaeDAH7PS PA1901, a total of 20 residues each from chain A or D are involved in the formation of this minor interface with a buried surface area of 598 Å² (4.0%) or 602 Å² (4.0%) respectively.
A pair of salt bridges is formed between chain A atom Asp2 OD1 and chain D atom Arg203 NH2 (likewise for chain A atom Arg203 NH2 and chain D atom Asp2 OD1) as well as a hydrogen bond between chain A atom Arg198 NE and chain D atom Glu201 O (likewise for chain A atom Glu201 O and chain D atom Arg198 NE). In addition, a limited suite of hydrophobic contacts is found between methylene groups of Gln202 and Arg199 in chain A and Pro36 and Arg203 in chain D (and vice versa).

For MtuDAH7PS, three distinct aromatic amino acid allosteric binding sites exist that are each selective for either Trp, Tyr or Phe. The Phe and Trp sites are located at the oligomeric interfaces and are intimately associated with the formation of the quaternary assembly [34,36,71]. In comparison, for PaeDAH7PS PA2843 a single allosteric binding site exists at the tetramer interface that is sensitive to Trp [33] and structurally comparable with the Trp site of MtuDAH7PS. For PaeDAH7PS PA1901, the alternative oligomeric interfaces and subsequent formation of a significantly different quaternary assembly, relative to either PaeDAH7PS PA2843 or MtuDAH7PS, completely disrupt the formation of any aromatic amino acid allosteric binding sites that are comparable with those observed for either PaeDAH7PS PA2843 or MtuDAH7PS. Consistent with this is the observation made during functional characterisation that PaeDAH7PS PA1901 is insensitive to allosteric regulation by aromatic amino acids, confirming that PaeDAH7PS PA1901 functions primarily within secondary metabolism.

Solution-state structure of PaeDAH7PS PA1901
SEC-SAXS data were collected using three different starting protein concentrations: 1.0, 5.0 and 8.0 mg.ml⁻¹ (22-180 μM) to investigate the solution-state structure of PaeDAH7PS PA1901 and the concentration dependency of quaternary structure (Figure 8 and Table 3, Supplementary Figure S5 and Tables S1 and S2).
For the SAXS data collected using an injection concentration of 8.0 mg.ml⁻¹ (180 μM), PaeDAH7PS PA1901 eluted as a single peak with a trailing back edge, indicating polydispersity in the sample. The scattering data were deconvoluted using the HPLC module of the SOMO package through the fitting of Gaussian functions to the SEC-SAXS data [52,55,57]. The analysis indicated that there were at least two protein populations contributing to the single elution peak of the SEC-SAXS data. Two pure Gaussian functions were applied to the data, resulting in two distinct scattering profiles: peak A and peak B. Peak A represents the front edge of the elution peak (Rg = 36.0 ± 1.2 Å, dmax = 114 Å) while peak B was found to spread across the entire elution peak (Rg = 33.0 ± 1.4 Å, dmax = 99 Å). The calculated dmax values from the crystal structure of PaeDAH7PS PA1901 (PDB: 6BMC) for the tetramer, dimer, or monomer are 115.5, 93.3, or 62 Å respectively, with the calculated dmax values for peaks A and B more closely resembling those determined from the tetrameric or dimeric crystal structures of PaeDAH7PS PA1901 respectively. In addition, the calculated Rg values from the crystal structure of PaeDAH7PS PA1901 for the tetrameric, dimeric, or monomeric species are 39.2, 29.2, and 20.9 Å respectively, with the calculated Rg values for peaks A and B more closely resembling those determined from the tetrameric or dimeric crystal structures of PaeDAH7PS PA1901 respectively. Estimated molecular weights for peaks A and B were calculated using SAXS MoW, which is typically accurate within ±10% [72]. The estimated molecular weights for peaks A and B were 124.5 and 84.6 kDa respectively and are comparable with, albeit smaller than, the expected molecular weights for the tetrameric or dimeric PaeDAH7PS PA1901 of 177.88 and 88.94 kDa respectively.
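The comparison made in this paragraph, matching each deconvoluted peak to the oligomeric state whose crystal-structure-derived Rg or dmax lies closest, can be written out as a small lookup. The sketch below uses only the values quoted in the text; the nearest-value rule and the function names are an illustrative simplification introduced here and do not reproduce the authors' analysis, which also relied on ab initio modelling and molecular weight estimates.

```python
# Parameters calculated from the crystal structure (PDB: 6BMC), as quoted in the text.
REFERENCE = {
    "tetramer": {"rg": 39.2, "dmax": 115.5},
    "dimer": {"rg": 29.2, "dmax": 93.3},
    "monomer": {"rg": 20.9, "dmax": 62.0},
}
SUBUNITS = {"monomer": 1, "dimer": 2, "tetramer": 4}
CHAIN_MASS_KDA = 44.47  # single-chain mass measured by ESI MS (44470 Da)


def closest_state(value: float, parameter: str) -> str:
    """Oligomeric state whose reference Rg or dmax (Angstrom) is nearest to value."""
    return min(REFERENCE, key=lambda state: abs(REFERENCE[state][parameter] - value))


def expected_mass_kda(state: str) -> float:
    """Expected molecular weight (kDa) of the given oligomeric state."""
    return SUBUNITS[state] * CHAIN_MASS_KDA


peak_a = (closest_state(36.0, "rg"), closest_state(114.0, "dmax"))
peak_b = (closest_state(33.0, "rg"), closest_state(99.0, "dmax"))
```

Both parameters assign peak A to the tetramer and peak B to the dimer, and `expected_mass_kda` reproduces the 177.88 and 88.94 kDa expected masses quoted above.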
Ab initio bead models (GASBOR) were generated from the deconvoluted scattering profiles obtained for both peaks A and B to reconstruct the solution-state tetrameric and dimeric species of PaeDAH7PS PA1901 and to compare the resultant bead models with the oligomeric assemblies observed in the crystal structure (PDB: 6BMC) (Figure 9). Comparison between the theoretical scattering profiles calculated from the ab initio models and the deconvoluted experimental data (Figure 9C,F) suggests that the ab initio models are representative of the solution-state tetrameric and dimeric species of PaeDAH7PS PA1901, which are remarkably similar to those observed in the crystal structure.
Due to the decreased signal-to-noise ratio for the SEC-SAXS data collected using an injection concentration of 1.0 mg.ml⁻¹ (22 μM), deconvolution of this dataset was not attempted. CRYSOL analysis of the SEC-SAXS data, collected using an injection concentration of 1.0 mg.ml⁻¹, indicates that the enzyme exists primarily in the dimeric form (χ² = 0.31 for the fit of the dimeric crystal structure PDB: 6BMC to the experimental data, Figure 10). The dmax value determined from the 1.0 mg.ml⁻¹ SEC-SAXS data of 100.2 Å is consistent with the dmax value determined either from the dimeric crystal structure of PaeDAH7PS PA1901 (93.3 Å) or for the deconvoluted peak B (99.0 Å). In addition, the SAXS MoW estimated molecular weight of 95.0 kDa from this low concentration SEC-SAXS data is in close agreement with, albeit slightly larger than, the value estimated from the deconvoluted peak B (84.6 kDa) and the expected molecular weight for dimeric PaeDAH7PS PA1901 (88.94 kDa). The SEC-SAXS parameters determined for the data collected using an injection concentration of 1.0 mg.ml⁻¹, in combination with those determined for the deconvoluted 8.0 mg.ml⁻¹ data, show that PaeDAH7PS PA1901 exists in a concentration-dependent equilibrium that favours the dimeric form on decreasing enzyme concentration.
Analytical ultracentrifugation (AUC) experiments carried out at enzyme concentrations ranging from 0.34 to 1.35 mg.ml⁻¹ (8-30 μM) were used to confirm the oligomeric state of PaeDAH7PS PA1901 in solution. Analyses of the absorbance data, collected in intensity mode, by van Holde-Weischet analysis reveal half-parabola shaped s-distributions, which shift to the right (Figure 11A) upon increasing protein concentration, suggesting an interacting, reversible system [50]. Non-interacting species between 1-2 S are likely sedimenting buffer components, as illustrated by analysis of buffer without protein present (Figure 11A). 2DSA-Monte Carlo sedimentation coefficient distributions reveal species with sedimentation coefficients between 5.8 and 6.8 S (Figure 11B), consistent with a molecular weight in the range of 70-96 kDa (Supplementary Figure S6), suggesting that at these concentrations, PaeDAH7PS PA1901 exists predominantly as a homodimer. Species at ∼3 S, present in the 8 μM distribution (collected at 240 nm), are likely buffer components that absorb at wavelengths lower than 280 nm, as these species are also present in distributions (also collected at 240 nm) of buffer without protein (data not shown), and to a lesser extent in the 11, 23, and 30 μM samples (Figure 11B). A bead model based on the dimeric crystal structure of PaeDAH7PS PA1901 (PDB: 6BMC) was created with US-SOMO and used to calculate a theoretical sedimentation coefficient of 5.5 S, further suggesting that the species observed for PaeDAH7PS PA1901 is primarily dimeric. Additional sedimentation velocity experiments, carried out in absorbance mode in the presence of 200 μM of either PYO, Phe, Tyr or Trp, and analysed by van Holde-Weischet analysis, indicate that the presence of either PYO or aromatic amino acids does not influence the oligomeric state of the protein (Figure 11C).
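The assignment of the 5.8-6.8 S species as a homodimer rests on the 70-96 kDa mass range. A minimal sketch of that reasoning, assuming a subunit mass of 44.47 kDa (half of the expected 88.94 kDa dimer mass) and using illustrative names, lists which oligomers of up to four subunits fall inside the range:

```python
SUBUNIT_MASS_KDA = 88.94 / 2          # assumption: half of the expected dimer mass
AUC_MASS_RANGE_KDA = (70.0, 96.0)     # mass range quoted for the 5.8-6.8 S species


def oligomers_in_range(mass_range, subunit_mass=SUBUNIT_MASS_KDA, max_subunits=4):
    """Return the subunit counts whose total mass lies within mass_range (inclusive)."""
    low, high = mass_range
    return [n for n in range(1, max_subunits + 1)
            if low <= n * subunit_mass <= high]


if __name__ == "__main__":
    print(oligomers_in_range(AUC_MASS_RANGE_KDA))
```

Only the two-subunit assembly lies within the quoted range; the monomer (44.47 kDa) and the tetramer (177.88 kDa) both fall outside it.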
While the formation of a tetrameric species for PaeDAH7PS PA1901 is observable both in the crystal structure and in solution by SAXS at high injection concentrations (112-180 μM), the nature of the alternative minor interface (and lack of hydrophobic interactions), in combination with the observation of a primarily dimeric species by AUC at protein concentrations less than 30 μM, suggests that at physiological concentrations PaeDAH7PS PA1901 predominantly persists in the dimeric form. The observation of higher-order solution-state species by SEC-SAXS appears to be the consequence of high enzyme concentration.

Evolutionary implications
The structural similarities between the N-terminal extensions (helices α0a, α0b and α0c) found in PaeDAH7PS PA1901, PaeDAH7PS PA2843 and MtuDAH7PS suggest a common origin for this structural element in the type II DAH7PSs. The distinct functionalities of the N-terminal extension within these three enzymes (burying a hydrophobic surface, forming an interface that delivers allosteric binding sites, or combinations thereof), coupled with the physiological roles of these enzymes within primary or secondary metabolism, indicate an evolutionary divergence. The evolutionary trajectory of the type II DAH7PSs appears to have diverged to deliver both an unregulated dimeric group of type II DAH7PSs, suitable for a role within secondary metabolism, and a regulated tetrameric group of type II DAH7PSs that functions within primary metabolism.
For the type II DAH7PSs from P. aeruginosa, direct control of enzymatic activity by pathway end products appears largely superfluous. Genetic-level regulation may be better suited to differentially regulating the expression of multiple DAH7PSs that function within primary or secondary metabolism, where the presence of aromatic amino acids acts to divert metabolic flux away from primary metabolism and towards the biosynthesis of PCA and its derivatives. Under these conditions, the DAH7PSs that are involved directly within primary metabolism would likely be allosterically inhibited by Trp, Tyr or Phe and thus unavailable to provide chorismate to support the biosynthesis of secondary aromatic metabolites. The presence of PaeDAH7PS PA1901 within the phzA-G biosynthetic cluster allows for the synchronised expression of the proteins required for PCA production. The omission of the α2a and α2b helices in PaeDAH7PS PA1901, and the resulting insensitivity to allosteric inhibition by Trp, Tyr or Phe, allows for the continued production of chorismate when aromatic amino acid concentrations are high, consistent with the alternative, dimeric solution-state structure observed for PaeDAH7PS PA1901.

Conclusion
The structure of PaeDAH7PS PA1901 further highlights the complex evolutionary trajectory of the type II DAH7PSs, which has delivered type II enzymes that exhibit a diverse range of quaternary assemblies, and associated allosteric functionalities, required to support the efficient production of chorismate within either primary or secondary metabolism. PaeDAH7PS PA1901 adopts a dimeric solution-state structure, unlike any other quaternary association observed for the DAH7PSs characterised to date. Surprisingly, PaeDAH7PS PA1901 contains a novel major interface that has not previously been characterised in any DAH7PS. The formation of this alternative major interface in PaeDAH7PS PA1901, relative to either of the oligomeric interfaces observed in PaeDAH7PS PA2843 or MtuDAH7PS, completely disrupts the formation of any aromatic amino acid allosteric binding sites comparable with those observed in PaeDAH7PS PA2843 or MtuDAH7PS. The resulting insensitivity of PaeDAH7PS PA1901 to allosteric inhibition by aromatic amino acids is compatible with delivering chorismate to support secondary metabolism, in contrast with PaeDAH7PS PA2843 or MtuDAH7PS, which are sensitive to either Trp or combinations of aromatic amino acids that include Trp, and function primarily within primary metabolism.
Clear sequence diversity exists between the two type II DAH7PS groups identified by sequence clustering analysis. These different sequence characteristics translate directly into two groups of type II DAH7PSs that form significantly different oligomeric interfaces and quaternary assemblies with associated distinct allosteric functionalities. In addition, these differences in quaternary assembly and allosteric behaviour between the two type II DAH7PS groups relate to their defined physiological roles within either primary or secondary metabolism. On this basis, we propose that there is sufficient diversity between these two groups of type II DAH7PSs, both in terms of primary structure and functionality of the resultant enzymes, that the type II DAH7PSs be further categorised as type IIA and type IIB. The type IIA DAH7PSs comprise full-length enzymes containing both an N-terminal extension and the α2a and α2b helices (for example PaeDAH7PS PA2843, MtuDAH7PS or CglDAH7PS) and function primarily within primary metabolism, whereas the type IIB DAH7PSs comprise short-form enzymes that contain the N-terminal extension but omit the α2a and α2b helices (for example PaeDAH7PS PA1901) and function primarily within secondary metabolism.