Cell walls of marine macroalgae are composed of diverse polysaccharides that provide abundant carbon sources for marine heterotrophic bacteria. Among them, Zobellia galactanivorans is considered a model for studying algae–bacteria interactions. The degradation of typical algal polysaccharides, such as agars or alginate, has been intensively studied in this model bacterium, but the catabolism of plant-like polysaccharides is essentially uncharacterized. Here, we identify a polysaccharide utilization locus in the genome of Z. galactanivorans, induced by laminarin (β-1,3-glucans), and containing a putative GH5 subfamily 4 (GH5_4) enzyme, currently annotated as an endoglucanase (ZgEngAGH5_4). A phylogenetic analysis indicates that ZgEngAGH5_4 was laterally acquired from an ancestral Actinobacteria. We performed the biochemical and structural characterization of ZgEngAGH5_4 and demonstrated that this GH5 is, in fact, an endo-β-glucanase, most active on mixed-linked glucan (MLG). Although ZgEngAGH5_4 and GH16 lichenases both hydrolyze MLG, these two types of enzymes release different series of oligosaccharides. Structural analyses of ZgEngAGH5_4 reveal that all the amino acid residues involved in the catalytic triad and in the negative glucose-binding subsites are conserved, when compared with the closest relative, the cellulase EngD from Clostridium cellulovorans, and some other GH5s. In contrast, the positive glucose-binding subsites of ZgEngAGH5_4 are different and this could explain the preference for MLG, with respect to cellulose or laminarin. Molecular dynamics computer simulations using different hexaoses reveal that the specificity for MLG occurs through the +1 and +2 subsites of the binding pocket that display the most important differences when compared with the structures of other GH5_4 enzymes.

Introduction

For a long time, the presence of mixed-linked glucans (β-(1,3-1,4)-glucans, MLG) in primary cell walls was considered a unique feature that had evolved in flowering plants (for review, see, for example, ref. [1]). This view was first challenged by a large, systematic analysis across the plant kingdom using a glycan microarray approach, which showed that MLGs were also present in some less common, early diverging vascular plants and freshwater green algae [2,3]. More recently, and surprisingly, β-(1,3-1,4)-glucans have been identified in the cell wall of brown macroalgae [4]. Well studied for their occurrence in the cell walls of grasses, these β-(1,3-1,4)-glucans are a major component of carbohydrate storage compounds in the endosperm of cereals, such as barley, rice or wheat [2]. These glucans consist of linear chains of β-1,3- and β-1,4-linked glucosyl residues, and the distribution pattern of these two linkages varies according to the botanical origin and growth conditions of the plant [4]; in particular, β-1,3-linkages were found to be more frequent in marine brown algae [4].

Enzymes that efficiently hydrolyze these substrates [frequently named lichenases, mixed-linked glucanases or β-(1,3-1,4)-glucanases] are involved in important carbon storage catabolic processes and are widely distributed across many kingdoms of life (i.e. plants, bacteria and fungi). Their sequences are found in numerous glycoside hydrolase (GH) families, namely GH5, GH9, GH16, GH17 and GH26 according to the CAZy (carbohydrate-active enzymes) database (http://cazy.org; [5]). Among these GH families, most bacterial β-(1,3-1,4)-glucanases characterized to date belong to families GH16 [6] and GH5 [7].

GH5 is one of the largest families in the CAZy database, with more than 12 000 available sequences. Enzymes belonging to this family are retaining glycoside hydrolases that operate via the classical Koshland double-displacement mechanism [8]. The first crystallographic structure of a GH5 family member, solved in 1995 [9], was that of an enzyme considered a pure β-1,4-glucanase (cellulase). It revealed a (β/α)8 barrel fold, common to several other GH families, founding the structural clan GH-A. Since then, up to 20 different activities have been reported for this large family [7], which hinders the assignment of enzyme specificity, although its members are predicted to be involved mainly in plant cell wall degradation. Family GH5 has recently been subdivided into 51 subfamilies to improve the correspondence between specificity and sequence [7]. Several recent structure–function studies [10–12], covering various GH5 subfamilies with formerly undefined specificities, have shed new light on important residues lining the catalytic active site cleft that govern substrate specificity.

GH5 enzymes are relatively frequent in marine Bacteroidetes, especially in Flavobacteriia, which are the prevalent class of Bacteroidetes in the marine ecosystem [13]. However, Flavobacteriia species do not efficiently degrade crystalline cellulose [14,15]. Regrettably, and without taking into account its polyspecificity, the GH5 family has often been used as a ‘marker’ for cellulose occurrence in marine environments in microbial ecology studies. Nonetheless, one study highlights that the abundance of GH5 enzymes (mainly belonging to Gammaproteobacteria, Firmicutes and Actinobacteria) positively correlates with chlorophyll concentration in the eastern part of the North Atlantic Ocean, and that the diversity of GH5 enzymes was greater in coastal water than in the open ocean [16].

In the present study, we have analyzed, using multiple biochemical approaches, the detailed structure–function relationship of one of the three GH5 enzymes from Zobellia galactanivorans DsijT, a model macroalgae-associated bacterium [13]. The gene name of this GH5 enzyme (engA) was given in the initial genome annotation of Z. galactanivorans [13] by homology to the closest characterized enzyme, the endoglucanase EngD from Clostridium cellulovorans [17]. The corresponding recombinant enzyme will thus be named here ZgEngAGH5_4. The evolutionary trail of this enzyme leading to its presence in the genome of this marine flavobacterium is also discussed.

Experimental

Unless otherwise stated, all chemicals were purchased from Sigma.

Phylogenetic analysis

Homologues of ZgEngAGH5_4 (gene: engA; systematic ID: ZGAL_208) were identified using BlastP against the GenBank database. These sequences were aligned using MAFFT with the iterative refinement method and the scoring matrix Blosum62 [18]. This multiple alignment was used for model testing and for the calculation of maximum-likelihood trees with MEGA version 6.0.6 [19]. Tree reliability was tested by bootstrap using 100 resamplings of the dataset. The trees were displayed with MEGA 6.0.6.

Cloning and site-directed mutagenesis

The engA gene encodes a 397-amino-acid protein, which includes a signal peptide (residues 1–20, analyzed with LipoP 1.0 [20]) and a large GH5 module (residues 21–397) (Supplementary Figure S1). For the biochemical and structural characterizations, the precise boundaries of the catalytic module were identified using the Hydrophobic Cluster Analysis (HCA) plot [21]. Genomic DNA from Z. galactanivorans was prepared as previously described [22]. The forward primer (5′-ggggggagatctaatatgagggagatagcccctaag-3′; contains a BglII restriction site) and the reverse primer (5′-cccccccaattgttacttaacaatggcctcggcaatttc-3′; contains an MfeI restriction site), deduced from the engA gene of Z. galactanivorans (GenBank accession no. CAZ94281.1), were used to amplify the sequence encoding the catalytic module (residues 56–385) (Supplementary Figure S1). After digestion with the restriction enzymes BglII and MfeI, the purified PCR product was ligated using T4 DNA ligase into the expression vector pFO4 predigested with BglII and MfeI, resulting in a recombinant protein with an N-terminal hexahistidine tag. The plasmid was then used to transform Escherichia coli DH5α strain for storage and E. coli BL21(DE3) strain for expression as described in ref. [23]. The sequence of the gene was checked using an ABI 3130xl genetic analyzer (Applied Biosystems) equipped with 50 cm capillaries and POP7 polymer. Site-directed mutagenesis of ZgEngAGH5_4 was performed using the QuikChange II XL site-directed mutagenesis kit according to the manufacturer's instructions (Agilent Technologies). Sixteen individual mutations were produced using the specific forward and reverse primers described in Supplementary Table S1. Mutated plasmids were then used to transform E. coli XL-10 Gold® ultracompetent cells (Agilent Technologies) for storage and E. coli BL21(DE3) (Novagen®) for protein expression. Mutated plasmids were sequenced to confirm the presence and the position of each mutation.
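As a quick sanity check on the primer design described above, the restriction sites can be located in the primer sequences computationally. The following is a minimal sketch and not part of the original protocol. It assumes the standard recognition sequences AGATCT for BglII and CAATTG for MfeI (these are not spelled out in the text), and the helper name find_site is hypothetical.

```python
# Recognition sequences for the two enzymes (standard values, assumed here;
# the protocol text names the enzymes but does not spell out the sites).
SITES = {"BglII": "AGATCT", "MfeI": "CAATTG"}

# Primer sequences copied from the cloning protocol (5'->3').
FORWARD = "ggggggagatctaatatgagggagatagcccctaag"
REVERSE = "cccccccaattgttacttaacaatggcctcggcaatttc"


def find_site(primer: str, enzyme: str) -> int:
    """Return the 0-based index of the enzyme's site in the primer, or -1 if absent."""
    return primer.upper().find(SITES[enzyme])


if __name__ == "__main__":
    for label, primer, enzyme in (
        ("forward", FORWARD, "BglII"),
        ("reverse", REVERSE, "MfeI"),
    ):
        position = find_site(primer, enzyme)
        status = f"found at index {position}" if position >= 0 else "not found"
        print(f"{label} primer, {enzyme} site ({SITES[enzyme]}): {status}")
```

The same function can be used to confirm that neither primer carries the other enzyme's site, which would otherwise complicate directional cloning.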

Production and purification of ZgEngAGH5_4 and mutant ZgEngAGH5_4_E323S

Unless otherwise stated, experiments were performed at 20°C. Transformed E. coli BL21(DE3) cells (Novagen®) were grown for 72 h in 250 ml ZYP 5052 medium [24] containing 100 μg ml−1 ampicillin. E. coli BL21(DE3) bearing pFO4 without insert was used as the negative control. The culture was stopped by centrifugation at 3000g for 20 min at 4°C and the pellet was stored at −20°C until further use. The pellet was then suspended in 5 ml of 25 mM Tris–HCl (pH 7.5), 100 mM NaCl, 15 mM imidazole (buffer A) containing 5 μl DNase I (500 units μl−1). The suspension was incubated for 20 min at 4°C. The cells were then disrupted using a cell disruption system (Constant Systems Ltd). After centrifugation for 1 h at 29 000g and 4°C, the cell-free supernatant was filtered through a 0.2 µm membrane before being loaded at a flow rate of 1 ml min−1 onto a HisPrep FF 16/10 column (1.6 × 10 cm, GE Healthcare) equilibrated in buffer A. The column was washed at a flow rate of 2 ml min−1 with buffer A until the absorbance at 280 nm was negligible. Elution of the protein was performed at the same flow rate using a linear gradient increasing from 15 to 500 mM imidazole. The final concentration of imidazole was reached after 10 column volumes. Fractions (2 ml) were collected during the elution step. Fractions containing the recombinant tagged enzyme were identified by SDS–PAGE analysis and Western blot. Transfer from the SDS gel onto a ready-to-use 0.2 µm nitrocellulose membrane (Bio-Rad) was performed using a Trans-Blot Turbo system under the conditions specified by the manufacturer (Bio-Rad). Monoclonal anti-polyhistidine peroxidase conjugate (Sigma) was used at a final dilution of 1/10 000 to specifically recognize the His-tagged fusion protein. Immuno-detection was performed by chemiluminescence using the Clarity Western ECL Substrate kit (Bio-Rad) and visualization was achieved using the Chemi-Capt 50001 software.
Fractions containing the His-tagged protein were then pooled prior to being loaded at a flow rate of 2 ml min−1 onto a HiPrep Desalting FF 26/10 column (2.6 × 10 cm, GE Healthcare) previously equilibrated in 25 mM Tris–HCl (pH 7.5), 100 mM NaCl (buffer B). The same flow rate was used during the elution step and 1 ml fractions were collected. The purity of the desalted ZgEngAGH5_4 and ZgEngAGH5_4_E323S was further checked by SDS–PAGE analysis and dynamic light scattering (DLS).

Protein quantification

Protein amount was estimated at 280 nm using a Thermo Scientific NanoDrop One spectrophotometer. A molar extinction coefficient of 85 500 M−1 cm−1 and a molecular mass of 37.5 kDa, both deduced from the protein sequence, were used to calculate the concentration of ZgEngAGH5_4 protein solutions.
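The conversion from absorbance to concentration follows the Beer–Lambert law, c = A280 / (ε × l). The sketch below illustrates the arithmetic with the extinction coefficient and molecular mass given above. The function name, the 1 cm path length default and the dilution argument are assumptions made for illustration, not details taken from the protocol.

```python
EPSILON_M_CM = 85_500.0       # molar extinction coefficient at 280 nm, M^-1 cm^-1 (from the text)
MOLAR_MASS_G_MOL = 37_500.0   # 37.5 kDa (from the text)


def concentration_from_a280(a280: float, path_cm: float = 1.0, dilution: float = 1.0):
    """Return (molar concentration in M, mass concentration in mg/ml).

    path_cm and dilution are illustrative parameters: the path length is
    assumed to be normalised to 1 cm and the sample undiluted by default.
    """
    molar = a280 * dilution / (EPSILON_M_CM * path_cm)
    mg_per_ml = molar * MOLAR_MASS_G_MOL  # g/l is numerically equal to mg/ml
    return molar, mg_per_ml


if __name__ == "__main__":
    molar, mg_per_ml = concentration_from_a280(1.0)
    print(f"A280 = 1.0 -> {molar * 1e6:.2f} µM, {mg_per_ml:.3f} mg/ml")
```

With these constants, an absorbance of 1.0 over a 1 cm path corresponds to roughly 11.7 µM, or about 0.44 mg ml−1.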

Enzymatic activity assay of pure enzymes

Unless otherwise stated, assays were performed using β-d-glucan from barley [0.2% (w/v) in 50 mM MES [2-(N-morpholino)ethanesulfonic acid hydrate] buffer pH 6.5] as a substrate. The activity was determined using the reducing sugar assay described by Kidby and Davidson [25]. Reactions were performed at 30°C upon incubation of 180 µl of substrate with 20 µl ZgEngAGH5_4 (100 nM). Reaction mixture (20 µl) was withdrawn every 15 s and up to 105 s and added to 180 µl of ferricyanide reagent. The samples were then incubated for 15 min at 95°C and cooled down to 20°C. The absorbance was read at 420 nm using a Spark 10M microplate reader (Tecan, Switzerland). A calibration curve was performed under the same conditions, using glucose solutions at different concentrations (from 0.1 to 1.2 mM) as standard.
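The data reduction implied by this assay (glucose calibration curve, then a linear fit of reducing ends released over time) can be sketched as follows. This is an illustrative outline only: all absorbance values are synthetic, the function names are hypothetical, and the final enzyme concentration of 10 nM assumes that the 100 nM figure refers to the stock added at a 1:10 ratio, which the text does not state explicitly.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def initial_rate_mM_per_s(times_s, absorbances, cal_slope, cal_intercept):
    """Convert A420 readings to reducing-end concentration (mM), then fit the rate."""
    conc_mM = [(a - cal_intercept) / cal_slope for a in absorbances]
    rate, _ = linear_fit(times_s, conc_mM)
    return rate


def turnover_per_min(rate_mM_per_s, enzyme_mM):
    """Apparent turnover (min^-1) = product formed per enzyme molecule per minute."""
    return 60.0 * rate_mM_per_s / enzyme_mM


# Synthetic calibration: glucose standards (mM) and made-up A420 readings.
GLUCOSE_mM = [0.1, 0.3, 0.6, 0.9, 1.2]
CAL_A420 = [1.0 - 0.5 * c for c in GLUCOSE_mM]

# Synthetic time course sampled every 15 s up to 105 s, as in the protocol.
TIMES_S = list(range(15, 106, 15))
ASSAY_A420 = [1.0 - 0.5 * (0.002 * t) for t in TIMES_S]

if __name__ == "__main__":
    slope, intercept = linear_fit(GLUCOSE_mM, CAL_A420)
    rate = initial_rate_mM_per_s(TIMES_S, ASSAY_A420, slope, intercept)
    print(f"rate = {rate * 1000:.2f} µM/s")
    print(f"turnover = {turnover_per_min(rate, 1e-5):.0f} min^-1 (assuming 10 nM enzyme)")
```

Because the standards are processed under the same conditions as the samples, no separate correction for the dilution into the ferricyanide reagent is applied in this sketch.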

Substrate specificity of ZgEngAGH5_4

To assess the enzyme specificity among glycan polysaccharides, degradation of the following substrates was assayed: β-d-glucan from barley, lichenan, glucomannan, xyloglucan, carboxymethylcellulose (CMC), Avicel, laminarin and curdlan. Activity was measured using the ferricyanide assay described above. Unless otherwise stated, all these substrates were used at a final concentration of 0.2% (w/v) in 50 mM MES, pH 6.5. The enzymatic activity was expressed in min−1.

To refine the characterization of substrate specificity for ZgEngAGH5_4, standard commercial oligosaccharides were used as substrates and the hydrolysis products were analyzed by HPAEC (high-performance anion exchange chromatography) coupled with pulsed amperometric detection (PAD). Based on the major activities on polysaccharides, the following oligosaccharide substrates were chosen: laminaribiose (G3G), cello-oligosaccharides (G2, G3, G4, G5 and G6) and β-(1,3-1,4)-oligosaccharides (G3GG, GG3G, GGG3G and GG3GG). All oligosaccharide substrates were purchased from Megazyme, except for GG3GG (Carbosynth). Briefly, 2 ml of oligosaccharides (100 µM) were incubated with 150 µl recombinant ZgEngAGH5_4 (0.5 µM). Aliquots (170 µl) of the reaction mixture were taken at different times (from 0 to 120 min) and boiled for 15 min to stop the reaction. Samples were then filtered through a 4 mm syringe filter (Millipore) and 20 µl were injected onto a CarboPac PA1 column (4 × 200 mm, Thermo Scientific) equipped with the accompanying guard column (4 × 50 mm, Thermo Scientific), both thermostated at 30°C. Elution was carried out isocratically at a flow rate of 1 ml min−1 with 175 mM NaOH containing 50 mM NaOAc. Detection of the oligosaccharides was performed by integrated amperometry using a quadruple pulse waveform (E1 + 0.1, E2 − 2.0, E3 + 0.6 and E4 − 0.1). Integration of signal intensities was performed using the Chromeleon 6.80 software. Calibration was done by injecting different concentrations of the appropriate oligosaccharides, from which a dose–response curve was determined using the Chromeleon software.

Fluorophore-assisted carbohydrate electrophoresis (FACE) was performed to further analyze the specificity of ZgEngAGH5_4 on oligo- and polysaccharides. Depending on the objective, labeling of oligosaccharides was performed either prior to hydrolysis with ZgEngAGH5_4 or after hydrolysis. In both cases, the labeling conditions were the same. Briefly, 100 µg of poly- or oligosaccharides were labeled with 2 µl of 150 mM 8-aminonaphthalene-1,3,6-trisulfonate (ANTS) and incubated for 30 min at 37°C before the addition of 5 µl of 1 M NaBH3CN (cyanoborohydride) in DMSO (dimethylsulfoxide). Incubation at 37°C was continued for a further ∼4–5 h. Samples were then dried under vacuum before being suspended. The labeled oligosaccharides were either suspended at ∼2–2.5 µg µl−1 in 25% glycerol (w/v) or, if used as a substrate for enzymatic hydrolysis, at ∼3.5 µg µl−1 in 50 mM MES (pH 6.5).

Conditions used for the hydrolysis of oligosaccharides were as follows: 50 µg of oligosaccharides (2 µg µl−1 non-labeled and ∼3.5 µg µl−1 labeled) in 50 mM MES (pH 6.5) were incubated overnight at 30°C with 4 µl of 100 nM ZgEngAGH5_4. Hydrolysis of β-d-glucan (450 µg in 50 mM MES pH 6.5) was performed overnight at 30°C using either 10 µl of 100 nM ZgEngAGH5_4 or 10 µl of lichenase (0.18 U mg−1, 0.7 U ml−1; Megazyme). After incubation, enzymes were inactivated for 10 min at 100°C. For each reaction, a blank was made under the same conditions, except that the enzyme was first inactivated for 10 min at 100°C prior to the incubation with the poly- or oligosaccharides.

About 8–10 µg of labeled oligosaccharides and ∼12.5 µg of labeled β-d-glucan [both in 25% glycerol (w/v) final concentration] were loaded on a chilled 27% polyacrylamide gel. The electrophoresis was performed in the dark at 125 V (constant voltage), 4°C, using chilled migration buffer (25 mM Tris, 192 mM glycine, pH 8.5). Visualization of the fluorescent oligosaccharides was achieved under UV using a UV Transilluminator (Thermofisher Scientific Bioblock).

Optimal pH determination of ZgEngAGH5_4

The Teorell and Stenhagen buffer (pH 4.2–8.5) [26] was used at a final concentration of 100 mM to evaluate the pH optimum. Both the enzyme and MLG were diluted in this buffer prior to the hydrolysis reactions, which were performed as described above. The enzymatic activity was estimated using the ferricyanide assay. Results are expressed as the percentage of relative activity.

Optimal temperature determination of ZgEngAGH5_4

For this measurement, both ZgEngAGH5_4 and β-d-glucan from barley were incubated at different temperatures (from 5 to 60°C), in steps of 5°C. The produced amount of reducing sugars was determined as described above. Results are expressed as the percentage of relative activity.

Thermostability analysis

The thermostability of ZgEngAGH5_4 was studied by DLS using a Zetasizer Nano instrument (Malvern). ZgEngAGH5_4 (1.15 mg ml−1) was filtered through a 0.2 µm membrane filter prior to being heated from 5 to 65°C in steps of 1°C. The hydrodynamic gyration radius (Rg) was measured at each step and the denaturation temperature is defined as the temperature for which the gyration radius sharply increases.
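The definition above ("the temperature for which the gyration radius sharply increases") is qualitative. One possible way to make it operational is sketched below: the radius measured over a low-temperature baseline window is averaged, and the onset is taken as the first higher temperature at which the radius exceeds the baseline mean by a chosen number of standard deviations. The baseline window, the threshold factor, the function name and the data are all assumptions for illustration; this is not the criterion stated by the authors.

```python
from statistics import mean, stdev


def denaturation_onset(temps_c, radii_nm, baseline_max_c=37.0, n_sd=5.0):
    """Return the first temperature above the baseline window whose radius
    exceeds (baseline mean + n_sd * baseline SD), or None if there is none.

    baseline_max_c and n_sd are illustrative choices, not published values.
    """
    baseline = [r for t, r in zip(temps_c, radii_nm) if t <= baseline_max_c]
    threshold = mean(baseline) + n_sd * stdev(baseline)
    for t, r in zip(temps_c, radii_nm):
        if t > baseline_max_c and r > threshold:
            return t
    return None


# Synthetic melting curve in 1 °C steps from 5 to 65 °C: a noisy plateau
# followed by a steady increase starting at 40 °C.
TEMPS_C = list(range(5, 66))
RADII_NM = [
    5.6 + 0.1 * (1 if t % 2 else -1) if t < 40 else 5.6 + 1.4 * (t - 39)
    for t in TEMPS_C
]

if __name__ == "__main__":
    print("onset of denaturation (synthetic data):", denaturation_onset(TEMPS_C, RADII_NM), "°C")
```

A threshold expressed in standard deviations of the baseline keeps the criterion independent of the absolute size of the protein, but the choice of five standard deviations is arbitrary and would need to be justified against real data.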

Oligomerization state studies

The oligomerization state of ZgEngAGH5_4 was determined both by size-exclusion chromatography (SEC) and by SEC coupled to multi-angle laser light scattering (SEC-MALLS). For the SEC experiments, ∼600 µg of affinity-purified ZgEngAGH5_4 in 1 ml of buffer B were loaded onto a Superdex 75 HiLoad 16/60 column (GE Healthcare) previously equilibrated in buffer B. The elution was performed at a flow rate of 0.7 ml min−1. Calibration of the column was carried out under the same conditions using the appropriate calibration standards (GE Healthcare). For the SEC-MALLS experiments, 100 µl of ZgEngAGH5_4 at 300 µg ml−1 from the Superdex 75 chromatography were loaded onto a Superdex 200 Increase 10/300 GL column (GE Healthcare), previously equilibrated for at least 24 h in buffer B. Elution of the protein was performed for 80 min at a flow rate of 0.5 ml min−1 and the detection was carried out using both an Optilab rEX detector (Wyatt) and the Dawn Heleos light scattering detector (Wyatt). Results were analyzed using the ASTRA V software (Wyatt Technology).

Crystallization and structure determination

Crystals of ZgEngAGH5_4, in solution at a concentration of 15 mg ml−1, were obtained using the hanging drop vapor diffusion method by mixing 2 µl of protein solution with 1 μl of crystallization solution composed of 14% PEG 6000, 200 mM CaCl2 and 100 mM sodium acetate buffer at pH 5.0. Crystals were cryo-protected using the crystallization buffer supplemented with 10% glycerol and flash frozen in a N2-stream at 100 K. X-ray diffraction data were collected at 1.2 Å resolution at the European Synchrotron Radiation Facility (ESRF, Grenoble, France) on beamline ID23-1. The images were integrated using XDS [27] in the space group P21. The structure was solved by molecular replacement with MolRep [28] using EngD (PDB ID: 3NDZ) as the search model. An initial model was built automatically with the CCP4 version of ARP/wARP [29], followed by several cycles of manual rebuilding in Coot [30] and refinement with Refmac5 [31].

Crystals of ZgEngAGH5_4_E323S in complex with cellotriose were obtained using the same hanging drop method, in drops containing 2 µl of ZgEngAGH5_4_E323S mutant at 7 mg ml−1 mixed with 1 µl of well solution consisting of 24–24.5% PEG 3350, 160 mM MgCl2, 100 mM Bis–Tris (pH 5.5). The ZgEngAGH5_4_E323S mutant was co-crystallized with 0.04% of a mixture of oligohexa- to oligononasaccharides obtained from limited digestion of MLG by ZgEngAGH5_4. Crystals were soaked in crystallization buffer supplemented with 30% glycerol before being frozen in liquid nitrogen. Data were collected at the Soleil synchrotron on beamline Proxima1 to 2.2 Å resolution. The images were integrated using XDS [27] in the space group P32. The structure of ZgEngAGH5_4_E323S in complex with the substrate was solved by molecular replacement, using the software Phaser [32] and the structure of ZgEngAGH5_4 as the search model. The structure was refined using REFMAC [31] in iterative cycles with manual corrections using the graphic interface Coot [30]. All figures representing the structures were prepared using the program PyMol (Schrödinger, LLC). The atomic co-ordinates and the structure factors of both ZgEngAGH5_4 and ZgEngAGH5_4_E323S have been deposited at the Protein Data Bank (http://www.pdb.org/) as PDB ID: 6GL2 and PDB ID: 6GL0, respectively.

Molecular dynamics simulations

Molecular dynamics (MD) simulations were performed to predict the cellulose recognition properties of native ZgEngAGH5_4 and ZgEngAGH5_4_E323S. The input starting protein structures for the calculations were generated using the experimental crystal co-ordinates, and the cellulose substrate was built into the binding pocket of the enzyme. Specifically, four different oligo-glucose, hexameric chains were tested in the simulations: a cellohexaose chain with β-1,4-linkages between all sugars, and three mixed-linked oligoglucans with the β-1,3-linkage at different positions (Supplementary Figure S2). The simulations are summarized in Supplementary Tables S2–S5 together with details of the model construction and simulation protocols. Each model was named according to the position and type of linkage present in the oligosaccharide substrate: β(1,4) for the oligosaccharide with β-1,4-linkages between all units; −1/+1 β(1,3) for that with a β-1,3 linkage between units −1 and +1; +1/+2 β(1,3) for that with a β-1,3 linkage between units +1 and +2; and +2/+3 β(1,3) for that with a β-1,3 linkage between units +2 and +3. The final co-ordinates for all trajectories can be accessed through the following link: https://1drv.ms/f/s!ArX4zU6cjMUQnHQ9m5ScGJkyY_Kr.
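The naming scheme for the four substrate models can be expressed compactly as a small data structure. The sketch below is illustrative only: it assumes that the hexaose occupies subsites −3 to +3 (the text names linkages between −1/+1, +1/+2 and +2/+3 but does not state the full register), and the function names are hypothetical. Note that the generated names use an ASCII hyphen for negative subsites.

```python
# Assumed subsite register for a hexaose; the exact register used in the
# simulations is given in the supplementary material, not in this text.
SUBSITES = [-3, -2, -1, 1, 2, 3]


def build_model(beta13_between=None):
    """List the five glycosidic linkages of a hexaose as ((subsite_a, subsite_b), type).

    beta13_between is the pair of adjacent subsites joined by the single
    beta-1,3 linkage, or None for the all-beta-1,4 cellohexaose.
    """
    linkages = []
    for pair in zip(SUBSITES, SUBSITES[1:]):
        kind = "beta-1,3" if pair == beta13_between else "beta-1,4"
        linkages.append((pair, kind))
    return linkages


def model_name(beta13_between=None):
    """Name a model after the position of its beta-1,3 linkage, as in the text."""
    if beta13_between is None:
        return "β(1,4)"
    a, b = beta13_between
    return f"{a:+d}/{b:+d} β(1,3)"


# The four models described above.
MODELS = [None, (-1, 1), (1, 2), (2, 3)]

if __name__ == "__main__":
    for pair in MODELS:
        kinds = [kind for _, kind in build_model(pair)]
        print(f"{model_name(pair):>14}: {' | '.join(kinds)}")
```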

Results

The engA gene was acquired from an ancestral clostridial bacterium

The engA gene (systematic ID: ZGAL_208), encoding a single CAZyme module, is localized within a potential polysaccharide utilization locus (PUL_4) [13] (Figure 1). Besides engA, PUL_4 also includes a gene encoding a lipoprotein of unknown function (ZGAL_209) displaying a C-terminal carbohydrate-binding module (CBM4) and two tandem SusD/SusC-like pairs (ZGAL_211/212 and ZGAL_213/214) (Figure 1). PUL_4 was previously identified as strongly induced by β-1,3-glucans [33]. In the context of the development of a new screening method for carbohydrate-related proteins, the SusD-like protein ZGAL_213 was shown to specifically bind xyloglucans [34]. Altogether, these transcriptomic and biochemical results suggest that PUL_4, and thus likely the engA-encoded protein ZgEngAGH5_4, could be involved in the degradation of hemicelluloses.

Figure 1.
Gene composition of the PUL_4 from Zobellia galactanivorans.

The gene encoding the ZgEngAGH5_4 is colored in green; the other genes are colored in blue. Abbreviations: CBM4, family 4 of carbohydrate-binding modules; TBDT, TonB-dependent transporter.

Homology searches in the GenBank database indicate that close homologues of ZgEngAGH5_4 are relatively rare in other marine flavobacteria. Unexpectedly, this protein is highly similar to numerous subfamily GH5_4 β-glucanases from Firmicutes (e.g. 51% sequence identity with the cellulase EngD from C. cellulovorans [17]). A phylogenetic analysis of the GH5_4 subfamily indicates that ZgEngAGH5_4 belongs to a clade composed only of marine flavobacterial proteins. This late-diverging clade is rooted by two successive clades of GH5_4 proteins from Firmicutes (Clostridia class) (Figure 2). Therefore, the paucity of ZgEngAGH5_4 homologues in marine flavobacteria and their phylogenetic position support the hypothesis that marine flavobacteria horizontally acquired these GH5_4 genes from an ancestral clostridial bacterium.

Figure 2.
Phylogenetic tree of ZgEngAGH5_4 homologues.

The phylogenetic tree was derived using the maximum-likelihood approach with the program MEGA6 [19]. Numbers indicate the bootstrap values in the maximum-likelihood analysis. The sequence marked by a brown diamond corresponds to ZgEngAGH5_4. The characterized enzymes are indicated by a black dot (biochemically characterized) or a black triangle (biochemically and structurally characterized). For these latter enzymes, the PDB code is indicated after the protein name. On the right, clades are delimited by brackets and their taxonomic affiliations are indicated. The sequences used are listed in Supplementary Table S25.

ZgEngAGH5_4 is a β-glucanase with broad substrate specificity

In the genome of Z. galactanivorans, engA was annotated as encoding an endoglucanase, referred to here as ZgEngAGH5_4. To verify the prediction of this activity, the nucleotide sequence corresponding to the catalytic module was cloned into a pFO4 plasmid. The protein was produced with a yield of ∼130 mg l−1 in E. coli BL21(DE3) strain using an auto-inducible medium. The protein was purified to electrophoretic homogeneity by nickel affinity chromatography (Supplementary Figure S3). The purity of the enzyme was further confirmed by DLS (Supplementary Figure S3). Both SEC and SEC-MALLS were used to verify that ZgEngAGH5_4 is a monomer in solution (Supplementary Figure S3). SEC-MALLS additionally showed that the enzyme has a molecular mass of 36.5 kDa. This value is slightly lower than the theoretical molecular mass of 37.5 kDa deduced from the amino acid sequence of the recombinant protein using the ProtParam tool [35] (Supplementary Figure S1).

DLS was also used to study the thermostability of ZgEngAGH5_4 (Supplementary Figure S4A). In the range of temperatures from 5 to 37°C, the protein is characterized by a hydrodynamic radius of gyration of 5.59 ± 0.12 nm. This value increases when the temperature reaches 40°C and is almost doubled at 44°C, indicating denaturation of the enzyme, which progresses further as the temperature rises.

The ferricyanide reducing sugar assay was used to screen for the hydrolytic activity of the enzyme on several soluble β-glucans, such as carboxymethyl cellulose, lichenan, β-d-glucan from barley, xyloglucan and konjac glucomannan. Activity was also screened on β-1,3-glucans such as laminarin, a β-1,3-glucan from Euglena gracilis and carboxymethyl curdlan (a bacterial exopolysaccharide), but was not detectable (Table 1). Although the enzyme is active on soluble cellulose derivatives, its activity towards this substrate is very low compared with its activity towards mixed-linked β-(1,4-1,3)-glucans, such as β-d-glucan from barley and lichenan from Iceland moss. ZgEngAGH5_4 is also able to degrade substituted β-1,4-glycosides such as glucomannan and, to some extent, xyloglucan (Table 1). Taken together, these results classify this enzyme as a β-(1,3-1,4)-endoglucanase.

Table 1
Activity of ZgEngAGH5_4 on different polysaccharides

Substrate | ZgEngAGH5_4: specific activity (µkat mg−1)1 | ZgEngAGH5_4: relative activity (%) | Lichenase GH16: relative activity (%) | EngD: relative activity (%) | F32EG5: relative activity (%) | PbGH5A: relative activity (%)
Mixed-linkage glucan | 5.1 ± 0.3 | 100 | 100 | 100 | 100 | 100
Lichenan | 1.5 ± 0.1 | 29.4 | 37.3 | | 52.2 |
Glucomannan | 3.1 ± 0.0 | 60.8 | n.d. | 73.8 | | 9.2
Xyloglucan | 0.3 ± 0.0 | 5.9 | n.d. | 85.7 | | 19.5
CMC | 0.2 ± 0.0 | 3.9 | n.d. | 35.7 | 17.3 | 5.5
Avicel | n.d. | n.d. | n.d. | | <0.02 |
Laminarin | n.d. | n.d. | n.d. | | N.D.3 |
Curdlan | n.d. | n.d. | n.d. | | |

For comparison, the relative activity of ZgEngAGH5_4 is shown alongside that of the commercial lichenase (Megazyme) and of three GH5_4 enzymes characterized at the 3D structure level. Results for EngD from C. cellulovorans [38], F32EG5 from Caldicellulosiruptor sp. F32 [10] and PbGH5A from Prevotella bryantii [11] were calculated based on published enzymatic activities.

1 Experiments were performed in triplicate. Results are expressed as average ± SD.

2 n.d., not determinable, less than the limit of detection, i.e. 0.01 (µkat mg−1).

3 N.D., activity not detected according to the mentioned reference.
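The relative activities reported for ZgEngAGH5_4 in Table 1 follow directly from the specific activities, normalised to the value measured on mixed-linkage glucan. The short sketch below reproduces that arithmetic. It also converts a specific activity into an apparent turnover number, under the assumption (not stated in the text) that all of the protein is active and that one reducing end corresponds to one catalytic event; the function names are hypothetical.

```python
# Specific activities of ZgEngAGH5_4 from Table 1 (µkat per mg of protein).
SPECIFIC_ACTIVITY = {
    "Mixed-linkage glucan": 5.1,
    "Lichenan": 1.5,
    "Glucomannan": 3.1,
    "Xyloglucan": 0.3,
    "CMC": 0.2,
}
MOLAR_MASS_KDA = 37.5  # theoretical molecular mass from the text


def relative_activities(values, reference="Mixed-linkage glucan"):
    """Express each specific activity as a percentage of the reference substrate."""
    ref = values[reference]
    return {name: round(100.0 * v / ref, 1) for name, v in values.items()}


def apparent_turnover_per_s(ukat_per_mg, molar_mass_kda):
    """Convert µkat/mg to s^-1: 1 µmol of enzyme weighs molar_mass_kda mg.

    Assumes that every enzyme molecule in the preparation is active.
    """
    return ukat_per_mg * molar_mass_kda


if __name__ == "__main__":
    for name, pct in relative_activities(SPECIFIC_ACTIVITY).items():
        print(f"{name:>22}: {pct:5.1f} %")
    k = apparent_turnover_per_s(SPECIFIC_ACTIVITY["Mixed-linkage glucan"], MOLAR_MASS_KDA)
    print(f"apparent turnover on MLG: {k:.0f} s^-1 ({k * 60:.0f} min^-1)")
```

The computed percentages (29.4, 60.8, 5.9 and 3.9) match the relative activities listed in the table.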

Prior to the determination of which linkage and which minimal substrate the enzyme is able to hydrolyze, an evaluation of its optimal conditions was carried out. Using β-d-glucan (MLG) from barley as a substrate, the universal buffer of Teorell and Stenhagen was used to study the pH dependence of the activity of ZgEngAGH5_4. The enzyme shows activity between pH 5.0 and 8.5, but the optimum of activity is observed at pH 6.0–6.5 (Supplementary Figure S4B). At pH 5.5, the enzyme loses ∼40% of its activity, as it does at pH values above 8. Similar results have been observed using biological buffers such as MES, MOPS (3-morpholino-1-propanesulfonic acid), phosphate and Tris, further showing that the activity of the enzyme in the MES buffer is higher than in the other ones (Supplementary Figure S4B).

β-d-Glucan from barley was also used as a substrate to study the influence of the temperature on the enzyme activity. In this case, the substrate was first thermostated at different temperatures, between 5 and 60°C, prior to the enzymatic reaction. As shown in Supplementary Figure S4C, the enzyme displays an optimal activity at 45°C. At 50°C, the enzyme loses almost half of its activity, which drops to only ∼10% at 60°C. To avoid denaturation, all the subsequent enzymatic reactions were, however, performed at 30°C, a compromise temperature between enzymatic activity and stability. The influence of NaCl was evaluated at different concentrations, up to 1 M, and seems to have no significant effect on the enzymatic activity.

ZgEngAGH5_4 is able to cleave both β-1,3 and β-1,4 linkages

To establish which linkages are cleaved by ZgEngAGH5_4, different standard β-1,4 and β-(1,4-1,3)-oligosaccharides were used as substrates. The hydrolysis products were identified by HPAEC using a CarboPac PA1 column, specifically dedicated to the separation of small oligosaccharides. From these experiments, it appears that, even when the reaction lasts overnight, ZgEngAGH5_4 is unable to hydrolyze di- and trisaccharides, whether they originate from cellulose or MLG polymers. A minimum of four glucose units (G4) is therefore essential for the activity of the enzyme.

In a first step, the nature and concentration of the products released during hydrolysis of cello-oligosaccharides by ZgEngAGH5_4 were measured as a function of time (Figure 3). Whereas 20% of the cellotetraose (G4) remains unhydrolyzed after 60 min (Figure 3A), hydrolysis of cellopentaose and cellohexaose is much faster: they are completely hydrolyzed within 1 and 2 min, respectively (Figure 3B,C). Hydrolysis of cellopentaose (G5) is straightforward and produces only cellobiose (G2) and cellotriose (G3) (Figure 3B). Hydrolysis of cellohexaose (G6) proceeds in two steps: both cellotetraose (G4) and cellotriose (G3) are produced within the first 2 min, and, as hydrolysis proceeds, cellotetraose (G4) is further hydrolyzed into cellobiose (G2) (Figure 3C).

Figure 3.
Substrate specificity of ZgEngAGH5_4 studied by HPAEC.

Hydrolysis of cellotetraose (A), cellopentaose (B), cellohexaose (C), tetraose B (GGG3G; G4B) (D) and tetraose C (GG3GG; G4C) (E) from the mixed-linked glucan lineage with 0.5 µM ZgEngAGH5_4. Hydrolysis was performed as a function of time at 30°C. Aliquots of the reaction mixture were withdrawn periodically and analyzed by HPAEC-PAD on a CarboPac PA1 column. The oligosaccharides produced were identified and quantified by calibrating the column with the different commercially available oligosaccharides at different concentrations.

In a next step, to evaluate the ability of ZgEngAGH5_4 to hydrolyze β-1,3 linkages, two β-(1,4-1,3)-tetrasaccharides, GGG3G and GG3GG, which differ in the position of the β-1,3 linkage, were used as substrates. In GGG3G, the β-1,3-linkage is at the reducing end, whereas in GG3GG it is flanked by a β-1,4-linkage on both sides. Hydrolysis of GGG3G (G4B) yielded glucose (G1), cellobiose (G2), cellotriose (G3) and laminaribiose (G3G; L2) (Figure 3D), while only cellobiose (G2) was produced upon hydrolysis of GG3GG (G4C) (Figure 3E). Altogether, these results indicate that ZgEngAGH5_4 is able to accommodate both β-1,4- and β-1,3-linked glucose in the +1 binding subsite, whereas only β-1,4-linkages are accepted in the negative binding subsites (Figure 4E,F). They also show that the specificity of ZgEngAGH5_4 is dictated by the position of the β-1,3-linkages.
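The single-cleavage reasoning behind these product assignments can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the analysis reported here: the function names (`parse`, `to_string`, `single_cleavages`, `consistent_cleavages`) are hypothetical, the oligosaccharide notation follows the text (G for a glucose unit, a 3 marking a β-1,3-linkage to the next unit, β-1,4 otherwise), and only one cleavage event per molecule is considered.

```python
def parse(oligo):
    """Return (number of glucose units, list of linkage types) for e.g. 'GGG3G'."""
    units = 0
    linkages = []
    next_link = "1,4"  # default linkage unless a '3' precedes the next unit
    for symbol in oligo:
        if symbol == "G":
            if units > 0:
                linkages.append(next_link)
            units += 1
            next_link = "1,4"
        elif symbol == "3":
            next_link = "1,3"
        else:
            raise ValueError(f"unexpected symbol {symbol!r} in {oligo!r}")
    return units, linkages


def to_string(linkages):
    """Rebuild the G/3 notation of a fragment from its linkage list."""
    return "G" + "".join("3G" if link == "1,3" else "G" for link in linkages)


def single_cleavages(oligo):
    """List (non-reducing fragment, reducing fragment, cleaved linkage) for every bond."""
    _, linkages = parse(oligo)
    return [
        (to_string(linkages[:i]), to_string(linkages[i + 1:]), link)
        for i, link in enumerate(linkages)
    ]


def consistent_cleavages(oligo, observed):
    """Keep only the cleavages whose two fragments were both observed."""
    return [c for c in single_cleavages(oligo) if c[0] in observed and c[1] in observed]


# Products reported in the text for the two tetrasaccharides (Figure 3D,E).
for substrate, observed in [("GGG3G", {"G", "GG", "GGG", "G3G"}), ("GG3GG", {"GG"})]:
    print(substrate, consistent_cleavages(substrate, observed))
```

Under these assumptions, the products of GGG3G are compatible with two cleavages (GG + G3G at a β-1,4-bond and GGG + G at the β-1,3-bond), whereas the products of GG3GG are compatible only with cleavage of the central β-1,3-bond into two cellobiose units.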

Figure 4.
Substrate specificity of ZgEngAGH5_4.

Terminal products of ZgEngAGH5_4 upon hydrolysis of standard oligocelluloses (A,B), β-d-glucan (C) or its hydrolysis products (D) and schematic representation of the oligosaccharides accommodation in the active site (E,F). In these experiments, incubations were performed overnight at 30°C using 1 µl of ZgEngAGH5_4 (100 nM) to hydrolyze 12.5 µg of oligosaccharides (A,B,D) or 45 µg of β-d-glucan from barley (C). Commercial lichenase was also used to completely hydrolyze β-d-glucan from barley and to produce oligosaccharides that were then incubated for 10 min at 100°C prior to being labeled and further hydrolyzed overnight at 30°C with ZgEngAGH5_4 (D). Commercial cello- and MLG oligosaccharides were used as references. The 100 denotes oligosaccharides incubated with inactive ZgEngAGH5_4 and the * indicates that the oligosaccharides were labeled before the enzymatic incubation with ZgEngAGH5_4 (A,D). Schematic representation of cello- (E) and MLG (F) oligosaccharides accommodation in the active site of ZgEngAGH5_4. The proposed cleavage sites deduced from HPAEC and/or FACE experiments are indicated by a gray arrow. The gray circles represent the reducing end of the oligosaccharides and the yellow circles represent the fluorophore used to label the reducing end sugar. The modes of hydrolysis observed with the FACE experiments exclusively are depicted with yellow circles and black outlines. When the mode of hydrolysis has been observed both by HPAEC and FACE, the yellow circles are outlined in gray. The arrow between the cellohexaose (G6) and the cellotetraose (G4) means that the hydrolysis product from the cellohexaose is further hydrolyzed into cellobiose (G2). Cellobiose and cellotriose are not represented as they are not hydrolyzed by ZgEngAGH5_4. (G) Hydrolysis sites of MLG by ZgEngAGH5_4 deduced from FACE experiments (see above for details). The proposed cleavage sites are indicated by gray arrows. 
For comparison, the GH16 lichenase cleavage sites are indicated by dotted arrows.

Hydrolysis of oligosaccharides, as well as of β-d-glucan from barley, was also followed by FACE (Figure 4A–D). ANTS was used as a fluorophore to label the reducing end of the oligosaccharides, which were then separated by electrophoresis. In addition to corroborating the HPAEC results on the products formed upon hydrolysis of the oligosaccharides, this technique showed in particular that cellotetraose and cellohexaose are hydrolyzed in different modes. Indeed, the end products of these oligosaccharides differ depending on whether labeling was performed on the substrate or on the hydrolysis products. When cellotetraose (G4) is labeled before hydrolysis (Figure 4A), the fluorescent oligosaccharides migrate as cellotriose (G3) and minor amounts of cellobiose (G2). When labeling is performed after hydrolysis (Figure 4B), the major oligosaccharide is cellobiose (G2), although there are traces of cellotriose (G3) and even of glucose (G). Similarly, when cellohexaose (G6) is labeled before hydrolysis, mostly cellotriose (G3) but also some cellobiose (G2) is detected (Figure 4A), whereas when cellohexaose (G6) is first hydrolyzed and then labeled, cellobiose (G2) and cellotriose (G3) are detected at the same intensity (Figure 4B).

Comparison of the end products generated from the hydrolysis of β-d-glucan by the lichenase and by ZgEngAGH5_4 reveals that the sizes of the products are similar, but that the cleavage sites of the two enzymes differ (Figure 4C). As expected, GG3G and GGG3G are the end products of MLG hydrolysis by lichenase, while G2, G3, G4 and, to some extent, G5 are the end products observed after hydrolysis with ZgEngAGH5_4. These oligosaccharides, together with the complete absence of oligosaccharides containing β-1,3-bonds such as GG3G, GGG3G, G3GG and GG3GG, indicate that, at the polymer level, the preferred cleavage sites of ZgEngAGH5_4 are β-1,3-bonds that are neighbored by β-1,4-bonds, at least towards the non-reducing end (Figure 4G).

Further overnight hydrolysis with ZgEngAGH5_4 of the labeled oligosaccharides produced by the lichenase shows that GG3G is not hydrolyzed, whereas GGG3G is completely hydrolyzed into laminaribiose (G3G) (Figure 4D). This again indicates that, on small oligosaccharides, ZgEngAGH5_4 is able to cleave the β-1,4-bond that precedes a β-1,3-bond (Figure 4F). Altogether, these experiments allow the subsites and their involvement in substrate binding to be deduced, and they also show that the +1 or +2 binding subsites do not accommodate the fluorophore (Figure 4E,F). Further interpretation of these results is given in the Discussion below.
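The effect of the labeling order on the bands seen by FACE can be illustrated with a short sketch. It rests on simplifying assumptions that are ours, not the authors': a linear oligosaccharide is cleaved once, the fluorophore sits on the reducing-end unit, and the function name `visible_bands` and its parameters are hypothetical.

```python
def visible_bands(n_units, cut_after, labelled_before):
    """Sizes (in glucose units) of the fluorescent fragments after one cleavage.

    n_units:         length of the linear substrate
    cut_after:       number of units, counted from the non-reducing end,
                     that end up in the non-reducing fragment
    labelled_before: True if the substrate was labelled before hydrolysis
    """
    if not 0 < cut_after < n_units:
        raise ValueError("cut_after must lie strictly inside the chain")
    non_reducing = cut_after
    reducing = n_units - cut_after
    if labelled_before:
        # Only the original reducing end carries the fluorophore.
        return [reducing]
    # Labelling after hydrolysis tags the reducing end of both fragments.
    return sorted([non_reducing, reducing])


# Cellotetraose cut in the middle, then labelled: two cellobiose bands.
print(visible_bands(4, 2, labelled_before=False))
# Labelled cellotetraose cut after the first unit: a labelled cellotriose band.
print(visible_bands(4, 1, labelled_before=True))
```

In this simplified picture, a labeled cellotetraose that yields mainly labeled G3 must have been cut after its first unit, whereas unlabeled cellotetraose that yields mainly G2 must have been cut in the middle, which is one way to read the different hydrolysis modes described above.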

Three-dimensional structure of ZgEngAGH5_4

To determine the molecular basis of substrate recognition by ZgEngAGH5_4, we solved the crystal structures of wild-type ZgEngAGH5_4 and of the E323S mutant (ZgEngAGH5_4_E323S) in complex with cellotriose (three glucose units linked by β-1,4-bonds) (Table 2). The structure of ZgEngAGH5_4 was solved at 1.2 Å resolution by molecular replacement using the structure of EngD (PDB ID: 3NDZ, 51% sequence identity, Figure 5) as a search model. There is only one molecule in the asymmetric unit. ZgEngAGH5_4 adopts a typical TIM-barrel (β/α)₈ fold. An additional helix (α0) closes the β-barrel at its N-terminal face, consistent with other GH5 enzymes (Figure 6A). Structural similarity searches using the DALI server [36] identified a close relationship to other GH5 enzymes. The closest were endoglucanase E from Ruminiclostridium thermocellum (PDB ID: 4IM4) and endoglucanase D from C. cellulovorans (PDB ID: 3NDZ). Both are GH5 enzymes that exhibit broad substrate specificity, preferentially displaying high activity on β-1,4-linked glucans and xylans.

Figure 5.
Sequence alignment of ZgEngAGH5_4 with structurally characterized GH5_4.

The sequence alignment was performed using MAFFT [18] and manually edited in BioEdit (©Tom Hall) based on the superimposition of the different crystal structures. The final figure was created using ESPript [43]. The sequences used in this alignment were as follows: CcEngD: the endo β-1,4-glucanase/xylanase EngD from Clostridium cellulovorans (GenBank accession no. AAA23233.1; residues 32–376; PDB ID: 3NDY); PbGH5A: the mixed-linkage β-glucanase/xyloglucanase from Prevotella bryantii B14 (GenBank accession no. AAC97596.1; residues 584–924; PDB ID: 3VDH); F32EG5: the β-(1,3-1,4)-glucanase from Caldicellulosiruptor sp. (GenBank accession no. AGM71677.1; residues 38–401; PDB ID: 4XOV); BpCel5C: Cel5C from Butyrivibrio proteoclasticus (GenBank accession no. ADL34447.1; residues 32–399; PDB ID: 4NF7) and CcCel5A: the cellulase Cel5A from Clostridium cellulolyticum (GenBank accession no. AAA23221.1; residues 40–403; PDB ID: 1EDG). The α- and 3₁₀-helices and the β-strands are represented as helices and arrows, respectively, and β-turns are marked with TT. Dark-shaded boxes enclose invariant positions, and light-shaded boxes show positions with similar residues. The catalytic residues and the residues chosen for site-directed mutagenesis are marked by red triangles and blue dots, respectively.

Figure 6.
Crystal structure of ZgEngAGH5_4 and the relative MLG substrate locations after molecular modeling in all-atom simulations.

(A) Crystal structure of ZgEngAGH5_4. The central β-sheet constituting the TIM barrel is shown in yellow, the additional α-helix α0 in pale blue and the loop between β-strand β6 and α-helix α6 in red. Both catalytic residues are shown as sticks. (B) View of the active site of ZgEngAGH5_4_E323S. The cellotriose is shown in green and the position of the two catalytic residues is shown in gray. The experimental electron density, calculated as a 2Fo − Fc map and contoured at the 2σ level, is shown as a gray mesh. The stereochemistry of the substrate molecule has been validated using Privateer and the details are given in Supplementary Table S26. (C) Resulting view of MD with GGGG3GG. The active site of ZgEngAGH5_4_E323S in the last frame of the all-atom simulation and the relative position of the GGGG3GG substrate molecule (in orange) are represented. The positions of the different binding subsites are indicated. The arrow indicates the β-1,3-linkage. (D) Resulting view of MD with GGG3GGG. The active site of ZgEngAGH5_4_E323S in the last frame of the all-atom simulation and the relative position of the GGG3GGG substrate molecule (in orange) are represented. The arrow indicates the β-1,3-linkage.

Table 2
Data collection and refinement statistics
| | ZgEngAGH5_4 | ZgEngAGH5_4_E323S |
| --- | --- | --- |
| Data collection | | |
| Space group | P2₁ | P3₂ |
| Cell dimensions: a, b, c (Å) | 55.49, 48.46, 59.26 | 84.57, 84.57, 117.66 |
| Cell dimensions: α, β, γ (°) | 90.00, 104.35, 90.00 | 90.00, 90.00, 120.00 |
| Resolution (Å) | 57.41–1.18 (1.24–1.18) | 45.87–2.20 (2.70–2.20) |
| Rmerge | 0.07 (0.74) | 0.06 (0.31) |
| I/σI | 10.4 (1.8) | 12.9 (3.2) |
| CC(1/2) | 0.99 (0.43) | 1.00 (0.91) |
| Completeness (%) | 98.0 (98.0) | 99.3 (99.6) |
| Redundancy | 6.3 (6.3) | 2.9 (2.9) |
| Refinement | | |
| Resolution (Å) | 57.41–1.18 | 45.87–2.20 |
| No. reflections | 94 103 | 45 421 |
| Rwork/Rfree | 0.173/0.201 | 0.177/0.221 |
| No. atoms: protein | 2750 | 8066 |
| No. atoms: water | 457 | 453 |
| B-factors: protein | 13.67 | 39.92 |
| B-factors: water | 24.15 | 34.79 |
| R.m.s. deviations: bond lengths (Å) | 0.023 | 0.013 |
| R.m.s. deviations: bond angles (°) | 2.13 | 1.49 |
| PDB ID | 6GL2 | 6GL0 |

Note: A single crystal was used for each data set; values in parentheses are for the highest-resolution shell.

As in other GH5 enzymes, the active site is formed by a catalytic cleft that runs across the whole protein, in which specific binding subsites recognize each glucose unit. Two glutamic acid residues (E200 and E323 in ZgEngAGH5_4) correspond to the catalytic acid–base and nucleophile, respectively, and are positioned between the −1 and +1 binding subsites (Figures 6B and 7A). Consistent with all other TIM-barrel hydrolases, these residues are located at the end of β-strands β4 and β7 (Figure 5).

Figure 7.
Superimposition of GH5_4 active sites.

(A) Superimposition of the catalytic active sites of ZgEngAGH5_4 (in yellow) with those of F32EG5 (PDB ID: 4XOV in dark blue), of CcEngD in complex with cellotriose (PDB ID: 3NDZ in cyan and cellotriose in green) and that of PbGH5A (PDB ID: 3VDH in light gray). The highly conserved residues surrounding the −1 binding subsite are shown, highlighting the two major features that are different in the sugar-binding subsites of ZgEngAGH5_4, namely E363 and Y280. (B) Superimposition of the structure of ZgEngAGH5_4_E323S (in blue) and the structure of F32EG5 from Caldicellulosiruptor sp. F32 (in purple) showing the conformation of the loop harboring Y280 between β-strand β6 and α-helix 6. The GGG cellotriose molecule in the crystal structure of ZgEngAGH5_4_E323S is shown in green. The residue numbers for ZgEngAGH5_4_E323S are underlined.

Co-crystallization of ZgEngAGH5_4_E323S with a mixture of oligosaccharides (mainly hexa- to nonasaccharides, all produced by the native enzyme upon hydrolysis of MLG) yielded a complex structure solved at 2.2 Å resolution, with three molecules in the asymmetric unit. Clear electron density corresponding to a cellotriose (G3) oligosaccharide, linked by β-1,4-bonds only, is present in the active site of each of the three monomers (Figure 6B). The presence of this substrate molecule could be due either to contamination of our oligosaccharide mixture by cellotriose, which is preferentially selected by ZgEngAGH5_4_E323S, or to additional units at the non-reducing end being completely disordered in the crystal structure. These substrate molecules (hereafter named cellotriose or G3) occupy the negative binding subsites from −3 to −1. The glucose unit bound to the −3 subsite establishes a stacking interaction with W89. In the −2 subsite, N77 and N358 are involved in substrate binding via hydrogen bonds. The glucose unit bound to the −1 subsite is the most stabilized one: it is stacked against W356 and hydrogen bonded to H155, H156, Y277 and E200 (Figure 6B).

A particular feature of ZgEngAGH5_4 is that the loop following β-strand β8 is four residues shorter than in CcEngD (PDB ID: 3NDZ) or CcCel5A (PDB ID: 1EDG). This creates a more open active site at the non-reducing end (negative binding subsites), which could accommodate branched substrates (Figure 7B). Indeed, in the above-mentioned GH5 enzymes, this loop binds the glucose unit occupying the −3 subsite through hydrogen bonds between an Asp or Glu residue and the O6 of this glucose unit. Here, residue E363 is located too far away to interact with the substrate (Figure 7A).

Another outstanding feature is the conformation of residue Y280. Indeed, the loop between β-strand β6 and α-helix α6, which carries this residue, adopts a completely different conformation from that in other GH5 enzymes (Figure 7B). First, T287 is directed towards the short α6′ helix, whereas in other GH5 enzymes this threonine is replaced by a short residue that points to the solvent. This feature forces D285 to adopt a different conformation compared with all other GH5 enzymes. To avoid a steric clash with T287 or D285, the neighboring Y280 is oriented in the opposite direction compared with the tyrosine residues at this position in other GH5 enzymes. This change of conformation is also made possible by the presence of Q281; most other GH5 enzymes have an aromatic residue at this position, which would clash with Y280 in this orientation. Overall, the presence of Y280, which changes the loop position, also leads to a narrowing of the binding cleft at the positive binding subsites (Supplementary Figure S22A). Notably, when Y280 is replaced by alanine by computational mutation (Supplementary Figure S22B), the overall substrate-binding cleft closely resembles that of F32EG5 (Supplementary Figure S22C).

Site-directed mutagenesis of selected residues and molecular modeling to explore the catalytic active site

To investigate the role of a selection of residues in the active site, we undertook site-directed mutagenesis experiments. Based on the 3D structure analyses, we chose to mutate residues that potentially interact with different polysaccharide substrates outside the −1 binding subsite, since the importance of residues surrounding the −1 binding subsite in substrate recognition and catalytic activity has already been demonstrated [10,37]. We also included two residues, Y82 and E363, which are not directly involved in interaction with a linear polysaccharide but could accommodate branching in substrates such as xyloglucan or glucomannan. However, mutation of either of these residues does not affect the activity, even on a branched substrate (Table 3). On the other hand, replacement of the residues N77, H156, W210 or N358 by alanine substantially decreases or even completely abolishes the catalytic activity. These residues interact with glucose units bound to the −2 (N77 and N358), −1 (H156) and +1 (W210) subsites in the model obtained by MD (Figure 6B–D). Mutation of W210 to phenylalanine partially restores the activity (∼40% of the activity of ZgEngAGH5_4WT), which supports the idea that this residue establishes van der Waals contacts with the glucose unit positioned in the +1 subsite. Surprisingly, mutations of Y280 and K211, which are thought to interact with glucose units bound to the +2 and +3 subsites, respectively, did not decrease the activity.

Table 3
Comparison of the activity of ZgEngAGH5_4 and its mutants using MLG and glucomannan as substrates
| Mutant | MLG (barley): Activity¹ × 10³ (min⁻¹) | MLG (barley): Specific activity (µkat mg⁻¹) | MLG (barley): Relative activity (% WT activity) | Glucomannan: Activity¹ × 10³ (min⁻¹) | Glucomannan: Specific activity (µkat mg⁻¹) | Glucomannan: Relative activity (% WT activity) |
| --- | --- | --- | --- | --- | --- | --- |
| Wild type (WT) | 11.4 ± 0.7 | 5.1 ± 0.3 | 100 | 6.9 ± 0.1 | 3.1 ± 0.0 | 100 |
| N77A | 0.2 ± 0.0 | 0.1 ± 0.0 | 1.8 | 0.1 ± 0.0 | 0.04 ± 0.0 | 1.4 |
| N77Q | n.d.² | n.d.² | – | n.d.² | n.d.² | – |
| Y82A | 14.5 ± 0.5 | 6.5 ± 0.2 | 127.2 | 7.6 ± 0.1 | 3.4 ± 0.0 | 110.1 |
| Y82L | 15.6 ± 0.2 | 6.9 ± 0.1 | 136.8 | 6.7 ± 0.2 | 3.0 ± 0.1 | 97.1 |
| H156A | 2.4 ± 0.1 | 1.1 ± 0.1 | 21 | 1.0 ± 0.1 | 0.4 ± 0.0 | 14.5 |
| H156I | 0.4 ± 0.0 | 0.2 ± 0.0 | 3.5 | 0.1 ± 0.0 | 0.04 ± 0.0 | 1.4 |
| W210A | 2.3 ± 0.1 | 1.0 ± 0.1 | 20.2 | 0.1 ± 0.0 | 0.04 ± 0.0 | 1.4 |
| W210F | 3.6 ± 0.4 | 1.6 ± 0.2 | 31.6 | 3.3 ± 0.2 | 1.4 ± 0.1 | 47.8 |
| K211A | 12.7 ± 0.1 | 5.7 ± 0.0 | 111.4 | 8.1 ± 0.3 | 3.6 ± 0.1 | 117.4 |
| Y280A | 12.7 ± 0.1 | 5.6 ± 0.0 | 111.4 | 7.9 ± 0.2 | 3.5 ± 0.1 | 114.5 |
| Y280L | 10.0 ± 0.4 | 4.4 ± 0.2 | 87.7 | 5.7 ± 0.1 | 2.5 ± 0.1 | 82.6 |
| E323S | n.d.² | n.d.² | – | n.d.² | n.d.² | – |
| N358A | 1.8 ± 0.1 | 0.8 ± 0.0 | 15.7 | 0.6 ± 0.1 | 0.2 ± 0.1 | 8.7 |
| N358L | n.d.² | n.d.² | – | n.d.² | n.d.² | – |
| E363A | 12.9 ± 0.5 | 5.7 ± 0.2 | 113.1 | 6.9 ± 0.1 | 3.1 ± 0.1 | 100 |
| E363S | 10.1 ± 1.2 | 4.5 ± 0.6 | 88.6 | 7.7 ± 0.3 | 3.4 ± 0.1 | 111.6 |
¹ Experiments were performed in triplicate. Results are expressed as average ± SD.

² n.d., not determinable, less than the limit of detection, i.e. 10 (min⁻¹) or 0.01 (µkat mg⁻¹).
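The relative activities in Table 3 follow directly from the ratio of each mutant's activity to that of the wild type. The short sketch below recomputes a few of the MLG values from the rounded figures in the table; the helper name `relative_activity` and the dictionary layout are our own, and small last-digit differences from the published percentages are to be expected for other rows, since those were presumably derived from unrounded measurements.

```python
def relative_activity(mutant_activity, wild_type_activity):
    """Activity of a mutant expressed as a percentage of the wild-type activity."""
    return 100.0 * mutant_activity / wild_type_activity


# Activities on barley MLG, in units of 10^3 min^-1, copied from Table 3.
WT_MLG = 11.4
mlg_activity = {"N77A": 0.2, "Y82A": 14.5, "H156A": 2.4, "W210F": 3.6}

for mutant, activity in mlg_activity.items():
    print(f"{mutant}: {relative_activity(activity, WT_MLG):.1f}% of WT")
```

Because the same factor of 10³ applies to both numerator and denominator, it cancels in the ratio and the tabulated numbers can be used as they stand.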

Computed protein structure and molecular dynamics

We used atomic resolution MD computer simulations to model the binding of the full range of putative hexaose ligands (poorly resolved in the crystal structures) to wild-type ZgEngAGH5_4 and the mutant ZgEngAGH5_4_E323S endoglucanases. The ligand structures are described in Experimental. Both native and mutated structures show preservation of the protein secondary structure throughout the few-hundred nanosecond simulations (Supplementary Figures S5–S8 and S9–S12), even in cases where the glucan substrate leaves the binding pocket. The computed root-mean-square deviations (RMSD) of protein backbone non-hydrogen atoms in both mutant and native ZgEngAGH5_4 (Supplementary Figures S13 and S14) were within 0.2–0.3 nm, indicating a stable protein structure throughout the simulations. Calculated root-mean-square fluctuations (Supplementary Figures S15 and S16) show the steric freedom of the more flexible and loose parts of the crystal structure such as turns and loops (residues 85–90, 125–126, 162–164 and 207–211).
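For readers unfamiliar with the metric, the RMSD between a simulation frame and a reference structure is the square root of the mean squared displacement of the selected atoms. The sketch below is a minimal pure-Python illustration, assuming that the two coordinate sets list the same atoms in the same order and have already been superimposed; the least-squares fitting step that MD analysis tools normally perform first is omitted, and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two pre-aligned coordinate sets.

    Both arguments are sequences of (x, y, z) tuples in the same units
    (e.g. nm) and in the same atom order.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xr - xf) ** 2 + (yr - yf) ** 2 + (zr - zf) ** 2
        for (xr, yr, zr), (xf, yf, zf) in zip(reference, frame)
    )
    return math.sqrt(squared / len(reference))


# Toy example: every atom displaced by 0.3 nm along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(x, y, z + 0.3) for x, y, z in reference]
print(rmsd(reference, frame))
```

A uniform displacement is used here only because the result is easy to check by hand; in a real trajectory such a rigid shift would be removed by the superposition step.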

Computed substrate dynamics: glucan in the binding site

The −1/+1 (β-1,3) oligosaccharide, which has a β-1,3-linkage between units −1 and +1 (see Experimental), and the +1/+2 (β-1,3) glucan remain bound in five and three out of eight repeats, respectively. Computed glucan RMSD values (Supplementary Figures S17 and S18) show that −1/+1 (β-1,3) forms a stable binding interaction with both the native (RMSD 0.24 ± 0.04 nm) and mutated ZgEngAGH5_4_E323S (RMSD 0.26 ± 0.04 nm). The next most strongly bound ligand was +1/+2 (β-1,3), followed by +2/+3 (β-1,3). The (β-1,4) glucan either dissociates (four out of eight repeats) or else forms a loose complex with both the native and mutated enzyme (four out of eight repeats) with high glucan RMSD values of up to 0.44 ± 0.14 nm. In all other simulations, we observe either loose unstable binding of a substrate or dissociation into solution, and we did not include these dissociated structures in the analysis of binding energetics below.

Computed sugar: protein interactions

The number of hydrogen bonds formed between the protein and the substrate was monitored over time (Supplementary Tables S6–S19) to identify protein residues contributing strongly to glucan binding (Supplementary Figures S19 and S20). Hydrogen bonds with eight residues (Supplementary Tables S20 and S21) stabilize the glucan in both the native and mutated enzyme-binding pockets, namely N77, E200, T253, H275, Y277, W356, N358 and E363.
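One common way to summarize such a time series is the occupancy of each residue, i.e. the fraction of frames in which it forms at least one hydrogen bond with the ligand. The sketch below illustrates the bookkeeping only; it assumes that hydrogen bonds have already been detected in each frame by some geometric criterion, the function name `hbond_occupancy` is hypothetical, and the four-frame input is invented toy data rather than output from the simulations described here.

```python
from collections import Counter


def hbond_occupancy(frames):
    """Fraction of frames in which each residue is hydrogen bonded to the ligand.

    frames: list with one entry per trajectory frame, each entry being an
            iterable of residue labels bonded to the ligand in that frame.
    """
    counts = Counter()
    for residues in frames:
        counts.update(set(residues))  # count each residue once per frame
    total = len(frames)
    return {residue: counts[residue] / total for residue in counts}


# Invented toy input: residue labels only, not simulation results.
toy_frames = [
    {"N77", "E200"},
    {"E200"},
    {"E200", "N358"},
    {"N77", "E200"},
]
occupancy = hbond_occupancy(toy_frames)
for residue in sorted(occupancy, key=occupancy.get, reverse=True):
    print(residue, occupancy[residue])
```

Residues with high occupancy across repeats would then be the natural candidates for "contributing strongly" to binding.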

The −1/+1 (β-1,3) glucan exhibits the most favorable affinity for both wild-type and mutated ZgEngAGH5_4_E323S, as it stays strongly bound to the protein by six or more hydrogen bonds (Supplementary Tables S20 and S21) in 80% of the simulations. The computed MD structures in Figure 6C,D show that T253 and Y277 stabilize the sugar unit bound at the +3 subsite, W210 and E209 form H-bonds with +2, H275 and Y280 bind to +1, E200 binds to −1, and N358, W356, Y82 and N77 bind to the −2 subsite. Aromatic residues also contribute to carbohydrate recognition and orientation (Supplementary Tables S22 and S23, and Figure 6C,D). Eight aromatic residues Y82, H155, W210, H275, Y277, Y280, W356 and F364 (Figure 6C,D) interact with the substrate as it hydrogen bonds with adjacent polar and charged residues.

Computed binding energies (Supplementary Table S24) show significantly stronger time-averaged substrate binding to mutated ZgEngAGH5_4_E323S than wild type (−38.2 ± 10.6 vs. −22.5 ± 8.5 kcal mol−1). The −1/+1 (β-1,3) glucan showed the strongest binding energy (−42.7 ± 9.0 kcal mol−1), consistent with its low RMSD (Supplementary Figures S17 and S18) and extensive H-bonding (Supplementary Figures S19 and S20). In contrast, (β-1,4) showed the weakest binding energies, reflecting its poor fit to the ZgEngAGH5_4 active site pocket. In the most stable binding trajectories, the glucose chain is stabilized by H-bonding to approximately six polar and charged residues and makes close contacts with adjacent aromatic residues. The ‘S-shaped’ binding pocket better fits the natural conformation of the −1/+1(β-1,3)-linked glucan than the linear all (β-1,4) ligand.

Other insights from molecular modeling

In the last frame of two simulations between ZgEngAGH5_4_E323S and GGGG3GG, the substrate is correctly positioned in the catalytic cleft. In these frames, the glucose in the +1 binding subsite stacks against W210. This interaction appears to be important for positioning the substrate in such a way as to enable catalysis. The β-1,3-linkage induces a turn, which allows a stacking interaction between the glucose in +2 and Y280. However, this interaction seems to be more labile, since it is present in only one model out of three. The glucose in +3 seems to have more degrees of freedom, and it establishes only weak contacts with K211 and S252 (Figure 6C).

In the simulations between ZgEngAGH5_4_E323S and GGG3GGG, the turn induced by the β-1,3-linkage does not affect the stacking with W210, which is located in a flexible loop. Indeed, in all simulations, W210 adapts its position to interact with the glucose in +1. Y280 does not establish a stacking contact with the glucose in +2, but it interacts with the glucose unit in +3 via a hydrogen bond (Figure 6D).

Binding of the all-β-1,4 substrate or of GGGGG3G seems to be weaker, as only W210 interacts with the substrate in the positive binding subsites.

Accession numbers

The atomic co-ordinates and the structure factors of both ZgEngAGH5_4 and ZgEngAGH5_4_E323S have been deposited in the Protein Data Bank (https://www.wwpdb.org/) as PDB ID: 6GL2 and PDB ID: 6GL0, respectively. The modeling co-ordinates can be accessed through the following link: https://1drv.ms/f/s!ArX4zU6cjMUQnHQ9m5ScGJkyY_Kr.

Discussion

The frequent classification of family GH5 enzymes as cellulases in marine Flavobacteriia [13,16] is puzzling, given that these bacteria usually do not degrade crystalline cellulose [14,15]. Therefore, and in the context of recent work highlighting that GH5 is one of the largest multi-specific glycoside hydrolase families [7,10–12], covering a very wide range of activities, we applied a combination of methods spanning phylogeny, enzymology, crystallography and molecular modeling to explore the key enzyme–substrate interactions in ZgEngAGH5_4 that define its substrate specificity. The comparison with other enzymes within GH5_4 reveals how substrate specificity is fine-tuned, even within this subfamily, and sheds further light on the roles of the subfamily in glucan catabolism. Taking the occurrence of this gene in a PUL that is possibly involved in the catabolism of hemicelluloses as a starting point, we show that cellulose and soluble β-1,4-glucan derivatives are not the preferred substrates. Instead, we demonstrate that ZgEngAGH5_4 is a β-(1,3-1,4)-glucanase that preferentially cleaves β-1,3-linkages flanked by β-1,4-linkages, but is also able to hydrolyze β-1,4-linkages in glucomannan, or in short oligosaccharides, depending on the linkage positions. The ability to hydrolyze β-1,4-linkages in various substrates is shared with EngD from C. cellulovorans (the closest structural relative of ZgEngAGH5_4), which has been described as a true cellulase [38]. However, the relative activities of these two enzymes differ radically when xyloglucan or CMC is used as substrate. In those cases, ZgEngAGH5_4 is closer to other GH5_4 members, such as PbGH5A from Prevotella bryantii and F32EG5 from Caldicellulosiruptor (Table 1).

Mapping the ZgEngAGH5_4 active site by combining crystallographic structure determination with molecular modeling and product analyses on different substrate oligosaccharides suggests the presence of six well-defined binding subsites, evenly distributed with respect to the cleavage point: three negative and three positive subsites. The crystal structure of the inactivated mutant highlights the binding subsites on the non-reducing end, to which the cellotriose molecule (GGG) is bound. The molecular dynamics simulations using β-(1,3-1,4)-hexasaccharides that differ in the position of the β-1,3-linkage corroborate the biochemically observed preference for cleaving β-1,3-linkages, since GGG3GGG, spanning the positions from −3 to +3, displayed the most favorable affinity for the catalytic cleft of ZgEngAGH5_4. In this configuration, the β-1,3-linkage is positioned at the cleavage site, in accordance with the preferred hydrolytic activity of the enzyme on the polysaccharide.

The biochemical analyses also revealed that the smallest hydrolyzed substrates are tetrasaccharides. More generally, the mode of hydrolysis of minimal substrates showed that activity is favored when oligosaccharides span the cleft using the −2 → +2 subsites, but hydrolysis also occurs with modes spanning more subsites on the non-reducing end for GGG3G or on the reducing end for G4. Notably, oligosaccharides containing only β-1,4-linkages, such as the cello-oligosaccharides G4 to G6, are also hydrolyzed, although at a much slower rate than the preferred substrates. This is supported by the MD simulations, which show that cello-oligosaccharides are much less stabilized in the active site cleft than the MLG oligosaccharides. In the case of G6, G4 is the first reaction product, meaning that for small oligosaccharides, occupation of the negative subsites predominates over that of the positive ones. The hydrolysis of the mixed-linkage oligosaccharides GGG3G and GG3GG also revealed the importance of the negative binding subsites in ZgEngAGH5_4. GGG3G is mainly hydrolyzed according to the −3 → +1 binding mode, demonstrating that binding in subsite +2 is not essential for the hydrolysis of mixed-linkage oligosaccharides. However, the absence of hydrolysis of GG3GG in the same mode suggests that ZgEngAGH5_4 only tolerates β-1,4 bonds in the negative subsites, and that the presence of a β-1,4-linkage in these positions is essential for hydrolysis of the neighboring β-1,3-linkage. In this respect, ZgEngAGH5_4 is closer to cellulases [9,37,38]. Indeed, several residues (Figures 5 and 6) present in the negative binding subsites (N77, H155, H156, H275, Y277, W356 and N358) are well conserved throughout GH5_4 and typically bind to successive β-1,4-linked glucose units. This binding mode is completely different from that of the family GH16 enzymes that cleave MLG (the so-called lichenases), which require a β-1,3-linkage in the negative subsites, a feature shared with the β-glucanases ZgLamA_GH16 and ZgLamC from Z. galactanivorans [39,40]. Interestingly, these enzymes are also able to cleave both β-1,3- and β-1,4-linkages but, unlike ZgEngAGH5_4, ZgLamA_GH16 tolerates β-1,3 bonds in its negative subsites rather than in its positive subsites.
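The subsite reasoning in this paragraph can be made concrete with a small Python sketch that places an oligosaccharide in a given binding register and reports which linkage sits at the −1/+1 cleavage site. It assumes the shorthand used in the abbreviations list: each G is a glucose unit, a 3 marks a β-1,3-linkage between the flanking units, every other linkage is β-1,4, and the non-reducing end is written on the left. The function names are hypothetical, and the sketch only describes geometry; it does not predict whether hydrolysis actually occurs.

```python
import re


def parse_linkages(code):
    """Return the linkages between successive glucose units, e.g. 'GG3G' -> ['1,4', '1,3']."""
    if not re.fullmatch(r"G(?:3?G)*", code):
        raise ValueError(f"unsupported oligosaccharide code: {code!r}")
    return ["1,3" if mark else "1,4" for mark in re.findall(r"(3?)G", code[1:])]


def to_code(linkages):
    """Rebuild the shorthand for a chain with the given linkages."""
    return "G" + "".join("3G" if link == "1,3" else "G" for link in linkages)


def binding_mode(code, n_negative):
    """Describe a register in which `n_negative` units occupy the negative subsites."""
    linkages = parse_linkages(code)
    n_units = len(linkages) + 1
    if not 1 <= n_negative < n_units:
        raise ValueError("both sides of the cleavage site must be occupied")
    n_positive = n_units - n_negative
    return {
        "subsites": list(range(-n_negative, 0)) + list(range(1, n_positive + 1)),
        "cleaved": linkages[n_negative - 1],             # bond between -1 and +1
        "negative_linkages": linkages[:n_negative - 1],  # bonds within the negative subsites
        "products": (to_code(linkages[:n_negative - 1]), to_code(linkages[n_negative:])),
    }


for code, n_neg in [("GGG3G", 3), ("GG3GG", 3), ("GG3GG", 2)]:
    print(code, n_neg, binding_mode(code, n_neg))
```

In this representation, GGG3G in the −3 → +1 register places the β-1,3-linkage at the cleavage site with only β-1,4-linkages in the negative subsites, whereas GG3GG in the same register places the β-1,3-linkage between subsites −2 and −1, which is the arrangement the text reports as not hydrolyzed.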

The ability to cleave both β-1,3 and β-1,4 bonds has been previously described for GH5_4 enzymes [41], and it has recently been studied in the light of 3D structures for PbGH5A [11], F32EG5 [10] and SdGluc5_26 [12]. Like ZgEngAGH5_4, these enzymes have β-(1,3-1,4)-glucanase activities three to seven times higher than their activities on CMC or cellulose. All of these enzymes require β-1,4-linkages between the −1 and −2 subsites and tolerate β-1,3-linkages in the positive binding subsites. Nevertheless, subtle differences in accommodating the MLG substrate in the active site cleft can be noted between these enzymes. Although both ZgEngAGH5_4 and F32EG5 [10] tolerate β-1,3 and β-1,4 bonds at the +1 and +2 subsites, they differ in that ZgEngAGH5_4 is unable to hydrolyze trisaccharides and has strict specificity for β-1,4-linkages between the −1 and −2 subsites, whereas F32EG5 only needs the −1 subsite to be occupied for activity [10]. The only structural difference between these enzymes in the negative binding sites lies in a loop that carries N358 and E363 (N362 and E370 in F32EG5 PDB ID: 4X0V; N367 and E375 in [10]) (Figure 7A). In F32EG5, E370 interacts with O6 of the glucose unit bound in the −1 subsite, while the different loop conformation in ZgEngAGH5_4 positions the equivalent residue too far away (more than 7 Å between E363-OE1 and O6 of the glucose unit bound in −1), making this interaction impossible (Figure 7A). Indeed, the point mutation of E363 in our study did not have any effect on the enzymatic activity. Conversely, this additional stabilization of the glucose unit bound at the −1 subsite in F32EG5 plausibly explains the major difference between these two enzymes. The need to bind several β-1,4-linked glucose units at the negative subsites, in turn, is shared with PbGH5A and SdGluc5_26, although their sequence identities to ZgEngAGH5_4 (32% and 22%, respectively) are lower than that of F32EG5 (41.5%). Notably, in both PbGH5A and SdGluc5_26, the binding cleft displays a more open space beyond binding subsite −1 towards the non-reducing end, although the corresponding loops and residues are highly diverse in these three enzymes. The apparently less tight binding of the unit at −1 implies that more subsites need to be occupied for substrate stabilization prior to cleavage.
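The geometric argument about E363 can be expressed as a simple distance check between two atoms. The sketch below is an illustration only: the 3.5 Å donor-acceptor cutoff is a commonly used heuristic rather than a value taken from this study, the function names are hypothetical, and the coordinates are placeholders, not atoms from PDB entries 6GL2 or 6GL0.

```python
import math

# Assumed heuristic cutoff (in Å) for a donor-acceptor hydrogen-bond distance.
HBOND_CUTOFF = 3.5


def distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) coordinates in Å."""
    return math.dist(atom_a, atom_b)


def within_hbond_range(atom_a, atom_b, cutoff=HBOND_CUTOFF):
    """True if the two atoms are close enough for a direct hydrogen bond."""
    return distance(atom_a, atom_b) <= cutoff


# Placeholder coordinates for illustration only.
carboxylate_oxygen = (0.0, 0.0, 0.0)
sugar_o6_near = (0.0, 0.0, 2.8)
sugar_o6_far = (0.0, 0.0, 7.2)

print(within_hbond_range(carboxylate_oxygen, sugar_o6_near))
print(within_hbond_range(carboxylate_oxygen, sugar_o6_far))
```

Under this criterion, a separation of more than 7 Å, as reported for E363-OE1 and O6 of the glucose unit at −1, is about twice the cutoff and rules out a direct hydrogen bond.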

Differences in loop arrangements are also present at the positive end of the active site cleft, even within the GH5_4 subfamily. In this respect, ZgEngAGH5_4 has a unique loop between β6 and α6 (Figure 7B) that influences the positioning of the substrate at the +1 and +2 binding sites. The molecular dynamics simulations show that ZgEngAGH5_4 displays a rather flexible binding mode at these sites, in agreement with the fact that the mutation of Y280, to our surprise, did not affect activity. Apparently, the overall difference in loop structure in ZgEngAGH5_4 is sufficient to shape the binding cleft such that a mixed-linked chain with the β-1,3-linkage positioned at the −1 → +1 cleavage site is favored. In addition, the product analyses of small oligosaccharides indicate that binding at the positive subsites +2 and +3 is not crucial for enzymatic activity or substrate specificity. In this context, it is interesting to note that activity on the MLG polysaccharide differs from that on small oligosaccharides. Although in vitro product analyses of oligosaccharides are powerful and useful for dissecting subtle substrate specificities, they may therefore reveal activities that are not relevant under natural conditions. Indeed, MLG polysaccharides appear to be hydrolyzed by ZgEngAGH5_4 almost exclusively at the β-1,3-linkages (Figure 4C). ZgEngAGH5_4 also shows substantial activity on glucomannan when compared with CMC, and no activity at all on laminarin-like substrates that only contain β-1,3-linkages. These results on polysaccharides suggest that the overall 3D conformation of the polymeric chain also plays an important role in substrate specificity, and that the kinked polysaccharide chain of MLG (or a non-regular structure, as in glucomannan) is the preferred site of hydrolysis for this enzyme. Interestingly, a bent or kinked active site cleft has also been described as an important feature of other GH5_4 members.

In summary, ZgEngAGH5_4 appears in vitro to be most active on plant hemicellulose substrates, such as the polymers β-(1,3-1,4)-glucan and glucomannan, which raises the question of the functional rationale behind this activity in the context of its ecological and marine occurrence in Z. galactanivorans. While its evolutionary origin clearly points towards acquisition through lateral gene transfer from typical land-plant polysaccharide-degrading bacteria, such as C. cellulovorans, the question remains whether the enzyme, in the context of the physiology of Z. galactanivorans, has ‘specialized’ for marine macroalgal cell wall components or whether it remains specific for plant hemicelluloses. Arguments can be found for both scenarios: several macroalgal species of the red lineage have been reported to contain glucomannan as a cell wall component [42], and mixed-linkage glucans have been reported in red and brown algal species [4,42]. Moreover, engA is found in a PUL-like genetic context, next to hypothetical proteins that are indicative of a potential involvement in the degradation of as yet undescribed polysaccharide components. On the other hand, hemicellulosic polysaccharides that strongly resemble those of land plants can also be found in the marine environment in seagrasses, which could be the natural substrate targeted by this enzyme together with the adjacent PUL, for which the SusD-like protein was found to recognize xyloglucan. The elucidation of the biochemical activities and substrate specificities of the other, adjacent components of the PUL-like structure may be the key to unraveling the precise natural cell wall substrates that are targeted by these proteins.

Abbreviations

     
  • ANTS: 8-aminonaphthalene-1,3,6-trisulfonate
  • CAZy: carbohydrate active enzyme
  • CBM: carbohydrate-binding module
  • CMC: carboxymethylcellulose
  • DLS: dynamic light scattering
  • FACE: fluorophore-assisted carbohydrate electrophoresis
  • G: glucose
  • G2: cellobiose
  • G3: cellotriose
  • G3G, L2: laminaribiose
  • G3GG, G3A: glucosyl-(1→3)-β-D-cellobiose
  • G4: cellotetraose
  • G5: cellopentaose
  • G6: cellohexaose
  • GG3G: cellobiosyl-(1→3)-β-D-glucose
  • GG3GG: cellobiosyl-(1→3)-β-D-cellobiose
  • GGG3G: cellotriosyl-(1→3)-β-D-glucose
  • GH: glycoside hydrolase
  • GH5: family 5 of glycoside hydrolases
  • HPAEC: high-performance anion exchange chromatography
  • MD: molecular dynamics
  • MES: 2-(N-morpholino)ethanesulfonic acid hydrate
  • MLG: mixed-linked glucan
  • PUL: polysaccharide utilization locus
  • RMSD: root-mean-square deviation
  • SEC: size-exclusion chromatography
  • SEC-MALLS: size-exclusion chromatography coupled to multiple angle laser light scattering
  • Tris: 2-amino-2-hydroxymethyl-1,3-propanediol

Author Contribution

G.M., M.C. and S.G. conceived the study. A.L. performed the protein expression experiments of the wild-type enzyme. S.R. crystallized the wild-type enzyme with the ligand molecule. G.M. performed the bioinformatics and phylogenetic analysis. M.C.-B. performed some preliminary hydrolysis kinetic analysis of the wild-type enzyme. S.G. produced and characterized the wild-type enzyme. M.C. and J.Da. solved the structure of the wild-type enzyme. S.R. produced the mutants. S.R. also produced the ligand used for the crystallization of the mutant. J.Do. and S.G. performed the kinetic analysis of the mutants. S.G. also performed the HPAEC and FACE experiments. J.Do. determined the X-ray structure of the mutants. M.G., A.O. and D.T. performed the computer simulations. J.Do., G.M., M.C. and S.G. wrote the manuscript with the help of A.L. and input from other coauthors. All authors approved the final version of the manuscript.

Acknowledgments

We are deeply grateful to Alexandra Jeudy for technical assistance, especially in the crystallization experiments of the wild-type enzyme. We thank the local contact support on the beamlines ID-14-4 and Proxima2 at the ESRF (Grenoble, France) and SOLEIL (Paris, France), respectively. J.Do., S.R., M.G., A.O., J.Da. and M.C. are grateful to the EU for its support with regard to the CellulosomePlus Program (FP7-NMP, project 604530). G.M. acknowledges support from the Agence Nationale de la Recherche (ANR) with regard to the ‘Blue Enzymes’ project (reference ANR-14-CE19-0020-01). G.M. and M.C. are also grateful to ANR for its support with regard to the investment expenditure program IDEALG (http://www.idealg.ueb.eu/, grant agreement no. ANR-10-BTBR-04). D.T. thanks Science Foundation Ireland (SFI) for support (grant no. 15/CDA/3491) and for computing resources at the SFI/Higher Education Authority Irish Center for High-End Computing (ICHEC).

Competing Interests

The Authors declare that there are no competing interests associated with the manuscript.

References

1 Popper, Z.A., Michel, G., Hervé, C., Domozych, D.S., Willats, W.G., Tuohy, M.G. et al. (2011) Evolution and diversity of plant cell walls: from algae to flowering plants. Annu. Rev. Plant Biol. 62, 567–590
2 Sørensen, I., Pettolino, F.A., Wilson, S.M., Doblin, M.S., Johansen, B., Bacic, A. et al. (2008) Mixed-linkage (1→3),(1→4)-β-D-glucan is not unique to the Poales and is an abundant component of Equisetum arvense cell walls. Plant J. 54, 510–521
3 Eder, M., Tenhaken, R., Driouich, A. and Lütz-Meindl, U. (2008) Occurrence and characterization of arabinogalactan-like proteins and hemicelluloses in Micrasterias (Streptophyta)(1). J. Phycol. 44, 1221–1234
4 Salmeán, A.A., Duffieux, D., Harholt, J., Qin, F., Michel, G., Czjzek, M. et al. (2017) Insoluble (1→3),(1→4)-β-D-glucan is a component of cell walls in brown algae (Phaeophyceae) and is masked by alginates in tissues. Sci. Rep. 7, 2880
5 Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M. and Henrissat, B. (2014) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, D490–D495
6 Planas, A. (2000) Bacterial 1,3-1,4-β-glucanases: structure, function and protein engineering. Biochim. Biophys. Acta, Protein Struct. Mol. Enzymol. 1543, 361–382
7 Aspeborg, H., Coutinho, P.M., Wang, Y., Brumer, III, H. and Henrissat, B. (2012) Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evol. Biol. 12, 186
8 Davies, G.J. and Henrissat, B. (1995) Structures and mechanisms of glycosyl hydrolases. Structure 3, 853–859
9 Ducros, V., Czjzek, M., Belaich, A., Gaudin, C., Fierobe, H.P., Belaich, J.P. et al. (1995) Crystal structure of the catalytic domain of a bacterial cellulase belonging to family 5. Structure 3, 939–949
10 Meng, D.D., Liu, X., Dong, S., Wang, Y.F., Ma, X.Q., Zhou, H. et al. (2017) Structural insights into the substrate specificity of a glycoside hydrolase family 5 lichenase from Caldicellulosiruptor sp. F32. Biochem. J. 474, 3373–3389
11 McGregor, N., Morar, M., Fenger, T.H., Stogios, P., Lenfant, N., Yin, V. et al. (2016) Structure-function analysis of a mixed-linkage β-glucanase/xyloglucanase from the key ruminal bacteroidetes Prevotella bryantii B(1)4. J. Biol. Chem. 291, 1175–1197
12 Lafond, M., Sulzenbacher, G., Freyd, T., Henrissat, B., Berrin, J.G. and Garron, M.L. (2016) The quaternary structure of a glycoside hydrolase dictates specificity toward β-glucans. J. Biol. Chem. 291, 7183–7194
13 Barbeyron, T., Thomas, F., Barbe, V., Teeling, H., Schenowitz, C., Dossat, C. et al. (2016) Habitat and taxon as driving forces of carbohydrate catabolism in marine heterotrophic bacteria: example of the model algae-associated bacterium Zobellia galactanivorans DsijT. Environ. Microbiol. 18, 4610–4627
14 Bernardet, J.F., Segers, P., Vancanneyt, M., Berthe, F., Kersters, K. and Vandamme, P. (1996) Cutting a Gordian knot: emended classification and description of the genus Flavobacterium, emended description of the family Flavobacteriaceae, and proposal of Flavobacterium hydatis nom. nov. (basonym, Cytophaga aquatilis Strohl and Tait 1978). Int. J. Syst. Bacteriol. 46, 128–148
15 Thomas, F., Hehemann, J.H., Rebuffet, E., Czjzek, M. and Michel, G. (2011) Environmental and gut bacteroidetes: the food connection. Front. Microbiol. 2, 93
16 Elifantz, H., Waidner, L.A., Michelou, V.K., Cottrell, M.T. and Kirchman, D.L. (2008) Diversity and abundance of glycosyl hydrolase family 5 in the North Atlantic Ocean. FEMS Microbiol. Ecol. 63, 316–327
17 Foong, F.C. and Doi, R.H. (1992) Characterization and comparison of Clostridium cellulovorans endoglucanases-xylanases EngB and EngD hyperexpressed in Escherichia coli. J. Bacteriol. 174, 1403–1409
18 Katoh, K. and Standley, D.M. (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780
19 Tamura, K., Stecher, G., Peterson, D., Filipski, A. and Kumar, S. (2013) MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729
20 Juncker, A.S., Willenbrock, H., Von Heijne, G., Brunak, S., Nielsen, H. and Krogh, A. (2003) Prediction of lipoprotein signal peptides in Gram-negative bacteria. Protein Sci. 12, 1652–1662
21 Gaboriaud, C., Bissery, V., Benchetrit, T. and Mornon, J.P. (1987) Hydrophobic cluster analysis: an efficient new way to compare and analyse amino acid sequences. FEBS Lett. 224, 149–155
22 Barbeyron, T., Kean, K. and Forterre, P. (1984) DNA adenine methylation of GATC sequences appeared recently in the Escherichia coli lineage. J. Bacteriol. 160, 586–590
23 Groisillier, A., Hervé, C., Jeudy, A., Rebuffet, E., Pluchon, P.F., Chevolot, Y. et al. (2010) MARINE-EXPRESS: taking advantage of high throughput cloning and expression strategies for the post-genomic analysis of marine organisms. Microb. Cell Fact. 9, 45
24 Studier, F.W. (2005) Protein production by auto-induction in high density shaking cultures. Protein Expr. Purif. 41, 207–234
25 Kidby, D.K. and Davidson, D.J. (1973) A convenient ferricyanide estimation of reducing sugars in the nanomole range. Anal. Biochem. 55, 321–325
26 Östling, S. and Virtama, P. (1946) A modified preparation of the universal buffer described by Teorell and Stenhagen. Acta Phys. Scandinav. 11, 289–293
27 Kabsch, W. (2010) XDS. Acta Crystallogr. D Biol. Crystallogr. 66, 125–132
28 Vagin, A. and Teplyakov, A. (1997) MOLREP: an automated program for molecular replacement. J. Appl. Crystallogr. 30, 1022–1025
29 Perrakis, A., Sixma, T.K., Wilson, K.S. and Lamzin, V.S. (1997) wARP: improvement and extension of crystallographic phases by weighted averaging of multiple-refined dummy atomic models. Acta Crystallogr. D Biol. Crystallogr. 53, 448–455
30 Emsley, P., Lohkamp, B., Scott, W.G. and Cowtan, K. (2010) Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501
31 Vagin, A.A., Steiner, R.A., Lebedev, A.A., Potterton, L., McNicholas, S., Long, F. et al. (2004) REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use. Acta Crystallogr. D Biol. Crystallogr. 60, 2184–2195
32 McCoy, A.J., Grosse-Kunstleve, R.W., Adams, P.D., Winn, M.D., Storoni, L.C. and Read, R.J. (2007) Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674
33 Thomas, F., Bordron, P., Eveillard, D. and Michel, G. (2017) Gene expression analysis of Zobellia galactanivorans during the degradation of algal polysaccharides reveals both substrate-specific and shared transcriptome-wide responses. Front. Microbiol. 8, 1808
34 Salmeán, A.A., Guillouzo, A., Duffieux, D., Jam, M., Matard-Mann, M., Larocque, R. et al. (2018) Double blind microarray-based polysaccharide profiling enables parallel identification of uncharacterized polysaccharides and carbohydrate-binding proteins with unknown specificities. Sci. Rep. 8, 2500
35 Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S., Wilkins, M.R., Appel, R.D. et al. (2005) Protein identification and analysis tools on the ExPASy Server. In The Proteomics Protocols Handbook (Walker, J.M., ed.), pp. 571–607, Humana Press, Totowa, NJ, U.S.A.
36 Holm, L. and Laakso, L.M. (2016) Dali server update. Nucleic Acids Res. 44, W351–W355
37 Bortoli-German, I., Haiech, J., Chippaux, M. and Barras, F. (1995) Informational suppression to investigate structural functional and evolutionary aspects of the Erwinia chrysanthemi cellulase EGZ. J. Mol. Biol. 246, 82–94
38 Bianchetti, C.M., Brumm, P., Smith, R.W., Dyer, K., Hura, G.L., Rutkoski, T.J. et al. (2013) Structure, dynamics, and specificity of endoglucanase D from Clostridium cellulovorans. J. Mol. Biol. 425, 4267–4285
39 Labourel, A., Jam, M., Jeudy, A., Hehemann, J.H., Czjzek, M. and Michel, G. (2014) The β-glucanase ZgLamA from Zobellia galactanivorans evolved a bent active site adapted for efficient degradation of algal laminarin. J. Biol. Chem. 289, 2027–2042
40 Labourel, A., Jam, M., Legentil, L., Sylla, B., Hehemann, J.H., Ferrières, V. et al. (2015) Structural and biochemical characterization of the laminarinase ZgLamCGH16 from Zobellia galactanivorans suggests preferred recognition of branched laminarin. Acta Crystallogr. D Biol. Crystallogr. 71, 173–184
41 Iakiviak, M., Mackie, R.I. and Cann, I.K. (2011) Functional analyses of multiple lichenin-degrading enzymes from the rumen bacterium Ruminococcus albus 8. Appl. Environ. Microbiol. 77, 7541–7550
42 Lechat, H., Amat, M., Mazoyer, J., Buléon, A. and Lahaye, M. (2000) Structure and distribution of glucomannan and sulfated glucan in the cell walls of the red alga Kappaphycus alvarezii (Gigartinales, Rhodophyta). J. Phycol. 36, 891–902
43 Robert, X. and Gouet, P. (2014) Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res. 42, W320–W324

Author notes

* These authors contributed equally to this work.