Proteins belonging to the cupin superfamily are known to have critical and diverse physiological functions. However, the 7S globulin family, which is also part of the cupin superfamily, has long been regarded as comprising only seed storage proteins. The structure of a native protein, Vic_CAPAN from Capsicum annuum, was determined, and its physiological functions were explored after purifying the protein by ammonium sulfate precipitation followed by size exclusion chromatography. The crystal structure of this vicilin, determined at 2.16 Å resolution, revealed two monomers per asymmetric unit that are oriented orthogonally to each other. Vic_CAPAN consists predominantly of β-sheets that fold to form a β-barrel structure commonly called the cupin fold. Each monomer of Vic_CAPAN consists of two cupin fold domains, N-terminal and C-terminal, which accommodate two different ligands. A bound ligand was identified in the crystal structure at the C-terminal cupin fold, in a site presumably conserved for metabolites. The ligand was confirmed to be salicylic acid through mass spectrometric analysis. A copper-binding site was further observed near the conserved ligand-binding pocket, suggesting possible superoxide dismutase activity of Vic_CAPAN, which was subsequently confirmed biochemically. Vicilins from other sources did not exhibit this activity, indicating functional specificity of Vic_CAPAN. The discovery of bound salicylic acid, which is a known regulator of the antioxidant pathway, together with the superoxide dismutase activity, suggests that Vic_CAPAN has an important role during oxidative stress. As salicylic acid changes the redox state of the cell, it may act as a downstream signal for various pathways involved in plant biotic and abiotic stress rescue.
A seed is the embryonic part of the plant, capable of growing into a complete plantlet. During growth and germination, seeds are exposed to various biotic and abiotic stresses. Although the seed is a dormant phase of a plant's life cycle, it contains all the components needed to rescue it from such adverse environmental conditions. Seeds consist mainly of carbohydrates, lipids and proteins. Seed proteins are known to confer physiological functions that are important for the growth and germination of the seed. Seeds contain an array of proteins whose functions help them survive environmental stress and grow into a mature plant.
Some of these proteins are known to help seeds maintain their structural features and metabolism. Examples of such proteins are β-amylase, immunoglobulin-binding protein and heat shock proteins. Other proteins are involved in the protection of seeds from pests and pathogens. Some proteins are known only to act as nutritional reserves for the seeds, with apparently no involvement in any physiological activity. Such proteins, which are termed ‘storage proteins’, are worth exploring because they may have essential physiological functions not yet discovered. Seed storage proteins are broadly classified into globulins, albumins and prolamins. Globulins constitute a major part of the seed proteome and are further classified into 7S globulins and 11S globulins based on their sedimentation rates. 7S globulins are also termed vicilins, and they are a major source of nutrition during the growth and development of the seeds.
Detailed structural studies of native seed proteins can give clues to the possible functional roles of these proteins in seeds. The proteome of Capsicum annuum seeds was explored towards this end. Capsicum is cultivated all around the world and is a significant part of the human diet. Capsicum is known to be an excellent source of vitamins and minerals such as molybdenum, manganese, folate, potassium and thiamine. Apart from being rich in these nutrients, capsicum is also known to have medicinal properties. In this respect, the proteome and phenolic components of capsicum have been explored. The functions of many major capsicum seed proteins have already been established in an attempt to identify the elements responsible for their physiological and medicinal properties. However, the storage proteins of C. annuum are not well studied in this context. Our focus has been the structure determination and functional characterization of major proteins from the capsicum seed proteome whose roles have not yet been deciphered.
Among the vicilin family of proteins explored in the past, the structure and function of Vic_CAPAN (vicilin of C. annuum) had not been studied in detail. Previous comparative structural analyses revealed that vicilins belong to the cupin superfamily. Cupin superfamily proteins are distributed among all kingdoms and are known to have a wide range of biochemical functions. They are known for their various enzymatic roles, ranging from epimerase and transferase in prokaryotes to oxalate oxidase and iron-binding nuclear protein (pirin) in eukaryotes. Vicilins are predicted to have a role during desiccation and oxidative stress, as their levels increase under dehydrating conditions. However, the detailed molecular mechanisms by which vicilins manage desiccation and oxidative stress have not yet been explained. To understand this, we carried out structural and functional studies of Vic_CAPAN from C. annuum.
We were able to purify, crystallize and solve the structure of Vic_CAPAN, a vicilin from C. annuum. Detailed structural analysis of Vic_CAPAN led to the identification of a conserved binding pocket at the C-terminal end of the protein, which contains bound salicylic acid. Further inspection identified a copper-binding site in the structure. Independently, Vic_CAPAN was found to possess a novel superoxide dismutase (SOD) activity. Salicylic acid is known to be a modulator of proteins involved in SOD activity and is also known to rescue plants during water stress. These analyses suggest that vicilins play a crucial role in rescuing seeds from oxidative stress during germination and that salicylic acid could be the modulator of this activity.
Materials and methods
Fractionation of seed proteome
C. annuum seeds were ground into powder and delipidated three times with petroleum ether, followed by air drying at room temperature. Fifteen grams of this dried, delipidated powder was homogenized in 100 ml of 50 mM Tris–HCl, 150 mM sodium chloride, pH 7.5, by stirring for 3 h at 4°C in the presence of a protease inhibitor cocktail (Sigma–Aldrich). The homogenized mixture was centrifuged at 12 000 rpm for 60 min at 4°C. Proteins in the extract were then precipitated according to their solubility at increasing concentrations of ammonium sulfate, from 20% to 95% (w/v). The precipitate from each fraction was collected by centrifugation at 12 000 rpm for 60 min at 4°C. The pellets were redissolved in a minimum quantity of extraction buffer and analyzed by 15% SDS–PAGE.
The major protein bands of the SDS–PAGE gel were transferred to a PVDF membrane using CAPS buffer at pH 11 and were subjected to N-terminal sequencing by the Edman degradation method on a PPSQ-33A protein sequencer (Shimadzu). After the N-terminal sequence was obtained, the proteins were initially identified by searching for homologous proteins with BLAST against the complete (non-redundant) protein database.
Peptide mass fingerprinting
MALDI-TOF mass spectrometry was used for bands whose N-terminus was possibly blocked. The samples were prepared for peptide mass fingerprinting by the in-gel trypsin digestion method and were spotted with α-cyano-4-hydroxycinnamic acid (5 mg/ml in 80% ACN and 0.1% TFA) onto the MALDI target plate. Peptides (10 mg/ml) were spotted twice, at 1 : 5 and 1 : 10 ratios to the matrix. External calibration was performed using calibration mixture 1 from the mass calibration standard kit (AB SCIEX), and the spectrum was acquired using 1000 laser shots. Mass spectra of the trypsin-digested peptide samples were acquired on the 5800 MALDI–TOF/TOF analyzer (AB SCIEX), operated in positive ion mode. The laser power was set at 3500 for MS acquisition, and the spectra were acquired between 800 and 4000 m/z. Furthermore, dominant peaks of the parent ions were fragmented, and MS/MS analysis was done using 4000 Series Explorer software, version 4.0 (AB SCIEX), with the laser power set at 4200. The scan range for the fragment (daughter) ions extended from 9 m/z to the mass of the corresponding parent ion.
MS/MS data analysis and ion searches for proteins and peptides were performed with the ProteinPilot software (version 4.5, Applied Biosystems). PARAGON, a highly specific and rigorous search system, was used to identify known or identical proteins. For the analysis, the peptide precursor tolerance was set to 0.5 Da (MALDI) and the MS/MS tolerance to 0.2 Da (MALDI) within the PARAGON parameters.
Vic_CAPAN was purified from the 60% ammonium sulfate fraction by size exclusion chromatography using a manually packed Sephacryl S-200 preparatory column (GE Healthcare). Tris–HCl (50 mM) at pH 7.5 containing 150 mM NaCl was used as the equilibration and running buffer to separate Vic_CAPAN from the other proteins that co-precipitated with it. The protein was eluted at 1 ml/min on an AKTA pure system (GE Healthcare), and the collected fractions were analyzed by SDS–PAGE for homogeneity. The fractions of the first peak, containing Vic_CAPAN, were pooled and concentrated using ultra centrifugal filters (Amicon, 30 kDa cutoff, Millipore). The concentration of Vic_CAPAN was determined by the BCA protein assay (Pierce Biotechnology) using BSA as a standard.
Crystallization and data collection
Purified Vic_CAPAN was concentrated up to 15 mg/ml to set up hanging drop crystallization plates using a Mosquito crystal robot (TTP-labtech). After a few initial hits were obtained for Vic_CAPAN, the crystallization conditions were optimized, and cuboid-shaped crystals were obtained by the hanging drop vapor diffusion method in 50 mM HEPES, 12% PEG (polyethylene glycol) 3350 at 4°C. A single crystal was soaked in 30% ethylene glycol for 30 s and frozen in liquid nitrogen. X-ray diffraction data were collected at beamline BM14, European Synchrotron Radiation Facility (ESRF), Grenoble, France, at a wavelength of 0.953 Å. The data were processed, scaled and merged using HKL2000. The data collection statistics are shown in Table 1.
| Parameter | Value |
|---|---|
| Unit cell dimension (Å) | 148.034 |
| Resolution range (Å) | 50.00–2.16 (2.28–2.16) |
| Total number of reflections | 57 851 |
| Average B-factor (Å²), Vic_CAPAN | 40.0 |
| Average B-factor (Å²), salicylic acid | 37.4 |
| RMSD (bond length) (Å) | 0.0098 |
| RMSD (bond angle) (°) | 0.905 |
| Ramachandran plot favored (%) | 98.9 |
| Ramachandran plot disallowed (%) | 0.3 |
Values in parentheses are for the highest resolution shell.
Rmerge = Σ|I − 〈I〉|/ΣI, where I is the integrated intensity of a given reflection.
Refinement statistics: Rwork = Σ||Fobs| − |Fcalc||/Σ|Fobs|.
Rfree was calculated using 5% of data excluded from refinement.
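The three definitions above can be sketched in a few lines of Python. This is only an illustration of the formulas as stated in the table footnotes, not the code used by HKL2000 or phenix.refine; the function names, the dictionary layout for repeated measurements and the random seed are assumptions made for the example.

```python
import random


def r_merge(observations):
    """Rmerge = sum|I - <I>| / sum(I).

    `observations` maps a reflection index (h, k, l) to the list of
    intensities measured for that reflection (assumed layout).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """R = sum||Fobs| - |Fcalc|| / sum|Fobs| (used for both Rwork and Rfree)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_work_free(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free set."""
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    n_free = round(free_fraction * n_reflections)
    return sorted(indices[n_free:]), sorted(indices[:n_free])
```

Rwork is `r_factor` evaluated on the working reflections and Rfree is the same function evaluated on the 5% of reflections that were never used in refinement.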
Structure determination and refinement
The complete protein sequence for Vic_CAPAN was extracted from the genome of C. annuum Zunla-1. The closest available Protein Data Bank structure for this sequence was vicilin from Solanum melongena (PDB ID: 5CAD), with 52% identity, which was used as the search model for molecular replacement of Vic_CAPAN. Heteroatoms were deleted from the co-ordinates before molecular replacement. Phaser was used for molecular replacement with the all-alternative-space-groups option to ensure correct space group determination and the best possible solution. The molecular replacement solution was checked manually for symmetry-mate clashes within an 8 Å range in PyMOL. Initial structure refinement was carried out with phenix.refine at a resolution of 2.16 Å, alternated with model building and model improvement using Coot. Since the sequence was extracted from a genomic database of another variety, there were many ambiguities in the sequence, which were rectified using the developing electron density after each cycle of refinement. We also used multiple homologous proteins to generate a consensus sequence to avoid errors in the sequence of the final structure. Ligand density was validated through a ligand omit map.
Verification of copper metal
X-ray absorption near edge spectroscopy (XANES) was used to detect copper in the Vic_CAPAN crystal. A Vic_CAPAN crystal was exposed to X-rays on the ID30B synchrotron beamline, and an energy scan specific to copper, ranging from 8860 to 9040 eV, was done to confirm the presence of copper bound to the protein. The data obtained were normalized and plotted using ATHENA.
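As a quick sanity check on the energies involved, the relation E = hc/λ can be sketched as below. The snippet assumes the scan window is expressed in eV and uses an approximate tabulated Cu K-edge of about 8979 eV; both the constant names and the helper functions are illustrative and are not part of the beamline software.

```python
# hc in eV * Angstrom (CODATA-derived, rounded)
HC_EV_ANGSTROM = 12398.4198
# Approximate tabulated K-edge of elemental copper, in eV (assumption for the sketch)
CU_K_EDGE_EV = 8979.0


def wavelength_to_energy_ev(wavelength_angstrom):
    """Photon energy in eV for a wavelength given in Angstrom."""
    return HC_EV_ANGSTROM / wavelength_angstrom


def energy_ev_to_wavelength(energy_ev):
    """Wavelength in Angstrom for a photon energy given in eV."""
    return HC_EV_ANGSTROM / energy_ev


def edge_in_window(edge_ev, start_ev, stop_ev):
    """True if an absorption edge lies inside the scanned energy window."""
    return start_ev <= edge_ev <= stop_ev


if __name__ == "__main__":
    # Diffraction data were collected at 0.953 Angstrom, well above the Cu K-edge.
    print(round(wavelength_to_energy_ev(0.953)), "eV at 0.953 A")
    print("Cu K-edge inside 8860-9040 eV scan:",
          edge_in_window(CU_K_EDGE_EV, 8860.0, 9040.0))
```

The scan window brackets the copper edge on both sides, which is what allows the pre-edge, edge and post-edge regions to be normalized.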
Superoxide dismutase assay
The superoxide dismutase assay was based on the inhibition by SOD of the photoreduction of nitroblue tetrazolium (NBT), as described by Beauchamp and Fridovich in 1971, with modification. Purified Vic_CAPAN at a concentration of 13 µg/ml was loaded on an 8% native gel and run for 2.5 h at 120 V at 4°C. One hundred milliliters of a solution containing 2.5 mg of NBT salt (SRL) and 1 g of riboflavin (Sigma–Aldrich) was prepared, and the gel was soaked in this solution for 1 h in the dark. Then, 20 µl of TEMED (Sigma–Aldrich) was added, and the gel was incubated for a further 20 min in the dark. The gel was then rinsed with 50 mM phosphate buffer (pH 7.2) and exposed to light for color formation. Prestained pink plus marker from Gene Direct and copper–zinc superoxide dismutase (CuZnSOD) from human erythrocytes (Sigma–Aldrich) were used as negative and positive controls, respectively. SOD activity was compared in the same way using 5 µg each of Vic_CAPAN, CuZnSOD and vicilins from cowpea, okra and hazelnut.
A quantitative spectrophotometric assay of Vic_CAPAN was done using the SOD assay kit (Sigma–Aldrich). CuZnSOD from human erythrocytes was used as a positive control, and the same conditions with no protein in the final solution were used as a negative control. The SOD activity of Vic_CAPAN was compared with that of the vicilins of cowpea, okra and hazelnut using the same kit.
Verification of salicylic acid co-purified with Vic_CAPAN
Eighty microliters of 40 mg/ml protein in buffer (50 mM Tris–HCl, pH 7.5 and 150 mM sodium chloride) was denatured by adding 1 ml of 100% methanol. The mixture was agitated and then centrifuged. The supernatant was used to determine the ligand by injecting it directly into the SCIEX TripleTOF 6600 system. A blank was prepared by mixing 80 µl of the buffer with 1 ml of 100% methanol. The instrument was calibrated with APCI negative calibration solution in negative mode. The declustering potential and collision energy were set to −50 V and −10 eV, respectively, and the samples were scanned from 50 to 500 Da for mass spectrometry. Furthermore, the parent ion was fragmented using a collision energy of −20 eV, and the fragment (daughter) ions were scanned from 50 to 500 Da.
Purification and characterization of Vic_CAPAN from C. annuum
Ammonium sulfate fractionation resulted in the precipitation of five major proteins: CA-1, CA-2, CA-3, CA-4 and CA-5 (Supplementary Figure S1). All of them were subjected to N-terminal sequencing after transfer onto the PVDF membrane, and the initial 15–20 amino acids were sequenced for each band (Supplementary Table SS1). The N-terminal sequence was used to identify the nearest homologous protein through the BLAST algorithm. The N-terminal sequences of a few bands showed no significant match with known plant proteins, and these bands were instead analyzed by mass spectrometry. CA-1 and CA-5 showed considerable homology with vicilin-like antimicrobial peptide (Nicotiana tomentosiformis) and non-specific lipid-transfer protein-like protein At2g13820 isoform X1 (Solanum pennellii), respectively. CA-2, CA-3 and CA-4 were identified using peptide mass fingerprinting and were classified as legumin A-like protein (C. annuum) and the alpha and beta subunits of 11S globulin seed storage protein (C. annuum), respectively (Supplementary Table SS1). Detailed characterization indicated that CA-1 is identical to Vic_CAPAN. Therefore, for ease of nomenclature, CA-1 was renamed Vic_CAPAN. To shed light on the possible functions of vicilin-like storage proteins during germination, CA-1 was selected for purification and characterization. Size exclusion chromatography of the 60% cut gave three major peaks, of which the first (Peak A) corresponded to a highly purified protein of molecular mass 45 kDa, as shown by 15% SDS–PAGE (Supplementary Figure S2a). The purified protein was further confirmed to be Vic_CAPAN through N-terminal sequencing.
Crystallization and structure determination
Crystals of Vic_CAPAN were obtained at a protein concentration of 12 mg/ml by the vapor diffusion method and diffracted up to 2.16 Å resolution (Supplementary Figure S2b). Data processed with HKL2000 revealed that the crystals belonged to the cubic space group P2₁3 with a unit cell parameter of 148.03 Å. The data collection and processing statistics are summarized in Table 1. The Matthews coefficient indicated two monomers of Vic_CAPAN per asymmetric unit.
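The Matthews coefficient estimate can be reproduced with a short calculation. The sketch below assumes a monomer mass of about 45 kDa (the SDS–PAGE value reported for the purified protein), 12 symmetry operators for the cubic space group, and the conventional factor of 1.23 for converting the coefficient into a solvent fraction; the function names are illustrative.

```python
# Number of symmetry operators (asymmetric units per cell) in the cubic
# space group reported for the crystals.
N_SYMOPS = 12
# Conventional constant relating the Matthews coefficient to the protein
# volume fraction (A^3/Da).
PROTEIN_FACTOR = 1.23


def matthews_coefficient(cell_edge, n_copies, mass_da, n_symops=N_SYMOPS):
    """Vm = V_cell / (Z * n * M) in A^3/Da for a cubic cell of edge `cell_edge`."""
    cell_volume = cell_edge ** 3
    return cell_volume / (n_symops * n_copies * mass_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - PROTEIN_FACTOR / vm


if __name__ == "__main__":
    # Assumed inputs: a = 148.03 A (from the text), monomer mass ~45 kDa.
    for copies in (1, 2, 3):
        vm = matthews_coefficient(148.03, copies, 45000.0)
        print(copies, round(vm, 2), round(100 * solvent_fraction(vm), 1))
```

Under these assumptions, two copies per asymmetric unit give a coefficient of about 3.0 Å³/Da and roughly 59% solvent, within the range usually seen for protein crystals, whereas a single copy gives an implausibly high value of about 6 Å³/Da.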
The structure of Vic_CAPAN was determined by molecular replacement, borrowing phases from its homolog, the vicilin of S. melongena. The top Phaser solution showed an initial log-likelihood gain, rotation function Z-score and translation function Z-score of 2482.6, 56.1 and 51.0, respectively. Rigid body refinement of the model gave an Rcryst of 31.14% and an Rfree of 37.39%, which decreased to 30.23% and 34.25%, respectively, after B-group and positional refinement. With further refinement, the values decreased to 20.52% and 24.28% for Rcryst and Rfree, respectively, with 98.2% of the residues in the allowed region of the Ramachandran plot. The two remaining residues, labeled as outliers, are the glycines at position 279 of each monomer. The RMSD from ideal values was 0.917° and 0.008 Å for bond angles and bond lengths, respectively. The final statistics of the refined model are listed in Table 1.
The final refined model of Vic_CAPAN (PDB ID: 5YJS) had two monomers per asymmetric unit placed perpendicular to each other. Each monomer contains 370 amino acids; electron density was absent for 99 amino acids at the N-terminus (with reference to the gene sequence) and 25 at the C-terminus. These residues were therefore not included in the final structure. Their absence is possibly due to loss of the amino acids through posttranslational modification, which is supported by the N-terminal sequence starting at the same segment as the crystal structure. There was also no electron density for the amino acids in a loop region (residues 420–439). The refined structure of Vic_CAPAN consists predominantly of β-sheets, followed by loops and α-helices. Generation of symmetry mates for Vic_CAPAN revealed that each monomer forms part of a trimer, which is similar to previously reported structures of vicilins. The existence of vicilins as trimers makes them thermally more stable, which hints at a crucial role during germination.
Each monomer of Vic_CAPAN (Figure 1) consists of a β-barrel core domain and has α-helices forming an extended arm. Both the N- and C-terminal domains adopt a typical cupin fold, which is a characteristic feature of 7S globulins and makes them part of the bicupin family. There were many ambiguities in the available genomic sequence, as it was extracted from a different variety. The final amino acid sequence for Vic_CAPAN was therefore rebuilt during the refinement cycles according to the developing electron density and by considering the conserved amino acids present in the nearest homologs of Vic_CAPAN. The final sequence based on the electron density was 17% different from the gene sequence. The final sequence aligned with the gene-derived sequence of Vic_CAPAN is shown in Supplementary Figure S3.
Overall structure of Vic_CAPAN with ligands in both the pockets.
The final structure of Vic_CAPAN (PDB ID: 5YJS) also includes one copper-binding site in each monomer, within the same pocket as observed in Korean pine and pecan [22,23]. The difference map peak at this site was considerably high and could not be accounted for merely by building a water molecule or other ions used during crystallization. The presence of copper at the site was further confirmed by scanning the energy across the X-ray absorption edge of copper (Figure 2). The geometry of the copper-binding site is trigonal planar, as reported by the CheckMyMetal server, and the co-ordination is fulfilled by two histidines (H419 and H464) and a cysteine (C417). Similar geometry was observed in the 7S globulin structures from pecan and Korean pine. The fourth co-ordination position of the copper ion is partially fulfilled by the hydroxyl group of a tyrosine at a distance of 3.1 Å (Figure 3a). There is also a methionine residue (M415) near the copper site, at a distance of 6.4 Å (Figure 3a), which hints that this copper can drop the external ligand upon reduction, similar to other known copper-binding sites. The shakedown in the linear curve of the XANES data (Figure 2) is similar to that of copper in the Cu(I) oxidation state. As no copper solution was used during either purification or crystallization, the presence of a copper ion at this site indicates that it occurs naturally in this protein.
Stereo-stick model of cofactors in Vic_CAPAN.
Superoxide dismutase activity
After analyzing the oxidation state and the copper-binding site of Vic_CAPAN, we investigated whether the protein has an enzymatic role. As the nearest plant orthologs of Vic_CAPAN with known enzymatic activity were the germins, we explored the possibility that Vic_CAPAN has an enzymatic activity similar to theirs. Interestingly, Vic_CAPAN and germins are both seed proteins and, on structural alignment, have their metabolites in the same pocket (Supplementary Figure S4c). Apart from oxalate oxidase activity, germins also have SOD activity, which is attributed to a manganese ion. Many reported SOD enzymes have copper in their catalytic domain, and this activity was therefore examined for Vic_CAPAN.
The SOD activity of Vic_CAPAN was identified through the NBT gel-based assay. In the scanned gel (Figure 4a), the white band of Vic_CAPAN (loaded in lane B) against the stained background indicates that the protein was capable of neutralizing the superoxide generated during the assay. Lane A, loaded with CuZnSOD, was also white against the background. In contrast, the markers were stained blue, as they lack such enzymatic activity. A quantitative assay of SOD activity revealed that Vic_CAPAN retains at least 25% of the activity of CuZnSOD (Figure 4b). In the presence of superoxide radicals, WST-1 produces a soluble formazan dye, which is measured spectroscopically. The decrease in absorbance of the reaction mixture containing Vic_CAPAN, compared with that of the blank, shows that Vic_CAPAN has a strong capacity to neutralize superoxide radicals.
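The way an absorbance decrease translates into an SOD activity figure can be sketched as follows. This is a simplified percent-inhibition calculation against a single blank, which is an assumption of the sketch: the commercial kit protocol may subtract additional blanks, and the absorbance values in the usage example are purely illustrative, not measurements from this study.

```python
def percent_inhibition(a_blank, a_sample):
    """Percent inhibition of formazan formation relative to the blank.

    a_blank  : absorbance of the reaction without protein
    a_sample : absorbance of the reaction with protein
    """
    if a_blank <= 0:
        raise ValueError("blank absorbance must be positive")
    return (a_blank - a_sample) / a_blank * 100.0


def relative_activity(inhibition_sample, inhibition_reference):
    """Activity of a sample as a percentage of a reference enzyme."""
    if inhibition_reference <= 0:
        raise ValueError("reference inhibition must be positive")
    return inhibition_sample / inhibition_reference * 100.0


if __name__ == "__main__":
    # Illustrative absorbances only (hypothetical, not taken from Figure 4b).
    blank, reference, sample = 1.00, 0.20, 0.80
    inh_ref = percent_inhibition(blank, reference)
    inh_sample = percent_inhibition(blank, sample)
    print(inh_ref, inh_sample, relative_activity(inh_sample, inh_ref))
```

Expressing the sample relative to the positive control in this way is one plausible reading of a statement such as "25% of the activity of CuZnSOD".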
Superoxide dismutase assay of Vic_CAPAN.
Superoxide dismutase activity in other vicilins
SOD activity of other vicilins, i.e. cowpea, okra and hazelnut, was also analyzed. It was evident that none of them possess SOD activity. Figure 5a,b shows that vicilin from capsicum was capable of quenching superoxide produced during the reaction, whereas other proteins failed to do so.
Comparison of superoxide dismutase activity of vicilins from different sources.
We were able to identify a ligand bound to Vic_CAPAN (PDB ID: 5YJS) from the electron density, and its identity was confirmed by mass spectrometry. There are two hydrophobic pockets in Vic_CAPAN, and the C-terminal hydrophobic core was found to accommodate salicylic acid at the same position as acetate in SM80.1. The density observed in the difference map after the final round of refinement, before the ligand was introduced, resembled that of a ring structure carrying a functional group similar to acetate. We therefore initially introduced benzoic acid, which fitted the electron density well. However, after building an iterative omit map, we saw extra density at the second carbon of the benzoic acid ring. Considering the possibility that this could be due to a derivative of benzoic acid, we performed mass spectrometry to identify the ligand bound to Vic_CAPAN. We initially used three standards as references: benzoic acid, salicylic acid and anthranilic acid.
Mass spectrometry analysis of the supernatant of the denatured protein revealed the bound ligand to be salicylic acid (2-hydroxybenzoic acid). The peak observed at 137.0258 Da in the negative mode MS spectrum corresponds to deprotonated salicylic acid ([M−H]⁻) (Figure 6a; Supplementary Figure S5a). Furthermore, the MS/MS fragmentation pattern unambiguously confirmed the molecule to be salicylic acid (Figure 6b; Supplementary Figure S5b). Density for salicylic acid is also evident in both monomers of the asymmetric unit. Salicylic acid in the binding pocket of the cupin occupies ∼336.6 Å² of surface area (Figure 3b), with a solvent accessible area of 102.0 Å². Salicylic acid makes a total of six electrostatic interactions: five through its [O] and [OXT] atoms and one with Phe[O]. Salicylic acid [OXT] forms bonds with asparagine (N397), tyrosine (Y395) and lysine (K490) at distances of 2.8, 2.8 and 3.3 Å, respectively. Similarly, salicylic acid [O] forms bonds with lysine (K490) and arginine (R402) at distances of 2.5 and 2.8 Å, respectively.
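The mass assignment can be checked with a small calculation of the expected negative-mode ions for the three standards. The sketch below uses standard monoisotopic atomic masses and treats the [M−H]⁻ ion as the neutral molecule minus one proton; the dictionary of compositions and the helper names are illustrative, and the snippet is not the software used to process the spectra.

```python
# Monoisotopic atomic masses (Da) and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207,
                "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688

# Elemental compositions of the three standards named in the text.
STANDARDS = {
    "benzoic acid": {"C": 7, "H": 6, "O": 2},
    "salicylic acid": {"C": 7, "H": 6, "O": 3},
    "anthranilic acid": {"C": 7, "H": 7, "N": 1, "O": 2},
}


def neutral_mass(composition):
    """Monoisotopic mass of the neutral molecule."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())


def deprotonated_mz(composition):
    """m/z of the singly charged [M-H]- ion."""
    return neutral_mass(composition) - PROTON


def ppm_error(observed, theoretical):
    """Mass error of an observed peak in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def closest_standard(observed_mz):
    """Name of the standard whose [M-H]- ion is nearest the observed peak."""
    return min(STANDARDS,
               key=lambda name: abs(observed_mz - deprotonated_mz(STANDARDS[name])))


if __name__ == "__main__":
    observed = 137.0258  # peak reported in the text
    for name, comp in STANDARDS.items():
        mz = deprotonated_mz(comp)
        print(f"{name}: {mz:.4f} ({ppm_error(observed, mz):.1f} ppm)")
    print("closest:", closest_standard(observed))
```

Of the three standards, only salicylic acid has an [M−H]⁻ ion near 137.02; benzoic acid and anthranilic acid fall near 121.03 and 136.04, respectively, so the observed peak discriminates clearly between them.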
Identification of ligand bound to Vic_CAPAN.
The hydrophobic cavity at the N-terminal domain also showed an unknown density, which we were not able to identify because of the relatively low resolution and the partially developed density. We tried fitting a range of candidate ligands and, among the components of the crystallization and cryoprotection solutions, PEG appeared to fit partially. However, when we placed lauric acid at the same position, it fitted much better, without any negative density after three cycles of refinement, suggesting that a metabolite or ligand similar to lauric acid may be bound in the cavity. Moreover, the density lies in the same pocket that accommodated pyroglutamate in SM80.1. The ligand omit map of Vic_CAPAN also revealed extra density at the same position, resembling that of lauric acid, in the N-terminal pocket and density for salicylic acid in the C-terminal pocket (Supplementary Figure S4a,b). As mentioned previously, the density was visible in both monomers of Vic_CAPAN.
Possible metabolite binding in orthologous vicilin-binding pocket
On careful observation, it was noted that the residues interacting with the ligand in the C-terminal pocket were the same in SM80.1 and Vic_CAPAN. To investigate whether these residues are conserved among all known vicilins, we performed a multiple sequence alignment of the available vicilin sequences from different families. At least two of the four amino acids, asparagine at position 397 and arginine or lysine (both positively charged) at position 490, were conserved in all vicilins from angiosperms (Figure 7). Multiple sequence alignment with gymnosperms revealed that only one gymnosperm family, Pinaceae (representative plants: Pinus koraiensis and Picea glauca), has these residues conserved. In the Zamiaceae family (representative plant: Zamia furfuracea), however, arginine at position 490 is conserved, and aspartic acid is present at position 397 instead of asparagine. Tyrosine at position 395 was found to be conserved in the Araucariaceae family, which is represented by Araucaria angustifolia in the multiple sequence alignment. We conclude that the C-terminal β-barrel pockets of vicilins appear to be adapted for binding ligands similar to salicylic acid and acetate. A phylogenetic tree (Supplementary Figure S6) built from the sequences of vicilins from different families reveals that, despite significant differences in the primary sequences of this protein, there are only subtle differences in the amino acids present in its C-terminal binding pocket.
Multiple sequence alignment of amino acids present in C-terminal pocket of vicilins from different families.
Comparison of binding pocket with other known vicilin structures
Analysis of the electron density in the binding pockets of other vicilin structures downloaded from the Protein Data Bank showed that uninterpreted electron density exists in these pockets. In some cases, buffer reagents were modeled into the pocket density. Among them, the vicilin structures from adzuki bean (2EAA) and eggplant (5CAD) have acetate in the same pocket that accommodates salicylic acid in Vic_CAPAN. The vicilins from Korean pine (4LEJ), peanut (3S7I) and pecan (5E1R) have phosphate, sodium and (4S)-2-methyl-2,4-pentanediol in the C-terminal binding pocket, respectively. Vicilin from soybean has positive density in the same pocket that was left unmodeled. The recently reported structure of canavalin (PDB ID: 6CB4) also identifies benzoic acid in the C-terminal pocket. Similarly, electron density that was left unoccupied or was modeled with a buffering reagent was observed in the other binding pocket, in the N-terminal domain, of other vicilins. Electron density in the N-terminal pocket of adzuki bean (2EAA and 2EA7) and peanut is partially accounted for by water and is left empty in the case of soybean (1IPK) and pecan. Korean pine and eggplant have glycerol and pyroglutamate, respectively, at this position.
It is interesting to observe that two different ligands bind in structurally similar pockets of Vic_CAPAN. When the two cupin folds of Vic_CAPAN were superimposed, they aligned with an RMSD of 4.146 Å, and the positions of the ligands in the binding pockets overlapped. Similarly, alignment of 5CAD with Vic_CAPAN revealed that the pyroglutamate position overlapped, with minimal deviation, with the salicylic acid-binding site. Both ligand-binding pockets in Vic_CAPAN are made predominantly of hydrophobic residues, forming hydrophobic pockets that can accommodate ring-like structures. However, there are a few polar and charged residues in each pocket that interact with the metabolite and help to stabilize it. Binding of salicylic acid in this pocket is mediated by hydrogen bonding between its carboxylic functional group and polar and charged residues (Y, N, K, R) of Vic_CAPAN. Similar interactions are seen with the acetic acid present in the 5CAD binding pocket.
Traditionally, proteins with known functions were the focus of structural exploration, as this could provide molecular insights into the mechanism of the function. Structural insights into functionally ill-defined proteins derived from their native source can help decipher their functions. Towards this end, the structure of the storage protein Vic_CAPAN from capsicum was determined. The structure revealed a bicupin fold with a conserved cupin domain at each terminus of the polypeptide chain, as reported for 7S globulin structures from different families. Given the high resolution of our data, we were able to identify a few differences in the sequence of Vic_CAPAN when compared with the genomic sequence of Zunla-1.
The copper-binding site present in Vic_CAPAN is similar to that of azurin and other proteins that actively participate in redox reactions. Many proteins that participate in electron transfer (e.g. azurin or plastocyanin) or in enzymatic reactions (superoxide dismutase and nitrite reductase) are known to cycle between the Cu(I) and Cu(II) oxidation states. The presence of copper in the Cu(I) state indicated that Vic_CAPAN could participate in redox reactions essential for its physiological function. Subsequently, we were able to demonstrate SOD activity for Vic_CAPAN. The copper center of CuZnSOD enzymes is already well studied. While known SODs have four histidines as part of the copper-binding site, the corresponding site of Vic_CAPAN has two histidines, one tyrosine and one cysteine residue, among which histidine could be a mediator of SOD activity in Vic_CAPAN. The extra density (resembling that of a water molecule) next to the tyrosine could mark the active site of Vic_CAPAN, as reported previously for other SOD structures. The proposed mechanism of Vic_CAPAN is given in Figure 8. SODs are known to have a crucial role in the elimination of reactive oxygen species produced during metabolic processes or in response to external stimuli. Vicilins could have a role in eliminating the increased oxidative stress during germination, when the rate of metabolic activity increases for the seed to grow. Apart from this, seeds are also subjected to different environmental stresses, and vicilins may help seeds withstand them.
Figure 8. Schematic illustration of the proposed mechanism for elimination of the oxygen radical.
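For orientation, the general two-step dismutation catalysed by copper-containing SODs can be written as below. This is the textbook CuZnSOD redox cycle, included here as an assumption about the underlying chemistry; it is not reproduced from Figure 8, and the residue-level steps proposed for Vic_CAPAN may differ.

```latex
\begin{align}
\mathrm{Cu^{2+}} + \mathrm{O_2^{\bullet-}} &\rightarrow \mathrm{Cu^{+}} + \mathrm{O_2} \\
\mathrm{Cu^{+}} + \mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^{+}} &\rightarrow \mathrm{Cu^{2+}} + \mathrm{H_2O_2} \\
\text{net:}\quad 2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^{+}} &\rightarrow \mathrm{O_2} + \mathrm{H_2O_2}
\end{align}
```

In this scheme the copper ion alternates between the Cu(II) and Cu(I) states, so a site observed as Cu(I) corresponds to the reduced half of the cycle.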
Two previously reported 7S globulin structures showed acetate bound in the N-terminal cupin fold [28,29]. Structural analysis of Vic_CAPAN also identified a ligand, possibly a metabolite, in the hydrophobic core of the protein, consistent with the previously reported SM80.1 and adzuki bean 7S globulin structures. These bound metabolites offer possible insights into the function of vicilin in metabolic pathways during germination. The bound salicylic acid observed in the present study indicates that vicilins are likely to have roles in different physiological pathways. This is the first time that salicylic acid has been identified bound to a vicilin in a crystal structure, and its presence was confirmed by mass spectrometry. Salicylic acid is a seven-carbon phenolic compound found in abundance within plant cells in both free and conjugated forms. It is synthesized through the phenylalanine or isochorismate pathway.
Salicylic acid is a vital plant hormone that modulates various signaling pathways and provides defense against biotic and abiotic stresses [37–39]. One pathway it is known to modulate is the antioxidant pathway in plants. One study found that exogenous application of salicylic acid rescued seeds from salinity stress, and that Arabidopsis mutants deficient in the sid2 gene (isochorismate synthase) germinated slowly. A detailed study revealed that salicylic acid rescues seeds from abiotic stress by enhancing their antioxidant capacity. Another independent study on Arabidopsis seeds showed increased expression of MnSOD upon exogenous application of salicylic acid to germinating seeds. It has further been reported that salicylic acid confers most of its abiotic stress tolerance on germinating seedlings by enhancing pathways that scavenge the reactive oxygen species generated during such stress.
Salicylic acid is also known to bind antioxidant enzymes such as carbonic anhydrase, catalase and ascorbate peroxidase. Recent studies also reveal that salicylic acid binds strongly to salicylic acid-binding protein-2 and stimulates its lipase activity. The presence of salicylic acid in Vic_CAPAN suggests that it could modulate the function of the vicilin in the context of its SOD activity. We further demonstrated that vicilins from other sources lack SOD activity, suggesting that the SOD activity could be a co-operative effect of salicylic acid binding to Vic_CAPAN. Previous studies of salicylic acid binding to proteins such as catalase, with consequent inhibition of their enzymatic activity, also point to an allosteric role for salicylic acid. Like salicylic acid, pyroglutamate is also related to the oxidative stress of the cell: it is an intermediate formed during the catabolism of glutathione. The binding of two metabolites that are directly or indirectly involved in maintaining the oxidative state of the cell indicates that Vic_CAPAN could be an important partner in regulating such physiological processes. It is well established that vicilins occur as different isoforms and are linked with various functional characteristics. These findings suggest that vicilins could be paralogous proteins that perform different functions in different forms in their respective species.
Both of our findings, SOD activity associated with a copper-binding site and binding of salicylic acid, a modulator of the oxidative system in the cell, indicate that vicilins are far more than nutritional reserves. Comparative structural analysis of the binding pockets of vicilins in the Protein Data Bank also shows that almost all reported vicilins have extra density in this pocket. These densities could accommodate salicylic acid or similar metabolites, which could in turn modulate the activity of vicilins. Such sites are often occupied by buffer reagents, implying that they are potential sites for small ligands that could be relevant metabolites when the protein is active. The present study reports the first crystal structure of native vicilin from C. annuum bound to salicylic acid and a copper ion in the Cu(I) oxidation state. We then identified SOD activity in Vic_CAPAN, and our analysis suggests that the copper site and the binding of salicylic acid could be responsible for this activity. In addition, the salicylic acid-binding residues, asparagine and arginine, were found to be conserved among all vicilins. Combining this with the lack of SOD activity in other purified vicilins, we postulate that salicylic acid has an important co-operative effect in modulating Vic_CAPAN activity. These findings suggest that members of the 7S globulin family can bind different ligands that modulate their activity, opening up exciting avenues for exploring the functions of 7S globulins.
The structure determined in the present study has been deposited in the Protein Data Bank (PDB ID: 5YJS).
D.M.S. and M.S. conceived and designed the experiments. D.T.N. and M.S. solved the structure. M.S. performed the biochemical experiments. D.M.S., D.T.N. and M.S. wrote the manuscript.
We thank the EMBL staff Dr Hassan Belrhali and Dr Babu A. Manjasetty for providing support on the beamline and EMBL-DBT for providing access to the BM14 beamline at the ESRF. We thank Dr Sasha Popov for providing support during XANES data collection at the ID30B beamline. We thank the mass spectrometry facility and staff at RCB and SCIEX. We also acknowledge financial support from the Department of Biotechnology, Government of India, and a fellowship from the Council of Scientific & Industrial Research (CSIR). We thank Dr Abha Jain, Dr Tushar K. Maiti and Dr Nirpendra Singh for their suggestions during experiments.
The Authors declare that there are no competing interests associated with the manuscript.