The reader ability of PHD fingers is largely limited to the recognition of the histone H3 N-terminal tail. Distinct subsets of PHDs bind either H3K4me3 (a transcriptional activator mark) or H3K4me0 (a transcriptional repressor state). Structural studies have identified common features among the different H3K4me3 effector PHDs, including (1) removal of the initiator methionine residue of H3 to prevent steric interference, (2) a groove where arginine-2 binds, and (3) an aromatic cage that engages methylated lysine-4. We hypothesize that some PHDs might be able to engage non-histone ligands, as long as those ligands adhere to these three rules. A search of the human proteome revealed an enrichment of chromatin-binding proteins that met these criteria, which we termed H3 N-terminal mimicry proteins (H3TMs). Seven H3TMs were selected and used to screen a protein domain microarray for potential effector domains, and all seven bound H3K4me3-interacting effector domains. Furthermore, the binding affinity between the VRK1 peptide and the PHD domain of PHF2 is ∼3-fold stronger than that of the PHF2–H3K4me3 interaction. The crystal structure of the PHF2 PHD finger bound to the VRK1 K4me3 peptide provides a molecular basis for the stronger binding of the VRK1 peptide. In addition, a number of the H3TM peptides, in their unmethylated form, interact with the NuRD transcriptional repressor complex. Our findings provide in vitro evidence that methylation of H3TMs can promote interactions with PHD and Tudor domain-containing proteins and potentially block interactions with the NuRD complex. We propose that these interactions can occur in vivo as well.
Serving as a central hub and carrying a variety of histone post-translational modifications (PTMs), histone N-terminal tails orchestrate the recruitment of protein complexes that change the chromatin landscape at the target gene loci [1–5]. A combination of different PTMs on histone tails, including methylation, acetylation, and phosphorylation, has been termed the ‘histone code’ [1,5]. Single or combinatorial histone codes dictate the formation of specific chromatin-associated protein complexes that are involved in chromatin remodeling and transcriptional regulation. Histone H3K4me3 is found at the transcription start site (TSS) of almost every active gene, and is therefore recognized as a transcriptional activation mark . Transcriptional initiation encoded by H3K4me3 is accomplished by the direct recruitment of reader modules from the transcription machinery and chromatin remodeling complexes [7–9]. There are two major H3K4me3 recognition module families: the royal superfamily of Chromo and Tudor-domains, and the PHD-finger family [10–14]. This study is focused on the PHD-finger superfamily because of its unique ability to recognize the tip of histone tails in a sequence-specific and modification-specific manner.
PHD fingers are characterized by a conserved zinc-coordinating Cys4-His-Cys3 motif. On one hand, PHD fingers exhibit high sequence variability, which provides a range of binding specificities towards H3K4me0 and H3K4me2/3, the two major PHD ligand classes [7,16–19]. On the other hand, PHD fingers share two adjacent ligand-binding sites that help stabilize the positively charged side chains of R2 and K4 within the histone H3 sequence (ARTKS) [16–19]. A clear example is the BPTF PHD finger, in which H3 R2 is anchored in place by hydrogen bonds and electrostatic interactions with the negatively charged D27, while H3 K4me3 is stabilized by van der Waals and cation-π interactions within an aromatic cage. Notably, the free N-terminal amine forms a pair of hydrogen bonds with adjacent backbone carbonyls, enhancing the binding specificity of BPTF–H3K4me3. The free N-terminus of histone H3 is generated by enzymes that cleave the initiator methionine (iMet).
Initiator methionine cleavage is a co-translational process that is mediated by the methionine aminopeptidases (MAP). There are two MAPs in yeast and humans, MAP1 and MAP2 [21,22]. Around two-thirds of proteins in the human proteome undergo initial methionine cleavage. The specificity of this process is determined by the size of the amino acid that is immediately adjacent to the iMet . The initiator methionine is usually cleaved when the next residue has a small side chain: glycine, alanine, serine, threonine, cysteine, proline, and valine. Histone H3 has alanine at the N-terminus; therefore, its iMet is cleaved, generating a free N-terminal amino group that stabilizes the H3–PHD finger interactions.
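The size rule described above can be written as a short predicate. The following is a minimal sketch in Python, assuming only the seven permissive residues listed in the text; the function names are hypothetical, and the rule ignores any sequence context beyond the second residue.

```python
# Residues with small side chains that, per the text, permit iMet removal:
# glycine, alanine, serine, threonine, cysteine, proline, and valine.
SMALL_RESIDUES = set("GASTCPV")


def imet_is_cleaved(sequence: str) -> bool:
    """Predict iMet removal from the residue that follows the initiator Met."""
    seq = sequence.upper()
    return len(seq) >= 2 and seq[0] == "M" and seq[1] in SMALL_RESIDUES


def mature_n_terminus(sequence: str, length: int = 4) -> str:
    """Return the first `length` residues of the predicted mature protein."""
    seq = sequence.upper()
    start = 1 if imet_is_cleaved(seq) else 0
    return seq[start:start + length]


# Histone H3 begins M-A-R-T-K, so the predicted mature N-terminus is ARTK.
h3_tail = mature_n_terminus("MARTKQTARKS")
```

For a sequence whose second residue is bulky (for example a hypothetical `MKLV...`), the predicate returns False and the methionine is retained in the predicted N-terminus.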
Non-histone proteins with a histone-like sequence motif are called histone mimics. These proteins can mimic the interaction of the H3 amino-terminal tail with modification-specific recognition modules [24–26]. One example is the histone H3 methyltransferase G9a, which carries a 163ARKT166 sequence that is similar to the 7ARKS10 of histone H3. Like H3K9, the K165 residue can be self-methylated by G9a, and K165me2/3 interacts with the H3K9me2/3 effector HP1 [25,27,28].
Another example is the nonstructural protein 1 (NS1) of the influenza A virus. NS1 has a C-terminal 226ARSK229 sequence that resembles the N-terminal 1ARTK4 of histone H3. This C-terminal sequence of NS1 interacts with the human PAF1 transcription elongation complex (hPAF1c), suppressing antiviral gene expression by blocking hPAF1c-mediated transcription elongation . The dimethylated K229 of NS1, like dimethylated H3K4, interacts with the double chromodomains of chromatin remodeling protein CHD1 , and the CW domain of the MORC3 ATPase . In addition, we have identified an internal H3-like 173ARTK176 motif in PRMT6 that can be automethylated at the R residue. Notably, all the H3-mimicry proteins discussed above do not have the histone mimetic sequence at their N-termini, which could potentially undermine their ability to hijack PHDs that prefer binding the H3 N-terminal tail, and require iMet cleavage of their ligands.
This study seeks to address whether there are proteins in the human proteome that have an H3-like N-terminal motif and whether they can interact with known H3K4me3-reader proteins in a methylation-dependent manner, which could expand the PHD interaction network. H3 N-terminal mimicry proteins (H3TMs) are defined as proteins that start with an iM-Z1-R2-X3-K4 sequence motif. In this motif, Z1 represents one of the seven smaller amino acids that allows for iMet cleavage, and X3 represents any amino acid. First, we searched the database for proteins that harbor this relaxed motif when it is positioned just after the iMet residue, and we identified 48 H3TMs. We chose seven of these H3TMs as candidates for further in vitro validation experiments. Next, we used an in-house protein domain microarray, which harbors roughly 200 potential methyl-effector domains, to identify trimethylation-specific interaction profiles between the seven H3TMs and eight arrayed PHD fingers (and one Tudor domain-containing protein). We validated a subset of these interactions by peptide pull-down of recombinant GST-fusion proteins and their endogenous counterparts from total cell lysates. Using the same pull-down approach, we further show that a number of the unmethylated H3TMs interact with the NuRD complex. We also explored the possibility that the K4 site of H3TMs could be post-translationally modified by three known H3K4 methyltransferases: Set7/9, PRDM9 and the MLL1 complex. Furthermore, structural studies reveal how the PHD of PHF2 interacts with one of the methylated H3TMs (VRK1). This body of work demonstrates that K4 methylation of H3TMs promotes a new interaction network with PHD fingers, and potentially blocks their interactions with the repressive NuRD complex.
Materials and methods
Generation of methyl reader domain array and peptide synthesis
The current methyl reader domain array contains 108 PHD, 40 Tudor, and 31 Chromo domains. Protein domain microarray generation and probing have been published . Briefly, recombinant proteins were synthesized and cloned into pGEX-4T-1 vector by Biomatik Corporation. These GST-fused reader domains were subsequently expressed, purified, and spotted in duplicate onto nitrocellulose-coated glass slides (Oncyte Avid slides, Grace Bio-Labs) using a pin arrayer (Aushon 2470, Quanterix). For probing, microarray slides were blocked with 3% milk, 3% bovine serum albumin, 0.1% Tween 20 in PBS.
Methylated and unmodified forms of the following peptides were supplied by CPC Scientific Inc (China):
VRK1 (2–13): PRVK(me3)AAQAGRQS-Biotin
BCL11B (2–13): SRRK(me3)QGNPQHLS-Biotin
TSHZ1 (2–13): PRRK(me3)QQAPRRSA-Biotin
HIVEP1 (2–13): PRTK(me3)QIHPRNLR-Biotin
PCLAF (2–13): VRTK(me3)ADSVPGTY-Biotin
MAPK8 (2–13): SRSK(me3)RDNNFYSV-Biotin
MEF2C (2–13): GRKK(me3)IQITRIMD-Biotin
HIRIP3 (2–13): AREK(me3)EMQEFTRS-Biotin
Histone H3 (1–18): ARTK(me3)QTARKSTGGKAPRK-Biotin
Biotinylated peptides were pre-labeled with streptavidin-Cy3 fluorophore (GE Healthcare) and incubated with the blocked array slides. Slides were then washed with PBST and allowed to air dry. Fluorescent interactions were visualized using a GenePix 4200A Microarray Scanner (Molecular Devices) and quantified using the Mapix software platform (Innopsys Inc).
Peptide pull-down assays
Biotinylated peptides (6 µg) were immobilized on 25 µl of streptavidin beads (Sigma, St. Louis, MO, U.S.A.) in 500 µl of binding buffer (50 mM Tris–HCl pH 7.5, 150 mM NaCl, 1 mM EDTA, 1 mM EGTA, 0.5% NP-40) at 4°C for 1 h. The beads were washed three times with binding buffer and incubated with 10 µg of GST fusion protein or 30 µg of 293T cell lysate for 1 h with rotation at 4°C. After three washes with binding buffer, the beads were boiled in protein loading buffer.
Western blot analysis was performed as described previously. Primary antibodies against PHF2 and MTA1 were purchased from Cell Signaling (Danvers, MA). The anti-ING2 antibody was purchased from Proteintech Group (Chicago, IL). The anti-GST antibody was purchased from Covance (Berkeley, CA). The anti-SPIN1 antibody and HRP-conjugated streptavidin were purchased from ThermoFisher Scientific (Rockford, IL).
In vitro methylation assay
The GST-Set7/9 was expressed and purified, as described previously . The MLL1 complex and PRDM9 protein were purchased from Reaction Biology Corp (Malvern, PA) and Active Motif (Carlsbad, CA), respectively. In vitro methylation reactions were performed in a final volume of 30 µl of 50 mM Tris–HCl (pH 8.5), 5 mM MgCl2, 4 mM DTT, and 0.42 µM 3H-labeled S-adenosyl-L-[methyl 3H] methionine (PerkinElmer Life Sciences) [33,34].
Isothermal titration calorimetry
GST-tagged human PHF2 (residues 1–451; pXC816 or residues 1–71; pXC2227) was expressed in BL21(DE3)CodonPlus cells using auto-induction medium. Briefly, an overnight cell culture in MDAG medium was inoculated into ZYM-5052 medium and cultured at 37°C until the OD600 reached 1. The temperature was then lowered to 22°C and the cells were cultured for 18 h for autoinduction of the target protein. Protein was purified as previously described. Two peptides representing the N-termini of VRK1 (PRV[Kme3]AAQAGRQS) and BCL11B (SRR[Kme3]QGNPQHLS) were purchased from BioMatik. Isothermal titration calorimetry (ITC) experiments were performed with a MicroCal PEAQ-ITC automated system (Malvern). Experiments were conducted at 25°C with a reference power of 10 µcal/s. The VRK1 or BCL11B peptide (250 µM) was titrated in nineteen injections (an initial injection of 0.2 µl followed by eighteen injections of 2 µl each) into 20 µM PHF2 (residues 1–451) in 300 mM NaCl, 20 mM HEPES pH 8.0, 0.5 mM TCEP and 5% glycerol. ITC data were fitted to a ‘one site’ model with the offset subtracted, and binding constants were calculated using the ITC analysis module supplied by the manufacturer.
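As a rough check on the titration design described above, the cumulative peptide-to-protein molar ratio can be estimated after each injection. The sketch below takes the concentrations and injection volumes from the text; the 200 µl cell volume is an assumption (it is not stated in the text), and the calculation ignores the volume displaced from the cell during injection.

```python
def injection_volumes(first_ul=0.2, n_rest=18, rest_ul=2.0):
    """Injection schedule from the text: one 0.2 µl injection, then eighteen of 2 µl."""
    return [first_ul] + [rest_ul] * n_rest


def cumulative_molar_ratio(volumes_ul, syringe_um=250.0, cell_um=20.0,
                           cell_volume_ul=200.0):
    """Peptide:protein molar ratio after each injection.

    cell_volume_ul is an ASSUMED value, not taken from the text.
    Displacement of cell contents by the injected volume is neglected.
    """
    ratios = []
    injected_ul = 0.0
    for volume in volumes_ul:
        injected_ul += volume
        ratios.append((syringe_um * injected_ul) / (cell_um * cell_volume_ul))
    return ratios


volumes = injection_volumes()
ratios = cumulative_molar_ratio(volumes)
```

Under these assumptions the titration ends at a molar ratio slightly above 2, which is enough to pass through a 1:1 binding transition.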
Concentrated PHD domain of PHF2 (10–20 mg/ml in 200 mM NaCl and 20 mM Tris–HCl, pH 7.5) was mixed with the VRK1 peptide at an approximate 1 : 1.5 molar ratio (PHD/peptide) for ∼1 h on ice. An Art Robbins Gryphon Crystallization Robot was used to set up 0.4 µl sitting drops (0.2 µl of complex plus 0.2 µl of well solution) at ∼19°C with screens from Hampton Research and Molecular Dimensions. The crystals initially observed were conglomerations of needles and plates. Subsequent crystallization by the hanging drop technique in 3.5 M sodium formate and 100 mM Tris–HCl (pH 8.5) generated larger crystals, which were still not single. The crystals were manually broken into smaller single pieces, picked up in nylon loops and directly flash frozen in liquid nitrogen, as high concentrations of organic salts are themselves cryoprotectants (PMID: 20606259).
Crystals were screened and X-ray diffraction data were collected at the SER-CAT beamline 22ID of the Advanced Photon Source at Argonne National Laboratory. Crystallographic datasets were processed with HKL2000 . Molecular replacement was performed with the PHENIX PHASER module  by using the known structure of the PHD domain (PDB ID 3KQI) as a search model. Structure refinement was performed with PHENIX Refine  with 5% randomly chosen reflections for the validation by the Rfree value. The crystallographic temperature B-factors for individual atoms (except for water) were refined anisotropically in final refinement rounds. COOT  was used for the initial manual building and corrections between refinement rounds. Structure quality was analyzed during PHENIX refinements and finally validated by the PDB validation server. Molecular graphics were generated by using PyMol (Schrodinger, LLC).
Bioinformatics search for histone H3 N-terminal mimicry proteins
To investigate whether a certain class of protein might be enriched for H3TMs, we focused on the more than 20 000 proteins that are functionally annotated in the Panther database (http://pantherdb.org/) (Figure 1A). To identify H3 N-terminal mimicry proteins that undergo initiator methionine cleavage and contain RxK motifs, we searched for M-Z-R-X-K motif-containing proteins using the Motif Search database (https://www.genome.jp/tools/motif/MOTIF2.html). Z represents alanine, glycine, serine, cysteine, threonine, proline, or valine, which is a critical component of this motif because these seven amino acids are permissive for initiator methionine cleavage [23,41,42]. X represents any of the 20 amino acid residues. We excluded histone H3 variants from the results and identified 257 M-Z-R-X-K motif-containing proteins, which harbored this motif either in the body of the protein (Figure 1B) or at the protein's N-terminus (Figure 1C). About one-fifth of these proteins have the motif at their N-terminus; specifically, 48 proteins possessed the motif at their N-termini (Figure 1C and Supplementary Table S1), while the remaining 209 proteins possessed the motif in internal regions (Figure 1B). Functional annotation shows that the slice of the pie chart represented by ‘chromatin-binding protein’ does not increase if the M-Z-R-X-K motif is internal (Figure 1A,B). However, when the iM-Z-R-X-K motif was restricted to the N-terminus of proteins, there was a dramatic enrichment in the fraction of proteins that are functionally annotated as ‘chromatin-binding protein’ (Figure 1C, blue wedge). These results reveal selective enrichment of H3TM proteins in the ‘chromatin-binding protein’ class, which indicates that these types of protein (transcription factors and transcriptional co-regulators) may engage the PHD interaction network.
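The M-Z-R-X-K search described above can be illustrated with a regular expression. This is a minimal sketch, not the query syntax of the Motif Search tool itself: the function name is hypothetical, the example sequences are short N-terminal fragments (VRK1 and H3 fragments are taken from the peptide sequences in the Methods with the initiator methionine restored; the other two are made up for illustration), and a protein is counted as N-terminal whenever the motif starts at residue 1.

```python
import re

# M-Z-R-X-K: Z is one of the seven iMet-permissive residues, X is any residue.
MOTIF = re.compile(r"M[AGSCTPV]R.K")


def classify_motif(sequence):
    """Return 'N-terminal', 'internal', or None for a protein sequence."""
    seq = sequence.upper()
    if MOTIF.match(seq):      # re.match anchors at the first residue (iMet)
        return "N-terminal"
    if MOTIF.search(seq):     # motif found elsewhere in the sequence
        return "internal"
    return None


examples = {
    "VRK1_fragment": "MPRVKAAQAGRQS",
    "H3_fragment": "MARTKQTARKSTGGKAPRK",
    "toy_internal": "MKLLDEMGRAKQ",   # hypothetical sequence
    "toy_none": "MKLLDE",             # hypothetical sequence
}
classes = {name: classify_motif(seq) for name, seq in examples.items()}

# Fraction of motif-containing proteins with the motif at the N-terminus,
# using the counts reported in the text (48 of 257).
n_terminal_fraction = 48 / 257
```

With the reported counts, the N-terminal fraction is about 0.19, consistent with the "about one-fifth" stated in the text.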
Identification of H3 N-terminal mimicry proteins.
Phylogenetic analysis of the 48 proteins that harbor iM-Z-R-X-K motifs identified several families of transcription factors with conserved N-terminal sequences: MEF2A/B/C/D, SALL1/2/3/4, BCL11A/B, FOG1/2, and TSHZ1/2/3 (Figure 1D). Notably, SALL1, FOG1 [44,45], and BCL11A/B [46,47] are known to interact with the NuRD corepressor complex via their N-termini. We chose to perform follow-up studies on one protein from each of three of these families: MEF2C, BCL11B, and TSHZ1. VRK1 is a nuclear serine/threonine kinase that phosphorylates several essential proteins including p53/TP53, histone H3, BANF1, and ATF2 [30,48–50]. Importantly, VRK1 K4me1 has been identified using mass spectrometry (MS), implying that the K4 residue is likely a target for methylation in vivo. PCLAF interacts with PCNA to regulate DNA repair during DNA replication [52,53]. MAPK8 is another serine/threonine kinase, which is involved in cell proliferation, differentiation, migration, and apoptosis [54–59]. HIVEP1 is a zinc-finger protein that binds to the HIV-1 enhancer to activate HIV-1 gene expression. Thus, we selected seven proteins from distinct families of transcription factors/co-regulators as H3TM candidates for further evaluation, because of their distribution across the phylogenetic tree and their roles in transcriptional regulation: VRK1, BCL11B, TSHZ1, MEF2C, PCLAF, MAPK8, and HIVEP1 (Figure 1E). We then screened for potential novel interactions between these candidates and PHD finger proteins.
Identification of H3TMs interactions with methyllysine-effector domains
To identify potential interactions between the seven candidate proteins and methyl-binding domains, we synthesized N-terminal peptides (1–12 amino acids excluding the initial methionine) with and without tri-methylation at the K4 position. We used these biotinylated peptides as probes to interrogate an in-house methyl-reader microarray that harbors GST fusions of known and predicted binders of methyl-lysine and methyl-arginine marked motifs . The current array contains 108 PHD, 40 Tudor, and 31 Chromo domains. As positive and negative controls, the histone H3K4me3 peptide interacted with known H3K4me3-interacting PHD finger domains and Tudor domains, while the unmodified peptide did not (Figure 2, top left panel). Interestingly, each of the seven peptide probes that harbored the K4me3 mark (Figure 2), displayed a unique interaction profile with known H3K4me3-interacting PHD finger domains , and many of these peptides also bound the Tudor domain-containing protein SPIN1, which is a well-characterized H3K4me3 reader . The relative intensities of these interactions on the arrays were quantified (Supplementary Figure S1). Notably, compared with the other probes, VRK1 and BCL11B had a greater number of interacting reader domains. When the K4 position was tri-methylated, both VRK1 and BCL11B interacted with PHF2, TAF3, and KDM7A PHD fingers; however, VRK1 selectively interacted with DIDO1, ING1/2, and MLL5 PHD fingers, while BCL11B selectively interacted with ING3 and PHF23 PHD fingers. The TSHZ1 K4me3 peptide bound the PHD fingers of TAF3 and ING3, and PCLAF only recognized KDM7A. All the PHD fingers listed above have been shown to bind the histone H3K4me3 mark [7,8,19,64–67]. Also, as six of the seven methylated peptide probes interacted well with the arrayed SPIN1 Tudor domains, we included this domain as a positive control in the downstream validation experiments.
Identification of novel methylation-dependent interactions between H3TMs and PHD fingers.
Validation of novel interactions using in vitro interaction assays
We used a peptide pulldown assay to confirm a small subset of the interactions we observed on the protein domain microarrays, and to validate the methyl-dependent nature of this binding. GST fusions of the PHD domains from PHF2, ING2 and ING3, and of the Tudor domains of SPIN1, were used to perform pulldowns with biotinylated VRK1, BCL11B, TSHZ1 and histone H3 peptide pairs (methylated and unmethylated). The PHF2 PHD finger interacted strongly with methylated peptides derived from VRK1, BCL11B, and H3, whereas it interacted only weakly with the methylated form of TSHZ1. The ING2 PHD finger only recognized methylated VRK1 and H3 peptides. ING3 showed methylation-dependent interactions with TSHZ1 and H3, but could also interact with VRK1 in a methyl-independent fashion. SPIN1 interacted with VRK1, BCL11B, TSHZ1, and H3 peptides in their K4me3 form (Figure 3A).
Validation of novel interactions using in vitro interaction assays.
Next, we asked if the same peptide sets could be used to pull down the methyl-effector proteins from whole-cell lysates. We reasoned that an isolated recombinant protein domain may behave differently from the same domain in the context of the full-length endogenous protein. Using the VRK1 peptides for pulldowns from whole-cell lysates, and antibodies that recognize endogenous PHF2, ING2 and SPIN1, we found that endogenous PHF2, ING2 and SPIN1 were indeed enriched with the methylated peptide. The BCL11B methyl-peptide enriched for endogenous SPIN1 and, very weakly, for endogenous PHF2. The TSHZ1 methyl-peptide only enriched for endogenous SPIN1 (Figure 3B).
A number of the H3TMs listed in the phylogenetic tree (Figure 1D) can bind the NuRD repressor complex through their N-termini. For example, SALL1, FOG1 [44,45], and BCL11A/B [46,47] all possess this property, and it is thought that methylation at their K4 position will block this interaction, just as it does for the interaction between NuRD and histone H3. It has been shown that MTA1 is a core subunit of the NuRD complex, and that the C-terminal region of MTA1 (454–715) recognizes the N-terminus of BCL11B. In addition, the tandem PHD fingers of CHD4, which is also found in the NuRD complex, bind unmethylated histone H3 tails. We next wanted to determine whether K4me3 can disrupt the interactions between the NuRD complex and VRK1, BCL11B, and TSHZ1. To do this, we again used the peptides (with and without methylation) to perform pulldowns from whole-cell lysates, and we probed these pulldowns with antibodies against MTA1 and RBAP46/48 (components of the NuRD complex). As expected, we found that BCL11B and histone H3 peptides engage NuRD when unmethylated, and so does TSHZ1, which has not been reported before. The unmethylated VRK1 peptide did not interact with the NuRD complex (Figure 3B). Thus, just like histone H3, a number of H3TMs can switch from binding activator complexes (in their methylated state) to binding the NuRD repressor complex (in their unmethylated state).
In our pulldown assays performed from whole-cell lysates, we noticed that substantially more PHF2 was enriched using the VRK1-K4me3 peptide than using the H3K4me3 peptide (Figure 3B). A PHF2 fragment (residues 1–451) containing both the PHD domain and the Jumonji domain has been reported to have a binding affinity (KD) of 230 nM for the H3K4me3 peptide, and we performed isothermal titration calorimetry (ITC) to determine the binding affinity of the same PHF2 fragment for the methylated N-terminus of VRK1. As suggested by the pulldown, we observed an almost three-fold stronger binding of PHF2 to VRK1 than to H3K4me3 (90 nM vs 230 nM). The binding affinity of PHF2 for BCL11B was much weaker, with a KD greater than 7 µM (Figure 4). Thus, a myriad of interactions can potentially occur with H3TMs, in either their methylated or unmethylated states, and the high binding affinity between the VRK1 K4me3 peptide and the PHF2 PHD domain suggests a competition model in which PHF2 might be recruited to chromatin by H3K4me3 and then handed off to VRK1-K4me3.
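To put the two dissociation constants on a common scale, they can be converted to binding free energies using ΔG = RT ln(KD). The sketch below is a back-of-the-envelope calculation, not part of the reported analysis: it uses the KD values quoted above, assumes a 1 M standard state, and uses 298.15 K to match the 25°C ITC temperature given in the Methods.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    return R_KCAL * temp_k * math.log(kd_molar)


kd_h3k4me3 = 230e-9   # PHF2(1-451) with H3K4me3, reported value (M)
kd_vrk1 = 90e-9       # PHF2(1-451) with VRK1-K4me3, measured by ITC (M)

fold_stronger = kd_h3k4me3 / kd_vrk1                 # ratio of the two KD values
ddg = delta_g(kd_vrk1) - delta_g(kd_h3k4me3)         # negative: VRK1 binds tighter
```

The ratio comes to roughly 2.6, and the corresponding free-energy difference is about half a kcal/mol in favor of VRK1-K4me3, a modest but measurable preference.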
Investigating the strength of the interaction between PHF2 and the N-termini of VRK1 and BCL11B.
Structural comparison of PHF2 PHD binding to VRK1-K4me3 and histone H3K4me3
We next sought to understand how PHF2 recognizes the VRK1 tri-methylated peptide. Accordingly, we purified the PHD domain of PHF2 (residues 1–70) and grew co-crystals with the VRK1 peptide (residues 1–12). The complex crystallized in space group I222, and the structure was determined to a resolution of 1.2 Å (Table 1). Although 12 residues of the VRK1 peptide were used for crystallization, we observed only the N-terminal seven residues (Figure 5A). As with other structurally characterized PHD domains, the VRK1 peptide lies in an extended conformation on an acidic surface of the PHD (Figure 5A), and is inserted antiparallel to strand β3 of the PHD via main-chain hydrogen bonds (Figure 5B), effectively expanding the two-stranded sheet (β3 and β4) into a three-stranded β sheet. The structure is highly similar to that of the previously determined PHD–H3K4me3 peptide complex, with a root-mean-square deviation of <0.5 Å when pairwise superimposed (Figure 5C). The largest difference between the two bound peptides lies in the interactions with the first residue. In VRK1, the bulkier proline forms van der Waals contacts with two hydrophobic residues, Ile21 and Ile45 (Figure 5D). In addition, the main-chain carbonyl oxygen of Ile45 interacts with the Cδ carbon of the pyrrolidine ring via an O···H-C type hydrogen bond. We note that the secondary amine of the VRK1 proline is not involved in direct interaction with the protein. In the histone H3 peptide, the N-terminal amino group forms two hydrogen bonds with the backbone carbonyl oxygen atoms of Ile45 and Asp46 (Figure 5E).
Structure of PHD domain of PHF2 (residues 1–70) in complex with VRK1 peptide.
| PHF2 with VRK1 peptide | PDB 7M10 |
| Cell dimensions (Å) | 34.24, 63.72, 103.15 |
| α, β, γ (°) | 90, 90, 90 |
| Unique reflections | 33 230 (1839) (991) |
| Observed reflections | 316 530 |
| No. reflections | 33 196 |
| B factors (Å2) | |
| Bond lengths (Å) | 0.004 |
| Bond angles (°) | 0.8 |
*Values in parentheses correspond to the two highest resolution shells. The shell of 1.19–1.15 Å has ∼25% completeness and its 991 reflections were included in the refinement.
Rmerge = Σ|I − <I>|/ΣI, where I is the observed intensity and <I> is the averaged intensity from multiple observations.
<I/σI> = averaged ratio of the intensity (I) to the error of the intensity (σI).
Rwork = Σ|Fo − Fc|/Σ|Fo|, where Fo and Fc are the observed and calculated structure factors.
Rfree was calculated using a randomly chosen subset (5%) of the reflections not used in refinement.
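The footnoted definitions of Rmerge and Rwork can be illustrated with toy numbers. This is a minimal sketch on made-up values, not the HKL2000 or PHENIX implementation; the function names and the data layout (a dictionary of repeated intensity measurements per reflection) are assumptions chosen for illustration.

```python
def r_merge(observations):
    """Rmerge = sum|I - <I>| / sum(I) over all repeated measurements.

    observations maps a reflection index (h, k, l) to its list of
    measured intensities.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """R = sum|Fo - Fc| / sum|Fo| for paired structure-factor amplitudes."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Toy data (hypothetical values, for illustration only).
toy_observations = {(1, 0, 0): [100.0, 110.0], (0, 2, 0): [50.0, 50.0]}
toy_r_merge = r_merge(toy_observations)
toy_r = r_factor([10.0, 20.0], [9.0, 22.0])
```

Applying `r_factor` to the 95% working set gives Rwork, while applying it to the 5% of reflections held out of refinement gives Rfree.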
As noted above, the positively charged guanidino group of VRK1 R2 interacts with the negatively charged Glu22 of PHF2 (Figure 5F). VRK1 V3 forms van der Waals contacts with Ile21 and Phe19 (Figure 5G). The trimethylated K4 of VRK1 is bound in a three-sided cage formed by Tyr14, Met20 and Trp29, and capped by Tyr7 of PHF2 (Figure 5H,I). Glu39 of PHF2 forms two weak interactions with A5 (via the main-chain Cα atom) and A6 (via the main-chain amide nitrogen) of VRK1 (Figure 5J). The main-chain amide nitrogen and carbonyl oxygen of VRK1 A5 are also involved in a network of water-mediated interactions (Figure 5J). The hydrophobic Cα and Cβ carbon atoms of VRK1 A6 are in van der Waals contact with Thr17 of PHF2 (Figure 5K). The last ordered side chain, that of VRK1 Q7, has a water-mediated intramolecular interaction with the main-chain carbonyl oxygen of A5 (Figure 5L) and a van der Waals contact with Val16 of PHF2 (Figure 5M). In sum, PHF2 makes extensive interactions with the first seven residues of VRK1, involving electrostatic interactions, hydrogen bonds, an aromatic cage, and van der Waals contacts with both side-chain and main-chain atoms.
The K4 residue of three H3TMs can be methylated by H3K4 methyltransferases in vitro
Importantly, VRK1-K4me1 has been identified using MS, implying that the K4 residue is likely a target for methylation in vivo. In addition, three independent MS studies performed by Cell Signaling Technology (CST), and publicly available through the PhosphoSitePlus website (www.phosphosite.org), have also identified mono-methylation at the VRK1-K4 position. All of these reported studies focused on enriching for Kme1 peptides, either using methyl-specific Kme1 antibodies (CST) or using a unique chemical derivatization approach followed by antibody enrichment. These studies suggest that VRK1 is indeed targeted for methylation at its K4 position, just like H3K4, and that higher degrees of methylation (me2/3) may occur.
To experimentally test whether H3TMs have the potential to be methylated by enzymes that modify the H3K4 site, we set up a number of in vitro methylation reactions. Three different lysine methyltransferases that have been reported to deposit methyl marks at the histone H3K4 position were used: Set7/9, the MLL1 complex [73,74], and PRDM9. We performed in vitro methylation using the same set of peptide pairs used in previous experiments. For each in vitro methylation reaction, enzyme and 3H-labeled SAM in the absence of peptide served as a negative control, while the H3K4 peptide served as a positive control. We showed that Set7/9 could methylate unmodified peptides from all three candidate proteins (VRK1, BCL11B and TSHZ1), but could not methylate the K4me3 peptides, indicating that the methylation is focused at the K4 sites (Figure 6). MLL1 and PRDM9 methylated only VRK1, and with moderate activity compared with the H3 peptide, suggesting that VRK1 is a weaker substrate than histone H3 for MLL1 and PRDM9 in vitro. These data support the hypothesis that at least a subset of H3TMs can be targeted by both the readers and writers of histone H3K4me3.
Three H3TMs can be methylated at Lys4 by H3K4 Methyltransferases.
In this study, we proposed the concept of histone H3 N-terminal mimicry and provided in vitro biochemical evidence that H3TM N-terminal K4me3 serves as a molecular switch that engages different protein interaction networks. These findings may dramatically expand the scope of functions that have been bestowed on methyl-effector proteins and the NuRD complex beyond the original histone code hypothesis, by providing evidence that PHD fingers (and the Tudor domain of SPIN1) recognize not only histone proteins, but also non-histone proteins.
For these findings to be relevant, they of course also need to occur in cells and in vivo. However, we did not find direct evidence for the existence of the H3TM-K4me3 methylation state in vivo. This is not through lack of trying. First, using MS (data not shown), we were unable to map the N-terminal PTMs for the six tested H3TM candidates (VRK1, BCL11B, MEF2C, TSHZ1, PCLAF and MAPK8), because, like histone H3, the H3TMs are enriched for lysine or arginine residues in the vicinity of the K4 residue, generating tryptic peptides that are often very short and incompatible with MS analysis. Direct evidence for the occurrence of H3TM K4me3 will likely come from middle-/top-down mass spectrometry studies. Second, we developed methyl-specific antibodies to K4me3 sites on VRK1, BCL11B and TSHZ1 (data not shown). While these antibodies selectively recognize the K4me3 peptides, and not the K4me0 peptides, in dot-blot assays, they were unable to detect these three endogenous H3TMs. This could be due to a number of reasons: (1) the antibodies perform poorly against the endogenous proteins; (2) the K4me3 mark occurs at very low levels, which makes it difficult for the antibodies to detect; (3) the H3TMs may carry K4me1/2 marks that are not detected by our K4me3 antibodies; and (4) the K4me3 mark on these H3TMs may be cell type specific, tissue specific, cell cycle specific, or development specific, and we have missed it.
Although the cellular context of K4me3 remains unknown, the N-termini of H3TMs are disordered (using https://iupred2a.elte.hu/) as the H3 N-terminus is, implying that the N-terminal regions need to be stabilized by PTMs or by interacting with binding partners . Indeed, H3TMs with unmodified N-terminal tails mimic unmodified histone H3. Several of the 48 H3TM candidates interact with the repressive NuRD complex [43–47,78,79]. Interestingly, the NuRD complex also recognizes the unmodified histone H3 N-terminus, but this interaction is disrupted by H3K4 trimethylation [68,80]. K4me3 prohibits the chromatin-association of the repressive NuRD complex, regulating local chromatin status to facilitate transcriptional activation. Among the 48 candidate proteins, SALL1/2/3/4, FOG1/2, and BCL11A/B are known to interact with the NuRD complex via their N-termini, repressing target gene expressions [43–47,78,79]. However, the mechanism by which these interactions are turned off remains unknown. Among the seven candidates that were used to probe the protein domain microarray (Figure 2), VRK1 K4me1, MEF2C K4ac, TSHZ1 K4ac, and BCL11B S2ph have been identified in various mass spectrometry studies [46,51,81,82]. Notably, BCL11B serine 2 phosphorylation has been shown to negatively regulate BCL11B's interaction with the NuRD complex . Given that the N-terminal tail of histone H3 is heavily modified by various PTMs, such as phosphorylation, methylation, and acetylation, it is likely that the N-termini of these H3TM proteins are also subject to PTMs that positively or negatively regulate their ability to interact with recognition modules.
We propose a model in which H3TMs bind the repressive NuRD complex in their unmethylated (K4me0) state, which is already established, and in their K4me3 state these same H3TMs directly interact with effectors (proteins that harbor PHD fingers and Tudor domains) that have transcriptional activator activity (Figure 7). In cases where the binding affinity of the effector is greater for the H3TM than for H3K4me3, these effectors may already reside on active chromatin and then be transferred to the H3TM. If the binding affinities are reversed (higher for H3K4me3 than for H3TM-K4me3), then the H3TM may ‘deliver’ the effector to chromatin.
The Model of H3TM interactions.
A search for proteins that harbor N-termini with sequence similarity to histone H3 identified a striking (5-fold) enrichment in the fraction of proteins with chromatin-binding functions (Figure 1). Many of these H3TM proteins have already been shown to interact with the NuRD transcriptional repressor complex in their unmethylated states. We tested seven of these H3TMs, in their methylated form, for their ability to interact with known histone H3K4me3 binders, and they all do, to varying degrees. We thus hypothesize that the 48 H3TMs that we identified will be a major class of interacting proteins for what are traditionally thought of as H3K4me effector proteins.
All the data needed to evaluate the conclusions that are drawn in this manuscript are available in the main text. There is no supplemental section for this manuscript. The X-ray structure (coordinates) and the source data (structure factor file) of the PHF2 PHD domain with bound VRK1 peptide have been submitted to the PDB under accession number 7M10.
M.T.B. is a co-founder of EpiCypher.
M.T.B. is supported by an NIH grant — GM126421, as well as a CPRIT Grant RP180804 that supports the Protein Array & Analysis Core (PAAC). X.C., a CPRIT Scholar in Cancer Research, is supported by NIH R35GM134744 and Cancer Prevention and Research Institute of Texas grant RR160029.
Open access for this article was enabled by the participation of the University of Texas Southwestern Medical Centre in an all-inclusive Read & Publish pilot with Portland Press and the Biochemical Society under a transformative agreement with EBSCO.
CRediT Author Contribution
Mark T. Bedford: Conceptualization, Funding acquisition, Project administration, Writing — review and editing. Jianji Chen: Conceptualization, Formal analysis, Validation, Methodology, Writing — original draft. John Horton: Data curation, Investigation, Methodology. Cari Sagum: Data curation, Methodology. Jujun Zhou: Data curation, Validation, Methodology. Xiaodong Cheng: Resources, Funding acquisition, Validation, Methodology, Writing — review and editing.