Abstract
Transcription Termination Factor 1 (TTF1) is a multifunctional mammalian protein with vital roles in various cellular processes, including Pol I-mediated transcription initiation and termination, pre-rRNA processing, chromatin remodelling, DNA damage repair, and polar replication fork arrest. It comprises two distinct functional regions: the N-terminal regulatory region (1-445 aa) and the C-terminal catalytic region (445-859 aa). The Myb domain, located in the C-terminal region, is a conserved DNA-binding domain spanning residues 550 to 732 (183 residues). Despite its critical role in various cellular processes, the physical structure of TTF1 remains unsolved, and attempts to purify functional TTF1 protein have been unsuccessful to date. Therefore, we focused on characterizing the Myb domain of this essential protein. We first predicted a 3-D model of the Myb domain using homology modelling and ab initio methods, and then assessed its stability through MD simulation in explicit solvent. The predicted model is highly stable and stabilizes at 200 ns. To experimentally validate the computational model, we cloned the codon-optimized Myb domain into a bacterial expression vector, expressed it, and purified the protein to homogeneity. Further characterization shows that the Myb domain is predominantly helical (65%) and is alone sufficient to bind the Sal box DNA. This is the first study to report a complete in silico model of the Myb domain together with its physical characterization. This study will pave the way towards solving the atomic structure of this essential mammalian protein.
Introduction
The eukaryotic genome possesses multiple copies of rDNA, which are present as tandem, repetitive sequences that encode pre-rRNA and regulatory elements. These regulatory elements allow site-specific termination of RNA Pol I transcription from yeast to humans. RNA polymerase I (Pol I)-mediated transcription is terminated in a polar manner by specialized transcription terminator proteins binding to specific sites, which also prevents the collision of replication forks heading in opposite directions. Terminator proteins that mediate transcription termination have been found in a variety of taxa, including mammalian TTF1, Nsi1 [also known as yeast transcription terminator 1 (Ytt1)] of Saccharomyces cerevisiae, Reb1 (RNA polymerase I enhancer binding protein) of Schizosaccharomyces pombe, and Rib2 of Xenopus [1].
Transcription Termination Factor 1 (TTF1) is a multifunctional nucleolar DNA-binding protein that is involved in various processes, such as Pol I-mediated transcription initiation [2] and termination [3], pre-rRNA processing, chromatin remodelling, DNA damage repair [4], and polar replication fork arrest [5]. The TTF1 gene is located at the 2; 2 A3 locus on mouse chromosome 2 and at 9q34.13 on the long arm of human chromosome 9. This factor binds in an orientation-dependent manner to the terminator element called the Sal box, which is an 18-bp sequence motif (AGGTCGACCAGA/TT/ANTCCG) in mouse and an 11-bp sequence (GGGTCGACCAG) in humans. TTF1 binding sites are present both upstream and downstream of the rDNA coding region, so TTF1 binding brings the terminator and promoter loci into close proximity. This allows the direct transfer of the Pol I machinery from the 3′ end terminator to the 5′ end promoter of the adjacent rDNA unit (Ribomotor model) [6].
TTF1 protein consists of three functional domains: 1) an N-terminal regulatory domain (NRD), 2) a trans-activation domain (TAD) or central domain, and 3) a C-terminal DNA-binding domain (Figure 1) [7]. The N-terminal domain (less conserved region between murine and human) is an auto-regulatory domain that is involved in the oligomerization of protein, which masks the DNA-binding activity of TTF1. A specific region ranging from 323 to 445 amino acids in the central trans-activation domain is involved in chromatin-specific functions, including transcription termination, chromatin remodelling, and transcription activation. Additionally, another region ranging from 430 to 445 amino acids is crucial for termination, as its deletion affects termination without compromising its DNA-binding activity [8]. The C-terminal half of the TTF1 protein exhibits a highly conserved amino acid sequence in both mouse and human [9]. It has two conserved Myb/SANT (SWI3- ADA2-N-CoR-TFIIIB) like domains that are involved in DNA binding. This domain shows strong homology with the Reb1 protein of S. pombe and the DNA-binding domain of the proto-onco-protein c-Myb. It contains clustered conserved tryptophan residues, homologous to Reb1p and c-Myb, which are essential for DNA-protein interactions [8]. The last 31 amino acids at the C-terminal end of the TTF1 protein are known to show species specificity [10].
Schematic representation of mouse TTF1 protein
Several studies have demonstrated that different domains of TTF1 interact with various important factors that are involved in numerous regulatory processes. Some important TTF1-interacting proteins are Cockayne Syndrome group B (CSB), Murine Double Minute 2 (MDM2), Alternative Reading Frame (ARF: p19ARF in mouse and p14ARF in human), TTF1 interacting protein 5 (Tip5), Nucleosome remodelling and histone deacetylase complex (NuRD), p300/CBP-associated factor (PCAF), Polymerase 1 and transcript release factor (PTRF), and nucleophosmin NPM/B23 [4,5]. Our lab has recently identified DDB1 as a novel interacting partner of TTF1 and described its role in UV-mediated DNA damage sensing [4]. The C-terminal region, besides being highly conserved, plays an important role in DNA binding and interacts with various proteins to perform the essential functions required for cell growth and proliferation [5].
Hence, it is crucial to explore the structural aspects of these domains to examine the specific amino acid residues engaged in the interaction with their partners. However, neither an in silico nor a physically determined structure of the individual domains of TTF1 is available to date. Since the Myb domain of TTF1 is critical for coordinating many cellular processes, in the current study we predicted an in silico model of the Myb domain of mouse TTF1 protein using ab initio and homology modelling and validated it experimentally. We determined the stability of the structure through a 200 ns molecular dynamics (MD) simulation (in triplicate) in explicit solvent. Further, for physical characterization, we cloned and expressed the Myb domain of the mouse TTF1 protein; we then confirmed its secondary structure using circular dichroism (CD) and Raman spectroscopy. The computational and biophysical analyses of the Myb domain will provide insight into its atomic structure, which in turn will help elucidate the mechanisms behind its diverse functions.
Results
Amino acid sequence-based analysis
The amino acid sequence-based physicochemical properties of the Myb domain were computed using the ProtParam server and are listed in Table 1. The isoelectric point (pI) of the protein is above 7 (10.11), indicating a basic protein, consistent with the preponderance of basic residues (38 R+K versus 19 D+E). An instability index below 40 indicates a stable protein; the instability index of 30.86 for the Myb domain therefore suggests that the domain is stable. This is further corroborated by the disorder prediction profile, in which the predicted disorder of the Myb domain remains below the threshold value of 0.5 (Supplementary Figure S1). The hydrophobicity of the protein was assessed using the GRAVY (grand average of hydropathy) score, which is the sum of the hydropathy values of all amino acids divided by the total number of residues in the sequence. The Myb domain is hydrophilic, as evidenced by its negative GRAVY value of -0.672.
Physicochemical properties | Values |
---|---|
Number of residues | 183 |
Molecular weight | 21334.80 Da |
Theoretical pI | 10.11 |
Molecular formula | C958H1542N280O262S5 |
Instability index | 30.86 |
Aliphatic index | 82.57 |
Number of negatively charged residues (D+E) | 19 |
Number of positively charged residues (R+K) | 38 |
Extinction coefficient | 43430 |
Grand average of hydrophobicity (GRAVY) | -0.672 |
Estimated half-life (Escherichia coli, in vivo) | >10hrs |
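The GRAVY definition used above (sum of per-residue hydropathy values divided by sequence length) can be illustrated with a minimal sketch. It uses the standard Kyte-Doolittle hydropathy scale; the short peptide is a hypothetical example, not the Myb domain sequence, and the function name `gravy` is introduced here for illustration only.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: sum of hydropathy values / number of residues."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    # Hypothetical basic peptide, used only to show the sign convention:
    # a negative score indicates a hydrophilic sequence.
    peptide = "KRWTEEKRK"
    print(f"GRAVY = {gravy(peptide):.3f}")
```

A negative result, as reported for the Myb domain in Table 1, corresponds to a sequence dominated by polar and charged residues.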
Molecular modelling of Myb domain and its validation
To understand the structure of the Myb domain of TTF1, we predicted computational models of the domain. A template search using BLASTp against the Protein Data Bank (PDB) revealed that the highest target-template coverage (Myb domain sequence identity with the template) is 27%, which falls within the ‘twilight zone’ for homology modelling [11]. For this reason, the structure of the Myb domain was modelled using both SWISS-MODEL (with the Reb1 protein as a template, PDB ID: 5EYB; [1]) and the AlphaFold server (Figure 2A,B). The sequence alignment with its homologue, Reb1, is provided in the Supplementary File (Supplementary Figure S2). In addition to homology modelling, de novo models of the Myb domain were predicted using the Robetta and I-TASSER servers (Figure 2C,D). The models revealed that the Myb domain of TTF1 comprises two Myb-like motifs (illustrated in blue and magenta, Figure 2), which are predominantly composed of alpha-helices. Notably, these motifs are entirely devoid of β-sheets, and the remaining structure is composed of random coils. The structural statistics and the parameters obtained with PROCHECK and SAVESv6.0 to validate structural reliability are provided in Table 2. To assess the quality of the generated structural models, Ramachandran plot analysis was conducted (Figure 2). Only marginal variance was observed in the percentage of residues situated within favourably allowed regions across the AlphaFold, SWISS-MODEL, and Robetta models (Table 2). Approximately 90% of residues were in the most favoured region for all models of the Myb domain, except for the I-TASSER-predicted model, where 72.1% of residues were in the most favoured region.
This difference may be due to the larger share of residues (4.2%) in the disallowed region for the I-TASSER model, as observed in the Ramachandran plot summary (generated using the PROCHECK server for structural validation; Table 2). The most notable difference among the predicted structural models was the compactness of the secondary structure. AlphaFold, SWISS-MODEL, and Robetta predicted compact and ordered structures with a very high percentage of α-helical conformations, whereas the model predicted by I-TASSER was less compact and less ordered. All models have very similar and reliable statistics as per the overall SAVESv6.0 results. In summary, the structural integrity and statistics of the models derived from both homology and ab initio methods showed considerable consistency, which supports the reliability of the Robetta and other models (excluding I-TASSER) for further computational analysis. Considering all the above factors, we selected the Robetta-generated de novo model of the Myb domain for further computational analysis. Based on the Z-score (-5.07) of the structure measured using ProSA, the Myb domain has a structure comparable to other PDB structures of similar size (as determined by NMR, Figure 3A). The quality of the predicted model was highly satisfactory: superimposing the predicted Myb domain structure on the structure of its yeast homologue Reb1 gave a root mean square deviation (RMSD) of 1.467 nm (Figure 3B). Furthermore, we used the VADAR 1.8 server to compute the proportion of secondary structures in the modelled Myb domain. The results indicated that the model consisted of approximately 66.2% α-helices, 0% β-sheets, and 33.2% random coils. These findings are consistent with the observation that the majority of DNA-binding proteins are helical in nature, which helps stabilize their interaction with DNA.
Computational models generated for DNA binding domain of TTF1 protein using different modelling servers
Structure validation of energy-minimized Robetta model of Myb domain
PROCHECK Ramachandran plot statistics of Myb domain
Server name | SWISS-MODEL | AlphaFold | Robetta | I-TASSER |
---|---|---|---|---|
Residues in most favored region | 90.9% | 89.1% | 93.3% | 72.1% |
Residues in additional allowed region | 8.5% | 10.3% | 6.1% | 21.2% |
Residues in generously allowed region | 0.0% | 0.0% | 0.0% | 2.4% |
Residues in disallowed region | 0.6% | 0.6% | 0.6% | 4.2% |
Trajectory analyses
To analyse the structural stability and conformational changes in the Myb domain model obtained from the Robetta server, MD simulations were performed for 200 ns in triplicate. Various parameters, such as root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (RoG) [12], and solvent-accessible surface area (SASA), were calculated from the MD trajectories using the modules available in the GROMACS MD engine [13].
The RMSD was analysed to determine the deviation from the initial structure, which was calculated by aligning all the frames in a trajectory with the first frame throughout the simulation. The RMSD plot indicates that the three runs remained stable throughout the simulation (Figure 4A). The RMSD values were within the range of 0.5–2 nm for all three runs (Supplementary Figure S3). To assess the dynamic behaviour and flexibility of amino acid residues, we analysed the per-residue RMSF throughout the simulation run. The RMSF plot for the Myb domain of TTF1 across all three runs showed minimal fluctuations, with values ranging from 0.5 to 2 nm. However, the third run exhibited high flexibility in the N-terminal region of the TTF1 Myb domain, with values reaching up to 2.5 nm. This was expected, as the N-terminal region is naturally exposed to the solvent (Figure 4B). Furthermore, we performed SASA analysis to determine the surface area of the protein exposed to the solvent, which helps predict the stability of the hydrophobic core of the protein. Figure 4C shows that the SASA value of the TTF1 Myb domain remained stable within the range of 125–155 nm² for all three runs. We also determined the RoG of the protein to understand the compactness and overall dimensions of its structure. The RoG values for the first two runs were within the range of 2–2.5 nm, indicating that the protein maintained a compact, folded conformation throughout the simulation. However, in the third run, the RoG value ranged from 2 to 3 nm, displaying a minor deviation from the first two simulation trajectories (Figure 4D).
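The trajectory metrics described above can be written out explicitly. The following is a minimal pure-Python sketch of RMSD, per-atom RMSF, and radius of gyration; it is not the GROMACS implementation, it assumes that frames have already been superimposed on the reference, and the coordinates in the usage section are hypothetical values in nm.

```python
import math


def rmsd(frame, reference):
    """RMSD between two equally sized coordinate sets (already superimposed)."""
    if len(frame) != len(reference):
        raise ValueError("frames must contain the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(frame, reference))
    return math.sqrt(squared / len(frame))


def rmsf(frames):
    """Per-atom fluctuation around each atom's mean position over all frames."""
    n_frames = len(frames)
    result = []
    for atom in range(len(frames[0])):
        mean = [sum(f[atom][i] for f in frames) / n_frames for i in range(3)]
        msf = sum(math.dist(f[atom], mean) ** 2 for f in frames) / n_frames
        result.append(math.sqrt(msf))
    return result


def radius_of_gyration(coords, masses=None):
    """Mass-weighted RMS distance of atoms from the centre of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses if none given
    total = sum(masses)
    com = [sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)]
    weighted = sum(m * math.dist(c, com) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted / total)


if __name__ == "__main__":
    # Hypothetical three-atom "trajectory" with two frames (coordinates in nm).
    frame0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    frame1 = [(0.0, 0.1, 0.0), (1.0, 0.0, 0.1), (2.1, 0.0, 0.0)]
    print("RMSD vs first frame:", rmsd(frame1, frame0))
    print("RMSF per atom:", rmsf([frame0, frame1]))
    print("Radius of gyration:", radius_of_gyration(frame0))
```

In a real analysis each frame would first be least-squares fitted to the reference structure, as the text describes for the RMSD calculation.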
Analysis of MD trajectories of triplicate runs for Myb domain of mammalian TTF1 protein
Free Energy Landscape (FEL) analysis
Free energy landscape (FEL) analysis was performed using the gmx sham module available in GROMACS [14]. This analysis assesses the Gibbs free energy of the system and identifies the most stable conformational states of the protein. It combines the radius of gyration and the RMSD as the two variables describing the system. In the 2D energy landscape, the central blue area represents the cluster of conformations with minimum energy and maximum stability (Figure 5). In the 3D projection, the narrow funnel illustrates the changing conformation of the protein over time, showing dynamic shifts within the system until it settles into a stable, low-energy structure [15,16]. The Gibbs free energy (ΔG) values ranged from 0 to 11.70 kJ/mol for the first two runs and from 0 to 7.72 kJ/mol for the third run. In Figure 5, the formation of a single narrow folding funnel in the 3D plot illustrates a stable folding process for the first two runs. Together with the 2D contour plot, this indicates a single energy minimum, reflecting a stable folding process. However, in the third simulation run, the trajectory diverged from that of the first two, displaying multiple folding patterns and energy minima, indicating a less stable folding process. The 3D conformational states of the TTF1 Myb domain, as determined from the lowest energy minima across the three independent simulation runs, are represented in Figure 5. Overall, the results indicated that the TTF1 Myb domain remained stable throughout the simulation time of 200 ns.
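The relationship behind such a landscape can be sketched as follows: the two variables (RMSD and radius of gyration) are binned into a 2D histogram, and each populated bin is assigned ΔG = -k_B T ln(P / P_max), so the most populated bin sits at 0 kJ/mol. This is a simplified illustration of the idea rather than the gmx sham implementation; the temperature of 300 K, the bin count, and the sample values are assumptions made for the example.

```python
import math
from collections import Counter

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def _bin_index(value, low, high, bins):
    """Map a value onto one of `bins` equal-width bins spanning [low, high]."""
    if high == low:
        return 0
    return min(int((value - low) / (high - low) * bins), bins - 1)


def free_energy_landscape(rmsd_values, rg_values, bins=10, temperature=300.0):
    """Return {(rmsd_bin, rg_bin): delta_G in kJ/mol} for populated bins."""
    if len(rmsd_values) != len(rg_values):
        raise ValueError("both variables need one value per frame")
    lo_r, hi_r = min(rmsd_values), max(rmsd_values)
    lo_g, hi_g = min(rg_values), max(rg_values)
    counts = Counter(
        (_bin_index(r, lo_r, hi_r, bins), _bin_index(g, lo_g, hi_g, bins))
        for r, g in zip(rmsd_values, rg_values)
    )
    p_max = max(counts.values())
    kt = K_B * temperature
    # The most populated bin gets 0; rarer bins get positive free energies.
    return {cell: -kt * math.log(n / p_max) for cell, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical per-frame values in nm; four frames in one basin, one outlier.
    rmsd_series = [0.10, 0.10, 0.10, 0.10, 0.90]
    rg_series = [2.00, 2.00, 2.00, 2.00, 2.40]
    for cell, dg in sorted(free_energy_landscape(rmsd_series, rg_series, bins=2).items()):
        print(cell, round(dg, 3), "kJ/mol")
```

A landscape with one dominant low-energy cell corresponds to the single funnel described for the first two runs, while several separated low-energy cells would correspond to the multiple minima seen in the third run.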
Free energy landscape of Myb domain of TTF1 protein
Close contact map analysis
Residue-residue close contact map analysis identified amino acid residues that interacted with each other during simulation. Residue-residue close contact maps for the lowest energy conformation of the protein obtained from FEL analysis were created using gmx mdmat, with a cutoff distance of 1.5 nm [17]. The tool calculates the inter-residue distance matrix and provides the average close contact between residues. Amino acid residues 140-170 remain in close contact with residues 80-100 for all three runs (Figure 6). Moreover, for the first two runs, residues 80-90 maintained close interactions with residues 10-60, but these contacts were absent in the third run. Likewise, the close contact between amino acid residues 150-165 and residues 15-25 observed in the first two runs was absent in the third run. This difference might account for the deviation in the results of the third run from those of the first two runs.
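The contact-map idea can be sketched in a few lines: for every pair of residues, take the smallest inter-atomic distance and flag the pair as a contact when it falls below the cutoff. This is an illustrative pure-Python version, not the gmx mdmat code; the 1.5 nm cutoff follows the text, while the toy coordinates and the `min_separation` parameter are assumptions introduced here.

```python
import math


def min_distance_matrix(residues):
    """Symmetric matrix of the smallest atom-atom distance between residues.

    `residues` is a list in which each item is a list of (x, y, z) atom
    coordinates in nm belonging to one residue.
    """
    n = len(residues)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = min(math.dist(a, b) for a in residues[i] for b in residues[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


def close_contacts(matrix, cutoff=1.5, min_separation=1):
    """List residue index pairs (i, j), i < j, closer than `cutoff` nm."""
    n = len(matrix)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if matrix[i][j] < cutoff
    ]


if __name__ == "__main__":
    # Hypothetical three-residue system (coordinates in nm).
    toy = [
        [(0.0, 0.0, 0.0)],
        [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
        [(4.0, 0.0, 0.0)],
    ]
    dist = min_distance_matrix(toy)
    print(close_contacts(dist, cutoff=1.5))
```

Raising `min_separation` skips sequence-adjacent residues, which is useful when only long-range contacts such as those between residues 140-170 and 80-100 are of interest.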
FEL analysis and contact maps of TTF1 Myb domain
Furthermore, to determine the structural stability of the TTF1 Myb domain, we extracted frames from the first simulation run every 20 ns and aligned the extracted frames using PyMOL. All frames aligned with only minor deviations and fluctuations, indicating the stability of the TTF1 Myb domain over the simulation time (Supplementary Figure S4).
Purification of TTF1 protein Myb domain
The codon-optimized mouse Myb domain of TTF1 was amplified and cloned into the bacterial expression vector pET28a, as described in the Methods section (Figure 7A). The induced protein was purified using Ni-NTA-based affinity chromatography (Supplementary Figure S5) followed by size exclusion chromatography (as described in the Materials and Methods section). Protein purity was confirmed by SDS-PAGE analysis and homogeneity by dynamic light scattering (DLS). The purified protein was concentrated to 1 mg/ml (Figure 7B). The elution profile (Figure 7C) and DLS data (data not shown) confirmed that the protein was highly homogeneous and monodisperse, respectively.
Cloning and purification profile of the Myb domain of mouse TTF1 protein
Electrophoretic Mobility Shift Assay (EMSA)
Myb domains of various transcription factors are involved in DNA-binding activity. Since TTF1 is a transcription factor, we wanted to determine whether the Myb domain alone can bind to DNA. For this, we carried out an electrophoretic mobility shift assay (EMSA), which is a well-established and widely used method to confirm DNA-binding activity. The DNA–protein complex was allowed to form using a fixed quantity (100 ng) of the Sal box DNA sequence and progressively increasing amounts of purified Myb domain protein. Upon resolving the protein–DNA complex on a 10% native polyacrylamide gel, we observed that increasing the amount of protein stoichiometrically shifted the Sal box DNA. The results confirmed that the Myb domain alone was sufficient to bind to the Sal box DNA sequence (Figure 8).
DNA binding activity analysis of Myb domain
Circular Dichroism spectroscopy confirms Myb domain has an alpha helical nature
By analysing the computational model of the Myb domain, we observed that it was predominantly helical in nature. To confirm this, circular dichroism (CD) analysis was performed. CD spectroscopy is a well-established technique for elucidating the secondary structure of proteins. It has various applications in structural biology, such as secondary structure determination and the study of protein aggregation and protein folding. CD spectra in the far-UV region (240–180 nm) correspond to peptide bond absorption and can reveal the presence of secondary structural elements such as helices, sheets, and turns. CD spectroscopy is frequently employed as a confirmatory method for determining secondary structures because it is quick and requires minimal resources. The mean residue ellipticity, [θ], in deg cm² dmol⁻¹, was calculated from [θ] = [θ]obs × MRW / (10 × c × l), where [θ]obs is the observed ellipticity in degrees, MRW is the mean residue molecular weight, c is the protein concentration in grams per millilitre, and l is the optical path length in centimetres.
CD spectra of the Myb domain were collected in 25 mM Tris buffer (pH 7.5) at 21°C. The profile shows a clear maximum at 192 nm and two slightly pronounced peaks at 210 nm and 222 nm. After spectral deconvolution (as described in the Materials and Methods section), the estimated secondary structure content was 64.3% alpha helices, 0% beta sheets, 35.7% random coils, and 0% turns (average of three runs, Figure 9A,B). We also calculated the ratio of the ellipticity at 222 nm to that at 208 nm, which yielded a value of 0.94. This value suggests that the protein structure is helical in nature [18]. The above results confirmed that the Myb domain is predominantly helical. These results are in agreement with our computational model and the simulation data of the Myb domain.
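The conversion from observed ellipticity to mean residue ellipticity can be made concrete with a short sketch of the formula above. The molecular weight and residue count are taken from Table 1; the observed ellipticity, concentration, and path length are hypothetical values chosen only to demonstrate the arithmetic, and MRW is computed with the common convention MW / (N - 1), which is an assumption since the text does not state how MRW was obtained.

```python
def mean_residue_weight(molecular_weight_da: float, n_residues: int) -> float:
    """Mean residue weight, using the common MW / (N - 1) convention (assumed)."""
    if n_residues < 2:
        raise ValueError("need at least two residues")
    return molecular_weight_da / (n_residues - 1)


def mean_residue_ellipticity(theta_obs_deg: float, mrw: float,
                             conc_g_per_ml: float, path_cm: float) -> float:
    """[theta] = theta_obs * MRW / (10 * c * l), in deg cm^2 dmol^-1."""
    return theta_obs_deg * mrw / (10.0 * conc_g_per_ml * path_cm)


if __name__ == "__main__":
    # Molecular weight and length of the Myb domain from Table 1.
    mrw = mean_residue_weight(21334.80, 183)
    # Hypothetical measurement: -20 mdeg signal, 0.2 mg/ml protein, 1 mm cuvette.
    theta = mean_residue_ellipticity(
        theta_obs_deg=-0.020, mrw=mrw, conc_g_per_ml=0.0002, path_cm=0.1
    )
    print(f"MRW = {mrw:.2f} Da, [theta] = {theta:.0f} deg cm^2 dmol^-1")
```

Because the signal is normalized per residue, spectra recorded at different concentrations or path lengths become directly comparable.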
Secondary structure analysis of Myb domain by CD spectra
Raman spectra analysis of Myb domain
Raman spectroscopy is a non-destructive technique that provides comprehensive data on chemical structure, phase and polymorphy, crystallinity, and molecular interactions. The intensities and wavenumber positions of the scattered Raman light appear as peaks in the spectrum. Every peak is associated with the vibration of a particular chemical bond, such as C-C, C=C, N-O, or C-H. Hence, the secondary structures of proteins and peptides can be effectively derived using Raman spectroscopy. A Raman spectral profile is a compilation of signals from the various conformations of the protein molecule in solution, offering an ‘instantaneous snapshot’ of the population. We therefore used Raman spectroscopy to characterize the secondary structure of the purified Myb domain of TTF1 and to assess the heterogeneity of the population. The Raman spectrum of the buffer (control) is displayed in black, whereas that of the Myb domain is displayed in red (Figure 10A).
Raman Spectrum profile and analysis of secondary structure for Myb domain of TTF1 protein
In the spectra, we found various intense peaks associated with aromatic amino acids: the peak at 1000 cm−1 is associated with phenylalanine and that at 1341 cm−1 with tryptophan. The three signals of amide I at 1600–1690 cm−1 (stretching vibration of C=O), amide II at 1480–1580 cm−1, and amide III at 1230–1300 cm−1 (both related to coupled C-N stretching and N-H bending vibrations of the peptide group) are of particular importance for the detection of various protein backbone conformations. Raman spectroscopy of protein’s secondary structure has adopted the same strategy that is used in analysis. It has concentrated on the relationship between the positions of the amide I and amide III vibrations and the proportion of each secondary structural element in the protein that has been determined through crystallography. The wavenumbers for the amide I and III modes are typically found within the ranges of 1662–1650 and 1272–1264 cm−1, respectively, for α-helix structures and 1674–1672 and 1242–1227 cm−1, respectively, for β-sheet structures. The amide I band, which arises mostly from C=O stretching with minor contributions from other vibrations, is located within the 1600–1690 cm−1 spectral range. The vibrational coupling between the motions of the peptide carbonyl groups, arranged in an ordered secondary structure by hydrogen bonds, shapes the spectrum of the amide I modes. The examination of amide I can therefore provide detailed data on the underlying conformation of proteins under physiological conditions (Figure 10B). We examined the amide I spectrum of the Myb domain in detail (Figure 10) because it is well suited to characterizing the secondary structure of a protein [19–21].
Curve fitting for the amide I region and de-convolution studies
The majority of the C=O stretching was attributed to the amide I region, while the N-H plane deformation component was present in the amide III band. As previously mentioned, our study focused on the amide I peak, as the overlapping side chain vibrations make it challenging to analyse the amide III band. To fully comprehend the secondary structure, we carried out the curve fitting process for the amide I region of the Myb domain. By fitting the measured data into an analytical equation and interpolating between discrete values, we can derive a finite-difference approximation and obtain the maximum or minimum value. The amide I centre is approximately 1650 cm−1 in α-helical proteins or peptides and functions as a marker band for the ordered α-helix contribution in Raman spectroscopy. Precisely, the band was split into two main parts related to the random coil and alpha helix configurations. Two curves were considered to fit the amide I band: the first at 1650–1656 cm−1 was assigned to alpha helices, while the second at 1680 cm−1 was linked to random coils. The fitting curves (in amide I) of the Myb domain indicate that the majority fits in the region representing alpha-helices (aqua blue area under the red curve), whereas random coils make up the rest of the curve (Figure 10B). These findings align with our computational model and CD spectroscopy data, indicating that the Myb domain is predominantly alpha-helical in nature.
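The idea of decomposing the amide I band into an α-helix component and a random-coil component can be illustrated with a deliberately simplified sketch. It fits only the amplitudes of two Gaussians by linear least squares, with centres fixed at 1653 and 1680 cm−1 (taken from the ranges quoted above) and widths fixed at an assumed 10 cm−1; it is not the fitting procedure used in this study, where peak positions and widths would normally also be optimized. The synthetic spectrum is hypothetical.

```python
import math


def gaussian(x, center, sigma):
    """Unit-height Gaussian band shape."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def fit_two_components(wavenumbers, intensities,
                       centers=(1653.0, 1680.0), sigmas=(10.0, 10.0)):
    """Least-squares amplitudes of two fixed-position Gaussian components."""
    g1 = [gaussian(x, centers[0], sigmas[0]) for x in wavenumbers]
    g2 = [gaussian(x, centers[1], sigmas[1]) for x in wavenumbers]
    # Normal equations for the 2-parameter linear model y = a1*g1 + a2*g2.
    a11 = sum(u * u for u in g1)
    a12 = sum(u * v for u, v in zip(g1, g2))
    a22 = sum(v * v for v in g2)
    b1 = sum(u * y for u, y in zip(g1, intensities))
    b2 = sum(v * y for v, y in zip(g2, intensities))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


def area_fractions(amplitudes, sigmas=(10.0, 10.0)):
    """Fraction of the total band area contributed by each component."""
    areas = [a * s * math.sqrt(2.0 * math.pi) for a, s in zip(amplitudes, sigmas)]
    total = sum(areas)
    return [area / total for area in areas]


if __name__ == "__main__":
    # Hypothetical noise-free amide I band: helix component twice the coil one.
    x = [float(w) for w in range(1600, 1691)]
    y = [2.0 * gaussian(w, 1653.0, 10.0) + 1.0 * gaussian(w, 1680.0, 10.0) for w in x]
    amps = fit_two_components(x, y)
    helix, coil = area_fractions(amps)
    print(f"helix fraction = {helix:.2f}, coil fraction = {coil:.2f}")
```

The area under each fitted component, relative to the total band area, gives the estimated share of the corresponding secondary structure element.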
Discussion
Transcription Termination Factor 1 (TTF1) is a multifunctional nucleolar protein that takes part in several cellular processes, such as Pol I-mediated transcription initiation and termination, rRNA processing, chromatin remodelling, polar fork arrest, and DNA damage repair. It stops transcription of the 47S pre-rRNA by binding to several loci flanking the rDNA coding regions. However, TTF1 also interacts with loci located upstream of the 47S pre-rRNA promoter, mediating two kinds of action: either activating the promoter through nucleosome displacement or silencing rDNA by promoting methylation of the promoter region. Both of these actions are catalysed upon interaction with the ACF/ISWI-related ATP-dependent remodelling complex NoRC. The interaction of NoRC with TTF1 and the stimulation of promoter methylation were found to be increased by the presence of a noncoding RNA (ncRNA) [22]. TTF1 function has also been reported to be regulated by the tumor growth inhibitory factor ARF. ARF binds to the Nucleolar Localization Signals (NoLSs) located in the short N-terminal repeat motifs of TTF1, thereby inhibiting NoLS function. Suppression of NoLS function prevents NPM1 chaperone-based nucleolar translocation of TTF1, thereby inhibiting its function in the nucleolus [23].
Owing to the importance and multifunctional nature of TTF1, we previously proposed an ab initio model of the full-length mammalian TTF1 protein [24]. The model revealed that the Myb domain of the protein is a major DNA-binding region. We and others earlier tried to purify both human and mouse full-length TTF1 but were unsuccessful in obtaining homogeneous, functionally active protein. The full-length protein elutes as aggregates, making it difficult to characterize. As mentioned above, our earlier computational model showed severely disordered regions in the N-terminal domain of the protein [24]. For these reasons, we moved ahead with cloning and characterizing the DNA-binding Myb domain of mouse TTF1. To gain structural insight, we constructed 3-D models of the Myb domain via homology and ab initio modelling using different servers: SWISS-MODEL, AlphaFold, Robetta, and I-TASSER. The generated models were then analysed for their stability using Ramachandran analysis and MD simulation, which indicated that Robetta provides an excellent model (Figure 2C), as the Ramachandran plot showed very few residues in the disallowed region. The model was comparable to the DNA-binding domain structure of the Reb1 protein (crystal structure data) [1] and had a very low RMSD value (1.467 nm), supporting the good quality of the model. The model reveals four helix-turn-helix (HTH) motifs in the Myb domain (Figure 2), which is similar to the DNA-binding domain of its homologue Reb1 (two MybADs and two MybRs). This gave us confidence in the model and suggested that we are heading in the right direction towards the characterization of this important DNA-interacting domain.
Secondary structure prediction shows that the protein is helical in nature (alpha helix ∼65%, Figure 2), which seems to be conserved in Rtf1 of Schizosaccharomyces pombe, Reb1p of Saccharomyces cerevisiae, Nsi1 (Ytt1) of Saccharomyces cerevisiae, and Rib2 of Xenopus tropicalis [1]. To validate the computational model, the Myb domain was cloned into an expression vector and purified to homogeneity (> 95% purity, Figure 7B) through affinity chromatography, followed by gel filtration chromatography. The SDS-PAGE and DLS profiles confirmed the above findings (Figure 7B). Furthermore, the CD spectra and Raman spectroscopy results showed that the Myb domain is a helical protein with approximately 65% alpha helices and 35% random coils (Figure 9A,B), which is in agreement with the computational models (Figure 2). De-convolution of the CD and Raman spectra confirmed that the Myb domain is predominantly helical in nature, which explains its DNA-binding capability. From the existing literature, we can conclude that most of the DNA-binding domains (Leucine zippers and HTH Motifs) are helical in nature, which allows DNA to get locked in the major grooves for further activities [25]. Biochemical data [3,8] have shown that TTF1 binds to Sal box elements. To check whether the Myb domain alone is sufficient for this activity, we performed an EMSA with the purified protein and Sal box DNA oligos. A shift in the protein and DNA bands confirmed the successful formation of this complex (Figure 8). The above results prove that the Myb domain alone is sufficient for DNA binding. As has been observed for other transcription factors, various domains are involved in different regulatory functions.
Hence, our work is the first attempt to characterize this essential domain structurally, opening up possibilities for further investigation. Earlier, obtaining the homogeneously purified functional TTF1 protein was a rate-limiting step towards solving the atomic structure of this essential protein. Since we have purified the functional Myb domain of TTF1, we are moving ahead towards solving the atomic structure of this protein either by crystallography or NMR. Solving the structure will provide insights into the mechanism by which this domain engages DNA for various activities. Also, our lab is ambitiously exploring the possibilities to purify the full-length protein in order to understand the regulatory mechanism of this essential transcription factor, which is also engaged in regulating DNA replication and DNA damage repair.
Materials and methods
Sequence retrieval and sequence analysis
The Myb domain sequence of Mus musculus TTF1 was retrieved from UniProtKB (Accession Number: Q62187) in FASTA format (Supplementary Box S1). The physicochemical features of the Myb domain were calculated and compared using Expasy ProtParam (https://web.expasy.org/protparam/). The molecular weight, instability index, isoelectric point (pI), and grand average of hydropathicity (GRAVY) values corresponding to each domain were compared. To assess intrinsic disorder in the Myb domain sequence, the disorder profile was analysed using the DISOPRED3 web server (http://bioinf.cs.ucl.ac.uk/psipred/).
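As an illustration of one of the ProtParam quantities listed above, the GRAVY value is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the sequence length. The following is a minimal sketch of that calculation, not the ProtParam implementation itself; the function name `gravy` and the short test peptide are hypothetical and are not the Myb domain sequence.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average of hydropathicity of a protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    # Sum of per-residue hydropathy values divided by sequence length.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical toy peptide used only to exercise the function.
print(round(gravy("TLITNLKR"), 3))
```

A negative GRAVY value indicates an overall hydrophilic sequence, and a positive value an overall hydrophobic one.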
Molecular modelling of Myb domain of TTF1, refinement and validation
The DNA-binding domain (Myb domain) of the TTF1 protein is widely recognized for its ability to engage the Sal box terminator elements of rDNA to stop Pol I-mediated transcription. A sequence homology search was performed for the Myb domain of the mouse TTF1 protein against the PDB using BLASTp. For structural analysis, we generated computational models of the Myb domain using AlphaFold (https://alphafold.ebi.ac.uk/) [26,27] and the SWISS-MODEL (https://swissmodel.expasy.org/) homology-modelling server [28–30]. In addition, ab initio structure prediction was performed using I-TASSER (https://zhanggroup.org/I-TASSER/) [31]. The Robetta server (https://robetta.bakerlab.org/) [32] uses both ab initio and comparative modelling techniques to build structural models, depending upon the availability of a suitable template structure for homology modelling. The predicted model structures were refined using the ModRefiner server (https://zhanggroup.org/ModRefiner/). The structural quality of these models was checked using the SAVES v6.0 web server (https://saves.mbi.ucla.edu/), and the refined models were validated via ERRAT, VERIFY3D, PROCHECK, and Ramachandran plot analysis. The quality of the modelled protein structures was analysed for Z-score using the ProSA server (https://prosa.services.came.sbg.ac.at/prosa.php), and the percentages of the secondary structure elements were determined using VADAR version 1.8 (http://vadar.wishartlab.com) [33].
Molecular Dynamics simulation and FEL analysis
Molecular dynamics simulations were performed in triplicate using GROMACS 2020 [13] and the CHARMM36 force field [34]. The Myb domain was solvated in a cubic box using the TIP3P water model with a distance of 1 nm between the protein and each side of the box, and Na+ and Cl− ions were added at a concentration of 0.15 M to neutralize the system. The constructed system contained 66 Na+ ions, 85 Cl− ions and 23190 water molecules. The system was then energy-minimized using the steepest descent algorithm until the maximum force decreased below 1000 kJ mol−1 nm−1. During the equilibration phase, position restraints were applied under constant number, volume, and temperature (NVT) and isothermal-isobaric (NPT) ensembles for 1 ns each to prevent potential distortions that could lead to instability. V-rescale [35] temperature coupling and Parrinello-Rahman pressure coupling [36] were employed to maintain the system at 300 K and 1 bar, with coupling constants of 0.1 ps for temperature and 2 ps for pressure. Long-range electrostatic interactions and van der Waals interactions were calculated using the Particle Mesh Ewald method [37], and the cut-off distance for short-range van der Waals interactions was set to 1 nm. The LINCS algorithm [38] was used to constrain all bonds. Finally, a 200 ns production simulation was conducted for each replicate using periodic boundary conditions. The production trajectories were analysed by calculating the root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (RoG), solvent-accessible surface area (SASA), and FEL.
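To make two of the trajectory metrics concrete, the sketch below computes the mass-weighted radius of gyration and a plain RMSD for a single frame. It is an illustrative stand-in and not the GROMACS analysis tools used in this study: the function names are hypothetical, the coordinates are toy values, and the RMSD shown here omits the least-squares superposition onto the reference structure that trajectory analysis tools normally perform first.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame (same unit as coords)."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("coords and masses must be non-empty and equal in length")
    total_mass = sum(masses)
    # Centre of mass of the frame.
    com = [sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
           for i in range(3)]
    # Mass-weighted mean squared distance from the centre of mass.
    weighted_sq = sum(
        m * sum((c[i] - com[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


def rmsd_no_fit(coords_a, coords_b):
    """RMSD between two frames WITHOUT prior superposition (assumption)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("frames must be non-empty and equal in length")
    sq = sum(
        sum((a[i] - b[i]) ** 2 for i in range(3))
        for a, b in zip(coords_a, coords_b)
    )
    return math.sqrt(sq / len(coords_a))


# Toy two-atom frame (hypothetical values, nm).
frame = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(radius_of_gyration(frame, [1.0, 1.0]))
print(rmsd_no_fit(frame, frame))
```

Evaluating these quantities for every frame of a production trajectory gives the RoG and RMSD time series that are typically inspected for convergence.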
The free-energy landscape (FEL) represents the conformations that a protein may adopt during a molecular dynamics simulation, together with their associated Gibbs free energy. The FEL is constructed from two variables (reaction coordinates) that reflect specific properties of the system and measure the degree of conformational variability. To visualize the energy minima of the Myb domain, we used the gmx sham module available in GROMACS to compute the FEL with the radius of gyration (RoG) and root-mean-square deviation (RMSD) as the two reaction coordinates [13].
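The idea behind such a landscape can be sketched with a Boltzmann inversion of a two-dimensional histogram: the free energy of each bin is G = kT ln(Pmax/P), so the most populated bin sits at zero. The code below is a simplified illustration under that assumption and is not the gmx sham implementation; the function name, the bin count, and the toy data are hypothetical.

```python
import math

R_KJ_PER_MOL_K = 0.0083144626  # gas constant in kJ mol^-1 K^-1


def free_energy_landscape(x, y, bins=10, temperature=300.0):
    """Return a bins x bins grid of free energies (kJ/mol) from paired samples.

    x and y are per-frame values of the two reaction coordinates
    (for example RMSD and RoG). Empty bins are assigned math.inf.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and equal in length")

    def bin_index(value, lo, hi):
        if hi == lo:
            return 0
        # Clamp the maximum value into the last bin.
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi, y_lo, y_hi = min(x), max(x), min(y), max(y)
    counts = [[0] * bins for _ in range(bins)]
    for xi, yi in zip(x, y):
        counts[bin_index(xi, x_lo, x_hi)][bin_index(yi, y_lo, y_hi)] += 1

    max_count = max(max(row) for row in counts)
    kT = R_KJ_PER_MOL_K * temperature
    # Boltzmann inversion relative to the most populated bin.
    return [
        [kT * math.log(max_count / c) if c else math.inf for c in row]
        for row in counts
    ]


# Toy data: three frames in one basin, one frame elsewhere (hypothetical).
grid = free_energy_landscape([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0], bins=2)
print(grid)
```

Bins with low free energy correspond to frequently visited conformations, which is why deep, well-separated minima are read as stable conformational states.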
Plasmid constructs
The codon-optimized Myb domain sequence of TTF1 was amplified using the primers listed below and cloned into the pET28a expression vector (Novagen, Millipore, USA) using the EcoRI and SalI restriction sites. The clone was verified by double-restriction digestion and sequencing. For expression, the verified clone was transformed into the E. coli BL21 (DE3) strain.
Primer Forward: 5′ ACCCTGATTACCAATCTGAAACGC 3′
Primer Reverse: 5′ TTAAACACCACGATAAACAAAACC 3′
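A quick sanity check of primer sequences such as those listed above can be scripted. The sketch below computes the reverse complement and GC fraction of each primer and reports whether the reverse primer, read on the coding strand, ends in a stop codon. The helper names are hypothetical, and the check is only an illustration, not part of the cloning protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


forward = "ACCCTGATTACCAATCTGAAACGC"
reverse = "TTAAACACCACGATAAACAAAACC"

# The reverse primer anneals to the coding strand, so its reverse
# complement shows the 3' end of the amplified coding sequence.
coding_end = reverse_complement(reverse)
print("coding-strand 3' end:", coding_end)
print("ends in stop codon:", coding_end[-3:] in STOP_CODONS)
print("GC fraction (forward):", round(gc_fraction(forward), 3))
print("GC fraction (reverse):", round(gc_fraction(reverse), 3))
```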
Expression and purification of the TTF1 Myb domain
The Myb/pET28a construct was transformed into the E. coli BL21 (DE3) strain for expression of the recombinant protein, and cells were grown in LB medium containing 30 µg/ml kanamycin at 37°C with shaking at 210 rpm until the OD600 reached 0.6 to 0.8. The exponentially growing culture was induced with 0.9 mM isopropyl β-D-1-thiogalactopyranoside (IPTG; HiMedia, India) at 30°C for approximately 6 to 8 hours. The induced culture was harvested by centrifugation at 8000 rpm for 10 min at 4°C. The pellet was re-suspended in lysis buffer [25 mM Tris (pH 7.5), 500 mM KCl, 10% (w/v) glycerol, 9 mM β-mercaptoethanol, 10 mM imidazole, 1 mM PMSF, 2 mg/ml lysozyme] with 1X protease inhibitor cocktail (Roche, USA), incubated for 1 hour at 4°C, and subjected to sonication (30 sec pulses with 50 sec intervals for 10 min at 40% power, on ice) to ensure complete lysis. The cell lysate was clarified by centrifugation (14000 rpm, 1 hour at 4°C) to remove debris. The clear lysate was mixed with pre-equilibrated Ni-NTA beads (Thermo Fisher Scientific, USA) and incubated on a rotary shaker for 2 h at 4°C for adequate binding. The mixture was loaded onto a column and allowed to settle under gravity. The packed column was then washed with 10 column volumes of wash buffer [25 mM Tris (pH 7.5), 500 mM KCl, 10% (w/v) glycerol, 9 mM β-mercaptoethanol, 40 mM imidazole, 1 mM PMSF], and the protein was eluted with elution buffer [25 mM Tris (pH 7.5), 500 mM KCl, 10% (w/v) glycerol, 9 mM β-mercaptoethanol, 400 mM imidazole, 1 mM PMSF]. The eluted protein fractions were pooled and concentrated using a 10 kDa cut-off Amicon filter (Millipore, U.S.A.). After affinity purification, the protein was subjected to size-exclusion (gel filtration) chromatography using a HiLoad 16/600 Superdex 200 pg preparative SEC column (Cytiva, U.S.A.). The protein was resolved on 10% SDS-PAGE at each step, from induction to purification, to monitor protein induction and the level of purity.
Protein concentration was estimated using the Bradford method, and samples were stored at -80°C in storage buffer [25 mM Tris (pH 8.0), 300 mM KCl, and 30% glycerol].
Electrophoretic Mobility Shift Assay (EMSA)
Reactions (20 µl) containing variable concentrations of protein (TTF1 Myb domain) and 100 ng DNA (Sal box sequence) were assembled and incubated at room temperature for 40 min in binding buffer [12 mM Tris-HCl (pH 8.0), 100 mM KCl, 5 mM MgCl2, 0.1 mM EDTA, 0.5 mM DTT, and 5% glycerol]. The protein–DNA complexes were then resolved on a 10% non-denaturing polyacrylamide gel at 4°C and 150 V for 2 h in TBE buffer [39]. The shifts in DNA bands were visualized using an EMSA kit (Invitrogen, U.S.A.) according to the manufacturer’s instructions.
CD Spectroscopy
The purified Myb domain was diluted to 20 ng/μl in 10 mM Tris-Cl (pH 7.5) and subjected to CD spectroscopy using a Jasco J-1500 spectropolarimeter equipped with Peltier temperature-controlled cell holders (Jasco, Easton, MD, U.S.A.). Data collection and analyses were performed in triplicate following the method described by Colarusso et al. [40]. Following standard procedures, the secondary structure was evaluated using the Jasco [41] and OriginPro software (version 2022, OriginLab Corp., Northampton, MA, U.S.A.) according to the protocol described by Greenfield et al. [42].
Raman spectroscopy
Raman spectroscopy (alpha300, WITec, Germany) was used to record the Raman spectrum of the purified, homogeneous Myb domain. The instrument was equipped with a liquid-nitrogen-cooled charge-coupled device (CCD) detector and a spectrograph with a 600 g/mm grating with a resolution of 1 cm−1. For spectrum collection, backscattering geometry was combined with a notch filter to reject elastic contributions. Excitation was achieved using a laser emitting at 532 nm, with a power of 44 mW [21]. Raman spectra of the Myb domain of TTF1 were recorded using a 50× objective with a numerical aperture (NA) = 0.6 (laser spot diameter reaching the sample was about 1 mm). An optical glass slide with a cavity was used for the measurements, which were carried out on sample drops (5 µg/ml) at room temperature. For each sample, five Raman spectra were obtained at an acquisition time of 10 s from various spots on the drops. Spectra with a high signal-to-noise ratio were selected from the recorded spectra after verifying the spectral profile. OriginPro software was used to plot and evaluate these spectra [20]. The protocol described by Maiti et al. [19] was used for curve fitting of the amide I band and its assignment to the structural components using the Origin software.
Data Availability
All the data generated in this study are available in the manuscript.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
The research was funded by the Department of Biotechnology (DBT), Govt. of India, RLS [grant number BT/RLF/Re-entry/43/2016 (to S.K.S.)] and an S.R.F. fellowship by the Council for Scientific and Industrial Research (to G.S.).
CRediT Author Contribution
Gajender Singh: Data curation, Software, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review & editing. Abhinetra Jagdish Bhopale: Data curation, Software, Validation, Writing—original draft. Saloni Khatri: Data curation, Methodology, Writing—original draft. Prashant Prakash: Data curation, Methodology, Writing—original draft. Rajnish Kumar: Resources, Supervision, Validation, Investigation, Writing—original draft, Writing—review & editing. Sukh Mahendra Singh: Supervision, Writing—original draft. Samarendra Kumar Singh: Conceptualization, Resources, Supervision, Funding acquisition, Validation, Investigation, Writing—original draft, Project administration, Writing—review & editing.
Acknowledgements
The authors are thankful to the Director, Prof. Sanjay Kumar, Institute of Science, Banaras Hindu University, for providing space and facilities to conduct the research. We are also thankful to the Central Discovery Centre (CDC) and Sophisticated Analytical and Technical Help Institutes (SATHI), Banaras Hindu University (BHU), for access to the CD spectroscopy and Raman spectroscopy instruments. Rajnish Kumar is grateful for the support and resources provided by the ‘PARAM Shivay Facility’ under the National Supercomputing Mission, Government of India, at the Indian Institute of Technology (BHU), Varanasi, for carrying out the computational simulation study. The research was funded by the Department of Biotechnology (DBT), Govt. of India, RLS grant (BT/RLF/Re-entry/43/2016) and an IOE seed grant to Samarendra K Singh. The Council of Scientific and Industrial Research (CSIR) supported this research by providing a scholarship to Gajender Singh, and the Department of Biotechnology (DBT), Govt. of India, by providing scholarships to Saloni Khatri and Prashant Prakash (M.Sc. stipend). We also thank ISLS for providing space and instrumentation support.