Structural and functional impact of non-synonymous SNPs in the CST complex subunit TEN1: structural genomics approach

TEN1 protein is a key component of CST complex, implicated in maintaining the telomere homeostasis, and provides stability to the eukaryotic genome. Mutations in TEN1 gene have higher chances of deleterious impact; thus, interpreting the number of mutations and their consequential impact on the structure, stability, and function is essentially important. Here, we have investigated the structural and functional consequences of nsSNPs in the TEN1 gene. A wide array of sequence- and structure-based computational prediction tools were employed to identify the effects of 78 nsSNPs on the structure and function of TEN1 protein and to identify the deleterious nsSNPs. These deleterious or destabilizing nsSNPs are scattered throughout the structure of TEN1. However, major mutations were observed in the α1-helix (12–16 residues) and β5-strand (88–96 residues). We further observed that mutations at the C-terminal region were having higher tendency to form aggregate. In-depth structural analysis of these mutations reveals that the pathogenicity of these mutations are driven mainly through larger structural changes because of alterations in non-covalent interactions. This work provides a blueprint to pinpoint the possible consequences of pathogenic mutations in the CST complex subunit TEN1.


Introduction
Telomeres consist of non-coding ends of eukaryotic linear chromosomes and play a vital role in the replication, regulation, and protection of genome [1,2]. Ends of eukaryotic chromosomes can be identified by recombination and repair system of the cells as DNA strand breaks that often proceed to end-to-end fusion and instability of genome [3,4]. Shelterin complex is composed of six subunits (TRF1, TRF2, RAP1, TIN2, TPP1, and POT1) which are located primarily to single-and double-stranded telomeric DNA [5]. In addition to repressing DDR and chromosome fusion, shelterin complex also caps the telomeric ends by facilitating the formation of T-loop. It is also acting as a processivity factor via recruiting telomerase to chromosomes end [6,7].
The CST complex is composed of three subunits, conserved telomere maintenance component 1 (CTC1), suppressor of CDC thirteen homolog (STN1), and telomere length regulation protein TEN1 homolog (TEN1) [8], specifically localizes to the ssDNA) of telomere and is involved in telomere capping and regulation of telomere length [9][10][11]. However, increasing evidence has demonstrated that the STN1-TEN1 complex possesses some extra telomeric functions. It is involved in resolving replication fork stalling during replication stress [12,13]. CST complex is also involved in the removal of G-quadruplexes (G4: G-rich repeats) [14]. The G-rich region of the telomere is very prone to form G4 throughout telomeric DNA and poses severe challenges for telomere replication machinery [15]. In addition, CST complex binds to the 3 ends of telomeres and regulates polymerase α-mediated syntheses of C-strand [16]. Some important reports on the structure and function of the CST complex can be found elsewhere [17][18][19][20].
In addition to polymerase α-mediated syntheses of C-strand, a subunit of CST (CTC1-STN1) regulates telomerase-mediated extension of G-rich overhang which is critical for the cell proliferation. Deficiency of CTC1-STN1 complex leads to overextension of G-rich overhangs which initiate DDR [21,22]. In this process, the role of TEN1 is indispensable as it is essential to provide stability to CTC1-STN1 complex. Disruption of TEN1 results in progressive shortening of telomere more like caused by telomerase deficiency. As telomere maintenance is paramount to genome stability, mutations in the genes encoding essential components of CST are associated with varieties of genetic abnormalities including cancer [23], coat plus [24][25][26], and dyskeratosis congenita [27,28].
Prediction of nsSNPs affecting protein structure and function in detail may be investigated by the aid of cutting-edge computational methods. In many cases, nsSNPs have little or no effect on protein structure and functions, but often a single mutation is highly lethal [29]. Experimental studies suggested that about one-third of nsSNPs are deleterious to human health [30]. Thus, identification of such deleterious nsSNPs is of serious concern in terms of diagnosis and therapeutic perspective. In vitro mutational studies by Bryan et al., suggested that mutation in some important residue in TEN1 gene directly affect the interaction with STN1 to many fold. During structure analysis, we have observed that mutant R27Q caused a marked reduction in the polar interactions between TEN1-STN1complex (Supplementary Figure S1). Disruption in TEN1-STN1 interaction leads to the development telomere malfunctions and thus telomeropathies [31]. A little report is available on the mutational analysis of nsSNPs in TEN1 gene. Taking this opportunity into consideration and the fact that TEN1 plays crucial role in the telomere maintenance; we have predicted the structural and functional effects of about 78 nsSNPs in the coding region of TEN1 gene. The present study will offer in-depth understanding of the role of nsSNPs on the structure and function of TEN1 protein.

Data collection
Distribution of nsSNPs in human TEN1 gene was retrieved from dbSNP [32], Ensembl [33], and HGMD [34] databases. Data enrichment was carried out by removing the variant duplicates of different databases. The human TEN1 amino acid sequence was obtained in FASTA format from UniProt database (UniProt ID: Q86WV5) (http://www.uniprot.org/). A 3D structure of TEN1 (PDB ID: 4JOI) was downloaded from the Protein Data Bank (PDB) [35]. Functional annotations of all SNPs were extracted from the dbSNP database; for example, whether the SNPs present in an intron or exon, in the 3 or 5 -UTR, or downstream or upstream of the TEN1 gene.

Sequence-based prediction of deleterious nsSNPs
Sorting Intolerant from Tolerant (SIFT) (http://sift.jcvi.org/) algorithm was used to predict the amino acid substitution as tolerable and intolerable depending upon the physical and sequence-homology features. Substitutions with normalized probabilities of ≥0.05 and ≤0.05 were predicted as tolerated and deleterious, respectively [36,37]. There were about 78 nsSNPs identified from Ensembl and dbSNP databases. Prediction of tolerated and deleterious effect of these nsSNPs in human TEN1 gene was predicted using SIFT. Protein variation effect analyzer (PROVEAN) (http://provean.jcvi.org/) tool was used to predict the consequences of amino acid substitution on protein function [38]. It predicts nsSNPs as 'deleterious' if the score is less than the threshold value (cutoff is −2.5), and 'neutral' if the predicted score is more than the cut-off value. All the nsSNPs in human TEN1 gene were calculated and analyzed using this cut-off value.
PolyPhen-2 (polymorphism phenotyping-2) (http://genetics.bwh.harvard.edu/pph2/) was used to calculate functional predictions of coding variants. It uses a particular empirical rule comprises of both comparative and physical considerations to predict the probable functional impacts of mutation on the structure-function relationship. FASTA format of protein sequence was used as input to calculate the effects of a particular substitution [39]. It calculates a position-specific independent count (PSIC) score for each substitution and then estimates the score deviations. A mutation is considered as possibly destructive mutation if the PSIC score is ≥0.9.
Structure-based prediction of destabilizing nsSNPs STRUM (https://zhanglab.ccmb.med.umich.edu/STRUM/) tool was used to predict the stability differences between WT and mutant proteins. Initially, from protein sequences, a 3D model was generated by I-TASSER simulation and used to train STRUM model through gradient boosting regression. STRUM predicts the possible effects of nsSNPs on the structure and function of a protein using conservation score from an alignment of the multiple-threading template. The query sequence used as input in FASTA format and calculated the impact of a particular substitution in a given sequence [40]. SDM2 (site direct mutator 2) (http://structure.bioc.cam.ac.uk/sdm2) is a knowledge-based tool used to estimate the impact of mutations on the stability of protein [41]. It uses constrained environment-specific substitution tables (ESSTs) to calculate the differences in the protein stability upon mutation [41,42]. SDM2 uses PDB as an input file, and point variants to estimate the stability difference score between the WT and mutants.
PoPMuSiC (http://babylone.ulb.ac.be/popmusic/) tool was used to predict changes in thermodynamic stability upon mutation. PoPMuSiC employing a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the substituted residues. It uses PDB as an input file. DUET server was used to predict the impact of mutations on the stability of TEN1 protein using PDB code. DUET calculates a combined or consensus predictions of SDM and mCSM (mutation Cutoff Scanning Matrix) using support vector machines (SVMs) in a non-linear regression fashion. The output it provides is in the form of change in Gibbs free energy ( G), where negative sign indicates destabilizing mutation [43]. MCSM was implicated to predict the impact of mutations on the stability of proteins using graph-based structural signatures. It predicts protein-protein and protein-nucleic acid interaction [44].

Aggregation propensity analysis
SODA (protein solubility from disorder and aggregation propensity) was used to predict the change in protein solubility upon mutation by comparing the sequence profile of WT and mutants. The aggregation or intrinsic disorder score obtained from PASTA [52], and ESpritz [53], and a combined result obtained from Kyte-Doolittle [54] and FELLS [55]. SODA also predicts types of variation, including insertion and deletion in a given sequence [56].

Sequence conservation analysis
The importance of a particular amino acid in the structure and functions of protein can be generally retrieved from its conservation score using multiple sequence alignment. The blueprint of amino acid conservation was identified by ConSurf tool, which measures the degree of conservation of each amino acid at a particular position along with the evolutionary profile of amino acid sequence [57]. Conservation score ranged from 1 to 9, where 1 depicts rapidly evolving (variable), 5 indicates region which is evolving moderately, and 9 shows slowly evolving (evolutionarily conserved) position. Exposed residues with high conservation score are being considered as functional whereas buried residues with high conservation score are believed as structural residues.

Analysis of solvent accessibility
Relative side-chain solvent accessibility (RSA), residue depth and residue-occluded packing density (OSP) of WT, and mutant TEN1 protein have been performed using SDM2 server [41]. It uses ESSTs table to calculate the differences in their RSA, residue depth, and OSP of WT and mutant proteins. RSA has been calculated based on Lee and Richards method [58]. Three classes of relative RSA were defined based on the method of Lee and Richards, whereby a probe of given radius is rolled around the surface of the molecule [58].

Results and discussion
All reported SNPs of TEN1 gene was extracted from Ensembl (http://www.ensembl.org/) and dbSNP databases (http: //www.ncbi.nlm.nih.gov/snp). A total of about 5712 SNPs were mapped and classified into nine different functional classes. Four major classes of SNPs in TEN1 gene are shown in Figure 1. About 5250 SNPs were mapped in the intronic region and approximately 78 were found in the coding non-synonymous/missense region. The 5 -and 3 -UTR regions have 277 and 91 SNPs, respectively. In addition, 61 SNPs in coding synonymous, five SNPs in frameshift, and three SNPs in each 3 and 5 splice site regions are also observed. The present study focuses only on missense mutations mapped in the coding region. A total of 78 nsSNPs were taken for further analysis. To identify the structural and functional impact on missense mutations in TEN1 gene, we have employed a multi-tier approach. To collect high confidence nsSNPs in the TEN1 gene, all mapped TEN1 nsSNPs were first subjected to sequence-based prediction using PolyPhen-2, PROVEAN and SIFT, followed by structure-based stability predictions using PoPMuSiC, SDM2, DUET, mCSM, and STRUM web-servers. Further, distributions of high confidence nsSNPs were analyzed on the basis of their structure descriptors and phenotypic association. In consistence, we discuss pathogenic mutations in relation to their sequence conservation, functional importance, and aggregation propensities. Finally, we expand our analysis and extensively analyzed the structural and functional impact of pathogenic mutations on the local environment of the TEN1 protein. An overview of computational methods used in the present study is depicted in Figure 2.

Identification of deleterious nsSNPs
To pinpoint the structural and functional consequences of nsSNPs in TEN1 gene, we have performed an extensive structural analysis. The reason for using multiple tools is to improve the confidence level of prediction. Accumulation of deleterious nsSNPs using a single approach may not always be satisfactory as some mutations that have scored very close to cut-off value are prone to false prediction. Therefore, using multiple tools in both sequence-and structure-based predictions may provide an accurate result. The nsSNPs predicted to be deleterious in at least two methods from sequence-based prediction methods and three tools depict destabilizing effects from structure-based prediction were collected and termed as 'high confidence nsSNPs' .
Sequence-based prediction of all nsSNPs in TEN1 gene was calculated by SIFT, PROVEAN, and PolyPhen-2. A total of 78 nsSNPs of human TEN1 gene were considered for analysis. Sequence-and structure-based predictions are listed in Supplementary Table S1

Distribution of deleterious or destabilizing nsSNPs
TEN1 comprises of 123 amino acid residues and have one OB (oligonucleotide or oligosaccharides)-fold domain ( Figure 6) [31]. The OB-folds domain was originally identified from a group of yeast and bacteria [59]. The OB-fold domain can bind and establish protein-DNA, protein-RNA, and protein-protein interactions [60,61]. Amongst these functions, the interaction of OB-folds with ssDNA is extensively studied and characterized [13,62]. Structurally, the OB-folds are β-barrel consisting of five antiparallel β-strands capped by one α-helix at one end has a binding cleft on the other end. The variability in length amongst OB-folds domain is mainly due to the differences in the lengths of variable loops connecting the conserved secondary structure elements [62].
Identification of relative percentage of high confidence nsSNPs in the OB-fold of TEN1 protein provides information about the relationship of a particular secondary structure component to be neutral or pathogenic. The secondary structure components; α1, β1, β2, β3, L3-4 (loop connecting the β3 and β4), β4, β5, α2, respectively have 75, 60, 28, 50, 45, 22, 77, and 35% deleterious or destabilizing mutations ( Figure 6). Mutations in the α1 and β5 are having more than 75% chance to be deleterious, while β1, β3, and L3-4 have about 50% chance. In addition, mutations in L1-2, L2-3, L4-5, and L5-α2 (loop connecting β5 and α2) suggested that nsSNPs occurring in these region have negligible chance to be deleterious. From these results, we can suggest that mutations in the α1, β1, and β5 are possibly more lethal than in other parts of TEN1. These observations were further complemented by sequence conservation analysis, which suggested that residues belonging to α1, β1, and β5 of TEN1 are highly conserved.

Evaluation of disease phenotype
High confidence nsSNPs (deleterious and destabilizing) were analyzed for their phenotypic association using Mut-Pred and PhD-SNP methods (Table 1). These methods predict a particular mutation as benign or pathogenic based on prediction score. MutPred and PhD-SNP methods depict 14 (58%) and 10 (29%), respectively mutations are associated with the disease phenotype. Of the 34 high confidence nsSNPs, we have identified only eight (24%) mutations (W13G, L26P, C58Y, G70A, G77R, R92H, R92C, and C96Y) as pathogenic from both prediction methods. We can conclude that eight (10%) of the total mutation, 78 (100%), found in TEN1 gene are pathogenic in nature.

Analysis of conformational changes in protein structure
Root mean square deviation (RMSD) is a commonly used quantitative measure of the similarity between two superimposed atomic coordinates, considered as a relative measure of structural and conformational changes in a given protein structure [63]. We have performed a comparative analysis of modeled tertiary structure of mutant proteins with the WT to deduce possible structural and functional consequences imposed by pathogenic nsSNPs in TEN1  Figure 7A-F). Mutation G77R in the β4-strand of TEN1 protein showed a remarkable conformational change with the highest RMSD values in comparison with other mutations ( Figure 7C). R92H and R92C mutations are involving the substitution of arginine by a small histidine and cysteine, thus expecting to affect the conformation of TEN1 protein which is evident from changes in RMSD values of backbone atoms ( Figure 7D,E). Other three pathogenic mutations (W13G, L26P, and C96Y) are also showing a considerable structural change in the local structure as compared to WT.

Aggregation propensities analysis
Protein solubility is one of the critical attribute primarily related to its function [64,65]. Insoluble parts in proteins often tend to form an aggregate which leads to development of many diseases including, amyloidoses [66], Alzheimer's [67], and Parkinson diseases [68]. Aggregation propensity analysis was performed in the context of identification of a disease or pathogenic SNPs. SODA classifies SNPs based on changes in α-helix and β-strand propensities; aggregation and disorder score, etc. Out of eight pathogenic mutations obtained from MutPred and PhD-SNP tools, six (75%) were found to have an increased tendency to form an aggregate (Table 2). These aggregate forming potential of amino acid residues are primarily located at the C-terminal of TEN1 protein. Replacement of Arg92 by cysteine or histidine is considerably more prone to form an aggregate in comparison with other pathogenic mutations.

Structural and functional consequence of mutations
The OB-fold of TEN1 comprises of five antiparallel β-strands folded into a complex β-barrel flanked by two α-helices. N-terminal residues forming a long coil and plays a crucial role in STN1-TEN1 complex formation. Following N-terminal coil, there is a short α-helix (α1) located at an interface of two β-sheets known to provide stability to the structure. However, the C-terminal α-helix (α2) is situated at the opposite end of the β-barrel and spans the whole length of the structure. The N-terminal of STN1 forms a stable heterodimer complex with TEN1. Complex formation between these two proteins is mediated by extensive interactions between the α2and α3-helices of TEN1 and STN1, respectively (Supplementary Figure S2A). In addition to α-helices, β-barrels of TEN1 and STN1 also form extensive contacts (Supplementary Figure S2B) [31]. Some important amino acid residues, including Val159, Trp160, Ile164, Met167, and Leu168, of α3-helix and some region of flanking coils of STN1 form extensive hydrophobic contacts with the amino acid residues, Met100, Leu104, Leu105, and Ile109, of α2 of TEN1 (Supplementary Figure S2C). Additional interactions between the STN1 and TEN1 are mainly mediated by the conserved Tyr115 of TEN1 α2. Tyr115 is found at the interface of the STN1 and TEN1 and known to form extensive hydrophobic contacts with the side chains of Tyr49, Pro171, and Tyr174 of STN1. Similarly, interactions between the STN1 and TEN1 involve the surface of the β-barrels and the N-terminal tail of TEN1, that runs along the interface of the two domains and form extensive contacts with both of these two proteins (Supplementary Figure S2D). In particular, Arg27 of β1-strand and Arg119 of α2 of TEN1 make an important salt bridge with Asp78 of β2-strand and Asp33 of α2 of STN1, respectively. Further, Met167 of STN1 spans toward α2 and β-barrel interface of TEN1 and form extensive interactions with Leu105, Ala108, and Ile109 of α2 and Tyr9 of the N-terminal coil. It is fascinating that the STN1-TEN1 complex positions the ligand-binding pockets of each subunit on the same side of the heterodimer, forming an extensive ligand-binding pocket [31]. Mutations in protein are often coupled with destabilization or some time associated with disease pathogenesis. Previous studies on mutational analysis demonstrated that the effects of mutations on the stability of protein are primarily owing to changes in hydrophobic contacts [69][70][71]. However, subsequent studies in a number of cases revealed that substitutions of a large amino acid with smaller ones are usually accompanied by the formation of cavity and effect residue depth and solvent accessibility [72][73][74][75]. To find out the impact of a particular mutation on the local and global environment of TEN1 protein structure, we have calculated van der Waals, hydrogen bonding, electrostatic and hydrophobic interactions in WT, and mutant TEN1 using Arpeggio web server (Table 3) [76]. We have estimated the change in the RSA, OSP, and residue depth of wild-type and mutant TEN1 proteins ( Figure S3).
Trp13 is a highly conserved and a buried residue of the N-terminal flanking coil and plays important role in STN1-TEN1 complex formation. Substitutions of larger bulky and highly hydrophobic Trp13 by small, less hydrophobic glycine does not change van der Waals and hydrogen bond interactions significantly, while a large decrease in stacking and hydrophobic interactions are observed (Table 3). Differences in the size and polarity of Trp and glycine affecting the RSA, OSP, and residue depth of protein. Increased RSA value in the Trp13Gly substitution suggested that the substituted residue at Trp13 becomes more accessible to solvent, which is further supported by a decrease in packing density (Supplementary Figure S3). The Surface potential analysis shows a decrease in hydrophobicity in Trp13Gly substitution ( Figure 8A). The results suggested that the substitution of Trp13 with the glycine seems indispensable for the stability of TEN1 structure.
Similarly, Leu26 is a highly conserved and buried residue found at the β1-strand of TEN1. Substitutions of hydrophobic Leu26 by a less hydrophobic proline effects only van der Waals and hydrophobic interaction at a little extent (Table 1). However, no significant change was observed for RSA, OSP, surface potential, and residue depth by Leu26Pro mutation (Figure 8B). We may conclude that the incorporation of imino group as a side chain of proline may interfere with the folding pathway of TEN1 without effecting non-covalent interactions.
Gly77 is located in the β4-strand of TEN1 and plays an important role in maintaining the structure and stability ( Figure 7D). Substitution of small, hydrophobic, highly conserved, exposed, and functional Gly77 by a large and least hydrophobic, positively charged arginine shows an increase in the van der Waals, hydrogen bonding, and hydrophobic interactions. In consistence, Gly77Arg mutation shows an increase in RSA, a subtle decrease in OSP and residue depth. Gly77Arg mutation increases positively charge environment in the vicinity of Gly77 ( Figure 8C). Lethality of Gly77Arg mutation is associated with the changes in RSA of surrounding residues which are critical to maintaining the TEN1 stability.
Arg92 is belonging to the β4-strand of TEN1 and is important for the stability. Substitution of large, highly hydrophilic, conserved, exposed, and functional Arg92 by a small less hydrophilic, positively charged (histidine), and uncharged (cysteine) shows a disruption of one hydrogen bond and a large decrease in the ionic interactions. While, no significant change was observed in the van der Waals, stacking and hydrophobic interactions. Similarly, Arg92His and Arg92Cys mutations show an increase in RSA, and a slight decrease in the OSP and residue depth. The increase in RSA suggesting that the substitution of Arg92 may increase the solvent accessibility of newly incorporated residues. A marked change in surface potential has been observed in Arg92His and Arg92Cys mutations ( Figure 8D,E). These results indicate that the lethal effect of Arg92His and Arg92Cys mutations is primarily associated with the changes in hydrogen bonding, ionic interactions, and RSA and thus protein stability.
Cys96 is situated in the β5-strand of TEN1. Substitution of small, less hydrophobic, highly conserved and buried Cys96 by a large and more hydrophobic tyrosine show an increase in the stacking and hydrophobic interactions, while no change was observed in other interactions. Cys96Tyr mutation shows an increase in the RSA and decrease in OSP. No significant change in surface potential has been found except an increase in the hydrophobicity ( Figure  8F). Our findings suggest that Cys96Tyr mutation may increase the important hydrophobic and stacking interactions which are being considered as a driving force for protein stability. These increase in stability possibly overcome due to disruptions some important interaction Cys96.

Conclusion
SNPs are considered as one of the most recurring genetic variants associated with a number of diseases. In the present study, we have examined the consequences of nsSNPs in TEN1 gene using advanced integrated bioinformatics approach. We have identified a large number of deleterious and destabilizing nsSNPs, which are scattered in different secondary structural components of TEN1 with a high chance of occurring in α1-helix and β5-strands. Aggregation propensity analysis of pathogenic mutation shows that 75% of pathogenic mutations in TEN1 have a tendency to form aggregate and located at C-terminal of TEN1. In-depth structural analysis of these mutations reveals that the pathogenicity of these mutations may be driven through a large structural changes caused by loss/gain of non-covalent intramolecular interactions. The present study provides a mechanistic insight into the understanding of pathogenic mutations in TEN1 gene and their possible consequences.