Identification of novel non-synonymous variants associated with type 2 diabetes-related metabolites in Korean population

Abstract Metabolome-genome wide association studies (mGWASs) are useful for understanding the genetic regulation of metabolites in complex diseases, including type 2 diabetes (T2D). Numerous genetic variants associated with T2D-related metabolites have been identified in previous mGWASs; however, these analyses seem to have difficulty in detecting the genetic variants with functional effects. An exome array focussed on potentially functional variants is an alternative platform to obtain insight into the genetics of biochemical conversion processes. In the present study, we performed an mGWAS using 27,140 non-synonymous variants included in the Illumina HumanExome BeadChip and nine T2D-related metabolites identified by a targetted metabolomics approach to evaluate 2,338 Korean individuals from the Korea Association REsource (KARE) cohort. A linear regression analysis controlling for age, sex, BMI, and T2D status as covariates was performed to identify novel non-synonymous variants associated with T2D-related metabolites. We found significant associations between glycine and CPS1 (rs1047883) and PC ae C36:0 and CYP4F2 (rs2108622) variants (P<2.05 × 10−7, after the Bonferroni correction for multiple testing). One of the two significantly associated variants, rs1047883 was newly identified whereas rs2108622 had been previously reported to be associated with T2D-related traits. These findings expand our understanding of the genetic determinants of T2D-related metabolites and provide a basis for further functional validation.


Introduction
Metabolome-genome wide association studies (mGWASs) are useful for evaluating associations between genetic variants and serum or plasma metabolite levels by integrating genomics and metabolomics data [1]. Metabolites are thought to play important roles as markers and/or effectors of metabolic diseases or related traits, such as type 2 diabetes (T2D), hypertension, lipid levels, fasting plasma glucose, and fasting insulin levels. Thus, mGWASs are mainly aimed at refining and expanding our understanding of the causal determinants of metabolic diseases [2]. Several previous mGWASs using European cohorts from the Cooperative Health Research in the Region of Augsburg (KORA), Framingham Heart Study (FHS), and Twin UK have identified common variants in dozens of genetic loci associated with metabolite concentrations, including lipids, carbohydrates, amino acids, nucleotides, peptides, and cofactors [3][4][5][6]. In addition, ten common variants associated with metabolites related to T2D development have been identified in the Korea Association Resource (KARE) cohort [7].
However, mGWASs using existing genotype arrays, like Affymetrix 6.0, are generally focussed on non-coding variants located in intronic or intergenic regions, and have difficulty in capturing potentially functional genetic variants. To overcome this limitation, exome arrays focussed on potentially functional variants in protein-coding regions of genes are an alternative approach [8]. In particular, non-synonymous variants are critical for finding causal genetic factors for clinical phenotypes (e.g., metabolite changes) owing to their potential to alter protein function [8,9]. Recent exome-wide association studies using the Illumina HumanExome BeadChip have identified several associations between non-synonymous variants and metabolite traits. For instance, non-synonymous variants in five genes (CTH, GMPS, HAL, PAH, and UPB1) are associated with metabolite levels, including cystathionine, xanthosine, histidine, phenylalanine, and ureidopropionate, in European subjects from the FHS cohort, and these results have been replicated in the Atherosclerosis Risk in Communities cohort [8]. Additionally, associations of T2D-related metabolite ratios with three non-synonymous variants in CHIA, REV3L, and APOE have been identified [10].
In the present study, we searched for potentially functional genetic factors affecting the T2D risk based on associations with metabolite traits. Metabolite quantification by a targeted method and genotyping using the Illumina HumanExome BeadChip were used to evaluate 2,338 Korean subjects from the KARE cohort. Statistical analyses of genetic associations with T2D-related metabolites were performed. Non-synonymous variants located in two genes showed significant associations with T2D-related metabolites.

Study subjects
The KARE cohort is a community-based cohort established through the Korean Genome Epidemiologic Study (Ko-GES) project. In the Ansung and Ansan areas of Kyounggi province, South Korea, 10,030 individuals were initially collected from 2001 to 2002 and six subsequent surveys were performed every 2 years from 2003 to 2014 [11]. Amongst the participants included in KARE survey 2 (from 2005 to 2006), 2338 individuals with both genomic (exome array) and metabolomic datasets were recruited for the present study. Individuals were classified into T2D (n= 512), prediabetes (PD, n= 862), and normal glucose tolerance (NGT, n= 964) groups according to the American Diabetes Association (ADA) diagnostic criteria based on fasting plasma glucose levels (FPG), glycated hemoglobin levels (hemoglobin A1c, HbA1c), and 2-h plasma glucose levels (2h-PG) [12].

Genotyping and quality control
Genotype data were obtained from 2,338 Korean subjects in the KARE cohort using the Illumina HumanExome Bead-Chip v1.1. The genotypes were called by automated clustering and manual inspection using Illumina GenomeStudio v2011.1 according to the clustering method of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium [13]. A total of 59,340 non-synonymous variants were selected from the genotype data to identify potentially functional variants which may have causal effects on T2D-related metabolite levels by altering the amino acid sequences of proteins. For genotype quality control, variants with minor allele frequency (MAF) <0.001 (0.1%), significant deviation from Hardy-Weinberg equilibrium (P<1.0 × 10 −6 ), and a missing genotype rate over 5% were excluded. In total, 27,140 non-synonymous variants were included for the association analysis of T2D-related metabolites.

Metabolite level assessment
Serum metabolites in the 2,338 subjects with genotyping data were quantitated by targetted metabolomics using the AbsoluteIDQ p180 Kit (Biocrates Life Sciences, Innsbruck, Austria), including 40 acylcarnitines, 21 amino acids, 19 biogenic amines, 15 sphingolipids, 90 glycerophospholipids, and one hexose. This kit enables the simultaneous quantification of metabolites by liquid chromatography and flow injection mass spectrometry. A sample of pooled serum from healthy controls was used as a reference standard and was quantitated 36 times in randomly selected positions on the kit to estimate reproducibility. To ensure data quality, each metabolite was required to meet the following three criteria: (1) the coefficient of variance for the metabolites in the reference standards was <25%; (2) 50% of the estimated metabolite concentrations in the reference standard was above the limit of detection, which was set to three times the median of the three blank samples within each kit; and (3) 50% of the estimated metabolite concentrations in the experimental samples was above the limit of detection. In total, 123 metabolites met the criteria, and the final metabolomics dataset contained 1 hexose), 12 acylcarnitines, 21 amino acids, seven biological amines, ten sphingomyelins, 32 diacyl (aa) phosphatidylcholines (PCs), 32 acyl-alkyl (ae) PCs, and eight lysoPCs. The concentrations of all analyzed metabolites are reported in units of μM.

Statistical analyses Selection of T2D-related metabolites
The odds ratios (ORs) for single metabolites were calculated by logistic regression for comparisons of metabolite levels between two groups (NGT versus T2D and PD versus T2D) [7]. In the analyses, diabetic conditions and metabolite concentrations were considered as dependent and explanatory variables, respectively. For the linear regression analysis, β values were calculated based on the concentration of each metabolite as an explanatory variable and postprandial glucose was considered as a dependent variable. The cutoff for significance was determined according to Bonferroni correction for multiple testing (P< 4.07 × 10 −4 ).

Association between non-synonymous variants and T2D-related metabolites
Statistical analyses were performed to identify non-synonymous variants that are significantly associated with T2D-related metabolite levels. A linear regression analysis controlling for age, sex, body mass index (BMI), and T2D status (designated as a categorical predictor; normal = 1, PD = 2 and T2D = 3) as covariates was performed to evaluate the associations between allelic distributions of the 27,140 non-synonymous variants and nine T2D-related metabolite levels using PLINK (http://pngu.mgh.harvard.edu/∼purcell/plink). The concentration of each T2D-related metabolite was log transformed and normalized through inverse-rank method. Overall chromosomal positions and P-values for associations with T2D-metabolites for all non-synonymous variants were visualized by generating a Manhattan plot using R (version 3.6.0) ( Figure 1). The threshold P-value for significance was adjusted by the Bonferroni correction for multiple testing (P< 2.05 × 10 −7 ). Predictions of functional effects of the non-synonymous variants based on Sorting Intolerant from Tolerant (SIFT) and Polymorphism Phenotyping (Polyphen) software were obtained from the genome aggregation database (gnomAD, https://gnomad.broadinstitute.org).

Characteristics of the study subjects
The clinical profiles of study subjects are summarized in Table 1. A total of 2,338 Korean individuals were divided into T2D (n=512), PD (n=862), and NGT (n=964) groups according to the ADA diagnostic criteria, as mentioned in the materials and methods. Parameters included in the clinical profiles differed significantly amongst the three groups (P<0.0001). Seven parameters (age, BMI, FPG, 2h-PG, HbA1c, total cholesterol [TCHL], and triglyceride [TG])  exhibited significantly higher values in the T2D group than in the NGT group, whereas the high-density lipoprotein (HDL) level was significantly lower in the T2D group (P<0.05). In addition, four parameters, i.e., FPG, 2h-PG, HbA1c, and TG, showed significant differences between the PD and T2D groups (P<0.05). These four parameters were higher in the T2D group than in the PD group.

T2D-related metabolites for association analysis
Amongst 123 quantified metabolites, nine were selected as T2D-related metabolites based on our previous study [7].

Non-synonymous variants associated with T2D-related metabolites
A total of 27,140 non-synonymous variants were included in the association analysis for nine T2D-related metabolite levels to identify potentially functional variants, which may have causal effects after genotype quality control. A Manhattan plot summarizing the association analysis of all non-synonymous variants with respect to T2D-related metabolites is presented with chromosomal positions (x-axis) and the negative logarithm of P-values (y-axis) for each variant (Figure 1). We identified two non-synonymous variants that were significantly associated with T2D-related metabolites using a linear regression analysis controlling for age, sex, BMI, and T2D status as covariates (P<2.05 × 10 −7 , Table 3). These variants were located in the coding regions of two genes, CPS1 (rs1047883) and CYP4F2 (rs2108622). A common variant of the CPS1 gene, rs1047883 (MAF = 0.470) was significantly associated with the glycine level (P=2.13 × 10 −8 ). The A allele of rs1047883, inducing an amino acid change from alanine to threonine (Ala350Thr), was correlated with a decreased level of glycine (β = −0.155). A CYP4F2 variant, rs2108622 (MAF = 0.325), was associated with PC ae C36:0 levels (P=8.19 × 10 −13 ). The A allele of the variant causing a change in the amino acid sequence from valine to methionine (Val433Met) was associated with an increase in PC ae C36:0 levels (β = 0.213). Predictions of functional significance of the non-synonymous variants based on SIFT and Polyphen software were obtained from gnomAD. Results from both of the two predictions indicate that rs2108622 (predicted as 'deleterious' in SIFT and 'probably damaging' in Polyphen) may be a pathogenic variant (Table 3). In contrast, the rs1047883 was predicted to yield moderate effect (predicted as 'tolerated' in SIFT and 'benign' in Polyphen). The top five non-synonymous variants associated with the nine T2D-metabolites are listed in Supplementary Table S1.

Linkage disequilibrium status of non-synonymous variants associated with T2D-related metabolites
Regional plots for the two non-synonymous variants that were significantly associated with T2D-related metabolites are presented in Figure 2. Each variant is plotted according to chromosomal position (x-axis) and negative logarithm of the P-value (y-axis). Recombination rates in the flanking region of each variant, indicated as blue peaks in the figure, were predicted to reflect linkage disequilibrium (LD) structures based on the 1000 Genomes Asian database.
No LD was observed in the flanking region of each variant, suggesting that the associations of these variants with T2D-related metabolites are independent of proxy variants.

Relationship between metabolite levels and genotype in the NGT and diabetic groups
Differences in metabolite levels according to genotype (carriers versus non-carriers of the minor allele) for each non-synonymous variant observed in overall subjects ('total'), NGT, and diabetic (PD and T2D, 'diabetes') groups are summarized as boxplots in Figure 3. With respect to glycine levels, significantly higher values were observed in minor allele non-carriers (GG) than in minor allele carriers (AG and AA) for rs1047883 in all three groups (P<0.01).
In contrast, PC ae C36:0 levels were observed significantly higher in minor allele carriers (AG and AA) than in minor allele non-carriers (GG) for rs2108622 (P<0.01). These results suggest that significant genotypic effects of the non-synonymous variants rs1047883 and rs2108622 on glycine and PC ae C36:0 levels exist.

Discussion
We hypothesized that non-synonymous variants are more likely to include causal genetic factors associated with human disease-related phenotypes owing to their potential to alter protein function. Using linear regression, we investigated associations between non-synonymous variants captured in an exome array and nine T2D-related metabolites selected based on a previous study in the KARE cohort. As a result, two non-synonymous variants were associated with glycine and PC ae C36:0 levels, respectively. A common non-synonymous variant located in the CPS1 gene, rs1047883, was significantly associated with glycine levels in the present study. To the best of our knowledge, this variant has not yet been associated with glycine levels or T2D risk. The minor allele (A allele) of this variant, inducing an amino acid change from alanine to threonine (Ala350Thr), was related to a decrease in glycine levels. In Figure 3, significant genotypic effects related to a decrease in glycine levels were also observed in minor allele carriers (AG and AA). Considering that glycine levels decrease according to T2D progression, as mentioned in Table 2, the minor allele of rs1047883 may be associated with an increased risk of T2D. The effects of CPS1 on glycine levels have been reported previously. CPS1 encodes a mitochondrial carbamoyl-phosphate synthase that produces carbamoyl phosphate from ammonia and bicarbonate. Carbamoyl-phosphate synthase indirectly affects glycine levels by the interconversion between glycine and ammonia via the glycine cleavage complex [14]. Glycine is an amino acid involved in gluconeogenesis [15], and decreases in glycine levels are observed in diabetic rats and are associated with obesity and insulin resistance [16,17]. Several previous reports have suggested that CPS1 variants are associated with glycine levels. For instance, a CPS1 variant located in the 3 -untranslated region, rs715, is associated with glycine levels in T2D [14]. In addition, missense variants in the coding region of CPS1 (rs1047891 and rs7422339) are associated with glycine levels [18,19]. Therefore, our results for the association between glycine and a CPS1 variant are consistent with previous results.
A CYP4F2 gene variant, rs2108622 was significantly associated with the PC ae C36:0 level. This is a known functional variant that reduces 20-hydroxyeicosatetraenoic acid (20-HETE) production, and is associated with hypertension, ischemic stroke [20,21], and metabolic syndrome, defined by the MetS score (sum of scores for waist circumference, blood pressure, lipid levels, and glucose levels), in the Swedish population [22]. The minor allele (A allele) of the rs2108622 polymorphism is associated with an increased level of PC ae C36:0. An amino acid substitution induced by rs2108622 was predicted to have a pathogenic effect using both SIFT and Polyphen (predicted as 'deleterious' and 'probably damaging' , respectively) suggesting that this variant may affect PC ae C36:0 levels by alteration of CYP4F2 protein structure. Significant relationships between a decrease in PC ae C36:0 levels and the rs2108622 genotype were also observed for minor allele carriers (AG and AA) in Figure 3. Considering that the level of PC ae C36:0 increases according to T2D progression, as mentioned in Table 2, the minor allele of rs2108622 may be related to an increased risk of T2D. Functional roles of CYP4F2 in T2D development have been reported previously. CYP4F2 encodes a cytochrome P450 (CYP2) protein expressed in human pancreatic islets [23]. CYP4F2 induces the formation and release of 20-HETE, an eicosanoid metabolite of arachidonic acid mediated by CYP2-dependent ω-hydroxylase in human β-cells [24,25]. ω-Hydroxylase-mediated 20-HETE formation and release is stimulated by glucose, and glucose-stimulated 20-HETE formation and insulin secretion are reduced in T2D human islets [23]. Taken together, rs2108622 may affect T2D development by disrupting the function of CYP4F2 in 20-HETE formation in human β-cells.
The present study has some limitations. For instance, the observed associations between non-synonymous variants and T2D-related metabolites were not replicated in other cohorts. Furthermore, it is insufficient to assure the causality of these non-synonymous variants because there is a possibility that association signals of the variants are due to the LDs with causal non-coding variants not being included in exome array. Therefore, further investigations including fine-mapping with GWAS data are needed to validate our results. Nevertheless, we were able to identify novel genetic factors involved in the T2D risk that have not been found in previous GWAS by the investigation of associations with T2D-related metabolites.
In conclusion, our genetic association analysis of T2D-related metabolites focussed on non-synonymous variants to identify causal markers of the T2D risk. Novel non-synonymous variants that are significantly associated with T2D-related metabolite levels were found. Although further investigations are needed to validate our results, these findings extend our understanding of the genetic determinants of T2D.

Ethics Statement
All human investigation was performed according to principles expressed in the Declaration of Helsinki. Written informed consent was given by each participant. The KARE study, including the protocols for subject recruitment, assessment, and obtaining informed consent from participants, was reviewed and approved by an ethics committee (Korea Centers for Disease Control and Prevention Institutional Review Board; approval number: 2017-03-01-P-A).