Identification of potential diagnostic and prognostic biomarkers for LUAD based on TCGA and GEO databases

Abstract Accumulating evidence has demonstrated that gene alterations play a crucial role in LUAD development, progression, and prognosis. The present study aimed to identify the hub genes associated with LUAD. In the present study, we used TCGA database to screen the hub genes. Then, we validated the results by GEO datasets. Finally, we used cBioPortal, UALCAN, qRT-PCR, HPA database, TCGA database, and Kaplan–Meier plotter database to estimate the gene mutation, gene transcription, protein expression, clinical features of hub genes in patients with LUAD. A total of 5930 DEGs were screened out in TCGA database. Enrichment analysis revealed that DEGs were involved in the transcriptional misregulation in cancer, viral carcinogenesis, cAMP signaling pathway, calcium signaling pathway, and ECM–receptor interaction. The combining results of MCODE and CytoHubba showed that ADCY8, ADRB2, CALCA, GCG, GNGT1, and NPSR1 were hub genes. Then, we verified the above results by GSE118370, GSE136043, and GSE140797 datasets. Compared with normal lung tissues, the expression levels of ADCY8 and ADRB2 were lower in LUAD tissues, but the expression levels of CALCA, GCG, GNGT1, and NPSR1 were higher. In the prognosis analyses, the low expression of ADCY8 and ADRB2 and the high expression of CALCA, GCG, GNGT1, and NPSR1 were correlated with poor OS and poor PFS. The significant differences in the relationship of the expression of 6 hub genes and clinical features were observed. In conclusion, 6 hub genes will not only contribute to elucidating the pathogenesis of LUAD and may be potential therapeutic targets for LUAD.


Introduction
Lung cancer is a common and severe disease which ranks the top among cancers worldwide in terms of mortality for both men and women [1]. In January 2021, the International Agency for Research on Cancer (IARC) published the latest cancer statistics in 2020, according to the statistics published after the investigation of incidence and mortality of 36 cancers in 185 countries, the new cases of lung cancer reached 2.2 million, ranking the second in the number of new cancer cases. About 1.8 million people die from lung cancer, the highest death rate from cancer. In China, the number of new cases and deaths of lung cancer is the highest among all cancers (820 thousand and 710 thousand respectively), accounting for 17.9% and 23.8% of all cancer incidence and mortality [2].
Lung adenocarcinoma (LUAD) is one of the subtypes of non-small cell lung cancer (NSCLC) with high morbidity and mortality [3]. As we all know, the occurrence, development, prognosis, and recurrence of tumors are not only related to the pathological type and clinical stage but also closely related to the expression of tumor genes [4]. With the continuous development of molecular biology and the advocacy of precision medicine, the research and development of targets and targeted drugs in LUAD are becoming more and more mature, and the targeted treatment of LUAD patients is getting more and more attention. A number of genes have been reported to be associated with LUAD, including EGFR, TP53, AKT1, KRAS, and PTEN [5][6][7][8]. At present, although the comprehensive treatment of surgery, chemoradiotherapy, targeted therapy, and immunotherapy are applied, the prognosis of LUAD is still poor due to local recurrence or distant metastasis [1]. Therefore, the molecules involved in the carcinogenesis and progression of LUAD need to be further explored, which will contribute to the development of novel treatment strategies of LUAD.
Previous studies have shown that several genetic abnormalities were associated with the initiation and development of LUAD [9,10], but the pathogenesis contributing to biological properties of LUAD remains inconclusive. At present, with the development of high-throughput molecular detection technology, a large volume of disease-associated bioinformatics data has been produced. High-throughput bioinformatics platforms may promote the analysis of differential gene expression, including microarrays, and have a wide range of applications in medical oncology, particularly in searching for disease-associated biomarkers [11], alternative splicing [12], and gene function prediction [13]. Numerous previous studies have generated a large volume of microarray data, and a number of gene expression profiling studies on LUAD have identified differentially expressed genes (DEGs) in various pathways, molecular functions, and biological processes. By bioinformatic analysis of these microarray data, many new genes associated with disease initiation and progression can be found. In the present study, gene expression profile data in LUAD were extracted from TCGA and GEO, and the data used to investigate the potential hub genes in LUAD.

Data collection and processing
High-throughput gene expression data of LUAD tissues and normal lung tissues were extracted from the TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga). These RNA-seq data (HTSeq-count) from Illumina HiSeq RNASeq platform consisted of 502 LUAD samples and 49 adjacent non-cancerous samples, and were achieved from the publicly available Genomic Data Commons (GDC) data portal (https://portal.gdc.cancer.gov/).

Identification of the aberrantly expressed genes in LUAD
In order to identify the DEGs, R software (version 4.0.5) were applied to compare the expression profiles of LUAD tissues with those of normal tissues. Differential expression analysis of individual genes was carried out by edgeR Bioconductor package [14]. The |log 2 (fold change [FC])| > 2, P value < 0.01, and false discovery rate (FDR) < 0.01 were considered as the threshold values for DEG identification. We selected the top 300 up-regulated and 300 down-regulated genes for our study. The heatmap and volcanoplot were performed with 'gplots' and 'ggplot2' packages of R software.

GO and KEGG pathway enrichment analyses of DEGs
The Database for Annotation Visualization and Integrated Discovery (DAVID) [15] is a online tool, which provides a comprehensive set of functional annotation tools for researchers to investigate the biological meaning of genes. To identify the pathways which had the most significant involvement with the genes identified, up-regulated and down-regulated DEGs were submitted into DAVID for Gene ontology (GO) analysis including the biological process (BP), cellular component (CC), and molecular function (MF) enrichment [16] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [17] pathway analysis enrichment, and then visualized in bubble chart by 'ggplot2' in R software. P<0.05 was considered to indicate a statistically significant difference in the functional enrichment analysis.

GSEA of the hub genes aberrantly
Gene set enrichment analysis (GSEA, version 4.1.0) was applied to predict associated up-regulated and down-regulated genes and the signifcantly changed pathways based on the expression profle from TCGA database [21]. In each separate analysis, Student's t-test statistical score was conducted in consistent pathways and the mean of the DEGs was calculated. A permutation test with 1000 times was utilized to detect the signifcantly involved hallmark pathways. The adjusted P-value using Benjamini and Hochberg (BH) and FDR method by default was used to correct for the occurrence of false positive results. Significant involved genes were defined with an adjusted P-value < 0.01 and FDR < 0.25.

Verification of TCGA results by GEO datasets
To verify the results of TCGA, a search was performed in the Gene Expression Omnibus database (GEO, http: //www.ncbi.nlm.nih.gov/geo/). The search strategy was as follows: ('lung cancer' OR 'lung adenocarcinoma'). Gene expression profiles of datasets GSE118370, GSE136043, and GSE140797 were obtained from GEO. The dataset GSE118370 consists of 6 LUAD samples and 6 normal lung samples, GSE136043 consists of 5 LUAD samples and 5 normal lung samples, while GSE140797 consists of 7 LUAD samples and 7 normal lung samples. The data were analyzed on the GPL570 platform Affymetrix Human Genome U133 Plus 2.0 Array and the GPL13497 platform Agilent-026652 Whole Human Genome Microarray 4×44K v2. To identify DEGs between LUAD tissues and normal lung tissues, we employed the Limma package in R software (version 4.0.5) for processing [14]. The adjusted P-value was calculated using Benjamini-Hochberg method for controlling the FDR, thus correcting false positives. Cut-off criterion was defined as P<0.05 and |log 2 FC| > 2. The platform annotation files downloaded from the database were adopted to convert the probe data in the matrix files into gene symbols. The heatmap and volcanoplot were performed with 'gplots' and 'ggplot2' packages of R software. The common expression genes were determined with Jvenn. Then, GO enrichment analysis, KEGG pathway enrichment analysis, and PPI network were performed by the above methods [15][16][17][18][19].

Potential molecular mechanism of the hub genes in LUAD
It could be assumed that the expression of these genes in LUAD could be caused by genetic alterations, including amplification, deletion, or point mutations. Consequently, cBioPortal [22] was used to summarize the possible genetic alterations for these DEGs in LUAD.

Transcription level of the hub genes in LUAD
The expression of the hub genes in LUAD tissues and normal lung tissues were analyzed by online software UALCAN (http://ualcan.path.uab.edu/analysis.html). In this study, we analyzed the difference of hub genes expression from different perspectives of primary LUAD, TNM stage, and nodal metastasis status.

RNA extraction and qRT-PCR of the hub genes
Trizol reagent (Invitrogen, Carlsbad, CA, U.S.A.) was used to extract total RNA from the Beas-2B cells and A549 cells according to the manufacturer's protocol. Superscript II reverse transcriptase and random primers were used to synthesize cDNA. Quantitative real-time PCR (qRT-PCR) was conducted on the ABI 7900HT Sequence Detection System with SYBR-Green dye (Applied Biosystems, Foster City, CA, U.S.A.). All primers were shown in Table 1. The reaction parameters included a denaturation program (10 min at 95 • C), followed by an amplification and quantification program over 45 cycles (15 s at 95 • C and 34 s at 60 • C). Each sample was tested in triplicates, and each sample

Protein expression level of the hub genes in LUAD
The Human Protein Atlas (https://www.proteinatlas.org/) is a database of immunohistochemistry (IHC)-based protein expression profles in cancer tissues, normal tissues, and cell lines [23]. In the present study, the protein expression IHC images of hub genes in clinical specimens of LUAD patients were obtained from this database.

Relationship between the hub genes and prognosis of LUAD
The effects of the hub genes were assessed using Kaplan-Meier plotter database (http://kmplot.com) [24]. The Kaplan-Meier plotter database is capable to assess the effect of 54 k genes on survival in 21 cancer types, including breast cancer, ovarian cancer, lung cancer, and gastric cancer. In addition, LUAD patients obtained from TCGA database were divided into the high expression group and the low expression group based on the median expression of hub genes, and the survival curves were drawn with 'hash' and 'survival' packages of R software.

Diagnostic role of the hub genes in LUAD
The receiver operating characteristic (ROC) curve was used to assess the diagnostic effectiveness of the hub genes in LUAD. All genes expression data were obtained from TCGA database. ROC curve analysis was performed in R software using procedures from the 'pROC' package.

Relationship between the hub genes and clinicopathological features in LUAD
The clinical data of LUAD were extracted from the TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga). Gene expression data and clinical data were combined by 'hash' package of R software, then the clinical data were divided into high expression group and low expression group according to the median of expression level of hub genes. The relationship between the expression levels of hub genes and clinical characteristics was analyzed by statistical analysis of the clinical characteristics of high expression group and low expression group, and visualized in forestplot by 'forestplot' package of R software. Statistical analyses were performed using Pearson's chi-squared test in SPSS (version 22.0). The odds ratio (OR) and corresponding 95% confidence interval (CI) of each factor were calculated by STATA (version 14.0) software. Data were considered statistically significant when P<0.05.

Aberrantly expressed genes based on TCGA data in LUAD
The expression level of each gene transformed with log2 was calculated by EdgeR. Following the calculating criteria, we achieved 5930 aberrantly expressed genes in LUAD, including 5208 highly and 722 lowly expressed genes. Then we listed the top 300 up-regulated and 300 down-regulated genes according to the value of |log 2 FC| (Table 2), which demonstrated that these genes might play vital roles in the occurrence of LUAD. The 'gplots' and 'ggplot2' packages of R software were used to draw heatmap and volcanoplot of the 600 genes ( Figure 1).

GO function and KEGG pathway analyses for the DEGs
To identify the pathways which had the most significant involvement with the genes identified, enriched GO categories and KEGG pathways were identified by uploading selected DEGs to DAVID. The results of the GO analysis indicated that in biological process terms, the up-regulated DEGs were mainly enriched in nucleosome assembly, telomere organization, and cellular protein metabolic process ( Figure 2A, Table 3), down-regulated DEGs were significantly enriched in angiogenesis, receptor internalization, and cell surface receptor signaling pathway ( Figure 2B, Table 3). In cell component terms, up-regulated DEGs were mainly enriched in extracellular region, nucleosome, and nuclear chromosome ( Figure 2A, Table 3), whereas down-regulated DEGs were mainly enriched in plasma membrane, extracellular region, and integral component of plasma membrane ( Figure 2B, Table 3). In molecular function terms, up-regulated DEGs were mainly enriched in sequence-specific DNA binding, protein heterodimerization activity, and hormone activity ( Figure 2A, Table 3), whereas down-regulated DEGs were mainly enriched in heparin binding, ion channel binding, and receptor activity ( Figure 2B, Table 3).  Up-regulated  TFF2, REG4, LINC00676, FGB, DEFA5, MAGEA3, ALB, MAGEA6, RNU5B-1, MAGEC1, CALCA, TFF1, MAGEA10,  MAGEA4, LINC01419, FGF19, AFP, CGA, MAGEA12, PRB4, SPINK4, PITX2, TFAP2D, CASP14, PSG1, GC KEGG pathway analysis demonstrated that up-regulated DEGs were enriched in transcriptional misregulation in cancer, viral carcinogenesis, and glucagon signaling pathway ( Figure 2C, Table 3), whereas down-regulated DEGs were significantly enriched in cAMP signaling pathway, calcium signaling pathway, and ECM-receptor interaction ( Figure 2D, Table 3).

PPI network construction and module analysis of DEGs
Interactions between the identified DEGs were revealed by constructing a PPI network. In total, there were 414 nodes and 596 edges in the network ( Figure 3A). Subsequently, CytoHubba plugin was used to identify the 10 hub nodes with the highest degrees (Table 4), including albumin gene (ALB, score: 34), proglucagon gene (GCG, score: 30),     (Table 4).
After combining the results of MCODE and CytoHubba plugins, 6 hub genes were determined, including ADCY8, ADRB2, CALCA, GCG, GNGT1, and NPSR1 ( Figure 3B). The 6 hub genes were loaded into the STRING database, to obtain the PPI data among them, and PPIs with highest interaction score (confidence > 0.4) were selected. In total, there were 6 nodes and 14 edges in the network ( Figure 3C). GSEA enrichment analysis, including ADCY8, ADRB2, CALCA, GCG, GNGT1, and NPSR1 indicated that the low expression of ADCY8 and ADRB2 and the high expression of CALCA, GCG, GNGT1, and NPSR1 were positively correlated with E2F targets, G2M checkpoint, glycolysis, mitotic spindle, and PI3K-Akt-mTOR signaling. The details were illustrated in Figure 3D.

The verification results of GEO database
This section included three gene sets (GSE118370, GSE136043, and GSE140797), of which GSE118370 included 407 DEGs, GSE136043 included 554 DEGs, and GSE140797 included 641 DEGs. The heatmaps and volcanoplots of above three gene sets were shown in Figure 4A. In all included datasets, compared with normal samples, there are 77 common DEGs ( Figure 4B, Table 5). By using DAVID, we found that these DEGs were mainly enriched in cell adhesion, morphogenesis of a branching structure, single organismal cell-cell adhesion, extracellular region, proteinaceous extracellular matrix, plasma membrane, heparin binding, etc. ( Figure 4C, Table 6). At the same time, the analysis of the KEGG pathway showed that 77 DEGs were mainly enriched in 4 pathways, namely, ECM-receptor interaction, dilated cardiomyopathy, focal adhesion, and PI3K-Akt signaling pathway ( Figure 4D, Table 6). By using the STRING database and Cytoscape 3.8.2 software, a PPI network was constructed for the 77 DEGs. The PPI network had a total of 77 nodes and 60 edges, and an interaction score > 0.4 was considered a high-confidence interaction relationship ( Figure 4E). Using Cytoscape plugin MCODE, we identified the top 8 genes with the most connectedness ( Figure 4F), including ADCY8, ADRB2, CALCA, GCG, GNGT1, GRK5, NPSR1, and SSTR1. The 8 connected genes contained the 6 hub genes that we had determined in TCGA database.

Transcription levels of 6 hub genes in LUAD tissues and normal lung tissues
The online database UALCAN was used to analyze the expression of the 6 hub genes in LUAD tissues and normal lung tissues. The results showed that compared with normal lung tissues, the expression levels of ADCY8 and ADRB2 were down-regulated and the expression levels of CALCA, GCG, GNGT1, and NPSR1 were up-regulated in LUAD tissues (all P<0.001) ( Figure 6A). In terms of the TNM stage, the ADCY8 expression level in normal lung tissues was higher than in stage I, stage II, and stage III tissues (all P<0.001), the ADRB2 expression level in normal lung tissues was higher than in stage I, stage II, stage III, and stage IV tissues (all P<0.001), the CALCA expression level in normal lung tissues was lower than in stage I and stage II tissues (all P<0.001), the GCG expression level in normal lung tissues was lower than in stage I, stage II, and stage III tissues (P<0.001, P<0.001, P=0.038, respectively), the GNGT1 expression level in normal lung tissues was higher than in stage I, stage II, stage III, and stage IV tissues (P<0.001, P<0.001, P<0.001, P=0.001, respectively), and the NPSR1 expression level in normal lung tissues was higher than in stage II and stage IV tissues (P<0.001, P=0.011, respectively) ( Figure 6B). In terms of the nodal metastasis, the ADCY8 expression level in normal lung tissues was higher than in N0, N1, N2, and N3 tissues (all P<0.001), the ADRB2 expression level in normal lung tissues was higher than in N0, N1, N2, and N3 tissues (P<0.001, P<0.001, P<0.001, P=0.006, respectively), the CALCA expression level in normal lung tissues was lower than in N0 and N1 tissues (all P<0.001), the GCG expression level in normal lung tissues was lower than in N0 and N1 tissues (P<0.001, P=0.005, respectively), the GNGT1 expression level in normal lung tissues was higher than in N0, N1, and N2 tissues (all P<0.001), the NPSR1 expression level in normal lung tissues was lower than in N0 and N1 tissues (P=0.013, P=0.021, respectively) ( Figure 6C). To further verify the results of bioinformatics analysis, the mRNA levels of the 6 hub genes were determined in Beas-2B cells and A549 cells with qRT-PCR. As illustrated in Figure 6D, the ADCY8 and ADRB2 were significantly down-regulated in A549 cells compared with Beas-2B cells (all P<0.05), while the CALCA, GCG, GNGT1, and NPSR1 were signficantly up-regulated in A549 cells compared with Beas-2B cells (all P<0.05), as predicted by the bioinformatics analysis.

The protein expression levels of 6 hub genes in LUAD tissues
To determine the differentially protein expression of 6 hub genes in LUAD, IHC staining images for the hub genes proteins in LUAD tissues as well as normal lung tissues were obtained from the HPA database. Consistent with the above results, the results showed that the protein expression levels of CALCA, GCG, GNGT1, and NPSR1 were higher in LUAD tissues than that in normal lung tissues, while the protein expression levels of ADCY8 and ADRB2 were lower in LUAD tissues than that in normal lung tissues (Figure 7).

Relationship between the 6 hub genes and prognosis of LUAD patients
The overall survival (OS) and progression-free survival (PFS) were analyzed for patients with LUAD using the Kaplan-Meier survival plot. Briefly, the 6 genes were uploaded to the database and Kaplan-Meier curves were plot-  Figure  8).
Subsequently, we downloaded the survival data of patients with LUAD from TCGA database, combined the survival data and gene expression data, and divided them into high expression group and low expression group to study the effect of 6 hub genes on the prognosis of patients with LUAD, the results showed that high expression of ADRB2 (P=0.00895) and ADCY8 (P=0.00195), and lower expression of CALCA (P=0.00399), GCG (P=0), GNGT1 (P=0.00713), and NPSR1 (P=0.00121) were correlated with significantly better OS in LUAD patients (Figure 8), which were consistent with the results of Kaplan-Meier survival plot.

ROC of the 6 hub genes in patients with LUAD
We obtained the gene expression data and clinical data from TCGA database. ROC curve analysis was performed in R software using procedures from the 'pROC' package. The results of the ROC curves indicated that the 6 hub genes had different specificity and sensitivity in predicting the OS of LUAD patients ( Figure 9). However, all the area under receiver operating characteristics (AUCs) were lower than 55, which indicated that the predictive role of 6 hub genes were poor.

Relationship between the 6 hub genes and clinical features in LUAD patients
There were 489 LUAD patients in TCGA database, including 470 patients with age data, 233 patients ≥65 years old and 237 patients <65 years old. There were 489 patients with gender data, 223 men and 266 women. There were 486 patients with data of primary tumor, 426 patients in stage T1-2 and 60 patients in stage T3-4. There were 475 patients with data of nodal metastasis, 317 patients without nodal metastasis and 158 patients with nodal metastasis. There were 357 patients with data of distant metastasis, 333 patients without distant metastasis and 24 patients with distant metastasis. There were 481 patients with data of TNM stage, 378 patients in stage I-II and 103 patients in stage III-IV (Table 7).
Pearson's chi-squared test showed that the expression level of ADCY8 was negatively correlated with T stage [OR  Table 8).

Discussion
LUAD is one of the important subtype of NSCLC with high morbidity and mortality [3]. LUAD is a product of cumulative genetic, epigenetic, somatic, and endocrine aberrations, therefore, understanding the biological mechanism of LUAD is of critical importance for clinical diagnosis and treatment. As microarrays have a wide range of applications in oncology, including identification of disease-associated biomarkers, alternative splicing, and gene function prediction, it has been widely used to predict the potential therapeutic targets for multiple cancers. In the present study, we extract the microarray data from TCGA, 5208 up-regulated and 722 down-regulated DEGs between LUAD samples and normal samples were identified using bioinformatics analysis. We selected the top 300 up-regulated and 300 down-regulated genes for our study. In order to obtain additional analysis of these DEGs, GO and KEGG analyses were performed using DAVID software.
The GO analysis results indicated that the up-regulated DEGs were primarily associated with nucleosome assembly, telomere organization, cellular protein metabolic process, extracellular region, nucleosome, nuclear chromosome, sequence-specific DNA binding, protein heterodimerization activity, and hormone activity, while the down-regulated DEGs were primarily enriched in angiogenesis, receptor internalization, cell surface receptor signaling pathway, plasma membrane, extracellular region, integral component of plasma membrane, heparin binding, ion channel binding, and receptor activity. Summarily, most of the functional enrichment was related to the structure and function of chromosomes. As we all know, chromosomal instability is a major form of genomic volatility and contributes to abnormal chromosomal structure and numbers. Micro-satellite instability and increased frequency of base-pair mutations are other described forms of genomic instability, and the genome instability that is a hallmark of cancer almost certainly contributes to further alter sequences in regulatory regions that can promote tumor progression [25]. Furthermore, the enriched KEGG pathways of up-regulated DEGs included transcriptional misregulation in cancer, viral carcinogenesis, and glucagon signaling pathway. Previous studies have highlighted the link between disease-associated transcriptional misregulation and various cancers, such as lung cancer [26], breast cancer [27], colorectal cancer [28], and renal cancer [29]. Oliveira et. al [30] found that many DNA viruses targeted multiple cellular pathways to support malignant transformation and tumor development. Down-regulated DEGs were related to cAMP signaling pathway, calcium signaling pathway, and ECM-receptor interaction. The cyclic AMP (cAMP) signaling pathway is activated by cAMP. The increase in cAMP levels activates target molecules, such as cAMP-dependent protein kinase, exchange protein directly activated by cAMP, and cyclic nucleotide-gated ion channels [31]. These target effector molecules regulate various cellular responses, including metabolism, gene expression, proliferation, and apoptosis. Therefore, cAMP signaling has been studied as a target for various disease treatments, including cancer [32]. Various alterations to key molecules of the cAMP signaling pathway have been observed in lung cancer, and phosphodiesterase inhibitors have been shown to synergize with cisplatin to induce apoptosis in a broad panel of human lung cancer cell lines. These findings present cAMP signaling as a promising cellular target for antitumor treatments [33,34]. Calcium is one of the small signaling molecules regulating various biological functions in cells. Cell cycle regulation and cell death have been suggested to closely correlate with the intracellular calcium ion ([Ca2+]i) concentration [35]. Under pathological conditions, such as ischemia-reperfusion injury and oxidative stress, the [Ca2+]i level has been reported to markedly increase in many types of cells. This condition is known as calcium overload, which eventually leads to the activation of pro-apoptotic factors, resulting in apoptosis [36]. Logan et al. [37] reported that pathological Ca 2+ overload triggers cell death. ECM-receptor interaction pathways are the most up-regulated gene-enriched signaling pathways, which play an important role in the process of tumor shedding, adhesion, degradation, movement, and hyperplasia. The roles of ECM in other cancers have been proved. Previous studies reported that the ECM was up-regulated in prostate cancer tissue [38] and participated in the process of tumor invasion and metastasis in gastric cancer [39]. In addition, the ECM in colorectal cancer could promote the development of epithelial-mesenchymal transition (EMT) in cancer cells [40].
By constructing a PPI network with DEGs, our study identified the top 10 degree hub genes, including ADCY8, ADRB2, ALB, CALCA, F2, GCG, GNGT1, INS, NPSR1, and SST. Then, a significant module was subsequently constructed with 14 nodes, which gained the highest MCODE score. After combining the results of MCODE and Cy-toHubba, 6 hub genes were chosen, including ADCY8, ADRB2, CALCA, GCG, GNGT1, and NPSR1. ADCY8 is a member of the adenylyl cyclase family of genes and produces the enzyme adenylyl cyclase (AC8). Orchel et al. [41] found that ADCY8 cause disturbance in the underlying biological processes, which could be important for the pathogenesis of endometrial cancer. ADRB2, located on chromosome 5q31-q32, consists of a single exon of 2,015 nucleotides, encoding a 413 amino acid protein for the beta-2-adrenergic receptor. The beta-adrenergic receptor is a member of the G-protein-coupled adrenergic receptor family and functions in adipose tissue by stimulating lipolysis, which affects lipid mobilization within human fat cells and the regulation of energy expenditure [42]. To date, only one epidemiologic studies have examined the association of genetic variation in ADRB2 with breast cancer risk among postmenopausal breast cancer [43]. The CALCA gene codes for calcitonin, an important regulator of bone calcium metabolism. Previous studies have reported that CALCA is a candidate gene for tumor-specific hypermethylation in cancer [44][45][46]. The expression level of CALCA protein is related to the pathological process of cervical cancer [47], and the methylation of CALCA gene promoter is closely related to the occurrence of cervical cancer [48]. The proglucagon gene (GCG), located on chromosome 2q24.2, can be activated by TCF7L2 in the Wnt signaling pathway, and expresses glucagon-like peptide1 (GLP-1) in the intestine [49]. GLP-1 plays an essential role in regulating blood glucose level by stimulating glucose-dependent insulin secretion [50]. A previous study suggested that higher incidences of pancreatic and medullary thyroid carcinoma in patients treated with GLP1 agonists [51]. GNGT1 located on chromosome 7q21.3 and code the gamma subunits of transducin [52]. Transducin, also known as GMPase, mediates the activation of a cyclic GTP-specific (guanosine monophosphate) phosphodiesterase by rhodopsin. Zhang et al. [53] reported that the expression level of GNGT1 in NSCLC was overexpressed, and the high expression of GNGT1 was significantly associated with worse OS in patients with NSCLC. NPSR1 is a G-protein-coupled receptor that induces intracellular signaling upon stimulation by neuropeptide S (NPS) via mobilization of calcium, increased cyclic adenosine monophospate (cAMP) levels, and activation of the mitogen-activated protein kinase (MAPK) pathway [54,55]. Zhang et al. [53] found that the expression level of NPSR1 in NSCLC was overexpressed. In addition, a previous study suggested that NPSR1 is a marker widely expressed in neuroendocrine tumors (NET) with the exception of adrenal pheochromocytomas, the stimulation of NPSR1 with NPS results in activation of pathways that are relevant for cancer development [56].
Considering the enrichment results of the 6 hub genes in the present study, it was demonstrated that LUAD was associated with E2F targets, G2M checkpoint, glycolysis, mitotic spindle, and PI3K-Akt-mTOR signaling. E2F family members play a major role in cell cycle regulation and DNA synthesis in mammalian cells [57]. Previous studies reported that the expression levels of E2Fs were deregulated in several human malignancies, including lung cancer [58,59], breast cancer [60], and gastrointestinal cancer [61]. The G2M checkpoint is initiated to allow repair of DNA damage prior to mitosis. Therefore, G2M checkpoint abnormal is closely related to genomic instability and induce cells undergoing malignant transformation [62,63]. It has been found that the metabolic switch from mitochondrial respiration to glycolysis during hypoxia (where oxidative phosphorylation will be inactive) as well as mitochondrial dysfunction [64,65] are critical for cancer cell growth. One form of genomic instability found in cancer cells, chromosomal instability, is characterized by losses or gains of chromosomes during cell replication [66]. Some studies suggested that chromosomal instability results from a defective mitotic spindle mechanism that allows segregation of improperly aligned chromosomes during mitosis [67]. Marinov et al. [68] found continuous Akt activation and mTOR phosphorylation in 51% of NSCLC samples and 74% of NSCLC cell lines. The release of regulation of PI3K/Akt/mTOR signal transduction pathway can promote the development of lung cancer, while the application of PI3K inhibitors such as LY294002 can promote NSCLC apoptosis [69].
In addition, we also use GEO database to verify the above results. We found 77 DEGs by screening common differentially expressed genes from GSE118370, GSE136043, and GSE140797 datasets. The function of 77 DEGs was mainly enriched in cell adhesion. Furthermore, the enriched KEGG pathways of 77 DEGs included ECM-receptor interaction, dilated cardiomyopathy, focal adhesion, and PI3K-Akt signaling pathway. Subsequently, we screened out 8 genes through the establishment of PPI network and the application of Cytoscape plugin MCODE, which includes the 6 genes screened from TCGA database. So the 6 genes are used as hub genes for further research.
Subsequently, cBioPortal was used to summarize the possible genetic alterations for 6 hub genes in LUAD. We found that the mutation is the most common mutation in 6 hub genes. Mutations in transcription factors have long been known to contribute to tumorigenesis, and previous studies indicated that overexpressed oncogenic transcription factors could alter the core autoregulatory circuitry of the cell [70]. Mutations in a variety of chromatin regulators have been implicated in development of cancer cells, and the normal functions of these regulators provided some clues to the mechanisms involved in altered gene expression. Loss of function mutations in several nucleosome remodeling proteins are associated with multiple types of cancer [71,72]. In addition, ADCY8, ADRB2, CALCA, GCG, GNGT1, and NPSR1 had genetic alterations, including missense mutation, splice mutation, truncating mutation, fusion, amplification, and deep deletion. However, the clinical potential of these genetic alterations needs to be confirmed with larger sample size and the exact mechanism of these genetic alterations also required in vitro and in vivo verification.
In this paper, we used UALCAN database, RT-PCR, and HPA database to study the expression differences of 6 hub genes between LUAD tissues and normal tissues. The results showed that the expression levels of GCG, GNGT1, NPSR1, and CALCA were higher in LUAD tissues than in normal lung tissues, and the expression levels of ADCY8 and ADRB2 were lower in LUAD tissues than in normal lung tissues. Previous studies have shown that altered expression levels of the 6 hub genes. For example, ADCY8 hypermethylation and altered expression have been observed in endometrial cancer [41,73]. Feigelson et al. [43] showed that high ADRB2 expression promoted the occurrence and development of breast cancer. Lei et al. [47] reported that the expression level of CALCA was positively correlated with cervical lesion pathogenesis. Vangoitsenhoven et al. [51] found that GLP-1, encoded by GCG gene, could lead to pancreatic and medullary thyroid carcinoma. Pulkkinen et al. [56] reported that NPSR1 was a marker widely expressed in neuroendocrine tumor and activated intracellular pathways relevant for cell growth.
During the survival analysis of the present study, KMplot and TCGA database were used to assess the effect of expression levels of the 6 hub genes in patients with LUAD. Notably, high expression of ADCY8 and ADRB2 and low expression of CALCA, GCG, GNGT1, and NPSR1 were correlated with significantly better OS and PFS in LUAD patients. Previous studies have reported the relationship between some of these genes and prognosis of cancers. For example, Wu et al. [74] found that the high expression of ADRB2 was positively relative with the prognosis of hepatocellular carcinoma. Martinelli et al. [75] found that high frequency of CALCA methylation was associated with non-seminomatous tumors and promoter methylation of CALCA predicts poor clinical outcome for testicular germ cell tumors patients. Zhang et al. [53] found that the expression level of GNGT1 in NSCLC was high, and the high expression of GNGT1 was significantly associated with worse OS in patients with NSCLC.
Finally, we analyzed the clinical data of TCGA database to clarify the relationship between the expression level of 6 hub genes and the clinical manifestations of patients with LUAD. The results showed that the expression level of ADCY8 was negatively correlated with T stage, distant metastasis, and TNM stage. The expression level of ADRB2 in female was significantly higher than in male, and the expression level of ADRB2 was negatively correlated with T stage, nodal metastasis, distant metastasis, and TNM stage. The expression level of CALCA in male was significantly lower than in female, and the expression level of CALCA was positively correlated with nodal metastasis, distant metastasis, and TNM stage. The expression level of GCG was negatively correlated with age, and the expression level of GCG was positively correlated with nodal metastasis, distant metastasis, and TNM stage. The expression level of GNGT1 was positively correlated with T stage and nodal metastasis. The expression level of NPSR1 was positively correlated with T stage, nodal metastasis, and TNM stage.
In summary, it can be seen that the expression levels of ADCY8, ADRB2, CALCA, GCG, GNGT1, and NPSR1 are significantly related to the OS and PFS of LUAD, which are also important indicators for the evaluation of the prognosis of LUAD and the evaluation of further treatment. The analysis of the 6 genes indicated that they could effectively distinguish between LUAD tissues and normal tissues, which may increase the accuracy of predicting LUAD.

Data Availability
High-throughput gene expression data of LUAD and normal lung tissues were extracted from the TCGA Data Portal (https: //tcga-data.nci.nih.gov/tcga). These RNA-seq data (HTSeq-count) from Illumina HiSeq RNASeq platform consisted of 502 LUAD samples and 49 adjacent non-cancerous lung tissues, and were achieved from the publicly available Genomic Data Commons (GDC) data portal (https://portal.gdc.cancer.gov/). In addition, gene expression profiles of datasets GSE118370, GSE136043, and GSE140797 were obtained from GEO.