Copy number variations independently induce autism spectrum disorder

Examining copy number variation (CNV) is critical to understanding the etiology of CNV-related autism spectrum disorders (ASD). DNA samples were obtained from 64 ASD probands and genotyped on an Affymetrix CytoScan HD platform. qPCR or FISH was used to validate some novel recurrent CNVs. We further compared the genes overlapping these CNVs with the clinical phenotypes recorded for those genes in the Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources (DECIPHER). Using large, readily available databases of previously reported clinically relevant CNVs from human populations, the genes were evaluated by Enrichment Analysis and GO Slim Classification. Using the PolySearch2 software, we identified the interactions between significant genes and known ASD genes. A total of 29 CNVs, overlapping 520 genes, including 315 OMIM genes, were identified. Additionally, a member of the myocyte enhancer factor 2 family (MEF2C), overlapped by CNVs in two cases, was identified. Enrichment analysis showed that the 520 genes are most likely to be related to membrane components with protein-binding functions involved in metabolic processes. In the interaction network of those genes, the known ASD genes are mostly at the core position and the significant genes found in our samples are closely related to the known ASD genes. CNVs are likely to be an independent factor inducing autism. With the strategy of our study, ASD candidate genes can be found from CNV data and aspects of the pathogenesis of this disorder can be reviewed. These CNVs were associated with ASD and may contribute to ASD by affecting ASD-related genes.


Introduction
ASDs (OMIM 209850) represent a group of neurodevelopmental disorders present in 1% of the population, characterized by impairments in communication and reciprocal social interactions [1]. Approximately 10% of the ASD population possesses large chromosomal rearrangements [1,2]. In recent years, chromosomal microarray analyses (CMA) have offered high diagnostic ability for the detection of genomic CNVs in clinical diagnostics. While whole-exome and whole-genome technologies identify interesting single nucleotide variants and show promise in detecting in/dels and copy number changes in ASDs, the genetics of ASDs are heterogeneous and not fully understood [2,3]. CNVs alter the genome structure, leading to Mendelian diseases and complex traits, including ASDs. CNVs contribute to genomic disorders not only via de novo occurrence and X-linked and recessive inheritance but also through mosaicism, imprinting, digenic inheritance, and gene copy number dose-dependent effects [4]. Thus, it is important to relate the genes affected by CNVs to clinical and functional phenotypic features.
Several CNV studies in autism have identified associated genes and loci in European individuals and Han Chinese population controls [5][6][7][8]. Genetic studies on ASD in Han Chinese individuals have primarily been presented as case reports or as association studies of particular common single nucleotide polymorphisms in ASD populations of European ancestry [9]. In the present study, by contrast, we identified autism-associated genes from CNVs in 64 ASD samples and from databases, and applied enrichment analysis to the overlapping genes to extend the results obtained in previous studies [5][6][7][8].

Sample selection
A total of 64 ASD-affected individuals and their families were referred to the Clinical Genetics Service for genetic testing. Proband diagnosis and study inclusion criteria were as previously described [9]. The Autism Behavior Checklist (ABC) and Childhood Autism Rating Scale (CARS) were used for the diagnosis. We assessed ASD probands with standardized measures of intelligence, language, and adaptive function and collected information on developmental, medical, and physical measures and family history (Supplementary material). All data were collected with the informed consent of the patients. Ethical approval for the present study was obtained from the hospital.

Consent for publication
The consent to publish has been obtained from the guardians of all the involved children.

Ethical approval
Ethical approval for the present study was obtained from the Ethics Committee of the Third Affiliated Hospital of Guangzhou Medical University.
All observed copy number changes were compared with the CNVs annotated in the Database of Genomic Variants (DGV; http://projects.tcag.ca/variation/) and the UCSC genome browser (http://genome.ucsc.edu/). The gene content of CNVs of interest was determined using the UCSC Browser based on GRCh37. CNVs located in segmental duplications and repetitive regions were removed, as were CNVs overlapping with the in-house benign CNV database. Each CNV was further evaluated with respect to the affected genes likely to be associated with the ASD phenotypic features observed in the patients.
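The removal of CNVs that fall in excluded regions can be illustrated with a simple interval-overlap filter. This is a minimal sketch, not the pipeline used in the study: the overlap criterion is not stated in the text, so the code assumes that sharing any base with an excluded region is enough for removal. The names `Region`, `overlaps`, and `filter_cnvs` and all coordinates are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # inclusive coordinate (e.g. on GRCh37)
    end: int    # inclusive coordinate


def overlaps(a: Region, b: Region) -> bool:
    """Return True if the two regions share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def filter_cnvs(cnvs, excluded):
    """Keep only the CNVs that overlap none of the excluded regions.

    Assumption: any overlap triggers removal; the study may have used
    a stricter rule (for example a minimum overlap fraction).
    """
    return [c for c in cnvs if not any(overlaps(c, e) for e in excluded)]


# Hypothetical coordinates for illustration only (not study data).
cnvs = [Region("chr1", 1_000, 5_000), Region("chr2", 10_000, 20_000)]
excluded = [Region("chr1", 4_000, 6_000)]  # e.g. a segmental duplication
kept = filter_cnvs(cnvs, excluded)
print(kept)
```

For large region sets, a sorted-interval or interval-tree approach would scale better than this pairwise comparison.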

CNV verification and representation
Two independent methods were used to verify the accuracy of our CNV algorithm. First, FISH was used once novel recurrent CNVs were detected; some of these results have already been published [10]. Second, 20 CNV regions of different sizes were randomly selected and tested by qPCR. Each PCR assay was repeated three times, with corresponding control groups. Relative to the control region, a detection region was called a copy number gain (gene duplication) only if its copy number was greater than 1.4 times that of the control, and a copy number loss (gene deletion) only if it was less than 0.7 times that of the control. In addition, the S.D. of the fold change had to be less than 1 to assure the reliability of the experiment.
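The qPCR calling rule can be sketched as a small function. The thresholds (1.4, 0.7, S.D. below 1) come from the text; the rest is an assumption made for illustration: the function name `call_copy_number` is hypothetical, the replicate ratios are averaged, and the sample standard deviation is used, since the text does not say how the replicates were combined.

```python
from statistics import stdev


def call_copy_number(ratios, gain=1.4, loss=0.7, max_sd=1.0):
    """Classify a region from replicate copy-number ratios (test / control).

    Assumptions: replicates are summarized by their mean, and the sample
    standard deviation is the reliability measure.
    """
    if len(ratios) < 2:
        raise ValueError("need at least two replicates to compute an S.D.")
    if stdev(ratios) >= max_sd:
        return "unreliable"  # replicates disagree too much to make a call
    mean_ratio = sum(ratios) / len(ratios)
    if mean_ratio > gain:
        return "gain"  # copy number gain (duplication)
    if mean_ratio < loss:
        return "loss"  # copy number loss (deletion)
    return "normal"


# Hypothetical triplicate measurements, for illustration only.
print(call_copy_number([1.50, 1.60, 1.55]))
print(call_copy_number([0.50, 0.55, 0.60]))
print(call_copy_number([1.00, 1.05, 0.95]))
```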

Database mining
In addition, the intersecting CNV regions were extracted as relatively reliable CNVs associated with autism. These CNVs were further analyzed for the functional annotation of genes and transcripts and for drug and disease relationships, and the relationship between autism and this information was then examined. We focused on the affected genes, including coding and non-coding genes. The genes identified in the database were defined as having suspected gene

Functional analysis
Gene functional analysis was performed using WebGestalt (http://bioinfo.vanderbilt.edu/webgestalt/) [11]. We also analyzed the relationships between the enriched GO terms and their associated genes.
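As background for the enrichment step, over-representation analysis is commonly based on a hypergeometric test. The sketch below shows that calculation in plain Python. It is an illustration under assumptions, not the WebGestalt implementation: the text does not state which statistic or multiple-testing correction was applied, and the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_category, n_selected, n_hits):
    """Upper-tail hypergeometric p-value P(X >= n_hits).

    n_background: genes in the reference set
    n_category:   reference genes annotated to the GO category
    n_selected:   genes in the list of interest
    n_hits:       genes of interest annotated to the category
    """
    upper = min(n_category, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_selected - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not study data).
p = hypergeom_enrichment_p(n_background=20000, n_category=400,
                           n_selected=487, n_hits=25)
print(f"p = {p:.3g}")
```

In practice, one such p-value is computed per GO category and then adjusted for multiple testing.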

Comparison with other similar studies
To better characterize the CNVs found in our study, we also compared them with public databases. We extracted the CNV regions that contain ASD-related genes from the clinical variation archived in the dbVar database (https://www.ncbi.nlm.nih.gov/dbvar) of NCBI and downloaded the ASD-related CNV regions reported in two autism databases, the SFARI gene database (https://gene.sfari.org/autdb) and the National Database for Autism Research (NDAR) (https://ndar.nih.gov/). To further investigate the relation between those genes and ASD, we first collected the genes reported in the literature as associated with ASD by using the PolySearch2 software (https://ploysearch.cs.ualberta.ca/) and then identified the interactions between the significant genes and the known genes.

Results
Amongst the cohort of ASD patients, the SNP array identified 29 (45%, 29/64) CNVs: 9.3% of probands (6/64) carried a recurrent microdeletion/duplication syndrome (Table 1), and 19 CNVs were non-recurrent (Table 2). Additionally, MEF2C, overlapped by CNVs in two cases, was also identified (Table 3). These CNVs overlapped with 520 genes, including 315 OMIM genes, present on chromosomes 2, 4, 5, 7, 8, 12, 14, 15, 16, and 22. We analyzed these 520 genes using WebGestalt (total number of user IDs: 520). A total of 496 user IDs were unambiguously mapped to 487 unique Entrez Gene IDs, and 24 user IDs were mapped to multiple Entrez Gene IDs or could not be mapped to any Entrez Gene ID. Thus, the Enrichment Analysis and GO Slim Classification were based on the 487 unique Entrez Gene IDs. The results of the molecular functional classification, showing each biological process and cellular component category, are presented in Table 4.
Comparing the CNV regions with the CNVs in the above databases (dbVar database, SFARI gene database, NDAR database), we found 472 (84.1% of total) CNV regions overlapping with those in the databases (Figure 1). A further 89 CNV regions did not overlap with the databases. In addition, we compared the CNVs in our ASD samples with CNVs in normal individuals and identified the CNVs in our ASD samples that differed significantly from those of normal individuals. We then compared these with the CNVs in the databases. In this comparison, 72 CNVs (87.8% of the total) overlapped with the databases and 10 CNVs did not. As these genes show significantly different CNVs between ASD patients and normal controls, they are likely to be related to ASD. In the interaction network of those genes, the known ASD genes are mostly at the core position and the significant genes found in our samples are closely related to the known genes. Amongst the significant genes, MAPK11, TSC2, and F10 are also at the core position (Figure 2).
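One simple way to make "core position" in an interaction network concrete is to rank genes by their number of distinct interaction partners (degree). The sketch below illustrates that idea. It is an assumption made for illustration: the text does not define how core position was assessed, and the function name and placeholder gene labels are hypothetical rather than study results.

```python
from collections import defaultdict


def degree_ranking(edges):
    """Rank genes by number of distinct interaction partners.

    Returns (gene, degree) pairs sorted by descending degree, with ties
    broken alphabetically. Degree is assumed here as a proxy for how
    central ("core") a gene is in the network.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    ranked = sorted(neighbours.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(partners)) for gene, partners in ranked]


# Placeholder interactions, for illustration only (not study data).
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
]
ranking = degree_ranking(edges)
print(ranking)
```

Other centrality measures, such as betweenness, could rank genes differently; degree is used here only because it is the simplest to state.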

Discussion
In the present study, we used a conceptually different approach: we evaluated CNV hotspots across 64 ASD samples and analyzed the human genome using large, readily available databases of previously reported clinically relevant CNVs from human populations, thereby enhancing the current understanding of the etiology of CNVs in ASDs. Several   [7].
In the present study, one autism-related gene affected by CNV deletion was identified: MEF2C (Table 3). In particular, MEF2C exhibited a high haploinsufficiency (HI) score (e.g., 0-10%), indicating that this gene has a higher likelihood of exhibiting HI, with the most obvious pathogenic mechanism being heterozygous loss-of-function mutations (such as large, rare deletions), where a single functional copy of a gene is insufficient to maintain normal function [12]. The transcription factors of the myocyte enhancer factor 2 family (MEF2 A-D) are known to be highly expressed in the brain and to play a key role in neuronal survival/apoptosis, differentiation, and synaptic plasticity [13]. A previous study also showed that MEF2C limits excessive synapse formation during activity-dependent refinement of synaptic connectivity and thus facilitates hippocampal-dependent learning and memory [14]. Moreover, increasing evidence has demonstrated the relevance of synapse dysfunction as a major determinant in many neurological diseases, including autism [15]. Thus, the results of the present study provide evidence that the MEF2C deletion might be a factor in the induction of autism.
Recent studies have identified more than 103 genes and 44 genomic loci with mutations among individuals with some form of ASD [16]. The CNV-contained genes found in our study, MAPK11, TSC2, and F10, are known ASD-associated genes located at the core position of the interaction network, indicating that those CNVs were associated with ASD and may contribute to ASD by affecting ASD-related genes. In the present study, enrichment analysis showed that the 520 genes are most likely to be associated with membrane components with protein-binding functions involved in metabolic processes, reflecting the tardive dyskinesia in autism and providing the biological basis of autism susceptibility. According to a previous study, several metabolic defects have been associated with autistic symptoms at a rate higher than that found in the general population, and inborn errors of metabolism (IEM) can probably account for less than 5% of individuals [17]. In addition, inherited metabolic disorders (IMD) responsible for ASDs are usually identified via clinical manifestations such as microcephaly, dysmorphic features, convulsions, and hepatosplenomegaly, and patients with no additional clinical symptoms suggestive of an IMD may be diagnosed as having an idiopathic ASD [18,19]. Indeed, genetic factors leading to the abnormal expression of nerve growth-associated proteins, enzymes, receptors, and neurotransmitters result in the abnormal proliferation and differentiation of neurones, including excessive pruning, abnormal synaptic connections, and abnormal neural circuits [20][21][22][23]. It is therefore reasonable to consider genes associated with metabolic disorders as candidate ASD genes as well.
In addition, genome variations are one of the genetic etiologies of human disease. In the field of autism, point mutations are the top priority in research on pathological genetic characteristics [24][25][26]. Several human syndromes derived from a single gene mutation increase the risk for ASD, such as fragile X syndrome