The Critical Thinking
Personality Spectrum Ensemble
The following was partially extracted from:
Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence
Nature Genetics; Published online 2017 May 22; 49: ?-?; Suzanne Sniekers, Sven Stringer, Kyoko Watanabe, Philip R Jansen, Jonathan R I Coleman, Eva Krapohl, Erdogan Taskesen, Anke R Hammerschlag, Aysu Okbay, Delilah Zabaneh, Najaf Amin, Gerome Breen, David Cesarini, Christopher F Chabris, William G Iacono, M Arfan Ikram, Magnus Johannesson, Philipp Koellinger, James J Lee, Patrik K E Magnusson, Matt McGue, Mike B Miller, William E R Ollier, Antony Payton, Neil Pendleton, Robert Plomin, Cornelius A Rietveld, Henning Tiemeier, Cornelia M van Duijn & Danielle Posthuma
Received 10 January; accepted 24 April; published online 22 May 2017; doi:10.1038/ng.3869
For the entire article, click the link to the article to get to the source.
Abstract
Intelligence is associated with important economic and health-related life outcomes (ref 1). Despite intelligence having substantial heritability (ref 2) (0.54) and a confirmed polygenic nature, initial genetic studies were mostly underpowered (ref 3, ref 4, ref 5). Here we report a meta-analysis for intelligence of 78,308 individuals. We identify 336 associated SNPs (METAL P < 5 × 10^-8) in 18 genomic loci, of which 15 are new. Around half of the SNPs are located inside a gene, implicating 22 genes, of which 11 are new findings. Gene-based analyses identified an additional 30 genes (MAGMA P < 2.73 × 10^-6), of which all but one had not been implicated previously. We show that the identified genes are predominantly expressed in brain tissue, and pathway analysis indicates the involvement of genes regulating cell development (MAGMA competitive P = 3.5 × 10^-6). Despite the well-known difference in twin-based heritability (ref 2) for intelligence in childhood (0.45) and adulthood (0.80), we show substantial genetic correlation (rg = 0.89, LD score regression P = 5.4 × 10^-29). These findings provide new insight into the genetic architecture of intelligence.
Letter
We combined genome-wide association study (GWAS) data for intelligence in 78,308 unrelated individuals from 13 cohorts (Online Methods). Of these individuals, full GWAS results for intelligence on n = 48,698 have been published in two different studies (ref 5, ref 6) (n = 12,441 and n = 36,257, respectively), while GWAS results for the remaining 29,610 individuals have not been published previously. Across the different cohorts, various tests to measure intelligence were used. Therefore - following previous publications on combining intelligence phenotypes across different cohorts (ref 5, ref 7) - the cohorts either calculated Spearman's g or used a primary measure of fluid intelligence (Supplementary Table 1), which is known to correlate highly with g (ref 8). Previous research has shown that many different aspects of intelligence are highly correlated to each other and that Spearman's g captures the latent general intelligence trait, irrespective of the specific tests used to construct it (ref 9, ref 10).
Supplementary Figure 1: Quantile-quantile plots for SNP-based P values (top) and gene-based P values (bottom).
All association studies were performed on individuals of European descent; standard quality control procedures included correcting for population stratification and filtering on minor allele frequency (MAF) and imputation quality (Online Methods). As 8 of the 13 cohorts consisted of children (aged <18 years; total n = 19,509) and 5 consisted of adults (n = 58,799; aged 18-78 years), we first performed meta-analysis of the children and adult-based cohorts separately using METAL software (ref 11) and subsequently calculated rg using LD score regression (ref 12). The estimated rg was 0.89 (s.e.m. = 0.08, P = 5.4 × 10^-29), indicating substantial overlap between the genetic variants influencing intelligence in childhood and adulthood, and warranting a combined meta-analysis. The genetic correlations between all individual cohorts were generally larger than 0.80 except for those involving some of the smaller-sized cohorts (n < 4,000), which, given the large standard errors of the rg values, is likely due to the relatively low sample sizes in some of the individual cohorts (Supplementary Table 2). The full meta-analysis of all 13 cohorts (maximum n = 78,308) included 12,104,294 SNPs. The quantile-quantile plot of all SNPs exhibited some inflation (λall = 1.21; Supplementary Fig. 1 and Supplementary Table 3), which is within the expected range for a polygenic trait at the current sample size and heritability (ref 13). We performed LD score regression to quantify the proportion of inflation in the mean χ^2 that was due to confounding biases. An intercept of 1.01 and mean χ^2 of 1.30 were obtained, suggesting that more than 95% of the inflation was caused by true polygenic signal. SNP-based heritability was estimated at 0.20 (s.e.m. = 0.01) in the total sample, and this was comparable in adults (0.21, s.e.m. = 0.01) and children (0.20, s.e.m. = 0.03). These estimates were obtained using LD score regression and are likely to be biased downward.
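As a rough check on the inflation figures above, the share of the excess mean χ^2 attributable to confounding is commonly approximated as (intercept - 1) / (mean χ^2 - 1). The sketch below applies that ratio to the reported intercept (1.01) and mean χ^2 (1.30). The function name is illustrative, and the ratio is assumed to be the calculation behind the "more than 95%" statement; the text does not confirm it.

```python
def confounding_share(intercept: float, mean_chi2: float) -> float:
    """Fraction of the excess mean chi-square attributed to confounding.

    Uses the ratio (intercept - 1) / (mean chi2 - 1); the remainder is
    attributed to polygenic signal.
    """
    if mean_chi2 <= 1.0:
        raise ValueError("mean chi-square must exceed 1 for the ratio to be defined")
    return (intercept - 1.0) / (mean_chi2 - 1.0)


# Values reported for the full meta-analysis.
share = confounding_share(intercept=1.01, mean_chi2=1.30)
polygenic_share = 1.0 - share
print(f"confounding: {share:.1%}, polygenic signal: {polygenic_share:.1%}")
```

With these inputs the ratio is about 0.033, leaving roughly 97% for polygenic signal, in line with the "more than 95%" statement.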
Table 1: Genomic loci and lead SNPs associated with intelligence in the meta-analysis based on n = 78,308
rsID         Annotation            Locus(a)  Ref  Alt    RefF  z      P value          Direction(b)  n       nGWS
rs2490272    FOXO3 intronic        6q21      T    C      0.63  7.44   9.96 × 10^-14    ++++-+++      78,307  28
rs9320913    Intergenic            6q16.1    A    C      0.48  6.61   3.79 × 10^-11    ++++-+++      78,307  13
rs10236197   PDE1C intronic        7p14.3    T    C      0.63  6.46   1.03 × 10^-10    +++++-++      78,286  35
rs2251499    Intergenic            13q33.2   T    C      0.26  6.31   2.74 × 10^-10    ++++++++      78,307  22
rs36093924   CYP2D7 ncRNA_intr     22q13.2   T    C      0.46  -6.31  2.87 × 10^-10    ?--?????      54,119  100
rs7646501    Intergenic            3p24.2    A    G      0.74  6.02   1.79 × 10^-9     ?++-++++      65,866  5
rs4728302    EXOC4 intronic        7q33      T    C      0.60  -5.97  2.42 × 10^-9     ---+--+-      78,307  45
rs10191758   ARHGAP15 intronic     2q22.3    A    G      0.61  -5.93  3.06 × 10^-9     ?--?????      54,119  17
rs12744310   Intergenic            1p34.2    T    C      0.22  -5.88  4.20 × 10^-9     ?-------      65,866  28
rs66495454   NEGR1 upstream        1p31.1    G    GTCCT  0.62  -5.75  9.08 × 10^-9     ?--?????      54,119  1
rs113315451  CSE1L intronic        20q13.13  A    ATTAT  0.43  5.71   1.15 × 10^-8     ?++?????      54,119  1
rs12928404   ATXN2L intronic       16p11.2   T    C      0.59  5.71   1.15 × 10^-8     ++++++++      78,307  19
rs41352752   MEF2C intronic        5q14.3    T    C      0.97  -5.68  1.35 × 10^-8     ?--?????      54,119  1
rs13010010   LINC01104 ncRNA_intr  2q11.2    T    C      0.38  5.65   1.56 × 10^-8     ++++++++      78,308  11
rs16954078   SKAP1 intronic        17q21.32  A    T      0.21  -5.55  2.84 × 10^-8     ?----+--      65,866  7
rs11138902   APBA1 intronic        9q21.11   A    G      0.54  5.49   4.12 × 10^-8     +++++-++      78,307  1
rs6746731    ZNF638 intronic       2p13.2    T    G      0.43  -5.46  4.88 × 10^-8     -----+--      78,307  1
rs6779302    Intergenic            3p24.3    T    G      0.37  -5.45  4.99 × 10^-8     ?--?????      54,119  1
SNP P values and z scores were computed in METAL by a weighted z-score method. A total of 336 SNPs reached genome-wide significance (P < 5 × 10^-8); 18 independent signals were obtained by linkage disequilibrium (LD)-based clumping, using an r^2 threshold of 0.1 and a window size of 300 kb.
Ref - effect or reference allele; Alt - non-effect or alternative allele; RefF - effect allele frequency in UK Biobank, based on individuals of European ancestry; z - z score from METAL; Direction - direction of the effect in each of the cohorts; n - sample size; nGWS - number of genome-wide significant SNPs in the locus.
a - Cytogenetic band, build hg19. b - Order: CHIC, UKB-wb, UKB-ts, ERF, GENR, HU, MCTFR, STR.
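The weighted z-score method named in the table notes can be sketched as follows. The sketch assumes the common sample-size weighting (each cohort's z score weighted by the square root of its sample size) and a two-sided normal P value. The function names and the toy cohort values are illustrative and are not taken from the study.

```python
import math
from typing import Sequence


def weighted_z(z_scores: Sequence[float], sample_sizes: Sequence[int]) -> float:
    """Combine per-cohort z scores, weighting each by sqrt(sample size)."""
    if len(z_scores) != len(sample_sizes) or not z_scores:
        raise ValueError("need one sample size per z score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Toy example (hypothetical cohorts): two equally sized cohorts, each with z = 2.
combined = weighted_z([2.0, 2.0], [10_000, 10_000])
print(round(combined, 3), two_sided_p(combined))

# The top row of Table 1 reports z = 7.44; the rounded z gives a P value
# of the same order as the tabulated one.
print(two_sided_p(7.44))
```

Two concordant cohorts of equal size combine to a z score larger than either alone (here 2 × sqrt(2)), which is why pooling cohorts raises power for consistent effects.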
Supplementary Figure 2: Regional chromatin state plots for SNPs with P < 5 × 10^-8 in four genomic loci. (a-d) Chromatin state plots are included for 4 of the 18 genome-wide significant loci. The 1p31.1 and 20q13.13 loci are not included because the lead SNPs in these regions (rs66495454 and rs113315451) are indels. In each picture, the top panel shows the lead SNP (purple) and all other SNPs reaching genome-wide significance in the region. The colors represent r^2 with the lead SNP. The bottom panel shows chromatin states for 127 tissue types (y axis) across the whole region. Different colors represent the different states, varying from "active TSS" (state 1) to "quiescent/low" (state 15). This information can be used to determine which SNPs to study in a functional follow-up.
The meta-analysis identified 18 independent genome-wide significant loci (Figs. 1 and 2a, and Table 1), including 336 top SNPs (below the genome-wide threshold of significance; Supplementary Table 4). Of the 18 identified loci, 3 have been implicated in intelligence previously: 6q16.1 (ref 14), 7p14.3 and 22q13.2 (ref 6) (Supplementary Table 5). The top SNPs implicated 22 genes, of which 11 were new. Functional annotation of the 336 genome-wide significant SNPs showed that a large proportion were intronic (162/336) (Fig. 2b). Of the 18 lead SNPs, 10 were intronic (Fig. 2b), all were in an active chromatin state (Fig. 2c and Supplementary Figs. 2-4) and 8 SNPs were expression quantitative trait loci (eQTLs; Fig. 2d and Supplementary Tables 4 and 6). Lead SNP rs12928404 (located in the intronic region of ATXN2L) had the highest probability of being a regulatory SNP on the basis of Regulome database score (ref 15) and, of the eight lead SNPs that were eQTLs, this SNP was associated with differential expression of the largest number of genes (n = 14). Focusing on brain tissue, the T allele of this SNP, which was associated with higher intelligence scores, was associated with lower expression of TUFM (Supplementary Table 6).
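The LD-based clumping described in the notes to Table 1 (r^2 threshold of 0.1, 300 kb window) can be illustrated with a greedy sketch. Everything below is an assumption made for illustration: the `Snp` record, the `clump` function, the pairwise `r2` lookup and the toy SNP identifiers are hypothetical, and the text does not state which software performed the clumping.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int  # base-pair position
    p: float  # association P value


def clump(
    snps: Iterable[Snp],
    r2: Callable[[str, str], float],
    p_threshold: float = 5e-8,
    r2_threshold: float = 0.1,
    window_bp: int = 300_000,
) -> Dict[str, List[str]]:
    """Greedy clumping: map each lead SNP to the SNPs assigned to its clump."""
    # Only genome-wide significant SNPs take part; strongest signal first.
    candidates = sorted((s for s in snps if s.p < p_threshold), key=lambda s: s.p)
    assigned = set()
    clumps: Dict[str, List[str]] = {}
    for lead in candidates:
        if lead.snp_id in assigned:
            continue  # already absorbed by a stronger lead SNP
        assigned.add(lead.snp_id)
        members = []
        for other in candidates:
            if other.snp_id in assigned:
                continue
            nearby = other.chrom == lead.chrom and abs(other.pos - lead.pos) <= window_bp
            if nearby and r2(lead.snp_id, other.snp_id) >= r2_threshold:
                members.append(other.snp_id)
                assigned.add(other.snp_id)
        clumps[lead.snp_id] = members
    return clumps


# Toy data: snpA and snpB are in LD; snpC is nearby but uncorrelated;
# snpD is not genome-wide significant.
toy_snps = [
    Snp("snpA", "6", 1_000_000, 1e-12),
    Snp("snpB", "6", 1_050_000, 1e-9),
    Snp("snpC", "6", 1_100_000, 2e-8),
    Snp("snpD", "6", 1_020_000, 1e-3),
]
toy_ld = {frozenset(("snpA", "snpB")): 0.8}


def toy_r2(a: str, b: str) -> float:
    """Look up the toy pairwise r^2; unlisted pairs are treated as uncorrelated."""
    return toy_ld.get(frozenset((a, b)), 0.0)


result = clump(toy_snps, toy_r2)
print(result)
```

In the toy data two independent signals remain (snpA, which absorbs snpB, and snpC), mirroring how 336 significant SNPs reduce to 18 lead SNPs.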
Supplementary Figure 3: Regional chromatin state plots for SNPs with P < 5 × 10^-8 in six genomic loci. (a-f) Chromatin state plots are included for 6 of the 18 genome-wide significant loci.
We calculated the variance explained (R^2) in intelligence by the GWAS results in four independent samples, using LDpred (ref 16) (Online Methods, Supplementary Fig. 5 and Supplementary Table 7). Our calculations show that the current results explain up to 4.8% of the variance in intelligence and that on average across the four samples there is a 1.9-fold increase in explained variance in comparison to the most recent GWAS on intelligence (ref 6).
Supplementary Figure 4: Regional chromatin state plots for SNPs with P < 5 × 10^-8 in six genomic loci. (a-f) Chromatin state plots are included for 6 of the 18 genome-wide significant loci.
Apart from a SNP-by-SNP GWAS, we conducted a genome-wide gene association analysis (GWGAS) as implemented in MAGMA (ref 17) (Online Methods). GWGAS relies on converging evidence from multiple genetic variants in the same gene and can yield novel genome-wide significant signals on a gene-based level that are not necessarily picked up by a standard GWAS. The GWGAS identified 47 associated genes (Fig. 3a and Supplementary Table 8). The GWGAS and GWAS identified 17 overlapping genes; thus, the total number of genes implicated either by a SNP hit or by GWGAS was 22 + 47 - 17 = 52. Twelve of the 52 genes have been associated with intelligence previously (Supplementary Table 9). Tissue expression analyses (Online Methods) of the 52 genes using the GTEx data resource showed that 14 of 44 genes for which GTEx data were available were more strongly expressed in the brain than in other tissues (Fig. 3b). Epigenetic states were calculated for 51 of the 52 implicated genes (Online Methods) and showed that 57% of genes were at least weakly transcribed in at least 50% of tissues (Fig. 3c and Supplementary Fig. 6). Pathway analysis for 6,166 Gene Ontology (GO (ref 18)) and 674 Reactome (ref 19) gene sets (obtained from MSigDB (ref 20)) resulted in one associated gene set (GO: regulation of cell development, which is defined as any process that modulates the rate, frequency or extent of the progression of the cell over time, from its formation to the mature structure) (MAGMA competitive P = 3.5 × 10^-6; corrected P = 0.03; Supplementary Tables 10 and 11). This gene set contains four genes that were genome-wide significant - BMPR2, SHANK3, DCC and ZFHX3 - and many other genes that showed weaker association (Supplementary Table 12).
Three of the genome-wide significant genes are involved in neuronal function: SHANK3 is involved in synapse formation, DCC encodes a netrin receptor involved in axon guidance and is associated with putamen volume, and ZFHX3 is known to regulate myogenic and neuronal differentiation. The fourth gene, BMPR2, has a role in embryogenesis and endochondral bone formation and has been linked to pulmonary arterial hypertension. The four GO pathways with the subsequent smallest P values are not independent from the top associated gene set and provide insight into more specific functions of the genes driving the observed gene set association. These four gene sets are regulation of nervous system development (P = 3.0 × 10^-5; 87% of genes overlapping with the regulation of cell development pathway, including the four genome-wide significant genes), negative regulation of dendrite development (P = 7.9 × 10^-5; 100% overlapping, thus a complete subset), myelin sheath (P = 8.5 × 10^-5; 14% overlapping) and neuron spine (P = 1.5 × 10^-4; 34% overlapping).
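The gene tally and the gene-set correction in the GWGAS paragraphs above are simple arithmetic. The sketch below reproduces the inclusion-exclusion count (22 + 47 - 17) and shows what a plain Bonferroni correction over the 6,166 GO and 674 Reactome gene sets would give for the top pathway. Bonferroni is an assumption here: the text reports a corrected P of 0.03 without naming the correction procedure, so the two numbers need not match exactly. The function names are illustrative.

```python
def union_size(n_a: int, n_b: int, n_overlap: int) -> int:
    """Size of the union of two sets, given their sizes and their overlap."""
    return n_a + n_b - n_overlap


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


genes_total = union_size(n_a=22, n_b=47, n_overlap=17)  # SNP-implicated, GWGAS, shared
n_gene_sets = 6_166 + 674                               # GO plus Reactome gene sets
adjusted = bonferroni(3.5e-6, n_gene_sets)

print(genes_total)         # 52
print(n_gene_sets)         # 6840
print(round(adjusted, 3))  # 0.024
```

Under this assumption the adjusted value is about 0.024, of the same order as the reported 0.03 and still below 0.05.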
Supplementary Figure 5: Predictive power (R^2) of the polygenic score based on different intelligence discovery GWAS studies in four independent hold-out samples. Comparisons of the explained variance (R^2) in cognitive ability between polygenic scores based on the current meta-analysis and previous GWAS studies. The error bars represent the standard error. Cohorts: HIQ: High IQ sample; RS: Rotterdam Study; TEDS: Twins Early Development Study; ACPRC: Age and Cognitive Performance Research Centre. Discovery GWAS: Benyamin et al. 2014: childhood IQ; Davies et al. 2016: UK Biobank cognitive test (touchscreen). The R^2 for HIQ is reported on the liability scale (assuming a population prevalence of 3 × 10^-4).
Intelligence has been associated with many socioeconomic and health-related outcomes. We used whole-genome LD score regression (ref 12) to calculate the genetic correlation with 32 traits from these domains for which GWAS summary statistics were available for download. Significant genetic correlations were observed with 14 traits. The strongest, positive genetic correlation was with educational attainment (rg = 0.70, s.e.m. = 0.02, P = 2.5 × 10^-287). Moderate, positive genetic correlations were observed with smoking cessation, intracranial volume, head circumference in infancy, autism spectrum disorder and height. Moderate negative genetic correlations were observed with Alzheimer's disease, depressive symptoms, having ever smoked, schizophrenia, neuroticism, waist-to-hip ratio, body mass index and waist circumference (Fig. 3d and Supplementary Table 13).
To examine the robustness of the 336 SNPs and 47 genes that reached genome-wide significance in the primary analyses, we sought replication. Because there are no reasonably large GWAS for intelligence available and given the high genetic correlation with educational attainment, which has been used previously as a proxy for intelligence (ref 7), we used the summary statistics from the latest GWAS for educational attainment (ref 21) for proxy-replication (Online Methods). We first deleted overlapping samples, resulting in a sample of 196,931 individuals for educational attainment. Of the 336 top SNPs for intelligence, 306 were available for look-up in educational attainment, including 16 of the independent lead SNPs. We found that the effects of 305 of the 306 available SNPs in educational attainment were sign concordant between educational attainment and intelligence, as were the effects of all 16 independent lead SNPs (exact binomial P < 10^-16; Supplementary Table 14). This approach resulted in nine proxy-replicated loci (P < 0.05/16): seven for which the lead SNP was significant (16p11.2, 1p34.2, 2q11.2, 2q22.3, 3p24.3, 6q16.1 and 7q33) and two for which another correlated top SNP in the same locus was significant (3p24.2 and 7p14.3). Of the 47 genes that were significantly associated with intelligence in the GWGAS, 15 were also significantly associated with educational attainment (P < 0.05/47; Supplementary Table 15). Given the high (0.70) but not perfect genetic correlation between educational attainment and intelligence, these results strongly support the involvement of the proxy-replicated SNPs and genes in intelligence.
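The sign-concordance result can be checked with an exact one-sided binomial tail under the null hypothesis that each SNP's direction agrees by chance with probability 0.5. The sketch below is a minimal illustration. It treats the SNPs as independent (an assumption that LD between neighbouring SNPs would violate), the function name is made up for this example, and the text does not say whether the reported bound was computed one- or two-sided.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int) -> float:
    """P(X >= successes) for X ~ Binomial(trials, 0.5), computed exactly."""
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2**trials  # exact integers until the final division


p_all_snps = binomial_upper_tail(305, 306)  # 305 of 306 look-up SNPs concordant
p_lead_snps = binomial_upper_tail(16, 16)   # all 16 available lead SNPs concordant

# Bonferroni thresholds used for the proxy-replication look-ups.
snp_threshold = 0.05 / 16
gene_threshold = 0.05 / 47

print(p_all_snps < 1e-16, p_lead_snps, snp_threshold, gene_threshold)
```

Under these assumptions the 305-of-306 result falls far below 10^-16, while 16 concordant signs out of 16 on their own give about 1.5 × 10^-5.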
The strongest emerging association with intelligence is with rs2490272 (6q21) in an intronic region of FOXO3 and neighboring SNPs in the promoter of the same gene. This gene is part of the insulin/insulin-like growth factor 1 signaling pathway and is believed to trigger apoptosis, including neuronal cell death as a result of oxidative stress (ref 22). Moreover, it has been shown to be associated with longevity (ref 23, ref 24). The gene with the strongest association in the GWGAS is CSE1L, which also has a role in apoptosis and cell proliferation (ref 25). Of all 52 genes that were implicated, 35 were reported in the GWAS catalog for a previous association with at least one of 67 distinct traits. Nine genes (ATP2A1, NEGR1, SKAP1, FOXO3, COL16A1, YIPF7, DCC, SH2B1 and TUFM) were previously implicated with body mass index (ref 26, ref 27, ref 28, ref 29), seven (CYP2D6, NAGA, NDUFA6, TCF20, SEPT3, FAM109B and MEF2C) were implicated with schizophrenia (ref 30) and four (NEGR1, SH2B1, DCC and WNT4) were implicated with obesity (ref 31, ref 32, ref 33). EXOC4 and MEF2C have been associated previously with Alzheimer's disease (Supplementary Tables 16 and 17). Many of the implicated genes are involved in neuronal function, including DCC, APBA1, PRR7, ZFHX3, HCRTR1, NEGR1, MEF2C, SHANK3 and ATXN2L (see the Supplementary Note for the GeneCards summaries).
Supplementary Text
Results of chromatin state mapping for 16 lead SNPs
The loci 1p31.1 and 20q13.13 were not included, because the lead SNPs in these regions (rs66495454 and rs113315451) are indels.
For rs2251499 the consensus state is quiescent/low and the minimum state is weak transcription (Supplementary Fig. 2a).
For rs12928404 the consensus state is strong transcription and the minimum state is transcr. at gene 5' and 3' (Supplementary Fig. 2b).
For rs16954078 the consensus state is weak repressed PolyComb and the minimum state is strong transcription (Supplementary Fig. 2c).
For rs36093924 the consensus state is weak transcription and the minimum state is strong transcription (Supplementary Fig. 2d).
For rs12744310 the consensus state is quiescent/low and the minimum state is weak transcription (Supplementary Fig. 3a).
For rs6746731 the consensus state is weak transcription and the minimum state is strong transcription (Supplementary Fig. 3b).
For rs13010010 the consensus state is quiescent/low and the minimum state is weak transcription (Supplementary Fig. 3c).
For rs10191758 the consensus state is quiescent/low and the minimum state is strong transcription (Supplementary Fig. 3d).
For rs6779302 the consensus state is quiescent/low and the minimum state is weak transcription (Supplementary Fig. 3e).
For rs7646501 the consensus state is quiescent/low and the minimum state is enhancers (Supplementary Fig. 3f).
For rs41352752 the consensus state is quiescent/low and the minimum state is transcribed (Supplementary Fig. 4a).
For rs9320913 the consensus state is quiescent/low and the minimum state is weak transcription (Supplementary Fig. 4b).
For rs2490272 the consensus state is weak transcription and the minimum state is strong transcription (Supplementary Fig. 4c).
For rs10236197 the consensus state is quiescent/low and the minimum state is weak transcription (Supplementary Fig. 4d).
For rs4728302 the consensus state is weak transcription and the minimum state is weak transcription (Supplementary Fig. 4e).
For rs11138902 the consensus state is quiescent/low and the minimum state is weak transcription (Supplementary Fig. 4f).
Supplementary Note for the GeneCards summaries
Gene Summaries for implicated genes
We included the gene summaries from GeneCards (http://www.genecards.org) for all genes that were significant in the GWGAS (ordered by P-value) or implicated by single SNP GWAS (the last five in this list):
CSE1L
Proteins that carry a nuclear localization signal (NLS) are transported into the nucleus by the importin-alpha/beta heterodimer. Importin-alpha binds the NLS, while importin-beta mediates translocation through the nuclear pore complex. After translocation, RanGTP binds importin-beta and displaces importin-alpha. Importin-alpha must then be returned to the cytoplasm, leaving the NLS protein behind. The protein encoded by this gene binds strongly to NLS-free importin-alpha, and this binding is released in the cytoplasm by the combined action of RANBP1 and RANGAP1. In addition, the encoded protein may play a role both in apoptosis and in cell proliferation. Alternatively spliced transcript variants have been found for this gene. [provided by RefSeq, Jan 2012]
EXOC4
The protein encoded by this gene is a component of the exocyst complex, a multiple protein complex essential for targeting exocytic vesicles to specific docking sites on the plasma membrane. Though best characterized in yeast, the component proteins and functions of exocyst complex have been demonstrated to be highly conserved in higher eukaryotes. At least eight components of the exocyst complex, including this protein, are found to interact with the actin cytoskeletal remodeling and vesicle transport machinery. The complex is also essential for the biogenesis of epithelial cell surface polarity. Alternate transcriptional splice variants, encoding different isoforms, have been characterized. [provided by RefSeq, Jul 2008]
CYP2D6
This gene encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This protein localizes to the endoplasmic reticulum and is known to metabolize as many as 25% of commonly prescribed drugs. Its substrates include antidepressants, antipsychotics, analgesics and antitussives, beta adrenergic blocking agents, antiarrhythmics and antiemetics. The gene is highly polymorphic in the human population; certain alleles result in the poor metabolizer phenotype, characterized by a decreased ability to metabolize the enzyme's substrates. Some individuals with the poor metabolizer phenotype have no functional protein since they carry 2 null alleles whereas in other individuals the gene is absent. This gene can vary in copy number and individuals with the ultrarapid metabolizer phenotype can have 3 or more active copies of the gene. Alternatively spliced transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Jul 2014]
WBP2NL
WBP2NL is a sperm-specific WW domain-binding protein that promotes meiotic resumption and pronuclear development during oocyte fertilization (Wu et al., 2007 [PubMed 17289678]). [supplied by OMIM, Mar 2008]
FOXO3
This gene belongs to the forkhead family of transcription factors which are characterized by a distinct forkhead domain. This gene likely functions as a trigger for apoptosis through expression of genes necessary for cell death. Translocation of this gene with the MLL gene is associated with secondary acute leukemia. Alternatively spliced transcript variants encoding the same protein have been observed. [provided by RefSeq, Jul 2008]
APBA1
The protein encoded by this gene is a member of the X11 protein family. It is a neuronal adapter protein that interacts with the Alzheimer's disease amyloid precursor protein (APP). It stabilizes APP and inhibits production of proteolytic APP fragments including the A beta peptide that is deposited in the brains of Alzheimer's disease patients. This gene product is believed to be involved in signal transduction processes. It is also regarded as a putative vesicular trafficking protein in the brain that can form a complex with the potential to couple synaptic vesicle exocytosis to neuronal cell adhesion. [provided by RefSeq, Jul 2008]
SEPT3
This gene belongs to the septin family of GTPases. Members of this family are required for cytokinesis. Expression is upregulated by retinoic acid in a human teratocarcinoma cell line. The specific function of this gene has not been determined. Alternative splicing of this gene results in two transcript variants encoding different isoforms. [provided by RefSeq, Jul 2008]
NAGA
NAGA encodes the lysosomal enzyme alpha-N-acetylgalactosaminidase, which cleaves alpha-N-acetylgalactosaminyl moieties from glycoconjugates. Mutations in NAGA have been identified as the cause of Schindler disease types I and II (type II also known as Kanzaki disease). [provided by RefSeq, Jul 2008]
STAU1
Staufen is a member of the family of double-stranded RNA (dsRNA)-binding proteins involved in the transport and/or localization of mRNAs to different subcellular compartments and/or organelles. These proteins are characterized by the presence of multiple dsRNA-binding domains which are required to bind RNAs having double-stranded secondary structures. The human homologue of staufen encoded by STAU, in addition contains a microtubule-binding domain similar to that of microtubule-associated protein 1B, and binds tubulin. The STAU gene product has been shown to be present in the cytoplasm in association with the rough endoplasmic reticulum (RER), implicating this protein in the transport of mRNA via the microtubule network to the RER, the site of translation. Five transcript variants resulting from alternative splicing of STAU gene and encoding three isoforms have been described. Three of these variants encode the same isoform, however, differ in their 5'UTR. [provided by RefSeq, Jul 2008]
NDUFA6
No Entrez Gene Summary. GeneCards Summary: NDUFA6 (NADH:Ubiquinone Oxidoreductase Subunit A6) is a Protein Coding gene. Diseases associated with NDUFA6 include Korean Hemorrhagic Fever and Bird Fancier's Lung. Among its related pathways are "Respiratory electron transport, ATP synthesis by chemiosmotic coupling, and heat production by uncoupling proteins" and Metabolism. GO annotations related to this gene include NADH dehydrogenase (ubiquinone) activity.
DCAF5
No Entrez Gene Summary. GeneCards Summary: DCAF5 (DDB1 And CUL4 Associated Factor 5) is a Protein Coding gene. Diseases associated with DCAF5 include Leiomyoma. An important paralog of this gene is DCAF6.
EFTUD1
No Entrez Gene Summary. GeneCards Summary: EFL1 (Elongation Factor Like GTPase 1) is a Protein Coding gene. Diseases associated with EFL1 include Shwachman-Diamond Syndrome. Among its related pathways are Ribosome biogenesis in eukaryotes.
DDN
No Entrez Gene Summary. GeneCards Summary: DDN (Dendrin) is a Protein Coding gene. GO annotations related to this gene include RNA polymerase II core promoter proximal region sequence-specific DNA binding and transcription factor activity, RNA polymerase II core promoter proximal region sequence-specific binding.
ZNF407
This gene encodes a zinc finger protein whose exact function is not known. It may be involved in transcriptional regulation. Several alternatively spliced transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Sep 2009]
ZNF638
The protein encoded by this gene is a nucleoplasmic protein. It binds cytidine-rich sequences in double-stranded DNA. This protein has three types of domains: MH1, MH2 (repeated three times) and MH3. It is associated with packaging, transferring, or processing transcripts. Multiple alternatively spliced transcript variants have been found for this gene, but the biological validity of some variants has not been determined. [provided by RefSeq, Jul 2008]
PDE1C
This gene encodes an enzyme that belongs to the 3'5'-cyclic nucleotide phosphodiesterase family. Members of this family catalyze hydrolysis of the cyclic nucleotides, cyclic adenosine monophosphate and cyclic guanosine monophosphate, to the corresponding nucleoside 5'-monophosphates. The enzyme encoded by this gene regulates proliferation and migration of vascular smooth muscle cells, and neointimal hyperplasia. This enzyme also plays a role in pathological vascular remodeling by regulating the stability of growth factor receptors, such as PDGF-receptor-beta. [provided by RefSeq, Jul 2016]
RPL15
Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit. Together these subunits are composed of 4 RNA species and approximately 80 structurally distinct proteins. This gene encodes a ribosomal protein that is a component of the 60S subunit. The protein belongs to the L15E family of ribosomal proteins. It is located in the cytoplasm. This gene shares sequence similarity with the yeast ribosomal protein YL10 gene. Although this gene has been referred to as RPL10, its official symbol is RPL15. This gene has been shown to be overexpressed in some esophageal tumors compared to normal matched tissues. Alternate splicing results in multiple transcript variants. As is typical for genes encoding ribosomal proteins, there are multiple processed pseudogenes of this gene dispersed through the genome. [provided by RefSeq, Nov 2011]
ATXN2L
This gene encodes an ataxin type 2 related protein of unknown function. This protein is a member of the spinocerebellar ataxia (SCAs) family, which is associated with a complex group of neurodegenerative disorders. Several alternatively spliced transcripts encoding different isoforms have been found for this gene. [provided by RefSeq, Jul 2008]
SH2B1
This gene encodes a member of the SH2-domain containing mediators family. The encoded protein mediates activation of various kinases and may function in cytokine and growth factor receptor signaling and cellular transformation. Alternatively spliced transcript variants have been described. [provided by RefSeq, Mar 2009]
NKIRAS1
No Entrez Gene Summary. GeneCards Summary: NKIRAS1 (NFKB Inhibitor Interacting Ras Like 1) is a Protein Coding gene. Among its related pathways are NF-KappaB Family Pathway and TNF-alpha/NF-kB Signaling Pathway. GO annotations related to this gene include GTP binding and GTPase activity. An important paralog of this gene is NKIRAS2.
TUFM
This gene encodes a protein which participates in protein translation in mitochondria. Mutations in this gene have been associated with combined oxidative phosphorylation deficiency resulting in lactic acidosis and fatal encephalopathy. A pseudogene has been identified on chromosome 17. [provided by RefSeq, Jul 2008]
BMPR2
This gene encodes a member of the bone morphogenetic protein (BMP) receptor family of transmembrane serine/threonine kinases. The ligands of this receptor are BMPs, which are members of the TGF-beta superfamily. BMPs are involved in endochondral bone formation and embryogenesis. These proteins transduce their signals through the formation of heteromeric complexes of two different types of serine (threonine) kinase receptors: type I receptors of about 50-55 kD and type II receptors of about 70-80 kD. Type II receptors bind ligands in the absence of type I receptors, but they require their respective type I receptors for signaling, whereas type I receptors require their respective type II receptors for ligand binding. Mutations in this gene have been associated with primary pulmonary hypertension, both familial and fenfluramine-associated, and with pulmonary venoocclusive disease. [provided by RefSeq, Jul 2008]
ATP2A1
This gene encodes one of the SERCA Ca(2+)-ATPases, which are intracellular pumps located in the sarcoplasmic or endoplasmic reticula of muscle cells. This enzyme catalyzes the hydrolysis of ATP coupled with the translocation of calcium from the cytosol to the sarcoplasmic reticulum lumen, and is involved in muscular excitation and contraction. Mutations in this gene cause some autosomal recessive forms of Brody disease, characterized by increasing impairment of muscular relaxation during exercise. Alternative splicing results in three transcript variants encoding different isoforms. [provided by RefSeq, Oct 2013]
JMJD1C
The protein encoded by this gene interacts with thyroid hormone receptors and contains a jumonji domain. It is a candidate histone demethylase and is thought to be a coactivator for key transcription factors. It plays a role in the DNA-damage response pathway by demethylating the mediator of DNA damage checkpoint 1 (MDC1) protein, and is required for the survival of acute myeloid leukemia. Mutations in this gene are associated with Rett syndrome and intellectual disability. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Dec 2015]
SHANK3
This gene is a member of the Shank gene family. Shank proteins are multidomain scaffold proteins of the postsynaptic density that connect neurotransmitter receptors, ion channels, and other membrane proteins to the actin cytoskeleton and G-protein-coupled signaling pathways. Shank proteins also play a role in synapse formation and dendritic spine maturation. Mutations in this gene are a cause of autism spectrum disorder (ASD), which is characterized by impairments in social interaction and communication, and restricted behavioral patterns and interests. Mutations in this gene also cause schizophrenia type 15, and are a major causative factor in the neurological symptoms of 22q13.3 deletion syndrome, which is also known as Phelan-McDermid syndrome. Additional isoforms have been described for this gene but they have not yet been experimentally verified. [provided by RefSeq, Mar 2012]
ARFGEF2
ADP-ribosylation factors (ARFs) play an important role in intracellular vesicular trafficking. The protein encoded by this gene is involved in the activation of ARFs by accelerating replacement of bound GDP with GTP and is involved in Golgi transport. It contains a Sec7 domain, which may be responsible for its guanine-nucleotide exchange activity and also brefeldin A inhibition. [provided by RefSeq, Jul 2008]
GRK6
This gene encodes a member of the guanine nucleotide-binding protein (G protein)-coupled receptor kinase subfamily of the Ser/Thr protein kinase family. The protein phosphorylates the activated forms of G protein-coupled receptors thus initiating their deactivation. Several transcript variants encoding different isoforms have been described for this gene. [provided by RefSeq, Jul 2008]
RNF123
The protein encoded by this gene contains a C-terminal RING finger domain, a motif present in a variety of functionally distinct proteins and known to be involved in protein-protein and protein-DNA interactions, and an N-terminal SPRY domain. This protein displays E3 ubiquitin ligase activity toward the cyclin-dependent kinase inhibitor 1B which is also known as p27 or KIP1. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Feb 2016]
RNF185
No Entrez Gene Summary. GeneCards Summary: RNF185 (Ring Finger Protein 185) is a Protein Coding gene. Among its related pathways are Protein processing in endoplasmic reticulum. GO annotations related to this gene include ligase activity. An important paralog of this gene is RNF5.
YIPF7
No Entrez Gene Summary. GeneCards Summary: YIPF7 (Yip1 Domain Family Member 7) is a Protein Coding gene. An important paralog of this gene is YIPF5.
GBF1
This gene encodes a member of the Sec7 domain family. The encoded protein is a guanine nucleotide exchange factor that regulates the recruitment of proteins to membranes by mediating GDP to GTP exchange. The encoded protein is localized to the Golgi apparatus and plays a role in vesicular trafficking by activating ADP ribosylation factor 1. The encoded protein has also been identified as an important host factor for viral replication. Multiple transcript variants have been observed for this gene. [provided by RefSeq, Dec 2010]
PEF1
This gene encodes a calcium-binding protein belonging to the penta-EF-hand protein family. The encoded protein has been shown to form a heterodimer with the programmed cell death 6 gene product and may modulate its function in Ca(2+) signaling. Alternative splicing results in multiple transcript variants and a pseudogene has been identified on chromosome 1. [provided by RefSeq, May 2010]
COL16A1
This gene encodes the alpha chain of type XVI collagen, a member of the FACIT collagen family (fibril-associated collagens with interrupted helices). Members of this collagen family are found in association with fibril-forming collagens such as type I and II, and serve to maintain the integrity of the extracellular matrix. High levels of type XVI collagen have been found in fibroblasts and keratinocytes, and in smooth muscle and amnion. [provided by RefSeq, Jul 2008]
DCC
This gene encodes a netrin 1 receptor. The transmembrane protein is a member of the immunoglobulin superfamily of cell adhesion molecules, and mediates axon guidance of neuronal growth cones towards sources of netrin 1 ligand. The cytoplasmic tail interacts with the tyrosine kinases Src and focal adhesion kinase (FAK, also known as PTK2) to mediate axon attraction. The protein partially localizes to lipid rafts, and induces apoptosis in the absence of ligand. The protein functions as a tumor suppressor, and is frequently mutated or downregulated in colorectal cancer and esophageal carcinoma. [provided by RefSeq, Oct 2009]
PRR7
No Entrez Gene Summary. GeneCards Summary: PRR7 (Proline Rich 7 (Synaptic)) is a Protein Coding gene.
CCDC101
CCDC101 is a subunit of 2 histone acetyltransferase complexes: the ADA2A (TADA2A; MIM 602276)-containing (ATAC) complex and the SPT3 (SUPT3H; MIM 602947)-TAF9 (MIM 600822)-GCN5 (KAT2A; MIM 602301)/PCAF (KAT2B; MIM 602303) acetylase (STAGA) complex. Both of these complexes contain either GCN5 or PCAF, which are paralogous acetyltransferases (ref 1). [supplied by OMIM, Apr 2010]
ARHGAP15
RHO GTPases (see ARHA; MIM 165390) regulate diverse biologic processes, and their activity is regulated by RHO GTPase-activating proteins (GAPs), such as ARHGAP15 (ref 2). [supplied by OMIM, Mar 2008]
SEPT4
This gene is a member of the septin family of nucleotide binding proteins, originally described in yeast as cell division cycle regulatory proteins. Septins are highly conserved in yeast, Drosophila, and mouse, and appear to regulate cytoskeletal organization. Disruption of septin function disturbs cytokinesis and results in large multinucleate or polyploid cells. This gene is highly expressed in brain and heart. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. One of the isoforms (known as ARTS) is distinct; it is localized to the mitochondria, and has a role in apoptosis and cancer. [provided by RefSeq, Nov 2010]
ZFHX3
This gene encodes a transcription factor with multiple homeodomains and zinc finger motifs, and regulates myogenic and neuronal differentiation. The encoded protein suppresses expression of the alpha-fetoprotein gene by binding to an AT-rich enhancer motif. The protein has also been shown to negatively regulate c-Myb, and transactivate the cell cycle inhibitor cyclin-dependent kinase inhibitor 1A (also known as p21CIP1). This gene is reported to function as a tumor suppressor in several cancers, and sequence variants of this gene are also associated with atrial fibrillation. Multiple transcript variants expressed from alternate promoters and encoding different isoforms have been found for this gene. [provided by RefSeq, Sep 2009]
EEA1
No Entrez Gene Summary. GeneCards Summary: EEA1 (Early Endosome Antigen 1) is a Protein Coding gene. Diseases associated with EEA1 include Subacute Cutaneous Lupus Erythematosus and Cat-Scratch Disease. Among its related pathways are Tuberculosis and Cytoskeletal Signaling. GO annotations related to this gene include protein homodimerization activity and 1-phosphatidylinositol binding. An important paralog of this gene is FYCO1.
WNT4
The WNT gene family consists of structurally related genes which encode secreted signaling proteins. These proteins have been implicated in oncogenesis and in several developmental processes, including regulation of cell fate and patterning during embryogenesis. This gene is a member of the WNT gene family, and is the first signaling molecule shown to influence the sex-determination cascade. It encodes a protein which shows 98% amino acid identity to the Wnt4 protein of mouse and rat. This gene and a nuclear receptor known to antagonize the testis-determining factor play a concerted role in both the control of female development and the prevention of testes formation. This gene and another two family members, WNT2 and WNT7B, may be associated with abnormal proliferation in breast tissue. Mutations in this gene can result in Rokitansky-Kuster-Hauser syndrome and in SERKAL syndrome. [provided by RefSeq, Jul 2008]
DRG1
No Entrez Gene Summary. GeneCards Summary: DRG1 (Developmentally Regulated GTP Binding Protein 1) is a Protein Coding gene. GO annotations related to this gene include identical protein binding and transcription factor binding.
IP6K1
This gene encodes a member of the inositol phosphokinase family. The encoded protein may be responsible for the conversion of inositol hexakisphosphate (InsP6) to diphosphoinositol pentakisphosphate (InsP7/PP-InsP5). It may also convert 1,3,4,5,6-pentakisphosphate (InsP5) to PP-InsP4. Alternatively spliced transcript variants have been described. [provided by RefSeq, Jun 2011]
APOBR
Apolipoprotein B48 receptor is a macrophage receptor that binds to the apolipoprotein B48 of dietary triglyceride (TG)-rich lipoproteins. This receptor may provide essential lipids, lipid-soluble vitamins and other nutrients to reticuloendothelial cells. If overwhelmed with elevated plasma triglyceride, the apolipoprotein B48 receptor may contribute to foam cell formation, endothelial dysfunction, and atherothrombogenesis. [provided by RefSeq, Jul 2008]
HCRTR1
The protein encoded by this gene is a G-protein coupled receptor involved in the regulation of feeding behavior. The encoded protein selectively binds the hypothalamic neuropeptide orexin A. A related gene (HCRTR2) encodes a G-protein coupled receptor that binds orexin A and orexin B. [provided by RefSeq, Jan 2009]
PIK3IP1
No Entrez Gene Summary. GeneCards Summary: PIK3IP1 (Phosphoinositide-3-Kinase Interacting Protein 1) is a Protein Coding gene. GO annotations related to this gene include phosphatidylinositol 3-kinase catalytic subunit binding.
TCF20
This gene encodes a transcription factor that recognizes the platelet-derived growth factor-responsive element in the matrix metalloproteinase 3 promoter. The encoded protein is thought to be a transcriptional coactivator, enhancing the activity of transcription factors such as JUN and SP1. Mutations in this gene are associated with autism spectrum disorders. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Sep 2015]
SKAP1
This gene encodes a T cell adaptor protein, a class of intracellular molecules with modular domains capable of recruiting additional proteins but that exhibit no intrinsic enzymatic activity. The encoded protein contains a unique N-terminal region followed by a PH domain and C-terminal SH3 domain. Along with the adhesion and degranulation-promoting adaptor protein, the encoded protein plays a critical role in inside-out signaling by coupling T-cell antigen receptor stimulation to the activation of integrins. [provided by RefSeq, Jul 2008]
FAM109B
No Entrez Gene Summary. GeneCards Summary: FAM109B (Family With Sequence Similarity 109 Member B) is a Protein Coding gene. GO annotations related to this gene include protein homodimerization activity. An important paralog of this gene is FAM109A.
MEF2C
This locus encodes a member of the MADS box transcription enhancer factor 2 (MEF2) family of proteins, which play a role in myogenesis. The encoded protein, MEF2 polypeptide C, has both trans-activating and DNA binding activities. This protein may play a role in maintaining the differentiated state of muscle cells. Mutations and deletions at this locus have been associated with severe mental retardation, stereotypic movements, epilepsy, and cerebral malformation. Alternatively spliced transcript variants have been described. [provided by RefSeq, Jul 2010]
NEGR1
No Entrez Gene Summary. GeneCards Summary: NEGR1 (Neuronal Growth Regulator 1) is a Protein Coding gene. Diseases associated with NEGR1 include Podoconiosis and Obesity. Among its related pathways are Cell adhesion molecules (CAMs). An important paralog of this gene is LSAMP.
ATP2A1-AS1
No Entrez Gene Summary. GeneCards Summary: ATP2A1-AS1 (ATP2A1 Antisense RNA 1) is an RNA Gene, and is affiliated with the non-coding RNA class.
Genetic correlation with Alzheimer's disease for different age groups
Since Alzheimer's variants could be affecting cognitive abilities through cognitive decline in older subjects, we calculated the genetic correlation between intelligence and Alzheimer's disease for three different age groups:
- (1) UKB group (aged 40-77.5): rg=-0.33, SE=0.10, P=1.7x10-3
- (2) Adults (aged 18-78): rg=-0.35, SE=0.11, P=1.1x10-3
- (3) Children (aged < 18): rg=-0.30, SE=0.11, P=6.2x10-3
As can be seen, the rg's are very similar across age (which we would expect given the high genetic correlation between intelligence in children and adults that we found), suggesting that the observed genetic correlation between Alzheimer's disease and intelligence based on the full sample is not influenced by one particular age group.
Independent datasets available for PRS
1. Manchester and Newcastle Longitudinal Studies of Cognitive Ageing Cohorts
The University of Manchester Age and Cognitive Performance Research Centre (ACPRC) programme was established in 1983 and this study has documented longitudinal trajectories in cognitive function in a large sample of older adults in the North of England, UK (ref 3). Recruitment took place in Newcastle and Greater Manchester between 1983 and 1992. At the outset of the study, 6063 volunteers were available (1825 men and 4238 women), with a median age of 65 years (range 44 to 93 years). Over the period 1983 to 2003, two alternating batteries of cognitive tasks applied biennially were designed to measure fluid and crystallized aspects of intelligence. These included: the Alice Heim 4 (AH4) parts 1 and 2 tests of general intelligence, Mill Hill Vocabulary A and B Tests, the Cattell and Cattell Culture Fair intelligence tests, and the Wechsler Adult Intelligence Scale Vocabulary test. Detailed task descriptions were provided previously (ref 3).
Following informed consent, venesected whole blood was collected for DNA extraction in approximately 1600 volunteers who had continued to participate in the longitudinal study in 1999 - 2004 which constitutes the Dyne-Steel DNA bank for the genetics of ageing and cognition. Ethical approval for all projects was obtained from the University of Manchester.
To represent crystallized intelligence (g c), we used the Mill Hill Vocabulary A and B Tests in the Manchester and Newcastle samples. For fluid-type intelligence (g f) in the Manchester and Newcastle samples empirical Bayes estimates for each individual were obtained from a random effects model fitted by maximum likelihood (ML) to the standardized age-regressed residuals obtained for each sex from the Alice Heim 4 test and the Cattell Culture Fair test scores. The phenotypes for g c were corrected for age and gender and the phenotypes for g f were corrected for age and derived separately for males and females. The standardized residuals were used for all subsequent analyses.
Participants had DNA extracted and were genotyped for 599,011 common single nucleotide polymorphisms (SNPs) using the Illumina 610-Quadv1 chip. Stringent quality control analyses of the genotype data were applied, after which 549,692 of the 599,011 SNPs on the Illumina 610 chip in 1,558 individuals were retained. Individuals were excluded from this study based on unresolved gender discrepancy, relatedness, call rate (≤ 0.95), and evidence of non-Caucasian descent.
SNPs were included in the analyses if they met the following conditions: call rate ≥ 0.98, minor allele frequency ≥ 0.01, and Hardy-Weinberg equilibrium test with P ≥ 10-3. Each cohort was tested for population stratification and any outliers were excluded. More details can be found in ref. 4.
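The SNP inclusion criteria listed above can be read as a simple filter. The following is a minimal sketch, assuming each SNP is summarized by its call rate, minor allele frequency and Hardy-Weinberg P value; the `SnpQc` record, the `passes_qc` helper and the four toy SNPs are hypothetical and are not taken from the study's own pipeline.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality-control summary (hypothetical record layout)."""
    snp_id: str
    call_rate: float
    maf: float
    hwe_p: float


def passes_qc(snp, min_call_rate=0.98, min_maf=0.01, min_hwe_p=1e-3):
    """Apply the three inclusion criteria quoted in the text."""
    return (
        snp.call_rate >= min_call_rate
        and snp.maf >= min_maf
        and snp.hwe_p >= min_hwe_p
    )


# Toy data, invented purely for illustration.
snps = [
    SnpQc("snp_a", 0.995, 0.200, 0.40),  # passes all three criteria
    SnpQc("snp_b", 0.970, 0.200, 0.40),  # fails call rate
    SnpQc("snp_c", 0.995, 0.005, 0.40),  # fails minor allele frequency
    SnpQc("snp_d", 0.995, 0.200, 1e-5),  # fails Hardy-Weinberg test
]

kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```

The thresholds are keyword arguments so that the same sketch could be reused with the different cut-offs reported for the other cohorts.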
2. Twins Early Development Study
The Twins Early Development Study (TEDS) is a multivariate longitudinal study that recruited over 11,000 twin pairs born in England and Wales in 1994, 1995 and 1996. Both the overall TEDS sample and the genotyped subsample have been shown to be representative of the UK population (ref 5, ref 6, ref 7). The project received approval from the Institute of Psychiatry ethics committee (05/Q0706/228) and parental consent was obtained before data collection. For the current study, we selected individuals that were not included in ref. 8, which resulted in a sample of N=1,173 available for PRS analyses. DNA was extracted from saliva and buccal cheek swab samples and hybridized to HumanOmniExpressExome-8v1.2 genotyping arrays at the Institute of Psychiatry, Psychology and Neuroscience Genomics & Biomarker Core Facility. The raw image data from the array were normalized, pre-processed, and filtered in GenomeStudio according to Illumina Exome Chip SOP v1.4 (http://confluence.brc.iop.kcl.ac.uk:8090/display/PUB/Production+Version%3A+Illumina+Exome+Chip+SOP+v1.4). In addition, prior to genotype calling, 869 multi-mapping SNPs and 353 samples with call rate < 0.95 were removed. The ZCALL program was used to augment the genotype calling for samples and SNPs that passed the initial QC. Samples were removed from subsequent analyses on the basis of call rate (< 0.99), suspected non-European ancestry, heterozygosity, array signal intensity, and relatedness. SNPs were excluded if the minor allele frequency was < 5%, if more than 1% of genotype data were missing, or if the Hardy Weinberg P-value was lower than 10-5. Non-autosomal markers and indels were removed. Association between the SNP and the platform, batch, or plate on which samples were genotyped was calculated and SNPs with an effect P-value smaller than 10-3 were excluded. After alignment to the 1000 Genomes (phase 3) reference data, 3,617 individuals and 515,536 SNPs remained.
A principal component analysis was performed on a subset of 42,859 common (MAF > 5%) autosomal HapMap3 SNPs (ref 9), after stringent pruning to remove markers in linkage disequilibrium (R2 > 0.1) and excluding high linkage disequilibrium genomic regions so as to ensure that only genome-wide effects were detected. Thirty PCs were used in the present analyses.
Individuals were tested on two verbal tests at the age of 12, the WISC-III-PI Multiple Choice Information (General Knowledge) and Vocabulary Multiple Choice subtests (ref 10), and on two nonverbal reasoning tests, the WISC-III-UK Picture Completion (ref 10) and Raven's Standard and Advanced Progressive Matrices (ref 11, ref 12), which were all administered online (ref 13, ref 14). g-scores were derived as the arithmetic mean of the four standardized test scores. The residuals after regressing the measure on sex and age at assessment were used. These were obtained using the rstandard function applied to an lm fit in R (version 3.2.2), which produces standardized residuals via normalization to unit variance using the overall error variance of the residuals.
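The first step described above, deriving a g-score as the arithmetic mean of four standardized test scores, might be sketched as follows. This is an illustration only: the toy scores are invented, the use of the sample standard deviation for standardization is an assumption (the text does not say which was used), and the subsequent regression on sex and age is not reproduced here.

```python
from statistics import fmean, stdev


def zscores(values):
    """Standardize raw scores to mean 0 and (sample) standard deviation 1."""
    mean, sd = fmean(values), stdev(values)
    return [(v - mean) / sd for v in values]


def g_scores(tests):
    """Average standardized scores across tests for each individual.

    `tests` maps a test name to a list of raw scores, with individuals
    in the same order in every list.
    """
    standardized = [zscores(scores) for scores in tests.values()]
    return [fmean(person) for person in zip(*standardized)]


# Toy raw scores for four individuals (invented for illustration).
toy_tests = {
    "information": [10, 12, 14, 16],
    "vocabulary": [20, 22, 26, 28],
    "picture_completion": [5, 7, 6, 10],
    "ravens": [30, 28, 35, 39],
}

g = g_scores(toy_tests)
print([round(x, 3) for x in g])
```

Because every test is centred before averaging, the resulting g-scores also average to zero across individuals.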
3. High IQ Sample
Individuals with extremely high intelligence were recruited from the top 1% of the Duke University Talent Identification Program (ref 15) (TIP), which recruits from the top 3% of the intelligence distribution. DNA was collected using buccal swabs. Illumina Omni Express genotypes were available for 1,236 white European Caucasian individuals following quality control. A population comparison cohort was obtained from The University of Michigan Health and Retirement Study (HRS). Details about the HRS can be found online (http://hrsonline.isr.umich.edu/). DNA was extracted from saliva. Genotypes were available from the Illumina Human Omni-2.5 Quad Beadchip, with a coverage of 2.5 million SNPs. Genotype data were obtained through dbGaP (accession: phs000428.v2.p2). After quality control and ancestry-matching to the TIP participants, genotypes were available for 8,168 white Caucasian individuals. All individuals were imputed to the Haplotype Reference Consortium reference panel (rv1.1), using PBWT (ref 16) as implemented in the Sanger Imputation Server (imputation.sanger.ac.uk). SNPs taken forward to analyses had INFO > 0.9, MAF ≥ 0.01, call rate > 99.9% and Hardy-Weinberg P <10-8. Samples had call rate > 98%, heterozygosity < 4 standard deviations from the mean, and one of each pair of related samples was removed (r > 0.025). For the analyses performed in LDpred high IQ individuals were treated as "cases" and population comparisons as controls. All analyses were controlled for gender and 10 principal components.
4. Rotterdam Study
The Rotterdam Study is a large population-based cohort study in the Netherlands among individuals aged ≥ 45 years and residing in the Ommoord area, a suburb of Rotterdam (ref 17). The current study includes all participants under 60 years of age for whom genotypic information was available, who underwent cognitive testing at the study centre from 2002 onwards, and have been approved by the medical ethics committee according to the Population Study Act Rotterdam Study, executed by the Ministry of Health, Welfare and Sports of the Netherlands. Written informed consent was obtained from all participants. Genotype data were collected on Illumina 550, Illumina 550duo and Illumina 610 quad SNP arrays. Variants were filtered on MAF < 0.01, call rate < 95% and Hardy-Weinberg P <10-6. Individuals were filtered based on genotype missingness rate > 0.05, gender mismatch and relatedness (one of each pair of individuals with IBD > 0.185). Analyses were restricted to individuals from Northern European ancestry, resulting in a sample size of 2,015.
Participants underwent detailed cognitive assessment with a neuropsychological test battery comprising the letter-digit substitution task (number of correct digits in one minute), the verbal fluency test (animal categories), the Stroop test (error-adjusted time in seconds for Stroop reading and interference tasks), and a 15-word learning test (delayed recall). To obtain a measure of global cognitive function, a compound score (g-factor) was computed based on the aforementioned tests using principal component analysis. The g-factor explained 56.0% of the variance in cognitive test scores in the population.
Additional Acknowledgements
see full article
Summary
In conclusion, we conducted a meta-analysis GWAS and GWGAS for intelligence, including 13 cohorts and 78,308 individuals. We confirmed 3 loci and 12 genes, and identified 15 new genomic loci and 40 new genes for intelligence. Pathway analysis demonstrated the involvement of genes regulating cell development. We showed genetic overlap with several neuropsychiatric and metabolic disorders. These findings provide starting points for understanding the molecular neurobiological mechanisms underlying intelligence, one of the most investigated traits in humans.
URLs.
UK Biobank - http://www.ukbiobank.ac.uk
genotyping and quality control of UK Biobank - http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=155580
CHIC summary statistics - http://ssgac.org/documents/CHIC_Summary_Benyamin2014.txt.gz
SNPTEST - https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html
MAGMA - http://ctg.cncr.nl/software/magma
MSigDB - http://software.broadinstitute.org/gsea/msigdb/collections.jsp
METAL - http://genome.sph.umich.edu/wiki/METAL_Program
LD score regression (LDSC) - https://github.com/bulik/ldsc
METHODS
Methods, including statements of data availability and any associated accession codes and references, are available in the online version of the paper.
Note: Any Supplementary Information and Source Data files are available in the online version of the paper (below).
ONLINE METHODS
Discovery sample.
The current study was based on 78,308 individuals. The origin of the samples is as follows:
- (1) UK Biobank web-based measure (UKB-wb; n = 17,862). GWAS results have not yet been published; raw genotypic data were available for the present study.
- (2) UK Biobank touchscreen measure (UKB-ts; n = 36,257, non-overlapping with UKB-wb). Results have been published before (ref 6); raw genotypic data were available for the present study.
- (3) CHIC consortium (ref 5) (n = 12,441). Results have been published before; meta-analysis summary statistics were available for the present study.
- (4) Five additional cohorts (n = 11,748). For these, 69 SNP associations with IQ have previously been published as part of a lookup effort (ref 7), but full GWAS results have not been published previously. Per-cohort full GWAS summary statistics were available for the present study.
We describe these data sets in more detail below.
UK Biobank samples (UKB-wb, UKB-ts). We used the data provided by the UK Biobank Study (ref 35) resource (see URLs), which is a major national health resource including >500,000 participants. All participants provided written informed consent; the UK Biobank received ethical approval from the National Research Ethics Service Committee North West-Haydock (reference 11/NW/0382), and all study procedures were performed in accordance with the World Medical Association Declaration of Helsinki ethical principles for medical research. The current study was conducted under UK Biobank application number 16406.
The study design of the UK Biobank has been described in detail elsewhere (ref 35, ref 36). Briefly, invitation letters were sent out in 2006-2010 to ~9.2 million individuals, including all people aged 40-69 years who were registered with the National Health Service and living up to ~25 miles from one of the 22 study assessment centers. A total of 503,325 participants were subsequently recruited into the study (ref 35). Apart from registry-based phenotypic information, extensive self-reported baseline data have been collected by questionnaire, in addition to anthropometric assessments and DNA collection. For the present study, we used imputed data obtained from UK Biobank (May 2015 release) including ~73 million genetic variants in 152,249 individuals. Details on the data are provided elsewhere (see URLs). In summary, the first ~50,000 samples were genotyped on the UK BiLEVE Axiom array, and the remaining ~100,000 samples were genotyped on the UK Biobank Axiom array. After standard quality control of the SNPs and samples, which was centrally performed by UK Biobank, the data set comprised 641,018 autosomal SNPs in 152,256 samples for phasing and imputation. Imputation was performed with a reference panel that included the UK10K haplotype panel and the 1000 Genomes Project Phase 3 reference panel.
We used two fluid intelligence phenotypes from the Biobank data set. These are based on questionnaires that were taken either in the assessment center at the initial intake ('touchscreen', field 20016) or at a later moment at home ('web-based', field 20191). The measures indicate the number of correct answers out of 13 fluid intelligence questions. The data distribution roughly approximates a normal distribution.
For the analyses in our study, we only included individuals of European descent. After removal of related individuals and those with discordant sex, who withdrew consent or had missing phenotype data, 36,257 individuals remained for analysis for the fluid intelligence touchscreen measure and 28,846 remained for the web-based version. As 10,984 individuals had taken both the touchscreen and web-based test, we only included the data from the touchscreen test for these individuals. This resulted in 54,119 individuals with a score on either the fluid intelligence web-based (UKB-wb) or touchscreen (UKB-ts) version (Supplementary Table 1). At the time of taking the test, the age of the participants ranged between 40 and 78 years. Half of the participants were between 40 and 60 years old, 44% were between 60 and 70 years old and 6% were older than 70 years. The mean age was 58.98 years with a standard deviation of 8.19.
Summary statistics from the CHIC consortium. We downloaded the publicly available combined GWAS results from the meta-analyses as reported by CHIC (ref 5) (see URLs). Details on the included cohorts and performed analyses are reported in the original publication (ref 5). Briefly, CHIC includes six cohorts totaling 12,441 individuals: the Avon Longitudinal Study of Parents and Children (ALSPAC, n = 5,517), the Lothian Birth Cohorts of 1921 and 1936 (LBC1921, n = 464; LBC1936, n = 947), the Brisbane Adolescent Twin Study subsample of the Queensland Institute of Medical Research (QIMR, n = 1,752), the Western Australian Pregnancy Cohort Study (Raine, n = 936) and the Twins Early Development Study (TEDS, n = 2,825). All individuals are children aged from 6-18 years. Within each cohort, the cognitive performance measure was adjusted for sex and age and principal components were included to adjust for population stratification. See also Supplementary Table 1.
Full GWAS data from additional cohorts. We used the same additional (non-CHIC) cohorts as described in detail in ref. 7, which included 11,748 individuals from five cohorts. In ref. 7, results were only reported for 69 SNPs, as these served as a secondary analysis for a lookup effort. In the current study, we used the full genome-wide results from these cohorts. GWAS were conducted in 2013, and summary statistics were obtained from the PIs of the five cohorts. The quality control protocol entailed excluding SNPs with MAF <0.01, imputation quality score <0.4, Hardy-Weinberg P value <1x10-6 and call rate <0.95 (ref 7). The five cohorts included the Erasmus Rucphen Family Study (ERF, n = 1,076), the Generation R Study (GenR, n = 3,701), the Harvard/Union Study (HU, n = 389), the Minnesota Center for Twin and Family Research Study (MCTFR, n = 3,367) and the Swedish Twin Registry Study (STR, n = 3,215). Detailed descriptions of these cohorts are provided in ref. 7 and summarized in Supplementary Table 1. Within each cohort, the cognitive performance measure was adjusted for sex and age and principal components were included to adjust for population stratification.
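The sample sizes reported in this section can be cross-checked with simple arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative, and counting the two UK Biobank measures as separate cohorts is an assumption about how the 13 cohorts of the meta-analysis are tallied.

```python
# UK Biobank: individuals with each fluid intelligence measure after QC.
ukb_touchscreen = 36_257
ukb_web_total = 28_846
took_both = 10_984  # analysed under the touchscreen measure only

ukb_web = ukb_web_total - took_both    # non-overlapping UKB-wb sample
ukb_total = ukb_touchscreen + ukb_web  # individuals with either measure

# CHIC consortium cohorts.
chic = {
    "ALSPAC": 5_517,
    "LBC1921": 464,
    "LBC1936": 947,
    "QIMR": 1_752,
    "Raine": 936,
    "TEDS": 2_825,
}

# Five additional (non-CHIC) cohorts.
additional = {
    "ERF": 1_076,
    "GenR": 3_701,
    "HU": 389,
    "MCTFR": 3_367,
    "STR": 3_215,
}

total_n = ukb_total + sum(chic.values()) + sum(additional.values())
n_cohorts = 2 + len(chic) + len(additional)  # UKB-wb and UKB-ts counted separately

print(ukb_web, ukb_total, sum(chic.values()), sum(additional.values()))
print(total_n, n_cohorts)
```

Under these counts the per-cohort figures add up to the subtotals and the overall discovery sample size stated in the text.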
SNP analysis in the UK Biobank sample.
Association tests were performed in SNPTEST (ref. 37) (see URLs), using linear regression. Both phenotypes were corrected for a number of covariates, including age, sex and a minimum of five genetically determined principal components, depending on how many were associated with the phenotype (5 for the web-based test and 15 for the touchscreen version, tested by linear regression). Additionally, we included the Townsend deprivation index as a covariate, which is based on postal code and measures material deprivation. The touchscreen version of the phenotype was also corrected for assessment center and genotyping array. SNPs with imputation quality <0.8 and MAF <0.001 (based on all Europeans present in the total sample) were excluded after the association analysis, resulting in 12,573,858 and 12,595,966 SNPs for the touchscreen and web-based test, respectively.
Gene analysis.
The SNP-based P values from the meta-analysis were used as input for the gene-based analysis. We used all 19,427 protein-coding genes from the NCBI 37.3 gene definitions as the basis for a genome-wide gene association analysis (GWGAS) in MAGMA (see URLs). After SNP annotation, there were 18,338 genes that were covered by at least one SNP. Gene association tests were performed taking LD between SNPs into account. We applied a stringent Bonferroni correction to account for multiple testing, setting the genome-wide threshold for significance at 2.73x10-6.
Pathway analysis.
We used MAGMA to test for association of predefined gene sets with intelligence. A total of 6,166 GO and 674 Reactome gene sets were obtained (see URLs). We computed competitive P values, which are less likely to be below the threshold of significance than self-contained P values. Competitive P values are the outcomes of the test that the combined effect of genes in a gene set is significantly larger than the combined effect of all other genes, whereas self-contained P values are informative when testing against the null hypothesis of no association. Self-contained P values are not interpreted and not reported by us. Competitive P values were corrected for multiple testing using MAGMA's built-in empirical multiple-testing correction with 10,000 permutations.
Meta-analysis.
Meta-analysis of the results of the 13 cohorts was performed in METAL (ref 11) (see URLs). We did not include SNPs that were not present in the UK Biobank sample. The analysis was based on P values, taking sample size and direction of effect into account using the sample size scheme.
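The sample size scheme can be illustrated with a short sketch: each cohort's P value is converted to a signed Z score, and the Z scores are combined with weights equal to the square root of the cohort sample size. This is a minimal illustration of that weighting, not METAL's actual code; the function name and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sample_size_meta(studies):
    """Combine per-cohort results for one SNP.

    studies: iterable of (p_value, direction, n), where direction is +1 or -1
    relative to the same reference allele and n is the cohort sample size.
    Returns (combined Z score, two-sided P value).
    """
    nd = NormalDist()
    weighted_sum = 0.0
    weight_sq_sum = 0.0
    for p_value, direction, n in studies:
        # Two-sided P value -> signed Z score
        z_i = -nd.inv_cdf(p_value / 2.0) * direction
        w_i = sqrt(n)  # weight: square root of sample size
        weighted_sum += w_i * z_i
        weight_sq_sum += w_i * w_i
    z = weighted_sum / sqrt(weight_sq_sum)
    p_meta = 2.0 * nd.cdf(-abs(z))
    return z, p_meta


# Hypothetical per-cohort inputs, for illustration only
z, p = sample_size_meta([(0.01, +1, 50_000), (0.20, +1, 20_000), (0.50, -1, 8_000)])
print(z, p)
```

Because the weights depend only on sample size, a large cohort dominates the combined Z score even when smaller cohorts point in the opposite direction.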
Genetic correlations.
Genetic correlations (rg) were calculated between intelligence and 32 other traits for which summary statistics from GWAS were publicly available, using LD score regression (see URLs). This method corrects for sample overlap by estimating the intercept of the bivariate regression. A conservative Bonferroni-corrected threshold of 1.56x10-3 was used to determine significant correlations.
Functional annotation.
We identified all SNPs that had an R2 value of 0.1 or higher with the 18 independent lead SNPs and were included in the METAL output. We used the 1000 Genomes Project Phase 3 reference panel to calculate R2. We further filtered on SNPs with P < 0.05. In addition, we only annotated SNPs with MAF >0.01.
Positional annotations for all lead SNPs and SNPs in LD with the lead SNPs were obtained by performing ANNOVAR gene-based annotation using RefSeq genes. In addition, CADD scores (ref 38) and RegulomeDB (ref 15) scores were annotated to SNPs by matching chromosome, position, reference and alternative alleles. For each SNP, eQTLs were extracted from GTEx (44 tissue types) (ref 39), the Blood eQTL browser (ref 40) and BIOS gene-level eQTLs (ref 41). The eQTLs obtained from GTEx were filtered on gene P < 0.05, and eQTLs obtained from the other two databases were filtered on FDR < 0.05. The FDR values were provided by GTEx, BIOS and the Blood eQTL browser. For GTEx eQTLs, there is one FDR value available per gene-tissue pair. As such, the FDR is identical for all eQTLs belonging to the same gene-tissue pair. For BIOS and the Blood eQTL browser, an FDR value was computed for each SNP.
To test whether the SNPs were functionally active by means of histone modifications, we obtained epigenetic data from the NIH Roadmap Epigenomics Mapping Consortium (ref 42) and ENCODE (ref 43). For every 200 bp of the genome, one of 15 core chromatin states was predicted by a hidden Markov model based on five histone marks (H3K4me3, H3K4me1, H3K27me3, H3K9me3 and H3K36me3) for 127 tissue and cell types (ref 44). We annotated chromatin states (15 states in total) to SNPs by matching chromosome and position for every tissue or cell type. For each SNP, we computed the minimum state (state 1 being the most active) and the consensus state (the majority state) across the 127 tissue and cell types.
Chromatin states were also determined for the 52 genes (47 from the gene-based test + 5 additional genes implicated by single-SNP GWAS). For each gene and tissue, the chromatin state was obtained per 200-bp interval in the gene. When multiple states were present for a single gene, we annotated the gene by consensus; that is, the state of the gene was defined as the mode of all states present in the gene.
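The two summaries used for chromatin states, the minimum state and the consensus (most frequent) state, can be sketched as follows. The input is a list of state numbers between 1 and 15, one per tissue or cell type for a SNP, or one per 200-bp interval for a gene. The function name is illustrative, and the tie-breaking rule (prefer the lower, more active state) is an assumption; the text does not say how ties were resolved.

```python
from collections import Counter


def min_and_consensus_state(states):
    """Return (minimum state, most frequent state) for a list of states 1..15."""
    if not states:
        raise ValueError("states must not be empty")
    minimum = min(states)  # state 1 is the most active
    counts = Counter(states)
    top_count = max(counts.values())
    # Assumption: ties between equally frequent states go to the lower state
    consensus = min(s for s, c in counts.items() if c == top_count)
    return minimum, consensus


# Hypothetical states across four tissues, for illustration only
print(min_and_consensus_state([1, 5, 5, 15]))  # prints (1, 5)
```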
Tissue expression of genes.
RNA sequencing data from 1,641 tissue samples with 45 unique tissue labels were derived from the GTEx consortium (ref 39). This set includes 313 brain samples over 13 unique brain regions (see Supplementary Table 18 for sample size per tissue). Of the 52 genes implicated by either the GWAS or the GWGAS, 44 were included in the GTEx data. Normalization of the data was performed as described previously (ref 45). Briefly, genes with an RPKM value smaller than 0.1 in at least 80% of the samples were removed. The remaining genes were log2 transformed (after using a pseudocount of 1), and finally a zero-mean normalization was applied.
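The three normalization steps can be sketched in a few lines. This is a minimal illustration rather than the pipeline of ref 45: the function and gene names are hypothetical, and centering each gene across samples is an assumption, since the text does not specify the axis of the zero-mean normalization.

```python
from math import log2


def normalize_expression(rpkm, min_rpkm=0.1, low_fraction=0.8):
    """rpkm: dict mapping gene name -> list of RPKM values, one per sample."""
    normalized = {}
    for gene, values in rpkm.items():
        # Step 1: drop genes with RPKM < 0.1 in at least 80% of samples
        n_low = sum(1 for v in values if v < min_rpkm)
        if n_low / len(values) >= low_fraction:
            continue
        # Step 2: log2 transform with a pseudocount of 1
        logged = [log2(v + 1.0) for v in values]
        # Step 3: zero-mean normalization (assumed: per gene, across samples)
        mean = sum(logged) / len(logged)
        normalized[gene] = [x - mean for x in logged]
    return normalized


# Hypothetical RPKM values, for illustration only
example = {
    "GENE_A": [0.0, 0.0, 0.0, 0.0, 0.05],  # low everywhere, so removed
    "GENE_B": [1.0, 3.0, 7.0, 1.0, 3.0],   # kept
}
result = normalize_expression(example)
print(sorted(result))  # prints ['GENE_B']
```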
Proxy replication in educational attainment.
For the replication analysis, we used a subset of the data from ref. 21. In particular, we excluded the Erasmus Rucphen Family Study, the Minnesota Center for Twin and Family Research Study, the Swedish Twin Registry Study, the 23andMe data and all individuals from UK Biobank, to make sure that there was no sample overlap with our IQ data set. Genetic correlation between intelligence and educational attainment in this non-overlapping subsample was rg = 0.73, s.e.m. = 0.03, P = 1.4x10-163. The replication analysis was based on the phenotype EduYears, which measures the number of years of schooling completed. A total of 306 of our 336 top SNPs (and 16 of 18 independent lead SNPs) were available in the educational attainment sample. We performed a sign concordance analysis for the 16 independent lead SNPs, using the exact binomial test. For each independent signal, we determined whether the lead SNP had a P value smaller than 0.05/16 in the educational attainment analysis or, if it did not, whether another (correlated) top SNP in the same locus had such a P value. All 47 genes implicated in the GWGAS for intelligence were available for lookup in the educational attainment sample. For each gene, we determined whether it had a P value smaller than 0.05/47 in the educational attainment analysis.
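A sign concordance test of this kind asks how likely it is to see at least the observed number of SNPs with the same direction of effect in both data sets if each sign agreed by chance with probability 0.5. The sketch below computes that exact binomial tail probability. The one-sided form and the function name are assumptions, and the count of 14 concordant SNPs is a hypothetical input, not a result from the study.

```python
from math import comb


def sign_concordance_p(n_concordant, n_total, p_null=0.5):
    """One-sided exact binomial P value: P(X >= n_concordant) under p_null."""
    if not 0 <= n_concordant <= n_total:
        raise ValueError("n_concordant must lie between 0 and n_total")
    return sum(
        comb(n_total, k) * p_null**k * (1.0 - p_null) ** (n_total - k)
        for k in range(n_concordant, n_total + 1)
    )


# Hypothetical example: 14 of 16 lead SNPs with concordant sign
p_sign = sign_concordance_p(14, 16)
print(p_sign)
```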
Polygenic risk score analysis.
We used LDpred (ref 16) to calculate the variance explained in intelligence in independent samples by a polygenic risk score based on our discovery analysis, as well as two previous GWAS for intelligence (ref 5, ref 6). LDpred adjusts GWAS summary statistics for the effects of LD by using an approximate Gibbs sampler that calculates the posterior means of effects, conditional on LD information, when calculating polygenic risk scores. We used varying priors for the fraction of SNPs with nonzero effects (priors: 0.01, 0.05, 0.1, 0.5, 1 and an infinitesimal prior). Independent data sets available for polygenic risk score analyses are described in the Supplementary Note.
Data availability.
Summary statistics have been made available for download from http://ctg.cncr.nl/software/summary_statistics. Genotype data that underlie the findings of this study are available from UK Biobank but restrictions apply to the availability of these data, which were used under license for the current study (application number 16406) and so are not publicly available. Summary statistics from the CHIC consortium are available from http://ssgac.org/documents/CHIC_Summary_Benyamin2014.txt.gz. Additional supporting data are provided in the supplementary material.
Acknowledgments
This work was funded by the Netherlands Organization for Scientific Research (NWO VICI 453-14-005). The analyses were carried out on the Genetic Cluster Computer, which is financed by the Netherlands Scientific Organization (NWO: 480-05-003), by VU University, Amsterdam, the Netherlands, and by the Dutch Brain Foundation and is hosted by the Dutch National Computing and Networking Services SurfSARA. This research has been conducted using the UK Biobank resource under application number 16406. We thank the participants and researchers who collected and contributed to the data.
Author Contributions
S. Sniekers performed the analyses. D.P. conceived the study. S. Stringer performed quality control on the UK Biobank data. K.W. and E.T. conducted in silico follow-up analyses. P.R.J., E.K. and J.R.I.C. conducted polygenic risk score analyses. P.K., C.A.R., D.Z., H.T., C.M.v.D., N.A., P.M., D.C., M.J., M.M., M.B.M., W.G.I., J.J.L., G.B., R.P., N.P., A.P., W.E.R.O., M.A.I. and C.F.C. contributed data. A.R.H. provided scripts for the pathway analyses. A.O. performed the educational attainment meta-analysis. S. Sniekers and D.P. wrote the manuscript. All authors discussed the results and commented on the manuscript.
Competing Financial Interests
The authors declare no competing financial interests.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
→ This Summary Article extract was last updated 10 Jun 2017 12:30 PDT ←