Introduction

Endometrial cancer accounts for ~7% of new cancer cases in women1 and is the most common invasive gynecological cancer in developed countries (http://gco.iarc.fr/today/home). Risk of endometrial cancer is approximately double for women who have a first-degree relative with endometrial cancer2,3. Rare high-risk pathogenic variants in mismatch-repair genes, PTEN, and DNA polymerase genes4 explain a small proportion of endometrial cancers, and the eight previously published common endometrial cancer-associated single-nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS)5,6,7,8 together explain <5% of the familial relative risk (FRR).

Here, we conduct a meta-GWAS including 12,906 endometrial cancer cases and 108,979 country-matched controls of European ancestry from 17 studies identified via the Endometrial Cancer Association Consortium (ECAC), the Epidemiology of Endometrial Cancer Consortium (E2C2) and the UK Biobank and report a further nine genome-wide significant endometrial cancer genetic risk regions. One of these risk regions on 12q24.12 was previously identified by meta-GWAS of endometrial and colorectal cancer9. eQTL and gene network analyses reveal candidate causal genes and pathways relevant for endometrial carcinogenesis.

Results

GWAS meta-analysis

Details of genotyping for each study are found in Supplementary Data 1 and individual studies are described in the Supplementary Information. Following standard quality control (QC) for each dataset (Supplementary Methods), genotypes were imputed using the 1000 Genomes Project v3 reference panel (combined with the UK10K reference panel for the WHI and UK Biobank studies). SNP-disease associations in each study were tested using logistic regression, adjusting for principal components, and risk estimates were combined using inverse-variance weighted fixed-effects meta-analysis. We found little evidence of genomic inflation in any dataset (λ1000 = 0.996–1.128) or overall (λ1000 = 1.004) (Supplementary Fig. 1). Using linkage disequilibrium (LD) score regression, we estimate that 93% of the observed test statistic inflation is due to polygenic signal, as opposed to population stratification.

Seven of the eight published genome-wide significant endometrial cancer loci were confirmed with increased significance (Table 1, Fig. 1a), although the effect sizes for some loci were slightly attenuated compared with our previous analysis (comprising 7737 cases and 37,144 controls7, all also included in the current analysis). For example, the most significant SNP in this meta-analysis, rs11263761 intronic in HNF1B, had an odds ratio (OR) = 1.15 (1.12–1.19; P = 3.2 × 10−20), compared with OR = 1.20 (1.15–1.25; P = 2.8 × 10−19) in our previous analysis7. The previously reported associations with intronic AKT1 SNPs (rs2498796 OR = 1.17 (1.07–1.17); P = 3.6 × 10−8)6,10 did not reach genome-wide significance (rs2498796 OR = 1.07 (1.03–1.11) P = 6.3 × 10−5, Bayes false discovery probability (BFDP) 98%) in this meta-analysis, although the risk estimate direction is consistent with our original finding.

Table 1 Meta-analysis results for previously identified genome-wide significant endometrial cancer risk loci
Fig. 1

Manhattan plot of the results of the endometrial cancer meta-analysis of 12,906 cases and 108,979 controls. Genetic variants are plotted according to chromosome and position (x axis) and statistical significance (y axis). The red line marks the 5 × 10−8 GWAS significance threshold. a Endometrial cancer (all histologies). b Endometrial cancer (all histologies) excluding variants within 500 kb of previously published endometrial cancer variants. c Endometrioid histology endometrial cancer excluding variants within 500 kb of previously published endometrial cancer variants. d Non-endometrioid histology endometrial cancer

Excluding the 500 kb on either side of the risk loci previously reported at genome-wide significance for endometrial cancer alone, we found 125 SNPs with P < 5 × 10−8. Using approximate conditional association testing with GCTA software11, these were resolved into nine independent risk loci: eight newly reported regions, plus the 12q24.12 locus previously identified by a joint endometrial-colorectal cancer analysis9 (Table 2, Fig. 1b, Fig. 2a–i). The BFDP was ≤4% for all nine novel loci. The analysis was repeated with the restricted set of 8758 cases with endometrioid cancer, the most common histology (Fig. 1c); this identified one additional variant at 7p14.3 reaching genome-wide significance (rs9639594; Supplementary Data 2). However, given the sparse LD at this region and the fact that this is a single, imputed variant, further investigation of this region is required to confirm its association with endometrial cancer risk. No SNP reached genome-wide significance in an analysis restricted to the 1230 non-endometrioid cases (Fig. 1d) or in separate analyses of carcinosarcomas, serous, clear cell or mucinous carcinomas, for which statistical power is very limited (Supplementary Data 2, Supplementary Fig. 2).

Table 2 Meta-analysis results for newly identified genome-wide significant endometrial cancer risk loci
Fig. 2

Regional association plots for the nine novel endometrial cancer loci. –log10(p) from the fixed-effects meta-analysis is on the left y axis, recombination rate (cM/Mb) is on the right y axis (plotted as blue lines). The color of the circles shows the level of linkage disequilibrium between each variant and the most significantly associated variant (purple diamond) (r2 from the 1000 Genomes 2014 EUR reference panel—see key). Genes in the region are shown beneath each plot. a 1p34.3, b 2p16.1, c 9p21.3, d 11p13, e 12p12.1, f 12q24.11, g 12q24.21, h 17q11.2, i 17q21.32

For these nine newly reported endometrial cancer loci, a statistically significant difference in risk estimates by histological subgroup was observed only for the 2p16.1 locus; the risk was higher for non-endometrioid than for endometrioid cancer (rs148261157 OR = 1.64 (1.32–2.04) and OR = 1.25 (1.14–1.38), respectively, case-only Pf = 0.003, Table 2). There was no evidence of secondary signals at any of these nine loci after conditioning on the most significant variant. There was no significant between-study heterogeneity (minimum Cochran Q-test Phet = 0.04, maximum I2 = 41%, Supplementary Fig. 3), and random-effects meta-analyses produced very similar results (Supplementary Data 2). Twenty-five additional independent loci showed moderately significant (P < 1 × 10−6) associations, nine with endometrial cancer overall, nine specifically with endometrioid histology, and seven with non-endometrioid histology (Supplementary Data 2).

Overlap with published GWAS associations

Using a 100:1 likelihood ratio, “credible causal risk” variants (ccrSNPs) were compiled for each of the nine new endometrial cancer risk loci (Supplementary Data 3). These included 239 variants located in non-coding regions, 2 missense variants (rs2278868 SKAP1 Gly161Ser and rs3184504 SH2B3 Trp262Arg), and 1 synonymous variant (rs1129506 EVI2A Ser23Ser). Comparing to the NHGRI-EBI catalog of published GWAS, 37 SNPs previously associated with a cancer, hormonal trait, or anthropometric trait fall within 500 kb of any one of the novel endometrial cancer SNPs. However, the only overlap from the set of ccrSNPs with other traits was the colorectal and endometrial cancer susceptibility SNP rs3184504 in SH2B3 (Supplementary Data 4).

eQTL analyses

LD score regression analyses using eQTL results from GTEx12 showed that endometrial cancer heritability exhibited the strongest evidence for enrichment for variants associated with genes specifically expressed in vaginal and uterine tissue, in line with prior assumptions, although none of the tissue-specific enrichments were significant (weighted regression with jackknife standard errors) after Bonferroni correction, adjusting for the number of tissues tested (Supplementary Fig. 4). eQTL analyses were performed using data from a variety of tissue sources (Supplementary Methods), including endometrial tumor and adjacent normal endometrium tissue from The Cancer Genome Atlas (TCGA)13, normal cycling endometrium14 and, in view of the GTEx enrichment results, vaginal and uterine tissue. Additionally, we assessed eQTLs from whole blood15, which provided substantially increased power over solid tissue analyses due to increased sample size. eQTLs were detected at five of the nine novel loci (Supplementary Data 3, Supplementary Data 5, Supplementary Figs. 5–13, Table 2).

Gene network analysis

Network analysis was performed using candidate causal genes identified in this study, in addition to candidate causal genes identified in previous studies6,7,8 (Supplementary Data 6). One major network was identified, containing 18 of the 25 candidate causal genes (Supplementary Fig. 14). Network hubs included CCND1, CTNNB1, and P53, which are encoded by genes that are somatically mutated in endometrial cancer13. Analysis of the network revealed significant enrichment (Benjamini–Hochberg adjusted P < 0.05, hypergeometric test) in relevant pathways such as endometrial cancer signaling, adipogenesis, Wnt/β-catenin signaling, estrogen-mediated S-phase entry, P53 signaling, and PI3K/AKT signaling (Supplementary Data 7).

Functional annotation of ccrSNPs

Next, ccrSNPs were mapped to epigenomic features from endometrial cancer cell lines (Supplementary Data 3, Supplementary Figs. 5–13). Chromatin immunoprecipitation (ChIP-seq) was used to map histone modifications indicative of promoters or enhancers (H3K4Me1, H3K4Me3, and H3K27Ac) in two endometrial cancer cell lines (Ishikawa and JHUEM-14). Mapping of DNaseI hypersensitivity sites (indicative of open chromatin) and ChIP-seq data for transcription factor binding sites from Ishikawa cells were accessed from ENCODE16. We also included mapping of H3K27Ac histone modifications for uterus and vagina from ENCODE. Overall, 73% of ccrSNPs overlapped at least one epigenomic feature, including at least one ccrSNP per novel risk region. This overlap was significantly greater than the overlap observed for these epigenomic features with ccrSNPs related to, for example, endometriosis17 (51%; Fisher’s exact P = 8.7 × 10−8) or schizophrenia18 (40%; Fisher’s exact P < 2.2 × 10−16). These findings indicate the relevance of the selected cell and tissue types for informing endometrial cancer biology and a role for the assessed epigenomic features in regulatory processes related to the ccrSNPs. Overlaps between ccrSNPs and epigenomic features increased significantly after stimulation with estrogen (50% versus 38% for unstimulated features; Fisher’s exact P = 5.6 × 10−3), emphasizing the importance of estrogen in endometrial cancer etiology.

Mendelian randomization analyses

This expanded meta-analysis allowed us to strengthen our previous Mendelian randomization findings19,20 that higher body mass index (BMI) (P = 1.7 × 10−11, two-sample inverse-variance weighted Mendelian randomization (MR) test), but not waist:hip ratio (P = 0.71), is causal for endometrial cancer (Table 3) and that the protective effect of later menarche on endometrial cancer risk (OR = 0.82, 95% CI 0.77–0.87 per year of delayed menarche, P = 2.2 × 10−9) is partially mediated by the known relationship between lower BMI and later menarche, with a more modest protective effect of later menarche after adjusting for genetically predicted BMI (OR = 0.88, 95% CI 0.82–0.94, P = 3.8 × 10−4). The association between genetically predicted age at natural menopause and endometrial cancer did not reach statistical significance (OR = 1.03, 95% CI 1.00–1.06, P = 0.060). In contrast to the reported effects for breast and prostate cancer21,22, we found no evidence that genetically predicted adult height is associated with endometrial cancer (P = 0.90).

Table 3 Effects of genetically predicted anthropometric and reproductive traits on risk of endometrial cancer

Genetic correlation analyses

Cross-trait LD score regression of 224 non-cancer traits available via the LD Hub interface23 identified significant genetic correlations between endometrial cancer and 14 traits. All of these are either a measure of obesity or are strongly and significantly (correlation-corrected jackknife P < 10−12) genetically correlated with BMI (i.e., age of menarche, type 2 diabetes, and years of schooling) (Supplementary Data 8), in line with the established relationship between obesity and endometrial cancer risk.

Discussion

In the largest GWAS meta-analysis assessing endometrial cancer risk, we discovered nine new genetic risk regions. We also confirmed the association of genetic variants with endometrial cancer risk at seven of the eight previously published genetic risk regions for this disease5,6,7,8. Using this larger GWAS-meta dataset, we were also able to confirm the previously published Mendelian randomization studies finding that higher BMI is causal for endometrial cancer risk20, and the protective effect of later age of menarche on endometrial cancer risk19. Genetic correlation analyses also indicated a relationship between endometrial cancer and obesity-related traits.

Candidate causal genes identified through eQTLs included CDCA8 (1p34.3), a putative ovarian cancer oncogene24, which encodes an essential regulator of mitosis and cell division25; RCN1 (11p13), encoding a calcium-binding protein that binds oncoproteins such as JAK226 and MYC27; WT1-AS (11p13), a long non-coding RNA that regulates the WT1 oncogene28,29; SH2B3 (12q24.11) encoding a negative regulator of the oncogenic KIT and JAK2 signal transduction proteins30; and tumor suppressor gene NF1 (17q11.2) encoding a negative regulator of RAS-mediated signal transduction31, which acquires putative driver mutations in TCGA endometrial tumors (http://www.cbioportal.org/study?id=ucec_tcga). Notably, the highly significant eQTL associations between ccrSNPs and expression of SH2B3 (linear regression P ≥ 5.62 × 10−20) and NF1 (P ≥ 1.32 × 10−56) in blood revealed risk alleles to be associated with decreased gene expression for both loci, consistent with the role of these genes in tumor development.

Intersections of ccrSNPs with epigenomic marks mapped in endometrial cancer cell lines, uterine tissue, and vaginal tissue found more endometrial cancer ccrSNPs overlapped with these features than ccrSNPs for endometriosis17 or schizophrenia18. These findings highlight the relevance of these tissues for functional studies of endometrial cancer biology. Given the established role of estrogen in endometrial carcinogenesis32, it is perhaps not surprising that endometrial cancer ccrSNPs exhibited greater overlap with epigenomic features present after estrogen stimulation. However, this finding provides evidence that functional studies of endometrial cancer should be performed under these conditions.

Using LD score regression, we estimated that ~28% of the approximately twofold FRR of endometrial cancer could be explained by variants that can be reliably imputed from OncoArray genotypes. The common endometrial cancer variants identified to date together explain up to 6.8% of the FRR, including 2.7% contributed by the nine additional variants reported here; this may be an overestimate, given that the ORs for the new loci likely include some upwards bias (the so-called winner’s curse). In summary, we have doubled the number of endometrial cancer risk loci, explaining around one quarter (6.9%/28%) of the portion of the FRR attributable to common, readily-imputable SNPs. Furthermore, eQTL analyses have identified candidate causal genes and pathways related to tumor development for follow-up studies that will provide further insight into endometrial cancer biology.

Methods

Study samples

Analyses were based on 13 studies of endometrial cancer, of which four studies contributed case samples to more than one genotyping project. Data were also included from the E2C2 consortium of 45 separate studies. All participants were of European ancestry. Data from the E2C2 genome-wide association studies (GWAS) and from the ANECS, SEARCH, NSECG GWASs and the iCOGS project have been previously published, and are described in de Vivo et al.33 and Cheng et al.6, respectively.

The OncoArray study

The “OncoArray” genotyping chip34 contains 533,631 variants, around half of which were selected to provide a “GWAS backbone,” with the remaining variants selected on the basis of prior evidence of association with cancer or a cancer-related trait. The OncoArray chip was used to genotype 5061 endometrial cancer cases from ten studies in Australia, Belgium, Germany, Sweden, UK, and USA. Genotyping was carried out at two sites: the Center for Inherited Disease Research (CIDR; nine studies) and The University of Melbourne (one study). Details of the genotype calling are given in Amos et al.34

SNP-wise QC was conducted using genotype data from all consortia participating in the OncoArray experiment34. SNPs with call rate <95% in any of the consortia, SNPs not in Hardy–Weinberg equilibrium (HWE) (P < 10−7 in controls and P < 10−12 in cases) and SNPs with concordance <98% among 5280 duplicate pairs of samples were excluded, leaving 483,972 SNPs. Prior to imputation, SNPs with minor allele frequency (MAF) <1% and call rate <98% in any consortium were also excluded, as were SNPs that could not be linked to the 1000 Genomes Project reference panel or for which the MAF differed significantly from the European reference panel frequency. A further 1128 SNPs were excluded after review of cluster plots, hence 469,364 SNPs were used in the imputation.
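As an illustration of the first-stage SNP exclusion criteria listed above, the following Python sketch encodes the three thresholds as a single predicate. The record type and field names are hypothetical, and we assume here that failing either Hardy–Weinberg criterion (controls or cases) is sufficient for exclusion; the OncoArray QC procedure itself is described in Amos et al.34

```python
from dataclasses import dataclass


@dataclass
class SnpQcMetrics:
    """Per-SNP QC summary (hypothetical field names, for illustration only)."""
    min_call_rate: float          # lowest call rate observed in any consortium
    hwe_p_controls: float         # Hardy-Weinberg P-value in controls
    hwe_p_cases: float            # Hardy-Weinberg P-value in cases
    duplicate_concordance: float  # genotype concordance among duplicate pairs


def passes_snp_qc(m: SnpQcMetrics) -> bool:
    """Apply the call-rate, HWE, and duplicate-concordance thresholds."""
    if m.min_call_rate < 0.95:
        return False
    # Assumption: violating either HWE threshold leads to exclusion.
    if m.hwe_p_controls < 1e-7 or m.hwe_p_cases < 1e-12:
        return False
    if m.duplicate_concordance < 0.98:
        return False
    return True


good = SnpQcMetrics(0.99, 0.40, 0.35, 0.999)
low_call_rate = SnpQcMetrics(0.94, 0.40, 0.35, 0.999)
print(passes_snp_qc(good), passes_snp_qc(low_call_rate))
```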

The 5061 OncoArray-genotyped endometrial cancer cases were country-matched to controls who had been genotyped in an identical process as part of the Breast Cancer Association Consortium35,36. Samples with call rate <95%, with excessively low or high heterozygosity or with an estimated proportion of European ancestry <80% (based on a principal components analysis of 2318 informative markers and with reference to the HapMap populations) were excluded, as were suspected males and individuals who were XO or XXY.

Duplicates and close relatives were identified from estimated genomic kinship matrices. Pairwise comparisons were made among all samples genotyped as part of the OncoArray, iCOGS, or ANECS/SEARCH/NSECG GWAS genotyping projects. Where pairs of duplicates or close relatives were identified between projects, the sample with the more recent genotyping was retained, hence the numbers of cases included here from the ANECS/SEARCH/NSECG GWASs and iCOGS projects are lower than in the original publications. For case–control pairs from within the same project, the case was preferentially retained, and for case–case or control–control pairs, the sample with the higher call rate was used. Following these exclusions, OncoArray genotypes from 4710 cases and 19,438 controls were included in the analyses.
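The retention rules for duplicate or related pairs can be summarized as a small decision function. The Python sketch below is a schematic restatement of the rules in the preceding paragraph, not the software used in the study; the `Sample` type and the `genotyping_order` field (larger meaning more recently genotyped) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Minimal sample record (hypothetical fields, for illustration only)."""
    sample_id: str
    project: str           # e.g. "OncoArray", "iCOGS", "GWAS"
    genotyping_order: int  # larger value = more recent genotyping (assumed)
    is_case: bool
    call_rate: float


def sample_to_retain(a: Sample, b: Sample) -> Sample:
    """Choose which member of a duplicate or close-relative pair to keep."""
    if a.project != b.project:
        # Between projects: keep the more recently genotyped sample.
        return a if a.genotyping_order > b.genotyping_order else b
    if a.is_case != b.is_case:
        # Case-control pair within a project: keep the case.
        return a if a.is_case else b
    # Case-case or control-control pair: keep the higher call rate.
    return a if a.call_rate >= b.call_rate else b


older = Sample("s1", "iCOGS", 1, True, 0.999)
newer = Sample("s2", "OncoArray", 2, True, 0.990)
print(sample_to_retain(older, newer).sample_id)
```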

All OncoArray samples (along with all samples from the ANECS/SEARCH/NSECG GWASs and the iCOGS project) were imputed using the October 2014 (version 3) release of the 1000 Genomes Project reference panel. Samples were phased using SHAPEITv2 (ref. 37) and genotypes were imputed using the IMPUTEv2 software (ref. 38) for non-overlapping 5-Mb intervals. Analyses were restricted to the ~11.4 million SNPs with MAF >0.5% and r2 > 0.4.

Other studies

The 2695 cases and 2777 controls from the E2C2 consortium were genotyped using the Illumina Human OmniExpress array (2271 cases, 2219 controls from the United States) or the Illumina Human 660W array (424 cases, 558 controls from Poland)33 and both sets were separately imputed to the 1000 Genomes Project v3 reference panel using “minimac2” software, following standard quality control steps38,39.

The 288 cases from six population-based case–control studies within the Women’s Health Initiative were genotyped using five different arrays (Supplementary Data 1) and were each separately imputed using the combined 1000 Genomes Project v3 and UK10K reference panels using “minimac2” software39, following standard quality measures and the exclusion of SNPs with a MAF <1%. Five controls for each case were selected randomly, matched on study.

Data were also included from the first phase of UK Biobank genotyping, comprising 636 Cancer Registry-confirmed endometrial cancer cases (as of October 2016) and 62,853 cancer-free female controls. Samples were genotyped using the Affymetrix UK BiLEVE Axiom array and the Affymetrix UK Biobank Axiom® array and imputed to the combined 1000 Genomes Project v3 and UK10K reference panels using SHAPEIT3 (ref. 40) and IMPUTE3 (ref. 41).

No analyses were carried out to identify duplicates or relatives between samples from the E2C2, WHI, or UK Biobank studies and any other study. However, given the sampling frame of these studies, it is very unlikely that there would have been any meaningful sample overlap.

After QC exclusions, the analysis included 12,906 endometrial cancer cases (3613 of which have not been included in any previous publication) and 108,979 controls. Analyses were also carried out specifically for endometrial cancer of endometrioid histology (8758 cases) and endometrial cancer with non-endometrioid histology (1230 cases). Exploratory analyses for specific non-endometrioid histologies (serous carcinoma, carcinosarcoma, clear cell carcinoma, and mucinous carcinoma) included a small number of cases of mixed histotype, where the major component was non-endometrioid. The UK Biobank data did not include information about histology.

All participating studies were approved by research ethics committees from QIMR Berghofer Medical Research Institute, University-Clinic Erlangen, Karolinska Institutet, UZ Leuven, The Mayo Clinic, The Hunter New England Health District, The Regional Committees for Medical and Health Research Ethics Norway, and the UK National Research Ethics Service (04/Q0803/148 and 05/MRE05/1). All participants provided written, informed consent.

Statistical analyses

Per-allele ORs and the s.e. of the logORs were computed using logistic regression for each of the ANECS, SEARCH, NSECG, WHI, and UK Biobank GWASs, for the two E2C2 GWASs and, by country, for the iCOGS and OncoArray studies, giving a total of 17 strata. Case-only analyses were used to assess heterogeneity in SNP effects by histology (endometrioid histology versus non-endometrioid histology). In the OncoArray analysis, potential population stratification was adjusted for using the first nine principal components; these were estimated using data for 33,661 uncorrelated SNPs with MAF >0.05 and pairwise r2 < 0.1 (including 2318 SNPs specifically selected as informative for continental ancestry) using purpose-written software (http://ccge.medschl.cam.ac.uk/software/pccalc). Other studies were similarly adjusted for their relevant principal components.

Analyses were carried out using SNPTEST42 for the ANECS, SEARCH, and NSECG GWASs, using ProbABEL43 for the E2C2 GWASs, and using in-house software for the iCOGS, OncoArray, WHI, and UK Biobank studies. We assessed residual population stratification by computing the test statistic inflation adjusted to a sample size of 1000 cases and 1000 controls (λ1000), both overall and within each stratum, using 33,278 uncorrelated SNPs (r2 < 0.1). The overall λ1000 was 1.004, with stratum-specific λ1000 values between 0.996 and 1.128 (the latter observed for the smallest stratum, the German iCOGS dataset; Supplementary Fig. 1).
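The text does not give the formula used to rescale the inflation factor. A commonly used convention, which we assume here purely for illustration, scales the excess inflation by the ratio of the study's inverse sample sizes to those of a study with 1000 cases and 1000 controls. The function name below is hypothetical.

```python
def lambda_1000(lambda_obs: float, n_cases: int, n_controls: int) -> float:
    """Rescale an observed genomic inflation factor to 1000 cases/1000 controls.

    Assumed convention:
        lambda_1000 = 1 + (lambda_obs - 1)
                        * (1/n_cases + 1/n_controls) / (1/1000 + 1/1000)
    """
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lambda_obs - 1.0) * scale


# A study with 2000 cases and 2000 controls: half of the excess inflation
# remains after rescaling to 1000 cases and 1000 controls.
print(lambda_1000(1.10, 2000, 2000))
```

Under this convention, a large study and a small study with the same underlying degree of stratification yield comparable λ1000 values, which is why the statistic is convenient for comparing strata of very different sizes.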

The estimated ORs from the different studies were combined in a fixed-effects inverse-variance weighted meta-analysis using the “meta” software44. For each variant, results from any stratum for which the imputation information score was <0.4, the MAF <0.005 or the OR >3 or <0.333 were excluded. Following the meta-analysis, SNPs with valid results in fewer than two of the strata, or with between-strata heterogeneity P < 5 × 10−8, were also excluded, leaving 11.7 million SNPs. A random-effects meta-analysis was also carried out.
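For readers unfamiliar with the approach, the following Python sketch shows the standard inverse-variance weighted fixed-effects estimator together with Cochran's Q and I2, preceded by the per-stratum exclusion rule stated above. It is a minimal illustration and not the “meta” software; the function names are hypothetical.

```python
import math


def keep_stratum(info: float, maf: float, odds_ratio: float) -> bool:
    """Per-stratum filter: info >= 0.4, MAF >= 0.005, 0.333 <= OR <= 3."""
    return info >= 0.4 and maf >= 0.005 and 0.333 <= odds_ratio <= 3.0


def fixed_effects_meta(log_ors, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of log ORs.

    Returns (pooled log OR, its standard error, Cochran's Q, I-squared).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Two hypothetical strata with equal precision and different estimates.
pooled, pooled_se, q, i2 = fixed_effects_meta([0.0, 0.2], [0.1, 0.1])
print(math.exp(pooled), pooled_se, q, i2)
```

Because the weights are the inverse variances, a stratum with a smaller standard error (typically a larger study) contributes proportionally more to the pooled estimate.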

Using the conventional 5 × 10−8 genome-wide significance threshold, all SNPs lying within ± 500 kb of a significant SNP were initially considered as part of that locus. Approximate conditional analysis in the GCTA program11,45, with an LD reference panel of 4000 OncoArray-genotyped control subjects, was then used to look for additional independently associated SNPs within each locus. Only uncorrelated (r2 < 0.05) secondary signals were included. The only locus with evidence of significant signals after conditioning on the most strongly associated SNP was the previously published 8q24 locus6 (Table 1). For each locus, the set of credible causal risk SNPs (ccrSNPs) was defined as those variants within ± 500 kb of the most significant SNP and for which the likelihood from the association analysis was no less than one hundredth the likelihood of the most significant SNP (i.e., a likelihood ratio of at least 1:100 relative to that SNP). A BFDP for each significant SNP was estimated on the basis of a maximum plausible OR of 1.5 and a prior probability of association of 0.0001 (ref. 46).
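The credible-set rule and the BFDP can both be sketched in a few lines of Python. The example below is illustrative only. For the BFDP we assume Wakefield's approximate Bayes factor with a normal prior on the log OR whose 97.5% point equals the log of the maximum plausible OR; the text specifies only the maximum OR of 1.5 and the prior probability of 0.0001, so the remaining details, the function names, and the SNP labels are assumptions.

```python
import math


def credible_set(log_likelihoods, ratio=100.0):
    """Return SNPs whose likelihood is at least 1/ratio of the best SNP's.

    `log_likelihoods` maps SNP identifiers to log-likelihoods from the
    association analysis (hypothetical input format).
    """
    best = max(log_likelihoods.values())
    cutoff = best - math.log(ratio)
    return sorted(snp for snp, ll in log_likelihoods.items() if ll >= cutoff)


def bfdp(log_or, se, max_or=1.5, prior=1e-4):
    """Bayes false discovery probability from a log OR and its standard error."""
    v = se ** 2                              # sampling variance of the log OR
    w = (math.log(max_or) / 1.96) ** 2       # assumed prior variance
    z2 = (log_or / se) ** 2
    # Approximate Bayes factor in favor of the null hypothesis.
    abf = math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * w / (v + w))
    prior_odds_null = (1.0 - prior) / prior
    return abf * prior_odds_null / (1.0 + abf * prior_odds_null)


# Hypothetical log-likelihoods for three SNPs at one locus.
print(credible_set({"snp_a": -10.0, "snp_b": -12.0, "snp_c": -15.0}))

# Hypothetical strong and weak signals (log OR, standard error).
print(bfdp(math.log(1.15), 0.0155), bfdp(math.log(1.07), 0.0191))
```

With a prior probability of association of 0.0001, a signal must be very strong before the BFDP falls to a few percent, which is why modest associations retain a BFDP close to 1.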

The proportion of the FRR of endometrial cancer due to the identified variants was estimated using a log-additive model, where pj, βj, and τj are the MAF, logOR, and se(logOR), respectively for variant j, and λ = 2 is the reported FRR of endometrial cancer. The effect estimates used were those estimated in the current study, both for the new loci and for the loci replicated from previous studies.

$$\mathrm{Proportion\ FRR} = \frac{1}{\ln(\lambda)} \sum_j p_j (1 - p_j) \left( \beta_j^2 - \tau_j^2 \right).$$
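As a worked illustration of this formula, the following Python sketch evaluates the proportion of the FRR for a list of variants. The input values are hypothetical and are not taken from the study; only the formula and λ = 2 come from the text.

```python
import math


def frr_proportion(variants, lam=2.0):
    """Proportion of the FRR explained under the log-additive model.

    `variants` is an iterable of (p, beta, tau) tuples: allele frequency,
    log OR, and standard error of the log OR for each variant.
    """
    total = sum(p * (1.0 - p) * (beta ** 2 - tau ** 2)
                for p, beta, tau in variants)
    return total / math.log(lam)


# Two hypothetical variants: (frequency, log OR, se of log OR).
example = [(0.50, math.log(1.15), 0.015), (0.10, math.log(1.25), 0.040)]
print(frr_proportion(example))
```

Subtracting the squared standard error from the squared log OR corrects for the fact that a noisy effect estimate has an inflated expected square.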

The proportion of the endometrial cancer FRR that can be explained by all SNPs is given by the frailty-scale heritability, h_f^2, divided by 2ln(λ). This was estimated using LD score regression47, based on the full set of meta-analysis summary estimates, restricted to those SNPs present on the HapMap v3 dataset with MAF >1% and imputation quality R2 > 0.9 in the OncoArray imputation using the 1000 Genomes Phase 3 reference panel. The frailty-scale heritability (as opposed to the observed-scale heritability) was obtained by replacing the total sample size, N, for each study with an effective sample size Nj for SNP j, which effectively weights each SNP according to its frequency and the variance of the effect estimate, i.e.,

$$N_j = \frac{1}{2 p_j (1 - p_j) \tau_j^2}.$$
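For concreteness, the effective sample size can be computed directly from the allele frequency and the standard error of the log OR. The short Python sketch below uses hypothetical input values and a hypothetical function name.

```python
def effective_sample_size(p: float, tau: float) -> float:
    """Effective sample size N_j = 1 / (2 p (1 - p) tau^2) for one SNP."""
    return 1.0 / (2.0 * p * (1.0 - p) * tau ** 2)


# A hypothetical common SNP (frequency 0.5) with se(logOR) = 0.1.
print(effective_sample_size(0.5, 0.1))
```

For a fixed standard error, a rarer SNP yields a larger N_j, since the same precision at a lower allele frequency implies more information per allele.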

Cross-trait LD score regression via the LD Hub interface (28 September 2017, v1.4.1) was used to estimate the genetic correlations between endometrial cancer and 224 traits from 24 categories23.

The causal effects of five anthropometric or reproductive factors on the risk of endometrial cancer were estimated using two-sample summary statistic inverse-variance weighted MR analyses48. Instrumental variables for each factor consisted of the most recent set of published GWAS-significant SNPs for that trait: 77 SNPs for body mass index (BMI)49, 47 SNPs for waist:hip ratio50, 814 SNPs for adult height51,52, 54 SNPs for age at natural menopause53, and 368 SNPs for age at menarche19. A multivariable MR adjusting for the effects of the 368 menarche SNPs on BMI (a potential mediator) was used to estimate the direct effect of menarche on endometrial cancer, not via BMI54.
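The two-sample inverse-variance weighted estimator can be written compactly. The Python sketch below implements the standard summary-statistic form (a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin, with first-order weights); it is a minimal illustration with hypothetical inputs, not the analysis code used in the study.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate from summary statistics.

    beta_exposure: per-SNP effects on the exposure (e.g. BMI)
    beta_outcome:  per-SNP log ORs for the outcome
    se_outcome:    standard errors of the outcome log ORs
    Returns (causal log OR per unit of exposure, its standard error).
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Two hypothetical instruments whose outcome effects are exactly half
# of their exposure effects.
estimate, se = ivw_mr([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
print(math.exp(estimate), se)
```

Exponentiating the estimate gives an OR per unit increase in the genetically predicted exposure, which is the scale on which the MR results are reported in Table 3.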

Cell culture

Ishikawa and JHUEM-14 cells were a gift from Prof PM Pollock (Queensland University of Technology). Cell lines were authenticated using STR profiling and confirmed to be negative for mycoplasma contamination. Ishikawa cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM; Life Technologies #1195-065) with 10% fetal bovine serum (FBS) and antibiotics (100 IU/ml penicillin and 100 μg/ml streptomycin). JHUEM-14 cells were cultured in DMEM/F12 medium (Life Technologies #11320-033) with 10% FBS and antibiotics.

Cell fixing and chromatin shearing

Ishikawa and JHUEM-14 cells were plated on to 10-cm tissue culture dishes in phenol red-free DMEM (Sigma-Aldrich #D1145) supplemented with l-glutamine, sodium pyruvate, and 10% charcoal-dextran-stripped FBS. Three days later, media were replaced and cells incubated with fresh medium containing either 10 nM estradiol or DMSO (vehicle control) for 3 h. Cells were washed twice with PBS and fixed at room temperature in 1% formaldehyde in PBS. After 10 min, cells were placed on ice and washed twice with ice-cold PBS. The reaction was quenched with 10 mM DTT in 100 mM Tris-HCl (pH 9.4) and cells removed from the dish with a cell scraper. Cells were incubated at 30 °C for 15 min, then spun for 5 min at 2000×g. Cells were washed sequentially with ice-cold PBS, buffer I (0.25% Triton X-100, 10 mM EDTA, 0.5 mM EGTA, 10 mM HEPES, pH 6.5) and buffer II (200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 10 mM HEPES, pH 6.5) and centrifuged for 5 min at 2000×g at 4 °C. Cells were resuspended in 300–750 μl of lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl, pH 8.1, with complete protease inhibitor cocktail (Sigma-Aldrich #11836145001)). Ishikawa cells were sonicated for eight cycles (10 s) and JHUEM-14 cells for 20 cycles using the highest power setting of a Branson Digital Sonifier SLPt. After chromatin shearing was confirmed by agarose gel electrophoresis, samples were centrifuged for 10 min at 4 °C.

Chromatin immunoprecipitation and sequencing

Samples were diluted 10-fold in 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl (pH 8.1), and 150 mM NaCl with complete protease inhibitor cocktail. Magna ChIP protein A/G magnetic beads (EMD Millipore #16-663) were added to 500 μl of diluted chromatin and incubated with 5 μg of antibody overnight at 4 °C. Antibodies to H3K4Me1 (Abcam #ab8895), H3K4Me3 (Abcam #ab8580), and H3K27Ac (Abcam #ab4729) were used (Supplementary Table 1). The next day supernatant was removed and the beads washed three times with the following ice-cold buffers: RIPA 150 (0.1% SDS, 1% Triton X-100, 1 mM EDTA, 50 mM Tris-HCl (pH 8.1), 150 mM NaCl, 0.1% sodium deoxycholate), RIPA 500 (0.1% SDS, 1% Triton X-100, 1 mM EDTA, 50 mM Tris-HCl (pH 8.1), 500 mM NaCl, 0.1% sodium deoxycholate), LiCl RIPA (500 mM LiCl, 1% NP-40, 1% deoxycholate, 1 mM EDTA, 50 mM Tris-HCl (pH 8.1)), and TE buffer. Chromatin was then eluted by incubating beads overnight at 60 °C with 100 μl of elution buffer (1% SDS, 100 mM NaHCO3) and 0.5 mg/ml proteinase K. The next day beads were incubated at 95 °C for 10 min and supernatant removed. Chromatin was purified using the QIAquick Spin kit (QIAGEN) and eluted from columns using 50 μl of QIAGEN EB buffer. DNA was quantified using a Qubit dsDNA HS Assay kit (ThermoFisher Scientific).

Samples from two biological replicates for each treatment were sent to the Australian Genome Research Facility (Melbourne, Australia) for library preparation and sequencing (Illumina HiSeq 2500) with 50 bp reads. Mapping and analysis of ChIP-seq reads were performed using the ENCODE analysis pipeline, histone ChIP-seq Unary Control (GRCh37), with DNAnexus software tools (https://dnanexus.com). Replicated peaks across biological replicates were used for downstream analyses.

eQTL analyses

Summary eQTL results for non-cancer tissue were obtained using uterine (N = 70) and vaginal (N = 79) tissue-specific data generated by the Genotype-Tissue Expression Project (GTEx)12, an endometrium eQTL dataset (N = 123) provided by Fung et al.14, and a blood eQTL dataset (males and females; N = 5311)15.

Data from endometrial cancer tumors and adjacent normal endometrial tissue were accessed from The Cancer Genome Atlas13. Patient germ line SNP genotypes (Affymetrix 6.0 arrays) and tissue expression RNA-seq data were downloaded through the controlled access portal, while epidemiological and tumor tissue copy-number data were downloaded through the public access portal. RNA-seq data were aligned and expression quantified to reads per kilobase per million (RPKM) as described in Painter et al.10 and quality control performed on the germ line SNP genotypes as per Carvajal-Carmona et al.55 Complete genotype, RNA-seq, and copy-number data were available for 277 genetically European patients (218 with endometrioid histology, 29 with adjacent normal tissue).

Germ line genotypes underwent further quality control before imputation to the 1000 Genomes Phase 3v5 reference panel by Eagle v2.356, using the Michigan Imputation Server57. Briefly, subjects were removed for genotype missingness >10% and SNPs were removed for missingness >10%, MAF <5%, and HWE P-value <5 × 10−8. SNPs were also removed if they were indels or non-biallelic variants, were ambiguous SNPs with a MAF >40%, were not matched to the reference panel, had a MAF difference with the reference panel of >20%, or were duplicates.

Genes with a median expression level of 0 RPKM across samples were removed; the RPKM values of each gene were then log2-transformed and samples were quantile normalized. The expression of the genes located within a 2-Mb window surrounding the ccrSNP at each of the newly identified risk loci was extracted from the expression dataset.

The associations between ccrSNPs and gene expression in all endometrial cancer tumor tissues, endometrioid endometrial cancer tissues only, and adjacent normal endometrial tissue were evaluated using linear regression models implemented in the MatrixEQTL program in R (ref. 58), adjusting for sequencing platform. Tumor tissue expression was also adjusted for copy-number variation, as previously described in Li et al.59 A false discovery rate of <20% was used to report eQTL results from all datasets, except for the endometrium eQTL dataset where we used a P-value <0.01.

Candidate causal gene network analysis

Candidate causal genes identified in our previous studies and from the eQTL results in the current study (Supplementary Table 6) were analyzed using Ingenuity Pathway Analysis (QIAGEN; accessed on 23 March 2018 and available at www.qiagen.com/ingenuity) to define gene networks and enrichment of genes in canonical signaling pathways.

Data availability

OncoArray germ line genotype data for the ECAC studies and E2C2 germ line genotype data have been deposited through the database of Genotypes and Phenotypes (dbGaP; accession number phs000893.v1.p1). Meta-GWAS summary statistics are available from the authors by request. Genotype data for non-cancer controls were provided by the Breast Cancer Association Consortium (BCAC) by application to the BCAC Data Access Coordination Committee (http://bcac.ccge.medschl.cam.ac.uk/). ChIP-seq data are available from the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE113818.