BMC Genomics. 2015 Oct 22;16:834.
doi: 10.1186/s12864-015-1991-5.

Copy number variations in the genome of the Qatari population

Khalid A Fakhro et al. BMC Genomics.

Abstract

Background: The populations of the Arabian Peninsula remain the least represented in public genetic databases, both in terms of single nucleotide variants and of larger genomic mutations. We present the first high-resolution copy number variation (CNV) map for a Gulf Arab population, using a hybrid approach that integrates array genotyping intensity data and next-generation sequencing reads to call CNVs in the Qatari population.

Methods: CNVs were detected in 97 unrelated Qatari individuals by running two calling algorithms on each of two primary datasets: high-resolution genotyping (Illumina Omni 2.5M) and high-depth whole-genome sequencing (Illumina PE 100 bp). The four call-sets were integrated to identify high-confidence CNV regions, which were subsequently annotated for putative functional effect and compared to public databases of CNVs in other populations. The availability of genome sequence was leveraged to identify tagging SNPs in high linkage disequilibrium (LD) with common deletions in this population, enabling their imputation from future genotyping experiments.

Results: Genotyping intensities and genome sequencing data from 97 Qataris were analyzed with four different algorithms and integrated to discover 16,660 high-confidence CNV regions (CNVRs) in the total population, affecting ~28 Mb in the median Qatari genome. Up to 40% of all CNVs affected genes, including novel CNVs affecting Mendelian disease genes, and these segregated at different frequencies in the three major Qatari subpopulations: those with Bedouin, Persian/South Asian, and African ancestry. Consistent with high consanguinity levels in the Bedouin subpopulation, we found an increased burden of homozygous deletions in this group. In comparison to known CNVs in the comprehensive Database of Genomic Variants, we found that 5% of all CNVRs in Qataris were completely novel, with an enrichment of CNVs affecting several known chromosomal disorder loci and genes known to regulate sugar metabolism and type 2 diabetes in the Qatari cohort. Finally, we leveraged the availability of genome sequence to find suitable tagging SNPs for common deletions in this population.

Conclusion: We combine four independently generated datasets from 97 individuals to study CNVs for the first time at high resolution in a Gulf Arab population.

Figures

Fig. 1
CNV analysis strategy. CNV detection in Qataris was assessed at two tiers. First, CNVs were called in 100 individuals using two algorithms on each of two primary input datasets: genotyping array (OMNI2.5M) and next-generation whole-genome sequencing reads (Illumina PE 100 bp, mean depth 37X). To increase specificity, a size cut-off of at least 5 consecutive probes for genotyping data and at least 5 consecutive windows for whole-genome sequencing data was applied (see Methods). Three samples with an unusually high number of CNVs were removed from the population (see Additional file 1: Figure S1). In the second step, high-quality CNVs from all 4 call-sets for the remaining 97 subjects were distributed into 97 individual files. CNVs were first compared within each individual and retained if observed by more than one algorithm. If no overlap was detected within the individual, the CNV was compared across individuals to detect a second occurrence elsewhere in the 97-individual cohort. CNVs observed only once in the entire sample were discarded. CNVs passing these filters were merged across the population to generate population-level CNV regions (CNVRs), which were taken into the detailed analysis steps. *Denotes data provided as-is from the proprietary Illumina Genome Network sequencing pipeline, whose parameters could not be altered by the user
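The second-tier filtering described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `CNV` record, the function names, the caller labels, and the "any shared base pair" overlap rule are all assumptions made for illustration, and the published analysis may use a stricter criterion such as reciprocal overlap.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    """Hypothetical call record; coordinates are treated as half-open."""
    sample: str
    caller: str
    chrom: str
    start: int
    end: int


def overlaps(a: CNV, b: CNV) -> bool:
    # Assumption: any shared base pair on the same chromosome counts as overlap.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_supported(calls: list[CNV]) -> list[CNV]:
    """Keep a call if a second algorithm sees it in the same individual,
    or if an overlapping call exists in any other individual."""
    kept = []
    for c in calls:
        intra = any(o.sample == c.sample and o.caller != c.caller and overlaps(c, o)
                    for o in calls)
        inter = any(o.sample != c.sample and overlaps(c, o) for o in calls)
        if intra or inter:
            kept.append(c)  # calls seen only once in the whole sample are dropped
    return kept


def merge_regions(calls: list[CNV]) -> list[tuple[str, int, int]]:
    """Merge overlapping calls across the population into CNV regions."""
    regions: list[tuple[str, int, int]] = []
    for c in sorted(calls, key=lambda x: (x.chrom, x.start, x.end)):
        if regions and regions[-1][0] == c.chrom and c.start < regions[-1][2]:
            chrom, start, end = regions[-1]
            regions[-1] = (chrom, start, max(end, c.end))
        else:
            regions.append((c.chrom, c.start, c.end))
    return regions


# Toy input with made-up sample and caller names.
calls = [
    CNV("S1", "array_caller", "chr1", 100, 500),
    CNV("S1", "wgs_caller", "chr1", 300, 700),     # same individual, second algorithm
    CNV("S2", "array_caller", "chr2", 1000, 2000),
    CNV("S3", "wgs_caller", "chr2", 1500, 2500),   # second occurrence in another individual
    CNV("S4", "array_caller", "chr3", 50, 90),     # singleton, discarded
]
regions = merge_regions(filter_supported(calls))
```

In this toy input the chr3 call has no support from a second algorithm or a second individual, so it is discarded before merging, and the two supported pairs collapse into one region each.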
Fig. 2
Probability distributions of CNVs by frequency and size in each copy number class in 97 Qataris. Density curves showing the probability (y-axis) that a given individual from each of the 3 subpopulations has a certain number of CNVs (a-d) or a certain cumulative size of the genome affected by CNVs (e-h) in each copy number class (a, e: CN = 0; b, f: CN = 1; c, g: CN = 3; d, h: CN = 4+). All p-values are calculated using the ANOVA-Tukey method. Black trace: Q1; blue trace: Q2; red trace: Q3
Fig. 3
For each of 1,193 deletions, all SNPs within 500 kb of the start and end breakpoints were used to find the SNP with the maximum pairwise LD correlation. This was done for (a) all 1,193 CNVs and (b) only the 422 genic CNVs. In both cases, the WGS SNVs significantly outperformed the OMNI2.5M SNPs, especially at higher r² values. WGS-SNVs: variants detected by whole-genome sequencing (●). OMNI2.5M-SNPs: SNPs present on the OMNI2.5M array (○)
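The tag-SNP search in this caption can be sketched as follows. This is a rough illustration under stated assumptions, not the published method: LD is approximated by the squared Pearson correlation between genotype dosages (0, 1 or 2 copies), a SNP qualifies if it lies within the window of either breakpoint, and the function names and the `(snp_id, position, dosages)` tuple layout are hypothetical.

```python
def r_squared(x: list[int], y: list[int]) -> float:
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant cannot tag anything
    return (sxy * sxy) / (sxx * syy)


def best_tag_snp(del_start, del_end, del_dosage, snps, window=500_000):
    """Return (snp_id, r2) for the SNP with the highest r2 to the deletion,
    considering only SNPs within `window` bp of either breakpoint
    (an assumed reading of the caption). Returns None if no SNP qualifies."""
    best = None
    for snp_id, pos, dosage in snps:
        near_breakpoint = (abs(pos - del_start) <= window
                           or abs(pos - del_end) <= window)
        if not near_breakpoint:
            continue
        r2 = r_squared(dosage, del_dosage)
        if best is None or r2 > best[1]:
            best = (snp_id, r2)
    return best


# Toy data for six individuals; all identifiers and values are made up.
deletion_dosage = [0, 1, 0, 2, 1, 0]             # number of deleted copies
snps = [
    ("snpA", 1_200_000, [0, 1, 0, 2, 1, 0]),     # nearby, tracks the deletion exactly
    ("snpB", 1_050_000, [0, 0, 1, 1, 0, 2]),     # nearby, weakly correlated
    ("snpC", 9_000_000, [0, 1, 0, 2, 1, 0]),     # perfectly correlated but outside window
]
best = best_tag_snp(1_000_000, 1_010_000, deletion_dosage, snps)
```

Here `snpA` is selected because it lies inside the window and mirrors the deletion dosage, while `snpC` is ignored despite identical dosages because it falls outside the 500 kb window. A denser variant set, such as one derived from whole-genome sequencing, offers more candidates per deletion, which is consistent with the comparison the caption reports.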
