This is a preprint.
Genotyping sequence-resolved copy-number variations using pangenomes reveals paralog-specific global diversity and expression divergence of duplicated genes
- PMID: 39149335
- PMCID: PMC11326217
- DOI: 10.1101/2024.08.11.607269
Abstract
Copy-number variable (CNV) genes are important in evolution and disease, yet sequence variation in CNV genes is a blind spot for large-scale studies. We present a method, ctyper, that leverages pangenomes to produce copy-number maps with allele-specific sequences containing locally phased variants of CNV genes from NGS reads. We extensively characterized accuracy and efficiency on a database of 3,351 CNV genes, including HLA, SMN, and CYP2D6, as well as 212 non-CNV, medically relevant challenging genes. The genotypes capture 96.5% of underlying variants in new genomes, and genotyping requires 0.9 seconds per gene. In expression analysis, ctyper genotypes explain more variance than known eQTL variants. Comparing allele-specific expression revealed divergent expression in 7.94% of paralogs and tissue-specific biases in 4.7% of paralogs. We found reduced expression of SMN-1 converted from SMN-2, which potentially affects diagnosis of spinal muscular atrophy, and increased expression of a duplicative translocation of AMY2B. Overall, ctyper enables biobank-scale genotyping of CNV and challenging genes.