This is a preprint.
Genotyping sequence-resolved copy-number variations using pangenomes reveals paralog-specific global diversity and expression divergence of gene duplication
- PMID: 39149335
- PMCID: PMC11326217
- DOI: 10.1101/2024.08.11.607269
Abstract
Copy-number variable (CNV) genes are important in evolution and disease, yet sequence variation in CNV genes remains a blind spot in large-scale studies. We present ctyper, a method that leverages pangenomes to produce allele-specific copy numbers with locally phased variants from NGS reads. Benchmarked on 3,351 CNV genes, including HLA, SMN, and CYP2D6, and on 212 challenging medically-relevant (CMR) genes poorly mapped by NGS, ctyper captures 96.5% of phased variants with ≥99.1% correctness of copy number on CNV genes and 94.8% of phased variants on CMR genes. Applying alignment-free algorithms, ctyper takes only 1.5 hours to genotype a genome on a single CPU. Its results improve predictions of gene expression compared to known eQTL variants. Allele-specific expression analysis showed divergent expression in 7.94% of paralogs and tissue-specific biases in 4.68% of paralogs. We found reduced expression of SMN-2 due to SMN1-conversions, potentially affecting spinal muscular atrophy, and increased expression of translocated duplications of AMY2B. Overall, ctyper enables biobank-scale genotyping of CNV and CMR genes.