Consensus higher order repeats and frequency of string distributions in human genome
- PMID: 18660848
- PMCID: PMC2435359
- DOI: 10.2174/138920207780368169
Abstract
The key string algorithm (KSA) can be viewed as a robust computational generalization of the restriction enzyme method. KSA enables robust and effective identification and structural analysis of any given genomic sequence, such as the NCBI assembly of the human genome. We have developed a method that uses the total frequency distribution of all r-bp key strings, as a function of the fragment length l, to determine the exact size of all repeats within the given genomic sequence, of both monomeric and HOR type. Subsequently, for particular fragment lengths equal to each of these repeat sizes, we compute the partial frequency distribution of r-bp key strings; the key string with the highest frequency is the dominant key string, optimal for segmentation of the given genomic sequence into repeat units. We illustrate how a wide class of 3-bp key strings leads to a key-string-dependent periodic cell, which enables simple identification and consensus length determination of HORs, or of any other highly convergent repeat of monomeric or HOR type, whether tandem or dispersed. We illustrate the application of KSA to HORs in the human genome and determine consensus HORs in the Build 35.1 assembly. In the next step we compute the suprachromosomal family classification and the CENP-B box / pJalpha distributions for HORs. In the case of less convergent repeats, for example monomeric alpha satellite (20-40% divergence), we searched for an optimal compact key string using the frequency method and developed the concept of a composite key string (GAAAC--CTTTG) or flexible relaxation (28-bp key string), which provides both monomeric alpha satellites and alpha monomer segmentation of the internal HOR structure. This method is also convenient for the study of R-strand (direct) / S-strand (reverse complement) alpha monomer alternations. Using KSA we identified 16 alternating regions of R-strand and S-strand monomers in one contig in chromosome 7.
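The frequency method described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy 12-bp repeat unit, and the definition of a fragment as the distance between consecutive start positions of a key string are all assumptions made for illustration. The sketch sums the fragment-length counts over all 3-bp key strings to locate the repeat size, then picks the key string that yields the most fragments of that size.

```python
from collections import Counter


def fragment_lengths(sequence, key):
    """Distances between consecutive start positions of `key` (overlaps allowed)."""
    positions = []
    start = sequence.find(key)
    while start != -1:
        positions.append(start)
        start = sequence.find(key, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def total_length_distribution(sequence, r):
    """Fragment-length counts summed over every r-bp key string in the sequence."""
    keys = {sequence[i:i + r] for i in range(len(sequence) - r + 1)}
    total = Counter()
    for key in keys:
        total.update(fragment_lengths(sequence, key))
    return total


def dominant_key_string(sequence, r, length):
    """r-bp key string producing the most fragments of exactly `length` bp.

    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    keys = sorted({sequence[i:i + r] for i in range(len(sequence) - r + 1)})
    return max(keys, key=lambda k: fragment_lengths(sequence, k).count(length))


# Hypothetical toy data: six copies of a 12-bp unit, one copy with a point mutation.
unit = "ACGTTGCAAGCT"
mutated = "ACGTTACAAGCT"  # G -> A at index 5 of the unit
seq = unit * 2 + mutated + unit * 3

total = total_length_distribution(seq, 3)
repeat_size = total.most_common(1)[0][0]  # the most frequent fragment length
best = dominant_key_string(seq, 3, repeat_size)
print(repeat_size, best)
```

On this toy input the most frequent fragment length equals the 12-bp unit size, while the mutated copy contributes a few fragments of other lengths. For real genomic sequences the scan over all key strings would need a more efficient index than repeated calls to `str.find`.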
Use of the CENP-B box and/or the pJalpha motif as a key string is suitable both for identification of HORs and monomeric patterns and for studies of the CENP-B box / pJalpha distribution. As an example of the application of KSA to sequences outside HOR regions, we present our finding of a tandem repeat with a highly convergent 3434-bp long monomer in chromosome 5 (divergence less than 0.3%).
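The consensus and divergence figures quoted above can be made concrete with a short Python sketch. It assumes the repeat copies have already been segmented and have equal length, so that divergence reduces to the fraction of mismatched positions; insertions and deletions are ignored. The function names and the toy copies are hypothetical and are not taken from the paper.

```python
from collections import Counter


def consensus(copies):
    """Majority base at each position across equal-length repeat copies."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


def divergence_percent(a, b):
    """Percentage of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (indels are not handled)")
    mismatches = sum(x != y for x, y in zip(a, b))
    return 100.0 * mismatches / len(a)


# Hypothetical toy copies of a 10-bp repeat unit; the third differs at one position.
copies = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC"]
cons = consensus(copies)
divergences = [divergence_percent(copy, cons) for copy in copies]
print(cons, divergences)
```

For scale, under this mismatch-only definition a divergence below 0.3% over a 3434-bp monomer corresponds to at most about 10 differing positions, since 3434 × 0.003 ≈ 10.3.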
Keywords: CENP-B box; Human genome; alpha satellite; alphoid; consensus higher order repeat; frequency distribution of strings; higher order repeat (HOR); key string algorithm - KSA; pJα motif; suprachromosomal families.