Curr Genomics. 2007 Apr;8(2):93-111.
doi: 10.2174/138920207780368169.

Consensus higher order repeats and frequency of string distributions in human genome

Vladimir Paar et al. Curr Genomics. 2007 Apr.

Abstract

The key string algorithm (KSA) can be viewed as a robust computational generalization of the restriction enzyme method. KSA enables robust and effective identification and structural analysis of any given genomic sequence, such as the NCBI assembly of the human genome. We have developed a method that uses the total frequency distribution of all r-bp key strings as a function of the fragment length l to determine the exact size of all repeats within a given genomic sequence, of both monomeric and HOR type. Subsequently, for fragment lengths equal to each of these repeat sizes, we compute the partial frequency distribution of r-bp key strings; the key string with the highest frequency is the dominant key string, optimal for segmenting the given genomic sequence into repeat units. We illustrate how a wide class of 3-bp key strings leads to a key-string-dependent periodic cell, which enables simple identification and consensus length determination of HORs, or of any other highly convergent repeat of monomeric or HOR type, whether tandem or dispersed. We illustrate the application of KSA to HORs in the human genome and determine consensus HORs in the Build 35.1 assembly. In the next step we compute the suprachromosomal family classification and the CENP-B box / pJalpha distributions for HORs. For less convergent repeats, such as monomeric alpha satellites (20-40% divergence), we searched for an optimal compact key string using the frequency method and developed the concept of a composite key string (GAAAC--CTTTG) or flexible relaxation (28-bp key string), which provides both monomeric alpha satellites and alpha monomer segmentation of internal HOR structure. This method is also convenient for studying R-strand (direct) / S-strand (reverse complement) alpha monomer alternations. Using KSA we identified 16 alternating regions of R-strand and S-strand monomers in one contig in chromosome 7. Using the CENP-B box and/or the pJalpha motif as the key string is suitable both for identifying HORs and monomeric patterns and for studying the CENP-B box / pJalpha distribution. As an example of applying KSA to sequences outside HOR regions, we present our finding of a tandem repeat with a highly convergent 3434-bp long monomer in chromosome 5 (divergence less than 0.3%).
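
The segmentation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy sequence, and the choice to count overlapping key-string occurrences are all assumptions made for illustration. The sketch cuts a sequence at every occurrence of a key string, records the fragment lengths, sums the length counts over all r-bp key strings, and then picks a key string that yields the most fragments of a chosen length.

```python
from collections import Counter
from itertools import product


def key_string_positions(seq, key):
    """Start indices of all occurrences of key in seq (overlaps allowed; an assumption)."""
    positions = []
    start = seq.find(key)
    while start != -1:
        positions.append(start)
        start = seq.find(key, start + 1)
    return positions


def fragment_lengths(seq, key):
    """Distances between consecutive key-string occurrences (the fragment lengths)."""
    pos = key_string_positions(seq, key)
    return [b - a for a, b in zip(pos, pos[1:])]


def total_length_frequency(seq, r):
    """Fragment-length counts summed over all 4**r key strings of length r."""
    total = Counter()
    for letters in product("ACGT", repeat=r):
        total.update(fragment_lengths(seq, "".join(letters)))
    return total


def dominant_key_string(seq, r, length):
    """A key string of length r giving the most fragments of the given length.

    Ties are broken by enumeration order, which is arbitrary.
    """
    counts = {}
    for letters in product("ACGT", repeat=r):
        key = "".join(letters)
        counts[key] = fragment_lengths(seq, key).count(length)
    return max(counts, key=counts.get)


# Hypothetical toy input: a perfect tandem repeat of a 10-bp unit.
unit = "ACGTTGCAAT"
seq = unit * 20

print(fragment_lengths(seq, "ACG")[:3])                 # [10, 10, 10]
print(total_length_frequency(seq, 3).most_common(1))    # the peak sits at length 10
print(dominant_key_string(seq, 3, 10))
```

On real, divergent repeats the length distribution would be spread around the repeat size rather than concentrated in a single value, so a peak-finding step would be needed; that step is omitted here.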

Keywords: CENP-B box; Human genome; alpha satellite; alphoid; consensus higher order repeat; frequency distribution of strings; higher order repeat (HOR); key string algorithm - KSA; pJα motif; suprachromosomal families.

Figures

Fig. (1)
Total frequency f6 as fraction of the fragment length computed for contig NT_011295.10 in chromosome 19 using all possible 6-bp strings (For description see the text).
