2009 Dec 22:10:442.
doi: 10.1186/1471-2105-10-442.

G+C content dominates intrinsic nucleosome occupancy

Desiree Tillo et al. BMC Bioinformatics.

Abstract

Background: The relative preference of nucleosomes to form on individual DNA sequences plays a major role in genome packaging. A wide variety of DNA sequence features are believed to influence nucleosome formation, including periodic dinucleotide signals, poly-A stretches and other short motifs, and sequence properties that influence DNA structure, including base content. It was recently shown by Kaplan et al. that a probabilistic model using composition of all 5-mers within a nucleosome-sized tiling window accurately predicts intrinsic nucleosome occupancy across an entire genome in vitro. However, the model is complicated, and it is not clear which specific DNA sequence properties are most important for intrinsic nucleosome-forming preferences.

Results: We find that a simple linear combination of only 14 simple DNA sequence attributes (G+C content, two transformations of dinucleotide composition, and the frequency of eleven 4-bp sequences) explains nucleosome occupancy in vitro and in vivo in a manner comparable to the Kaplan model. G+C content and frequency of AAAA are the most important features. G+C content is dominant, alone explaining approximately 50% of the variation in nucleosome occupancy in vitro.

Conclusions: Our findings provide a dramatically simplified means to predict and understand intrinsic nucleosome occupancy. G+C content may dominate because it both reduces frequency of poly-A-like stretches and correlates with many other DNA structural characteristics. Since G+C content is enriched or depleted at many types of features in diverse eukaryotic genomes, our results suggest that variation in nucleotide composition may have a widespread and direct influence on chromatin structure.
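The two features the Results single out, G+C content and the frequency of AAAA, can be computed over nucleosome-sized tiling windows with a few lines of code. The sketch below is illustrative only: the function names, the single-strand counting of AAAA, the normalization of the AAAA count, and the default weights in `linear_score` are assumptions made for this example and are not the fitted values or exact feature definitions from the paper.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_kmer(seq: str, kmer: str) -> int:
    """Count overlapping occurrences of kmer on the given strand only."""
    seq, kmer = seq.upper(), kmer.upper()
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def window_features(seq: str, window: int = 150, step: int = 1):
    """Yield (start, G+C fraction, AAAA frequency) for each tiling window.

    AAAA frequency is the overlapping count divided by the number of
    4-mer positions in the window (an assumed normalization).
    """
    positions = window - 4 + 1
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        yield start, gc_fraction(w), count_kmer(w, "AAAA") / positions


def linear_score(gc: float, aaaa_freq: float,
                 w_gc: float = 1.0, w_aaaa: float = -1.0,
                 intercept: float = 0.0) -> float:
    """Two-feature linear score; the weights are placeholders, not fitted."""
    return intercept + w_gc * gc + w_aaaa * aaaa_freq
```

In practice the weights would be fitted to measured occupancy data; the placeholder signs only reflect the qualitative statement that poly-A-like stretches are associated with lower occupancy.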

Figures

Figure 1
Model feature weights selected by Lasso for eleven different training data sets. Chromosomes from which 1,000,000 random nucleotide positions were taken are given at bottom. Correlation coefficients are given in the middle, using a test set that does not include any of the random nucleotide positions used in the training set. The top panel is a zoom-in of the 16 features that were weighted in more than half of the eleven runs. Weights do not directly reflect importance or proportion of the data that a feature explains, because features are unit-normalized prior to analysis, and can have dissimilar distributions.
Figure 2
Performance of a 14 feature linear model of intrinsic nucleosome sequence preference. (A) Scatter plot vs. test set (yeast chromosomes 10-16), shown as a heat-map. Axis values are log2 normalized nucleosome occupancy (see Methods). (B) Model scores (probabilistic[8] and linear) and in vivo and in vitro nucleosome occupancy[8] within a 20 kb region of chromosome 14. (C) and (D) Correlation of the 14 feature model score with measured in vivo nucleosome occupancy in yeast (C) and with the Kaplan model across chr10-16 (test set) (D).
Figure 3
Correlation of each of the 14 features with nucleosome occupancy. (A) Graphic illustration of the correlation of each of the 14 sequence features with nucleosome occupancy in vitro and in vivo across the yeast genome (data from Kaplan et al.[8]). (B-D) Scatter plots showing performance of linear models on test set using only G+C content (B), AAAA occurrence (C), or both (D) as inputs. (E) Kaplan model score vs. proportion of G+C over all 150 bp tiling windows in the yeast genome.
Figure 4
Correlation of DNA structural parameters, calculated as the average over a 150-base window, with nucleosome occupancy in vitro and in vivo. Calculations were made using dinucleotide and other coefficients obtained from the PROPERTY database http://srs6.bionet.nsc.ru/srs6bin/cgi-bin/wgetz?-page+LibInfo+-newId+-lib+PROPERTY. Nucleosome occupancy data are from Kaplan et al.[8] and Lee et al.[2]. Pearson correlation is shown.
Figure 5
Relative nucleosome preference of different subsets of synthetic 150-mers. (A) and (B) Dependence of relative nucleosome preference (as log2(occupancy ratio)) on G+C content (A) and maximum poly-A length (B). Oligonucleotides categorized as "Neutral %G+C" in (B) are those with 45-55% G+C. Graph below shows the frequency of the selected attribute in the oligonucleotides analyzed, and also the human and yeast genomes. (C) Dependence of relative occupancy on poly-A content and CpG status. Poly-A containing oligonucleotides are defined as containing at least four consecutive adenine bases. CpG oligonucleotides are defined as having a G+C content ≥50%, with an observed/expected CpG ratio ≥0.6 (Obs/Exp CpG = Number of CpG * N/(num G * num C), where N = length of sequence[37]). The sequencing readout (rather than array readout) data from the Kaplan paper was used in this analysis. On all box plots, whiskers indicate 10th and 90th percentiles.
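The oligonucleotide categories in the Figure 5 caption follow directly from the stated definitions, and can be sketched as below. The function names are hypothetical, and returning 0.0 when a sequence has no G or no C is an assumption made here to avoid division by zero; the caption does not say how such sequences were handled.

```python
def max_poly_a_length(seq: str) -> int:
    """Length of the longest run of consecutive A bases in seq."""
    longest = current = 0
    for base in seq.upper():
        current = current + 1 if base == "A" else 0
        longest = max(longest, current)
    return longest


def has_poly_a(seq: str, min_run: int = 4) -> bool:
    """True if seq contains at least min_run consecutive adenines."""
    return max_poly_a_length(seq) >= min_run


def cpg_obs_exp(seq: str) -> float:
    """Obs/Exp CpG = (number of CpG * N) / (num G * num C), N = len(seq)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g == 0 or c == 0:
        return 0.0  # assumed convention: ratio undefined, treat as zero
    return seq.count("CG") * len(seq) / (g * c)


def is_cpg_oligo(seq: str, min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """True if G+C content >= min_gc and Obs/Exp CpG >= min_ratio."""
    seq = seq.upper()
    if not seq:
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc >= min_gc and cpg_obs_exp(seq) >= min_ratio
```

For example, `cpg_obs_exp("CGCG")` evaluates to 2 * 4 / (2 * 2) = 2.0, so that sequence passes both CpG thresholds.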

References

    1. Luger K, Mader AW, Richmond RK, Sargent DF, Richmond TJ. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature. 1997;389(6648):251–260. doi: 10.1038/38444.
    2. Lee W, Tillo D, Bray N, Morse RH, Davis RW, Hughes TR, Nislow C. A high-resolution atlas of nucleosome occupancy in yeast. Nat Genet. 2007;39(10):1235–1244. doi: 10.1038/ng2117.
    3. Groth A, Rocha W, Verreault A, Almouzni G. Chromatin challenges during DNA replication and repair. Cell. 2007;128(4):721–733. doi: 10.1016/j.cell.2007.01.030.
    4. Li B, Carey M, Workman JL. The role of chromatin during transcription. Cell. 2007;128(4):707–719. doi: 10.1016/j.cell.2007.01.015.
    5. Yuan GC, Liu YJ, Dion MF, Slack MD, Wu LF, Altschuler SJ, Rando OJ. Genome-scale identification of nucleosome positions in S. cerevisiae. Science. 2005;309(5734):626–630. doi: 10.1126/science.1112178.
