Bioinformatics. 2007 Jul 1;23(13):i195-204.
doi: 10.1093/bioinformatics/btm200.

Optimized design and assessment of whole genome tiling arrays

Stefan Gräf et al. Bioinformatics.

Abstract

Motivation: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling array data is complicated by the presence of non-unique sequences on the array, which increases the overall noise in the data and may lead to false positive results due to cross-hybridization. The ability to create custom microarrays using maskless array synthesis has led us to consider ways to optimize array design characteristics for improving data quality and analysis. We have identified a number of design parameters to be optimized including uniqueness of the probe sequences within the whole genome, melting temperature and self-hybridization potential.

Results: We introduce the uniqueness score, U, a novel quality measure for oligonucleotide probes and present a method to quickly compute it. We show that U is equivalent to the number of shortest unique substrings in the probe and describe an efficient greedy algorithm to design mammalian whole genome tiling arrays using probes that maximize U. Using the mouse genome, we demonstrate how several optimizations influence the tiling array design characteristics. With a sensible set of parameters, our designs cover 78% of the mouse genome including many regions previously considered 'untilable' due to the presence of repetitive sequence. Finally, we compare our whole genome tiling array designs with commercially available designs.
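As a rough illustration of the greedy idea described above, the sketch below picks, for each fixed-size window, the probe start position with the highest uniqueness score and skips windows whose best score does not exceed a threshold. It is a minimal sketch under stated assumptions, not the authors' published algorithm: the function name `select_probes`, the precomputed `scores` list, the non-overlapping windowing, and the first-wins tie-breaking are all hypothetical choices made for illustration.

```python
def select_probes(scores, window, min_score=0):
    """Greedy per-window probe choice (illustrative sketch only).

    scores[i] is assumed to hold the uniqueness score U of the probe
    that starts at position i. For every non-overlapping window of
    `window` start positions, keep the start with the largest U,
    provided that U is strictly greater than `min_score`.
    Returns a list of (start_position, U) tuples.
    """
    chosen = []
    for w_start in range(0, len(scores), window):
        w = scores[w_start:w_start + window]
        # max() returns the first maximal element, so ties go to the
        # leftmost start position in the window.
        best_offset = max(range(len(w)), key=w.__getitem__)
        if w[best_offset] > min_score:
            chosen.append((w_start + best_offset, w[best_offset]))
    return chosen


if __name__ == "__main__":
    toy_scores = [7, 9, 7, 4, 1, 0, 0, 0, 3, 3]  # made-up values
    print(select_probes(toy_scores, window=4))
    print(select_probes(toy_scores, window=4, min_score=2))
```

Raising `min_score` trades coverage for uniqueness: fewer windows receive a probe, but every selected probe scores above the threshold.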

Availability: Source code is available under an open source license from http://www.ebi.ac.uk/~graef/arraydesign/.

Conflict of interest statement

Conflict of Interest: none declared.

Figures

Fig. 1
Design strategy. (A) The genomic sequence is subdivided into unit-sized windows. Within each window, all minimum unique substrings with length ≤K are determined. These are the basis for the uniqueness scoring used to design optimized probes. (B) Uniqueness scoring function (exemplified by Mus musculus, chr17:3028401-3028500). The sequences shown represent all the minimum unique prefixes for a unit window. In each window of seed length h, the uniqueness score is calculated by counting the number of minimum unique substrings. For the windows shown, the uniqueness scores are 7, 9, 7, 4, 1, 0. The minimum unique substrings that add to the score are indicated by stars.
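The scoring idea in this caption can be sketched on a toy sequence. The code below is a naive illustration, not the fast method from the paper: it counts every substring of length up to `max_k` by brute force, considers a single strand only (no reverse complement), and treats a substring as a minimum unique substring when it occurs exactly once and neither its prefix nor its suffix shortened by one base is itself unique. All function names are hypothetical.

```python
from collections import Counter


def kmer_counts(genome, max_k):
    """Count every substring of length 1..max_k (brute force, toy scale)."""
    counts = Counter()
    for k in range(1, max_k + 1):
        for i in range(len(genome) - k + 1):
            counts[genome[i:i + k]] += 1
    return counts


def minimal_unique_substrings(genome, max_k):
    """Return (start, length) of each minimum unique substring, length <= max_k.

    A substring qualifies if it occurs exactly once in `genome` and
    neither of its two (length - 1) sub-substrings is unique.
    """
    counts = kmer_counts(genome, max_k)

    def is_unique(s):
        return s != "" and counts[s] == 1

    found = []
    for k in range(1, max_k + 1):
        for i in range(len(genome) - k + 1):
            s = genome[i:i + k]
            if is_unique(s) and not is_unique(s[:-1]) and not is_unique(s[1:]):
                found.append((i, k))
    return found


def uniqueness_score(found, start, probe_len):
    """Count minimum unique substrings lying entirely inside the probe."""
    end = start + probe_len
    return sum(1 for i, k in found if i >= start and i + k <= end)


if __name__ == "__main__":
    toy_genome = "ACGTACGA"  # made-up sequence for illustration
    mus = minimal_unique_substrings(toy_genome, max_k=4)
    print(mus)
    print(uniqueness_score(mus, start=0, probe_len=4))
```

In the toy sequence, "T" and "GA" are the only minimum unique substrings, so a probe scores one point for each of them that it fully contains.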
Fig. 2
The distribution of U with respect to the number of genome-wide hybridization-quality BLAT alignments for a large set of 50mer probes. In the box-and-whiskers plot, the bold line marks the median value of U and the outline of the box marks the first and third quartiles of the U distribution. Whiskers represent the largest and smallest values of U within 1.5 × IQR (interquartile range).
Fig. 3
Probe selection algorithm.
Fig. 4
Density plots of optimized characteristics for our high-coverage and high-uniqueness tiling array designs and comparison to commercial whole-genome tiling arrays. Panels (A) and (B) show the full design uniqueness score per base, the Tm distribution and the uniqueness score per base for the disjoint subsets represented by the non-repetitive and repetitive portions of the mouse genome. Panels (C) and (D) show the full design uniqueness score per base and the Tm distribution.
(A) Our high-coverage U>0 design, containing 19 343 498 probes in the entire design, of which 10 565 728 probes are in regions not identified as repetitive and 8 777 770 probes are in repetitive regions.
(B) Our high-uniqueness U>15 design, containing 15 658 735 probes in the entire design, of which 10 213 493 probes are in regions not identified as repetitive and 5 445 242 probes are in repetitive regions.
(C) The NimbleGen whole-genome design (50mers in 100 bp windows), containing 14 579 139 probes designed to the non-repetitive portion of the genome.
(D) The Affymetrix whole-genome design (25mers in 35 bp windows), containing 38 346 501 probes designed to the non-repetitive portion of the genome.
See Table 1 for additional design information.

Cited by

  • Ensembl 2009.
    Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Graf S, Haider S, Hammond M, Holland R, Howe K, Jenkinson A, Johnson N, Kahari A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Rios D, Schuster M, Slater G, Smedley D, Spooner W, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wilder S, Zadissa A, Birney E, Cunningham F, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Kasprzyk A, Proctor G, Smith J, Searle S, Flicek P. Nucleic Acids Res. 2009 Jan;37(Database issue):D690-7. doi: 10.1093/nar/gkn828. Epub 2008 Nov 25. PMID: 19033362 Free PMC article.
  • Sequence characteristics define trade-offs between on-target and genome-wide off-target hybridization of oligoprobes.
    Matveeva OV, Ogurtsov AY, Nazipova NN, Shabalina SA. PLoS One. 2018 Jun 21;13(6):e0199162. doi: 10.1371/journal.pone.0199162. eCollection 2018. PMID: 29928000 Free PMC article.
  • Molecular maps of the reorganization of genome-nuclear lamina interactions during differentiation.
    Peric-Hupkes D, Meuleman W, Pagie L, Bruggeman SW, Solovei I, Brugman W, Gräf S, Flicek P, Kerkhoven RM, van Lohuizen M, Reinders M, Wessels L, van Steensel B. Mol Cell. 2010 May 28;38(4):603-13. doi: 10.1016/j.molcel.2010.03.016. PMID: 20513434 Free PMC article.
  • Early life adversity alters normal sex-dependent developmental dynamics of DNA methylation.
    Massart R, Nemoda Z, Suderman MJ, Sutti S, Ruggiero AM, Dettmer AM, Suomi SJ, Szyf M. Dev Psychopathol. 2016 Nov;28(4pt2):1259-1272. doi: 10.1017/S0954579416000833. Epub 2016 Sep 30. PMID: 27687908 Free PMC article.
  • Custom design and analysis of high-density oligonucleotide bacterial tiling microarrays.
    Thomassen GO, Rowe AD, Lagesen K, Lindvall JM, Rognes T. PLoS One. 2009 Jun 17;4(6):e5943. doi: 10.1371/journal.pone.0005943. PMID: 19536279 Free PMC article.
