Comparative Study
Nat Biotechnol. 2006 Sep;24(9):1151-61.
doi: 10.1038/nbt1239.

The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements

MAQC Consortium; Leming Shi, Laura H Reid, Wendell D Jones, Richard Shippy, Janet A Warrington, Shawn C Baker, Patrick J Collins, Francoise de Longueville, Ernest S Kawasaki, Kathleen Y Lee, Yuling Luo, Yongming Andrew Sun, James C Willey, Robert A Setterquist, Gavin M Fischer, Weida Tong, Yvonne P Dragan, David J Dix, Felix W Frueh, Frederico M Goodsaid, Damir Herman, Roderick V Jensen, Charles D Johnson, Edward K Lobenhofer, Raj K Puri, Uwe Scherf, Jean Thierry-Mieg, Charles Wang, Mike Wilson, Paul K Wolber, Lu Zhang, Shashi Amur, Wenjun Bao, Catalin C Barbacioru, Anne Bergstrom Lucas, Vincent Bertholet, Cecilie Boysen, Bud Bromley, Donna Brown, Alan Brunner, Roger Canales, Xiaoxi Megan Cao, Thomas A Cebula, James J Chen, Jing Cheng, Tzu-Ming Chu, Eugene Chudin, John Corson, J Christopher Corton, Lisa J Croner, Christopher Davies, Timothy S Davison, Glenda Delenstarr, Xutao Deng, David Dorris, Aron C Eklund, Xiao-hui Fan, Hong Fang, Stephanie Fulmer-Smentek, James C Fuscoe, Kathryn Gallagher, Weigong Ge, Lei Guo, Xu Guo, Janet Hager, Paul K Haje, Jing Han, Tao Han, Heather C Harbottle, Stephen C Harris, Eli Hatchwell, Craig A Hauser, Susan Hester, Huixiao Hong, Patrick Hurban, Scott A Jackson, Hanlee Ji, Charles R Knight, Winston P Kuo, J Eugene LeClerc, Shawn Levy, Quan-Zhen Li, Chunmei Liu, Ying Liu, Michael J Lombardi, Yunqing Ma, Scott R Magnuson, Botoul Maqsodi, Tim McDaniel, Nan Mei, Ola Myklebost, Baitang Ning, Natalia Novoradovskaya, Michael S Orr, Terry W Osborn, Adam Papallo, Tucker A Patterson, Roger G Perkins, Elizabeth H Peters, Ron Peterson, Kenneth L Philips, P Scott Pine, Lajos Pusztai, Feng Qian, Hongzu Ren, Mitch Rosen, Barry A Rosenzweig, Raymond R Samaha, Mark Schena, Gary P Schroth, Svetlana Shchegrova, Dave D Smith, Frank Staedtler, Zhenqiang Su, Hongmei Sun, Zoltan Szallasi, Zivana Tezak, Danielle Thierry-Mieg, Karol L Thompson, Irina Tikhonova, Yaron Turpaz, Beena Vallanat, Christophe Van, Stephen J Walker, Sue Jane Wang, Yonghong Wang, Russ Wolfinger, Alex Wong, Jie Wu, Chunlin Xiao, Qian Xie, Jun Xu, Wen Yang, Liang Zhang, Sheng Zhong, Yaping Zong, William Slikker Jr
MAQC Consortium et al. Nat Biotechnol. 2006 Sep.

Abstract

Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.

Figures

Figure 1
Repeatability of expression signal within test sites. For the one-color platforms, the CV of the expression signal values between site replicates of the same sample type was calculated for all generally detected genes. The distributions of these replicate CVs are presented in a series of twelve box and whiskers plots for each microarray platform: one for each of the four sample types at the three test sites. The plots are highlighted to distinguish the sample replicates: sample A (white), sample B (light blue), sample C (light purple) and sample D (dark blue). The twelve plots showing results from the platforms with three test sites are presented in the following order from left to right: A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2 and D3. For the two-color NCI platform, the CV of the expression Cy3/Cy5 ratios between site replicates of the same sample type was similarly calculated. The distributions of these replicate CVs are presented in a series of eight box and whiskers plots from the two NCI test sites in the following order from left to right: A1, A2, B1, B2, C1, C2, D1, and D2. The median (gap), interquartile range as well as the 10th and 90th percentile values are indicated in each plot. Only genes from the 12,091 common set that were detected in at least three of the replicates were included in the box plots and CV calculations. This number varies by platform/sample/test site and is noted as the line plot with the secondary axis and as Table S6 in Supplementary Data online. The platforms and sample types are labeled according to the nomenclature presented in Table 1.
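The replicate CV described in this caption can be sketched as follows. This is a minimal illustration, not the consortium's code: the function name `replicate_cv`, the use of linear-scale signals, the sample standard deviation, and the choice to compute the CV over all replicates of a qualifying gene are assumptions; only the "detected in at least three of the replicates" filter comes from the caption.

```python
from statistics import mean, stdev

def replicate_cv(signals, detected, min_detected=3):
    """Percent CV of one gene's signal across site replicates.

    signals  : expression values, one per replicate (assumed linear scale)
    detected : booleans, True where the gene was called detected
    Returns None when the gene fails the detection filter or the mean is zero.
    """
    if sum(detected) < min_detected:
        return None  # caption: gene must be detected in at least three replicates
    m = mean(signals)
    if m == 0:
        return None  # CV is undefined for a zero mean
    return 100.0 * stdev(signals) / m  # sample standard deviation (assumption)

# Hypothetical five replicates of one gene at one test site
print(replicate_cv([100, 110, 90, 105, 95], [True] * 5))
```

The per-gene values returned here are what a box plot for one sample type at one site would summarize.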
Figure 2
Signal variation within and between test sites. For each of the four sample types, the replicate CV of signal within a test site (blue bar) and the total CV of signal across and within sites (red bar) are presented. As in Figure 1, genes detected in at least three of the replicates of a sample type at a single test site are included in the replicate CV calculation. Genes present in the intersection of these gene lists are included in the total CV calculation. (These gene lists are therefore slightly different from those in Figure 1.) The number of such genes within each platform and sample type is noted by blue dots connected by lines and is read on the secondary axis. It is also reported as Table S6 in Supplementary Data online. Intrasite normalization was performed according to default settings for each manufacturer, and intersite normalization was performed by scaling between sites (see main text). The NCI platform is omitted because data from only two test sites were available in the main study, so intersite reproducibility measures may not be representative. The platforms and sample types are labeled according to the nomenclature presented in Table 1.
Figure 3
Concordance of detection calls within and between test sites. For the 12,091 common genes, detection calls within each platform were categorized as either ‘detected’ or ‘not detected.’ For each sample type within each platform, the percentage of genes with calls that were perfectly concordant as ‘detected’ within the replicates for a given site is plotted as blue dots, and the corresponding percentage of genes with calls perfectly concordant as ‘detected’ across all sites is plotted as the blue bars. The total percentage of genes with perfectly concordant calls (detected and not detected) within a site is plotted as the yellow dots, and the corresponding percentage of genes with calls perfectly concordant across all sites is plotted as the top of the yellow bars. The bars are split between perfectly detected genes (blue portion) and perfectly not detected genes (yellow portion) across all test sites. It is not expected that detected genes are concordant across sample types. The number of perfectly detected genes for each test site is provided as Table S6 in Supplementary Data online. As described in the main text, the stringency with which individual platforms determine that the data for a gene are sufficiently reliable to be called detected has different manufacturer defaults, leading to altered concordance percentages. Changes in the settings for sensitivity/specificity may shift the proportion of the bar assigned to each detection category. Because reliability depends on platform-specific details, detected calls do not correspond directly to relative abundance and may vary between platforms. Note: as some platforms have removed outlier hybridizations, the number of replicates within (n ≤ 5) and between sites (n ≤ 15) varies for determining concordance.
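A rough sketch of the two percentages plotted in this figure is given below. The function name `concordance_percentages` and the input layout (a dict mapping each gene to its list of boolean detection calls) are hypothetical; the caption does not say how the calls were stored or processed.

```python
def concordance_percentages(calls_by_gene):
    """Return (percent perfectly detected, percent perfectly concordant).

    calls_by_gene maps a gene ID to its detection calls (True = detected)
    across the replicates being compared, within one site or across sites.
    Genes with no calls are ignored.
    """
    calls = [c for c in calls_by_gene.values() if c]
    if not calls:
        return None
    n = len(calls)
    all_detected = sum(1 for c in calls if all(c))       # blue portion
    none_detected = sum(1 for c in calls if not any(c))  # yellow portion
    return (100.0 * all_detected / n,
            100.0 * (all_detected + none_detected) / n)

# Hypothetical calls for four genes over three replicates
example = {
    "g1": [True, True, True],
    "g2": [False, False, False],
    "g3": [True, False, True],
    "g4": [True, True, True],
}
print(concordance_percentages(example))
```

Passing the calls from one site's replicates gives the dot values; passing the pooled calls from all sites gives the bar values.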
Figure 4
Agreement of gene lists. This graph indicates the concordance of genes identified as differentially expressed for pairs of test sites, labeled as X and Y. A list of differentially expressed genes between sample type A replicates versus sample type B replicates was generated for each test site (using the 12,091 common genes with ≥ twofold change and P < 0.001 thresholds) and compared for commonality to other test sites. The size of these gene lists is reported as Table S7 in Supplementary Data online. No filtering related to the qualitative detection call was performed. The color of the square in the matrix reflects the percent overlap of genes on the list for the test site Y (listed in row) that are also present on the list for the test site X (listed in column). A light-colored square indicates a high percent overlap between the gene lists at both test sites. A dark-colored square indicates a low percent overlap, suggesting that most genes identified in site Y were not identified in site X. Numerical values for the percent overlap are presented as Table S9 in Supplementary Data online. Note: the graph is asymmetric and not complementary. Only the six high-density microarray platforms are presented. As described in the text, data from some platforms were omitted from these calculations because of quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The _1, _2 and _3 suffixes refer to test site location.
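The asymmetric percent overlap in this matrix can be illustrated with a short sketch. The function names and the input layout are hypothetical, and the P values are taken as given because the caption does not name the statistical test; the thresholds (at least twofold change, P < 0.001) and the direction of the overlap (genes on site Y's list that also appear on site X's list) follow the caption.

```python
def differentially_expressed(genes, min_abs_log2_fc=1.0, max_p=0.001):
    """Select genes with at least a twofold change and P below the threshold.

    genes maps a gene ID to (log2 fold change of A versus B, P value).
    """
    return {g for g, (lfc, p) in genes.items()
            if abs(lfc) >= min_abs_log2_fc and p < max_p}

def percent_overlap(list_y, list_x):
    """Percent of genes on site Y's list that are also on site X's list."""
    y, x = set(list_y), set(list_x)
    if not y:
        return None
    return 100.0 * len(y & x) / len(y)

# Hypothetical results for two test sites
site_x = differentially_expressed({"g1": (2.0, 1e-5), "g2": (-1.5, 1e-4),
                                   "g3": (0.4, 1e-6)})
site_y = differentially_expressed({"g1": (1.8, 1e-5), "g2": (-1.2, 1e-4),
                                   "g3": (1.1, 1e-4), "g4": (3.0, 1e-6)})
print(percent_overlap(site_y, site_x), percent_overlap(site_x, site_y))
```

Because the denominator is the size of the row site's list, swapping the two arguments generally gives a different value, which is why the matrix is not symmetric.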
Figure 5
Agreement of log ratios across platforms and test sites. (a) Log ratio compression/expansion. This graph indicates the percent difference from equivalency between platform/sites (corresponding to a slope value of 1 for the best fitted line using orthogonal regression) of the log ratio differential expression using A and B replicates. A dark spot implies equivalency (slope = 1 → percent difference = 0). A positive percent difference in slope from the ideal line (aqua) indicates compression of log signal for test site Y relative to test site X. A negative percent difference from the ideal line (magenta) indicates expansion. Read as “What is the difference from equivalence in slope (m = 1) for the test site Y versus test site X?” Only genes detected by both test sites in at least three replicates of sample type A and three replicates of sample type B are included in the calculation, and the number for each pair is reported as Table S8 in Supplementary Data online. Numerical values for the percent difference are presented as Table S10 in Supplementary Data online. Note: the graph is asymmetric, but approximately complementary. As described in the text, data from some platforms were omitted from these calculations due to quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The _1, _2 and _3 suffixes refer to test site location. (b) Rank correlation of log ratios. This graph indicates the correlation of the log ratio differential expression values (using A versus B replicates) when we examine their rank. Large positive log ratio values would be ranked high and large negative log ratio values would be ranked low. Read as “What is the correlation of the rank log ratio values between the test site Y and the test site X?” Only genes generally detected in both sample types A and B and by both test sites are included in the calculation, and the number for each pair is reported as Table S8 in Supplementary Data online. Numerical values for the rank correlation are presented as Table S11 in Supplementary Data online. Note: the graph is symmetric. As described in the text, data from some platforms were omitted from these calculations due to quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The _1, _2 and _3 suffixes refer to test site location.
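The two statistics behind panels (a) and (b) can be sketched in plain Python. This is an illustration under assumptions: the closed-form orthogonal (total least squares) slope assumes equal error variance in both sites, the rank correlation is taken to be Spearman's coefficient with average ranks for ties, and all function names are hypothetical. The caption does not state how the slope is converted to a signed percent difference, so that step is left out.

```python
from math import sqrt
from statistics import mean

def _sums(x, y):
    """Centered sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def orthogonal_slope(x, y):
    """Slope of the orthogonal regression line of y (site Y) on x (site X)."""
    sxx, syy, sxy = _sums(x, y)
    if sxy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    return (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_correlation(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    sxx, syy, sxy = _sums(average_ranks(x), average_ranks(y))
    return sxy / sqrt(sxx * syy)

# Hypothetical log ratios for five genes; site Y is compressed relative to site X
log_ratio_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
log_ratio_y = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(orthogonal_slope(log_ratio_x, log_ratio_y))
print(rank_correlation(log_ratio_x, log_ratio_y))
```

In this toy case the slope is well below 1 while the rank correlation is perfect, which shows why the two panels carry different information: compression changes the magnitude of log ratios without changing their ordering.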
Figure 6
Correlation between microarray and TaqMan data. The scatter plots compare the log ratio differential expression values (using A versus B replicates) from each microarray platform relative to values obtained by TaqMan assays. Each point represents a gene that was measured on both the microarray and TaqMan assays. The spot coloring indicates whether the data were generated in test site 1 (black), test site 2 (blue) or test site 3 (red) for the microarray platform. Only genes that were generally detected in sample type A replicates and sample type B replicates were used in the comparisons. The exact number of probes analyzed for each test site and its correlation to TaqMan assays are listed in the bottom right corner of each plot. As described in the text, data from some platforms were omitted from these calculations because of quality issues. The platforms and sample types are labeled according to the nomenclature presented in Table 1. The line shown is the ideal 45° line.
