Comparative Study
Plant Physiol. 2011 Jul;156(3):1300-15.
doi: 10.1104/pp.110.167809. Epub 2011 Apr 29.

DNA free energy-based promoter prediction and comparative analysis of Arabidopsis and rice genomes

Czuee Morey et al. Plant Physiol. 2011 Jul.

Abstract

The cis-regulatory regions on DNA serve as binding sites for proteins such as transcription factors and RNA polymerase. The combinatorial interaction of these proteins plays a crucial role in transcription initiation, which is an important point of control in the regulation of gene expression. We present here an analysis of the performance of an in silico method for predicting cis-regulatory regions in the plant genomes of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) on the basis of free energy of DNA melting. For protein-coding genes, we achieve recall and precision of 96% and 42% for Arabidopsis and 97% and 31% for rice, respectively. For noncoding RNA genes, the program gives recall and precision of 94% and 75% for Arabidopsis and 95% and 90% for rice, respectively. Moreover, 96% of the false-positive predictions were located in noncoding regions of primary transcripts, out of which 20% were found in the first intron alone, indicating possible regulatory roles. The predictions for orthologous genes from the two genomes showed a good correlation with respect to prediction scores and promoter organization. Comparison of our results with an existing program for promoter prediction in plant genomes indicates that our method shows improved prediction capability.
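To make the reported recall and precision figures concrete, the sketch below shows one way such values can be computed from prediction positions given relative to annotated TSSs. It assumes, following the Figure 2 caption, that a prediction for a protein-coding gene counts as a true positive when it lies within −500 to +100 bp of the TSS. The function name `evaluate`, the per-gene definition of recall, the per-prediction definition of precision, and the toy gene names and offsets are illustrative assumptions, not the authors' implementation or data.

```python
from typing import Dict, List, Tuple

# Assumed true-positive window relative to the TSS (position 0), in bp.
TP_WINDOW: Tuple[int, int] = (-500, 100)


def evaluate(predictions: Dict[str, List[int]],
             window: Tuple[int, int] = TP_WINDOW) -> Tuple[float, float]:
    """Return (recall, precision) for predictions given as offsets from each gene's TSS.

    Assumed definitions (not confirmed by the abstract):
      recall    = genes with at least one in-window prediction / all genes
      precision = in-window predictions / all predictions
    """
    lo, hi = window
    genes_hit = 0
    tp_pred = 0
    total_pred = 0
    for offsets in predictions.values():
        in_window = [d for d in offsets if lo <= d <= hi]
        tp_pred += len(in_window)
        total_pred += len(offsets)
        if in_window:
            genes_hit += 1
    recall = genes_hit / len(predictions) if predictions else 0.0
    precision = tp_pred / total_pred if total_pred else 0.0
    return recall, precision


# Toy data: hypothetical gene names and offsets, not taken from the paper.
toy = {
    "geneA": [-120, 850],       # one in-window prediction, one outside
    "geneB": [40, 300, 1200],   # one in-window prediction, two outside
    "geneC": [700],             # no in-window prediction
    "geneD": [-480],            # one in-window prediction
}
recall, precision = evaluate(toy)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case three of the four genes have an in-window prediction while most predictions fall outside the window, which mirrors the high-recall, lower-precision pattern reported for protein-coding genes.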

Figures

Figure 1.
A and B, AFE profiles in the vicinity of the TSS for all five chromosomes of Arabidopsis (A) and six representative (even numbered) chromosomes of rice (B). The AFE values for upstream, downstream, and full-length shuffled sequences are shown as dashed lines. C, Comparison of free energy profiles (shown in red) and percentage AT occurrence (shown in green) over the region −500 to +500 bp with respect to (w.r.t.) TSS for chromosome 1 of Arabidopsis and rice.
Figure 2.
Percentage frequency distribution plots showing the distance of promoter predictions from TSS in 50-nucleotide bins for protein-coding genes (A) and ncRNA genes (B). For protein-coding genes, the predictions within −500 to +100 bp with respect to (w.r.t.) TSS, and for ncRNA genes, predictions within −1,000 to 0 bp with respect to TSS, are considered where position 0 corresponds to the TSS. [See online article for color version of this figure.]
Figure 3.
AFE plots for sequences from each frequency class from Figure 2 for Arabidopsis (A) and rice (B). The predictions in each frequency class correspond to peaks in the AFE profiles at a particular distance from the TSS. The plots depict the AFE for sequences with the closest prediction present at a given distance (−500 to +100 bp with respect to the TSS). The color code used to depict the AFE profile, for sequences with predictions in each 50-nucleotide bin, is indicated in the box at right.
Figure 4.
FP prediction distribution. For each score category, the frequency distribution of false-positive predictions (FPpred.) across the various regions of the primary transcript is shown as a percentage of the total FPpred. in that category for Arabidopsis (A) and rice (B) genomes. The majority of predictions in each category lie in the intronic region.
Figure 5.
Genes without a prediction in the TP region (FNgenes). AFE profile comparison is shown between FNgenes and all genes of chromosome 1 with respect to TSS for Arabidopsis (A) and rice (B). The number of genes considered in each case is indicated in parentheses. [See online article for color version of this figure.]
Figure 6.
Classification of gene families, metabolic pathway genes, and genes from specific GO terms for Arabidopsis (A) and rice (B) according to the TP with the highest prediction score present within −500 to +100 bp of the TSS. The distribution of the score categories is presented as a percentage of the TPgenes present in that category. The number adjacent to each bar indicates the number of TPgenes. [See online article for color version of this figure.]
Figure 7.
Correlation between prediction scores for orthologous genes. The highest prediction scores corresponding to 11,941 TPgenes in Arabidopsis have been plotted against the scores for their 10,275 TPgene orthologs in rice. Since there is more than one Arabidopsis gene ortholog for some rice genes, 12,359 pairs of orthologous genes were formed. The 9,976 orthologous gene pairs with scores in the same class or differing by one level in the two genomes (crosses) give a Pearson correlation coefficient of 0.51 (dotted-dashed best fit line), while a value of 0.23 is obtained for all gene pairs (crosses and dots; solid best fit line). [See online article for color version of this figure.]
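The effect described in this caption, a higher Pearson coefficient once pairs with widely differing score classes are set aside, can be illustrated with a minimal sketch. The `pearson` helper, the integer score classes, and the toy pairs below are hypothetical and are not the paper's scores; only the filtering rule (same class or differing by one level) comes from the caption.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(vx * vy)


# Hypothetical (Arabidopsis class, rice class) pairs for orthologous genes.
pairs: List[Tuple[int, int]] = [
    (1, 1), (2, 3), (3, 3), (4, 5), (5, 4),  # same class or one level apart
    (1, 5), (5, 1),                          # strongly discordant pairs
]

# Keep only pairs whose classes are equal or differ by one level.
close = [(a, r) for a, r in pairs if abs(a - r) <= 1]

r_all = pearson([a for a, _ in pairs], [r for _, r in pairs])
r_close = pearson([a for a, _ in close], [r for _, r in close])
print(f"all pairs: r={r_all:.2f}; concordant pairs: r={r_close:.2f}")
```

Because the filter removes the discordant pairs by construction, the coefficient on the filtered subset is expected to be higher than on the full set.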
Figure 8.
Promoter predictions for six orthologous genes are shown for Arabidopsis (blue) and rice (red). The TSSs of the orthologs are aligned and correspond to nucleotide position 0 on the x axis. The orthologous genes are shown schematically at the bottom. Gray bars represent UTRs, thin black bars correspond to introns, and brown bars represent exons. The y axis indicates the Dmax score of the prediction. Only predictions within −500 to +100 bp of the TSS are true positives in each case. The six representative genes shown are Asp aminotransferase (A), copper/zinc (Cu/Zn) superoxide dismutase (B), Dof gene family (C), P-type ATPase (D), FAD2 (E), and PRF1 (F). The first intron of the Arabidopsis genes in E and F has been shown to have regulatory functions. An ncRNA gene coincides with the first intron of the rice FAD2 gene, as shown in E.
