2008 Oct 16;9:440.
doi: 10.1186/1471-2105-9-440.

PlasmoDraft: a database of Plasmodium falciparum gene function predictions based on postgenomic data

Laurent Bréhélin et al. BMC Bioinformatics.

Abstract

Background: Of the 5,484 predicted proteins of Plasmodium falciparum, the main causative agent of malaria, about 60% do not have sufficient sequence similarity with proteins in other organisms to warrant provision of functional assignments. Non-homology methods are thus needed to obtain functional clues for these uncharacterized genes.

Results: We present PlasmoDraft (http://atgc.lirmm.fr/PlasmoDraft/), a database of Gene Ontology (GO) annotation predictions for P. falciparum genes based on postgenomic data. PlasmoDraft predictions are produced by a guilt-by-association method named Gonna, which involves (1) a predictor that proposes GO annotations for a gene based on the similarity of its profile (measured with transcriptome, proteome or interactome data) to those of genes already annotated by GeneDB; (2) a procedure that estimates the confidence of the predictions achieved with each data source; and (3) a procedure that combines all data sources to provide a global summary and confidence estimate of the predictions. Gonna has been applied to all P. falciparum genes using most publicly available transcriptome, proteome and interactome data sources, and it provides predictions for numerous genes without any annotations. For example, 2,434 genes without any annotations in the Biological Process ontology are associated with specific GO terms (e.g. Rosetting, Antigenic variation), and 841 of these have confidence values above 50%. In the Cellular Component and Molecular Function ontologies, 1,905 and 1,540 uncharacterized genes, respectively, are associated with specific GO terms (740 and 329 with confidence values above 50%).
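Step (1) above, the guilt-by-association predictor, can be illustrated with a minimal sketch. This is not Gonna itself: the abstract does not specify the similarity measure, the number of neighbors or the voting rule, so Pearson correlation, k = 3 and a simple support fraction are assumptions made here for illustration, and all function names, gene names and GO identifiers are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length profiles (assumed similarity measure)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no similarity information
    return cov / sqrt(var_a * var_b)


def predict_terms(query_profile, annotated, k=3, min_support=0.5):
    """Propose GO terms for a gene from its k most similar annotated genes.

    annotated maps a gene name to (profile, set of GO terms).
    Returns {term: fraction of the k neighbors carrying that term},
    keeping only terms whose fraction reaches min_support.
    """
    ranked = sorted(
        annotated.items(),
        key=lambda item: pearson(query_profile, item[1][0]),
        reverse=True,
    )
    neighbors = ranked[:k]
    votes = Counter(term for _, (_, terms) in neighbors for term in terms)
    return {
        term: count / len(neighbors)
        for term, count in votes.items()
        if count / len(neighbors) >= min_support
    }


# Toy data: hypothetical genes, profiles and GO identifiers.
annotated_genes = {
    "g1": ([1, 2, 3, 4], {"GO:A"}),
    "g2": ([2, 4, 6, 8.5], {"GO:A"}),
    "g3": ([4, 3, 2, 1], {"GO:B"}),
    "g4": ([1, 2, 3, 5], {"GO:A", "GO:C"}),
}
proposed = predict_terms([1, 2, 3, 4.2], annotated_genes, k=3)
```

In this toy example the three nearest annotated genes are g1, g2 and g4, so only the term shared by all of them is retained. Steps (2) and (3), the per-source confidence estimate and the combination across data sources, are not sketched because the abstract gives no detail on how they are computed.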

Conclusion: All predictions along with their confidence values have been compiled in PlasmoDraft, which thus provides an extensive database of GO annotation predictions that can be achieved with these data sources. The database can be accessed in different ways. A global view allows for a quick inspection of the GO terms that are predicted with high confidence, depending on the various data sources. A gene view and a GO term view allow for the search of potential GO terms attached to a given gene, and genes that potentially belong to a given GO term.

Figures

Figure 1
The neighbors view. Profiles of the K nearest characterized neighbors that support (white) or do not support (gray) the prediction of gene PFL0020w in term Adhesion to other organism during symbiotic interaction (GO:0051825) for the Leroch et al. (2003) data source [14]. For comparison, profiles of the K nearest uncharacterized neighbors (yellow) are also reported.
Figure 2
An extract of the Biological Process global view. This view presents a summary of all of the best GDBs and TDRs that are associated with each GO term and data source. Clicking on any term opens the corresponding GO term view.
Figure 3
An extract of the predictions achieved in term "adhesion to other organism during symbiotic interaction" (GO:0051825). The "no" entry indicates that the data source does not support the prediction, while "-" means that no data are available in the source for this gene. By clicking on a TDR, the K characterized nearest neighbors that support/do not support this prediction are shown (see Figure 1). Clicking on any gene opens the corresponding gene view.
Figure 4
An extract of predictions achieved for gene PFD1015c in the BP ontology. The "no" entry indicates that the data source does not support the prediction, while "-" means that no data are available in the source for this gene. By clicking on a TDR, the K characterized nearest neighbors that support/do not support this prediction are shown (Figure 1). Clicking on any term opens the corresponding GO term view.
Figure 5
Gonna performance on yeast. Gonna was applied to the transcriptomic data set published by Spellman et al. (1998) [34], using only annotations with experimental evidence codes as the prior knowledge database. TDRs of all BP GO terms for which Gonna proposes predictions are plotted as a function of the prior probability of the terms. Red and black points indicate significant and non-significant TDRs, respectively.
Figure 6
Gonna performance on the transcriptomic data set published in Bozdech et al. (2003) [27]. TDRs of all BP GO terms for which Gonna proposes predictions are plotted as a function of the prior probability of the terms. Red and black points indicate significant and non-significant TDRs, respectively.
Figure 7
Estimate of the amount of new information supplied in PlasmoDraft. Estimates for the BP (top), CC (middle) and MF (bottom) ontologies. Red, blue and green lines represent the results achieved with GDB thresholds of 75%, 50% and 25%, respectively. The x-axis gives the prior probabilities of the terms, while the y-axis (in log scale) reports the number of uncharacterized genes in the ontology that have been predicted with a GDB above the threshold, on a GO term with prior probability below x.
Figure 8
Estimate of the amount of new information supplied by the transcriptomic data source of Bozdech et al. (2003) [27] and the interactomic data source of LaCount et al. (2005) [33]. Red, blue and green lines represent the results achieved with TDR thresholds of 75%, 50% and 25%, respectively. The x-axis gives the prior probabilities of the terms, while the y-axis (in log scale) reports the number of uncharacterized genes in the ontology that have been predicted with a TDR above the threshold, on a GO term with prior probability below x.
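The quantity plotted on the y-axis of Figures 7 and 8 can be sketched as a simple count. The following is a minimal illustration under stated assumptions: each prediction is represented as a (gene, term prior probability, confidence score) tuple, scores and thresholds are expressed as fractions rather than percentages, and the function name and toy values are hypothetical. It does not compute the GDB or TDR scores themselves.

```python
def count_genes_below_prior(predictions, threshold, max_prior):
    """Count distinct genes with at least one prediction whose confidence
    score is above `threshold` on a GO term whose prior probability is
    below `max_prior`.

    predictions: iterable of (gene, term_prior, score) tuples (assumed layout).
    """
    genes = {
        gene
        for gene, term_prior, score in predictions
        if score > threshold and term_prior < max_prior
    }
    return len(genes)


# Toy predictions for hypothetical uncharacterized genes.
toy_predictions = [
    ("geneA", 0.01, 0.80),
    ("geneA", 0.20, 0.90),
    ("geneB", 0.05, 0.60),
    ("geneC", 0.02, 0.30),
]

# One curve per threshold, evaluated at a single x value (prior below 0.1).
curve_points = {
    t: count_genes_below_prior(toy_predictions, t, 0.1) for t in (0.75, 0.50, 0.25)
}
```

Evaluating the function over a grid of `max_prior` values for each threshold would give one curve per threshold, which is the shape the captions describe.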

References

    1. Sachs J, Malaney P. The economic and social burden of malaria. Nature. 2002;415:680–685. doi: 10.1038/415680a.
    2. Gardner M, Hall N, Fung E, White O, Berriman M, Hyman R, Carlton J, Pain A, Nelson K, Bowman S, Paulsen I, James K, Eisen J, Rutherford K, Salzberg S, Craig A, Kyes S, Chan M, Nene V, Shallom S, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather M, Vaidya A, Martin D, Fairlamb A, Fraunholz M, Roos D, Ralph S, McFadden G, Cummings L, Subramanian G, Mungall C, Venter J, Carucci D, Hoffman S, Newbold C, Davis R, Fraser C, Barrell B. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002;419:498–511. doi: 10.1038/nature01097.
    3. Pizzi E, Frontali C. Low-complexity regions in Plasmodium falciparum proteins. Genome Res. 2001;11:218–229. doi: 10.1101/gr.GR-1522R.
    4. Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local alignment search tool. J Mol Biol. 1990;215:403–410.
    5. Sonnhammer E, Eddy S, Birney E, Bateman A, Durbin R. Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res. 1998;26:320–322. doi: 10.1093/nar/26.1.320.
