Bioinformatics. 1998;14(1):48-54.
doi: 10.1093/bioinformatics/14.1.48.

Combining evidence using p-values: application to sequence homology searches

T L Bailey et al. Bioinformatics. 1998.

Abstract

Motivation: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches.

Results: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
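
To make the idea of a p-value for a product of p-values concrete, here is a minimal sketch. It assumes the n p-values are independent and uniformly distributed on (0, 1) under the null hypothesis, in which case the standard closed form is P(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!. The abstract does not give the details of QFAST, so this is a direct evaluation of that textbook formula and not the authors' algorithm; the function name `product_pvalue` is hypothetical.

```python
import math


def product_pvalue(pvalues):
    """Combined p-value for the product of independent p-values.

    Assumes each input is an independent Uniform(0, 1) p-value under the
    null hypothesis. Returns P(product of n such variables <= observed product).
    Illustrative sketch only; not the QFAST algorithm from the paper.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(p < 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in pvalues):
        return 0.0  # a zero p-value makes the product zero

    # Work with the log of the product to avoid underflow in the product itself.
    log_prod = sum(math.log(p) for p in pvalues)
    x = -log_prod  # x = -ln(product), non-negative

    # Accumulate sum_{i=0}^{n-1} x^i / i! one term at a time.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term

    return min(1.0, math.exp(log_prod) * total)


if __name__ == "__main__":
    # Two motif matches, each with p = 0.01: the product is 1e-4, but the
    # combined p-value is larger because the product of two p-values is
    # not itself uniformly distributed.
    print(product_pvalue([0.01, 0.01]))
```

With a single p-value the function returns that value unchanged, which is a quick sanity check on the formula. Sequences could then be ranked by this combined value instead of the raw product, so that scores built from different numbers of motifs remain comparable.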

Comment in

  • Concerning the accuracy of MAST E-values.
    Bailey TL, Gribskov M. Bioinformatics. 2000 May;16(5):488-9. doi: 10.1093/bioinformatics/16.5.488. PMID: 10871274. No abstract available.
