BMC Bioinformatics. 2005 Aug 31;6:217.
doi: 10.1186/1471-2105-6-217.

Automation of gene assignments to metabolic pathways using high-throughput expression data



Liviu Popescu et al. BMC Bioinformatics. 2005.

Abstract

Background: Accurate assignment of genes to pathways is essential for understanding the functional role of genes and for mapping the existing pathways in a given genome. Existing algorithms predict pathways by extrapolating experimental data from one organism to other organisms for which such data are not available. However, current systems assign every gene that belongs to a specific EC family to all pathways that contain the corresponding enzymatic reaction, thereby introducing ambiguity.
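The ambiguity of the EC-only rule can be illustrated with a minimal Python sketch. The gene, EC family, and pathway identifiers (`g1`, `EC_A`, `P1`, ...) are hypothetical toy values, not data from the paper; the sketch only shows that every member of an EC family lands in every pathway that contains that family's reaction.

```python
from collections import defaultdict


def naive_ec_assignment(gene_to_ecs, pathway_to_ecs):
    """EC-only rule: put every gene of an EC family into every pathway using that family."""
    # Invert gene -> EC families into EC family -> genes.
    ec_to_genes = defaultdict(set)
    for gene, ecs in gene_to_ecs.items():
        for ec in ecs:
            ec_to_genes[ec].add(gene)
    # Each pathway receives the union of all genes from all of its EC families.
    return {
        pathway: set().union(*(ec_to_genes.get(ec, set()) for ec in ecs))
        for pathway, ecs in pathway_to_ecs.items()
    }


def ambiguous_genes(assignments):
    """Return genes that the EC-only rule places in more than one pathway."""
    counts = defaultdict(int)
    for genes in assignments.values():
        for gene in genes:
            counts[gene] += 1
    return {gene for gene, n in counts.items() if n > 1}


# Hypothetical toy input: g1 and g2 are both members of family EC_A.
gene_to_ecs = {"g1": {"EC_A"}, "g2": {"EC_A"}, "g3": {"EC_B"}}
pathway_to_ecs = {"P1": {"EC_A", "EC_B"}, "P2": {"EC_A"}}

assignments = naive_ec_assignment(gene_to_ecs, pathway_to_ecs)
print(sorted(assignments["P1"]), sorted(assignments["P2"]))  # ['g1', 'g2', 'g3'] ['g1', 'g2']
print(sorted(ambiguous_genes(assignments)))  # ['g1', 'g2']
```

Both members of `EC_A` end up in both pathways, even if in reality each catalyzes the reaction in only one of them.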

Results: Here we describe an algorithm for assigning genes to cellular pathways that addresses this problem by selectively assigning specific genes to pathways. Our algorithm uses the set of experimentally elucidated metabolic pathways from MetaCyc, together with statistical models of enzyme families and expression data, to assign genes to enzyme families and pathways by optimizing correlated co-expression while minimizing conflicts due to shared assignments among pathways. The algorithm also identifies alternative ("backup") genes and addresses the multi-domain nature of proteins. We apply our model to assign genes to pathways in the yeast genome and compare the results with the experimentally determined assignments. Our assignments are consistent with the experimentally verified assignments and reflect characteristic properties of cellular pathways.
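The central idea of selecting one gene per reaction so that the chosen genes are co-expressed, while penalizing reuse of a gene that already serves the same reaction in another pathway, can be sketched as an exhaustive search over a small pathway. This is a simplified objective assumed for illustration, not the authors' formulation. It uses plain Pearson correlation as the co-expression score and ignores the enzyme-family statistical models and multi-domain handling. The penalty weight `lam`, the helper names, and all expression profiles are hypothetical.

```python
import math
from itertools import combinations, product


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_assignment(candidates, expr, used_elsewhere=frozenset(), lam=1.0):
    """Pick one candidate gene per reaction (hypothetical simplified objective).

    score = mean pairwise Pearson correlation of the chosen genes
            - lam * number of (reaction, gene) choices already used in other pathways
    candidates: reaction -> list of genes; expr: gene -> expression profile.
    Exhaustive search, so only suitable for small toy pathways.
    """
    reactions = sorted(candidates)
    best, best_score = None, -math.inf
    for combo in product(*(candidates[r] for r in reactions)):
        pairs = list(combinations(combo, 2))
        coexpr = sum(pearson(expr[a], expr[b]) for a, b in pairs) / len(pairs) if pairs else 0.0
        conflicts = sum((r, g) in used_elsewhere for r, g in zip(reactions, combo))
        score = coexpr - lam * conflicts
        if score > best_score:
            best, best_score = dict(zip(reactions, combo)), score
    # Candidates not chosen for a reaction are kept as "backup" genes.
    backups = {r: [g for g in candidates[r] if g != best[r]] for r in reactions}
    return best, backups


# Hypothetical toy data: g2 is anti-correlated with the other genes.
expr = {"g1": [1, 2, 3, 4], "g2": [4, 3, 2, 1], "g3": [1, 2, 3, 5], "g4": [2, 4, 6, 8]}
candidates = {"R1": ["g1", "g2"], "R2": ["g3"], "R3": ["g4"]}

best, backups = best_assignment(candidates, expr)
print(best, backups)  # {'R1': 'g1', 'R2': 'g3', 'R3': 'g4'} {'R1': ['g2'], 'R2': [], 'R3': []}

# If g1 already catalyzes R1 in another pathway and the penalty is large, g2 is chosen instead.
best2, _ = best_assignment(candidates, expr, used_elsewhere={("R1", "g1")}, lam=2.0)
print(best2["R1"])  # g2
```

The second call shows the trade-off the objective encodes: a strong conflict penalty can outweigh better co-expression.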

Conclusion: We present an algorithm for automatic assignment of genes to metabolic pathways. The algorithm utilizes expression data and reduces the ambiguity that characterizes assignments that are based only on EC numbers.


Figures

Figure 1
Pathway graphs. Left: the pathway relation graph. Each pathway is represented as a node, and an edge is drawn between two pathways for each reaction they share. Middle: the pathway conflict graph. Thick edges represent conflicts (i.e. the same gene was assigned to catalyze the same reaction in both pathways connected by the edge). Right: the final conflict graph. The edge between pathways P9 and P10 is a flat edge (no alternative assignments exist for that reaction) and is therefore unmarked. In the end, only two connected components with potentially solvable conflicts remain.
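A minimal sketch of the graph steps in Figure 1 follows. It assumes each pathway's assignment is stored as a reaction-to-gene map. Two pathways are linked in the conflict graph when they assign the same gene to a shared reaction. Edges for reactions with only one candidate gene (flat edges) are dropped because no alternative exists. The remaining connected components are the groups of pathways whose conflicts might be resolved. The function name, data layout, and identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def conflict_components(assignment, candidates):
    """Connected components of the conflict graph, ignoring flat edges.

    assignment: pathway -> {reaction: gene assigned to that reaction in that pathway}
    candidates: reaction -> set of genes that can catalyze the reaction
    """
    adj = defaultdict(set)
    for p, q in combinations(sorted(assignment), 2):
        # Reactions shared by p and q correspond to pathway relation graph edges.
        for r in assignment[p].keys() & assignment[q].keys():
            same_gene = assignment[p][r] == assignment[q][r]
            has_alternative = len(candidates.get(r, ())) > 1
            # Keep a conflict edge only if the same gene is reused AND another gene could replace it.
            if same_gene and has_alternative:
                adj[p].add(q)
                adj[q].add(p)
    # Depth-first search over the remaining conflict edges.
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components


# Hypothetical toy data: the P2-P3 conflict on R2 is flat (g5 is the only candidate).
assignment = {
    "P1": {"R1": "g1"},
    "P2": {"R1": "g1", "R2": "g5"},
    "P3": {"R2": "g5"},
    "P4": {"R3": "g7"},
    "P5": {"R3": "g7"},
}
candidates = {"R1": {"g1", "g2"}, "R2": {"g5"}, "R3": {"g7", "g8"}}
print(conflict_components(assignment, candidates))  # [['P1', 'P2'], ['P4', 'P5']]
```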
Figure 2
The Isoleucine Biosynthesis pathway diagram. The pathway layout is retrieved from the MetaCyc database. For each reaction we list the genes that can catalyze the reaction. A plus or minus sign indicates whether the gene was assigned to the pathway in SGD. The expression profiles and their similarity scores are shown for selected pairs of genes. The mapping between gene names and Biozon identifiers is given in Table 6.
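The plus/minus annotations suggest a simple per-pathway consistency check between predicted and curated assignments. The sketch below is an assumption about how such a comparison could be tallied, not the evaluation procedure used in the paper. The function name, gene identifiers, and labels are hypothetical.

```python
def compare_to_reference(predicted, reference):
    """Tally agreement between predicted pathway members and curated +/- labels.

    predicted: set of genes the algorithm assigned to the pathway.
    reference: gene -> True ('+', assigned in the curated database) or False ('-').
    Genes absent from `reference` are ignored.
    """
    tp = sum(1 for g, plus in reference.items() if plus and g in predicted)
    fp = sum(1 for g, plus in reference.items() if not plus and g in predicted)
    fn = sum(1 for g, plus in reference.items() if plus and g not in predicted)
    tn = sum(1 for g, plus in reference.items() if not plus and g not in predicted)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "precision": precision, "recall": recall}


# Hypothetical +/- labels for four candidate genes of one pathway.
reference = {"g1": True, "g2": True, "g3": False, "g4": False}
predicted = {"g1", "g3"}
print(compare_to_reference(predicted, reference))
# {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 1, 'precision': 0.5, 'recall': 0.5}
```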
Figure 3
The Isoleucine Biosynthesis pathway from the reconstructed metabolic network of Saccharomyces cerevisiae [18]. Reproduced with permission from Cold Spring Harbor Laboratory ©2004 (Duarte et al. 2004 [18]). The EC numbers and the genes associated with the reactions were added to the diagram. The part that overlaps with the MetaCyc isoleucine biosynthesis pathway is circled.
Figure 4
The folic acid biosynthesis pathway diagram. See Figure 2 for description. Note that FOL1, ADE3 and MIS1 are multi-functional enzymes.
Figure 5
The folic acid biosynthesis pathway from the reconstructed metabolic network of Saccharomyces cerevisiae [18]. Reproduced with permission from Cold Spring Harbor Laboratory ©2004 (Duarte et al. 2004 [18]). The EC numbers and the genes associated with the reactions were added to the diagram. The parts that overlap with the MetaCyc folic acid biosynthesis pathway are circled. The green circles indicate consistency, while the red one indicates inconsistency.
Figure 6
The asparagine biosynthesis pathway. See Figure 2 for description. Both ASN1 and ASN2 are correlated with AAT2 but are anti-correlated with AAT1 (selected pairwise similarities are shown). The latter is localized to a different cellular compartment than the others and is likely to be involved in other pathways (see text for details).
Figure 7
The asparagine biosynthesis pathway from the reconstructed metabolic network of Saccharomyces cerevisiae [18]. Reproduced with permission from Cold Spring Harbor Laboratory ©2004 (Duarte et al. 2004 [18]). The part that overlaps with the MetaCyc asparagine biosynthesis pathway is circled.


References

    1. Selkov E, Galimova M, Goryanin I, Gretchkin Y, Ivanova N, Komarov Y, Maltsev N, Mikhailova N, Nenashev V, Overbeek R, Panyushkina E, Pronevitch L, Selkov JE. The metabolic pathway collection: an update. Nucleic Acids Res. 1997;25:37–38.
    2. Selkov JE, Grechkin Y, Mikhailova N, Selkov E. MPW: the Metabolic Pathways Database. Nucleic Acids Res. 1998;26:43–45.
    3. Overbeek R, Larsen N, Pusch GD, D'Souza M, Selkov E Jr, Kyrpides N, Fonstein M, Maltsev N, Selkov E. WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res. 2000;28:123–125.
    4. Ellis LBM, Hou BK, Kang W, Wackett LP. The University of Minnesota Biocatalysis/Biodegradation Database: post-genomic data mining. Nucleic Acids Res. 2003;31:262–265.
    5. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004;D277–280.
