Review
PLoS Comput Biol. 2019 Jul 11;15(7):e1007084.
doi: 10.1371/journal.pcbi.1007084. eCollection 2019 Jul.

Machine and deep learning meet genome-scale metabolic modeling

Guido Zampieri et al. PLoS Comput Biol.

Abstract

Omic data analysis is steadily growing as a driver of basic and applied molecular biology research. Core to the interpretation of complex and heterogeneous biological phenotypes are computational approaches in the fields of statistics and machine learning. In parallel, constraint-based metabolic modeling has established itself as the main tool to investigate large-scale relationships between genotype, phenotype, and environment. The development and application of these methodological frameworks have occurred independently for the most part, whereas the potential of their integration for biological, biomedical, and biotechnological research is less known. Here, we describe how machine learning and constraint-based modeling can be combined, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved. We overlap systematic classifications from both frameworks, making them accessible to nonexperts. Finally, we delineate potential future scenarios, propose new joint theoretical frameworks, and suggest concrete points of investigation for this joint subfield. A multiview approach merging experimental and knowledge-driven omic data through machine learning methods can incorporate key mechanistic information in an otherwise biologically agnostic learning process.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Omic data–integration methods in machine learning.
Multiview omic data–integration methods can be classified into three main categories. (a) Concatenation-based (early-stage) integration involves combining all omic data into one large matrix before applying ML methods to obtain a data-driven model. (b) Transformation-based (intermediate-stage) integration involves transforming each dataset into a uniform format, which then allows the datasets to be combined into one fused dataset. (c) Model-based (late-stage) integration involves building a separate machine learning model for each dataset and then combining their outcomes, rather than combining the data before the learning phase. ML, machine learning.
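A minimal pure-Python sketch of the three strategies in panels (a) to (c), assuming two made-up omic views measured on six samples and a nearest-centroid classifier as the learner (a stand-in chosen for brevity, not a method taken from the review). For the transformation-based case, the common format is assumed to be a sample-by-sample similarity matrix per view, averaged into one fused matrix; all function names are hypothetical.

```python
from math import dist
from statistics import mean

# Two made-up omic "views" on the same six samples (one row per sample),
# e.g. transcript and metabolite levels, plus a binary phenotype label.
transcripts = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.2], [3.0, 3.1], [2.9, 3.3], [3.2, 2.8]]
metabolites = [[0.2], [0.1], [0.3], [1.5], [1.4], [1.6]]
labels = [0, 0, 0, 1, 1, 1]


def fit_centroids(X, y):
    """Nearest-centroid 'training': mean feature vector per class."""
    centroids = {}
    for cls in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids


def class_scores(x, centroids):
    """Negative Euclidean distance to each class centroid (higher is closer)."""
    return {cls: -dist(x, c) for cls, c in centroids.items()}


def argmax(scores):
    return max(scores, key=scores.get)


# (a) Concatenation-based (early): join the views per sample, then fit one model.
early_X = [t + m for t, m in zip(transcripts, metabolites)]
early_model = fit_centroids(early_X, labels)


def early_predict(i):
    return argmax(class_scores(early_X[i], early_model))


# (b) Transformation-based (intermediate): map each view to a common format,
# here a sample-by-sample similarity matrix, then average the matrices.
def similarity_matrix(X):
    return [[1.0 / (1.0 + dist(a, b)) for b in X] for a in X]


fused = [[(p + q) / 2 for p, q in zip(row_t, row_m)]
         for row_t, row_m in zip(similarity_matrix(transcripts),
                                 similarity_matrix(metabolites))]


def intermediate_predict(i):
    # Label of the most similar other sample in the fused matrix (1-nearest neighbour).
    j = max((k for k in range(len(labels)) if k != i), key=lambda k: fused[i][k])
    return labels[j]


# (c) Model-based (late): fit one model per view, then combine the outcomes
# by summing the per-class scores.
views = [transcripts, metabolites]
view_models = [fit_centroids(X, labels) for X in views]


def late_predict(i):
    total = {}
    for X, model in zip(views, view_models):
        for cls, s in class_scores(X[i], model).items():
            total[cls] = total.get(cls, 0.0) + s
    return argmax(total)
```

For example, `[late_predict(i) for i in range(6)]` returns `[0, 0, 0, 1, 1, 1]`. Predictions here are made on the training samples themselves, so the sketch only shows where integration happens in each strategy, not how well any of them generalizes.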
Fig 2. Constraint-based data integration and fluxome generation.
(a) Constraint-based metabolic modeling begins with the construction of a manually curated GSMM recording all reactions taking place in the network. (b) Coded within the structure of a GSMM is the stoichiometric matrix S, denoting the involvement of metabolites in each reaction. Constraints are applied to the model, a given metabolic goal is encoded as the objective function c, and linear or quadratic optimization is used to maximize or minimize this objective. The steady-state assumption (Sv = 0) constrains the product of the stoichiometric matrix S and the flux vector v to zero, so that internal metabolites neither accumulate nor deplete. (c) To compute a unique flux distribution, the objective function can be regularized by subtracting a strictly convex function (for example, a quadratic penalty on v) from it, which makes the objective strictly concave. In addition to v being restricted between default lower and upper limits (vmin and vmax), external multiomic data θ can be used to further constrain fluxes using the mapping function φ(θ), hence driving the output toward condition-dependent solutions. GSMM, genome-scale metabolic model.
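Written out, panels (b) and (c) correspond to the two problems below. The quadratic penalty with weight ε > 0 is one common choice of regularizer, and letting φ(θ) act as a componentwise upper bound is an assumption made here for illustration, since the caption does not fix how the data enter the constraints.

```latex
\begin{aligned}
\text{(b)}\quad & \max_{v}\; c^{\top} v
  \quad \text{s.t.}\quad S v = 0,\;\; v_{\min} \le v \le v_{\max} \\
\text{(c)}\quad & \max_{v}\; c^{\top} v - \varepsilon \lVert v \rVert_2^{2}
  \quad \text{s.t.}\quad S v = 0,\;\; v_{\min} \le v \le \min\bigl(v_{\max}, \varphi(\theta)\bigr)
\end{aligned}
```

Problem (b) is a linear program and may have many optimal flux vectors; in (c) the objective is strictly concave and the feasible set is convex, so whenever the constraints are satisfiable the maximizer is unique.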
Fig 3. Multiomic data analysis by combination of constraint-based modeling with machine learning.
(a) Fluxomic analysis involves FBA or related techniques performed on a general-purpose GSMM, from which the flux data obtained can be used as input for unsupervised or supervised machine learning. (b) To improve the accuracy of machine learning predictions, multiomic datasets are obtained using high-throughput analytics, e.g., transcriptomics (DNA microarrays, RNA sequencing), proteomics (2D gel electrophoresis, stable isotope labeling, mass spectrometry), or metabolomics (NMR spectroscopy, isotopic labeling, LC-MS, GC-MS). As these datasets are obtained from different sources, they must undergo several preprocessing stages such as filtration and normalization to make them mutually consistent, account for variance, and reduce noise. Condition-specific knowledge-based models are generated by introducing these multiple datasets into GSMMs to obtain more precise flux estimations, from which machine learning techniques can be applied to infer biologically relevant patterns in the data. (c) Alternatively, machine learning can be directly applied to single- or multiomic datasets to produce or improve GSMMs or fluxomic data. FBA, flux balance analysis; GC-MS, gas chromatography–mass spectrometry; GSMM, genome-scale metabolic model; LC-MS, liquid chromatography–mass spectrometry; NMR, nuclear magnetic resonance.
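A minimal sketch of the pipeline in panels (a) and (b), assuming a made-up four-reaction network in place of a curated GSMM and SciPy's `linprog` as the LP solver. The mapping `phi` is hypothetical: it scales the upper bound of one reaction by a relative expression level θ, which is one simple way of letting data constrain fluxes, not the method of any particular study. The last step only flags reactions whose flux varies across conditions, as a stand-in for the feature preparation that precedes machine learning.

```python
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix S (rows: metabolites A, B; columns: reactions).
# R1: -> A (uptake), R2: A -> B, R3: A -> B (alternative route), R4: B -> (export)
S = np.array([
    [1, -1, -1,  0],   # A
    [0,  1,  1, -1],   # B
], dtype=float)
v_min = np.zeros(4)
v_max = np.array([10.0, 6.0, 4.0, 1000.0])
c = np.array([0.0, 0.0, 0.0, 1.0])  # objective: maximize flux through R4


def phi(theta):
    """Hypothetical data mapping: scale R2's upper bound by theta in [0, 1]."""
    upper = v_max.copy()
    upper[1] = theta * v_max[1]
    return upper


def fba(upper):
    # linprog minimizes, so negate c to maximize c^T v subject to S v = 0 and bounds.
    res = linprog(-c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=list(zip(v_min, upper)), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


# Fluxome: one flux vector per (made-up) condition-specific data value theta.
thetas = [1.0, 0.5, 0.25]
fluxome = np.vstack([fba(phi(t)) for t in thetas])  # shape: (conditions, reactions)

# Simple unsupervised step: reactions whose flux varies across conditions
# are candidate features for downstream learning.
variable = np.flatnonzero(fluxome.var(axis=0) > 1e-9)
```

Here `fluxome` has one row per condition and its objective column is 10.0, 7.0 and 5.5: lowering θ caps the route through R2, while R3 stays saturated at 4, so `variable` contains R1, R2 and R4 but not R3. In a real study the rows would come from a GSMM and the resulting matrix would feed the unsupervised or supervised methods described above.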
