PathModel: inferring new biochemical reactions and metabolite structures to understand metabolic pathway drift
PathModel is a prototype to infer new biochemical reactions and new metabolite structures. The biological motivation for developing it is described in this article, published in iScience. You can also watch the associated presentation at the JOBIM2020 conference.
There is no guarantee that this script will work: it is a work in progress at an early stage.
Metabolic Pathway Drift hypothesizes that metabolic pathways can be conserved even if their biochemical reactions undergo variations. These variations can be non-orthologous displacement of genes or changes in enzyme order.
To test the hypothesis of enzyme order drift, we developed PathModel to infer possible enzyme order changes in metabolic pathways.
PathModel is developed in ASP using the clingo grounder and solver. It is divided into three ASP scripts.
The first one, ReactionSiteExtraction.lp, creates biochemical transformations from reactions. The biochemical transformation of a reaction corresponds to the changes in atoms and bonds between the reactant and the product of the reaction.
When a reaction occurs between two molecules, the script compares the atoms and bonds of the two molecules and extracts a reaction site before the reaction (composed of atoms and bonds that are present in the reactant but absent in the product) and a reaction site after the reaction (composed of atoms and bonds present in the product but absent in the reactant).
ReactionSiteExtraction produces two sites for each reaction (one before and one after the reaction). This corresponds to the biochemical transformation induced by the reaction.
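The extraction of the two reaction sites can be illustrated with plain set differences. The following is a minimal Python sketch, not the ASP code of ReactionSiteExtraction.lp: the function name `reaction_sites` and the dictionary layout are assumptions made for illustration, and the two molecules are molecule_1 and molecule_2 from the tutorial data.

```python
# Molecules as sets of atoms (number, element) and bonds (type, atom_a, atom_b).
# Layout and names are hypothetical; the data mirror the tutorial molecules.
molecule_1 = {
    "atoms": {(1, "carb"), (2, "carb"), (3, "carb"), (4, "carb")},
    "bonds": {("single", 1, 2), ("single", 1, 3), ("single", 2, 3), ("single", 2, 4)},
}
molecule_2 = {
    "atoms": {(1, "carb"), (2, "carb"), (3, "carb"), (4, "carb")},
    "bonds": {("single", 1, 2), ("single", 1, 3), ("single", 2, 3), ("double", 2, 4)},
}


def reaction_sites(reactant, product):
    """Return (site_before, site_after) computed as set differences."""
    site_before = {
        # Present in the reactant but absent in the product.
        "atoms": reactant["atoms"] - product["atoms"],
        "bonds": reactant["bonds"] - product["bonds"],
    }
    site_after = {
        # Present in the product but absent in the reactant.
        "atoms": product["atoms"] - reactant["atoms"],
        "bonds": product["bonds"] - reactant["bonds"],
    }
    return site_before, site_after


before, after = reaction_sites(molecule_1, molecule_2)
print(before["bonds"])  # {('single', 2, 4)}
print(after["bonds"])   # {('double', 2, 4)}
```

The result matches the transformation reported for the reduction reaction in the tutorial: a single bond between atoms 2 and 4 becomes a double bond.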
A second script, MZComputation.lp will compute the MZ for each known molecule. It also computes the MZ changes between the reactant and the product of a reaction.
These data will be used by the third script: PathModel.lp.
PathModel uses the incremental mode of Clingo. Starting from a source molecule, it applies two inference methods (deductive and analogical reasoning) until it reaches a goal (another molecule).
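The idea of a bounded, step-by-step search from a source to a goal can be sketched in Python as follows. This is only an analogy: it walks known reactions and omits the two inference methods, the function name `reach_goals` is hypothetical, and the step numbering is not claimed to match the one used by Clingo.

```python
def reach_goals(source, goals, reactions, step_limit=100):
    """Expand the set of reachable molecules one step at a time.

    reactions: iterable of (name, reactant, product) tuples.
    Returns the step at which every goal became reachable.
    Raises RuntimeError if the step limit is reached first.
    """
    goals = set(goals)
    reachable = {source}
    for step in range(1, step_limit + 1):
        # Add every product whose reactant is already reachable.
        reachable |= {product for _, reactant, product in reactions
                      if reactant in reachable}
        if goals <= reachable:
            return step
    raise RuntimeError(f"goal not reached within {step_limit} steps")


# Hypothetical two-reaction pathway A -> B -> C.
example_reactions = [("r1", "A", "B"), ("r2", "B", "C")]
print(reach_goals("A", {"C"}, example_reactions))  # 2
```

The step limit plays the same role as the maximal number of steps of the command line: it prevents an endless run when the goal cannot be reached.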
PathModel is a Python3 package using Answer Set Programming (ASP) to infer new biochemical reactions and new metabolite structures. It is divided into two parts:
- a wrapper (pathmodel_wrapper.py) for the ASP programs (MZComputation.lp, ReactionSiteExtraction.lp and PathModel.lp).
- a plotting script (molecule_creation.py) to create pictures of molecules and pathways.
PathModel requires:
- clingo, which must be installed with Lua compatibility (a good way to get it is with conda).
- clyngor package (clingo and clyngor can be installed together with the clyngor-with-clingo package).
- networkx (with graphviz and pygraphviz).
- matplotlib package.
- rdkit package.
You can use the container from Singularity Hub.
# Choose your preference to pull the container from Singularity Hub (once)
singularity pull shub://pathmodel/pathmodel-singularity
# Enter it
singularity run pathmodel-singularity_latest.sif
pathmodel test -o output_folder
pathmodel_plot -i output_folder/MAA
pathmodel_plot -i output_folder/sterol
# Or use as a command line
singularity exec pathmodel-singularity_latest.sif pathmodel test -o output_folder
singularity exec pathmodel-singularity_latest.sif pathmodel_plot -i output_folder/MAA
singularity exec pathmodel-singularity_latest.sif pathmodel_plot -i output_folder/sterol
This container is built from this Singularity recipe. If you prefer, you can build it yourself from the recipe:
singularity build pathmodel.sif Singularity
A docker image of pathmodel is available at dockerhub. This image is based on the pathmodel Dockerfile.
docker run -ti -v /path/shared/container:/shared --name="mycontainer" pathmodel/pathmodel bash
This command will download the image and create a container with a shared path. It will launch a bash terminal where you can use the command pathmodel (see Commands and Python import and Tutorial).
The package can be installed either using python setup or pip install (see below)
git clone https://github.com/pathmodel/pathmodel.git
cd pathmodel
python setup.py install
If you have all the dependencies on your system, you can just download Pathmodel using pip.
pip install pathmodel
Because the scripts of Pathmodel require many dependencies, we provide a conda environment file that contains all of them.
First you need Conda. To avoid conflicts between the conda Python and your system Python, you can use a conda environment with Miniconda.
If you want to try this, the first step is to install Miniconda:
# Download Miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Give the permission to the installer.
chmod +x Miniconda3-latest-Linux-x86_64.sh
# Install it at the path that you choose.
./Miniconda3-latest-Linux-x86_64.sh -p /path/where/miniconda/will/be/installed/ -b
# Delete installer.
rm Miniconda3-latest-Linux-x86_64.sh
# Add conda path to your bash settings.
echo '. /path/where/miniconda/is/installed/etc/profile.d/conda.sh' >> ~/.bashrc
# Will activate the environment.
# For more information: https://github.com/conda/conda/blob/master/CHANGELOG.md#440-2017-12-20
echo 'conda activate base' >> ~/.bashrc
After this you need to restart your terminal or use: source ~/.bashrc
Then you will get our conda environment file:
# Download our conda environment file from Pathmodel github page.
wget https://raw.githubusercontent.com/pathmodel/pathmodel/master/conda/pathmodel_env.yaml
# Use the file to create the environment and install all dependencies.
conda env create -f pathmodel_env.yaml
If no error occurs, you can now access a conda environment with pathmodel:
# Activate the environment.
conda activate pathmodel
# Launch the help of Pathmodel.
(pathmodel) pathmodel -h
You can exit the environment with:
# Deactivate the environment.
conda deactivate
Molecules are modelled with atoms (hydrogen excluded) and bonds (single and double).
atom("Molecule1",1,carb). atom("Molecule1",2,carb).
bond("Molecule1",single,1,2).
atom("Molecule2",1,carb). atom("Molecule2",2,carb). atom("Molecule2",3,carb).
bond("Molecule2",single,1,2). bond("Molecule2",single,2,3).
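When a molecule is already described in a Python script, facts in this format can be generated rather than typed by hand. The helper below is a hypothetical sketch (the name `molecule_to_facts` and its arguments are not part of PathModel); it only formats strings following the atom and bond facts shown above.

```python
def molecule_to_facts(name, atoms, bonds):
    """Build ASP facts for one molecule.

    atoms: dict mapping atom number -> element label (e.g. "carb").
    bonds: list of (bond_type, atom_a, atom_b) tuples.
    """
    facts = [f'atom("{name}",{number},{element}).'
             for number, element in sorted(atoms.items())]
    facts += [f'bond("{name}",{kind},{a},{b}).' for kind, a, b in bonds]
    return facts


facts = molecule_to_facts("Molecule1", {1: "carb", 2: "carb"}, [("single", 1, 2)])
print("\n".join(facts))
# atom("Molecule1",1,carb).
# atom("Molecule1",2,carb).
# bond("Molecule1",single,1,2).
```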
Reactions between molecules are represented as a named link between two molecules:
reaction(reaction1,"Molecule1","Molecule2").
A common domain is needed to find which molecules share structure with others:
atomDomain(commonDomainName,1,carb). atomDomain(commonDomainName,2,carb).
bondDomain(commonDomainName,single,1,2).
A molecule source is defined:
source("Molecule1").
Initiation and goal of the incremental grounding must be defined:
init(pathway("Molecule1","Molecule2")).
goal(pathway("Molecule1","Molecule3")).
An M/Z ratio can be added to check whether a metabolite with this ratio can be predicted. The M/Z ratio must be multiplied by 10,000 because Clingo does not handle decimal numbers. An example with an M/Z of 270.2720:
mzfiltering(2702720).
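The conversion to the integer expected by Clingo can be scripted. The sketch below is a hypothetical helper (the names `mz_to_clingo` and `mzfiltering_fact` are not part of PathModel); it uses `decimal.Decimal` to avoid floating-point rounding and accepts either a decimal point or a decimal comma.

```python
from decimal import Decimal


def mz_to_clingo(mz_text):
    """Convert an M/Z written as text into the integer used by Clingo (x 10,000)."""
    value = Decimal(mz_text.replace(",", "."))
    scaled = value * 10000
    if scaled != scaled.to_integral_value():
        # More than four decimals cannot be represented without loss.
        raise ValueError(f"{mz_text} has more than four decimal places")
    return int(scaled)


def mzfiltering_fact(mz_text):
    """Format the mzfiltering fact for the input file."""
    return f"mzfiltering({mz_to_clingo(mz_text)})."


print(mzfiltering_fact("270.2720"))  # mzfiltering(2702720).
```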
Molecules absent in the organism of study can be specified. They will not be used by the inference method.
absentmolecules("Molecule1").
Run PathModel prediction:
pathmodel infer -i data.lp -o output_folder -s 100
PathModel arguments:
- -i: input file
- -o: output folder
- -s: number of maximal steps before PathModel stops (to avoid endless run), by default at 100
If PathModel does not find the goal molecules before it reaches the number of maximal steps, it will send an error message.
Create pictures representing the results (such as new molecules inferred from an M/Z ratio):
pathmodel_plot -i output_folder_from_pathmodel
In python (pathmodel_plot is not available in import call):
import pathmodel
pathmodel.pathmodel_analysis('data.lp', 'output_folder', step_limit=100)
With the infer command, pathmodel will use the data file and try to create an output folder:
output_folder
├── data_pathmodel.lp
├── pathmodel_data_transformations.tsv
├── pathmodel_incremental_inference.tsv
├── pathmodel_output.lp
data_pathmodel.lp contains intermediary files for PathModel. Specifically, it contains the input data and the results of ReactionSiteExtraction.lp (diffAtomBeforeReaction, diffAtomAfterReaction, diffBondBeforeReaction, diffBondAfterReaction, siteBeforeReaction, siteAfterReaction) and of MZComputation.lp (domain, moleculeComposition, moleculeNbAtoms, numberTotalBonds, moleculeMZ, reactionMZ). The python wrapper gives this file to PathModel.lp as input.
pathmodel_data_transformations.tsv contains all the transformations inferred from the input data by the ReactionSiteExtraction.lp script.
pathmodel_incremental_inference.tsv shows the step of the incremental mode of clingo at which a new reaction has been inferred using a known transformation. It does not show the steps that pass through a known reaction, so the first step number in the file can be greater than 1.
pathmodel_output.lp is the output lp file of PathModel.lp (newreaction, predictatom, predictbond, reaction, inferred).
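The inference file can be post-processed with a few lines of Python. The sketch below groups inferred reactions by step; it assumes that the file is tab-separated, starts with a header row, and has four columns in the order step, reaction, reactant, product (the order shown in the tutorial tables). The function name `reactions_by_step` is hypothetical, and the sample rows are copied from the tutorial.

```python
import csv
import io
from collections import defaultdict


def reactions_by_step(tsv_text):
    """Group (reaction, reactant, product) tuples by inference step."""
    # QUOTE_NONE: keep fields untouched, quotes are stripped explicitly below.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # Skip the header row.
    grouped = defaultdict(list)
    for row in reader:
        if not row:
            continue
        step, reaction, reactant, product = row[:4]
        grouped[int(step)].append((reaction, reactant.strip('"'), product.strip('"')))
    return dict(grouped)


sample = (
    "infer_turn\tnew_reaction\treactant\tproduct\n"
    '2\treduction\t"molecule_3"\t"molecule_4"\n'
    '2\treduction\t"molecule_5"\t"Prediction_921341_reduction"\n'
)
steps = reactions_by_step(sample)
print(steps[2][0])  # ('reduction', 'molecule_3', 'molecule_4')
```

To read a real result, the content of pathmodel_incremental_inference.tsv would be passed instead of `sample`, provided the assumptions above hold for that file.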
Then if you use the pathmodel_plot command on the output_folder, pathmodel will create the following structure:
output_folder
├── ...
├── molecules
│   ├── Molecule1
│   ├── Molecule2
│   ├── ...
├── newmolecules_from_mz
│   ├── Prediction_...
│   ├── Prediction_...
│   ├── ...
├── pathmodel_output.svg
molecules contains the structure of each molecule in the input data file.
newmolecules_from_mz contains the structures of inferred molecules using the MZ. It will be empty if no MZ were given or if no molecules were inferred.
pathmodel_output.svg shows the pathway containing the molecules and the reactions (in green) from the input files and the newly inferred molecules and reactions (in blue).
For this tutorial, we have created fictitious data available at test/pathmodel_test_data.lp.
In this file there are 5 molecules (molecule_1 to molecule_5), including:
atom("molecule_1",1..4,carb). bond("molecule_1",single,1,2). bond("molecule_1",single,1,3). bond("molecule_1",single,2,3). bond("molecule_1",single,2,4).
atom("molecule_2",1..4,carb). bond("molecule_2",single,1,2). bond("molecule_2",single,1,3). bond("molecule_2",single,2,3). bond("molecule_2",double,2,4).
One reaction:
reaction(reduction, "molecule_1", "molecule_2").
One known MZ:
92.1341 (so 921341 for Clingo): mzfiltering(921341).
pathmodel infer -i pathmodel_test_data.lp -o output_folder
pathmodel_plot -i output_folder
By calling the command:
pathmodel infer -i pathmodel_test_data.lp -o output_folder
Pathmodel will create output files:
output_folder
├── data_pathmodel.lp
├── pathmodel_data_transformations.tsv
├── pathmodel_incremental_inference.tsv
├── pathmodel_output.lp
As explained in Output, data_pathmodel.lp is an intermediary file for Pathmodel.
pathmodel_data_transformations.tsv contains the transformations inferred from the known reactions, here:
reaction_id | reactant_substructure | product_substructure |
reduction | [('single', '2', '4')] | [('double', '2', '4')] |
This means that the reduction transforms a single bond between atoms 2 and 4 into a double bond. These transformations are used by the deductive and analogical reasoning of PathModel.
pathmodel_incremental_inference.tsv shows the new reactions inferred by PathModel and the step in Clingo incremental mode when the new reaction has been inferred.
infer_turn | new_reaction | reactant | product |
2 | reduction | "molecule_3" | "molecule_4" |
2 | reduction | "molecule_5" | "Prediction_921341_reduction" |
Two new variants of the reduction reaction have been inferred at step two of the incremental mode:
- one between molecule_3 and molecule_4, inferred from the reduction between molecule_1 and molecule_2. This demonstrates the deductive reasoning of PathModel.
- one between molecule_5 and a newly inferred metabolite with an MZ of 92.1341. To find it, PathModel computes the MZ of molecule_5 (94.1489). It then applies each transformation from its knowledge base (here, reduction) to each molecule from the knowledge base, computes the MZ of the resulting hypothetical molecules and compares them to the MZ given by the user (here 92.1341). If a match is found, the reaction and the molecule are inferred. This is an example of the analogical reasoning.
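The mass comparison behind the analogical reasoning can be sketched in Python. This is a simplified illustration, not the ASP code: it only compares masses and ignores the structural check that the reaction site is present in the molecule. The function name `match_predictions` is hypothetical, the MZ change of the reduction (-20148) is assumed here to be the difference between the two tutorial values 921341 and 941489, and `molecule_x` is an invented entry with an arbitrary mass.

```python
def match_predictions(molecule_mz, reaction_mz_change, target_mz):
    """Return (reaction, molecule) pairs whose hypothetical product has target_mz.

    All masses are integers, i.e. M/Z multiplied by 10,000.
    """
    matches = []
    for molecule, mz in molecule_mz.items():
        for reaction, change in reaction_mz_change.items():
            # MZ of the hypothetical product of `reaction` applied to `molecule`.
            if mz + change == target_mz:
                matches.append((reaction, molecule))
    return matches


molecule_mz = {
    "molecule_5": 941489,   # value quoted in the tutorial (94.1489)
    "molecule_x": 1000000,  # hypothetical molecule, arbitrary mass
}
reaction_mz_change = {"reduction": -20148}  # assumption: 921341 - 941489

print(match_predictions(molecule_mz, reaction_mz_change, 921341))
# [('reduction', 'molecule_5')]
```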
Then it is possible to have access to graphic representations of molecules and reactions:
pathmodel_plot -i output_folder
output_folder
├── ...
├── molecules
│   ├── molecule_1.svg
│   ├── molecule_2.svg
│   ├── molecule_3.svg
│   ├── molecule_4.svg
│   ├── molecule_5.svg
├── newmolecules_from_mz
│   ├── Prediction_921341_reduction.svg
├── pathmodel_output.svg
There is a structure inferred by PathModel for the MZ 92.1341:
PathModel also creates a picture showing all the reactions (known reactions in green, inferred reaction variants in blue and blue squares for predicted molecules):
Tutorial on iScience Article data (Chondrus crispus sterol and Mycosporine-like Amino Acids pathways)
PathModel contains scripts to reproduce the experiments run in the article: the analysis of the Chondrus crispus sterol and Mycosporine-like Amino Acids (MAA) pathways.
Input data for sterol pathway are in pathmodel/pathmodel/data/sterol_pwy.lp.
For this pathway, known reactions were extracted from:
- MetaCyc cholesterol biosynthesis (plants) PWY18C3-1.
- MetaCyc cholesterol biosynthesis III (via desmosterol) PWY66-4.
- MetaCyc phytosterol biosynthesis (plants) PWY-2541.
- simplification of multistep C24-C29 demethylation from Sonawane et al. (2016).
The source molecule is the cycloartenol and the goal molecules are: 22-dehydrocholesterol, brassicasterol and sitosterol.
Input data for MAA pathway are in pathmodel/pathmodel/data/MAA_pwy.lp.
For this pathway, known reactions were extracted from:
- MetaCyc shinorine biosynthesis PWY-7751.
- Extended reaction from serine to threonine as proposed in Brawley et al. (2017).
- Reactions hypothesized by Carreto and Carignan (2011).
Two unknown M/Z ratios were given as input for MAA pathway:
- 270.2720
- 302.3117
The source molecule is the sedoheptulose-7-phosphate and the goal molecule is the palythine.
Article data are stored in the PathModel code. By calling the 'test' command, you can reproduce the experiments of the PathModel article. First run the inference on the sterol and MAA pathways:
pathmodel test -o output_folder
Then, it is possible to create picture representations of the results:
pathmodel_plot -i output_folder/sterol
pathmodel_plot -i output_folder/MAA
This prototype has been used to analyse two pathways from the red alga Chondrus crispus: the sterol pathway and the Mycosporine-like Amino Acids pathway.
pathmodel test -o output_folder
This test command will create an output folder containing the inference results for the sterol and the MAA pathways:
output_folder
├── MAA
│   ├── data_pathmodel.lp
│   ├── pathmodel_data_transformations.tsv
│   ├── pathmodel_incremental_inference.tsv
│   ├── pathmodel_output.lp
├── sterol
│   ├── data_pathmodel.lp
│   ├── pathmodel_data_transformations.tsv
│   ├── pathmodel_incremental_inference.tsv
│   ├── pathmodel_output.lp
Then you can create picture representations of the results (pathways and molecules) for the sterol pathway:
pathmodel_plot -i output_folder/sterol
output_folder
├── sterol
│   ├── data_pathmodel.lp
│   ├── pathmodel_data_transformations.tsv
│   ├── pathmodel_incremental_inference.tsv
│   ├── pathmodel_output.lp
│   ├── pathmodel_output.svg
│   ├── molecules
│   │   ├── 22-dehydrocholesterol.svg
│   │   ├── 24-epicampesterol.svg
│   │   ├── 24-ethylidenelophenol.svg
│   │   ├── 24-methyldesmosterol.svg
│   │   ├── 24-methylenecholesterol.svg
│   │   ├── 24-methylenecycloartanol.svg
│   │   ├── 24-methylenelophenol.svg
│   │   ├── 31-norcycloartanol.svg
│   │   ├── 31-norcycloartenol.svg
│   │   ├── 4α,14α-dimethyl-cholesta-8-enol.svg
│   │   ├── 4α,14α-dimethylcholest-8,24-dien-3β-ol.svg
│   │   ├── 4α-methyl-5α-cholest-7-en-3β-ol.svg
│   │   ├── 4α-methyl-5α-cholesta-7,24-dienol.svg
│   │   ├── 4α-methyl-5α-cholesta-8-en-3-ol.svg
│   │   ├── 4α-methyl-cholesta-8,14-dienol.svg
│   │   ├── 4α-methylcholest-8(9),14,24-trien-3β-ol.svg
│   │   ├── 4α-methylzymosterol.svg
│   │   ├── 5α-cholesta-7,24-dienol.svg
│   │   ├── 7-dehydrocholesterol.svg
│   │   ├── 7-dehydrodesmosterol.svg
│   │   ├── brassicasterol.svg
│   │   ├── campesterol.svg
│   │   ├── cholesterol.svg
│   │   ├── cycloartanol.svg
│   │   ├── cycloartenol.svg
│   │   ├── desmosterol.svg
│   │   ├── lathosterol.svg
│   │   ├── sitosterol.svg
│   │   ├── stigmasterol.svg
│   ├── newmolecules_from_mz
│   │   (empty)
In the molecules folder, each input molecule is represented as an SVG file.
No M/Z ratios were given as input for the sterol pathway, so there are no new molecules from M/Z.
'pathmodel_output.svg' shows the predicted reactions and the predicted molecules in blue (the layout of the picture can change, but it contains the same result):
Inferred reactions are listed in 'pathmodel_incremental_inference.tsv', with the step of the incremental mode from the source molecule (cycloartenol) to the goal molecules:
infer_step | new_reaction | reactant | product |
2 | c24_c29_demethylation | "cycloartenol" | "31-norcycloartenol" |
2 | rxn66_28 | "cycloartenol" | "cycloartanol" |
3 | rxn_4282 | "31-norcycloartenol" | "31-norcycloartanol" |
3 | rxn_20436 | "31-norcycloartenol" | "4α,14α-dimethylcholest-8,24-dien-3β-ol" |
4 | rxn_4282 | "4α,14α-dimethylcholest-8,24-dien-3β-ol" | "4α,14α-dimethyl-cholesta-8-enol" |
4 | rxn_20438 | "4α,14α-dimethylcholest-8,24-dien-3β-ol" | "4α-methylcholest-8(9),14,24-trien-3β-ol" |
5 | rxn_4282 | "4α-methylcholest-8(9),14,24-trien-3β-ol" | "4α-methyl-cholesta-8,14-dienol" |
5 | rxn_20439 | "4α-methylcholest-8(9),14,24-trien-3β-ol" | "4α-methylzymosterol" |
6 | rxn_4286 | "4α-methylzymosterol" | "4α-methyl-5α-cholesta-7,24-dienol" |
6 | rxn_4282 | "4α-methylzymosterol" | "4α-methyl-5α-cholesta-8-en-3-ol" |
7 | rxn_4282 | "4α-methyl-5α-cholesta-7,24-dienol" | "4α-methyl-5α-cholest-7-en-3β-ol" |
7 | c24_c28_demethylation | "4α-methyl-5α-cholesta-7,24-dienol" | "5α-cholesta-7,24-dienol" |
8 | rxn_1_14_21_6 | "5α-cholesta-7,24-dienol" | "7-dehydrodesmosterol" |
8 | rxn_4282 | "5α-cholesta-7,24-dienol" | "lathosterol" |
9 | rxn_4282 | "7-dehydrodesmosterol" | "7-dehydrocholesterol" |
9 | rxn66_323 | "7-dehydrodesmosterol" | "desmosterol" |
10 | rxn_4021 | "desmosterol" | "24-methylenecholesterol" |
10 | rxn_4282 | "desmosterol" | "cholesterol" |
11 | c22_desaturation | "cholesterol" | "22-dehydrocholesterol" |
12 | rxn_2_1_1_143 | "campesterol" | "sitosterol" |
And the pictures for the MAA pathway are created with:
pathmodel_plot -i output_folder/MAA
output_folder
├── MAA
│   ├── data_pathmodel.lp
│   ├── pathmodel_data_transformations.tsv
│   ├── pathmodel_incremental_inference.tsv
│   ├── pathmodel_output.lp
│   ├── pathmodel_output.svg
│   ├── molecules
│   │   ├── asterina-330.svg
│   │   ├── mycosporin-glycine.svg
│   │   ├── palythene.svg
│   │   ├── palythine.svg
│   │   ├── palythinol.svg
│   │   ├── porphyra-334.svg
│   │   ├── R-4-deoxygadusol.svg
│   │   ├── R-demethyl-4-deoxygadusol.svg
│   │   ├── S-4-deoxygadusol.svg
│   │   ├── S-demethyl-4-deoxygadusol.svg
│   │   ├── sedoheptulose-7-phosphate.svg
│   │   ├── shinorine.svg
│   │   ├── z-palythenic acid.svg
│   ├── newmolecules_from_mz
│   │   ├── Prediction_2702720_dehydration.svg
│   │   ├── Prediction_3023117_decarboxylation_1.svg
│   │   ├── Prediction_3023117_decarboxylation_2.svg
pathmodel_output.svg contains the pathway with the known reactions (green), the reactions inferred by PathModel (blue) and the metabolites inferred (blue).
Inferred reactions are listed in 'pathmodel_incremental_inference.tsv', with the step of the incremental mode from the source molecule (sedoheptulose-7-phosphate) to the goal molecule (palythine).
Incremental step 2 is not shown because the reaction at this step is already known (between 'sedoheptulose-7-phosphate' and 'R-demethyl-4-deoxygadusol') and no new predictions were inferred.
infer_step | new_reaction | reactant | product |
3 | rxn_17896 | "R-demethyl-4-deoxygadusol" | "R-4-deoxygadusol" |
3 | rxn_17370 | "R-demethyl-4-deoxygadusol" | "S-demethyl-4-deoxygadusol" |
4 | rxn_17895 | "R-4-deoxygadusol" | "S-4-deoxygadusol" |
4 | rxn_17366 | "S-demethyl-4-deoxygadusol" | "S-4-deoxygadusol" |
7 | dehydration | "Prediction_3023117_decarboxylation_1" | "palythene" |
7 | dehydration | "Prediction_3023117_decarboxylation_2" | "palythene" |
7 | decarboxylation_2 | "porphyra-334" | "Prediction_3023117_decarboxylation_1" |
7 | decarboxylation_2 | "porphyra-334" | "Prediction_3023117_decarboxylation_2" |
7 | decarboxylation_1 | "shinorine" | "asterina-330" |
8 | dehydration | "asterina-330" | "Prediction_2702720_dehydration" |
The structures of the predicted molecules from M/Z can be found in newmolecules_from_mz:
- Prediction_2702720_dehydration corresponds to MAA1 of the article:
- Prediction_3023117_decarboxylation_1 and Prediction_3023117_decarboxylation_2 (which are the same molecule) correspond to MAA2.
This molecule has been identified as the Aplysiapalythine A found in Aplysia californica [Kamio2011]. Furthermore, Aplysiapalythine A has been detected in red algae (the group in which Chondrus crispus is classified) [Orfanoudaki2019].
In the GitHub repository (pathmodel/pathmodel/data/), there are 4 data files:
- MAA_pwy.lp: Mycosporine-like Amino Acids pathways according to data from Chondrus crispus (Belcour et al., 2020).
- sterol_pwy.lp: Sterol pathways according to data from Chondrus crispus (Belcour et al., 2020).
- brown_sterols_pwy.lp: Sterol pathways in brown algae (Girard et al., 2021).
- mozukulins_pwy.lp: Mozukulins and sterol pathway in the brown alga Cladosiphon okamuranus (Girard et al., 2021).
Arnaud Belcour, Jean Girard, Méziane Aite, Ludovic Delage, Camille Trottier, Charlotte Marteau, Cédric Leroux, Simon M. Dittami, Pierre Sauleau, Erwan Corre, Jacques Nicolas, Catherine Boyen, Catherine Leblanc, Jonas Collén, Anne Siegel, Gabriel V. Markov. (2020). Inferring biochemical reactions and metabolite structures to understand metabolic pathway drift, iScience, 2020, 23(2): 100849, https://doi.org/10.1016/j.isci.2020.100849.
[Kamio2011] Kamio, M., Kicklighter, C.E., Nguyen, L., Germann, M.W. and Derby, C.D. (2011). Isolation and Structural Elucidation of Novel Mycosporine‐Like Amino Acids as Alarm Cues in the Defensive Ink Secretion of the Sea Hare Aplysia californica. Helvetica Chimica Acta, 94: 1012-1018. doi:10.1002/hlca.201100117.
[Orfanoudaki2019] Orfanoudaki, M., Hartmann, A., Karsten, U. and Ganzera, M. (2019). Chemical profiling of mycosporine‐like amino acids in twenty‐three red algal species. Journal of Phycology, 55: 393-403. doi:10.1111/jpy.12827.