Propinquity is a make-based pipeline for constructing a synthetic tree of life. See https://peerj.com/articles/3058/ for a description and comparison of its performance to the previous tool used to build the Open Tree of Life's summary tree.
It takes as input a collection of phylogenetic trees from the Open Tree of Life datastore and a local copy of the Open Tree of Life Taxonomy. See the collections documentation for input on putting trees into collections.
If PROPINQUITY_OUT_DIR
is in your environment when you build with
propinquity, then that directory will be used as an output for
the synthetic tree and all of the other artifacts.
If that option is used, then the Makefile
will expect the output
directory to contain a file called config
that holds your build-specific
configuration settings (see below)
There are 3 small scripts that let you accomplish some common tasks without modifying your environment. These scripts take two arguments: a configuration filepath and an output file path. They copy the configuration file into the correct spot in the output directory, and then trigger the build operation with the appropriate output directory in the env. These scripts are:
bin/build_at_dir.sh cfg out
to call thebin/opentree_rebuild_from_latest.sh
script (which pulls the latest studies and collections from GitHub and then does a complete build)bin/make_at_dir.sh cfg out
which just runs a call tomake
after setting up the env, andbin/clean_at_dir.sh cfg out
which cleans theout
directory
Before you set up other prequisite software, you'll need to initialize
the global config file. The global config file should be placed in your home
directory and called .opentree
.
The global config file contains sections, each of which contain a list of
variables (See INI file format). The
global config file should contain an [opentree]
section, as in the
file config.global.example
:
[opentree]
home = /home/USER/OpenTree
peyotl = %(home)s/peyotl
phylesystem = %(home)s/phylesystem
ott = %(home)s/ott/ott2.9draft12/
collections = %(home)s/collections
If you do install multiple packages under the same parent directory
(as above),
you can define a variable such as home
to point to the parent directory.
Then you can reference that directory by writing %(home)s
in other
variables in the same section.
You can initialize the global config file by editing a copy of
globalconfig.example
and then moving it to your home directory:
$ cp config.global.example config.global
... edit config.global here ...
$ mv config.global ~/.opentree
Before running propinquity, you'll also need to initialize a synthesis
config
file that is expected to exist in your PROPINQUITY_OUT_DIR. The
default PROPINQUITY_OUT_DIR is the current directory. So the default location
is simply ./config
.
The synthesis config file does not need to contain the [opentree] section. However, variables set in the synthesis config file will override any values set in the global config file.
If you do NOT want to build to an output directory (see above), then you'll need the
configuration file to be called config
in the top or your propinquity directory.
You can do this by copying an example config file:
$ cp config.example config
If you are using an output file, you can simply use:
$ cp config.example "${PROPINQUITY_OUT_DIR}/config"
(if you are calling make from your command line) or
$ cp config.example myconfig
(if you going to use one of the bin/build_at_dir.sh myconfig ${PROPINQUITY_OUT_DIR}
invocations
mentioned above).
-
A local version of the OTT taxonomy. See http://files.opentreeoflife.org/ott/ with a config entry in
~/.opentree
pointing to it in the[opentree]
section.[opentree] ... ott = %(home)s/ott/ott2.9draft12 ...
-
peyotl should be downloaded and installed with a config entry in
~/.opentree
pointing to it:[opentree] ... peyotl = %(home)s/peyotl ...
Note that (as of 2015-Oct-10) the master branch of peyotl on mtholder's GitHub page (not the Open Tree of Life group). See the link above.
-
A local copy of the phylesystem-1 repo with a config entry in
~/.opentree
pointing to the parent of the shards directory[opentree] ... phylesystem = %(home)s/phylesystem ...
The actual phylesystem-1
repo cloned from git should be in a
directory {opentree.phylesystem}/shards/phylesystem-1
, where
{opentree.phylesystem}
is the directory referred to above.
-
A local copy of the collections-1 repo with a config entry in
~/.opentree
pointing to the parent of the shards directory[opentree] ... collections = %(home)s/collections ...
The actual collections-1
repo cloned from git should be in a
directory {opentree.collections}/shards/collections-1
, where
{opentree.collections}
is the directory referred to above.
See the instructions for installing otcetera.
A short version (which might work) is to do:
$ cd otcetera
$ ./bootstrap.sh
$ ./configure --prefix=$HOME/local
$ make install
Now set your PATH to include $HOME/local/bin.
-
md5sum
Not installed by default on OS X. If using homebrew, brew install md5sha1sum
$ cat config
[taxonomy]
cleaning_flags = major_rank_conflict,major_rank_conflict_inherited,environmental,unclassified_inherited,unclassified,viral,barren,not_otu,incertae_sedis,incertae_sedis_inherited,hidden,unplaced,unplaced_inherited,was_container,inconsistent,hybrid,merged,extinct
[synthesis]
collections = opentreeoflife/plants opentreeoflife/metazoa opentreeoflife/fungi opentreeoflife/safe-microbes
root_ott_id = 93302
$ cat ~/.opentree
[opentree]
home = /home/USER/OpenTree
peyotl = %(home)s/peyotl
phylesystem = %(home)s/phylesystem
ott = %(home)s/ott/ott2.9draft12/
collections = %(home)s/collections
$ bin/config_checker.py opentree.peyotl config ~/.opentree
/home/USER/OpenTree/peyotl
$ ls $(bin/config_checker.py opentree.peyotl config ~/.opentree)
...
peyotl
...
setup.py
...
$ bin/config_checker.py opentree.ott config ~/.opentree
/home/USER/OpenTree/ott/ott2.9draft12/
$ ls $(bin/config_checker.py opentree.ott config ~/.opentree)
...
taxonomy.tsv
...
$ bin/config_checker.py opentree.phylesystem config ~/.opentree
/home/USER/OpenTree/phylesystem
$ ls $(bin/config_checker.py opentree.phylesystem config ~/.opentree)/shards/phylesystem-1
next_study_id.json README.md study/
After you have installed the software and tweaked the setting in your config
file as described above, you may run synthesis just by typing
$ make
$ make check
If you have Chameleon installed, then you can run
$ make && make check && make html
to create a series of index.html
files in the output directories that document and
summarize the outputs produced by the pipeline.
These html files are created using templates and information from JSON files which
are produced either by the make
run or using information gleaned from existing
outputs.
In the latter case, the calculated summaries used in the templating step are also
stored in a corresponding index.json
in the same directory as the index.html
to make it easy for you to get the data needed to summarize the outputs in a
different manner.
The pipeline produces artifacts at each step of the pipeline. Click on any link below to see more information about the output files in these directories.
- phylo_induced_taxonomy
- phylo_snapshot
- cleaned_ott
- phylo_input
- phylo_snapshot
- cleaned_phylo
- exemplified_phylo
- subproblems
- subproblem_solutions
- grafted_solution
- labelled_supertree
- annotated_supertree
The final output of synthesis consists of
- labelled_supertree/labelled_supertre.tre
- annotated_supertree/annotations.json
For a tree with only tips that occur in study trees:
- grafted_solution/grafted_solution.tre
For versions of the above trees that include both OTT ids and taxon names,
- labelled_supertree/labelled_supertree_ottnames.tre
- grafted_supertree/grafted_supertree_ottnames.tre
These optional artifacts can be built with the command
make extra
.
A cartoon of the pipeline with peyotl tools in pink and otcetera tools in blue. Input from other components of Open Tree of Life are ovals. Settings of the propinquity config file are in diamonds.
The products of this repo are contents of directories, and the rectangles show these directories. Don't take this too literally. In some cases, there are multiple targets created with different tools in a subdirectory. In these cases this sketch just shows the most interesting tool.
cd exemplified_phylo
otc-displayed-stats ../exemplified_phylo/regraft_cleaned_ott.tre ../labelled_supertree/labelled_supertree.tre $(cat nonempty_trees.txt)
cd exemplified_phylo
otc-displayed-stats ../exemplified_phylo/regraft_cleaned_ott.tre draftversion4.tre $(cat nonempty_trees.txt)
-
Follow the setup steps mentioned above to install the prerequisites
-
It is helpful to have a
~/.opentree
file that has only local filepaths for the prerequisites. for instance on MTH's machine this file contains just:
[opentree] home = /home/opentree/Documents peyotl = %(home)s/peyotl phylesystem = %(home)s/phylesystem ott = %(home)s/ott/ott2.9draft12/ collections = %(home)s/phylesystem
-
Edit
config.opentree.synth
to increment thesynth_id
to the next version. -
git commit -m "version X.XX of synth tree" config.opentree.synth
will create a git commit of the configuration to assist in provenance tracking of the build. -
To build the tree at directory
../opentreeX.XX
use:
./bin/build_at_dir.sh config.opentree.synth ../opentreeX.XX
That script uses bin/opentree_rebuild_from_latest.sh
to configure the environment
to run bin/rebuild_from_latest.sh
without you having
to modify your shell's environment.
To run the monophyly tests that are a part of the germinator repo and create the output documentation with links to those tests, you need to have 2 environmental variables in your env when you build the tree.
MONOPHYLY_TEST_CSV_FILE
should be the absolute path to themonophyly.csv
on your local machine.MONOPHYLY_TEST_SOURCE_NAME
is the URL of file on github (this is just used in making links in the documentation generation)
For example:
MONOPHYLY_TEST_CSV_FILE=/home/user/germinator/taxa/monophyly.csv
MONOPHYLY_TEST_SOURCE_NAME=https://github.com/OpenTreeOfLife/germinator/blob/master/taxa/monophyly.csv