DAGrunner is a Directed Acyclic Graph (DAG) runner, designed to keep a project's graph definition (typically in native networkx format) clearly separate from the way that graph is executed. DAGrunner offers several schedulers for executing the graph, but it keeps these operational concerns apart from the scientific configuration or recipe, i.e. the graph itself. DAGrunner provides convenient scheduling options today, yet the scientific configuration does not depend on them: it can persist through future changes of technology or tooling, whether or not DAGrunner is used.
DAGrunner takes advantage of the native markdown rendering support provided by GitHub. To that end, all DAGrunner documentation resides in markdown files.
See DAGrunner API
(C) Crown Copyright, Met Office. All rights reserved.
This file is part of 'DAGrunner' and is released under the BSD 3-Clause license. See LICENSE in the root of the repository for full licensing details.
The package is pip installable.
pip install .
(uninstall: pip uninstall dagrunner)
This also makes an executable script available on the PATH: dagrunner-execute-graph
usage: dagrunner-execute-graph [-h] [--scheduler SCHEDULER] [--num-workers NUM_WORKERS] [--profiler-filepath PROFILER_FILEPATH] [--dry-run] [--verbose] networkx-graph
See dagrunner-execute-graph --help for more information.
See docs/demo.ipynb
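
To illustrate the separation between graph definition and execution described at the top of this document, here is a minimal sketch that does not use the DAGrunner API at all. It assumes a plain networkx graph whose nodes carry a `call` attribute; that attribute name and the `run_serially` helper are hypothetical, chosen only for this illustration.

```python
import networkx as nx

# Scientific configuration: a plain networkx graph with no runner-specific types.
# The "call" node attribute is a hypothetical convention for this sketch only.
graph = nx.DiGraph()
graph.add_node("load", call=lambda: 3)
graph.add_node("double", call=lambda x: 2 * x)
graph.add_node("square", call=lambda x: x * x)
graph.add_node("combine", call=lambda a, b: a + b)
graph.add_edges_from(
    [
        ("load", "double"),
        ("load", "square"),
        ("double", "combine"),
        ("square", "combine"),
    ]
)


def run_serially(graph):
    """Execute each node after all of its predecessors have produced a result."""
    if not nx.is_directed_acyclic_graph(graph):
        raise ValueError("graph must be acyclic")
    results = {}
    for node in nx.topological_sort(graph):
        # Sort predecessors by name so the argument order is deterministic.
        inputs = [results[pred] for pred in sorted(graph.predecessors(node))]
        results[node] = graph.nodes[node]["call"](*inputs)
    return results


results = run_serially(graph)
print(results["combine"])
```

Because the graph is ordinary networkx data, the serial loop above could be swapped for any other executor without touching the graph definition.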
DAGrunner concerns itself with graph execution and does not strictly require processing modules (plugins) to take any particular form. That is, you may or may not choose to use or subclass the plugins provided by DAGrunner. For convenience, however, DAGrunner does define some plugins, which fall into two broad categories: abstract plugins, and plugins that can be used as they are.
See here for more information.
The dagrunner-execute-graph script exposes a scheduler argument for specifying the preferred scheduler. DAGrunner provides a layer of abstraction over schedulers, so that a scheduler can be selected to suit the requirement.
The available schedulers range from dask and ray to our own in-house asynchronous multiprocessing scheduler (built upon the multiprocessing library). See the command help for further details.
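
The idea of a scheduler abstraction can be sketched with the standard library alone. The code below is not DAGrunner's implementation: the scheduler names (`serial`, `threaded`), the `SCHEDULERS` registry, the `execute` function and the dictionary graph layout are all assumptions made for illustration. It only shows how interchangeable schedulers can sit behind one selection argument.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical graph for this sketch: node -> (callable, predecessor names).
GRAPH = {
    "a": (lambda: 1, ()),
    "b": (lambda a: a + 1, ("a",)),
    "c": (lambda a: a * 10, ("a",)),
    "d": (lambda b, c: b + c, ("b", "c")),
}


def _sorter(graph):
    """Build a topological sorter from the predecessor lists."""
    return TopologicalSorter({node: preds for node, (_, preds) in graph.items()})


def run_serial(graph, num_workers=1):
    """Run nodes one at a time in topological order (num_workers is ignored)."""
    results = {}
    for node in _sorter(graph).static_order():
        func, preds = graph[node]
        results[node] = func(*(results[p] for p in preds))
    return results


def run_threaded(graph, num_workers=2):
    """Run each batch of ready nodes in a thread pool, batch by batch."""
    results = {}
    sorter = _sorter(graph)
    sorter.prepare()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        while sorter.is_active():
            futures = {
                node: pool.submit(
                    graph[node][0], *(results[p] for p in graph[node][1])
                )
                for node in sorter.get_ready()
            }
            for node, future in futures.items():
                results[node] = future.result()
                sorter.done(node)
    return results


# Hypothetical registry: each name maps to an interchangeable scheduler.
SCHEDULERS = {"serial": run_serial, "threaded": run_threaded}


def execute(graph, scheduler="serial", num_workers=1):
    """Look up the scheduler by name and run the graph with it."""
    try:
        run = SCHEDULERS[scheduler]
    except KeyError:
        raise ValueError(f"unknown scheduler: {scheduler!r}") from None
    return run(graph, num_workers=num_workers)


serial = execute(GRAPH, scheduler="serial")
threaded = execute(GRAPH, scheduler="threaded", num_workers=2)
print(serial == threaded, serial["d"])
```

Both schedulers receive the same graph and return the same results, so the choice between them is purely operational.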
DAGrunner provides a script, dagrunner-logger, for running a TCP server. This enables logging to function across the network. Additionally, it writes logs to an SQLite database to aid real-time monitoring by external tools.
See logger for more information.
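
For readers unfamiliar with network logging, the general pattern of a TCP log server that stores records in SQLite can be sketched with the Python standard library. This is not the dagrunner-logger implementation: the class names, the `logs` table schema and the `demo` function are hypothetical, and the wire format is simply the one used by the standard `logging.handlers.SocketHandler`.

```python
import logging
import logging.handlers
import pickle
import socketserver
import sqlite3
import struct
import threading
import time


class RecordHandler(socketserver.StreamRequestHandler):
    """Read length-prefixed pickled log records, as sent by SocketHandler."""

    def handle(self):
        while True:
            header = self.rfile.read(4)
            if len(header) < 4:
                return  # client closed the connection
            (size,) = struct.unpack(">L", header)
            # Caution: unpickling is only safe on a trusted network.
            fields = pickle.loads(self.rfile.read(size))
            self.server.store(logging.makeLogRecord(fields))


class SQLiteLogServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, db_path):
        super().__init__(address, RecordHandler)
        self._lock = threading.Lock()
        self._db = sqlite3.connect(db_path, check_same_thread=False)
        # Hypothetical schema, chosen for this sketch only.
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS logs (name TEXT, level TEXT, message TEXT)"
        )

    def store(self, record):
        """Insert one log record; the lock serialises access across threads."""
        with self._lock:
            self._db.execute(
                "INSERT INTO logs VALUES (?, ?, ?)",
                (record.name, record.levelname, record.getMessage()),
            )
            self._db.commit()

    def rows(self):
        with self._lock:
            query = "SELECT name, level, message FROM logs"
            return self._db.execute(query).fetchall()


def demo():
    """Start a server on a free local port, log once over TCP, return the rows."""
    server = SQLiteLogServer(("127.0.0.1", 0), ":memory:")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address

    logger = logging.getLogger("worker")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.handlers.SocketHandler(host, port)
    logger.addHandler(handler)
    logger.info("node %s finished", "combine")
    logger.removeHandler(handler)
    handler.close()

    # Wait briefly for the server thread to store the record.
    deadline = time.monotonic() + 5
    while not server.rows() and time.monotonic() < deadline:
        time.sleep(0.05)
    rows = server.rows()
    server.shutdown()
    server.server_close()
    return rows


rows = demo()
print(rows)
```

An external monitoring tool would then only need read access to the database file to follow progress, independently of the processes doing the work.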