Skip to content

leokolln/cpp-taskflow

 
 

Repository files navigation

Cpp-Taskflow

Codacy Badge Linux Build Status Windows Build status Wiki Cite

A fast C++ header-only library to help you quickly write parallel programs with complex task dependencies

Why Cpp-Taskflow?

Cpp-Taskflow is by far faster, more expressive, and easier for drop-in integration than existing task programming libraries such as OpenMP Tasking and Intel TBB FlowGraph in handling complex parallel workloads.

Cpp-Taskflow lets you quickly implement task decomposition strategies that incorporate both regular and irregular compute patterns, together with an efficient work-stealing scheduler to optimize your multithreaded performance.

Without Cpp-Taskflow With Cpp-Taskflow

Cpp-Taskflow has a unified interface for both static tasking and dynamic tasking, allowing users to quickly master our parallel task programming model in a natural idiom.

Static Tasking Dynamic Tasking

Cpp-Taskflow supports conditional tasking for you to implement cyclic and dynamic control flows that are otherwise difficult to do with existing task programming frameworks.

Conditional Tasking

Cpp-Taskflow is composable. You can create large parallel graphs through composition of modular and reusable blocks that are easier to optimize.

Graph Composition

Cpp-Taskflow let you easily monitor the thread activities and analyze their programs' performance through chrome://tracing.

We are committed to support trustworthy developments for both academic and industrial research projects in parallel computing. Check out Who is Using Cpp-Taskflow and what our users say:

  • "Cpp-Taskflow is the cleanest Task API I've ever seen." damienhocking
  • "Cpp-Taskflow has a very simple and elegant tasking interface. The performance also scales very well." totalgee
  • "Cpp-Taskflow lets me handle parallel processing in a smart way." Hayabusa
  • "Best poster award for open-source parallel programming library." Cpp Conference 2018
  • "Second Prize of Open-source Software Competition." ACM Multimedia Conference 2019

See a quick presentation and visit the documentation to learn more about Cpp-Taskflow. Technical details can be referred to our IEEE IPDPS19 paper.

Table of Contents

Get Started with Cpp-Taskflow

The following example simple.cpp shows the basic Cpp-Taskflow API you need in most applications.

#include <taskflow/taskflow.hpp>  // Cpp-Taskflow is header-only

int main(){
  
  tf::Executor executor;
  tf::Taskflow taskflow;

  auto [A, B, C, D] = taskflow.emplace(
    [] () { std::cout << "TaskA\n"; },               //  task dependency graph
    [] () { std::cout << "TaskB\n"; },               // 
    [] () { std::cout << "TaskC\n"; },               //          +---+          
    [] () { std::cout << "TaskD\n"; }                //    +---->| B |-----+   
  );                                                 //    |     +---+     |
                                                     //  +---+           +-v-+ 
  A.precede(B);  // A runs before B                  //  | A |           | D | 
  A.precede(C);  // A runs before C                  //  +---+           +-^-+ 
  B.precede(D);  // B runs before D                  //    |     +---+     |    
  C.precede(D);  // C runs before D                  //    +---->| C |-----+    
                                                     //          +---+          
  executor.run(taskflow).wait();

  return 0;
}

Compile and run the code with the following commands:

~$ g++ simple.cpp -std=c++1z -O2 -lpthread -o simple
~$ ./simple
TaskA
TaskC  <-- concurrent with TaskB
TaskB  <-- concurrent with TaskC
TaskD

Create a Taskflow Application

Cpp-Taskflow defines a very expressive API to create task dependency graphs. Most applications are developed through the following three steps.

Step 1: Create a Taskflow

Create a taskflow object to build a task dependency graph:

tf::Taskflow taskflow;

A task is a callable object for which std::invoke is applicable. Use the method emplace to create a task:

tf::Task A = taskflow.emplace([](){ std::cout << "Task A\n"; });

You can create multiple tasks at one time:

auto [A, B, C, D] = taskflow.emplace(
  [] () { std::cout << "Task A\n"; },
  [] () { std::cout << "Task B\n"; },
  [] () { std::cout << "Task C\n"; },
  [] () { std::cout << "Task D\n"; }
);

Step 2: Define Task Dependencies

You can add dependency links between tasks to enforce one task to run before or after another.

A.precede(B);  // A runs before B.

Step 3: Execute a Taskflow

To execute a taskflow, you need to create an executor. An executor manages a set of worker threads to execute a taskflow through an efficient work-stealing algorithm.

tf::Executor executor;

The executor provides a rich set of methods to run a taskflow. You can run a taskflow multiple times, or until a stopping criteria is met. These methods are non-blocking with a std::future return to let you query the execution status. Executor is thread-safe.

executor.run(taskflow);       // runs the taskflow once
executor.run_n(taskflow, 4);  // runs the taskflow four times

// keeps running the taskflow until the predicate becomes true
executor.run_until(taskflow, [counter=4](){ return --counter == 0; } );

You can call wait_for_all to block the executor until all associated taskflows complete.

executor.wait_for_all();  // block until all associated tasks finish

Notice that the executor does not own any taskflow. It is your responsibility to keep a taskflow alive during its execution, or it can result in undefined behavior. In most applications, you need only one executor to run multiple taskflows each representing a specific part of your parallel decomposition.

Dynamic Tasking

Another powerful feature of Taskflow is dynamic tasking. Dynamic tasks are those tasks created during the execution of a taskflow. These tasks are spawned by a parent task and are grouped together to a subflow graph. To create a subflow for dynamic tasking, emplace a callable with one argument of type tf::Subflow.

// create three regular tasks
tf::Task A = tf.emplace([](){}).name("A");
tf::Task C = tf.emplace([](){}).name("C");
tf::Task D = tf.emplace([](){}).name("D");

// create a subflow graph (dynamic tasking)
tf::Task B = tf.emplace([] (tf::Subflow& subflow) {
  tf::Task B1 = subflow.emplace([](){}).name("B1");
  tf::Task B2 = subflow.emplace([](){}).name("B2");
  tf::Task B3 = subflow.emplace([](){}).name("B3");
  B1.precede(B3);
  B2.precede(B3);
}).name("B");
            
A.precede(B);  // B runs after A 
A.precede(C);  // C runs after A 
B.precede(D);  // D runs after B 
C.precede(D);  // D runs after C 

By default, a subflow graph joins its parent node. This ensures a subflow graph finishes before the successors of its parent task. You can disable this feature by calling subflow.detach(). For example, detaching the above subflow will result in the following execution flow:

// create a "detached" subflow graph (dynamic tasking)
tf::Task B = tf.emplace([] (tf::Subflow& subflow) {
  tf::Task B1 = subflow.emplace([](){}).name("B1");
  tf::Task B2 = subflow.emplace([](){}).name("B2");
  tf::Task B3 = subflow.emplace([](){}).name("B3");
  B1.precede(B3);
  B2.precede(B3);

  // detach the subflow to form a parallel execution line
  subflow.detach();
}).name("B");

A subflow can be nested or recursive. You can create another subflow from the execution of a subflow and so on.

tf::Task A = tf.emplace([] (tf::Subflow& sbf) {
  std::cout << "A spawns A1 & subflow A2\n";
  tf::Task A1 = sbf.emplace([] () { 
    std::cout << "subtask A1\n"; 
  }).name("A1");

  tf::Task A2 = sbf.emplace([] (tf::Subflow& sbf2) {
    std::cout << "A2 spawns A2_1 & A2_2\n";
    tf::Task A2_1 = sbf2.emplace([] () { 
      std::cout << "subtask A2_1\n"; 
    }).name("A2_1");

    tf::Task A2_2 = sbf2.emplace([] () { 
      std::cout << "subtask A2_2\n"; 
    }).name("A2_2");
    A2_1.precede(A2_2);
  }).name("A2");

  A1.precede(A2);
}).name("A");

Conditional Tasking

Taskflow supports conditional tasking for users to implement dynamic and cyclic control flows. You can create highly versatile and efficient parallel patterns through condition tasks.

Step 1: Create a Condition Task

A condition task evalutes a set of instructions and returns an integer index of the next immediate successor to execute. The index is defined with respect to the order of its successor construction.

tf::Task init = tf.emplace([](){ }).name("init");
tf::Task stop = tf.emplace([](){ }).name("stop");

// creates a condition task that returns 0 or 1
tf::Task cond = tf.emplace([](){
  std::cout << "flipping a coin\n";
  return rand() % 2;
}).name("cond");

// creates a feedback loop
init.precede(cond);
cond.precede(cond, stop);  // cond--0-->cond, cond--1-->stop

executor.run(tf).wait();

If the return value from cond is 0, it loops back to itself, or otherwise to stop. Cpp-Taskflow terms the preceding link from a condition task a weak dependency (dashed lines above). Others are strong depedency (solid lines above).

Step 2: Scheduling Rules for Condition Tasks

When you submit a taskflow to an executor, the scheduler starts with tasks of zero dependency (both weak and strong dependencies) and continues to execute successive tasks whenever strong dependencies are met. However, the scheduler skips this rule for a condition task and jumps directly to its successor indexed by the return value.

It is users' responsibility to ensure a taskflow is properly conditioned. Top things to avoid include no source tasks to start with and task race. The figure shows common pitfalls and their remedies. In the risky scenario, task X may not be raced if P and M is exclusively branching to X.

A good practice for avoiding mistakes of conditional tasking is to infer the execution flow of your graphs based on our scheduling rules. Make sure there is no task race.

Taskflow Composition

A powerful feature of tf::Taskflow is composability. You can create multiple task graphs from different parts of your workload and use them to compose a large graph through the composed_of method.

tf::Taskflow f1, f2;

auto [f1A, f1B] = f1.emplace( 
  []() { std::cout << "Task f1A\n"; },
  []() { std::cout << "Task f1B\n"; }
);
auto [f2A, f2B, f2C] = f2.emplace( 
  []() { std::cout << "Task f2A\n"; },
  []() { std::cout << "Task f2B\n"; },
  []() { std::cout << "Task f2C\n"; }
);
auto f1_module_task = f2.composed_of(f1);

f2A.precede(f1_module_task);
f2B.precede(f1_module_task);
f1_module_task.precede(f2C);

Similarly, composed_of returns a task handle and you can use precede to create dependencies. You can compose a taskflow from multiple taskflows and use the result to compose a larger taskflow and so on.

Visualize a Taskflow Graph

You can dump a taskflow in GraphViz format using the method dump.

tf::Taskflow taskflow;

tf::Task A = taskflow.emplace([] () {}).name("A");
tf::Task B = taskflow.emplace([] () {}).name("B");
tf::Task C = taskflow.emplace([] () {}).name("C");
tf::Task D = taskflow.emplace([] () {}).name("D");
tf::Task E = taskflow.emplace([] () {}).name("E");

A.precede(B, C, E); 
C.precede(D);
B.precede(D, E); 

taskflow.dump(std::cout);

There are a number of free GraphViz tools you could find online to visualize your Taskflow graph.

// Taskflow with five tasks and six dependencies
digraph Taskflow {
  rankdir="TB"
  "A" -> "B"
  "A" -> "C"
  "A" -> "E"
  "B" -> "D"
  "B" -> "E"
  "C" -> "D"
}

When you have dynamic tasks (subflows), you cannot simply use the dump method because it displays only the static portion. Instead, you need to execute the graph first to spawn dynamic tasks.

tf::Executor executor;
tf::Taskflow taskflow;

tf::Task A = taskflow.emplace([](){}).name("A");

// create a subflow of two tasks B1->B2
tf::Task B = taskflow.emplace([] (tf::Subflow& subflow) {
  tf::Task B1 = subflow.emplace([](){}).name("B1");
  tf::Task B2 = subflow.emplace([](){}).name("B2");
  B1.precede(B2);
}).name("B");

A.precede(B);

executor.run(tf).wait();  // run the taskflow to spawn subflows
tf.dump(std::cout);       // dump the graph including dynamic tasks

Monitor Thread Activities

Understanding thread activities is very important for performance analysis. Cpp-Taskflow provides a default observer of type tf::ExecutorObserver that lets users observe when a thread starts or stops participating in task scheduling.

auto observer = executor.make_observer<tf::ExecutorObserver>();

When you are running a task dependency graph, the observer will automatically record the start and end timestamps of each executed task. You can dump the entire execution timelines into a JSON file.

executor.run(taskflow1);               // run a task dependency graph 1
executor.run(taskflow2);               // run a task dependency graph 2
executor.wait_for_all();               // block until all tasks finish

std::ofstream ofs("timestamps.json");
observer->dump(ofs);                   // dump the timeline to a JSON file

You can open the chrome browser to visualize the execution timelines through the chrome://tracing developer tool. In the tracing view, click the Load button to read the JSON file. You shall see the tracing graph.

Each task is given a name of i_j where i is the thread id and j is the task number. You can pan or zoom in/out the timeline to get into a detailed view.

API Reference

The official documentation explains a complete list of Cpp-Taskflow API. Here, we highlight commonly used methods.

Taskflow API

The class tf::Taskflow is the main place to create a task dependency graph. The table below summarizes a list of commonly used methods.

Method Argument Return Description
emplace callables tasks creates a task with a given callable(s)
placeholder none task inserts a node without any work; work can be assigned later
parallel_for beg, end, callable, chunk task pair concurrently applies the callable chunk by chunk to the result of dereferencing every iterator in the range
parallel_for beg, end, step, callable, chunk task pair concurrently applies the callable chunk by chunk to an index-based range with a step size
num_workers none size queries the number of working threads in the pool
dump ostream none dumps the taskflow to an output stream in GraphViz format

emplace/placeholder

You can use emplace to create a task from a target callable.

tf::Task task = tf.emplace([] () { std::cout << "my task\n"; });

When a task cannot be determined beforehand, you can create a placeholder and assign the callable later.

tf::Task A = tf.emplace([](){});
tf::Task B = tf.placeholder();
A.precede(B);
B.work([](){ /* do something */ });

parallel_for

The method parallel_for creates a subgraph that applies the callable to each item in the given range of a container.

auto v = {'A', 'B', 'C', 'D'};
auto [S, T] = tf.parallel_for(
  v.begin(),    // iterator to the beginning
  v.end(),      // iterator to the end
  [] (int i) { 
    std::cout << "parallel " << i << '\n';
  }
);
// add dependencies via S and T.

You can specify a chunk size (default one) in the last argument to force a task to include a certain number of items.

auto v = {'A', 'B', 'C', 'D'};
auto [S, T] = tf.parallel_for(
  v.begin(),    // iterator to the beginning
  v.end(),      // iterator to the end
  [] (int i) { 
    std::cout << "AB and CD run in parallel" << '\n';
  },
  2  // at least two items at a time
);

In addition to iterator-based construction, parallel_for has another overload of index-based loop. The first three argument of this overload indicates starting index, ending index (exclusive), and step size.

// [0, 11) with a step size of 2
auto [S, T] = tf.parallel_for(
  0, 11, 2, 
  [] (int i) {
    std::cout << "parallel_for on index " << i << std::endl;
  }, 
  2  // at least two items at a time
);
// will print 0, 2, 4, 6, 8, 10 (three partitions, {0, 2}, {4, 6}, {8, 10})

You can also do opposite direction with negative step size.

// [10, -1) with a step size of -2
auto [S, T] = tf.parallel_for(
  10, -1, 2, 
  [] (int i) {
    std::cout << "parallel_for on index " << i << std::endl;
  }
);
// will print 10, 8, 6, 4, 2, 0

Task API

Each time you create a task, the taskflow object adds a node to the present task dependency graph and return a task handle to you. A task handle is a lightweight object that defines a set of methods for users to access and modify the attributes of the associated task. The table below summarizes a list of commonly used methods.

Method Argument Return Description
name string self assigns a human-readable name to the task
work callable self assigns a work of a callable object to the task
precede task list self enables this task to run before the given tasks
succeed task list self enables this task to run after the given tasks
num_dependents none size returns the number of dependents (inputs) of this task
num_successors none size returns the number of successors (outputs) of this task
empty none bool returns true if the task points to a graph node or false otherwise
has_work none bool returns true if the task points to a graph node with a callable assigned

name

The method name lets you assign a human-readable string to a task.

A.name("my name is A");

work

The method work lets you assign a callable to a task.

A.work([] () { std::cout << "hello world!"; });

precede

The method precede lets you add a preceding link from self to other tasks.

// A runs before B, C, D, and E
A.precede(B, C, D, E);

succeed

The method succeed lets you add a preceding link from other tasks to self.

// A runs after B, C, D, and E
A.succeed(B, C, D, E);

empty/has_work

A task is empty is it is not associated with any graphs.

tf::Task task;  // assert(task.empty());

A placeholder task is associated with a graph node but has no work assigned yet.

tf::Task task = taskflow.placeholder();  // assert(!task.has_work());

Executor API

The class tf::Executor is used for execution of one or multiple taskflow objects. The table below summarizes a list of commonly used methods.

Method Argument Return Description
Executor N none constructs an executor with N worker threads
run taskflow future runs the taskflow once
run_n taskflow, N future runs the taskflow N times
run_until taskflow, binary predicate future keeps running the taskflow until the predicate becomes true
wait_for_all none none blocks until all running tasks finish
make_observer arguments to forward to user-derived constructor pointer to the observer creates an observer to monitor the thread activities of the executor

Executor

The constructor of tf::Executor takes an unsigned non-zero integer to initialize the executor with N worker threads.

tf::Executor executor(8);  // create an executor of 8 worker threads

The default value uses std::thread::hardware_concurrency to decide the number of worker threads.

run/run_n/run_until

The run series are non-blocking call to execute a taskflow graph. Issuing multiple runs on the same taskflow will automatically synchronize to a sequential chain of executions.

executor.run(taskflow);             // runs a graph once
executor.run_n(taskflow, 5);        // runs a graph five times
executor.run_n(taskflow, my_pred);  // keeps running until the my_pred becomes true
executor.wait_for_all();            // blocks until all tasks finish

The first run finishes before the second run, and the second run finishes before the third run.

System Requirements

To use the latest Cpp-Taskflow, you only need a C++14 compiler.

  • GNU C++ Compiler at least v5.0 with -std=c++14
  • Clang C++ Compiler at least v4.0 with -std=c++14
  • Microsoft Visual Studio Version 15.7 (MSVC++ 19.14); see vcpkg guide

See the C++ compiler support status.

Compile Unit Tests, Examples, and Benchmarks

Cpp-Taskflow uses CMake to build examples and unit tests. We recommend using out-of-source build.

~$ cmake --version  # must be at least 3.9 or higher
~$ mkdir build
~$ cd build
~$ cmake ../
~$ make 

Unit Tests

Cpp-Taskflow uses Doctest for unit tests.

~$ ./unittest/taskflow

Alternatively, you can use CMake's testing framework to run the unittest.

~$ cd build
~$ make test

Examples

The folder example/ contains several examples and is a great place to learn to use Cpp-Taskflow.

Example Description
simple.cpp uses basic task building blocks to create a trivial taskflow graph
debug.cpp inspects a taskflow through the dump method
parallel_for.cpp parallelizes a for loop with unbalanced workload
reduce.cpp performs reduce operations over linear containers
subflow.cpp demonstrates how to create a subflow graph that spawns three dynamic tasks
run_variants.cpp shows multiple ways to run a taskflow graph
composition.cpp demonstrates the decomposable interface of taskflow
observer.cpp demonstrates how to monitor the thread activities in scheduling and running tasks
condition.cpp creates a conditional tasking graph with a feedback loop control flow

Benchmarks

Please visit benchmarks to learn to compile the benchmarks.

Get Involved

Who is Using Cpp-Taskflow?

Cpp-Taskflow is being used in both industry and academic projects to scale up existing workloads that incorporate complex task dependencies.

  • OpenTimer: A High-performance Timing Analysis Tool for Very Large Scale Integration (VLSI) Systems
  • DtCraft: A General-purpose Distributed Programming Systems using Data-parallel Streams
  • Firestorm: Fighting Game Engine with Asynchronous Resource Loaders (developed by ForgeMistress)
  • Shiva: An extensible engine via an entity component system through scripts, DLLs, and header-only (C++)
  • PID Framework: A Global Development Methodology Supported by a CMake API and Dedicated C++ Projects
  • NovusCore: An emulating project for World of Warraft (Wrath of the Lich King 3.3.5a 12340 client build)
  • SA-PCB: Annealing-based Printed Circuit Board (PCB) Placement Tool
  • LPMP: A C++ framework for developing scalable Lagrangian decomposition solvers for discrete optimization problems
  • Heteroflow: A Modern C++ Parallel CPU-GPU Task Programming Library
  • OpenPhySyn: A plugin-based physical synthesis optimization kit as part of the OpenRoad flow

More...

Contributors

Cpp-Taskflow is being actively developed and contributed by the these people. Meanwhile, we appreciate the support from many organizations for our developments.

License

Cpp-Taskflow is licensed under the MIT License.


About

A Modern C++ Parallel Task Programming Library

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C++ 79.3%
  • C 17.1%
  • CMake 2.9%
  • Other 0.7%