dbus = distributed data bus
It is yet another lightweight, versatile data bus that transfers/transforms pipeline data between plugins.
dbus works by building a DAG of structured data out of the different plugins: from data input, via optional filters, to the output.
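As a rough illustration only (not the actual dbus API), the three plugin roles in such a DAG could be modeled in Go along these lines; the names Packet, Input, Filter and Output are assumptions for this sketch:

```go
// Sketch only: hypothetical plugin interfaces for an input -> filter -> output DAG.
// The real dbus plugin API may differ; all names here are illustrative.
package dbus

// Packet is one unit of structured data flowing through the bus.
type Packet struct {
	Topic   string
	Payload []byte
}

// Input produces packets, e.g. from a MySQL binlog or a Kafka topic.
type Input interface {
	Run(out chan<- *Packet) error
	Stop()
}

// Filter optionally transforms or drops packets in flight.
type Filter interface {
	Filter(in <-chan *Packet, out chan<- *Packet) error
}

// Output delivers packets to a sink, e.g. Kafka or Elasticsearch.
type Output interface {
	Run(in <-chan *Packet) error
}
```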
Similar projects
- logstash
- flume
- nifi
- camel
- beats
- kettle
- zapier
- google cloud dataflow
- canal
- storm
- yahoo pipes (dead)
dbus is not yet at 1.0. We're writing more tests, fixing bugs, and working on TODOs.
- MySQL binlog dispatcher
- multi-DC Kafka mirror
dbus supports powerful and scalable directed graphs of data routing, transformation and system mediation logic.
- Designed for extension
- plugin architecture
- build your own plugins and more (see the plugin sketch after this feature list)
- enables rapid development and effective testing
- Data Provenance
- track dataflow from beginning to end
- visualized dataflow
- rich metrics feed into tsdb
- online manual mediation of the dataflow
- RESTful API
- monitoring with alert
- Distributed Deployment
- shard/balance/auto rebalance
- linear scale
- Delivery Guarantee
- loss tolerant
- high throughput vs low latency
- back pressure
- Robustness
- race conditions detected
- edge cases fully covered
- network jitter tested
- failures of dependent components tested
- Systemic Quality
- hot reload
- dry-run throughput: 1.9M packets/s
- Cluster Support
- modelling borrowed from helix+kafka controller
- currently only leader/standby with sharding, without replicas
- easy to write a distributed plugin
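To make "build your own plugins" concrete, here is a minimal sketch of a custom input plugin written against the hypothetical interfaces sketched earlier; CounterInput is illustrative and not part of the documented dbus API:

```go
// Sketch only: a trivial input plugin emitting a counter, built against the
// hypothetical Input/Packet types sketched above. Registration and lifecycle
// details in the real dbus plugin framework may differ.
package dbus

import (
	"fmt"
	"time"
)

// CounterInput emits one packet per second until stopped.
type CounterInput struct {
	quit chan struct{}
}

func NewCounterInput() *CounterInput {
	return &CounterInput{quit: make(chan struct{})}
}

func (c *CounterInput) Run(out chan<- *Packet) error {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for i := 0; ; i++ {
		select {
		case <-c.quit:
			return nil
		case <-ticker.C:
			// A bounded out channel naturally provides back pressure:
			// this send blocks when downstream plugins fall behind.
			out <- &Packet{Topic: "counter", Payload: []byte(fmt.Sprintf("%d", i))}
		}
	}
}

func (c *CounterInput) Stop() { close(c.quit) }
```

A bounded channel between stages is one simple way the back pressure listed above could propagate from a slow output back to the input.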
To start using dbus, install Go and run go get:
$ go get -u github.com/funkygao/dbus
Please find sample config files in the etc/ directory.
$ $GOPATH/bin/dbusd -conf $myfile
dbus uses zookeeper for sharding/balance/election.
More plugins are listed under dbus-plugin.
- MysqlbinlogInput
- KafkaInput
- MockInput
- StreamInput
- MysqlbinlogFilter
- MockFilter
- KafkaOutput
- ESOutput
- MockOutput
- StreamOutput
- KafkaOutput async mode with batch=1024/500ms, ack=WaitForAll
- KafkaOutput retry=3, retry.backoff=350ms
- MySQL binlog positioner commits every 1s, channel buffer 100
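Assuming the Kafka output is built on the Shopify/sarama Go client (an assumption for this sketch, not stated above), the KafkaOutput defaults listed here would map roughly to a producer configuration like this:

```go
// Sketch only: a sarama producer config mirroring the KafkaOutput defaults
// listed above (batch=1024/500ms, ack=WaitForAll, retry=3, backoff=350ms).
package main

import (
	"log"
	"time"

	"github.com/Shopify/sarama"
)

func newProducer(brokers []string) (sarama.AsyncProducer, error) {
	cfg := sarama.NewConfig()
	cfg.Producer.RequiredAcks = sarama.WaitForAll         // ack=WaitForAll
	cfg.Producer.Flush.Messages = 1024                    // batch up to 1024 messages
	cfg.Producer.Flush.Frequency = 500 * time.Millisecond // or flush every 500ms
	cfg.Producer.Retry.Max = 3                            // retry=3
	cfg.Producer.Retry.Backoff = 350 * time.Millisecond   // retry.backoff=350ms
	return sarama.NewAsyncProducer(brokers, cfg)
}

func main() {
	// Real code would also drain p.Errors() and p.Successes() as configured.
	p, err := newProducer([]string{"localhost:9092"})
	if err != nil {
		log.Fatal(err)
	}
	defer p.Close()
}
```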
- Is it totally data loss tolerant?
  If a binlog event exceeds 1MB, it will be discarded (lost).
- dbus vs canal
  - canal has no Delivery Guarantee
  - canal has no Data Provenance
  - canal has no integration with kafka
  - canal supports only a hot standby deployment mode, while we need sharded load
  - dbus is a dataflow engine, while canal only supports a MySQL binlog pipeline
- dbus vs logstash
  - logstash has a better ecosystem
  - dbus is cluster aware and provides delivery guarantee and data provenance
- Is split brain possible?
  Yes. For example, consider 3 participants with participant 1 as the leader. Then 1 is network partitioned and its zk session expires; [2, 3] observe this event and re-elect 2 as the new leader. Before 1 regains a new zk session, [1] and [2] are both leaders. If [1] and [2] both observe resource changes, they will both rebalance the cluster.
  dbus uses an epoch to solve this issue (a minimal sketch follows after this FAQ).
- What happens if zookeeper crashes?
  dbus continues to work, but acks cannot be persisted.
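As referenced in the split-brain answer above, here is a minimal sketch (not dbus's actual implementation) of epoch-based fencing: each newly elected leader carries a larger epoch, and participants reject rebalance commands whose epoch is older than the newest one they have seen, so a deposed, partitioned leader can no longer trigger a rebalance:

```go
// Sketch only: an epoch (fencing token) check, illustrating how a stale
// leader's rebalance commands can be rejected after a new leader is elected.
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrStaleEpoch = errors.New("rebalance rejected: stale leader epoch")

// Participant remembers the highest leader epoch it has seen so far.
type Participant struct {
	mu        sync.Mutex
	lastEpoch int64
}

// Rebalance accepts a command only if its epoch is at least as new as the
// highest epoch seen; commands from a deposed leader are rejected.
func (p *Participant) Rebalance(epoch int64) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if epoch < p.lastEpoch {
		return ErrStaleEpoch
	}
	p.lastEpoch = epoch
	return nil
}

func main() {
	p := &Participant{}
	fmt.Println(p.Rebalance(1)) // old leader, epoch 1: accepted (<nil>)
	fmt.Println(p.Rebalance(2)) // new leader, epoch 2: accepted (<nil>)
	fmt.Println(p.Rebalance(1)) // partitioned old leader retries: stale epoch error
}
```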