Adding scalability to hazelcast when exporting records from zeebe #261
Good question. 🤔 Currently, the exporter can't access this kind of Zeebe configuration. Only during the exporting, the exporter can read the partition id from the record. One solution could be to initialize the ring-buffer lazy. When reading the first record, initialize the ring-buffer for this partition. Another question is, how does the consumer know about the existing partitions to subscribe from?
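The lazy-initialization idea above could look roughly like the following. This is a hypothetical sketch, not the exporter's actual code: the `Buffer` class stands in for Hazelcast's `Ringbuffer<byte[]>` so the sketch runs standalone, and the `"zeebe-" + partitionId` naming convention is an assumption. In the real exporter, the factory lambda would call `hazelcastInstance.getRingbuffer(...)` instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for Hazelcast's Ringbuffer<byte[]>, so the sketch compiles
// without the Hazelcast jar.
class Buffer {
    final String name;
    final List<byte[]> items = new ArrayList<>();

    Buffer(String name) {
        this.name = name;
    }

    void add(byte[] item) {
        items.add(item);
    }
}

class PartitionedExporter {
    private final Map<Integer, Buffer> buffers = new ConcurrentHashMap<>();

    // Lazily create one buffer per partition on first use. In the real
    // exporter this would be hazelcastInstance.getRingbuffer("zeebe-" + id).
    Buffer bufferFor(int partitionId) {
        return buffers.computeIfAbsent(
            partitionId, id -> new Buffer("zeebe-" + id));
    }

    void export(int partitionId, byte[] serializedRecord) {
        bufferFor(partitionId).add(serializedRecord);
    }
}
```

With this shape, the exporter never needs to know the partition count up front: it reads the partition id from each record and creates the buffer on demand.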
We started the code changes and deployed some basic code that creates multiple ring buffers (as many as the Zeebe partitions). The cluster configuration is 2 Zeebe gateways, 8 brokers, 5 Hazelcast instances, 1 ZeeQS, 1 Postgres. The number of Hazelcast instance restarts has decreased, but we still see over-utilised and under-utilised Hazelcast instances. Please find the resource utilisation screenshots attached. Quick question - @saig0 - since the ring buffer is a replicated data structure, should the memory usage of all Hazelcast instances be the same? Some instances show a memory usage of 300 MB while the over-utilised ones show 6 GB.
@saig0 - We progressed and took a separate track. We exported the data to different ring buffers (a 1:1 relationship with partitions) and added ZeeQS replicas, with each ZeeQS instance reading from one ring buffer. We had to make changes in ZeeQS to support this. This improved the import speed, but not by much. When running the Camunda 8 benchmarking tool, we observed that the import speed is 0.05x for 5 partitions, 5 ring buffers, and 5 ZeeQS instances. As the number of partitions increases, we have to add as many ZeeQS instances, which doesn't sound great. The main problem is that with the slow import speed, ZeeQS takes more than a day or two to sync up data from Hazelcast, which leads to other problems like Hazelcast crashes. Any inputs on how we can improve the ZeeQS import speed? Thanks.
I can't give you a definite answer because I never optimized these tools for performance. First, you should measure where these tools spend the most time. It could be on the exporting side (transform to Protobuf, add to the Hazelcast ring-buffer) or on the importing side (read from the Hazelcast ring-buffer, transform to a record DTO, insert into the Postgres database). Depending on the measurement result, you could optimize that part. For example,
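One lightweight way to get the per-stage measurements suggested above is a small timing harness wrapped around each stage. This is a generic sketch (class and stage names are illustrative, not part of the exporter or ZeeQS), but the pattern applies to both sides: wrap the serialize, ring-buffer append, read, and database-insert steps, then compare the averages.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical harness: accumulates total time and call count per named
// stage, so the dominant stage stands out in the report.
class StageTimer {
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Run the given work and attribute its wall-clock time to the stage.
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            totalNanos.computeIfAbsent(stage, s -> new LongAdder()).add(elapsed);
            counts.computeIfAbsent(stage, s -> new LongAdder()).increment();
        }
    }

    long callCount(String stage) {
        LongAdder c = counts.get(stage);
        return c == null ? 0 : c.sum();
    }

    void report() {
        totalNanos.forEach((stage, nanos) -> {
            long n = counts.get(stage).sum();
            System.out.printf("%s: %d calls, avg %.3f ms%n",
                stage, n, nanos.sum() / 1_000_000.0 / n);
        });
    }
}
```

Usage would look like `timer.time("insert-postgres", () -> repository.save(dto))` around each stage, with `report()` called periodically.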
Right, I have been trying to measure the code and troubleshoot which operation(s) are taking time. Given that we would have to modify the import code, I tried executing the
@shahamit I don't know why the test is not working for you. I suggest checking the log output. Maybe something is wrong with the Zeebe Testcontainer or the Hazelcast client.
Continuing the discussion from Slack about the enhancement we want to add to this exporter: being able to scale when the Zeebe engine is configured with multiple partitions. Right now, when a Hazelcast cluster is deployed with a Zeebe cluster of around 8 brokers (with 8 partitions) and 2 gateways, we observed that the Hazelcast nodes crash when the Zeebe cluster is under load. In our discussion we realised that this is because the exporter writes all Zeebe records to a single ring buffer.
This issue is created to discuss the fix for this enhancement: exporting to as many ring buffers as there are Zeebe partitions, so that we get the scalability benefits of Hazelcast.
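One open question with per-partition ring buffers is how a consumer discovers which partitions exist. If the buffers follow a shared naming convention, the consumer could scan object names for the prefix. This is a hedged sketch of that idea: with a real Hazelcast client the names would come from iterating `hazelcastInstance.getDistributedObjects()`; here the name list is passed in directly so the sketch runs standalone, and the `"zeebe-"` prefix is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical discovery helper: extract partition ids from buffer names
// that follow a "<prefix><partitionId>" naming convention.
class PartitionDiscovery {
    static List<Integer> discoverPartitions(Collection<String> objectNames, String prefix) {
        List<Integer> partitions = new ArrayList<>();
        for (String name : objectNames) {
            if (name.startsWith(prefix)) {
                partitions.add(Integer.parseInt(name.substring(prefix.length())));
            }
        }
        Collections.sort(partitions);
        return partitions;
    }
}
```

A consumer could rerun this scan periodically to pick up ring buffers that the exporter created lazily after the consumer started.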
Code question @saig0 - While digging into the exporter code, I tried to convert the `ringbuffer` instance variable to a `Map<partitionId, Ringbuffer>`, but can you please share some insight into how we can find out the number of Zeebe partitions in the exporter code? Do we accept it as part of the configuration (probably not the right way to do it)? Thanks.