[performance] - Remove bulk and streams use cases from UO and TO. Scalability test added. #10138
Conversation
/packit test --labels performance-topic-operator-capacity
@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testCapacityCreateAndUpdateTopics --env=STRIMZI_USE_KRAFT_IN_TESTS=true

❌ Test Summary ❌ TEST_PROFILE: performance ❗ Test Failures ❗
Re-run command:
@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testCapacityCreateAndUpdateTopics --env=STRIMZI_USE_KRAFT_IN_TESTS=true
Hi @see-quick, thanks for working on this.
In order to simulate a busy shared cluster and possibly catch some edge cases, I think we should try to include all 3 kinds of topic events (creations, updates, and deletes) and run them in parallel.
In my custom test, I'm taking the number of events I want to test as input, then dividing it by 3 to get the number of tasks I have to run in parallel (you would have 1-2 spare events that you can simply consume as no-ops, that's fine). Each task executes topic creation, update (partition increase and config change), and deletion serially, as sketched below. Wdyt?
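A minimal Java sketch of that proposal, assuming hypothetical helpers (createTopic, updateTopic, deleteTopic) in place of the real systemtest utilities:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelTopicEvents {

    public static void run(int totalEvents) throws InterruptedException {
        // each task fires 3 events serially: create, update, delete;
        // the 1-2 spare events are simply dropped as no-ops
        int tasks = totalEvents / 3;

        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, Math.min(tasks, 100)));
        List<Callable<Void>> workload = new ArrayList<>();

        for (int i = 0; i < tasks; i++) {
            final String topicName = "perf-topic-" + i;
            workload.add(() -> {
                createTopic(topicName);   // creation event
                updateTopic(topicName);   // partition increase + config change
                deleteTopic(topicName);   // deletion event
                return null;
            });
        }

        executor.invokeAll(workload);     // tasks run in parallel, events within a task serially
        executor.shutdown();
    }

    // Hypothetical stand-ins for the real KafkaTopic helpers
    private static void createTopic(String name) { /* apply a KafkaTopic CR */ }
    private static void updateTopic(String name) { /* patch partitions and config */ }
    private static void deleteTopic(String name) { /* delete the KafkaTopic CR */ }
}
```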
Okay, but that way we would not be able to see the upper bound (i.e., how many KafkaTopics the TO is able to handle during creation and modification). Maybe such information is not so important... And if we perform all three of these operations, what is our termination condition? Do we want to create a specific number of topics (e.g., 1000) and see how the TO performs under different configurations? Which output metrics are then the most important to check? Also, should we execute these tasks incrementally and divide them into batches (i.e., every 100 KafkaTopics) as we do in the capacity test, or should we run all 1000 topics at once?
I think the objective here is not to see the upper bound, but to assess performance on a fixed number of events. For example, I'm running the test with the following batches of events: 50, 100, 150, ..., 1000. That way you see how it scales, by simply putting the end-to-end reconciliation time (we only care about this one here) on a line graph, and you can compare it with a previous implementation on the very same graph. By e2e reconciliation time in seconds I mean the time from creation/update to Ready, or the deletion duration. This is what an example graph looks like (note: we only need the numbers; you can then generate the graph with whatever tool you prefer):
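For illustration, a rough sketch of how those per-batch data points could be collected (waitForAllTopicsReconciled is a hypothetical stand-in for the real readiness/deletion polling):

```java
import java.time.Duration;
import java.time.Instant;

public class E2eReconciliationTiming {

    public static void main(String[] args) {
        for (int events = 50; events <= 1000; events += 50) {  // 50, 100, 150, ..., 1000
            Instant start = Instant.now();

            // fire the batch of creations/updates/deletions here, then block
            // until every KafkaTopic reports Ready (or is fully deleted)
            waitForAllTopicsReconciled(events);

            Duration e2e = Duration.between(start, Instant.now());
            // one data point per batch size; put these on a line graph to
            // compare implementations
            System.out.printf("%d events -> %d s%n", events, e2e.toSeconds());
        }
    }

    // Hypothetical: polls KafkaTopic statuses until all are Ready/deleted
    private static void waitForAllTopicsReconciled(int expectedEvents) { /* ... */ }
}
```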
@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testPerformanceInFixedSizeOfEvents --env=STRIMZI_USE_KRAFT_IN_TESTS=true |
❗ Systemtests Failed (no test results are present) ❗

@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testPerformanceInFixedSizeOfEvents --env=STRIMZI_USE_KRAFT_IN_TESTS=true
systemtest/src/main/java/io/strimzi/systemtest/Environment.java
✔️ Test Summary ✔️ TEST_PROFILE: null
So I have tried 6 configurations here, measured with: a) an internal metric (Strimzi max reconciliation duration) and b) an external metric (the duration of all operations, i.e., create, modify, and delete, plus readiness).
@see-quick nice work.
I left some improvement suggestions, but the base logic is there.
I would also try with BS (batch size) 100 and LMS (linger ms) 10.
systemtest/src/test/java/io/strimzi/systemtest/performance/TopicOperatorPerformance.java
systemtest/src/test/java/io/strimzi/systemtest/performance/TopicOperatorPerformance.java
systemtest/src/test/java/io/strimzi/systemtest/performance/TopicOperatorPerformance.java
...est/src/main/java/io/strimzi/systemtest/performance/utils/TopicOperatorPerformanceUtils.java
...est/src/main/java/io/strimzi/systemtest/performance/utils/TopicOperatorPerformanceUtils.java
...est/src/main/java/io/strimzi/systemtest/performance/utils/TopicOperatorPerformanceUtils.java
@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.15 --install-type=bundle --profile=performance --testcase=TopicOperatorPerformance#testPerformanceInFixedSizeOfEvents --env=STRIMZI_USE_KRAFT_IN_TESTS=true
Signed-off-by: see-quick <maros.orsak159@gmail.com>
Force-pushed from e50f635 to cbe3e47
Signed-off-by: see-quick <maros.orsak159@gmail.com>
/packit test --labels performance
Signed-off-by: see-quick <maros.orsak159@gmail.com>
Signed-off-by: see-quick <maros.orsak159@gmail.com>
/packit test --labels performance
Signed-off-by: see-quick <maros.orsak159@gmail.com>
/packit test --labels performance
Signed-off-by: see-quick <maros.orsak159@gmail.com>
/packit test --labels performance
Signed-off-by: see-quick <maros.orsak159@gmail.com>
/packit test --labels performance
LGTM, thanks
Signed-off-by: see-quick <maros.orsak159@gmail.com>
Signed-off-by: see-quick <maros.orsak159@gmail.com>
LGTM, thanks for the PR!
Just several nits
systemtest/src/main/java/io/strimzi/systemtest/performance/PerformanceConstants.java
...src/main/java/io/strimzi/systemtest/performance/report/TopicOperatorPerformanceReporter.java
...est/src/main/java/io/strimzi/systemtest/performance/utils/TopicOperatorPerformanceUtils.java
...est/src/main/java/io/strimzi/systemtest/performance/utils/TopicOperatorPerformanceUtils.java
Signed-off-by: see-quick <maros.orsak159@gmail.com>
/packit test --labels performance
Type of change
Description
This PR focuses on exploring the impact of different configurations on the efficiency of creating, modifying, and deleting Kafka topics. I've played around with a range of batch sizes and linger durations to see how they affect performance across different scales of topic counts.
Based on this graph (KRaft):
One can see that I have tried multiple configurations, with batch sizes and linger settings stretching from 1 ms to 2000 ms. Moreover, the range of topics I tested was from 50 to 1000, to see whether a given configuration scales well or runs into problems (which can be viewed on each curve). This could help us understand the capabilities of the UTO with various settings and pick the best configuration for scaling.
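For illustration only, a hypothetical sketch of how such a parameter grid could be enumerated; the concrete batch-size values below are assumptions, while the 1 ms to 2000 ms linger range and the 50 to 1000 topic range come from the description above:

```java
import java.util.List;

public class ConfigGrid {

    public static void main(String[] args) {
        List<Integer> batchSizes = List.of(1, 10, 100, 500);        // assumed example values
        List<Integer> lingerMs   = List.of(1, 10, 100, 1000, 2000); // 1 ms .. 2000 ms per the text

        for (int bs : batchSizes) {
            for (int linger : lingerMs) {
                for (int topics = 50; topics <= 1000; topics += 50) {
                    runScenario(bs, linger, topics);
                }
            }
        }
    }

    // Hypothetical: redeploy the Topic Operator with the given batch size and
    // linger, fire the topic events, and record the end-to-end reconciliation time
    private static void runScenario(int maxBatchSize, int maxBatchLingerMs, int topicCount) { /* ... */ }
}
```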
I have also changed the way we create the events. Previously we did it sequentially; now I have modified it to use an ExecutorService to manage and process batches concurrently. More on that in the Javadoc...
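A minimal sketch of that concurrency change, assuming a simple Runnable-per-event model (the actual implementation lives in TopicOperatorPerformanceUtils):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class ConcurrentBatches {

    public static void processAllEvents(List<Runnable> events, int batchSize) {
        ExecutorService executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        // split the event list into fixed-size batches and submit each batch
        // to the pool; events within a batch still run serially
        CompletableFuture<?>[] futures = IntStream
                .iterate(0, from -> from < events.size(), from -> from + batchSize)
                .mapToObj(from -> events.subList(from, Math.min(from + batchSize, events.size())))
                .map(batch -> CompletableFuture.runAsync(() -> batch.forEach(Runnable::run), executor))
                .toArray(CompletableFuture[]::new);

        CompletableFuture.allOf(futures).join();  // wait for every batch to finish
        executor.shutdown();
    }
}
```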
[1] - #10050 (review)
Update (19.9.2024):
After a few modifications, we have also decided to remove two use cases from the TO and UO (i.e., Alice's bulk and Bob's streaming). We do not think they add much value, so we will stick to the capacity and scalability tests, which are now present in those test suites.
Checklist