Improve Kafka input performance for small messages #1967

Merged on Jul 5, 2024 (3 commits)

blp (Member) commented Jun 28, 2024

Please see individual commits for details.

See #964 for background.

Is this a user-visible change (yes/no): no

blp requested a review from ryzhyk, June 28, 2024 18:43
blp added the performance, adapters, and rust labels, Jun 28, 2024

ryzhyk (Contributor) left a comment

Looks good, thanks!
I wonder if 16 threads is too high as a default. In use cases with many Kafka connectors this can cause problems with POSIX threads.

ryzhyk (Contributor) commented Jun 28, 2024

@blp, would it be helpful to include some of the benchmarking data in this PR in some form, e.g., as a comment in the Rust code?

blp (Member, Author) commented Jun 28, 2024

> Looks good, thanks! I wonder if 16 threads is too high as a default. In use cases with many Kafka connectors this can cause problems with POSIX threads.

The default is 3.

blp added 3 commits July 3, 2024 09:11
…sh`.

Before, the benchmark used 1 partition.  Now it uses 16 by default, and
`--partitions` can change that.

In my testing, the performance difference between 1 and 16 partitions,
with 10M events and 512 kB messages (which is what our generator uses),
is within the margin of error.

Signed-off-by: Ben Pfaff <blp@feldera.com>
…ges.

Kafka input performance for small messages was previously much worse than
`rpk`, the Redpanda utility for working with Kafka topics.  This commit
introduces the ability for the input adapter to use multiple threads to
read an input topic.  This substantially improves the performance for
small messages, bringing it much closer to `rpk` performance.
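
As a rough illustration of the idea of several threads reading one input, here is a minimal sketch that uses only the Rust standard library. It is not the code from this commit: `SharedSource` and `consume_with_threads` are hypothetical names, and an in-memory queue stands in for the Kafka client. The thread count of 3 matches the default mentioned in the conversation.

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a Kafka consumer: a shared queue of raw messages.
/// A real adapter would poll a Kafka client here instead.
type SharedSource = Arc<Mutex<VecDeque<Vec<u8>>>>;

/// Drain `source` with `n_threads` poller threads and return the total
/// number of payload bytes received.
fn consume_with_threads(source: SharedSource, n_threads: usize) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let mut workers = Vec::with_capacity(n_threads);

    for _ in 0..n_threads {
        let source = Arc::clone(&source);
        let tx = tx.clone();
        workers.push(thread::spawn(move || loop {
            // Hold the lock only long enough to take one message.
            let msg = source.lock().unwrap().pop_front();
            match msg {
                Some(m) => tx.send(m).unwrap(),
                None => break, // queue is empty: this poller is done
            }
        }));
    }
    // Drop the original sender so the receiver ends once all pollers finish.
    drop(tx);

    let total: usize = rx.iter().map(|m| m.len()).sum();
    for w in workers {
        w.join().unwrap();
    }
    total
}

fn main() {
    // 10,000 small (256-byte) messages, the kind of workload the commit targets.
    let queue: VecDeque<Vec<u8>> = (0..10_000).map(|_| vec![0u8; 256]).collect();
    let source: SharedSource = Arc::new(Mutex::new(queue));
    let total = consume_with_threads(source, 3);
    println!("received {total} bytes");
}
```

The sketch only shows the fan-in shape (many pollers, one receiver). It says nothing about how the real adapter handles partitions, ordering, or errors.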

The following table shows the performance with default settings before and
after this commit with various message sizes and with `rpk`:

```
╭──────────────────────────────┬──────────────────╮
│                              │      method      │
│                              ├──────┬─────┬─────┤
│                              │before│after│ rpk │
├──────────────────────────────┼──────┼─────┼─────┤
│partitions 1  size 256    time│ 40.81│13.89│10.68│
│                   512    time│  8.63│ 6.26│ 5.34│
│                   4096   time│  1.68│ 1.76│ 2.15│
│                   65536  time│  2.01│ 1.99│ 1.96│
│                   524288 time│  1.62│ 1.64│ 2.08│
│          ╶───────────────────┼──────┼─────┼─────┤
│           16 size 256    time│ 51.97│14.32│     │
│                   512    time│ 13.69│ 7.08│     │
│                   4096   time│  1.81│ 2.12│     │
│                   65536  time│  1.65│ 1.74│     │
│                   524288 time│  1.64│ 1.87│     │
╰──────────────────────────────┴──────┴─────┴─────╯
```
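
To make the table easier to read, the before/after columns can be turned into speedup ratios. The sketch below copies the numbers from the table and divides them. `speedup` is a hypothetical helper, and the table does not state its time unit, so only the unitless ratio is computed. For 256-byte messages this gives roughly 2.9x with 1 partition and roughly 3.6x with 16, while the ratios for 4096 bytes and larger stay close to 1.

```rust
/// Ratio of the "before" time to the "after" time.
/// Values above 1.0 mean the "after" run was faster.
fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

fn main() {
    // (partitions, message size in bytes, before, after), copied from the table.
    let rows = [
        (1, 256, 40.81, 13.89),
        (1, 512, 8.63, 6.26),
        (1, 4096, 1.68, 1.76),
        (1, 65536, 2.01, 1.99),
        (1, 524288, 1.62, 1.64),
        (16, 256, 51.97, 14.32),
        (16, 512, 13.69, 7.08),
        (16, 4096, 1.81, 2.12),
        (16, 65536, 1.65, 1.74),
        (16, 524288, 1.64, 1.87),
    ];

    for (partitions, size, before, after) in rows {
        println!(
            "{partitions:>2} partition(s), {size:>6} B: {:.2}x",
            speedup(before, after)
        );
    }
}
```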

Signed-off-by: Ben Pfaff <blp@feldera.com>

blp merged commit c770d8f into main Jul 5, 2024
5 checks passed
blp deleted the kafka branch July 5, 2024 17:39