Add net.ipv4.tcp_rmem and net.ipv4.tcp_wmem into safe sysctl list #125234

Closed
SataQiu opened this issue May 31, 2024 · 10 comments · Fixed by #127489
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@SataQiu
Member

SataQiu commented May 31, 2024

What would you like to be added?

net.ipv4.tcp_rmem and net.ipv4.tcp_wmem have been namespaced since torvalds/linux@356d183 (Linux kernel version >= 4.15).

It would be helpful to allow configuring these sysctls per Pod (application).
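
To illustrate the kernel-version condition mentioned above, here is a minimal sketch of a version-gated check. It is not the kubelet's implementation: the `MIN_KERNEL` table and both function names are hypothetical, and the only fact taken from this issue is the 4.15 threshold for the two sysctls.

```python
import re
from typing import Tuple

# Hypothetical table: sysctl name -> minimum (major, minor) kernel version
# from which the sysctl is namespaced. The 4.15 values come from this issue.
MIN_KERNEL = {
    "net.ipv4.tcp_rmem": (4, 15),
    "net.ipv4.tcp_wmem": (4, 15),
}


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_namespaced(name: str, release: str) -> bool:
    """Return True if the sysctl is in the table and the kernel is new enough."""
    minimum = MIN_KERNEL.get(name)
    if minimum is None:
        return False  # unknown sysctls are treated as not namespaced
    return parse_kernel_release(release) >= minimum
```

A release string like the one printed by `uname -r` can be passed as the second argument.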

Why is this needed?

The good performance of some applications depends on these sysctls, for example https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/install/installRecommendSettings.html#TCPsettings.

To handle thousands of concurrent connections used by Cassandra, DataStax recommends these settings to optimize the Linux network stack. The sysctls should be configured as follows:

net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

It would be helpful to be able to configure this for applications without needing to modify the kubelet args.
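
As a rough sketch of what the per-Pod configuration could look like, the snippet below builds a Pod manifest that sets both sysctls through the Pod's `securityContext.sysctls` field and prints it as JSON. The Pod name, container image, and function name are placeholders chosen for illustration. On clusters where these sysctls are not in the safe list, the kubelet would still need them permitted through `--allowed-unsafe-sysctls`.

```python
import json


def pod_with_tcp_buffers(name: str, image: str) -> dict:
    """Build a Pod manifest that sets tcp_rmem/tcp_wmem for the whole Pod."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Sysctls are set once per Pod, not per container.
            "securityContext": {
                "sysctls": [
                    {"name": "net.ipv4.tcp_rmem", "value": "4096 87380 16777216"},
                    {"name": "net.ipv4.tcp_wmem", "value": "4096 65536 16777216"},
                ]
            },
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Placeholder name and image; replace with real values before applying.
    print(json.dumps(pod_with_tcp_buffers("cassandra", "cassandra:3.11"), indent=2))
```

The JSON output can be saved to a file and passed to `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.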

@SataQiu SataQiu added the kind/feature Categorizes issue or PR as related to a new feature. label May 31, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 31, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@SataQiu
Member Author

SataQiu commented May 31, 2024

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 31, 2024
@HirazawaUi
Contributor

/sig network
/cc @thockin

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label May 31, 2024
@nikzayn
Contributor

nikzayn commented Jun 1, 2024

Hey @SataQiu, I would like to take this up once the triage/accepted label is applied. Thanks.

/assign

nikzayn added a commit to nikzayn/kubernetes that referenced this issue Jun 1, 2024
…list kubernetes#125234

Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
@mauri870
Member

mauri870 commented Jun 1, 2024

@nikzayn I had started working on this yesterday, but I'm fine passing the ball to you. Please take a look at my changes, perhaps some of them are worth porting over to your PR. Thanks.

https://github.com/kubernetes/kubernetes/compare/master...mauri870:kubernetes:feature/sysctl-tcp-rmem-wmem?expand=1

@mauri870
Member

mauri870 commented Jun 1, 2024

Related work for net.ipv4.tcp_keepalive_time at #118846 can be used as inspiration as well.

@nikzayn
Contributor

nikzayn commented Jun 1, 2024

Thanks a lot @mauri870, my bad, I didn't know you were working on this. Next time I will wait for the issue to be assigned. Thanks for passing the ball. I will definitely take a look and push the changes once the build passes.

@mauri870
Member

mauri870 commented Jun 1, 2024

> Thanks a lot @mauri870, my bad, I didn't know you were working on this. Next time I will wait for the issue to be assigned. Thanks for passing the ball. I will definitely take a look and push the changes once the build passes.

No worries, it's my fault for not assigning the issue to myself earlier. I'm glad you were able to work on it.

nikzayn added a commit to nikzayn/kubernetes that referenced this issue Jun 1, 2024
@nikzayn
Contributor

nikzayn commented Jun 1, 2024

> > Thanks a lot @mauri870, my bad, I didn't know you were working on this. Next time I will wait for the issue to be assigned. Thanks for passing the ball. I will definitely take a look and push the changes once the build passes.
>
> No worries, it's my fault for not assigning the issue to myself earlier. I'm glad you were able to work on it.

Thanks, I have made the changes. Please take a look and let me know if I am missing anything. Thanks!

nikzayn added a commit to nikzayn/kubernetes that referenced this issue Jun 2, 2024
Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
@thockin thockin added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 6, 2024
@wenjianhn

What is the typical TCP RTT between those Cassandra nodes?

BDP (bits) = bandwidth (bits/second) * RTT (seconds)
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 $MaxExpectedPathBDP"
See https://cloud.google.com/compute/docs/networking/tcp-optimization-for-network-performance-in-gcp-and-hybrid
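
A small worked example of the BDP calculation above, as a sketch only. Note that the formula yields bits while `tcp_rmem` values are in bytes, so the result is divided by 8. The function names and the 10 Gbit/s, 10 ms inputs are illustrative assumptions, not measurements from any cluster. The minimum and default values are the ones quoted earlier in this thread.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes, rounded up, using integer math."""
    bits_times_1000 = bandwidth_bits_per_s * rtt_ms  # bits * 1000 (RTT is in ms)
    divisor = 8 * 1000                               # bits -> bytes, ms -> s
    return -(-bits_times_1000 // divisor)            # ceiling division


def tcp_rmem_value(max_bytes: int, minimum: int = 4096, default: int = 87380) -> str:
    """Format the three-field value expected by net.ipv4.tcp_rmem."""
    return f"{minimum} {default} {max_bytes}"


if __name__ == "__main__":
    # Assumed path: 10 Gbit/s bandwidth and 10 ms round-trip time.
    max_buffer = bdp_bytes(10_000_000_000, 10)
    print(tcp_rmem_value(max_buffer))  # prints "4096 87380 12500000"
```

With these assumed inputs the maximum buffer comes out at 12,500,000 bytes, somewhat below the 16777216 recommended by DataStax.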

TCP memory is not that safe per https://lpc.events/event/16/contributions/1212/

Potentially a low priority job can hog all the available TCP memory and starve the high priority jobs collocated with it. Indeed we have seen production incidences of low priority jobs negatively impacting the network performance of collocated high priority jobs.

pacoxu pushed a commit to pacoxu/kubernetes that referenced this issue Sep 20, 2024
Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
pacoxu pushed a commit to pacoxu/kubernetes that referenced this issue Sep 20, 2024
…list kubernetes#125234

Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
pacoxu pushed a commit to pacoxu/kubernetes that referenced this issue Sep 20, 2024
pacoxu pushed a commit to pacoxu/kubernetes that referenced this issue Sep 20, 2024
Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
pacoxu pushed a commit to pacoxu/kubernetes that referenced this issue Sep 20, 2024
Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
pacoxu pushed a commit to pacoxu/kubernetes that referenced this issue Sep 23, 2024
Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
richabanker pushed a commit to richabanker/kubernetes that referenced this issue Oct 29, 2024
…list kubernetes#125234

Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
richabanker pushed a commit to richabanker/kubernetes that referenced this issue Oct 29, 2024
richabanker pushed a commit to richabanker/kubernetes that referenced this issue Oct 29, 2024
Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
beordie pushed a commit to beordie/kubernetes that referenced this issue Jan 6, 2025
…list kubernetes#125234

Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
beordie pushed a commit to beordie/kubernetes that referenced this issue Jan 6, 2025
beordie pushed a commit to beordie/kubernetes that referenced this issue Jan 6, 2025
Signed-off-by: nikzayn <nikhilvaidyar1997@gmail.com>
7 participants