WIP improve kube-proxy conntrack resilience #92122
Conversation
/cc @thockin @BenTheElder This is just the first one and the scaffolding: replacing a goroutine with a pod that generates the UDP traffic from the same source port (netcat is awesome 😹 ) and parsing the logs of the pods. A rough sketch of such a client pod is below.
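For context, a minimal sketch (not the PR's actual helper; the pod name, busybox image, and netcat command are assumptions) of what a fixed-source-port UDP client pod could look like, built with the client-go types:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newUDPClientPod returns a pod that keeps sending UDP datagrams to a
// NodePort from a fixed source port, so conntrack always sees the same
// 5-tuple. The test can then parse this pod's logs to check connectivity.
func newUDPClientPod(nodeIP string, nodePort, srcPort int) *v1.Pod {
	cmd := fmt.Sprintf(
		"while true; do echo hostname | nc -u -w 1 -p %d %s %d; sleep 1; done",
		srcPort, nodeIP, nodePort)
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "udp-client"},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:    "client",
				Image:   "busybox",
				Command: []string{"/bin/sh", "-c", cmd},
			}},
		},
	}
}

func main() {
	fmt.Printf("%+v\n", newUDPClientPod("10.0.0.1", 30080, 12345))
}
```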
If this works well, I need to brainstorm about the different scenarios now; my thinking is that we need to cover:
/retest
/test pull-kubernetes-e2e-kind-ipv6
No failures. /test pull-kubernetes-e2e-gce
Deflake the current e2e test "should be able to preserve UDP traffic when server pod cycles for a NodePort service" and reorganize the code in the e2e framework.

Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
kube-proxy was only flushing the conntrack entries for stale services and endpoints after it had synced the iptables rules. However, if the iptables sync fails, the stale entries are not flushed and the next resync is only scheduled 30 seconds later by default. kube-proxy can instead flush the stale conntrack entries before syncing the iptables rules.

Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
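A toy sketch of the ordering change described above (none of these identifiers are the real kube-proxy ones; it only illustrates why flushing before the restore avoids the 30-second window):

```go
package main

import "fmt"

type proxier struct{}

func (p *proxier) flushStaleConntrack(staleIPs []string) {
	// Stand-in for kube-proxy's cleanup of stale UDP conntrack entries.
	for _, ip := range staleIPs {
		fmt.Println("flushing stale UDP conntrack entries for", ip)
	}
}

func (p *proxier) restoreIptables() error {
	// Stand-in for iptables-restore, which can fail (e.g. the xtables lock is held).
	return fmt.Errorf("iptables-restore failed")
}

func (p *proxier) syncProxyRules(staleIPs []string) {
	// Flush first: even if the iptables restore below fails, the stale UDP
	// entries are already gone instead of lingering until the next resync
	// (30 seconds by default).
	p.flushStaleConntrack(staleIPs)

	if err := p.restoreIptables(); err != nil {
		// Previously, bailing out here also skipped the conntrack cleanup.
		fmt.Println("sync failed, retrying on next resync:", err)
		return
	}
}

func main() {
	(&proxier{}).syncProxyRules([]string{"10.96.0.10"})
}
```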
Conntrack failure on ClusterIP only, not on NodePort.
I think I was wrong about this: if the iptables rules are not installed, there is no risk of dropping traffic to the new endpoints. And if we keep sending to the old endpoint, the result is the same for the client, because there is no way the new endpoints will receive it.
What type of PR is this?
/kind cleanup
/kind failing-test
/kind flake
What this PR does / why we need it:
Analyzing the current conntrack test failure:
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/92122/pull-kubernetes-e2e-kind/1272167341655855104
We can see that a client pod cannot connect to a NodePort UDP service, even though the backend is running.
If we look at the kube-proxy logs https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/92122/pull-kubernetes-e2e-kind/1272103421071069186/artifacts/logs/kind-worker2/containers/kube-proxy-vxqmd_kube-system_kube-proxy-771c07013332c57aee7a54f659eee28428265c6bdc33c3c80155c30ea112991e.log
we can see that kube-proxy skips the iptables sync cycle and, with it, the logic that deletes the stale conntrack entries.
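For reference, clearing a stale UDP conntrack entry for a given destination IP boils down to invoking the conntrack binary, roughly as in the hedged sketch below (kube-proxy has its own conntrack utilities; the helper name and exact flags here are assumptions):

```go
package main

import (
	"fmt"
	"os/exec"
)

// clearUDPConntrackForIP deletes conntrack entries whose original destination
// matches the given IP, for the UDP protocol.
func clearUDPConntrackForIP(ip string) error {
	out, err := exec.Command("conntrack", "-D", "--orig-dst", ip, "-p", "udp").CombinedOutput()
	if err != nil {
		return fmt.Errorf("conntrack -D failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	if err := clearUDPConntrackForIP("10.96.0.10"); err != nil {
		fmt.Println(err)
	}
}
```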
This PR deflakes the current e2e test "should be able to preserve UDP traffic when server pod cycles for a NodePort service" and reorganizes the related code in the e2e framework.

Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
Which issue(s) this PR fixes:
Fixes #91236
Special notes for your reviewer:
related to #92076
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: