
WIP improve kube-proxy conntrack resilience #92122

Closed
aojea wants to merge 2 commits

Conversation

aojea (Member) commented Jun 14, 2020

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

Analyzing the current conntrack test failure
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/92122/pull-kubernetes-e2e-kind/1272167341655855104

we can see that a client pod cannot connect to a NodePort UDP service, even though the backend is running.
Looking at the kube-proxy logs https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/92122/pull-kubernetes-e2e-kind/1272103421071069186/artifacts/logs/kind-worker2/containers/kube-proxy-vxqmd_kube-system_kube-proxy-771c07013332c57aee7a54f659eee28428265c6bdc33c3c80155c30ea112991e.log
we can see that kube-proxy skips the iptables sync cycle, and with it the logic that deletes the stale conntrack entries:

2020-06-14T10:41:16.713324865Z stderr F Trace[1260496449]: [2.833057484s] [2.833057484s] END
2020-06-14T10:41:37.655915121Z stderr F I0614 10:41:37.629312       1 trace.go:116] Trace[1898108080]: "iptables restore" (started: 2020-06-14 10:41:34.859334571 +0000 UTC m=+1248.192582056) (total time: 2.769920001s):
2020-06-14T10:41:37.6559627Z stderr F Trace[1898108080]: [2.769920001s] [2.769920001s] END
2020-06-14T10:44:53.090771324Z stderr F E0614 10:44:53.087163       1 proxier.go:866] Failed to ensure that filter chain INPUT jumps to KUBE-EXTERNAL-SERVICES: error checking rule: exit status 4: Another app is currently holding the xtables lock; still 4s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:44:53.090804341Z stderr F Another app is currently holding the xtables lock; still 3s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:44:53.090812193Z stderr F Another app is currently holding the xtables lock; still 2s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:44:53.090817632Z stderr F Another app is currently holding the xtables lock; still 1s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:44:53.090822825Z stderr F Another app is currently holding the xtables lock; still 0s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:44:53.090831003Z stderr F Another app is currently holding the xtables lock. Stopped waiting after 5s.
2020-06-14T10:44:53.09083729Z stderr F I0614 10:44:53.087196       1 proxier.go:850] Sync failed; retrying in 30s
2020-06-14T10:45:07.958821003Z stderr F E0614 10:45:07.957005       1 proxier.go:858] Failed to ensure that filter chain KUBE-FORWARD exists: error creating chain "KUBE-FORWARD": exit status 4: Another app is currently holding the xtables lock; still 4s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:45:07.958863181Z stderr F Another app is currently holding the xtables lock; still 3s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:45:07.958870983Z stderr F Another app is currently holding the xtables lock; still 2s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:45:07.958879755Z stderr F Another app is currently holding the xtables lock; still 1s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:45:07.958885596Z stderr F Another app is currently holding the xtables lock; still 0s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:45:07.958893286Z stderr F Another app is currently holding the xtables lock. Stopped waiting after 5s.
2020-06-14T10:45:07.958900667Z stderr F I0614 10:45:07.957038       1 proxier.go:850] Sync failed; retrying in 30s
2020-06-14T10:46:10.783689592Z stderr F E0614 10:46:10.692885       1 proxier.go:866] Failed to ensure that filter chain FORWARD jumps to KUBE-SERVICES: error checking rule: exit status 4: Another app is currently holding the xtables lock; still 4s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:46:10.783766723Z stderr F Another app is currently holding the xtables lock; still 3s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:46:10.783776996Z stderr F Another app is currently holding the xtables lock; still 2s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:46:10.783786709Z stderr F Another app is currently holding the xtables lock; still 1s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:46:10.783793103Z stderr F Another app is currently holding the xtables lock; still 0s 100000us time ahead to have a chance to grab the lock...
2020-06-14T10:46:10.783799786Z stderr F Another app is currently holding the xtables lock. Stopped waiting after 5s.
2020-06-14T10:46:10.783809779Z stderr F I0614 10:46:10.692921       1 proxier.go:850] Sync failed; retrying in 30s
2020-06-14T10:47:28.158771573Z stderr F E0614 10:47:28.157681       1 proxier.go:858] Failed to ensure that filter chain KUBE-EXTERNAL-SERVICES exists: error creating chain "KUBE-EXTERNAL-SERVICES": exit status 4: Another app is currently holding the xtables lock; still 4s 100000us time ahead to have a chance to grab the lock...

This PR deflakes the current e2e test "should be able to preserve UDP traffic when server pod cycles for a NodePort service", reorganizes the related code in the e2e framework, and makes kube-proxy flush the stale conntrack entries before syncing the iptables rules.

Signed-off-by: Antonio Ojea antonio.ojea.garcia@gmail.com

Which issue(s) this PR fixes:

Fixes #91236

Special notes for your reviewer:

related to #92076

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


k8s-ci-robot added the labels kind/failing-test, do-not-merge/work-in-progress, release-note-none, size/L, cncf-cla: yes, needs-sig, needs-priority, area/test, sig/network, and sig/testing, and then removed the needs-sig label on Jun 14, 2020
aojea (Member Author) commented Jun 14, 2020

/cc @thockin @BenTheElder

This is just the first commit and the scaffolding: it replaces a goroutine with a pod that generates the UDP traffic from the same source port (netcat is awesome 😹) and parses the pod logs. A rough sketch of the client loop follows the log excerpt below.
This is an example of the pod log output:

STEP: client pod connecting to the backend 2 on 172.18.0.3
Jun 14 11:41:51.826: INFO: Pod client logs: Sun Jun 14 09:41:02 UTC 2020
Try: 1

Try: 2

Try: 3

Try: 4

Try: 5

Try: 6

Try: 7

Try: 8
pod-server-1
Try: 9
pod-server-1
Try: 10
pod-server-1
Try: 11
pod-server-1
Try: 12
pod-server-2
Try: 13
pod-server-2
Try: 14
pod-server-2
Try: 15
pod-server-2
Try: 16
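
A rough Go sketch of what the client pod does conceptually (the test itself uses a netcat loop inside a pod, so this is only illustrative; the source port 12345, the NodePort 30080, and the "hostname" payload are assumptions): probe the NodePort repeatedly from a fixed source port, so every probe reuses the same conntrack entry, and log which backend answers.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Assumed values for illustration: fixed client source port, node IP, NodePort.
	laddr := &net.UDPAddr{Port: 12345}
	raddr := &net.UDPAddr{IP: net.ParseIP("172.18.0.3"), Port: 30080}

	for try := 1; try <= 16; try++ {
		fmt.Printf("Try: %d\n", try)
		conn, err := net.DialUDP("udp", laddr, raddr)
		if err != nil {
			time.Sleep(time.Second)
			continue
		}
		conn.SetDeadline(time.Now().Add(time.Second))
		// Placeholder payload; the real test asks the backend for its hostname.
		conn.Write([]byte("hostname"))
		buf := make([]byte, 1024)
		if n, err := conn.Read(buf); err == nil {
			fmt.Println(string(buf[:n])) // e.g. "pod-server-1"
		}
		conn.Close()
		time.Sleep(time.Second)
	}
}
```

Keeping the source port fixed is the point of the exercise: the client's UDP 5-tuple never changes, so a stale conntrack entry that still points at a deleted backend keeps black-holing every probe (which shows up as a "Try:" line with no hostname reply) until the entry is flushed.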

If this works well, I need to brainstorm the different scenarios now; my thinking is that we need to cover:

aojea (Member Author) commented Jun 14, 2020

/retest

aojea (Member Author) commented Jun 14, 2020

/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-gce
/test pull-kubernetes-e2e-gce-ubuntu-containerd

k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea
To complete the pull request process, please assign danwinship
You can assign the PR to them by writing /assign @danwinship in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

aojea changed the title from "WIP e2e conntrack tests" to "WIP improve kube-proxy conntrack resilience" on Jun 14, 2020
aojea (Member Author) commented Jun 14, 2020

no failures

/test pull-kubernetes-e2e-gce
/test pull-kubernetes-e2e-gce-ubuntu-containerd
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-kind-ipv6

aojea added 2 commits June 15, 2020 10:28
deflake current e2e test
"should be able to preserve UDP traffic when server pod cycles for a
NodePort service" and reorganize the code in the e2e framework

Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
kube-proxy was only flushing the conntrack entries for stale
services and endpoints after it had synced the iptables rules.

However, if the iptables sync fails, the stale entries are not
flushed, and the next resync is only scheduled 30 seconds later
by default.

kube-proxy can instead flush the stale conntrack entries before
syncing the iptables rules.

Signed-off-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
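
A minimal Go sketch of the idea in the second commit (assumptions: clearStaleUDPConntrack, syncProxyRules, and restoreIPTables are illustrative names, not the actual kube-proxy proxier code; the conntrack CLI invocation is only an approximation of what kube-proxy does for stale UDP NodePorts): delete the stale UDP conntrack entries first, so that even if the iptables restore fails and has to wait for the 30s retry, clients are no longer pinned to a deleted backend.

```go
package main

import (
	"fmt"
	"os/exec"
)

// clearStaleUDPConntrack deletes conntrack entries for a stale UDP NodePort
// by shelling out to the conntrack CLI (hypothetical helper, sketch only).
func clearStaleUDPConntrack(port int) error {
	out, err := exec.Command("conntrack", "-D", "-p", "udp", "--dport", fmt.Sprint(port)).CombinedOutput()
	if err != nil {
		return fmt.Errorf("deleting conntrack entries for UDP port %d: %v (%s)", port, err, out)
	}
	return nil
}

// syncProxyRules shows the proposed ordering: conntrack cleanup first,
// iptables restore second.
func syncProxyRules(staleUDPPorts []int, restoreIPTables func() error) error {
	for _, port := range staleUDPPorts {
		if err := clearStaleUDPConntrack(port); err != nil {
			// Best effort: conntrack also errors when nothing matched, so just log.
			fmt.Println("warning:", err)
		}
	}
	// If this fails (e.g. xtables lock contention, as in the logs above) and is
	// retried 30 seconds later, the stale UDP entries are already gone.
	return restoreIPTables()
}

func main() {
	// Toy usage with a no-op restore so the sketch is self-contained.
	_ = syncProxyRules([]int{30080}, func() error { return nil })
}
```

As the closing comment below notes, the author later concluded this reordering may not be necessary.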
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 15, 2020
k8s-ci-robot (Contributor)

@aojea: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kind d169dfb link /test pull-kubernetes-e2e-kind
pull-kubernetes-e2e-gce-ubuntu-containerd d169dfb link /test pull-kubernetes-e2e-gce-ubuntu-containerd
pull-kubernetes-verify d169dfb link /test pull-kubernetes-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

aojea (Member Author) commented Jun 15, 2020

aojea (Member Author) commented Jun 16, 2020

I think I was wrong about this: if the iptables rules are not installed, there is no risk that we drop traffic to the new endpoints. And if we keep sending traffic to the old endpoint, the result is the same for the client, because there is no way the new endpoints would receive it anyway.

aojea closed this on Jun 16, 2020
Labels
area/test
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.)
kind/failing-test (Categorizes issue or PR as related to a consistently or frequently failing test.)
needs-priority (Indicates a PR lacks a `priority/foo` label and requires one.)
release-note-none (Denotes a PR that doesn't merit a release note.)
sig/network (Categorizes an issue or PR as relevant to SIG Network.)
sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)
Projects
None yet
2 participants