Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix network connection problem on Azure #72879

Closed
wants to merge 1 commit into from

Conversation

marwinski
Copy link

@marwinski marwinski commented Jan 14, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Add a reject iptables rule for loadbalancer type serivces without endpoints to behave consistently for the transparent Azure load balancer.

This PR avoids the creation of stale conntrack entries in case there is temporarily no endpoint (which happens when components crash or are updated). This indeed causes a denial of service as source port numbers on Azue are sometimes aggressively re-used in between 15 and 20 seconds. This will reset the 120 second timeout (default) on the stale conntrack table entries. We have seen this for 7+ hours which causes requests to randomly fail due to timeouts although the service had been restored almost immediately. Once there is a conntrack entry iptables rules will not be processed and the action that was valid at that time will be executed. In this case it if "forward".

This resolves a serious issue with the Gardener project. In Gardener the API servers are behind a load balancer. The shoot clusters are accessing the kube-apiserver via a nat gateway. In this setup we do see many failures of components trying to access the API resulting in dodgy and sometimes unresponsive clusters.

Which issue(s) this PR fixes:

Fixes #48719

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Add a reject iptables rule for loadbalancer type serivces without endpoints to behave consistently for transparent load balancers (e.g. Azure).

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 14, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @marwinski. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 14, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: marwinski
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: matchstick

If they are not already assigned, you can assign the PR to them by writing /assign @matchstick in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 14, 2019
@mvladev
Copy link

mvladev commented Jan 14, 2019

/assign @matchstick @thockin

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also see #72534 - especially the last commit. Would be nice to test your patch on top of that one.

Do we need an equivalent IPVS mode fix?

@m1093782566 for consultation - these are starting to pile up :)

pkg/proxy/iptables/proxier.go Outdated Show resolved Hide resolved
// (immediate reject instead of connection timeout). This also avoids a stale entry in the
// contrack table.
writeLine(proxier.filterRules,
"-A", string(kubeForwardChain),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this wants to be kubeServicesChain.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubeServicesChain is called from OUTPUT. To prevent the default FORWARD we need to add the rule to the FORWARD chain; it has no effect in the KUBE-SERVICES chain. Not sure whether it would work or should be added to the raw-prerouting or mangele-prerouting chains.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we fixed forwarding to be called in both cases - please rebase and retry in services

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This somehow contradicts the other comment. Sure, I will change this if the other comment is resolved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you mean - what other comment? In My PR I linked Input to kubeServicesChain, too.

pkg/proxy/iptables/proxier.go Outdated Show resolved Hide resolved
@m1093782566
Copy link
Contributor

@thockin

IPVS has no such issue.

@m1093782566
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 15, 2019
Add a reject iptables rule for serivces without endpoints to behave
consistently for the transparent Azure load balancer.

This also avoids the creation of stale conntrack entries which
cause a denial of service as port number on Azue are sometimes
aggressively re-used.
@marwinski marwinski force-pushed the azure_kube_proxy_fix branch from 3e88f39 to 6e01139 Compare January 15, 2019 13:00
@marwinski
Copy link
Author

/retest

1 similar comment
@marwinski
Copy link
Author

/retest

@marwinski
Copy link
Author

@thockin: I tested this fix with #72534 and yes, my fix works the same way as it did before. This said, this is a manual test for my fix but the fact that the cluster is still operational gives some indication that nothing big appears to be broken.

Copy link
Member

@feiskyer feiskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2019
@khenidak
Copy link
Contributor

khenidak commented Feb 8, 2019

as discussed here: #48719 (comment) this has to with dsr not just azure can we please rename the title accordingly?

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 8, 2019
// Add a reject rule so the behavior is the same for clients going via the load balancer
// (immediate reject instead of connection timeout). This also avoids a stale entry in the
// contrack table.
writeLine(proxier.filterRules,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this rule should be part of of https://github.com/kubernetes/kubernetes/pull/72879/files#diff-d51765b83fe795b469e8a86276b12dc9L911 logic. There is a rule there already for ClusterIP with endpoints.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would work there. This code path is used for the externalIPs feature in services. This is not used for external load balancers (at least we don't see it being used with load balancers on Azure, only with the externalIPs feature). I admit to still have a somehow limited knowledge of the code but it appears to me that the rule is exactly where it should be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is the correct path. The problem happens with Services of type LoadBalancer that has ExternalIPs and no endpoints. For Azure (or anything that uses dsr) no traffic will be delivered to node if rule is not in place (and has an ExternalIP

@khenidak
Copy link
Contributor

/LGTM
Please note this is exactly like #74394

So let us merge one of them

@thockin @craiglpeters

@thockin
Copy link
Member

thockin commented Mar 1, 2019

We should not merge this without tests. I am a little buried in PRs and docs right now - help to finish the test work on #74394 would be the way forward.

@thockin
Copy link
Member

thockin commented Mar 5, 2019

Closing in favor of #74394

@thockin thockin closed this Mar 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Client TCP connections have to wait full timeout when set of endpoints goes from empty to non-empty
8 participants