Fix network connection problem on Azure #72879
Conversation
Hi @marwinski. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: marwinski. If they are not already assigned, you can assign the PR to them by writing /assign in a comment. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/assign @matchstick @thockin
Please also see #72534 - especially the last commit. Would be nice to test your patch on top of that one.
Do we need an equivalent IPVS mode fix?
@m1093782566 for consultation - these are starting to pile up :)
// (immediate reject instead of connection timeout). This also avoids a stale entry in the
// conntrack table.
writeLine(proxier.filterRules,
	"-A", string(kubeForwardChain),
I think this wants to be kubeServicesChain.
kubeServicesChain is called from OUTPUT. To prevent the default FORWARD handling we need to add the rule to the FORWARD chain; it has no effect in the KUBE-SERVICES chain. Not sure whether it would work there, or whether it should instead be added to the raw PREROUTING or mangle PREROUTING chains.
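For readers following along, a rough sketch of the chain wiring being discussed. This is illustrative only: the jump rules are paraphrased from kube-proxy's iptables mode around the time of this PR, with conntrack-state matches omitted.

# Locally generated traffic traverses OUTPUT, which jumps to KUBE-SERVICES;
# routed traffic (e.g. packets arriving from a load balancer and destined for
# pods) traverses FORWARD, which jumps to KUBE-FORWARD.
iptables -t filter -A OUTPUT  -m comment --comment "kubernetes service portals"  -j KUBE-SERVICES
iptables -t filter -A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD

# A REJECT placed only in KUBE-SERVICES is therefore never consulted for
# forwarded traffic; it has to sit in a chain reached from FORWARD.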
We fixed forwarding to be called in both cases - please rebase and retry in kubeServicesChain.
This somehow contradicts the other comment. Sure, I will change this if the other comment is resolved.
I don't understand what you mean - what other comment? In my PR I linked INPUT to kubeServicesChain, too.
IPVS has no such issue.
/ok-to-test
Add a reject iptables rule for services without endpoints to behave consistently for the transparent Azure load balancer. This also avoids the creation of stale conntrack entries, which cause a denial of service as port numbers on Azure are sometimes aggressively re-used.
Force-pushed from 3e88f39 to 6e01139
/retest
1 similar comment
/retest
/lgtm
As discussed here: #48719 (comment), this has to wait for now, so /hold
// Add a reject rule so the behavior is the same for clients going via the load balancer
// (immediate reject instead of connection timeout). This also avoids a stale entry in the
// conntrack table.
writeLine(proxier.filterRules,
I think this rule should be part of the https://github.com/kubernetes/kubernetes/pull/72879/files#diff-d51765b83fe795b469e8a86276b12dc9L911 logic. There is a rule there already for ClusterIPs without endpoints.
I don't think it would work there. This code path is used for the externalIPs feature in Services. It is not used for external load balancers (at least we don't see it being used with load balancers on Azure, only with the externalIPs feature). I admit I still have somewhat limited knowledge of the code, but it appears to me that the rule is exactly where it should be.
Yes, that is the correct path. The problem happens with Services of type LoadBalancer that have ExternalIPs and no endpoints. For Azure (or anything that uses DSR) no traffic will be delivered to the node if the rule is not in place (and the Service has an ExternalIP).
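To make this concrete, a minimal sketch of the kind of reject rule under discussion, assuming a made-up load balancer IP, port, and service name. With DSR-style load balancing the packets arrive at the node with the external IP still as the destination, so they traverse the FORWARD path and hit this rule; without it, the connection attempt is simply never answered and the client times out.

# Illustration only: IP, port, and service name are placeholders.
iptables -t filter -A KUBE-FORWARD \
  -d 52.0.0.1/32 -p tcp --dport 443 \
  -m comment --comment "default/example-svc has no endpoints" \
  -j REJECT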
/lgtm
So let us merge one of them.
We should not merge this without tests. I am a little buried in PRs and docs right now - helping to finish the test work on #74394 would be the way forward.
Closing in favor of #74394
What type of PR is this?
What this PR does / why we need it:
Add a reject iptables rule for LoadBalancer-type services without endpoints to behave consistently for the transparent Azure load balancer.
This PR avoids the creation of stale conntrack entries when there is temporarily no endpoint (which happens when components crash or are updated). This causes a denial of service because source port numbers on Azure are sometimes aggressively re-used within 15 to 20 seconds, and each re-use resets the default 120 second timeout on the stale conntrack table entry. We have seen this persist for 7+ hours, causing requests to fail randomly with timeouts even though the service had been restored almost immediately. Once there is a conntrack entry, the iptables rules are not processed again and the action that was valid when the entry was created is applied; in this case that action is "forward".
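As an illustration of the failure mode (the destination address and port below are placeholders, not from the actual setup), the lingering entries can be inspected and flushed with conntrack-tools:

# Illustration only: destination IP and port are placeholders.
# List conntrack entries towards the load balancer; a stale entry keeps having
# its ~120s timeout refreshed while the NAT gateway re-uses the source port.
conntrack -L -d 52.0.0.1 -p tcp --dport 443

# Deleting the stale entries restores connectivity until the next endpoint
# outage; the reject rule added by this PR prevents them from being created.
conntrack -D -d 52.0.0.1 -p tcp --dport 443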
This resolves a serious issue with the Gardener project. In Gardener the API servers are behind a load balancer, and the shoot clusters access the kube-apiserver via a NAT gateway. In this setup we see many failures of components trying to access the API, resulting in dodgy and sometimes unresponsive clusters.
Which issue(s) this PR fixes:
Fixes #48719
Special notes for your reviewer:
Does this PR introduce a user-facing change?: