Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make a certain ipv4-vs-ipv6 config error non-fatal, for backward compat #121008

Merged

Conversation

danwinship
Copy link
Contributor

What type of PR is this?

/kind bug
/kind regression
/sig network
/priority important-soon

What this PR does / why we need it:

kube-proxy historically allowed badly-mismatched bits of configuration (e.g., specifying an IPv6 --cluster-cidr value on a seemingly dual-stack-IPv4-primary node). In 1.28 I consolidated the existing checks and added a bunch more, but I accidentally made one case fatal which had not been fatal in prior releases. This makes it just a warning again.

(It is still fatal to specify an IPv6 --cluster-cidr on a necessarily single-stack-IPv4 node (e.g., when kernel IPv6 is disabled).)

Which issue(s) this PR fixes:

Fixes #120999

Does this PR introduce a user-facing change?

Fixed a regression in kube-proxy where it might refuse to start if given
single-stack IPv6 configuration options on a node that has both IPv4 and
IPv6 IPs.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/regression Categorizes issue or PR as related to a regression from a prior release. sig/network Categorizes an issue or PR as relevant to SIG Network. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Oct 5, 2023
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. area/kube-proxy labels Oct 5, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 5, 2023
@danwinship danwinship force-pushed the proxy-startup-error-regression branch from 44b1b35 to b2f0052 Compare October 5, 2023 16:57
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 5, 2023
@aojea
Copy link
Member

aojea commented Oct 5, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 5, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 63fd0d1014bda6f45e8b34d204b99c01e7d7ab19

@@ -678,7 +678,7 @@ func checkIPConfig(s *ProxyServer, dualStackSupported bool) (error, bool) {
clusterCIDRs := strings.Split(s.Config.ClusterCIDR, ",")
if badCIDRs(clusterCIDRs, badFamily) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function badCidrs only checks that the first cidr in the array is different than the badFamily, but the comment of the function seems to assume all the cidrs are not badFamily

// badCIDRs returns true if cidrs is a non-empty list of CIDRs, all of wrongFamily.

just for my curiosity, the comment is wrong, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment wording is weird but should be

// badCIDRs returns true if cidrs is a non-empty list of CIDRs and all are of wrongFamily.

essentially if the list is empty, or as soon as it finds a "good" CIDR that's not of wrongFamily it returns false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function badCidrs only checks that the first cidr in the array is different than the badFamily

No, it loops over all of the CIDRs, but bails out early in some cases. The semantics of the function are sort of confusing, so it's hard for the comment to be both concise and non-confusing...

There are three possibilities:

  • cidrs is empty, meaning the admin isn't claiming anything about what the primary IP family is, so we don't learn anything about whether there is a config problem or not.
  • cidrs is non-empty and contains at least one CIDR of the same family as nodeIP, meaning the admin is expecting that IP family to be in use, meaning it's OK that we detected nodeIP to be of that family.
  • cidrs is non-empty and contains only CIDRs of the opposite IP family from nodeIP, meaning the admin is asserting that this is a single-stack cluster of that family, meaning our detected nodeIP is wrong.

badCIDRs returns false in the first two cases and true in the third.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, thanks for clarifying

Copy link
Member

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Is it possible to cherry-pick this to release-1.28?

@aojea
Copy link
Member

aojea commented Oct 5, 2023

lgtm

Is it possible to cherry-pick this to release-1.28?

yes, it is

@CecileRobertMichon
Copy link
Member

/retest

seems like the failures are unrelated (flakes?)

@k8s-ci-robot k8s-ci-robot merged commit 3bfa948 into kubernetes:master Oct 5, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 5, 2023
@danwinship danwinship deleted the proxy-startup-error-regression branch October 7, 2023 12:00
k8s-ci-robot added a commit that referenced this pull request Oct 9, 2023
…121008-origin-release-1.28

Automated cherry pick of #121008: Make a certain ipv4-vs-ipv6 config error non-fatal, for
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kube-proxy cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kube-proxy fails to start for IPv6 when underlying infra is dual stack in 1.28
4 participants