Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port hostport code from iptables to nftables #8684

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

danwinship
Copy link
Contributor

@danwinship danwinship commented Oct 16, 2024

What type of PR is this?

/kind cleanup
/kind other

What this PR does / why we need it:

Ports hostport code from iptables to nftables, which will be needed when RHEL 10 drops support for iptables.

Which issue(s) this PR fixes:

None (but it's part of https://issues.redhat.com/browse/OCPSTRAT-940)

Special notes for your reviewer:

I'm not sure exactly what sort of compatibility constraints cri-o has but note that:

  1. As currently written, this PR entirely drops iptables support, requiring all users to use nftables. This is not a problem for OCP (all RHEL9-based releases have working nftables) but it is a new/changed requirement overall that other users may need to consider on upgrade.
  2. nft releases before 1.0.1 are problematic (see discussion here) so the knftables package requires that version or newer. This means using a mid-2021-or-later distro, meaning not RHEL 7, Debian buster or bullseye, or Ubuntu 20.04 (which are all allegedly currently supported according to install.md).
  3. As currently written, this PR does not try to do any cleanup/migration of existing iptables hostport rules, meaning that if you upgraded cri-o on a running node (ie, without draining/rebooting), you could end up with "stranded" iptables hostport rules that would not get cleaned up when their corresponding pods were deleted.

Also, the arm64 integration tests are currently failing because the image used for them doesn't include nft. I tried updating .github/workflows/integration.yml to do that, but the PR seems to be ignoring it. Maybe it has to be updated separately? (Or I did something wrong...)

Does this PR introduce a user-facing change?

Hostports are now implemented using nftables rather than iptables.
Action required: cri-o now depends on the `nft` binary being available (and
does not depend on the iptables binaries being available).

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/other Categorizes issue or PR as not clearly related to any existing kind/* category needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 16, 2024
Copy link
Contributor

openshift-ci bot commented Oct 16, 2024

Hi @danwinship. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@danwinship
Copy link
Contributor Author

  1. As currently written, this PR entirely drops iptables support

(meant to imply: I could rewrite it to support both iptables and nftables (though that would obviously be more complicated). It just didn't occur to me until I was filing the PR that we might want that.)

@kwilczynski
Copy link
Member

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 16, 2024
Copy link

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 78.47896% with 133 lines in your changes missing coverage. Please review.

Project coverage is 47.24%. Comparing base (d8220db) to head (610ede1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8684      +/-   ##
==========================================
+ Coverage   46.96%   47.24%   +0.28%     
==========================================
  Files         154      155       +1     
  Lines       21978    22232     +254     
==========================================
+ Hits        10322    10504     +182     
- Misses      10594    10657      +63     
- Partials     1062     1071       +9     

@danwinship danwinship force-pushed the nftables branch 5 times, most recently from d43803d to fd31b5e Compare October 16, 2024 22:34
@danwinship
Copy link
Contributor Author

/test all

@kwilczynski
Copy link
Member

@danwinship, we are making an assumption here that all users of CRI-O are going to be using nftables, isn't it?

@danwinship
Copy link
Contributor Author

danwinship commented Oct 17, 2024

@kwilczynski it assumes that all users of CRI-O 1.32 will be on distros with new-enough nftables (see the "Special notes for your reviewer" section of the initial comment). It doesn't assume they'll be using nftables for kube-proxy or firewalling or anything else. The hostport rules are independent.

(And I can rewrite it to support either iptables or nftables if necessary. It will just be more complicated...)

@danwinship
Copy link
Contributor Author

So there's only one test of HostPorts in the e2e suite ([It] [sig-network] HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol [LinuxOnly] [Conformance]), and it's passing in both ci/prow/e2e-aws-ovn and ci/prow/ci-e2e (which, if I understand correctly, runs against a vanilla upstream k8s environment, not OCP, right?)

@danwinship danwinship marked this pull request as ready for review October 17, 2024 13:54
@danwinship danwinship requested a review from mrunalp as a code owner October 17, 2024 13:54
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2024
@haircommander
Copy link
Member

if I understand correctly, runs against a vanilla upstream k8s environment, not OCP, right

correct!

Can we optimistically use nftables and if that fails fallback to iptables? I'm fine dropping iptables in our docs and recommending nftables but I Think there should be some transition time so we don't pull the rug out from underneath users who use older OSs (I know of some using debian, not sure which version)

@danwinship
Copy link
Contributor Author

@haircommander done, though probably this needs some hand-testing to make sure it really is supporting both modes (and the linter doesn't like what I did with the unit tests...)

Does it need to do both-iptables-and-nftables-cleanup on Remove? (eg, does it need to support the possibility of upgrading and restarting cri-o without rebooting).

@haircommander
Copy link
Member

Does it need to do both-iptables-and-nftables-cleanup on Remove? (eg, does it need to support the possibility of upgrading and restarting cri-o without rebooting).

Unfortunately, yes

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 13, 2024
@danwinship
Copy link
Contributor Author

note: #8757 contains the first half of this (unit test updates) which can be reviewed separately

@danwinship danwinship force-pushed the nftables branch 3 times, most recently from eb12f2c to a6842ee Compare November 15, 2024 20:30
It turns out it's more convenient to have an IPv4/IPv6 split for
nftables too, plus we will need to work with both iptables and
nftables at the same time for backward-compatibility, and the meta
manager helps with that too.

This reverts commit 0262d19.

(but includes the up-to-date fixed dual-stack unit test rather than
the original broken original one)

Signed-off-by: Dan Winship <danwinship@redhat.com>
hostport.PodPortMapping is only used to aggregate the arguments to
HostPortManager.Add() and .Remove(), but they could just as easily be
passed separately (which will turn out to be useful later).

(Also, the existence of the struct was hiding the fact that the
toplevel code was not actually passing the Namespace field. Since this
only ended up being used for an iptables comment anyway, I just
removed it and simplified the code to work only with the pod name.)

Signed-off-by: Dan Winship <danwinship@redhat.com>
Signed-off-by: Dan Winship <danwinship@redhat.com>
Signed-off-by: Dan Winship <danwinship@redhat.com>
Put all of the iptables-specific code from hostport.go and
hostport_manager.go into hostport_iptables.go (and all of the
iptables-specific tests into hostport_iptables_test.go), leaving just
the type definitions in hostport_manager.go and test case data in
hostport_manager_test.go.

Signed-off-by: Dan Winship <danwinship@redhat.com>
IPTables will not be supported in RHEL 10, and is deprecated / not
being improved in general. This adds a new nftables-based
HostPortManager (but does not yet make it the default).

Signed-off-by: Dan Winship <danwinship@redhat.com>
If either iptables or nftables is not present, just use only the other
one. If both are present, use nftables for Add, but both for Remove
(so that old iptables rules eventually get cleaned up correctly).

Signed-off-by: Dan Winship <danwinship@redhat.com>
Copy link
Contributor

openshift-ci bot commented Dec 14, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: danwinship
Once this PR has been reviewed and has the lgtm label, please assign giuseppe, kwilczynski, littlejawa for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

openshift-ci bot commented Dec 14, 2024

@danwinship: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-ovn 610ede1 link true /test e2e-gcp-ovn
ci/prow/e2e-aws-ovn 610ede1 link true /test e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@haircommander
Copy link
Member

@danwinship do we need this for 1.32 or is it possible to defer to 1.33?

@haircommander haircommander added this to the 1.32 milestone Dec 17, 2024
@danwinship
Copy link
Contributor Author

It can be deferred to 1.33

@saschagrunert saschagrunert modified the milestones: 1.32, 1.33 Dec 18, 2024
@saschagrunert
Copy link
Member

Moved it to the next milestone, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/other Categorizes issue or PR as not clearly related to any existing kind/* category ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants