Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IP_NF_MANGLE to check-config.sh #46667

Merged
merged 1 commit into from
Oct 18, 2023

Conversation

stephan-henningsen
Copy link
Contributor

@stephan-henningsen stephan-henningsen commented Oct 17, 2023

Add IP_NF_MANGLE to "Generally Required" kernel features, since it appears to be necessary for Docker Swarm to work.

Closes #46636

- What I did
Used the check-config.sh script for my custom Linux to be compatible with Docker. And it mostly works, except connecting to local services in swarm mode!

I tested on a single-node swarm cluster and started a simple web service:

docker swarm init --default-addr-pool 10.22.0.0/16
docker service create --publish published=8088,target=80 --name=www nginx:1.25.2-alpine
curl 127.0.0.1:8088

The last command will fail to connect to the published port: curl: (7) Failed to connect to 127.0.0.1 port 8088 after 0 ms: Error

- How I did it
I isolated the cause for the connection refused to a single kernel module, using lots of lsmod and diff between a working system and my broken system, and moving various kernel modules out of the way and back again. It took some time.

- How to verify it
First establish a failing test case:

  1. Boot into a Linux that starts a Docker Engine but any swarm nodes.
  2. sudo mv /lib/modules/6.1.51/kernel/net/ipv4/netfilter/iptable_mangle.ko{,.disabled} (or equivalent) to move kernel modules out of the way.
  3. docker swarm init --default-addr-pool 10.22.0.0/16
  4. docker service create --publish published=8088,target=80 --name=www nginx:1.25.2-alpine
  5. curl 127.0.0.1:8088
    Expected: Connected, HTML is shown .
    Actual: Connection failed.
    Run lsmod | grep iptable_mangle and confirm the module isn't loaded.

Cleanup:

  1. docker service rm www
  2. docker swarm leave --force

Now confirm that the module is indeed required by swarm:

  1. sudo mv /lib/modules/6.1.51/kernel/net/ipv4/netfilter/iptable_mangle.ko{.disabled,} to reinstall kernel module
  2. docker swarm init --default-addr-pool 10.22.0.0/16
  3. lsmod | grep iptable_mangle and confirm the module is loaded but unused.
  4. docker service create --publish published=8088,target=80 --name=www nginx:1.25.2-alpine
  5. Run lsmod | grep iptable_mangle and confirm the module is in fact loaded and in use.
  6. curl 127.0.0.1:8088
    Expected/Actual: iptable_mangle is listed, HTML is shown.

- Description for the changelog
Add IP_NF_MANGLE to the list of "Generally Required" because it is required by Swarm.

- A picture of a cute animal (not mandatory but encouraged)

                               __
                              /\_\
                             _\/_/
                            /\_\
                            \/_/_
                              /\_\
                              \/_/
                             /\_\               .
                             \/_/              ::
                              .:::::::::::::..::::.
                              :::::::::::::::::
                              ;;;;;;.:::::::::
                               ;;;;;;:::::::
                                 ;;;:::::

                        https://lightwhale.asklandd.dk

Signed-off-by: Stephan Henningsen stephan+github@asklandd.dk

Copy link
Member

@neersighted neersighted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure this is the correct place. There are three places that we create mangle rules (and the last one in my list is the one you got bit by):

  • SCTP packet egress:
    if proto == "sctp" {
    // Linux kernel v4.9 and below enables NETIF_F_SCTP_CRC for veth by
    // the following commit.
    // This introduces a problem when conbined with a physical NIC without
    // NETIF_F_SCTP_CRC. As for a workaround, here we add an iptables entry
    // to fill the checksum.
    //
    // https://github.com/torvalds/linux/commit/c80fafbbb59ef9924962f83aac85531039395b18
    args = []string{
    "-p", proto,
    "--sport", strconv.Itoa(destPort),
    "-j", "CHECKSUM",
    "--checksum-fill",
    }
    if err := iptable.ProgramRule(Mangle, "POSTROUTING", action, args); err != nil {
    return err
    }
    }
  • Encrypted overlay networks:
    func programMangle(vni uint32, add bool) error {
    var (
    m = strconv.FormatUint(mark, 10)
    chain = "OUTPUT"
    rule = append(matchVXLAN(overlayutils.VXLANUDPPort(), vni), "-j", "MARK", "--set-mark", m)
    a = iptables.Append
    action = "install"
    )
    // TODO IPv6 support
    iptable := iptables.GetIptable(iptables.IPv4)
    if !add {
    a = iptables.Delete
    action = "remove"
    }
    if err := iptable.ProgramRule(iptables.Mangle, chain, a, rule); err != nil {
    return fmt.Errorf("could not %s mangle rule: %w", action, err)
    }
    return nil
    }
  • IPVS load-balancing:
    for _, iPort := range ingressPorts {
    var (
    protocol = strings.ToLower(PortConfig_Protocol_name[int32(iPort.Protocol)])
    publishedPort = strconv.FormatUint(uint64(iPort.PublishedPort), 10)
    )
    rule := []string{"-t", "mangle", addDelOpt, "PREROUTING", "-p", protocol, "--dport", publishedPort, "-j", "MARK", "--set-mark", fwMarkStr}
    rules = append(rules, rule)
    }

The latter two are only used with the overlay network driver, so I'm somewhat inclined to say that this should be scoped to the driver. On the other hand, we do rely on this for SCTP in certain situations, and other netfilter/xtables features are listed here.

I'll defer to @akerouanton and @corhere on whether this should be moved (or checked twice, once for overlay and one for SCTP, etc.) or not.

@neersighted
Copy link
Member

neersighted commented Oct 18, 2023

Oh, also, forgot:


Thank you for contributing! It appears your commit message is missing a DCO sign-off,
causing the DCO check to fail.

We require all commit messages to have a Signed-off-by line with your name
and e-mail (see "Sign your work" in the CONTRIBUTING.md in this repository), which looks something like:

Signed-off-by: YourFirsName YourLastName <yourname@example.org>

There is no need to open a new pull request, but to fix this (and make CI pass),
you need to amend the commit(s) in this pull request, and "force push" the amended
commit.

Unfortunately, it's not possible to do so through GitHub's web UI, so this needs
to be done through the git commandline.

You can find some instructions in the output of the DCO check (which can be found
in the "checks" tab on this pull request), as well as in the Moby contributing guide.

Steps to do so "roughly" come down to:

  1. Set your name and e-mail in git's configuration:

    git config --global user.name "YourFirstName YourLastName"
    git config --global user.email "yourname@example.org"

    (Make sure to use your real name (not your GitHub username/handle) and e-mail)

  2. Clone your fork locally

  3. Check out the branch associated with this pull request

  4. Sign-off and amend the existing commit(s)

    git commit --amend --no-edit --signoff

    If your pull request contains multiple commits, either squash the commits (if
    needed) or sign-off each individual commit.

  5. Force push your branch to GitHub (using the --force or --force-with-lease flags) to update the pull request.

Sorry for the hassle (I wish GitHub would make this a bit easier to do), and let me know if you need help or more detailed instructions!

Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SCTP is the transport protocol used in WebRTC so not much of an edge case these days. Just about anyone running a multiplayer browser game, realtime chat or other similar server in a container is going to run into this. I think it makes sense to consider SCTP packet egress a baseline feature, so unless we can work around the kernel issue without it, IP_NF_MANGLE would have to be generally required.

Add IP_NF_MANGLE to "Generally Required" kernel features, since it appears to be necessary for Docker Swarm to work.

Closes moby#46636

Signed-off-by: Stephan Henningsen <stephan-henningsen@users.noreply.github.com>
@stephan-henningsen
Copy link
Contributor Author

Oh, also, forgot:

Thank you for contributing! It appears your commit message is missing a DCO sign-off, causing the DCO check to fail.

Yes, I had a feeling I would fail here. I found out about the sign-off too late, panicked, and added it in the PR comment ;)

I believe it's properly signed off now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

iptables (legacy) doesn't work with swarm published ports; iptables (nf_tables) is required
3 participants