Antrea EgressIP does not work if wireGuard is enabled #6190

Open
adolfomaltez opened this issue Apr 4, 2024 · 6 comments

@adolfomaltez

Describe the bug

When an egressIP is created and applied to pods, it does not work correctly if wireGuard is enabled.
The egressIP works only for the pod that is on the node that takes the egressIP.
The rest of the pods on the nodes that do not have the egressIP lose connectivity to the outside of the cluster.
If wireGuard is disabled, egressIP works correctly for all pods.

To Reproduce
A Kubernetes cluster is created with kind.

git clone https://github.com/antrea-io/antrea.git
cd antrea
git checkout release-1.15
docker pull projects.registry.vmware.com/antrea/antrea-ubuntu:v1.15.0
./ci/kind/kind-setup.sh --images projects.registry.vmware.com/antrea/antrea-ubuntu:v1.15.0 create cluster

Antrea is installed.
kubectl apply -f https://github.com/antrea-io/antrea/releases/download/v1.15.0/antrea.yml

egressIP is enabled (Egress: true)

kubectl edit cm antrea-config -n kube-system
kubectl rollout restart deployment/antrea-controller -n kube-system
kubectl rollout restart daemonset/antrea-agent -n kube-system

A deployment and egressIP are created.

kubectl create -f hello-world-egressIP.yaml
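
The attached manifest is not reproduced in this thread, so the following is only a sketch of what the Egress part of such a file might look like. The resource name `hello-world-egress`, the label `app: hello-world`, and the example IP are hypothetical placeholders; the `crd.antrea.io/v1beta1` API version is assumed to be the one served by Antrea v1.15.

```shell
# egress_manifest EGRESS_IP
# Prints a minimal Egress manifest that selects Pods by label.
# The name and label below are illustrative, not taken from the attachment.
egress_manifest() {
  egress_ip=$1
  cat <<EOF
apiVersion: crd.antrea.io/v1beta1
kind: Egress
metadata:
  name: hello-world-egress
spec:
  appliedTo:
    podSelector:
      matchLabels:
        app: hello-world
  egressIP: ${egress_ip}
EOF
}

# Intended use against a live cluster (not executed here):
#   egress_manifest 172.18.0.100 | kubectl apply -f -
```

Without an `externalIPPool`, the chosen IP is expected to be configured on one Node already, so the address passed in has to match the test environment.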

Test connection to an outside service (nginx on laptop).
All 3x pods show egressIP as their source IP.

wireGuard is enabled ( trafficEncryptionMode: "wireGuard" )

kubectl edit cm antrea-config -n kube-system
kubectl rollout restart deployment/antrea-controller -n kube-system
kubectl rollout restart daemonset/antrea-agent -n kube-system
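
Since both settings are changed through an interactive `kubectl edit`, it can help to confirm that the agent configuration really contains them before testing. The sketch below assumes the two keys live in the `antrea-agent.conf` entry of the `antrea-config` ConfigMap; the helper names `expected_agent_conf` and `check_agent_conf` are made up for illustration.

```shell
# Prints the fragment of antrea-agent.conf that the two edit steps aim for.
expected_agent_conf() {
  cat <<'EOF'
featureGates:
  Egress: true
trafficEncryptionMode: "wireGuard"
EOF
}

# check_agent_conf reads an antrea-agent.conf on stdin and succeeds only if
# both settings are present and not commented out.
check_agent_conf() {
  conf=$(cat)
  printf '%s\n' "$conf" | grep -Eq '^[[:space:]]*Egress:[[:space:]]*true' &&
    printf '%s\n' "$conf" | grep -Eq '^[[:space:]]*trafficEncryptionMode:[[:space:]]*"?wireGuard"?'
}

# Intended use against a live cluster (not executed here):
#   kubectl get cm antrea-config -n kube-system \
#     -o jsonpath='{.data.antrea-agent\.conf}' | check_agent_conf && echo "config ok"
```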

Test connection to an outside service (nginx on laptop).
Only the pod on the node with the egressIP shows the egressIP as its source IP.
The other 2x pods fail to connect outside the cluster (Operation timed out).
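
The connection test above could be scripted roughly as follows. This is a sketch under assumptions: the Pods carry a label such as `app=hello-world`, the container image ships `wget`, and the URL points at the external nginx; none of these details are given in the report. The source IP itself still has to be read from the nginx access log.

```shell
# probe_pods LABEL_SELECTOR URL
# For every Pod matching the selector, try to fetch URL from inside the Pod
# and print "<pod> ok" or "<pod> failed".
probe_pods() {
  selector=$1
  url=$2
  for pod in $(kubectl get pods -l "$selector" -o name); do
    if kubectl exec "$pod" -- wget -q -T 5 -O /dev/null "$url" >/dev/null 2>&1; then
      echo "$pod ok"
    else
      echo "$pod failed"
    fi
  done
}

# Intended use (not executed here; label and address are placeholders):
#   probe_pods app=hello-world http://192.168.1.10/
```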

Expected
It is expected that all 3 pods of the deployment use the egressIP as the source IP when connecting to a service external to the cluster, even with wireGuard enabled.

Actual behavior
Only the pod running on the node that takes the egressIP works correctly.
Pods on different nodes cannot access services outside the cluster.
If wireGuard is disabled, all pods work correctly with the egressIP.

Versions:

Please provide the following information:

  • Antrea version (Docker image tag): v1.15.0

  • Kubernetes version (use kubectl version). If your Kubernetes components have different versions, please provide the version for all of them.
    Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"archive", BuildDate:"2022-04-02T14:49:13Z", GoVersion:"go1.18", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-30T06:34:50Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}

  • Container runtime: which runtime are you using (e.g. containerd, cri-o, docker) and which version are you using?
    Kind: kind version 0.18.0, Docker version 20.10.24+dfsg1, build 297e128

  • Linux kernel version on the Kubernetes Nodes (uname -r).
    root@cluster-worker:/# uname -r
    6.1.0-18-amd64

Additional context
hello-world-egressIP.yaml.txt

@adolfomaltez adolfomaltez added the kind/bug Categorizes issue or PR as related to a bug. label Apr 4, 2024
@antoninbas
Contributor

Hi @adolfomaltez, and thanks for the detailed bug report.
I was able to confirm what you are observing, i.e. Egress cannot be used when enabling WireGuard.

This is an interesting case. First I want to point out that Egress traffic is sent to the "Egress Node" (the Node to which the Egress IP is currently assigned) through a Geneve tunnel (by default). Even when WireGuard is enabled, the Egress implementation tries to use the default non-encrypted Geneve tunnel. Starting with Antrea v1.15 however, we are no longer creating the default tunnel port when WireGuard is enabled. This was introduced in this PR: #5885. This is the first reason why Egress is not working with WireGuard. But even if Antrea is downgraded to v1.14.3 (which does not include this patch, and hence still creates the Geneve tunnel port when WireGuard is enabled), Egress is still not working. This is because Linux Reverse Path Filtering (rp_filter) drops the traffic when it gets to the Egress Node (before SNAT). You can disable rp_filter on antrea-gw0, and this will let you reach your external server, using the Egress IP as source IP as desired. However, at that point the return path is still broken. This is because return traffic will need to take the WireGuard tunnel from the Egress Node back to the source Node (where the client is). As you can see, this creates an asymmetry in the path, and we configure WireGuard to only allow Pod IPs, which is yet another blocker here.
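
For anyone reproducing the rp_filter step on v1.14.3, a minimal sketch follows. One detail worth keeping in mind is that the kernel applies the higher of `net.ipv4.conf.all.rp_filter` and the per-interface value, so setting only the `antrea-gw0` entry to 0 may not be enough if `all` is non-zero. The helper name `effective_rp_filter` is made up for illustration.

```shell
# effective_rp_filter ALL_VALUE IFACE_VALUE
# Prints the value the kernel actually applies for an interface: the higher
# of conf.all.rp_filter and conf.<iface>.rp_filter.
effective_rp_filter() {
  if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Intended use on the Egress Node, as root (not executed here):
#   all=$(sysctl -n net.ipv4.conf.all.rp_filter)
#   gw=$(sysctl -n net.ipv4.conf.antrea-gw0.rp_filter)
#   echo "effective rp_filter on antrea-gw0: $(effective_rp_filter "$all" "$gw")"
#   sysctl -w net.ipv4.conf.antrea-gw0.rp_filter=0
```

In a kind cluster the Nodes are containers, so these commands would be run inside the Node container (for example through `docker exec`).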

If we want to support WireGuard with Egress, we will need to revert #5885, and make adjustments to the datapath so that return traffic can be forwarded correctly (not through the WireGuard tunnel). This could probably be achieved using a fwmark / ctmark and policy-based routing? cc @tnqn

Something that could be considered is whether WireGuard can be used to encrypt Egress traffic between the Egress Node and the source Node. The kind of source-based routing we need for Egress is probably not easy to achieve with WireGuard, and encryption is not really required IMO since Egress traffic is destined to exit the cluster.

@tnqn
Member

tnqn commented Apr 7, 2024

> If we want to support WireGuard with Egress, we will need to revert #5885, and make adjustments to the datapath so that return traffic can be forwarded correctly (not through the WireGuard tunnel). This could probably be achieved using a fwmark / ctmark and policy-based routing? cc @tnqn

Yes, the proposal should work. We could allocate one ctmark bit to represent it, set it after matching outgoing Egress traffic, restore it to fwmark for traffic coming from interfaces other than antrea-gw0, and then route that traffic to antrea-gw0 via policy routing.
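
To make the mark-and-route idea concrete, here is a rough sketch expressed with plain iptables and iproute2 commands. It is not how Antrea implements (or would implement) this: the mark bit, the routing table number, and the use of a remote Pod CIDR to recognize outgoing Egress traffic are all assumptions chosen for illustration, and a real datapath would likely set the mark in OVS and need a proper next hop on antrea-gw0. The function only prints the commands so they can be reviewed before being run as root.

```shell
# egress_return_rules MARK TABLE REMOTE_POD_CIDR
# Prints (does not run) the commands for the sketch:
#   1. tag connections of Egress traffic arriving from antrea-gw0 (ctmark)
#   2. copy the ctmark bit to the packet fwmark for traffic entering from
#      any interface other than antrea-gw0 (the return direction)
#   3. policy rule: marked packets use a dedicated routing table
#   4. that table sends everything to antrea-gw0
egress_return_rules() {
  mark=$1
  table=$2
  src=$3
  cat <<EOF
iptables -t mangle -A PREROUTING -i antrea-gw0 -s ${src} -j CONNMARK --set-xmark ${mark}/${mark}
iptables -t mangle -A PREROUTING ! -i antrea-gw0 -m connmark --mark ${mark}/${mark} -j CONNMARK --restore-mark --nfmask ${mark} --ctmask ${mark}
ip rule add fwmark ${mark}/${mark} table ${table}
ip route add default dev antrea-gw0 table ${table}
EOF
}

# Example with hypothetical values (prints four commands):
#   egress_return_rules 0x40 100 10.244.1.0/24
```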

> Something that could be considered is whether WireGuard can be used to encrypt Egress traffic between the Egress Node and the source Node. The kind of source-based routing we need for Egress is probably not easy to achieve with WireGuard, and encryption is not really required IMO since Egress traffic is destined to exit the cluster.

Agreed. And it has been documented that "Antrea can leverage WireGuard to encrypt Pod traffic between Nodes."

@antoninbas
Contributor

@luolanzone maybe we could scope this for v2.1?

@tnqn tnqn added this to the Antrea v2.1 release milestone Apr 9, 2024
@tnqn
Member

tnqn commented Apr 9, 2024

Created a milestone for v2.1 and added the issue to it.

@antoninbas
Contributor

A quick note that we will need to pay attention to the MTU in that case, and account for both WireGuard and Geneve (we should apply the larger of the two MTU deductions).
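
As a worked example of that rule, the sketch below subtracts the larger of two overheads from the physical MTU. The overhead figures used in the example (50 bytes for Geneve over IPv4 without options, 80 bytes as a conservative WireGuard value) are assumptions for illustration, and `pod_mtu` is a hypothetical helper name.

```shell
# pod_mtu PHYSICAL_MTU OVERHEAD_A OVERHEAD_B
# Prints the MTU left after subtracting the larger of the two overheads.
pod_mtu() {
  phys=$1
  if [ "$2" -gt "$3" ]; then deduction=$2; else deduction=$3; fi
  echo $((phys - deduction))
}

# Example with the assumed overheads:
#   pod_mtu 1500 50 80    # prints 1420
```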

@withlin

withlin commented May 31, 2024

same issue for antrea v1.15.0
