
Kube-flannel pod Failed to create SubnetManager: error retrieving pod spec dial tcp 10.0.0.1:443: i/o timeout #60161

Closed
vincentmli opened this issue Feb 21, 2018 · 22 comments
Labels
area/ipvs area/kube-proxy kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@vincentmli

vincentmli commented Feb 21, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

The kube-flannel pod on the worker node is unable to reach the k8s API server through the internal cluster service 10.0.0.1:443. The kube-flannel pod on the master node is able to reach 10.0.0.1:443.

# cluster/kubectl.sh logs kube-flannel-ds-ws9jf --namespace=kube-system
I0221 21:53:02.318108       1 main.go:488] Using interface with name eth1 and address 192.168.1.169
I0221 21:53:02.411666       1 main.go:505] Defaulting external address to interface address (192.168.1.169)
E0221 21:53:32.413701       1 main.go:232] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-ws9jf': Get https://10.0.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-ws9jf: dial tcp 10.0.0.1:443: i/o timeout


What you expected to happen:
Pods on the worker node (and the worker node itself) should be able to reach the cluster service 10.0.0.1:443.

How to reproduce it (as minimally and precisely as possible):

Follow the steps from https://github.com/kubernetes/community/blob/master/contributors/devel/running-locally.md.
Instead of running a local all-in-one k8s cluster, I modified the hack/local-up-cluster.sh script to run one master node and one worker node.

Modified cluster-master.sh, shown as a diff against local-up-cluster.sh:

 diff -u hack/local-up-cluster.sh hack/cluster-master.sh
--- hack/local-up-cluster.sh    2018-02-21 09:05:18.713000000 -0800
+++ hack/cluster-master.sh      2018-02-21 10:14:29.844000000 -0800
@@ -15,6 +15,7 @@
 # limitations under the License.

 KUBE_ROOT=$(dirname "${BASH_SOURCE}")/..
+K8S_MASTER="192.168.1.168"

 # This command builds and runs a local kubernetes cluster.
 # You may need to run this as root to allow kubelet to open docker's socket,
@@ -22,7 +23,7 @@
 DOCKER_OPTS=${DOCKER_OPTS:-""}
 DOCKER=(docker ${DOCKER_OPTS})
 DOCKERIZE_KUBELET=${DOCKERIZE_KUBELET:-""}
-ALLOW_PRIVILEGED=${ALLOW_PRIVILEGED:-""}
+ALLOW_PRIVILEGED=${ALLOW_PRIVILEGED:-"true"}
 DENY_SECURITY_CONTEXT_ADMISSION=${DENY_SECURITY_CONTEXT_ADMISSION:-""}
 PSP_ADMISSION=${PSP_ADMISSION:-""}
 NODE_ADMISSION=${NODE_ADMISSION:-""}
@@ -34,7 +35,7 @@
 # many dev environments run with swap on, so we don't fail in this env
 FAIL_SWAP_ON=${FAIL_SWAP_ON:-"false"}
 # Name of the network plugin, eg: "kubenet"
-NET_PLUGIN=${NET_PLUGIN:-""}
+NET_PLUGIN=${NET_PLUGIN:-"cni"}
 # Place the config files and binaries required by NET_PLUGIN in these directory,
 # eg: "/etc/cni/net.d" for config files, and "/opt/cni/bin" for binaries.
 CNI_CONF_DIR=${CNI_CONF_DIR:-""}
@@ -64,7 +65,7 @@
 KUBECTL=${KUBECTL:-cluster/kubectl.sh}
 WAIT_FOR_URL_API_SERVER=${WAIT_FOR_URL_API_SERVER:-60}
 ENABLE_DAEMON=${ENABLE_DAEMON:-false}
-HOSTNAME_OVERRIDE=${HOSTNAME_OVERRIDE:-"127.0.0.1"}
+HOSTNAME_OVERRIDE=${HOSTNAME_OVERRIDE:-"${K8S_MASTER}"}
 EXTERNAL_CLOUD_PROVIDER=${EXTERNAL_CLOUD_PROVIDER:-false}
 EXTERNAL_CLOUD_PROVIDER_BINARY=${EXTERNAL_CLOUD_PROVIDER_BINARY:-""}
 CLOUD_PROVIDER=${CLOUD_PROVIDER:-""}
@@ -227,13 +228,13 @@
 API_SECURE_PORT=${API_SECURE_PORT:-6443}

 # WARNING: For DNS to work on most setups you should export API_HOST as the docker0 ip address,
-API_HOST=${API_HOST:-localhost}
-API_HOST_IP=${API_HOST_IP:-"127.0.0.1"}
+API_HOST=${API_HOST:-"${K8S_MASTER}"}
+API_HOST_IP=${API_HOST_IP:-"${K8S_MASTER}"}
 ADVERTISE_ADDRESS=${ADVERTISE_ADDRESS:-""}
 API_BIND_ADDR=${API_BIND_ADDR:-"0.0.0.0"}
 EXTERNAL_HOSTNAME=${EXTERNAL_HOSTNAME:-localhost}

-KUBELET_HOST=${KUBELET_HOST:-"127.0.0.1"}
+KUBELET_HOST=${KUBELET_HOST:-"${K8S_MASTER}"}
 # By default only allow CORS for requests on localhost
 API_CORS_ALLOWED_ORIGINS=${API_CORS_ALLOWED_ORIGINS:-/127.0.0.1(:[0-9]+)?$,/localhost(:[0-9]+)?$}
 KUBELET_PORT=${KUBELET_PORT:-10250}
@@ -635,6 +636,8 @@
       --kubeconfig "$CERT_DIR"/controller.kubeconfig \
       --use-service-account-credentials \
       --controllers="${KUBE_CONTROLLERS}" \
+      --allocate-node-cidrs=true \
+      --cluster-cidr="10.244.0.0/16" \
       --master="https://${API_HOST}:${API_SECURE_PORT}" >"${CTLRMGR_LOG}" 2>&1 &
     CTLRMGR_PID=$!
 }

Modified cluster-worker.sh, shown as a diff against local-up-cluster.sh:

diff -u hack/local-up-cluster.sh hack/cluster-worker.sh
--- hack/local-up-cluster.sh    2018-02-21 09:06:38.561000000 -0800
+++ hack/cluster-worker.sh      2018-02-21 12:29:42.691000000 -0800
@@ -15,6 +15,8 @@
 # limitations under the License.

 KUBE_ROOT=$(dirname "${BASH_SOURCE}")/..
+K8S_MASTER="192.168.1.168"
+K8S_WORKER="192.168.1.169"

 # This command builds and runs a local kubernetes cluster.
 # You may need to run this as root to allow kubelet to open docker's socket,
@@ -22,7 +24,7 @@
 DOCKER_OPTS=${DOCKER_OPTS:-""}
 DOCKER=(docker ${DOCKER_OPTS})
 DOCKERIZE_KUBELET=${DOCKERIZE_KUBELET:-""}
-ALLOW_PRIVILEGED=${ALLOW_PRIVILEGED:-""}
+ALLOW_PRIVILEGED=${ALLOW_PRIVILEGED:-"true"}
 DENY_SECURITY_CONTEXT_ADMISSION=${DENY_SECURITY_CONTEXT_ADMISSION:-""}
 PSP_ADMISSION=${PSP_ADMISSION:-""}
 NODE_ADMISSION=${NODE_ADMISSION:-""}
@@ -34,7 +36,7 @@
 # many dev environments run with swap on, so we don't fail in this env
 FAIL_SWAP_ON=${FAIL_SWAP_ON:-"false"}
 # Name of the network plugin, eg: "kubenet"
-NET_PLUGIN=${NET_PLUGIN:-""}
+NET_PLUGIN=${NET_PLUGIN:-"cni"}
 # Place the config files and binaries required by NET_PLUGIN in these directory,
 # eg: "/etc/cni/net.d" for config files, and "/opt/cni/bin" for binaries.
 CNI_CONF_DIR=${CNI_CONF_DIR:-""}
@@ -64,7 +66,7 @@
 KUBECTL=${KUBECTL:-cluster/kubectl.sh}
 WAIT_FOR_URL_API_SERVER=${WAIT_FOR_URL_API_SERVER:-60}
 ENABLE_DAEMON=${ENABLE_DAEMON:-false}
-HOSTNAME_OVERRIDE=${HOSTNAME_OVERRIDE:-"127.0.0.1"}
+HOSTNAME_OVERRIDE=${HOSTNAME_OVERRIDE:-"${K8S_WORKER}"}
 EXTERNAL_CLOUD_PROVIDER=${EXTERNAL_CLOUD_PROVIDER:-false}
 EXTERNAL_CLOUD_PROVIDER_BINARY=${EXTERNAL_CLOUD_PROVIDER_BINARY:-""}
 CLOUD_PROVIDER=${CLOUD_PROVIDER:-""}
@@ -88,7 +90,7 @@
 AUTH_ARGS=${AUTH_ARGS:-""}

 # Install a default storage class (enabled by default)
-DEFAULT_STORAGE_CLASS=${KUBE_DEFAULT_STORAGE_CLASS:-true}
+DEFAULT_STORAGE_CLASS=${KUBE_DEFAULT_STORAGE_CLASS:-false}

 # start the cache mutation detector by default so that cache mutators will be found
 KUBE_CACHE_MUTATION_DETECTOR="${KUBE_CACHE_MUTATION_DETECTOR:-true}"
@@ -227,13 +229,13 @@
 API_SECURE_PORT=${API_SECURE_PORT:-6443}

 # WARNING: For DNS to work on most setups you should export API_HOST as the docker0 ip address,
-API_HOST=${API_HOST:-localhost}
-API_HOST_IP=${API_HOST_IP:-"127.0.0.1"}
+API_HOST=${API_HOST:-"${K8S_MASTER}"}
+API_HOST_IP=${API_HOST_IP:-"${K8S_MASTER}"}
 ADVERTISE_ADDRESS=${ADVERTISE_ADDRESS:-""}
 API_BIND_ADDR=${API_BIND_ADDR:-"0.0.0.0"}
 EXTERNAL_HOSTNAME=${EXTERNAL_HOSTNAME:-localhost}

-KUBELET_HOST=${KUBELET_HOST:-"127.0.0.1"}
+KUBELET_HOST=${KUBELET_HOST:-"${K8S_WORKER}"}
 # By default only allow CORS for requests on localhost
 API_CORS_ALLOWED_ORIGINS=${API_CORS_ALLOWED_ORIGINS:-/127.0.0.1(:[0-9]+)?$,/localhost(:[0-9]+)?$}
 KUBELET_PORT=${KUBELET_PORT:-10250}
@@ -741,7 +743,7 @@
         --hostname-override="${HOSTNAME_OVERRIDE}" \
         ${cloud_config_arg} \
         --address="${KUBELET_HOST}" \
-        --kubeconfig "$CERT_DIR"/kubelet.kubeconfig \
+        --kubeconfig "$CERT_DIR"/kubelet-"${K8S_WORKER}".kubeconfig \
         --feature-gates="${FEATURE_GATES}" \
         --cpu-cfs-quota=${CPU_CFS_QUOTA} \
         --enable-controller-attach-detach="${ENABLE_CONTROLLER_ATTACH_DETACH}" \
@@ -815,7 +817,7 @@
 apiVersion: kubeproxy.config.k8s.io/v1alpha1
 kind: KubeProxyConfiguration
 clientConnection:
-  kubeconfig: ${CERT_DIR}/kube-proxy.kubeconfig
+  kubeconfig: ${CERT_DIR}/kube-proxy-${K8S_WORKER}.kubeconfig
 hostnameOverride: ${HOSTNAME_OVERRIDE}
 featureGates: ${FEATURE_GATES}
 mode: ${KUBE_PROXY_MODE}
@@ -827,13 +829,13 @@
       --v=${LOG_LEVEL} 2>&1 &
     PROXY_PID=$!

-    SCHEDULER_LOG=${LOG_DIR}/kube-scheduler.log
-    ${CONTROLPLANE_SUDO} "${GO_OUT}/hyperkube" scheduler \
-      --v=${LOG_LEVEL} \
-      --kubeconfig "$CERT_DIR"/scheduler.kubeconfig \
-      --feature-gates="${FEATURE_GATES}" \
-      --master="https://${API_HOST}:${API_SECURE_PORT}" >"${SCHEDULER_LOG}" 2>&1 &
-    SCHEDULER_PID=$!
+#    SCHEDULER_LOG=${LOG_DIR}/kube-scheduler.log
+#    ${CONTROLPLANE_SUDO} "${GO_OUT}/hyperkube" scheduler \
+#      --v=${LOG_LEVEL} \
+#      --kubeconfig "$CERT_DIR"/scheduler.kubeconfig \
+#      --feature-gates="${FEATURE_GATES}" \
+#      --master="https://${API_HOST}:${API_SECURE_PORT}" >"${SCHEDULER_LOG}" 2>&1 &
+#    SCHEDULER_PID=$!
 }

 function start_kubedns {
@@ -1002,6 +1004,7 @@
         ;;
       Linux)
         start_kubelet
+        start_kubeproxy
         ;;
       *)
         warning "Unsupported host OS.  Must be Linux or Mac OS X, kubelet aborted."

Create a script, hack/kubeconfig.sh, to generate the kubelet/kube-proxy kubeconfigs for the worker node:

#!/bin/bash


KUBE_ROOT=$(dirname "${BASH_SOURCE}")/..
K8S_MASTER="192.168.1.168"
K8S_WORKER="192.168.1.169"

KUBECTL=${KUBECTL:-cluster/kubectl.sh}

source "${KUBE_ROOT}/hack/lib/init.sh"
kube::util::ensure-cfssl

API_PORT=${API_PORT:-8080}
API_SECURE_PORT=${API_SECURE_PORT:-6443}

API_HOST=${API_HOST:-"${K8S_MASTER}"}
API_HOST_IP=${API_HOST_IP:-"${K8S_MASTER}"}

# This is the default dir and filename where the apiserver will generate a self-signed cert
# which should be able to be used as the CA to verify itself
CERT_DIR=${CERT_DIR:-"/var/run/kubernetes"}
ROOT_CA_FILE=${CERT_DIR}/server-ca.crt
ROOT_CA_KEY=${CERT_DIR}/server-ca.key
CLUSTER_SIGNING_CERT_FILE=${CLUSTER_SIGNING_CERT_FILE:-"${ROOT_CA_FILE}"}
CLUSTER_SIGNING_KEY_FILE=${CLUSTER_SIGNING_KEY_FILE:-"${ROOT_CA_KEY}"}


# Ensure CERT_DIR is created for auto-generated crt/key and kubeconfig
mkdir -p "${CERT_DIR}" &>/dev/null || sudo mkdir -p "${CERT_DIR}"
CONTROLPLANE_SUDO=$(test -w "${CERT_DIR}" || echo "sudo -E")


for instance in ${K8S_WORKER}; do
# Create client certs signed with client-ca, given id, given CN and a number of groups
kube::util::create_client_certkey "${CONTROLPLANE_SUDO}" "${CERT_DIR}" 'client-ca' kubelet-${instance} system:node:${instance} system:nodes
kube::util::create_client_certkey "${CONTROLPLANE_SUDO}" "${CERT_DIR}" 'client-ca' kube-proxy-${instance} system:kube-proxy system:nodes

done

for instance in ${K8S_WORKER}; do
  ${KUBECTL} config set-cluster local-up-cluster \
    --certificate-authority="${CERT_DIR}"/server-ca.crt \
    --embed-certs=true \
    --server=https://${K8S_MASTER}:6443 \
    --kubeconfig="${CERT_DIR}"/kubelet-${instance}.kubeconfig

  ${KUBECTL} config set-credentials system:node:${instance} \
    --client-certificate="${CERT_DIR}"/client-kubelet-${instance}.crt \
    --client-key="${CERT_DIR}"/client-kubelet-${instance}.key \
    --embed-certs=true \
    --kubeconfig="${CERT_DIR}"/kubelet-${instance}.kubeconfig

  ${KUBECTL} config set-context local-up-cluster \
    --cluster=local-up-cluster \
    --user=system:node:${instance} \
    --kubeconfig="${CERT_DIR}"/kubelet-${instance}.kubeconfig

  ${KUBECTL} config use-context local-up-cluster --kubeconfig="${CERT_DIR}"/kubelet-${instance}.kubeconfig
done


for instance in ${K8S_WORKER}; do
  ${KUBECTL} config set-cluster local-up-cluster \
    --certificate-authority="${CERT_DIR}"/server-ca.crt \
    --embed-certs=true \
    --server=https://${K8S_MASTER}:6443 \
    --kubeconfig="${CERT_DIR}"/kube-proxy-${instance}.kubeconfig

  ${KUBECTL} config set-credentials system:node:${instance} \
    --client-certificate="${CERT_DIR}"/client-kube-proxy-${instance}.crt \
    --client-key="${CERT_DIR}"/client-kube-proxy-${instance}.key \
    --embed-certs=true \
    --kubeconfig="${CERT_DIR}"/kube-proxy-${instance}.kubeconfig

  ${KUBECTL} config set-context local-up-cluster \
    --cluster=local-up-cluster \
    --user=system:node:${instance} \
    --kubeconfig="${CERT_DIR}"/kube-proxy-${instance}.kubeconfig

  ${KUBECTL} config use-context local-up-cluster --kubeconfig="${CERT_DIR}"/kube-proxy-${instance}.kubeconfig
done

Run the k8s cluster master as:

# ENABLE_DAEMON=true KUBE_PROXY_MODE=ipvs ETCD_HOST=0.0.0.0 hack/cluster-master.sh -O
WARNING : The kubelet is configured to not fail if swap is enabled; production deployments should disable swap.
skipped the build.
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Kubelet cgroup driver defaulted to use: systemd
API SERVER insecure port is free, proceeding...
API SERVER secure port is free, proceeding...
Detected host and ready to start services.  Doing some housekeeping first...
Using GO_OUT /home/kubernetes/_output/bin
Starting services now!
Starting etcd
etcd --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/tmp.oZSKe90Cbr --listen-client-urls http://0.0.0.0:2379 --debug > "/dev/null" 2>/dev/null
Waiting for etcd to come up.
+++ [0221 13:21:16] On try 1, etcd: : http://0.0.0.0:2379
{"action":"set","node":{"key":"/_test","value":"","modifiedIndex":4,"createdIndex":4}}
Generating a 2048 bit RSA private key
..............................................................+++
..+++
writing new private key to '/var/run/kubernetes/server-ca.key'
-----
Generating a 2048 bit RSA private key
....................................................................+++
.............................................+++
writing new private key to '/var/run/kubernetes/client-ca.key'
-----
Generating a 2048 bit RSA private key
.+++
...........CUTTED...............
Waiting for apiserver to come up
+++ [0221 13:21:32] On try 8, apiserver: : ok
Cluster "local-up-cluster" set.
use 'kubectl --kubeconfig=/var/run/kubernetes/admin-kube-aggregator.kubeconfig' to use the aggregated API server
services "kube-dns" created
serviceaccounts "kube-dns" created
configmaps "kube-dns" created
deployments "kube-dns" created
Kube-dns addon successfully deployed.
kubelet ( 24991 ) is running.
Create default storage class for
storageclasses "standard" created
Local Kubernetes cluster is running.

Logs:
  /tmp/kube-apiserver.log
  /tmp/kube-controller-manager.log

  /tmp/kube-proxy.log
  /tmp/kube-scheduler.log
  /tmp/kubelet.log

To start using your cluster, run:

  export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
  cluster/kubectl.sh

Alternatively, you can write to the default kubeconfig:

  export KUBERNETES_PROVIDER=local

  cluster/kubectl.sh config set-cluster local --server=https://192.168.1.168:6443 --certificate-authority=/var/run/kubernetes/server-ca.crt
  cluster/kubectl.sh config set-credentials myself --client-key=/var/run/kubernetes/client-admin.key --client-certificate=/var/run/kubernetes/client-admin.crt
  cluster/kubectl.sh config set-context local --cluster=local --user=myself
  cluster/kubectl.sh config use-context local
  cluster/kubectl.sh

Create the kubelet/kube-proxy kubeconfigs for the worker node and copy them to the worker node:

[root@centos-k8s kubernetes]# hack/kubeconfig.sh
..........................CUTTED...............
Cluster "local-up-cluster" set.
User "system:node:192.168.1.169" set.
Context "local-up-cluster" modified.
Switched to context "local-up-cluster".
Cluster "local-up-cluster" set.
User "system:node:192.168.1.169" set.
Context "local-up-cluster" modified.
Switched to context "local-up-cluster".

[root@centos-k8s kubernetes]#  scp /var/run/kubernetes/*192*.kubeconfig 10.3.72.169:/var/run/kubernetes/
root@10.3.72.169's password:
kubelet-192.168.1.169.kubeconfig                                                                     100% 6145     6.0MB/s   00:00
kube-proxy-192.168.1.169.kubeconfig                                                                  100% 6125     7.4MB/s   00:00

Run the k8s cluster worker node as:

# KUBE_PROXY_MODE=ipvs START_MODE=kubeletonly  hack/cluster-worker.sh -O
WARNING : The kubelet is configured to not fail if swap is enabled; production deployments should disable swap.
skipped the build.
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Kubelet cgroup driver defaulted to use: systemd
Detected host and ready to start services.  Doing some housekeeping first...
Using GO_OUT /home/kubernetes/_output/bin
Starting services now!
kubelet ( 3793 ) is running.
The kubelet was started.

Logs:
  /tmp/kubelet.log

The master and worker nodes look OK:

# cluster/kubectl.sh get no -o wide
NAME            STATUS    ROLES     AGE       VERSION                                    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION          CONTAINER-RUNTIME
192.168.1.168   Ready     <none>    4m        v1.11.0-alpha.0.255+d1cb55c8a7928e-dirty   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://1.12.6
192.168.1.169   Ready     <none>    40s       v1.11.0-alpha.0.255+d1cb55c8a7928e-dirty   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://1.12.6

Now deploy kube-flannel.yaml:

# cluster/kubectl.sh apply -f kube-flannel.yaml --namespace=kube-system
clusterroles "flannel" created
clusterrolebindings "flannel" created
serviceaccounts "flannel" created
configmaps "kube-flannel-cfg" created
daemonsets "kube-flannel-ds" created

Check the kube-flannel pod on the worker node:

# cluster/kubectl.sh get po -o wide --namespace=kube-system
NAME                        READY     STATUS             RESTARTS   AGE       IP              NODE
kube-dns-6844cfbdfb-7lxkf   3/3       Running            0          7m        10.244.0.23     192.168.1.168
kube-flannel-ds-44pz8       1/1       Running            0          2m        192.168.1.168   192.168.1.168
kube-flannel-ds-ws9jf       0/1       CrashLoopBackOff   2          2m        192.168.1.169   192.168.1.169

# cluster/kubectl.sh logs kube-flannel-ds-ws9jf --namespace=kube-system
I0221 21:27:35.615094       1 main.go:488] Using interface with name eth1 and address 192.168.1.169
I0221 21:27:35.615424       1 main.go:505] Defaulting external address to interface address (192.168.1.169)
E0221 21:28:05.617316       1 main.go:232] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-ws9jf': Get https://10.0.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-ws9jf: dial tcp 10.0.0.1:443: i/o timeout

Anything else we need to know?:

My idea is to pull the most recent k8s from the kubernetes GitHub upstream and test it in a multi-node environment with F5 Networks' https://github.com/F5Networks/k8s-bigip-ctlr, so that I always have the most recent k8s code together with the most recent k8s-bigip-ctlr test environment.

I don't think kube-flannel itself is the problem here; it is just an example showing that a pod on the worker node can't connect to the internal k8s API service 10.0.0.1:443. Even running curl on the worker node can't connect to 10.0.0.1:443.
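For reference, the curl check is along these lines, run on the worker node (any TLS/HTTP response, even a 401/403, means the service VIP is reachable; an i/o timeout reproduces the flannel failure above):

# curl -vk --connect-timeout 5 https://10.0.0.1:443/version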

In my test I used kube-proxy in ipvs mode, but I have the same problem with iptables mode.

I may have missed some configuration and this may not be a bug, but I appreciate any guidance on this problem.

Cluster master network info:

# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.1:443 rr persistent 10800
  -> 192.168.1.168:6443           Masq    1      2          0
TCP  10.0.0.10:53 rr
  -> 10.244.0.23:53               Masq    1      0          0
UDP  10.0.0.10:53 rr
  -> 10.244.0.23:53               Masq    1      0          0

# iptables -t nat -n -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0
RETURN     all  --  10.244.0.0/16        10.244.0.0/16
MASQUERADE  all  --  10.244.0.0/16       !224.0.0.0/4
RETURN     all  -- !10.244.0.0/16        10.244.0.0/24
MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16

Chain DOCKER (2 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0

Chain KUBE-FIRE-WALL (0 references)
target     prot opt source               destination

Chain KUBE-MARK-DROP (0 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x8000

Chain KUBE-MARK-MASQ (0 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination

# iptables -n -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-FIREWALL  all  --  0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER-ISOLATION  all  --  0.0.0.0/0            0.0.0.0/0
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  10.244.0.0/16        0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            10.244.0.0/16

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-FIREWALL  all  --  0.0.0.0/0            0.0.0.0/0

Chain DOCKER (1 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000


# ip route show
default via 10.3.254.254 dev eth0 proto static metric 100
10.3.0.0/16 dev eth0 proto kernel scope link src 10.3.72.168 metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.168 metric 100

# ip addr show
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:a0:ea:83 brd ff:ff:ff:ff:ff:ff
    inet 10.3.72.168/16 brd 10.3.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::d60f:453a:a188:23fd/64 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::1f30:56da:a20c:6574/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
3: eth1:  mtu 1400 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:cf:91:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.168/24 brd 192.168.1.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::498:5906:87ca:c493/64 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::fdfd:fb1f:204f:e03f/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
4: docker0:  mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:ba:8b:31:0f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
5: dummy0:  mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 9a:ea:3f:8a:39:81 brd ff:ff:ff:ff:ff:ff
6: kube-ipvs0:  mtu 1500 qdisc noop state DOWN
    link/ether 82:b6:9d:b8:fe:16 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/32 brd 10.0.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.0.0.10/32 brd 10.0.0.10 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
7: flannel.1:  mtu 1350 qdisc noqueue state UNKNOWN
    link/ether 0e:b3:53:6b:71:39 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::cb3:53ff:fe6b:7139/64 scope link
       valid_lft forever preferred_lft forever
8: cni0:  mtu 1350 qdisc noqueue state UP qlen 1000
    link/ether de:9f:7b:70:3c:56 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::dc9f:7bff:fe70:3c56/64 scope link
       valid_lft forever preferred_lft forever
10: veth8006881b@if3:  mtu 1350 qdisc noqueue master cni0 state UP
    link/ether 22:fb:13:24:c5:f9 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::20fb:13ff:fe24:c5f9/64 scope link
       valid_lft forever preferred_lft forever


Cluster worker node network info:

# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.1:443 rr persistent 10800
  -> 192.168.1.168:6443           Masq    1      0          0
TCP  10.0.0.10:53 rr
  -> 10.244.0.23:53               Masq    1      0          0
UDP  10.0.0.10:53 rr
  -> 10.244.0.23:53               Masq    1      0          0

# iptables -t nat -n -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0

Chain DOCKER (2 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0

Chain KUBE-FIRE-WALL (0 references)
target     prot opt source               destination

Chain KUBE-MARK-DROP (0 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x8000

Chain KUBE-MARK-MASQ (0 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination

# iptables  -n -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-FIREWALL  all  --  0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER-ISOLATION  all  --  0.0.0.0/0            0.0.0.0/0
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-FIREWALL  all  --  0.0.0.0/0            0.0.0.0/0

Chain DOCKER (1 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000


# ip route show
default via 10.3.254.254 dev eth0 proto static metric 100
10.3.0.0/16 dev eth0 proto kernel scope link src 10.3.72.169 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.169 metric 100

# ip addr show

1: lo:  mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:57:e9:e3 brd ff:ff:ff:ff:ff:ff
    inet 10.3.72.169/16 brd 10.3.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1f30:56da:a20c:6574/64 scope link
       valid_lft forever preferred_lft forever
3: eth1:  mtu 1400 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:3b:ae:e4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.169/24 brd 192.168.1.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::fdfd:fb1f:204f:e03f/64 scope link
       valid_lft forever preferred_lft forever
4: docker0:  mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:5c:fc:a4:c1 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
5: dummy0:  mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 2e:1e:d6:90:7b:90 brd ff:ff:ff:ff:ff:ff
6: kube-ipvs0:  mtu 1500 qdisc noop state DOWN
    link/ether e6:01:f7:76:2a:c2 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/32 brd 10.0.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.0.0.10/32 brd 10.0.0.10 scope global kube-ipvs0
       valid_lft forever preferred_lft forever

Environment:

  • Kubernetes version (use kubectl version):

The master node and worker node run the same k8s version:

# cluster/kubectl.sh version
Client Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0-alpha.0.255+d1cb55c8a7928e-dirty", GitCommit:"d1cb55c8a7928e9dc733bf3d514cc8bc274b124b", GitTreeState:"dirty", BuildDate:"2018-02-21T17:06:41Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0-alpha.0.255+d1cb55c8a7928e-dirty", GitCommit:"d1cb55c8a7928e9dc733bf3d514cc8bc274b124b", GitTreeState:"dirty", BuildDate:"2018-02-21T17:06:41Z", GoVersion:"go1.9.4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

  • Kernel (e.g. uname -a):
# uname -a
Linux centos-k8s 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Feb 21, 2018
@vincentmli
Author

/network

@vincentmli
Author

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 21, 2018
@vincentmli
Author

Here is the kube-flannel.yaml; I added --iface=eth1 because eth1 is the interface on the node network 192.168.1.x:

# cat kube-flannel.yaml
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.10.0-amd64
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.10.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --iface=eth1
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

@vincentmli
Author

Both my master and worker node /tmp/kube-proxy.log files contain similar entries:

W0221 16:00:32.507339    2384 server.go:589] Failed to retrieve node info: nodes "192.168.1.169" not found
W0221 16:00:32.507492    2384 proxier.go:283] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0221 16:00:32.507565    2384 proxier.go:288] clusterCIDR not specified, unable to distinguish between internal and external traffic

Again, only the master node works and is able to reach 10.0.0.1:443; the worker node can't reach 10.0.0.1:443.
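Based on that log, two quick checks (a sketch of what I looked at, not a fix): whether the node object exists under the name kube-proxy is using, and what arguments/config kube-proxy was actually started with:

# cluster/kubectl.sh get node 192.168.1.169
# ps -ef | grep 'hyperkube proxy'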

@vincentmli
Author

/sig kube-proxy

@vincentmli
Author

/area kube-proxy

@vincentmli
Author

/area ipvs

@vincentmli
Author

I think I may have run into the issue mentioned in
#44063 (comment)
There is no SNAT in ipvs mode, judging from the iptables tracing output:

The source is 10.0.0.1 and the dst is 10.0.0.1 initially; it is then DNATed to 192.168.1.168, but there is no SNAT to 192.168.1.169.

If I use kube-proxy in iptables mode, it works, since there is no 10.0.0.1 on a dummy interface and I can set up a route like the one below on the worker node:

ip route add 10.0.0.1/32 dev eth1 proto kernel scope link src 192.168.1.169 metric 100
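After adding it, ip route get confirms which interface and source address the service VIP resolves to from the worker node:

# ip route get 10.0.0.1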

iptables TRACE log

Feb 28 10:52:20 centos-k8s-node1 kernel: TRACE: raw:OUTPUT:policy:2 IN= OUT=lo SRC=10.0.0.1 DST=10.0.0.1 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=46599 DF PROTO=TCP SPT=40300 DPT=443 SEQ=43416768 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A000E05800000000001030307) UID=0 GID=0
Feb 28 10:52:20 centos-k8s-node1 kernel: TRACE: raw:OUTPUT:policy:2 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=46599 DF PROTO=TCP SPT=40300 DPT=6443 SEQ=43416768 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A000E05800000000001030307) UID=0 GID=0
Feb 28 10:52:20 centos-k8s-node1 kernel: TRACE: filter:OUTPUT:rule:1 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=46599 DF PROTO=TCP SPT=40300 DPT=6443 SEQ=43416768 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A000E05800000000001030307) UID=0 GID=0
Feb 28 10:52:20 centos-k8s-node1 kernel: TRACE: filter:KUBE-FIREWALL:return:2 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=46599 DF PROTO=TCP SPT=40300 DPT=6443 SEQ=43416768 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A000E05800000000001030307) UID=0 GID=0
Feb 28 10:52:20 centos-k8s-node1 kernel: TRACE: filter:OUTPUT:policy:2 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=46599 DF PROTO=TCP SPT=40300 DPT=6443 SEQ=43416768 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A000E05800000000001030307) UID=0 GID=0

@vincentmli
Author

I am able to work around the IPVS SNAT issue by using xt_ipvs: http://archive.linuxvirtualserver.org/html/lvs-devel/2010-07/msg00033.html

iptables -t nat -A POSTROUTING -m ipvs --vaddr 10.0.0.1/32 --vport 443 -j SNAT --to-source 192.168.1.169
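A small sketch of the same workaround made idempotent, assuming the xt_ipvs match extension is available for the running kernel (iptables -C checks for the rule so it is only added once):

# modprobe xt_ipvs
# iptables -t nat -C POSTROUTING -m ipvs --vaddr 10.0.0.1/32 --vport 443 -j SNAT --to-source 192.168.1.169 2>/dev/null \
    || iptables -t nat -A POSTROUTING -m ipvs --vaddr 10.0.0.1/32 --vport 443 -j SNAT --to-source 192.168.1.169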

@vincentmli
Author

I removed the xt_ipvs workaround that added a specific SNAT for the cluster service/ipvs rule 10.0.0.1:443 and tried to understand why the iptables rules below, set up by kube-proxy in ipvs mode, are unable to do the SNAT:

Chain KUBE-MARK-MASQ (0 references)
target     prot opt source               destination         
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination         
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src

Here is the iptables TRACE log:

Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: raw:OUTPUT:policy:2 IN= OUT=lo SRC=10.0.0.1 DST=10.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: mangle:OUTPUT:policy:1 IN= OUT=lo SRC=10.0.0.1 DST=10.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: nat:OUTPUT:rule:1 IN= OUT=lo SRC=10.0.0.1 DST=10.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: nat:KUBE-SERVICES:return:1 IN= OUT=lo SRC=10.0.0.1 DST=10.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: nat:OUTPUT:rule:2 IN= OUT=lo SRC=10.0.0.1 DST=10.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: nat:DOCKER:return:2 IN= OUT=lo SRC=10.0.0.1 DST=10.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: nat:OUTPUT:policy:3 IN= OUT=lo SRC=10.0.0.1 DST=10.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: raw:OUTPUT:policy:2 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=6443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: mangle:OUTPUT:policy:1 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=6443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: filter:OUTPUT:rule:1 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=6443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: filter:KUBE-FIREWALL:return:2 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=6443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 
Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: filter:OUTPUT:policy:2 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=6443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 

Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: mangle:POSTROUTING:policy:1 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=6443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 

Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: nat:POSTROUTING:rule:1 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=6443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 

Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: nat:KUBE-POSTROUTING:return:3 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=6443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 

Mar  1 16:13:21 centos-k8s-node1 kernel: TRACE: nat:POSTROUTING:policy:7 IN= OUT=eth1 SRC=10.0.0.1 DST=192.168.1.168 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3347 DF PROTO=TCP SPT=44252 DPT=6443 SEQ=2290450769 ACK=0 WINDOW=43690 RES=0x00 SYN URGP=0 OPT (0204FFD70402080A05DC8C420000000001030307) UID=0 GID=0 

It looks to me like the SYN packet to destination 10.0.0.1:443 did go through the nat table KUBE-POSTROUTING and POSTROUTING chains but did not hit the rule:

MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000

I am wondering where the 0x4000 mark is set by the kube-proxy ipvs/iptables rules, and how the SYN packet created by the kube-flannel pod on the worker node could get the 0x4000 mark so that the source IP is SNATed to the worker node IP 192.168.1.169.
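For anyone debugging the same thing, one way to see where (or whether) the 0x4000 mark could ever be applied is to list the rules that reference KUBE-MARK-MASQ and watch the packet counters on KUBE-POSTROUTING while reproducing the timeout; if the counters stay at zero, the SYN never carried the mark:

# iptables -t nat -S | grep KUBE-MARK-MASQ
# iptables -t nat -L KUBE-POSTROUTING -n -v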

@vincentmli
Author

I found another workaround, which is to add the KUBE-MARK-MASQ chain to the nat table POSTROUTING chain:

iptables -t nat -I POSTROUTING -j KUBE-MARK-MASQ

The nat POSTROUTING chain ended up as:

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0
RETURN     all  --  10.244.0.0/16        10.244.0.0/16
MASQUERADE  all  --  10.244.0.0/16       !224.0.0.0/4
RETURN     all  -- !10.244.0.0/16        10.244.1.0/24
MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16

KUBE-MARK-MASQ

Chain KUBE-MARK-MASQ (1 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

The reason I added KUBE-MARK-MASQ to the nat POSTROUTING chain is that I found KUBE-MARK-MASQ is not referenced by any chain, so no packet ever hits KUBE-MARK-MASQ and there is no SNAT. Am I missing anything? Is this the right solution?
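For anyone trying the same workaround, verifying and undoing it is plain iptables usage: the counters on the inserted rule (rule 1 in POSTROUTING after -I) show whether it is matching, and -D removes it again:

# iptables -t nat -L POSTROUTING -n -v --line-numbers
# iptables -t nat -D POSTROUTING -j KUBE-MARK-MASQ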

@vincentmli
Author

I have been playing with the kube-proxy flags --masquerade-all and --cluster-cidr to see if KUBE-MARK-MASQ would be added to some of the chains; it has no effect at all.

        fs.BoolVar(&o.config.IPTables.MasqueradeAll, "masquerade-all", o.config.IPTables.MasqueradeAll, "If using the pure iptables proxy, SNAT all traffic sent via Service cluster IPs (this not commonly needed)")
        fs.StringVar(&o.config.ClusterCIDR, "cluster-cidr", o.config.ClusterCIDR, "The CIDR range of pods in the cluster. When configured, traffic sent to a Service cluster IP from outside this range will be masqueraded and traffic sent from pods to an external LoadBalancer IP will be directed to the respective cluster IP instead")

The kube-proxy cluster-cidr is a bit confusing: why is it the CIDR range of pods in the cluster, and not the service cluster IP range that usually defaults to 10.0.0.0/24?

@vincentmli
Author

Adding masquerade-all and cluster-cidr to the kube-proxy config yaml file, as below in hack/cluster-worker.sh, resolved the problem. It appears that specifying the flag arguments --masquerade-all and --cluster-cidr for kube-proxy does not pass the values to the ipvs proxier, but that is a different problem. Anyway, I think masquerade-all and cluster-cidr are what I needed; I am not sure if this is the best approach.

diff -u hack/local-up-cluster.sh hack/cluster-worker.sh 
--- hack/local-up-cluster.sh	2018-02-21 09:06:38.561000000 -0800
+++ hack/cluster-worker.sh	2018-03-04 13:46:36.726000000 -0800
@@ -15,6 +15,8 @@
 # limitations under the License.
 
 KUBE_ROOT=$(dirname "${BASH_SOURCE}")/..
+K8S_MASTER="192.168.1.168"
+K8S_WORKER="192.168.1.169"
 
 # This command builds and runs a local kubernetes cluster.
 # You may need to run this as root to allow kubelet to open docker's socket,
@@ -22,7 +24,7 @@
 DOCKER_OPTS=${DOCKER_OPTS:-""}
 DOCKER=(docker ${DOCKER_OPTS})
 DOCKERIZE_KUBELET=${DOCKERIZE_KUBELET:-""}
-ALLOW_PRIVILEGED=${ALLOW_PRIVILEGED:-""}
+ALLOW_PRIVILEGED=${ALLOW_PRIVILEGED:-"true"}
 DENY_SECURITY_CONTEXT_ADMISSION=${DENY_SECURITY_CONTEXT_ADMISSION:-""}
 PSP_ADMISSION=${PSP_ADMISSION:-""}
 NODE_ADMISSION=${NODE_ADMISSION:-""}
@@ -34,7 +36,7 @@
 # many dev environments run with swap on, so we don't fail in this env
 FAIL_SWAP_ON=${FAIL_SWAP_ON:-"false"}
 # Name of the network plugin, eg: "kubenet"
-NET_PLUGIN=${NET_PLUGIN:-""}
+NET_PLUGIN=${NET_PLUGIN:-"cni"}
 # Place the config files and binaries required by NET_PLUGIN in these directory,
 # eg: "/etc/cni/net.d" for config files, and "/opt/cni/bin" for binaries.
 CNI_CONF_DIR=${CNI_CONF_DIR:-""}
@@ -64,7 +66,7 @@
 KUBECTL=${KUBECTL:-cluster/kubectl.sh}
 WAIT_FOR_URL_API_SERVER=${WAIT_FOR_URL_API_SERVER:-60}
 ENABLE_DAEMON=${ENABLE_DAEMON:-false}
-HOSTNAME_OVERRIDE=${HOSTNAME_OVERRIDE:-"127.0.0.1"}
+HOSTNAME_OVERRIDE=${HOSTNAME_OVERRIDE:-"${K8S_WORKER}"}
 EXTERNAL_CLOUD_PROVIDER=${EXTERNAL_CLOUD_PROVIDER:-false}
 EXTERNAL_CLOUD_PROVIDER_BINARY=${EXTERNAL_CLOUD_PROVIDER_BINARY:-""}
 CLOUD_PROVIDER=${CLOUD_PROVIDER:-""}
@@ -88,7 +90,7 @@
 AUTH_ARGS=${AUTH_ARGS:-""}
 
 # Install a default storage class (enabled by default)
-DEFAULT_STORAGE_CLASS=${KUBE_DEFAULT_STORAGE_CLASS:-true}
+DEFAULT_STORAGE_CLASS=${KUBE_DEFAULT_STORAGE_CLASS:-false}
 
 # start the cache mutation detector by default so that cache mutators will be found
 KUBE_CACHE_MUTATION_DETECTOR="${KUBE_CACHE_MUTATION_DETECTOR:-true}"
@@ -227,13 +229,13 @@
 API_SECURE_PORT=${API_SECURE_PORT:-6443}
 
 # WARNING: For DNS to work on most setups you should export API_HOST as the docker0 ip address,
-API_HOST=${API_HOST:-localhost}
-API_HOST_IP=${API_HOST_IP:-"127.0.0.1"}
+API_HOST=${API_HOST:-"${K8S_MASTER}"}
+API_HOST_IP=${API_HOST_IP:-"${K8S_MASTER}"}
 ADVERTISE_ADDRESS=${ADVERTISE_ADDRESS:-""}
 API_BIND_ADDR=${API_BIND_ADDR:-"0.0.0.0"}
 EXTERNAL_HOSTNAME=${EXTERNAL_HOSTNAME:-localhost}
 
-KUBELET_HOST=${KUBELET_HOST:-"127.0.0.1"}
+KUBELET_HOST=${KUBELET_HOST:-"${K8S_WORKER}"}
 # By default only allow CORS for requests on localhost
 API_CORS_ALLOWED_ORIGINS=${API_CORS_ALLOWED_ORIGINS:-/127.0.0.1(:[0-9]+)?$,/localhost(:[0-9]+)?$}
 KUBELET_PORT=${KUBELET_PORT:-10250}
@@ -741,7 +743,7 @@
         --hostname-override="${HOSTNAME_OVERRIDE}" \
         ${cloud_config_arg} \
         --address="${KUBELET_HOST}" \
-        --kubeconfig "$CERT_DIR"/kubelet.kubeconfig \
+        --kubeconfig "$CERT_DIR"/kubelet-"${K8S_WORKER}".kubeconfig \
         --feature-gates="${FEATURE_GATES}" \
         --cpu-cfs-quota=${CPU_CFS_QUOTA} \
         --enable-controller-attach-detach="${ENABLE_CONTROLLER_ATTACH_DETACH}" \
@@ -815,10 +817,13 @@
 apiVersion: kubeproxy.config.k8s.io/v1alpha1
 kind: KubeProxyConfiguration
 clientConnection:
-  kubeconfig: ${CERT_DIR}/kube-proxy.kubeconfig
+  kubeconfig: ${CERT_DIR}/kube-proxy-${K8S_WORKER}.kubeconfig
 hostnameOverride: ${HOSTNAME_OVERRIDE}
 featureGates: ${FEATURE_GATES}
 mode: ${KUBE_PROXY_MODE}
+ClusterCIDR: 10.0.0.0/24
+iptables:
+  masqueradeAll: true 
 EOF
 
     sudo "${GO_OUT}/hyperkube" proxy \
@@ -827,13 +832,6 @@
       --v=${LOG_LEVEL} 2>&1 &
     PROXY_PID=$!
 
-    SCHEDULER_LOG=${LOG_DIR}/kube-scheduler.log
-    ${CONTROLPLANE_SUDO} "${GO_OUT}/hyperkube" scheduler \
-      --v=${LOG_LEVEL} \
-      --kubeconfig "$CERT_DIR"/scheduler.kubeconfig \
-      --feature-gates="${FEATURE_GATES}" \
-      --master="https://${API_HOST}:${API_SECURE_PORT}" >"${SCHEDULER_LOG}" 2>&1 &
-    SCHEDULER_PID=$!
 }
 
 function start_kubedns {
@@ -1002,6 +1000,7 @@
         ;;
       Linux)
         start_kubelet
+        start_kubeproxy
         ;;
       *)
         warning "Unsupported host OS.  Must be Linux or Mac OS X, kubelet aborted."

@vincentmli
Author

For reference, the nat table rules on the worker node ended up as below; KUBE-MARK-MASQ is now referenced by KUBE-SERVICES.

iptables -t nat -n -L

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
RETURN     all  --  10.244.0.0/16        10.244.0.0/16
MASQUERADE  all  --  10.244.0.0/16       !224.0.0.0/4
RETURN     all  -- !10.244.0.0/16        10.244.1.0/24
MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16

Chain DOCKER (0 references)
target     prot opt source               destination

Chain KUBE-FIRE-WALL (0 references)
target     prot opt source               destination

Chain KUBE-MARK-DROP (0 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x8000

Chain KUBE-MARK-MASQ (1 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-CLUSTER-IP dst,dst

@alijawadfahs

I have the same problem. I'm running a cluster of 4 RPis, and I have used flannel with iptables for a while with no problems worth mentioning (maybe one: I needed to edit the FORWARD chain in iptables).

I started facing "Failed to create SubnetManager" when I switched to IPVS. From my humble point of view, IPVS is not compatible with flannel for the moment, or vice versa.

I'll keep you updated if I figure this out.

@mmack

mmack commented Apr 26, 2018

Same problem here... watching...

@rtreffer

On a node in our cluster I've noticed that 6: kube-ipvs0: mtu 1500 qdisc noop state DOWN is an issue.
I'd be very interested to hear if ip link set dev kube-ipvs0 up helps.
I've opened #63199 to bring devices up by default.

@m1093782566
Contributor

It should be fixed by #63319.

I am going to close this issue. Please feel free to re-open if you still have trouble when you check out HEAD.

@m1093782566
Contributor

/close

@smiklosovic

smiklosovic commented May 6, 2019

This is still an issue in Kubernetes 1.14. I am on RHEL and I get the exact same results. It seems like

iptables -t nat -I POSTROUTING -j KUBE-MARK-MASQ

on the worker node after it has joined helped, and flannel got to the Running state, but honestly I admit I do not know what I am doing at this point.

@n-sviridenko

n-sviridenko commented May 30, 2019

Same issue (v1.10.3).

@dptiwari

Has anyone got an actual solution for this? I am still facing this issue on RHEL 7 with Kubernetes v1.22.
