network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady... #284
Comments
Hitting this as well with the same setup as above. |
The second one is the same as #285. The line in the script should be updated as |
Still seeing this now and again:
|
@max-rocket-internet Hey, are you still hitting the issue of With the newer CNI the support script should work since it addresses the change for that kubelet port. Have you seen For either of these, if you have, what CNI version are you using? Thanks! |
I'm on EKS 1.12.7 and CNI 1.3.3, and this error actually happened to most of my nodes for about 10 minutes, and resolved itself magically (seemingly). It was right after I re-deployed my ASGs through CloudFormation. @tiffanyfay Do you have any insight on why this could happen? |
Recording this here in case it helps others. I had a In my case, running |
I also had a similar issue today with EKS 1.12 and CNI plugin version 1.4.1. I didn't find any more debugging info, and the nodes were replaced by the cluster auto-scaler. Is there anything I should look out for if this happens again? |
I have been seeing this problem on pods that start immediately after the kube node comes up. If I delete the pods and have them "try again" they get their IPs and there's no warnings. Could this be solved through a node readiness change? |
Just had the same issue with 1.11.9. The CNI networking failed on one of two new nodes, so the failed node never joined the cluster. A reboot from the AWS Console got it working.
the first two warnings I see are
|
This is the workaround we use as well. Our environment where we ran into it: k8s: This has happened only a couple times over half a year (so on older versions too), so it's difficult for us to reproduce. |
EKS: Same problem on multiple EKS clusters. New VMs cannot join the cluster. Kubelet error on the nodes:
Events:
|
@temal- Thanks for the report. If the CNI binary and config file are missing, ipamd must have failed to start correctly on the new node. There are a few possible causes. Either the calls to the EC2 control plane got throttled and timed out, or there are no more ENIs or IPs available in the subnet. If you could get the log files from ipamd on a node that has this issue, it would be extremely helpful. (A comprehensive log collector script: amazon-eks-ami/log-collector-script) |
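For the "no more IPs available in the subnet" case, a rough sanity check is to compare the subnet size with the number of addresses already in use. The sketch below is only an illustration: the function names and the example CIDRs are made up, and the in-use count has to come from your own tooling. It relies on the fact that AWS reserves five addresses in every VPC subnet.

```python
import ipaddress

# AWS reserves five addresses in every VPC subnet
# (the first four and the last one).
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Upper bound on the addresses a subnet of this size can hand out."""
    network = ipaddress.ip_network(cidr, strict=True)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def remaining_ips(cidr: str, in_use: int) -> int:
    """Addresses left once `in_use` have been assigned (hypothetical input)."""
    return max(usable_ips(cidr) - in_use, 0)


if __name__ == "__main__":
    # Example CIDRs are placeholders, not taken from any cluster in this thread.
    print(usable_ips("10.0.1.0/24"))
    print(remaining_ips("10.0.1.0/28", in_use=11))
```

If the second number is at or near zero, ipamd has nothing left to assign to new pods on nodes in that subnet, which would match the symptom described above.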
@mogren Thanks for the quick reply. |
We are trying to create a new cluster using
|
EKS : Got the same issue while updating the EKS cluster. UPD: But really strange that with not all needed SG rules in ControlPlane SG with CNI 1.5.1 new worker nodes are become |
|
Had the same issue as @hardcorexcat. Rolling back to v1.5.1 resolved the issue for me too, and after a while upgrading back to v1.5.3 also works with new nodes. Clusters are created with terraform-aws-eks. |
Hi all, Here is my case: Environment EKS: v1.14 I just created a fresh EKS cluster with 2 worker nodes that join the cluster but with a Regards, |
Hi, I am hitting the same issue with eks cluster created by terraform. |
Just in case someone comes across this who is using a g4dn family instance on AWS. I was stuck on this for a while because the version of the CNI plugin I was using didn't support that family. After upgrading the CNI plugin it worked. https://docs.aws.amazon.com/eks/latest/userguide/cni-upgrades.html |
For the past few days I've been experimenting with EKS cluster creation. I'm using Terraform, specifically a Terraform module similar to the popular community module. Clusters created below version 1.14 have no problems with worker nodes reaching a "Ready" state. I'm using the latest CNI version: amazon-k8s-cni:v1.5.5 To sum up, I've decided to use a 1.13 cluster, in which I see no problems, with nodes using a 1.14 AMI, in the hope that this problem gets fixed in the near future. Epilogue: I'm using a full 1.13 version cluster because every once in a while a worker node would briefly become "NotReady" and then after a few seconds revert to Ready. Very strange behaviour. |
I have experienced the same as @Erokos. It works with 1.13; with 1.14, nodes fail to become ready. I don't think the issue is related to the AWS VPC CNI, because I tried replacing it with Calico and got the same problem: the CNI pod (aws-node or calico-node) cannot connect to 10.100.0.1, which is the kubernetes service ClusterIP. |
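To confirm this kind of failure from a node or from inside a pod, a plain TCP connect test against the service ClusterIP can help separate "network path is blocked" from "CNI is misconfigured". This is a minimal sketch, not part of any of the tools mentioned here: `can_connect` is a made-up helper, and port 443 is an assumption about where the API server service listens.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False


if __name__ == "__main__":
    # 10.100.0.1 is the ClusterIP quoted in the comment above;
    # port 443 is an assumption.
    print(can_connect("10.100.0.1", 443))
```

A False result only says the TCP handshake did not complete. It does not say whether security groups, kube-proxy rules, or routing are at fault.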
Coming from AWS support: Another possible cause is my old AWS provider. I use 1.60.0. Hope this helps [1] https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html |
Hi @ppaepam, is this still an issue? |
I am also having this issue. I tried adding a worker node via CloudFormation. |
I fixed this issue by upgrading Kubernetes components. I had the same problem in my AWS EKS cluster, so I ran the commands below using the eksctl CLI tool. eksctl utils update-kube-proxy --name Your_Cluster_Name --approve |
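If you script this, it may be worth building the command without `--approve` first, since as far as I know eksctl then only reports what it would change. The sketch below just assembles the argument list for the command quoted above. The helper name is made up, and it keeps the `--name` flag exactly as written in the comment, which your eksctl version may or may not still accept.

```python
import shlex
from typing import List


def update_kube_proxy_command(cluster_name: str, approve: bool = False) -> List[str]:
    """Build the argv for the eksctl command quoted in the comment above."""
    argv = ["eksctl", "utils", "update-kube-proxy", "--name", cluster_name]
    if approve:
        # Only add --approve once the planned change has been reviewed.
        argv.append("--approve")
    return argv


if __name__ == "__main__":
    # Your_Cluster_Name is the placeholder used in the original comment.
    print(shlex.join(update_kube_proxy_command("Your_Cluster_Name")))
    print(shlex.join(update_kube_proxy_command("Your_Cluster_Name", approve=True)))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting problems with unusual cluster names.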
This issue contains a mix of CNI versions and EKS cluster versions. I think @ppaepam and @SarasaGunawardhana are both right, and if anyone has similar issues please open a new issue to track that specific case. |
I experienced this issue after updating EKS to version 1.16, and @SarasaGunawardhana's commands did the trick for me. |
@mlachmish also struggling with it. Thx for the confirmation :) |
Leaving this here as this issue was the first result on Google. The problem for me was that my I had to manually edit this daemonset and remove the flag ( For reference, this is the daemonset:
- command:
- /bin/sh
- -c
- kube-proxy --oom-score-adj=-998 --master=https://MYCLUSTER.eks.amazonaws.com
--kubeconfig=/var/lib/kube-proxy/kubeconfig --proxy-mode=iptables --v=2
1>>/var/log/kube-proxy.log 2>&1
|
Thank you @SarasaGunawardhana, this has just worked for me. |
Just to verify, I've recently created a 1.15 cluster with an additional security group for the EKS control plane and have had no problems. Previously, my EKS module assigned the default VPC security group to the EKS cluster control plane, which worked for version 1.13. |
These logs still occur on some occasions. |
this just occurred to me when upgrading from EKS no matter what we did, End state: cluster upgraded to |
Hi all, I am still facing the issue where I am trying to update from The cluster version upgraded successfully, but for nodes I am seeing the same error Any help on how to work around this would be really great. Thanks. |
Hi @aksharj Can you please try the suggestion by @max-rocket-internet? Please see this - #284 (comment) Thank you! |
I tried this method, but kube-proxy still could not be started properly, so I referred to this tutorial: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
Then I installed the default pod security policy: install psp |
For others still running into this, it just happened to me--with a much simpler solution. If you're using eksctl and restricting access to the public API endpoint (either with |
Okay for anyone that might still face this issue. After running the steps that @SarasaGunawardhana mentioned
I noticed that the So after further troubleshooting I had to manually upgrade the CNI addon on my cluster. |
I am still getting this error on k8s 1.22 EKS. I have this error when switching my AMI version from ami-06bdb0d00ff41dc6d None of the fixes above have worked. Is there a configuration I am missing when moving to a newer AMI? |
Have you checked whether the AmazonEKS_CNI_Policy is associated with the EKS worker node role? |
EKS:
v1.11.5
CNI:
602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:1.3.0
AMI:
amazon-eks-node-1.11-v20181210 (ami-0a9006fb385703b54)
We are still seeing these CNI errors in pod events. e.g.
I tried to run
/opt/cni/bin/aws-cni-support.sh
on the node with pod aws-node-hhtrt
but I get this error: