Kubelet not starting with external cloud provider OpenStack on version 1.26.5 #10250
Description
Environment:
- Cloud provider or hardware configuration: external cloud provider OpenStack
- OS (`printf "$(uname -srm)\n$(cat /etc/os-release)\n"`):
Linux 5.4.0-149-generic x86_64
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
- Version of Ansible (`ansible --version`), from the official Kubespray Docker container:
ansible [core 2.12.5]
config file = /kubespray/ansible.cfg
configured module search path = ['/kubespray/library']
ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
jinja version = 3.1.2
libyaml = True
- Version of Python (`python --version`):
  - on localhost (official Kubespray 2.22.1 Docker container): Python 3.10.6
  - on the virtual machines: Python 3.8.10
- Kubespray version (commit) (`git rev-parse --short HEAD`): official Kubespray 2.22.1 Docker container; I cannot check the commit because there is no git repository inside the container (no .git subdirectory)
- Network plugin used: Calico
- Full inventory with variables (`ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"`):
The only variable that matters here is:
kube_feature_gates:
- "CSIMigration=true"
- "CSIMigrationOpenStack=true"
- "ExpandCSIVolumes=true"
- Command used to invoke ansible: `ansible-playbook --become --inventory environments/development upgrade-cluster.yml`
- Output of ansible run (I think just the end is enough):
TASK [kubernetes/control-plane : kubeadm | Check api is up] **********************************************************************************************************************************
ok: [master01.development]
Friday 23 June 2023 22:57:31 +0000 (0:00:01.695) 0:41:26.997 ***********
FAILED - RETRYING: [master01.development]: kubeadm | Upgrade first master (3 retries left).
FAILED - RETRYING: [master01.development]: kubeadm | Upgrade first master (2 retries left).
FAILED - RETRYING: [master01.development]: kubeadm | Upgrade first master (1 retries left).
TASK [kubernetes/control-plane : kubeadm | Upgrade first master] *****************************************************************************************************************************
fatal: [master01.development]: FAILED! => {
"attempts": 3,
"changed": true,
"cmd": [
"timeout",
"-k",
"600s",
"600s",
"/usr/local/bin/kubeadm",
"upgrade",
"apply",
"-y",
"v1.26.5",
"--certificate-renewal=True",
"--config=/etc/kubernetes/kubeadm-config.yaml",
"--ignore-preflight-errors=all",
"--allow-experimental-upgrades",
"--etcd-upgrade=false",
"--force"
],
"delta": "0:05:10.488402",
"end": "2023-06-23 23:18:30.658145",
"failed_when_result": true,
"rc": 1,
"start": "2023-06-23 23:13:20.169743"
}
STDOUT:
[upgrade/config] Making sure the configuration is correct:
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.26.5"
[upgrade/versions] Cluster version: v1.25.6
[upgrade/versions] kubeadm version: v1.26.5
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.26.5" (timeout: 5m0s)...
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests3726338917"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2023-06-23-23-13-29/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
STDERR:
W0623 23:13:20.233727 126677 common.go:93] WARNING: Usage of the --config flag with kubeadm config types for reconfiguring the cluster during upgrade is not recommended!
W0623 23:13:20.239246 126677 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
[WARNING ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [master01.development]
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component kube-apiserver on Node master01.development did not change after 5m0s: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
MSG:
non-zero return code
Anything else do we need to know:
After kubeadm failed to upgrade the cluster on my first control-plane node, I started looking for the cause. I found that the kubelet systemd service was not working and kept restarting. The systemd logs were not helpful at that point, so I decided to stop the service, source the kubelet env, and start kubelet by hand.
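Roughly what I did to run it manually (a sketch; the env file path is how it looks on my Kubespray-provisioned nodes, and `$KUBELET_ARGS` is just a stand-in for the flag variables expanded in the unit's ExecStart line):

```sh
systemctl stop kubelet
# load the environment file referenced by the kubelet systemd unit
set -a; . /etc/kubernetes/kubelet.env; set +a
# run the same binary the unit runs; $KUBELET_ARGS stands in for the unit's flag variables
/usr/local/bin/kubelet $KUBELET_ARGS
```

The output was as follows: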
Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
W0624 06:00:10.303241 276356 feature_gate.go:241] Setting GA feature gate CSIMigration=true. It will be removed in a future release.
E0624 06:00:10.303479 276356 run.go:74] "command failed" err="failed to set feature gates from initial flags-based config: unrecognized feature gate: CSIMigrationOpenStack"
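The unrecognized gate comes from the featureGates block that Kubespray renders into /etc/kubernetes/kubelet-config.yml from kube_feature_gates. Reconstructed from my settings, the relevant part looked roughly like this (rest of the file omitted):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CSIMigration: true
  CSIMigrationOpenStack: true   # no longer a valid feature gate in Kubernetes 1.26
  ExpandCSIVolumes: true
```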
Next I removed the CSIMigrationOpenStack feature gate from the /etc/kubernetes/kubelet-config.yml file. After that I started the kubelet service and it finally worked. Having fixed the node, I ran the upgrade-cluster.yml playbook again with the kube_feature_gates variable corrected (see below), and now everything works as expected.
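The corrected value is simply the original list without the gate that no longer exists:

```yaml
kube_feature_gates:
  - "CSIMigration=true"
  - "ExpandCSIVolumes=true"
```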
So I think this is only a documentation-level bug: the project's OpenStack documentation still tells users to set that feature gate, even though CSIMigrationOpenStack no longer exists in Kubernetes 1.26 and the kubelet refuses to start when it is set.
If I am not wrong, this should be a very easy PR to merge. If you wish, I can make that PR.