
Kubelet not starting with external cloud provider OpenStack on version 1.26.5 #10250

Closed
@czqrny

Description

Environment:

  • Cloud provider or hardware configuration: external cloud provider OpenStack

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 5.4.0-149-generic x86_64
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Version of Ansible (ansible --version): (official Kubespray Docker container)
ansible [core 2.12.5]
  config file = /kubespray/ansible.cfg
  configured module search path = ['/kubespray/library']
  ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
  jinja version = 3.1.2
  libyaml = True
  • Version of Python (python --version):
    • on localhost (official Kubespray (2.22.1) Docker container) - Python 3.10.6
    • on virtual machines - Python 3.8.10

Kubespray version (commit) (git rev-parse --short HEAD): official Kubespray (2.22.1) Docker container; cannot check the commit, as there is no git repo inside the container (no .git subdirectory)

Network plugin used: Calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

The only variable that matters here is:

kube_feature_gates:
  - "CSIMigration=true"
  - "CSIMigrationOpenStack=true"
  - "ExpandCSIVolumes=true"

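For context, Kubespray renders kube_feature_gates into the kubelet configuration file on the nodes, so it ends up roughly as follows (a sketch only, not the exact template output; field names follow the upstream KubeletConfiguration API, and the path is the file mentioned later in this report):

# approximate featureGates section of /etc/kubernetes/kubelet-config.yml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CSIMigration: true
  CSIMigrationOpenStack: true
  ExpandCSIVolumes: true
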
Command used to invoke ansible: ansible-playbook --become --inventory environments/development upgrade-cluster.yml

Output of ansible run:

I think just the end is enough:

TASK [kubernetes/control-plane : kubeadm | Check api is up] **********************************************************************************************************************************
ok: [master01.development]
Friday 23 June 2023  22:57:31 +0000 (0:00:01.695)       0:41:26.997 *********** 
FAILED - RETRYING: [master01.development]: kubeadm | Upgrade first master (3 retries left).
FAILED - RETRYING: [master01.development]: kubeadm | Upgrade first master (2 retries left).
FAILED - RETRYING: [master01.development]: kubeadm | Upgrade first master (1 retries left).

TASK [kubernetes/control-plane : kubeadm | Upgrade first master] *****************************************************************************************************************************
fatal: [master01.development]: FAILED! => {
    "attempts": 3,
    "changed": true,
    "cmd": [
        "timeout",
        "-k",
        "600s",
        "600s",
        "/usr/local/bin/kubeadm",
        "upgrade",
        "apply",
        "-y",
        "v1.26.5",
        "--certificate-renewal=True",
        "--config=/etc/kubernetes/kubeadm-config.yaml",
        "--ignore-preflight-errors=all",
        "--allow-experimental-upgrades",
        "--etcd-upgrade=false",
        "--force"
    ],
    "delta": "0:05:10.488402",
    "end": "2023-06-23 23:18:30.658145",
    "failed_when_result": true,
    "rc": 1,
    "start": "2023-06-23 23:13:20.169743"
}

STDOUT:

[upgrade/config] Making sure the configuration is correct:
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.26.5"
[upgrade/versions] Cluster version: v1.25.6
[upgrade/versions] kubeadm version: v1.26.5
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.26.5" (timeout: 5m0s)...
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests3726338917"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2023-06-23-23-13-29/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)


STDERR:

W0623 23:13:20.233727  126677 common.go:93] WARNING: Usage of the --config flag with kubeadm config types for reconfiguring the cluster during upgrade is not recommended!
W0623 23:13:20.239246  126677 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
        [WARNING ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [master01.development]
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component kube-apiserver on Node master01.development did not change after 5m0s: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher


MSG:

non-zero return code

Anything else do we need to know:

So after kubeadm failed to upgrade the cluster on my first control-plane node, I started looking for the cause. I found that the kubelet systemd service was not working and kept restarting. The systemd logs were not helpful, so I stopped the service, sourced the kubelet environment, and started kubelet by hand (a rough sketch of those commands follows the output below). The output was as follows:

Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
W0624 06:00:10.303241  276356 feature_gate.go:241] Setting GA feature gate CSIMigration=true. It will be removed in a future release.
E0624 06:00:10.303479  276356 run.go:74] "command failed" err="failed to set feature gates from initial flags-based config: unrecognized feature gate: CSIMigrationOpenStack"

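Roughly, the manual check above was the following (a sketch only; the unit name, env file path, binary path, and variable name are assumptions for a typical Kubespray node, so check systemctl cat kubelet for the real values):

# stop the managed service, load the same environment, run kubelet in the foreground
systemctl stop kubelet
set -a; . /etc/kubernetes/kubelet.env; set +a   # assumed env file path
/usr/local/bin/kubelet $KUBELET_ARGS            # assumed variable name; prints the feature gate error directly
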
Next I removed the CSIMigrationOpenStack feature gate from the /etc/kubernetes/kubelet-config.yml file. After that I started the kubelet service and it finally worked. With the node fixed, I ran the upgrade-cluster.yml playbook again with the kube_feature_gates variable corrected, and now it works as expected.
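
For reference, a minimal sketch of the corrected variable, assuming only CSIMigrationOpenStack needs to be dropped for kubelet 1.26 (which is what the "unrecognized feature gate" error above points at; whether to keep the other two gates is a judgment call):

kube_feature_gates:
  - "CSIMigration=true"
  - "ExpandCSIVolumes=true"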

So I think this is only a documentation-level bug, because this project's OpenStack documentation tells users to set that feature gate.

If I am not wrong, this should be a very easy PR to merge.

If you wish, I can make that PR.

Labels: kind/bug, lifecycle/stale
