
Kubelet not starting with external cloud provider OpenStack on version 1.26.5 #10250

Closed
@czqrny

Description

Environment:

  • Cloud provider or hardware configuration: external cloud provider OpenStack

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 5.4.0-149-generic x86_64
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Version of Ansible (ansible --version): (official Kubespray Docker container)
ansible [core 2.12.5]
  config file = /kubespray/ansible.cfg
  configured module search path = ['/kubespray/library']
  ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
  jinja version = 3.1.2
  libyaml = True
  • Version of Python (python --version):
    • on localhost (official Kubespray (2.22.1) Docker container) - Python 3.10.6
    • on virtual machines - Python 3.8.10

Kubespray version (commit) (git rev-parse --short HEAD): official Kubespray (2.22.1) Docker container; cannot check the commit, as there is no git repo inside the container (no .git subdirectory)

Network plugin used: Calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

The only variable that matters here is:

kube_feature_gates:
  - "CSIMigration=true"
  - "CSIMigrationOpenStack=true"
  - "ExpandCSIVolumes=true"

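For context, Kubespray renders kube_feature_gates into the kubelet configuration file on the nodes, so it ends up roughly as follows (a sketch only, not the exact template output; field names follow the upstream KubeletConfiguration API, and the path is the file mentioned later in this report):

# approximate featureGates section of /etc/kubernetes/kubelet-config.yml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CSIMigration: true
  CSIMigrationOpenStack: true
  ExpandCSIVolumes: true
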
Command used to invoke ansible: ansible-playbook --become --inventory environments/development upgrade-cluster.yml

Output of ansible run:

I think just the end is enough:

TASK [kubernetes/control-plane : kubeadm | Check api is up] **********************************************************************************************************************************
ok: [master01.development]
Friday 23 June 2023  22:57:31 +0000 (0:00:01.695)       0:41:26.997 *********** 
FAILED - RETRYING: [master01.development]: kubeadm | Upgrade first master (3 retries left).
FAILED - RETRYING: [master01.development]: kubeadm | Upgrade first master (2 retries left).
FAILED - RETRYING: [master01.development]: kubeadm | Upgrade first master (1 retries left).

TASK [kubernetes/control-plane : kubeadm | Upgrade first master] *****************************************************************************************************************************
fatal: [master01.development]: FAILED! => {
    "attempts": 3,
    "changed": true,
    "cmd": [
        "timeout",
        "-k",
        "600s",
        "600s",
        "/usr/local/bin/kubeadm",
        "upgrade",
        "apply",
        "-y",
        "v1.26.5",
        "--certificate-renewal=True",
        "--config=/etc/kubernetes/kubeadm-config.yaml",
        "--ignore-preflight-errors=all",
        "--allow-experimental-upgrades",
        "--etcd-upgrade=false",
        "--force"
    ],
    "delta": "0:05:10.488402",
    "end": "2023-06-23 23:18:30.658145",
    "failed_when_result": true,
    "rc": 1,
    "start": "2023-06-23 23:13:20.169743"
}

STDOUT:

[upgrade/config] Making sure the configuration is correct:
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.26.5"
[upgrade/versions] Cluster version: v1.25.6
[upgrade/versions] kubeadm version: v1.26.5
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.26.5" (timeout: 5m0s)...
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests3726338917"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2023-06-23-23-13-29/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)


STDERR:

W0623 23:13:20.233727  126677 common.go:93] WARNING: Usage of the --config flag with kubeadm config types for reconfiguring the cluster during upgrade is not recommended!
W0623 23:13:20.239246  126677 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
        [WARNING ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [master01.development]
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component kube-apiserver on Node master01.development did not change after 5m0s: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher


MSG:

non-zero return code

Anything else do we need to know:

So after kubeadm failed to upgrade the cluster on my first control-plane node, I started looking for the cause. I found that the kubelet systemd service was not working and kept restarting. The systemd logs were not helpful, so I stopped the service, sourced the kubelet environment, and started kubelet by hand (a rough sketch of those commands follows the output below). The output was as follows:

Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Flag --cloud-provider has been deprecated, will be removed in 1.25 or later, in favor of removing cloud provider code from Kubelet.
W0624 06:00:10.303241  276356 feature_gate.go:241] Setting GA feature gate CSIMigration=true. It will be removed in a future release.
E0624 06:00:10.303479  276356 run.go:74] "command failed" err="failed to set feature gates from initial flags-based config: unrecognized feature gate: CSIMigrationOpenStack"

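Roughly, the manual check above was the following (a sketch only; the unit name, env file path, binary path, and variable name are assumptions for a typical Kubespray node, so check systemctl cat kubelet for the real values):

# stop the managed service, load the same environment, run kubelet in the foreground
systemctl stop kubelet
set -a; . /etc/kubernetes/kubelet.env; set +a   # assumed env file path
/usr/local/bin/kubelet $KUBELET_ARGS            # assumed variable name; prints the feature gate error directly
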
Next I removed the CSIMigrationOpenStack feature gate from the /etc/kubernetes/kubelet-config.yml file. After that I started the kubelet service and it finally worked. With the node fixed, I ran the upgrade-cluster.yml playbook again with the kube_feature_gates variable corrected, and now it works as expected.
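
For reference, a minimal sketch of the corrected variable, assuming only CSIMigrationOpenStack needs to be dropped for kubelet 1.26 (which is what the "unrecognized feature gate" error above points at; whether to keep the other two gates is a judgment call):

kube_feature_gates:
  - "CSIMigration=true"
  - "ExpandCSIVolumes=true"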

So I think this is only a documentation-level bug, because this project's OpenStack documentation tells users to set that feature gate.

If I am not wrong, this should be a very easy PR to merge.

If you wish, I can make that PR.

Labels: kind/bug, lifecycle/stale
