[BUG] 1.6.2 -> 1.6.3 upgrade failed #9860

Open
GounGG opened this issue Nov 26, 2024 · 2 comments
Labels
kind/bug, require/backport, require/qa-review-coverage, stale

Comments

GounGG commented Nov 26, 2024

Describe the bug

After upgrading Longhorn from 1.6.2 to 1.6.3 with Helm, the longhorn-manager pods cannot start.

To Reproduce

Command

helm upgrade longhorn --install longhorn-charts/1.6.3 -f longhorn-charts/values/dev.yaml --namespace longhorn-system --create-namespace

Values (longhorn-charts/values/dev.yaml)

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/metrics"
  prometheus.io/port: "9500"

longhornManager:
  tolerations: 
    - key: "node-role.kubernetes.io/storage"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"

longhornDriver:
  tolerations: 
    - key: "node-role.kubernetes.io/storage"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"

defaultSettings:
  defaultDataPath: "/data/longhorn"
  createDefaultDiskLabeledNodes: "true"
  taintToleration: "node-role.kubernetes.io/storage=true:NoSchedule"
  
persistence:
  defaultClass: "false"

longhornUI:
  replicas: 1
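
The values above express the same toleration twice: once as structured `tolerations` entries and once as the `defaultSettings.taintToleration` string. As a quick sanity check that the two agree, here is a minimal sketch that expands the string form into the structured form. It assumes the `key=value:effect` layout with `;` separating multiple entries; `parse_taint_toleration` is a hypothetical helper written for this illustration and is not part of Longhorn or Helm.

```python
from typing import Dict, List


def parse_taint_toleration(setting: str) -> List[Dict[str, str]]:
    """Expand a 'key=value:effect; key2:effect2' string into toleration dicts.

    Assumption: entries are separated by ';' and the effect follows the last ':'.
    """
    tolerations = []
    for item in setting.split(";"):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            key_value, _, effect = item.rpartition(":")
        else:
            key_value, effect = item, ""
        key, has_value, value = key_value.partition("=")
        tolerations.append({
            "key": key,
            # A bare key (no '=') is treated as an "Exists" toleration here.
            "operator": "Equal" if has_value else "Exists",
            "value": value,
            "effect": effect,
        })
    return tolerations


if __name__ == "__main__":
    print(parse_taint_toleration("node-role.kubernetes.io/storage=true:NoSchedule"))
```

For the string in this report, the result has the same key, operator, value, and effect as the `longhornManager` and `longhornDriver` tolerations, so the two forms look consistent with each other.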

Environment

  • Longhorn version: 1.6.2
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: 1.21
    • Number of control plane nodes in the cluster: 3
    • Number of worker nodes in the cluster: 15
  • Node config
    • OS type and version: ubuntu 20.04
    • Kernel version: 5.4

Error log

W1126 07:12:04.950651       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2024-11-26T07:12:04Z" level=info msg="Starting longhorn conversion webhook server" func=webhook.StartWebhook file="webhook.go:24"
time="2024-11-26T07:12:04Z" level=info msg="Waiting for conversion webhook to become ready" func=webhook.StartWebhook file="webhook.go:43"
time="2024-11-26T07:12:04Z" level=warning msg="Failed to check endpoint https://localhost:9501/v1/healthz" func=webhook.isServiceAvailable file="webhook.go:78" error="Get \"https://localhost:9501/v1/healthz\": dial tcp 127.0.0.1:9501: connect: connection refused"
time="2024-11-26T07:12:04Z" level=info msg="Active TLS secret longhorn-system/longhorn-webhook-tls (ver=1436382578) (count 2): map[listener.cattle.io/cn-longhorn-admission-webhook.longhor-59584d:longhorn-admission-webhook.longhorn-system.svc listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=5DBAB4FDF556A7428F58DA8BC58EDB8B97CACA77]" func="memory.(*memory).Update" file="memory.go:42"
time="2024-11-26T07:12:04Z" level=info msg="Listening on :9501" func=server.ListenAndServe.func2 file="server.go:77"
time="2024-11-26T07:12:06Z" level=info msg="Started longhorn conversion webhook server on localhost" func=webhook.StartWebhook file="webhook.go:47"
time="2024-11-26T07:12:07Z" level=warning msg="Failed to check endpoint https://longhorn-conversion-webhook.longhorn-system.svc:9501/v1/healthz" func=webhook.isServiceAvailable file="webhook.go:78" error="Get \"https://longhorn-conversion-webhook.longhorn-system.svc:9501/v1/healthz\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
time="2024-11-26T07:12:09Z" level=info msg="conversion webhook service is now accessible" func=webhook.CheckWebhookServiceAvailability file="webhook.go:63"
W1126 07:12:09.962071       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I1126 07:12:09.963240       1 shared_informer.go:311] Waiting for caches to sync for longhorn datastore
I1126 07:12:10.963900       1 shared_informer.go:318] Caches are synced for longhorn datastore
time="2024-11-26T07:12:10Z" level=info msg="Starting longhorn admission webhook server" func=webhook.StartWebhook file="webhook.go:24"
time="2024-11-26T07:12:10Z" level=info msg="Waiting for admission webhook to become ready" func=webhook.StartWebhook file="webhook.go:43"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for nodes.longhorn.io (Node)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for settings.longhorn.io (Setting)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for recurringjobs.longhorn.io (RecurringJob)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for backingimages.longhorn.io (BackingImage)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for volumes.longhorn.io (Volume)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for orphans.longhorn.io (Orphan)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for snapshots.longhorn.io (Snapshot)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for supportbundles.longhorn.io (SupportBundle)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for systembackups.longhorn.io (SystemBackup)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=warning msg="Failed to check endpoint https://localhost:9502/v1/healthz" func=webhook.isServiceAvailable file="webhook.go:78" error="Get \"https://localhost:9502/v1/healthz\": dial tcp 127.0.0.1:9502: connect: connection refused"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for systemrestores.longhorn.io (SystemRestore)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for volumeattachments.longhorn.io (VolumeAttachment)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for engines.longhorn.io (Engine)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for replicas.longhorn.io (Replica)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add validation handler for instancemanagers.longhorn.io (InstanceManager)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for backups.longhorn.io (Backup)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for backingimages.longhorn.io (BackingImage)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for backingimagemanagers.longhorn.io (BackingImageManager)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for backingimagedatasources.longhorn.io (BackingImageDataSource)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for nodes.longhorn.io (Node)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for volumes.longhorn.io (Volume)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for engines.longhorn.io (Engine)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for recurringjobs.longhorn.io (RecurringJob)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for engineimages.longhorn.io (EngineImage)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for orphans.longhorn.io (Orphan)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for sharemanagers.longhorn.io (ShareManager)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for backupvolumes.longhorn.io (BackupVolume)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for snapshots.longhorn.io (Snapshot)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for replicas.longhorn.io (Replica)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for supportbundles.longhorn.io (SupportBundle)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for systembackups.longhorn.io (SystemBackup)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for volumeattachments.longhorn.io (VolumeAttachment)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for instancemanagers.longhorn.io (InstanceManager)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Add mutation handler for backupbackingimages.longhorn.io (BackupBackingImage)" func=server.addHandler file="handler.go:17"
time="2024-11-26T07:12:10Z" level=info msg="Active TLS secret longhorn-system/longhorn-webhook-tls (ver=1436382578) (count 2): map[listener.cattle.io/cn-longhorn-admission-webhook.longhor-59584d:longhorn-admission-webhook.longhorn-system.svc listener.cattle.io/cn-longhorn-conversion-webhook.longho-6a0089:longhorn-conversion-webhook.longhorn-system.svc listener.cattle.io/fingerprint:SHA1=5DBAB4FDF556A7428F58DA8BC58EDB8B97CACA77]" func="memory.(*memory).Update" file="memory.go:42"
time="2024-11-26T07:12:10Z" level=info msg="Listening on :9502" func=server.ListenAndServe.func2 file="server.go:77"
time="2024-11-26T07:12:12Z" level=info msg="Started longhorn admission webhook server on localhost" func=webhook.StartWebhook file="webhook.go:47"
time="2024-11-26T07:12:12Z" level=info msg="admission webhook service is now accessible" func=webhook.CheckWebhookServiceAvailability file="webhook.go:63"
time="2024-11-26T07:12:12Z" level=info msg="Starting longhorn recovery-backend server" func=recovery_backend.StartRecoveryBackend file="recovery_backend.go:13"
time="2024-11-26T07:12:12Z" level=info msg="Started longhorn recovery-backend server" func=recovery_backend.StartRecoveryBackend file="recovery_backend.go:22"
W1126 07:12:12.973198       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2024-11-26T07:12:12Z" level=info msg="Recovery-backend server is running at :9503" func="server.(*RecoveryBackendServer).ListenAndServe" file="server.go:36"
time="2024-11-26T07:12:12Z" level=info msg="Checking if the upgrade path from v1.6.2 to v1.6.3 is supported" func=util.checkLHUpgradePath file="util.go:249"
time="2024-11-26T07:12:12Z" level=info msg="Checking if the engine upgrade path from map[ei-b0369a5d:{Version: GitCommit: BuildDate: CLIAPIVersion:10 CLIAPIMinVersion:0 ControllerAPIVersion:5 ControllerAPIMinVersion:0 DataFormatVersion:0 DataFormatMinVersion:0}] is supported" func=util.checkEngineUpgradePath file="util.go:318" newEngineClientAPIMinVersion=8 newEngineClientAPIVersion=10 newEngineControllerAPIMinVersion=4 newEngineControllerAPIVersion=5
time="2024-11-26T07:12:12Z" level=info msg="Waiting for old Longhorn manager pods to be fully removed" func=upgrade.waitForOldLonghornManagersToBeFullyRemoved file="upgrade.go:314"
I1126 07:12:13.013894       1 leaderelection.go:250] attempting to acquire leader lease longhorn-system/longhorn-manager-upgrade-lock...
I1126 07:12:13.030092       1 leaderelection.go:260] successfully acquired lease longhorn-system/longhorn-manager-upgrade-lock
time="2024-11-26T07:12:13Z" level=info msg="Start upgrading" func=upgrade.upgrade.func1 file="upgrade.go:140"
time="2024-11-26T07:12:13Z" level=info msg="No API version upgrade is needed" func=upgrade.doAPIVersionUpgrade file="upgrade.go:186"
time="2024-11-26T07:12:13Z" level=info msg="Walking through the resource upgrade path v1.6.2 to v1.6.3" func=upgrade.doResourceUpgrade file="upgrade.go:272"
time="2024-11-26T07:12:13Z" level=error msg="Upgrade failed: upgrade resources failed: Internal error occurred: failed calling webhook \"validator.longhorn.io\": the server could not find the requested resource" func=upgrade.upgrade.func1.1 file="upgrade.go:135"
time="2024-11-26T07:12:13Z" level=info msg="Upgrade leader lost: k8s-dev-node010" func=upgrade.upgrade.func2 file="upgrade.go:149"
time="2024-11-26T07:12:13Z" level=fatal msg="Error starting manager: upgrade resources failed: Internal error occurred: failed calling webhook \"validator.longhorn.io\": the server could not find the requested resource" func=main.main.DaemonCmd.func3 file="daemon.go:92"

GounGG commented Nov 26, 2024

These pods keep restarting:
[screenshot]


This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the stale label Dec 31, 2024