Missing timeout parameter in helm chart for agent-core container in openebs-agents-core deployment #3805

Open
oleksandr-shalbanov-fntext opened this issue Nov 14, 2024 · 3 comments · Fixed by openebs/mayastor-extensions#569

@oleksandr-shalbanov-fntext

Description

I'm currently trying to configure OpenEBS replicated storage on my on-prem cluster (Kubernetes 1.27 on Oracle Linux 8).
I've installed OpenEBS with the latest Helm chart, using the following values:

```yaml
release:
  version: "4.1.0"

openebs-crds:
  csi:
    volumeSnapshots:
      enabled: true
      keep: true

# Refer to https://github.com/openebs/dynamic-localpv-provisioner/blob/HEAD/deploy/helm/charts/values.yaml for complete set of values.
localpv-provisioner:
  rbac:
    create: true

# Refer to https://github.com/openebs/zfs-localpv/blob/HEAD/deploy/helm/charts/values.yaml for complete set of values.
zfs-localpv:
  enabled: false
  # crds:
  #   zfsLocalPv:
  #     enabled: true
  #   csi:
  #     volumeSnapshots:
  #       enabled: false
  # zfsNode:
  #   encrKeysDir: /opt/keys

# Refer to https://github.com/openebs/lvm-localpv/blob/HEAD/deploy/helm/charts/values.yaml for complete set of values.
lvm-localpv:
  enabled: false
  # crds:
  #   lvmLocalPv:
  #     enabled: true
  #   csi:
  #     volumeSnapshots:
  #       enabled: false

# Refer to https://github.com/openebs/mayastor-extensions/blob/v2.7.0/chart/values.yaml for complete set of values.
mayastor:
  enabled: true
  csi:
    node:
      initContainers:
        enabled: false
  etcd:
    # -- Kubernetes Cluster Domain
    clusterDomain: cluster.local
    replicaCount: 3
  localpv-provisioner:
    enabled: false
  crds:
    enabled: false
  base:
    metrics:
      enabled: false

# -- Configuration options for pre-upgrade helm hook job.
preUpgradeHook:
  image:
    # -- The container image registry URL for the hook job
    registry: docker.io
    # -- The container repository for the hook job
    repo: bitnami/kubectl
    # -- The container image tag for the hook job
    tag: "1.25.15"
    # -- The imagePullPolicy for the container
    pullPolicy: IfNotPresent

engines:
  local:
    lvm:
      enabled: false
    zfs:
      enabled: false
  replicated:
    mayastor:
      enabled: true
```

The installation finished successfully. I have four 2 TB disks attached to four of my nodes.
I've labeled those nodes like this: `kubectl label node selllvk8swrkr16 openebs.io/engine=mayastor`

Expected Behavior

I expect every disk pool I create afterwards to work without issues.

Current Behavior

When I try to create disk pools, the pool is created successfully on some nodes, but on others I get the following errors:

```text
  2024-10-23T19:47:30.424508Z ERROR core::controller::io_engine::v1::pool: error: gRPC request 'create_pool' for 'Pool' failed with 'status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} }'
    at control-plane/agents/src/bin/core/controller/io_engine/v1/pool.rs:33
    in core::pool::service::create_pool with request: CreatePool { node: NodeId("selllvk8swrkr08"), id: PoolId("disk-pool-selllvk8swrkr08-sdb"), disks: [PoolDeviceUri("/dev/sdb")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: disk-pool-selllvk8swrkr08-sdb

  2024-10-23T19:47:30.433028Z ERROR core::pool::service: error: gRPC request 'create_pool' for 'Pool' failed with 'status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} }'
    at control-plane/agents/src/bin/core/pool/service.rs:285
    in core::pool::service::create_pool with request: CreatePool { node: NodeId("selllvk8swrkr08"), id: PoolId("disk-pool-selllvk8swrkr08-sdb"), disks: [PoolDeviceUri("/dev/sdb")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: disk-pool-selllvk8swrkr08-sdb

  2024-10-23T19:47:35.556435Z ERROR core::pool::service: error: Pool Resource pending deletion - please retry
    at control-plane/agents/src/bin/core/pool/service.rs:285
    in core::pool::service::create_pool with request: CreatePool { node: NodeId("selllvk8swrkr08"), id: PoolId("disk-pool-selllvk8swrkr08-sdb"), disks: [PoolDeviceUri("/dev/sdb")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: disk-pool-selllvk8swrkr08-sdb

  2024-10-23T19:47:54.918561Z  INFO core::controller::resources::operations_helper: complete_destroy, val: ()
    at control-plane/agents/src/bin/core/controller/resources/operations_helper.rs:382
    in core::controller::reconciler::pool::deleting_pool_spec_reconciler with pool.id: disk-pool-selllvk8swrkr08-sdb, request.reconcile: true

  2024-10-23T19:47:54.923180Z  INFO core::controller::reconciler::pool: Pool deleted successfully
    at control-plane/agents/src/bin/core/controller/reconciler/pool/mod.rs:200
    in core::controller::reconciler::pool::deleting_pool_spec_reconciler with pool.id: disk-pool-selllvk8swrkr08-sdb, request.reconcile: true

  2024-10-23T19:48:15.727087Z ERROR core::controller::io_engine::v1::pool: error: gRPC request 'create_pool' for 'Pool' failed with 'status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} }'
    at control-plane/agents/src/bin/core/controller/io_engine/v1/pool.rs:33
    in core::pool::service::create_pool with request: CreatePool { node: NodeId("selllvk8swrkr08"), id: PoolId("disk-pool-selllvk8swrkr08-sdb"), disks: [PoolDeviceUri("/dev/sdb")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: disk-pool-selllvk8swrkr08-sdb

  2024-10-23T19:48:15.736083Z ERROR core::pool::service: error: gRPC request 'create_pool' for 'Pool' failed with 'status: Cancelled, message: "Timeout expired", details: [], metadata: MetadataMap { headers: {} }'
    at control-plane/agents/src/bin/core/pool/service.rs:285
    in core::pool::service::create_pool with request: CreatePool { node: NodeId("selllvk8swrkr08"), id: PoolId("disk-pool-selllvk8swrkr08-sdb"), disks: [PoolDeviceUri("/dev/sdb")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: disk-pool-selllvk8swrkr08-sdb

  2024-10-23T19:48:20.874247Z ERROR core::pool::service: error: Pool Resource pending deletion - please retry
    at control-plane/agents/src/bin/core/pool/service.rs:285
    in core::pool::service::create_pool with request: CreatePool { node: NodeId("selllvk8swrkr08"), id: PoolId("disk-pool-selllvk8swrkr08-sdb"), disks: [PoolDeviceUri("/dev/sdb")], labels: Some({"openebs.io/created-by": "operator-diskpool"}) }, pool.id: disk-pool-selllvk8swrkr08-sdb
```
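
For context, the pool in these logs (`disk-pool-selllvk8swrkr08-sdb` on node `selllvk8swrkr08`, disk `/dev/sdb`, created by the DiskPool operator per the `openebs.io/created-by` label) corresponds to a DiskPool resource roughly like the one below. This is a reconstruction, not the manifest from the report: the `openebs.io/v1beta2` apiVersion and the `openebs` namespace are assumptions based on the OpenEBS 4.x documentation and a default install namespace.

```yaml
# Hypothetical DiskPool reconstructed from the fields in the log above.
# apiVersion and namespace are assumptions, not taken from the report.
apiVersion: "openebs.io/v1beta2"
kind: DiskPool
metadata:
  name: disk-pool-selllvk8swrkr08-sdb
  namespace: openebs
spec:
  node: selllvk8swrkr08   # node where the io-engine creates the pool
  disks: ["/dev/sdb"]     # raw block device backing the pool
```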

Possible Solution

This could be solved by modifying the command of the agent-core container in the openebs-agents-core deployment. The following arguments have to be added:

- --no-min-timeouts
- --request-timeout=60s
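
As a sketch of that workaround, the two flags would be appended to the agent-core container's argument list roughly as below. This assumes the chart passes the agent's flags through `args` (rather than `command`) and elides the existing arguments, which should be kept unchanged. Because the Deployment is rendered by Helm, a hand edit like this may be reverted by a later `helm upgrade`.

```yaml
# Hypothetical excerpt of the openebs-agents-core Deployment showing only
# the agent-core container; the existing arguments are elided, not listed.
spec:
  template:
    spec:
      containers:
        - name: agent-core
          args:
            # ...existing agent-core arguments, left as they are...
            - --no-min-timeouts
            - --request-timeout=60s
```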

So I suggest adding a Helm chart value so that `--request-timeout` can be set during installation.

@tiagolobocastro
Contributor

Thanks for creating this, @oleksandr-shalbanov-fntext.
In addition, I think the reason it didn't work without the min-timeouts is a regression.
There are a few issues: a timed-out create marks the pool as deleting, which is not helpful (regression).
Also, import_pool now fails (another regression) if the lock is taken, rather than waiting for it, which means that if a previous create_pool is still in progress, the import ends up failing.
I'll fix these two, in addition to adding the missing Helm vars for min-timeouts, so hopefully you won't even have to change the timeouts.

@oleksandr-shalbanov-fntext
Author

@tiagolobocastro Thank you. I can create a PR adding the additional parameters to the Helm chart. Should I? :)

@tiagolobocastro
Contributor

That would be awesome, thank you!

bors-openebs-mayastor bot pushed a commit to openebs/mayastor-extensions that referenced this issue Nov 25, 2024
569: feat(chart): add requestTimeout for core-agent container r=tiagolobocastro a=oleksandr-shalbanov-fntext

## Description
Adds the Helm chart value `.agents.core.requestTimeout` so that the core agent's request timeout can be configured.
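
For illustration, with the umbrella `openebs` chart from the issue (where Mayastor values sit under `mayastor:`), the new value would presumably be overridden as below. The `"60s"` duration format is an assumption mirroring the `--request-timeout=60s` flag from the issue; the chart's values.yaml is the authority on the exact type and default.

```yaml
# Hypothetical override for the umbrella openebs chart; the value format
# is assumed to match the agent's --request-timeout flag (e.g. "60s").
mayastor:
  agents:
    core:
      requestTimeout: "60s"
```

Equivalently, `--set mayastor.agents.core.requestTimeout=60s` could be passed on the `helm install` or `helm upgrade` command line.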

## Motivation and Context
This resolves issue [#3805](openebs/openebs#3805).


## Regression
No



## How Has This Been Tested?
Installed OpenEBS with the Helm chart in my test environment.

## Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Checklist:
- [x] My code follows the code style of this project.
- [x] My change requires a change to the documentation.
- [x] I have updated the documentation accordingly.
- [ ] I have added unit tests to cover my changes.

Co-authored-by: Oleksandr Shalbanov <oleksandr.shalbanov@evry.com>
bors-openebs-mayastor bot pushed a commit to openebs/mayastor-extensions that referenced this issue Nov 25, 2024
569: feat(chart): add requestTimeout for core-agent container r=oleksandr-shalbanov-fntext a=oleksandr-shalbanov-fntext
