Skip to content

Commit

Permalink
docs(lep): add "V2 Data Engine Live Upgrade"
Browse files Browse the repository at this point in the history
Signed-off-by: Derek Su <derek.su@suse.com>
  • Loading branch information
derekbit committed Nov 21, 2024
1 parent ec9841b commit 6643586
Show file tree
Hide file tree
Showing 3 changed files with 247 additions and 0 deletions.
242 changes: 242 additions & 0 deletions enhancements/20241114-v2-data-engine-live-upgrade.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,242 @@
# V2 Data Engine Live Upgrade

## Summary

The document proposes a solution for upgrading Longhorn's v2 data engine. In contrast to the v1 data engine, which allows live upgrades on a per-volume basis, the v2 data engine only supports upgrades at the node level.

### Related Issues

## Motivation

### Goals

- Support v2 data engine upgrade at the node level
- No need to detach v2 volumes before upgrading v2 data engine

### Non-goals [optional]

- Support v2 data engine upgrade on a per-volume basis

## Proposal

### User Stories

As a Longhorn user, I want to upgrade the v2 data engine without needing to detach v2 volumes and interrupt services.

### User Experience In Detail

The steps to upgrade the v2 data engine of Longhorn nodes

1. Upgrade the Longhorn system by helm upgrade, manifest or other method.

2. After upgrading, an instance-manager resource with the default instance manager image is created on each node but remains in


- `stopped`: if the running instance-manager pod using an old instance manager image still contains running replicas or engines.
- `running`: if the old instance-manager is deleted because it does not contains running replicas and engines.

3. Create a `DataEngineUpgradeManager` resource
```
apiVersion: longhorn.io/v1beta2
kind: DataEngineUpgradeManager
metadata:
name: upgrade
namespace: longhorn-system
spec:
dataEngine: v2
```
4. User can observe the nodes in the cluster being upgraded one by one. During the upgrade of a node’s v2 data engine, an `NodeDataEngineUpgrade` resource for the upgrading node is created. The old instance manager and its pod are deleted, causing the replicas to enter an `error` state. The default instance manager pod then starts and transitions to a `running` state, after which the replicas in the `error` state are automatically rebuilt and becomes `running`. If the upgrade process is stalled, users can check the status of the `NodeDataEngineUpgrade` resource to troubleshoot issues and resume the upgrade process.
## CRDs
### New CRDs
Two CRDs, `DataEngineUpgradeManager` and `NodeDataEngineUpgrade`, are introduced. The `DataEngineUpgradeManager` controls and oversees the overall v2 data engine upgrade process and provides a global view of the upgrade status, while the `NodeDataEngineUpgrade` manages the upgrade process at the node level.
### DataEngineUpgradeManager
```
spec:
DataEngine DataEngineType
Nodes []string
status:
OwnerID string
InstanceManagerImage string
State UpgradeState
ErrorMessage string
UpgradingNode string
UpgradeNodes map[string]*UpgradeNodeStatus
```
### NodeDataEngineUpgrade
```
pec:
NodeID string
DataEngine DataEngineType
InstanceManagerImage string
DataEngineUpgradeManager string
status:
OwnerID string
Volumes map[string]*VolumeUpgradeStatus
State UpgradeState
ErrorMessage string
```
### CRD Changes
### Webhook Validation
1. When the v2 data engine of a node is being upgraded (i.e., `node.spec.dataEngineUpgradeRequested` is `true`), the `node.spec.nodeID` will be rejected for volume attachment.
### Webhook Mutation
## Controllers
### DataEngineUpgradeManager Controller
- Create one `DataEngineUpgradeManager` CR for the whole cluster
- Monitor the `DataEngineUpgradeManager` CRs
### NodeDataEngineUpgrade Controller
- Perform the node upgrade process for the v2 data engine
## Design
### Architecture Diagram
![Architecture Diagram](image/v2-data-engine-live-upgrade/arch-diagram.svg)
### Work Flow and State Transition
1. Create a `DataEngineUpgradeManager` resource
2. According to `DataEngineUpgradeManager.Status.State`, `DataEngineUpgradeManager Controller` does
- `undefined`
1. Update `status.instanceManagerImage` from `defaultInstanceManagerImage` setting
2. Check whether the default instance manager image has be pulled on each Longhorn node. If not, update `status.State` to `error` and `status.ErrorMessage`
3. Update `status.state` to `initializing`
- `initializing`
1. Update `status.upgradeNodes`
- If `spec.nodes` is empty, list all nodes and add them to `status.upgradeNodes`
- If `spec.nodes` is not empty, list the nodes in the `spec.nodes` and add them to `status.upgradeNodes`
2. Then, update the `status.currentState` to `upgrading`
- `upgrading`
1. Iterate `status.upgradeNodes`
2. Find a node that is not upgraded and create a `nodeDataEngineUpgrade` resource
3. Set the node to `status.upgradingNode`
4. Update the `status.currentState` to `completed` if all nodes are finished
- `completed`
3. According to `NodeDataEngineUpgrade.status.state`, `NodeDataEngineUpgrade controller` does
- `undefined`
1. Update `status.currentState` to `initializing`
- `initializing`
1. Set upgrading node to **unscheduable** by setting `node.spec.upgradeRequested` to `true`
2. List volumes on upgrading node
3. Snapshot all volumes
4. Update `status.volumes`
5. Update `status.currentState` to `switching-over`
- `switching-over`
1. Find a available node for target instance replacement creation
2. Update attached volumes: update `volume.spec.targetNodeID` and `volume.spec.image`
3. Waiting for the target instances of attached volumes are switched over
- `upgrading-instance-manager`
1. Delete old instance-manager
2. Waiting for the default instance-manager becomes running
- `switching-back`
1. Update attached volumes: update `volume.spec.targetNodeID` as `volume.spec.NodeID`
2. Waiting for the attached volumes are switched backup
- `rebuilding-replica`
1. Waiting for the volumes become `healthy`
2. Waiting for the replicas become `running`
- `completed`
### State Transition of NodeDataEngineUpgrade
![State Transition of NodeDataEngineUpgrade](image/v2-data-engine-live-upgrade/state-transition.svg)
## Test Plan
1. Test the different setups that fail to upgrade
- Case 1
Reason: single replica of vol-1
| node 1 | node 2 | node 3 |
| --- | --- | --- |
| **vol-1-e** | | |
| vol-1-r-1 | | |
| **vol-2-e** | | |
| vol-2-r-1 | vol-2-r-2 | vol-2-r-3 |
| **vol-3-e** | | |
| vol-3-r-1 | vol-3-r-2 | vol-3-r-3 |
- Case 2
Reason: single replica of vol-1
| node 1 | node 2 | node 3 |
| --- | --- | --- |
| **vol-1-e** | | |
| | vol-1-r-1 | |
| **vol-2-e** | | |
| vol-2-r-1 | vol-2-r-2 | vol-2-r-3 |
| **vol-3-e** | | |
| vol-3-r-1 | vol-3-r-2 | vol-3-r-3 |
2. Test the different setups that succeed to upgrade
- Case 1
| node 1 | node 2 | node 3 |
| --- | --- | --- |
| **vol-1-e** | | |
| vol-1-r-1 | vol-1-r-2 | vol-1-r-3 |
| **vol-2-e** | | |
| vol-2-r-1 | vol-2-r-2 | vol-2-r-3 |
| **vol-3-e** | | |
| vol-3-r-1 | vol-3-r-2 | vol-3-r-3 |
- Case 2
| node 1 | node 2 | node 3 |
| --- | --- | --- |
| **vol-1-e** | | |
| | vol-1-r-1 | vol-1-r-2 |
| **vol-2-e** | | |
| | vol-2-r-1 | vol-2-r-2 |
| **vol-3-e** | | |
| | vol-3-r-1 | vol-3-r-2 |
- Case 3
| node 1 | node 2 | node 3 |
| --- | --- | --- |
| **vol-1-e** | | |
| vol-1-r-1 | vol-1-r-2 | vol-1-r-3 |
| | **vol-2-e** | |
| vol-2-r-1 | vol-2-r-2 | vol-2-r-3 |
| | | **vol-3-e** |
| vol-3-r-1 | vol-3-r-2 | vol-3-r-3 |
## Note \[optional\]
- [ ] Handle replica replenishment (Volume is being upgraded, skip replenishReplicas)
- [ ] Upgrading node shouldn’t support backup (handled in the isResponsibleFor())
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading

0 comments on commit 6643586

Please sign in to comment.