
feat(v2/regression): add v2 test cases #2247

Merged
2 commits merged into longhorn:master on Jan 7, 2025

Conversation

derekbit
Member

@derekbit derekbit commented Jan 6, 2025

Which issue(s) this PR fixes:

Issue

longhorn/longhorn#10136
longhorn/longhorn#10139

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

Summary by CodeRabbit

  • Tests
    • Added new test cases for backup functionality
    • Verified backup system behavior during replica unavailability
    • Tested backup block deletion process
    • Expanded test coverage for volume backup operations

@derekbit derekbit requested a review from chriscchien January 6, 2025 08:43
@derekbit derekbit self-assigned this Jan 6, 2025
@derekbit derekbit requested a review from a team as a code owner January 6, 2025 08:43

coderabbitai bot commented Jan 6, 2025

Walkthrough

This pull request introduces two new test functions in the manager/integration/tests/test_basic.py file to enhance the testing of Longhorn's backup functionality. The first test, test_backup_status_for_unavailable_replicas, verifies the backup system's behavior when a replica responsible for a backup becomes unavailable. The second test, test_backup_block_deletion, checks the correct deletion of non-referenced backup blocks while maintaining backup process integrity.

Changes

File: manager/integration/tests/test_basic.py
Change summary: Added two new test functions:
  - test_backup_status_for_unavailable_replicas: Tests backup status when a replica becomes unavailable
  - test_backup_block_deletion: Validates backup block deletion process

Assessment against linked issues

Objective Addressed Explanation
Replica address in backup status [#10136]

Suggested reviewers

  • c3y1huang
  • ChanYiLin
  • yangchiu

Poem

🐰 In the realm of backups, where data takes flight,
Replicas dance, some vanish from sight
Our tests stand guard, with vigilance keen
Ensuring no block is lost, no status unseen
A rabbit's code, robust and bright! 🔍


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
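For illustration, a minimal .coderabbit.yaml sketch with the schema comment at the top; the keys shown below are illustrative assumptions and should be verified against the configuration documentation:

```yaml
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
# Minimal sketch of a repository-level CodeRabbit configuration.
# Key names are assumptions; check the official schema before use.
reviews:
  profile: chill        # review strictness profile
  auto_review:
    enabled: true       # review new PRs automatically
```

With the schema comment in place, editors running a YAML language server can offer auto-completion and validation for the remaining keys.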

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Longhorn 10136
Longhorn 10139

Signed-off-by: Derek Su <derek.su@suse.com>
@derekbit derekbit changed the title feat(v2/regression): add test_backup_status_for_unavailable_replicas feat(v2/regression): add v2 test cases Jan 7, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
manager/integration/tests/test_basic.py (4)

Line range hint 534-614: Add more assertions to verify backup error state

The test case has good coverage of the backup failure scenario, but could be enhanced with additional assertions to verify:

  1. The specific error message content in the backup status
  2. The backup state transitions (e.g., from InProgress to Error)
  3. The volume state during and after the replica failure
```diff
 def backup_status_for_unavailable_replicas_test(client, volume_name,  # NOQA
                                                 size, backing_image=""):  # NOQA
     volume = create_and_check_volume(client, volume_name,
                                      num_of_replicas=2,
                                      size=str(size),
                                      backing_image=backing_image)
+    # Verify initial volume state
+    assert volume.ready
+    assert volume.robustness == "healthy"
+
     lht_hostId = get_self_host_id()
     volume = volume.attach(hostId=lht_hostId)
     volume = common.wait_for_volume_healthy(client, volume_name)

     # write data to the volume
     volume_endpoint = get_volume_endpoint(volume)

     # Take snapshots without overlapping
     data_size = size/Mi
     write_volume_dev_random_mb_data(
         volume_endpoint, 0, data_size)

     # create a snapshot and backup
     snap = create_snapshot(client, volume_name)
     volume.snapshotBackup(name=snap.name)
     bv, b = find_backup(client, volume_name, snap.name)
     backup_id = b.id

     # find the replica for this backup
     replica_name = find_replica_for_backup(client, volume_name, backup_id)

     # disable scheduling on that node
     volume = client.by_id_volume(volume_name)
     for r in volume.replicas:
         if r.name == replica_name:
             node = client.by_id_node(r.hostId)
             node = set_node_scheduling(client, node, allowScheduling=False)
             common.wait_for_node_update(client, node.id,
                                         "allowScheduling", False)
     assert node

     # remove the replica with the backup
     volume.replicaRemove(name=replica_name)
     volume = common.wait_for_volume_degraded(client, volume_name)

+    # Verify backup transitions to Error state with expected message
+    def verify_backup_error(b):
+        return (b.id == backup_id and
+                b.state == "Error" and
+                "Failed to get backup status" in b.error)
+
     # now the backup status should in an Error state and with an error message
-    def backup_failure_predicate(b):
-        return b.id == backup_id and "Error" in b.state and b.error != ""
     volume = common.wait_for_backup_state(client, volume_name,
-                                          backup_failure_predicate)
+                                          verify_backup_error)

     # re enable scheduling on the previously disabled node
     node = client.by_id_node(node.id)
     node = set_node_scheduling(client, node, allowScheduling=True)
     common.wait_for_node_update(client, node.id,
                                 "allowScheduling", True)
```

Line range hint 615-717: Improve test documentation and error handling

The test case for backup block deletion has good coverage but could benefit from:

  1. More detailed docstring explaining the test scenarios
  2. Better error handling for backup operations
  3. Additional assertions for backup block counts
```diff
 def test_backup_block_deletion(set_random_backupstore, client, core_api, volume_name):  # NOQA
     """
     Test backup block deletion

     Context:

     We want to make sure that we only delete non referenced backup blocks,
     we also don't want to delete blocks while there other backups in progress.
     The reason for this is that we don't yet know which blocks are required by
     the in progress backup, so blocks deletion could lead to a faulty backup.
+
+    Test Scenarios:
+    1. Normal backup block deletion when no backups are in progress
+    2. Block deletion prevention when a backup is in progress
+    3. Error handling for failed backup operations
+    4. Verification of block counts before and after deletion

     Setup:

     1. Setup minio as S3 backupstore

     Steps:
     ...
     """
     backup_store_type = set_random_backupstore
     if backup_store_type not in ["nfs", "s3"]:
         pytest.skip("Skip test case because the backup store type is not supported") # NOQA

     backupstore_cleanup(client)

     volume = create_and_check_volume(client, volume_name)
     host_id = get_self_host_id()
     volume = volume.attach(hostId=host_id)
     volume = common.wait_for_volume_healthy(client, volume_name)

+    try:
         data0 = {'pos': 0,
                  'len': 2 * BACKUP_BLOCK_SIZE,
                  'content': common.generate_random_data(2 * BACKUP_BLOCK_SIZE)}
         _, backup0, _, data0 = create_backup(
             client, volume_name, data0)
+    except Exception as e:
+        pytest.fail(f"Failed to create initial backup: {str(e)}")

     data1 = {'pos': 0,
              'len': BACKUP_BLOCK_SIZE,
              'content': common.generate_random_data(BACKUP_BLOCK_SIZE)}
     _, backup1, _, data1 = create_backup(client, volume_name, data1)

     data2 = {'pos': 0,
              'len': BACKUP_BLOCK_SIZE,
              'content': common.generate_random_data(BACKUP_BLOCK_SIZE)}
     backup_volume, backup2, _, data2 = create_backup(client, volume_name, data2)

     backup_blocks_count = backupstore_count_backup_block_files(client,
                                                                core_api,
                                                                volume_name)
+    # Verify initial block count
     assert backup_blocks_count == 4

     bvs = client.list_backupVolume()
+    # Verify backup volume data stored size
+    backup_volume_found = False
     for bv in bvs:
         if bv['name'] == backup_volume.name:
+            backup_volume_found = True
             assert bv['dataStored'] == \
                 str(backup_blocks_count * BACKUP_BLOCK_SIZE)
+    assert backup_volume_found, "Backup volume not found"
```

Line range hint 1480-1533: Add cleanup and improve error handling in restore test

The restore test case could be enhanced with:

  1. Better cleanup in case of test failures
  2. More robust error handling for backup operations
  3. Additional assertions for data verification
```diff
 @pytest.mark.v2_volume_test  # NOQA
 @pytest.mark.coretest   # NOQA
-def test_restore_inc(set_random_backupstore, client, core_api, volume_name, pod):  # NOQA
+def test_restore_inc(set_random_backupstore, client, core_api, volume_name, pod, request):  # NOQA
     """
     Test restore from disaster recovery volume (incremental restore)

     Run test against all the backupstores
+
+    Test Scenarios:
+    1. Basic restore functionality
+    2. Incremental restore with multiple backups
+    3. Error handling during restore
+    4. Data verification after restore

     1. Create a volume and attach to the current node
     2. Generate `data0`, write to the volume, make a backup `backup0`
     ...
     """
+    def cleanup():
+        # Cleanup volumes and backups in case of test failure
+        volumes_to_delete = [
+            volume_name,
+            "sb-0-" + volume_name,
+            "sb-1-" + volume_name,
+            "sb-2-" + volume_name
+        ]
+        for vol in volumes_to_delete:
+            try:
+                v = client.by_id_volume(vol)
+                if v:
+                    client.delete(v)
+                    wait_for_volume_delete(client, vol)
+            except Exception:
+                pass
+
+    request.addfinalizer(cleanup)

     restore_inc_test(client, core_api, volume_name, pod)
```

Line range hint 1-100: Add missing docstrings and improve test setup documentation

The test file could benefit from:

  1. A module-level docstring explaining the test suite's purpose
  2. Better documentation of test prerequisites and assumptions
  3. More detailed explanation of test utilities and helper functions

Add a module-level docstring at the beginning of the file:

```diff
+"""
+Integration tests for Longhorn's basic functionality.
+
+This module contains core test cases that verify basic Longhorn operations:
+- Volume lifecycle (create, attach, detach, delete)
+- Backup and restore operations
+- Snapshot management
+- Error handling and recovery
+- Filesystem operations
+
+Prerequisites:
+- A running Kubernetes cluster
+- Longhorn installed and running
+- Storage classes configured
+- Backup stores (S3, NFS) configured if testing backup functionality
+
+Test utilities and common operations are imported from the common module.
+"""
+
 import pytest

 import os
 import subprocess
 ...
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e35dd45 and 6bf286f.

📒 Files selected for processing (1)
  • manager/integration/tests/test_basic.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: make check
  • GitHub Check: Summary
🔇 Additional comments (1)
manager/integration/tests/test_basic.py (1)

Line range hint 1-2247: LGTM! The test coverage looks comprehensive

The test file provides thorough coverage of Longhorn's core functionality. The new test functions for backup status and block deletion are well structured and cover important edge cases. The suggested improvements around error handling, documentation, and cleanup would be good to have but are not critical for the current functionality.

```diff
@@ -531,6 +531,7 @@ def snapshot_test(client, volume_name, backing_image):  # NOQA
     cleanup_volume(client, volume)


+@pytest.mark.v2_volume_test  # NOQA
```
Member

@chriscchien Should we review longhorn/longhorn#9760 (comment) again to include all test cases that support v2?

Member Author

The supported cases will be updated over time. I suggest we update it later; to prevent any confusion, the ground truth should be the @pytest.mark.v2_volume_test marker in manager/integration/tests.
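To illustrate the convention described above, here is a minimal, hypothetical sketch of how a test case opts into the v2 volume suite via the marker (the test name and body are placeholders, not code from this repository):

```python
import pytest

# Hypothetical example: a test opts into the v2 volume regression suite
# simply by carrying the marker. The test body is illustrative only.
@pytest.mark.v2_volume_test
def test_example_v2_volume():
    assert True

# The decorator attaches the mark to the function's `pytestmark` list,
# which is how pytest (or tooling enumerating v2-supported cases) can
# discover which tests carry the marker.
marks = [m.name for m in test_example_v2_volume.pytestmark]
```

Running pytest with `-m v2_volume_test` would then select only the opted-in cases, which is what makes the marker a workable ground truth for v2 support.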

@yangchiu yangchiu self-requested a review January 7, 2025 08:19
@yangchiu yangchiu merged commit 5745036 into longhorn:master Jan 7, 2025
4 of 6 checks passed
@yangchiu
Member

yangchiu commented Jan 7, 2025

@mergify backport v1.8.x


mergify bot commented Jan 7, 2025

backport v1.8.x

✅ Backports have been created
