[TEST] Flaky test case test_setting_concurrent_volume_backup_restore_limit on v1.6.x-head for amd64 #9455

Closed
yangchiu opened this issue Sep 12, 2024 · 2 comments
Labels
kind/test Request for adding test

Comments

@yangchiu

What's the test to develop? Please describe

Test case test_setting_concurrent_volume_backup_restore_limit is flaky on v1.6.x-head for amd64:

https://ci.longhorn.io/job/public/job/v1.6.x/job/v1.6.x-longhorn-upgrade-tests-sles-amd64/193/testReport/tests/test_settings/test_setting_concurrent_volume_backup_restore_limit_s3_/

set_random_backupstore = None
client = <longhorn.Client object at 0x7f7e995b83d0>
volume_name = 'longhorn-testvol-3539nw'

    def test_setting_concurrent_volume_backup_restore_limit(set_random_backupstore, client, volume_name):  # NOQA
        """
    
        Scenario: setting Concurrent Volume Backup Restore Limit
                  should limit the concurrent volume backup restoring
    
        Issue: https://github.com/longhorn/longhorn/issues/4558
    
        Given/When see:
          setting_concurrent_volume_backup_restore_limit_concurrent_restoring_test
    
        Then Number of restoring volumes per node not exceed the setting value.
        """
>       setting_concurrent_volume_backup_restore_limit_concurrent_restoring_test(
            client, volume_name
        )

test_settings.py:1021: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_settings.py:1005: in setting_concurrent_volume_backup_restore_limit_concurrent_restoring_test
    wait_for_volume_restoration_completed(client, restore_volume_name)
common.py:4670: in wait_for_volume_restoration_completed
    monitor_restore_progress(client, name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

client = <longhorn.Client object at 0x7f7e995b83d0>
volume_name = 'longhorn-testvol-3539nw-restore-11'

    def monitor_restore_progress(client, volume_name):
        completed = 0
        rs = {}
        for i in range(RETRY_COUNTS_LONG):
            completed = 0
            v = client.by_id_volume(volume_name)
            rs = v.restoreStatus
            for r in rs:
                assert r.error == ""
                if r.state == "complete":
                    assert r.progress == 100
                    completed += 1
            if completed == len(rs):
                break
            time.sleep(RETRY_INTERVAL)
>       assert completed == len(rs)
E       AssertionError

common.py:3281: AssertionError

The assertion fails because the restoration takes longer than the test waits for it. It might be related to #9439, but I'll increase the timeout for now to make the test pass.
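For illustration only, here is a minimal sketch of a deadline-based variant of the polling loop that reports the last observed restore status when it gives up, so a timeout can be told apart from a restore that never started. `wait_for_restore` and `get_restore_status` are hypothetical names, not part of longhorn-tests; the callable stands in for `client.by_id_volume(volume_name).restoreStatus`, and the fake statuses below are made up for the demo.

```python
import time
from types import SimpleNamespace


def wait_for_restore(get_restore_status, timeout_s, interval_s=1.0):
    """Poll until every replica restore is complete or timeout_s elapses.

    get_restore_status is a hypothetical callable returning a list of
    objects with .state, .progress and .error attributes.
    Unlike the original loop, an empty status list is not treated as done.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        rs = get_restore_status()
        completed = 0
        for r in rs:
            assert r.error == "", f"restore error: {r.error}"
            if r.state == "complete":
                assert r.progress == 100
                completed += 1
        if rs and completed == len(rs):
            return rs
        if time.monotonic() >= deadline:
            seen = [(r.state, r.progress) for r in rs]
            raise AssertionError(
                f"restore not completed within {timeout_s}s; "
                f"last status: {seen}"
            )
        time.sleep(interval_s)


# Demo with fake statuses: not started, in progress, then complete.
statuses = iter([
    [SimpleNamespace(state="", progress=0, error="")],
    [SimpleNamespace(state="in_progress", progress=44, error="")],
    [SimpleNamespace(state="complete", progress=100, error="")],
])
result = wait_for_restore(lambda: next(statuses), timeout_s=5, interval_s=0)
```

With a message like `last status: [('', 0)]`, a failure report would show directly that the restore had not started when the wait ended.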

Describe the tasks for the test

Additional context

@yangchiu

Based on the debug console log of https://ci.longhorn.io/job/private/job/longhorn-tests-regression/7568/consoleFull, a replica restoration could take more than 10 minutes to start:

wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = , r.progress = 0 ... (870)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = , r.progress = 0 ... (871)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = , r.progress = 0 ... (872)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = , r.progress = 0 ... (873)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = , r.progress = 0 ... (874)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = , r.progress = 0 ... (875)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = , r.progress = 0 ... (876)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = , r.progress = 0 ... (877)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 9 ... (878)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 9 ... (879)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 9 ... (880)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 9 ... (881)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 9 ... (882)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 25 ... (883)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 25 ... (884)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 25 ... (885)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 25 ... (886)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 25 ... (887)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 44 ... (888)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 44 ... (889)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 44 ... (890)
wait for volume longhorn-testvol-20ngs4-restore-14 restored r.state = in_progress, r.progress = 44 ... (891)

This means the test case timeout would have to exceed 10 minutes for the test to pass, which doesn't seem reasonable. I'll hold off for now and see if we can discover anything in #9439.
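As a rough aid for reading logs like the one above, the following sketch finds the first poll at which the restore reported a non-empty state. It assumes the number in parentheses at the end of each line is the poll counter and that the line format matches the excerpt exactly; `first_started_poll` is a hypothetical helper, not an existing function in the test suite.

```python
import re

# Matches e.g. "r.state = in_progress, r.progress = 9 ... (878)".
# The state group may be empty, as in "r.state = , r.progress = 0 ... (870)".
LINE_RE = re.compile(
    r"r\.state = (\w*), r\.progress = (\d+) \.\.\. \((\d+)\)"
)


def first_started_poll(lines):
    """Return the counter of the first line whose state is non-empty."""
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group(1):
            return int(m.group(3))
    return None  # the restore never left the empty state in these lines


sample = [
    "wait for volume longhorn-testvol-20ngs4-restore-14 restored "
    "r.state = , r.progress = 0 ... (877)",
    "wait for volume longhorn-testvol-20ngs4-restore-14 restored "
    "r.state = in_progress, r.progress = 9 ... (878)",
]
started_at = first_started_poll(sample)
print(started_at)  # 878
```

Multiplying that counter by the polling interval used in the run gives an estimate of how long the restore sat idle before starting.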

@github-project-automation github-project-automation bot moved this from To do to Done in QA Sprint Sep 18, 2024
@github-project-automation github-project-automation bot moved this from New Issues to Closed in Longhorn Sprint Sep 18, 2024