drm/amd/amdgpu: fix corner case in SRIOV tdr
[Why]
In SRIOV multi-VF mode, after TDR was moved to an ordered workqueue, a
timeout on one ring can repeatedly cause an innocent ring to time out
as well.

[How]
1. Use advanced TDR mode as the default in SRIOV.
2. Use mdelay instead of msleep in the FLR work so that the wait won't
   exceed the ring timeout.

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Jingwen Chen authored and Jingwen Chen committed Sep 29, 2021
1 parent 87e71bc commit 4ff45ec
Showing 3 changed files with 6 additions and 2 deletions.
4 changes: 4 additions & 0 deletions drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -63,6 +63,10 @@ void amdgpu_virt_init_setting(struct amdgpu_device *adev)
 #endif
 	adev->cg_flags = 0;
 	adev->pg_flags = 0;
+
+	/*use advance recovery mode for SRIOV*/
+	if (amdgpu_gpu_recovery)
+		amdgpu_gpu_recovery = 2;
 }

 void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
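
The override in this hunk can be read as a small mapping on the recovery setting. The sketch below is a userspace illustration only, not driver code: `sriov_recovery_mode` is a hypothetical helper, and the meaning of the values (-1 auto, 0 disabled, 1 enabled, 2 advanced TDR) is an assumption based on how the amdgpu `gpu_recovery` module parameter is usually described.

```c
#include <assert.h>

/* Assumed parameter values (not stated in this commit):
 * -1 = auto, 0 = disabled, 1 = enabled, 2 = advanced TDR mode. */
enum { RECOVERY_DISABLED = 0, RECOVERY_ADVANCED_TDR = 2 };

/*
 * Hypothetical helper mirroring the hunk above: any non-zero setting
 * is promoted to advanced TDR mode, while an explicit 0 (disabled)
 * is left untouched.
 */
static int sriov_recovery_mode(int gpu_recovery)
{
	if (gpu_recovery)
		return RECOVERY_ADVANCED_TDR;
	return gpu_recovery;
}
```

Under these assumptions, a user who disabled recovery keeps it disabled, and every other setting, including auto, ends up in advanced TDR mode on an SRIOV VF.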
2 changes: 1 addition & 1 deletion drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -265,7 +265,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 		if (xgpu_ai_mailbox_peek_msg(adev) == IDH_FLR_NOTIFICATION_CMPL)
 			goto flr_done;
 
-		msleep(10);
+		mdelay(10);
 		timeout -= 10;
 	} while (timeout > 1);

2 changes: 1 addition & 1 deletion drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -294,7 +294,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 		if (xgpu_nv_mailbox_peek_msg(adev) == IDH_FLR_NOTIFICATION_CMPL)
 			goto flr_done;
 
-		msleep(10);
+		mdelay(10);
 		timeout -= 10;
 	} while (timeout > 1);

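
Both FLR work handlers share the same bounded polling loop: check the mailbox, wait 10 ms, subtract 10 ms from the budget. The budget is only meaningful if each wait really lasts about 10 ms; msleep may return later than requested, whereas mdelay busy-waits for the requested time. The following is a minimal userspace sketch of the loop's shape, assuming a simulated mailbox. `struct fake_mailbox`, `fake_peek_done`, `fake_delay`, and `wait_for_flr_done` are hypothetical names, and the simulated delay stands in for mdelay without modeling scheduler behavior.

```c
#include <assert.h>
#include <stdbool.h>

#define POLL_STEP_MS 10

/* Hypothetical stand-in for the mailbox; time is simulated, not real. */
struct fake_mailbox {
	int ready_after_ms;	/* simulated time at which completion arrives */
	int elapsed_ms;		/* simulated time spent waiting so far */
};

/* Stand-in for peeking the mailbox for the FLR completion message. */
static bool fake_peek_done(const struct fake_mailbox *mb)
{
	return mb->elapsed_ms >= mb->ready_after_ms;
}

/* Stand-in for mdelay(): consumes exactly the requested time. */
static void fake_delay(struct fake_mailbox *mb, int ms)
{
	assert(ms > 0);
	mb->elapsed_ms += ms;
}

/*
 * Same control flow as the loops in the diff: poll, delay, charge the
 * delay against the budget. Returns true if completion was observed
 * before the budget ran out.
 */
static bool wait_for_flr_done(struct fake_mailbox *mb, int timeout_ms)
{
	do {
		if (fake_peek_done(mb))
			return true;

		fake_delay(mb, POLL_STEP_MS);
		timeout_ms -= POLL_STEP_MS;
	} while (timeout_ms > 1);

	return false;
}
```

Because the delay consumes exactly the time charged against the budget, the total wait in this sketch never exceeds `timeout_ms`. A sleep that overshoots each step would break that property, which is the failure the commit message describes.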