Skip to content

Commit

Permalink
drm/amdkfd: fix the hang caused by the write reorder to fence_addr
Browse files Browse the repository at this point in the history
make sure KFD_FENCE_INIT write to fence_addr before pm_send_query_status
called, to avoid qcm fence timeout caused by incorrect ordering.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  • Loading branch information
Victor Zhao authored and Alex Deucher committed Oct 22, 2024
1 parent 9343b90 commit 8834456
Show file tree
Hide file tree
Showing 2 changed files with 3 additions and 2 deletions.
3 changes: 2 additions & 1 deletion drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
Original file line number Diff line number Diff line change
Expand Up @@ -2048,7 +2048,7 @@ int amdkfd_fence_wait_timeout(struct device_queue_manager *dqm,
{
unsigned long end_jiffies = msecs_to_jiffies(timeout_ms) + jiffies;
struct device *dev = dqm->dev->adev->dev;
uint64_t *fence_addr = dqm->fence_addr;
volatile uint64_t *fence_addr = dqm->fence_addr;

while (*fence_addr != fence_value) {
/* Fatal err detected, this response won't come */
Expand Down Expand Up @@ -2254,6 +2254,7 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
goto out;

*dqm->fence_addr = KFD_FENCE_INIT;
mb();
pm_send_query_status(&dqm->packet_mgr, dqm->fence_gpu_addr,
KFD_FENCE_COMPLETED);
/* should be timed out */
Expand Down
2 changes: 1 addition & 1 deletion drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
Original file line number Diff line number Diff line change
Expand Up @@ -260,7 +260,7 @@ struct device_queue_manager {
uint16_t vmid_pasid[VMID_NUM];
uint64_t pipelines_addr;
uint64_t fence_gpu_addr;
uint64_t *fence_addr;
volatile uint64_t *fence_addr;
struct kfd_mem_obj *fence_mem;
bool active_runlist;
int sched_policy;
Expand Down

0 comments on commit 8834456

Please sign in to comment.