Skip to content

Commit

Permalink
drm/amd/amdkfd: Evict all queues even HWS remove queue failed
Browse files Browse the repository at this point in the history
[Why]
If reset is detected and kfd need to evict working queues, HWS moving queue will be failed.
Then remaining queues are not evicted and in active state.

After reset done, kfd uses HWS to termination remaining activated queues but HWS is resetted.
So remove queue will be failed again.

[How]
Keep removing all queues even if HWS returns failed.
It will not affect cpsch as it checks reset_domain->sem.

v2: If any queue failed, evict queue returns error.
v3: Declare err inside the if-block.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  • Loading branch information
Yifan Zha authored and Alex Deucher committed Mar 11, 2025
1 parent 760632f commit 42c854b
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
Original file line number Diff line number Diff line change
Expand Up @@ -1221,11 +1221,13 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
decrement_queue_count(dqm, qpd, q);

if (dqm->dev->kfd->shared_resources.enable_mes) {
retval = remove_queue_mes(dqm, q, qpd);
if (retval) {
int err;

err = remove_queue_mes(dqm, q, qpd);
if (err) {
dev_err(dev, "Failed to evict queue %d\n",
q->properties.queue_id);
goto out;
retval = err;
}
}
}
Expand Down

0 comments on commit 42c854b

Please sign in to comment.