Skip to content

Commit

Permalink
drm/amdkfd: KFD release_work possible circular locking
Browse files Browse the repository at this point in the history
[ Upstream commit 1b9366c ]

If waiting for gpu reset done in KFD release_work, thers is WARNING:
possible circular locking dependency detected

  #2  kfd_create_process
        kfd_process_mutex
          flush kfd release work

  #1  kfd release work
        wait for amdgpu reset work

  #0  amdgpu_device_gpu_reset
        kgd2kfd_pre_reset
          kfd_process_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock((work_completion)(&p->release_work));
                  lock((wq_completion)kfd_process_wq);
                  lock((work_completion)(&p->release_work));
   lock((wq_completion)amdgpu-reset-dev);

To fix this, KFD create process move flush release work outside
kfd_process_mutex.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
  • Loading branch information
Philip Yang authored and Greg Kroah-Hartman committed May 29, 2025
1 parent 1dd943d commit 0d8562e
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions drivers/gpu/drm/amd/amdkfd/kfd_process.c
Original file line number Diff line number Diff line change
Expand Up @@ -842,6 +842,14 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
return ERR_PTR(-EINVAL);
}

/* If the process just called exec(3), it is possible that the
* cleanup of the kfd_process (following the release of the mm
* of the old process image) is still in the cleanup work queue.
* Make sure to drain any job before trying to recreate any
* resource for this process.
*/
flush_workqueue(kfd_process_wq);

/*
* take kfd processes mutex before starting of process creation
* so there won't be a case where two threads of the same process
Expand All @@ -860,14 +868,6 @@ struct kfd_process *kfd_create_process(struct task_struct *thread)
if (process) {
pr_debug("Process already found\n");
} else {
/* If the process just called exec(3), it is possible that the
* cleanup of the kfd_process (following the release of the mm
* of the old process image) is still in the cleanup work queue.
* Make sure to drain any job before trying to recreate any
* resource for this process.
*/
flush_workqueue(kfd_process_wq);

process = create_process(thread);
if (IS_ERR(process))
goto out;
Expand Down

0 comments on commit 0d8562e

Please sign in to comment.