Skip to content

Commit

Permalink
drm/i915/guc: Don't deadlock busyness stats vs reset
Browse files Browse the repository at this point in the history
The engine busyness stats has a worker function to do things like
64bit extend the 32bit hardware counters. The GuC's reset prepare
function flushes out this worker function to ensure no corruption
happens during the reset. Unforunately, the worker function has an
infinite wait for active resets to finish before doing its work. Thus
a deadlock would occur if the worker function had actually started
just as the reset starts.

The function being used to lock the reset-in-progress mutex is called
intel_gt_reset_trylock(). However, as noted it does not follow
standard 'trylock' conventions and exit if already locked. So rename
the current _trylock function to intel_gt_reset_lock_interruptible(),
which is the behaviour it actually provides. In addition, add a new
implementation of _trylock and call that from the busyness stats
worker instead.

v2: Rename existing trylock to interruptible rather than trying to
preserve the existing (confusing) naming scheme (review comments from
Tvrtko).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102192109.2492625-3-John.C.Harrison@Intel.com
  • Loading branch information
John Harrison authored and John Harrison committed Nov 4, 2022
1 parent de51de9 commit 178b8a3
Show file tree
Hide file tree
Showing 4 changed files with 21 additions and 4 deletions.
2 changes: 1 addition & 1 deletion drivers/gpu/drm/i915/gem/i915_gem_mman.c
Original file line number Diff line number Diff line change
Expand Up @@ -330,7 +330,7 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
if (ret)
goto err_rpm;

ret = intel_gt_reset_trylock(ggtt->vm.gt, &srcu);
ret = intel_gt_reset_lock_interruptible(ggtt->vm.gt, &srcu);
if (ret)
goto err_pages;

Expand Down
18 changes: 16 additions & 2 deletions drivers/gpu/drm/i915/gt/intel_reset.c
Original file line number Diff line number Diff line change
Expand Up @@ -1407,15 +1407,19 @@ void intel_gt_handle_error(struct intel_gt *gt,
intel_runtime_pm_put(gt->uncore->rpm, wakeref);
}

int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
static int _intel_gt_reset_lock(struct intel_gt *gt, int *srcu, bool retry)
{
might_lock(&gt->reset.backoff_srcu);
might_sleep();
if (retry)
might_sleep();

rcu_read_lock();
while (test_bit(I915_RESET_BACKOFF, &gt->reset.flags)) {
rcu_read_unlock();

if (!retry)
return -EBUSY;

if (wait_event_interruptible(gt->reset.queue,
!test_bit(I915_RESET_BACKOFF,
&gt->reset.flags)))
Expand All @@ -1429,6 +1433,16 @@ int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
return 0;
}

int intel_gt_reset_trylock(struct intel_gt *gt, int *srcu)
{
return _intel_gt_reset_lock(gt, srcu, false);
}

int intel_gt_reset_lock_interruptible(struct intel_gt *gt, int *srcu)
{
return _intel_gt_reset_lock(gt, srcu, true);
}

void intel_gt_reset_unlock(struct intel_gt *gt, int tag)
__releases(&gt->reset.backoff_srcu)
{
Expand Down
1 change: 1 addition & 0 deletions drivers/gpu/drm/i915/gt/intel_reset.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine,
void __i915_request_reset(struct i915_request *rq, bool guilty);

int __must_check intel_gt_reset_trylock(struct intel_gt *gt, int *srcu);
int __must_check intel_gt_reset_lock_interruptible(struct intel_gt *gt, int *srcu);
void intel_gt_reset_unlock(struct intel_gt *gt, int tag);

void intel_gt_set_wedged(struct intel_gt *gt);
Expand Down
4 changes: 3 additions & 1 deletion drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
Original file line number Diff line number Diff line change
Expand Up @@ -1401,7 +1401,9 @@ static void guc_timestamp_ping(struct work_struct *wrk)

/*
* Synchronize with gt reset to make sure the worker does not
* corrupt the engine/guc stats.
* corrupt the engine/guc stats. NB: can't actually block waiting
* for a reset to complete as the reset requires flushing out
* this worker thread if started. So waiting would deadlock.
*/
ret = intel_gt_reset_trylock(gt, &srcu);
if (ret)
Expand Down

0 comments on commit 178b8a3

Please sign in to comment.