drm/i915/guc: Flag an error if an engine reset fails
If GuC encounters an error during engine reset, the i915 driver
promotes to full GT reset. This includes an info message about why the
reset is happening. However, that is not treated as a failure by any
of the CI systems because resets are an expected occurrence during
testing. This kind of failure is a major problem and should never
happen. So, complain more loudly and make sure CI notices.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211211065859.2248188-4-John.C.Harrison@Intel.com
John Harrison authored and John Harrison committed Dec 20, 2021
1 parent 0dd8674 commit fb3965f
Showing 1 changed file with 11 additions and 3 deletions.
14 changes: 11 additions & 3 deletions drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4033,11 +4033,12 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 					 const u32 *msg, u32 len)
 {
 	struct intel_engine_cs *engine;
+	struct intel_gt *gt = guc_to_gt(guc);
 	u8 guc_class, instance;
 	u32 reason;
 
 	if (unlikely(len != 3)) {
-		drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		drm_err(&gt->i915->drm, "Invalid length %u", len);
 		return -EPROTO;
 	}
 
@@ -4047,12 +4048,19 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 
 	engine = guc_lookup_engine(guc, guc_class, instance);
 	if (unlikely(!engine)) {
-		drm_err(&guc_to_gt(guc)->i915->drm,
+		drm_err(&gt->i915->drm,
 			"Invalid engine %d:%d", guc_class, instance);
 		return -EPROTO;
 	}
 
-	intel_gt_handle_error(guc_to_gt(guc), engine->mask,
+	/*
+	 * This is an unexpected failure of a hardware feature. So, log a real
+	 * error message not just the informational that comes with the reset.
+	 */
+	drm_err(&gt->i915->drm, "GuC engine reset request failed on %d:%d (%s) because 0x%08X",
+		guc_class, instance, engine->name, reason);
+
+	intel_gt_handle_error(gt, engine->mask,
 			      I915_ERROR_CAPTURE,
 			      "GuC failed to reset %s (reason=0x%08x)\n",
 			      engine->name, reason);
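
For readers without a kernel tree to hand, the control flow of the patched handler can be approximated in user space. The following is a minimal sketch, not the driver code: every `mock_*` name, the table sizes, the `error_count` counter and the `reset_mask` field are hypothetical stand-ins, and the payload layout (class, instance, reason) is an assumption because the lines that unpack `msg` are collapsed in the diff. It only illustrates the ordering the commit introduces: validate the message, log an error-level message, then escalate to the reset path.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's types; not the real i915 structures. */
#define MOCK_NUM_CLASSES   4
#define MOCK_NUM_INSTANCES 2

struct mock_engine {
	const char *name;	/* NULL means "no such engine" */
	uint32_t mask;
};

struct mock_gt {
	struct mock_engine engines[MOCK_NUM_CLASSES][MOCK_NUM_INSTANCES];
	unsigned int error_count;	/* error-level messages logged so far */
	uint32_t reset_mask;		/* engines handed to the reset path */
};

/* Bounds-checked lookup, loosely playing the role of guc_lookup_engine(). */
static const struct mock_engine *mock_lookup_engine(const struct mock_gt *gt,
						    uint32_t guc_class,
						    uint32_t instance)
{
	if (guc_class >= MOCK_NUM_CLASSES || instance >= MOCK_NUM_INSTANCES)
		return NULL;
	if (!gt->engines[guc_class][instance].name)
		return NULL;
	return &gt->engines[guc_class][instance];
}

int mock_engine_failure_process_msg(struct mock_gt *gt,
				    const uint32_t *msg, uint32_t len)
{
	const struct mock_engine *engine;
	uint32_t guc_class, instance, reason;

	if (len != 3) {
		gt->error_count++;
		fprintf(stderr, "ERROR: Invalid length %u\n", (unsigned int)len);
		return -EPROTO;
	}

	/* Assumed payload layout: class, instance, reason. */
	guc_class = msg[0];
	instance = msg[1];
	reason = msg[2];

	engine = mock_lookup_engine(gt, guc_class, instance);
	if (!engine) {
		gt->error_count++;
		fprintf(stderr, "ERROR: Invalid engine %u:%u\n",
			(unsigned int)guc_class, (unsigned int)instance);
		return -EPROTO;
	}

	/* Error-level message first, so the failure is not mistaken for a routine reset. */
	gt->error_count++;
	fprintf(stderr, "ERROR: engine reset request failed on %u:%u (%s) because 0x%08X\n",
		(unsigned int)guc_class, (unsigned int)instance,
		engine->name, (unsigned int)reason);

	/* Then escalate; the driver calls intel_gt_handle_error() at this point. */
	gt->reset_mask |= engine->mask;
	return 0;
}
```

In this sketch a test harness could treat any increase in `error_count` as a failure, which mirrors the intent stated in the commit message: an informational reset notice is easy to overlook, while an error-level message is not.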