drm/xe/bo: sync kernel fences for KMD buffers
With things like pipelined evictions, VRAM pages can be marked as free
and yet still have some active kernel fences, with the idea that the
next caller to allocate the memory will respect them. However, it looks
like we are missing synchronisation for KMD internal buffers, like
page-tables, lrc etc. For userspace objects we should already have the
required synchronisation for CPU access via the fault handler, and
likewise for GPU access when vm_binding them.

To fix this, synchronise against any kernel fences for all KMD objects at
creation. This should resolve some severe corruption seen during
evictions.
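
As a rough userspace illustration of the idea (not the driver code), the sketch below models a VRAM page that has been marked free while a kernel fence from a pipelined eviction is still pending. All `toy_*` names are hypothetical and exist only for this example; the real fix waits on the object's dma-resv rather than on a per-page fence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel fence (e.g. a pipelined eviction copy). */
struct toy_fence {
	bool signalled;
};

/* Hypothetical VRAM page: it can be free and still have a pending fence. */
struct toy_page {
	bool free;
	struct toy_fence *kernel_fence;	/* NULL when nothing is pending */
};

enum toy_bo_type { TOY_BO_USER, TOY_BO_KERNEL };

/* Pretend to block until the GPU has finished with the page. */
static void toy_fence_wait(struct toy_fence *fence)
{
	fence->signalled = true;
}

/*
 * Hand a free page to a new buffer. Kernel-internal buffers sync-wait on
 * any pending kernel fence at creation, since their users do not expect to
 * synchronise later. Userspace buffers are assumed to be synchronised
 * elsewhere (fault handler, vm_bind), so the fence is left in place.
 *
 * Returns true if the page is safe to access immediately.
 */
static bool toy_bo_create(struct toy_page *page, enum toy_bo_type type)
{
	assert(page->free);
	page->free = false;

	if (type == TOY_BO_KERNEL && page->kernel_fence) {
		toy_fence_wait(page->kernel_fence);
		page->kernel_fence = NULL;
	}

	return page->kernel_fence == NULL;
}
```

In this model a `TOY_BO_KERNEL` buffer is always safe to touch once `toy_bo_create()` returns, whereas a `TOY_BO_USER` buffer may still carry the fence and relies on its later access paths to honour it.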

v2 (Matt B):
  - Revamp the comment explaining this. Also mention why USAGE_KERNEL is
    correct here.
v3 (Thomas):
  - Make sure to use ctx.interruptible for the wait.

Testcase: igt@xe-evict-ccs
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/853
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/855
Reported-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Tested-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matthew Auld authored and Rodrigo Vivi committed Dec 21, 2023
1 parent a667cf5 commit 503a6f4
Showing 1 changed file with 31 additions and 0 deletions.
31 changes: 31 additions & 0 deletions drivers/gpu/drm/xe/xe_bo.c
@@ -1269,6 +1269,37 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
	if (err)
		return ERR_PTR(err);

	/*
	 * The VRAM pages underneath are potentially still being accessed by the
	 * GPU, as per async GPU clearing and async evictions. However TTM makes
	 * sure to add any corresponding move/clear fences into the objects
	 * dma-resv using the DMA_RESV_USAGE_KERNEL slot.
	 *
	 * For KMD internal buffers we don't care about GPU clearing, however we
	 * still need to handle async evictions, where the VRAM is still being
	 * accessed by the GPU. Most internal callers are not expecting this,
	 * since they are missing the required synchronisation before accessing
	 * the memory. To keep things simple just sync wait any kernel fences
	 * here, if the buffer is designated KMD internal.
	 *
	 * For normal userspace objects we should already have the required
	 * pipelining or sync waiting elsewhere, since we already have to deal
	 * with things like async GPU clearing.
	 */
	if (type == ttm_bo_type_kernel) {
		long timeout = dma_resv_wait_timeout(bo->ttm.base.resv,
						     DMA_RESV_USAGE_KERNEL,
						     ctx.interruptible,
						     MAX_SCHEDULE_TIMEOUT);

		if (timeout < 0) {
			if (!resv)
				dma_resv_unlock(bo->ttm.base.resv);
			xe_bo_put(bo);
			return ERR_PTR(timeout);
		}
	}

	bo->created = true;
	if (bulk)
		ttm_bo_set_bulk_move(&bo->ttm, bulk);
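
The error path in the hunk above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical, the wait is faked, and the only detail borrowed from the kernel is that an interruptible wait returns a negative errno (`-ERESTARTSYS`, value 512 in the kernel) when a signal arrives, and a positive value on success.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ERESTARTSYS 512	/* mirrors the kernel's ERESTARTSYS value */

/* Hypothetical buffer object: just a reference count and a lock flag. */
struct toy_bo {
	int refcount;
	bool locked;
};

/*
 * Fake fence wait. Only an interruptible wait can fail, and only when a
 * signal is pending; otherwise report success with a positive value.
 */
static long toy_wait_kernel_fences(bool interruptible, bool signal_pending)
{
	if (interruptible && signal_pending)
		return -TOY_ERESTARTSYS;
	return 1;
}

/*
 * Final step of creation for a kernel-internal buffer. On a failed wait,
 * unwind in the same order as the hunk: drop the lock only if this function
 * took it (the caller did not pass its own), drop the reference, and
 * propagate the error. Returns 0 on success or a negative errno.
 */
static long toy_finish_create(struct toy_bo *bo, bool caller_owns_lock,
			      bool interruptible, bool signal_pending)
{
	long timeout = toy_wait_kernel_fences(interruptible, signal_pending);

	assert(bo->refcount > 0);

	if (timeout < 0) {
		if (!caller_owns_lock)
			bo->locked = false;
		bo->refcount--;
		return timeout;
	}

	return 0;
}
```

The `caller_owns_lock` flag plays the role of a non-NULL `resv` argument in the real function: when the caller supplied the reservation object, unlocking it stays the caller's responsibility even on failure.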