Skip to content

Commit

Permalink
Merge tag 'drm-fixes-2024-02-03' of git://anongit.freedesktop.org/drm…
Browse files Browse the repository at this point in the history
…/drm

Pul drm fixes from Dave Airlie:
 "Regular weekly fixes, mostly amdgpu and xe. One nouveau fix is a
  better fix for the deadlock and also helps with a sync race we were
  seeing.

  dma-buf:
   - heaps CMA page accounting fix

  virtio-gpu:
   - fix segment size

  xe:
   - A crash fix
   - A fix for an assert due to missing mem_acces ref
   - Only allow a single user-fence per exec / bind.
   - Some sparse warning fixes
   - Two fixes for compilation failures on various odd combinations of
     gcc / arch pointed out on LKML.
   - Fix a fragile partial allocation pointed out on LKML.
   - A sysfs ABI documentation warning fix

  amdgpu:
   - Fix reboot issue seen on some 7000 series dGPUs
   - Fix client init order for KFD
   - Misc display fixes
   - USB-C fix
   - DCN 3.5 fixes
   - Fix issues with GPU scheduler and GPU reset
   - GPU firmware loading fix
   - Misc fixes
   - GC 11.5 fix
   - VCN 4.0.5 fix
   - IH overflow fix

  amdkfd:
   - SVM fixes
   - Trap handler fix
   - Fix device permission lookup
   - Properly reserve BO before validating it

  nouveau:
   - fence/irq lock deadlock fix (second attempt)
   - gsp command size fix

* tag 'drm-fixes-2024-02-03' of git://anongit.freedesktop.org/drm/drm: (35 commits)
  nouveau: offload fence uevents work to workqueue
  nouveau/gsp: use correct size for registry rpc.
  drm/amdgpu/pm: Use inline function for IP version check
  drm/hwmon: Fix abi doc warnings
  drm/xe: Make all GuC ABI shift values unsigned
  drm/xe/vm: Subclass userptr vmas
  drm/xe: Use LRC prefix rather than CTX prefix in lrc desc defines
  drm/xe: Don't use __user error pointers
  drm/xe: Annotate mcr_[un]lock()
  drm/xe: Only allow 1 ufence per exec / bind IOCTL
  drm/xe: Grab mem_access when disabling C6 on skip_guc_pc platforms
  drm/xe: Fix crash in trace_dma_fence_init()
  drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
  drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend
  drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0
  drm/amdkfd: reserve the BO before validating it
  drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'
  drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'
  drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'
  drm/amd: Don't init MEC2 firmware when it fails to load
  ...
  • Loading branch information
Linus Torvalds committed Feb 2, 2024
2 parents eab5c86 + 39126ab commit 9c2f033
Show file tree
Hide file tree
Showing 72 changed files with 475 additions and 403 deletions.
14 changes: 7 additions & 7 deletions Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
What: /sys/devices/.../hwmon/hwmon<i>/in0_input
What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/in0_input
Date: February 2023
KernelVersion: 6.2
Contact: intel-gfx@lists.freedesktop.org
Description: RO. Current Voltage in millivolt.

Only supported for particular Intel i915 graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/power1_max
What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_max
Date: February 2023
KernelVersion: 6.2
Contact: intel-gfx@lists.freedesktop.org
Expand All @@ -20,15 +20,15 @@ Description: RW. Card reactive sustained (PL1/Tau) power limit in microwatts.

Only supported for particular Intel i915 graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/power1_rated_max
What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_rated_max
Date: February 2023
KernelVersion: 6.2
Contact: intel-gfx@lists.freedesktop.org
Description: RO. Card default power limit (default TDP setting).

Only supported for particular Intel i915 graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/power1_max_interval
What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_max_interval
Date: February 2023
KernelVersion: 6.2
Contact: intel-gfx@lists.freedesktop.org
Expand All @@ -37,7 +37,7 @@ Description: RW. Sustained power limit interval (Tau in PL1/Tau) in

Only supported for particular Intel i915 graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/power1_crit
What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_crit
Date: February 2023
KernelVersion: 6.2
Contact: intel-gfx@lists.freedesktop.org
Expand All @@ -50,7 +50,7 @@ Description: RW. Card reactive critical (I1) power limit in microwatts.

Only supported for particular Intel i915 graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/curr1_crit
What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/curr1_crit
Date: February 2023
KernelVersion: 6.2
Contact: intel-gfx@lists.freedesktop.org
Expand All @@ -63,7 +63,7 @@ Description: RW. Card reactive critical (I1) power limit in milliamperes.

Only supported for particular Intel i915 graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/energy1_input
What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/energy1_input
Date: February 2023
KernelVersion: 6.2
Contact: intel-gfx@lists.freedesktop.org
Expand Down
14 changes: 7 additions & 7 deletions Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
What: /sys/devices/.../hwmon/hwmon<i>/power1_max
What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_max
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Expand All @@ -12,15 +12,15 @@ Description: RW. Card reactive sustained (PL1) power limit in microwatts.

Only supported for particular Intel xe graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/power1_rated_max
What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_rated_max
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Description: RO. Card default power limit (default TDP setting).

Only supported for particular Intel xe graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/power1_crit
What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_crit
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Expand All @@ -33,7 +33,7 @@ Description: RW. Card reactive critical (I1) power limit in microwatts.

Only supported for particular Intel xe graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/curr1_crit
What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/curr1_crit
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Expand All @@ -44,23 +44,23 @@ Description: RW. Card reactive critical (I1) power limit in milliamperes.
the operating frequency if the power averaged over a window
exceeds this limit.

What: /sys/devices/.../hwmon/hwmon<i>/in0_input
What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/in0_input
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Description: RO. Current Voltage in millivolt.

Only supported for particular Intel xe graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/energy1_input
What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/energy1_input
Date: September 2023
KernelVersion: 6.5
Contact: intel-xe@lists.freedesktop.org
Description: RO. Energy input of device in microjoules.

Only supported for particular Intel xe graphics platforms.

What: /sys/devices/.../hwmon/hwmon<i>/power1_max_interval
What: /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_max_interval
Date: October 2023
KernelVersion: 6.6
Contact: intel-xe@lists.freedesktop.org
Expand Down
7 changes: 3 additions & 4 deletions drivers/dma-buf/heaps/cma_heap.c
Original file line number Diff line number Diff line change
Expand Up @@ -168,10 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
if (vmf->pgoff > buffer->pagecount)
return VM_FAULT_SIGBUS;

vmf->page = buffer->pages[vmf->pgoff];
get_page(vmf->page);

return 0;
return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
}

static const struct vm_operations_struct dma_heap_vm_ops = {
Expand All @@ -185,6 +182,8 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
return -EINVAL;

vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);

vma->vm_ops = &dma_heap_vm_ops;
vma->vm_private_data = buffer;

Expand Down
32 changes: 21 additions & 11 deletions drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
Original file line number Diff line number Diff line change
Expand Up @@ -141,11 +141,31 @@ static void amdgpu_amdkfd_reset_work(struct work_struct *work)
static const struct drm_client_funcs kfd_client_funcs = {
.unregister = drm_client_release,
};

int amdgpu_amdkfd_drm_client_create(struct amdgpu_device *adev)
{
int ret;

if (!adev->kfd.init_complete)
return 0;

ret = drm_client_init(&adev->ddev, &adev->kfd.client, "kfd",
&kfd_client_funcs);
if (ret) {
dev_err(adev->dev, "Failed to init DRM client: %d\n",
ret);
return ret;
}

drm_client_register(&adev->kfd.client);

return 0;
}

void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
{
int i;
int last_valid_bit;
int ret;

amdgpu_amdkfd_gpuvm_init_mem_limits();

Expand All @@ -164,12 +184,6 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
.enable_mes = adev->enable_mes,
};

ret = drm_client_init(&adev->ddev, &adev->kfd.client, "kfd", &kfd_client_funcs);
if (ret) {
dev_err(adev->dev, "Failed to init DRM client: %d\n", ret);
return;
}

/* this is going to have a few of the MSBs set that we need to
* clear
*/
Expand Down Expand Up @@ -208,10 +222,6 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)

adev->kfd.init_complete = kgd2kfd_device_init(adev->kfd.dev,
&gpu_resources);
if (adev->kfd.init_complete)
drm_client_register(&adev->kfd.client);
else
drm_client_release(&adev->kfd.client);

amdgpu_amdkfd_total_mem_size += adev->gmc.real_vram_size;

Expand Down
4 changes: 3 additions & 1 deletion drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
Original file line number Diff line number Diff line change
Expand Up @@ -182,6 +182,8 @@ int amdgpu_queue_mask_bit_to_set_resource_bit(struct amdgpu_device *adev,
struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
struct mm_struct *mm,
struct svm_range_bo *svm_bo);

int amdgpu_amdkfd_drm_client_create(struct amdgpu_device *adev);
#if defined(CONFIG_DEBUG_FS)
int kfd_debugfs_kfd_mem_limits(struct seq_file *m, void *data);
#endif
Expand Down Expand Up @@ -301,7 +303,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(struct amdgpu_device *adev,
struct kgd_mem *mem, void *drm_priv);
int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv);
void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
int amdgpu_amdkfd_gpuvm_sync_memory(
struct amdgpu_device *adev, struct kgd_mem *mem, bool intr);
int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem,
Expand Down
2 changes: 1 addition & 1 deletion drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
Original file line number Diff line number Diff line change
Expand Up @@ -290,7 +290,7 @@ static int suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool sus
for (i = 0; i < adev->gfx.num_compute_rings; i++) {
struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];

if (!(ring && drm_sched_wqueue_ready(&ring->sched)))
if (!amdgpu_ring_sched_ready(ring))
continue;

/* stop secheduler and drain ring. */
Expand Down
20 changes: 17 additions & 3 deletions drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
Original file line number Diff line number Diff line change
Expand Up @@ -2085,21 +2085,35 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
return ret;
}

void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
{
struct kfd_mem_attachment *entry;
struct amdgpu_vm *vm;
int ret;

vm = drm_priv_to_vm(drm_priv);

mutex_lock(&mem->lock);

ret = amdgpu_bo_reserve(mem->bo, true);
if (ret)
goto out;

list_for_each_entry(entry, &mem->attachments, list) {
if (entry->bo_va->base.vm == vm)
kfd_mem_dmaunmap_attachment(mem, entry);
if (entry->bo_va->base.vm != vm)
continue;
if (entry->bo_va->base.bo->tbo.ttm &&
!entry->bo_va->base.bo->tbo.ttm->sg)
continue;

kfd_mem_dmaunmap_attachment(mem, entry);
}

amdgpu_bo_unreserve(mem->bo);
out:
mutex_unlock(&mem->lock);

return ret;
}

int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
Expand Down
8 changes: 4 additions & 4 deletions drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
Original file line number Diff line number Diff line change
Expand Up @@ -1678,7 +1678,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
struct amdgpu_ring *ring = adev->rings[i];

if (!ring || !drm_sched_wqueue_ready(&ring->sched))
if (!amdgpu_ring_sched_ready(ring))
continue;
drm_sched_wqueue_stop(&ring->sched);
}
Expand All @@ -1694,7 +1694,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
struct amdgpu_ring *ring = adev->rings[i];

if (!ring || !drm_sched_wqueue_ready(&ring->sched))
if (!amdgpu_ring_sched_ready(ring))
continue;
drm_sched_wqueue_start(&ring->sched);
}
Expand Down Expand Up @@ -1916,8 +1916,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)

ring = adev->rings[val];

if (!ring || !ring->funcs->preempt_ib ||
!drm_sched_wqueue_ready(&ring->sched))
if (!amdgpu_ring_sched_ready(ring) ||
!ring->funcs->preempt_ib)
return -EINVAL;

/* the last preemption failed */
Expand Down
36 changes: 13 additions & 23 deletions drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
Original file line number Diff line number Diff line change
Expand Up @@ -4121,23 +4121,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
}
}
} else {
switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
case IP_VERSION(13, 0, 0):
case IP_VERSION(13, 0, 7):
case IP_VERSION(13, 0, 10):
r = psp_gpu_reset(adev);
break;
default:
tmp = amdgpu_reset_method;
/* It should do a default reset when loading or reloading the driver,
* regardless of the module parameter reset_method.
*/
amdgpu_reset_method = AMD_RESET_METHOD_NONE;
r = amdgpu_asic_reset(adev);
amdgpu_reset_method = tmp;
break;
}

tmp = amdgpu_reset_method;
/* It should do a default reset when loading or reloading the driver,
* regardless of the module parameter reset_method.
*/
amdgpu_reset_method = AMD_RESET_METHOD_NONE;
r = amdgpu_asic_reset(adev);
amdgpu_reset_method = tmp;
if (r) {
dev_err(adev->dev, "asic reset on init failed\n");
goto failed;
Expand Down Expand Up @@ -5031,7 +5021,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];

if (!ring || !drm_sched_wqueue_ready(&ring->sched))
if (!amdgpu_ring_sched_ready(ring))
continue;

spin_lock(&ring->sched.job_list_lock);
Expand Down Expand Up @@ -5170,7 +5160,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];

if (!ring || !drm_sched_wqueue_ready(&ring->sched))
if (!amdgpu_ring_sched_ready(ring))
continue;

/* Clear job fence from fence drv to avoid force_completion
Expand Down Expand Up @@ -5637,7 +5627,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = tmp_adev->rings[i];

if (!ring || !drm_sched_wqueue_ready(&ring->sched))
if (!amdgpu_ring_sched_ready(ring))
continue;

drm_sched_stop(&ring->sched, job ? &job->base : NULL);
Expand Down Expand Up @@ -5706,7 +5696,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = tmp_adev->rings[i];

if (!ring || !drm_sched_wqueue_ready(&ring->sched))
if (!amdgpu_ring_sched_ready(ring))
continue;

drm_sched_start(&ring->sched, true);
Expand Down Expand Up @@ -6061,7 +6051,7 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];

if (!ring || !drm_sched_wqueue_ready(&ring->sched))
if (!amdgpu_ring_sched_ready(ring))
continue;

drm_sched_stop(&ring->sched, NULL);
Expand Down Expand Up @@ -6189,7 +6179,7 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];

if (!ring || !drm_sched_wqueue_ready(&ring->sched))
if (!amdgpu_ring_sched_ready(ring))
continue;

drm_sched_start(&ring->sched, true);
Expand Down
4 changes: 4 additions & 0 deletions drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
Original file line number Diff line number Diff line change
Expand Up @@ -2255,6 +2255,10 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
if (ret)
goto err_pci;

ret = amdgpu_amdkfd_drm_client_create(adev);
if (ret)
goto err_pci;

/*
* 1. don't init fbdev on hw without DCE
* 2. don't init fbdev if there are no connectors
Expand Down
Loading

0 comments on commit 9c2f033

Please sign in to comment.