Set module parameter gpu_recovery=1 for amdgpu #303

Merged 1 commit on Jan 27, 2023

donald (Collaborator) commented Jan 27, 2023

Experimentally set gpu_recovery=1 to see whether it prevents the black screen that follows these kernel messages:

kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=16634335, emitted seq=16634337
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 718 thread Xorg:cs0 pid 719
kernel: amdgpu 0000:01:00.0: amdgpu: GPU recovery disabled.
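
The diff of this PR is not shown in the conversation, so the following is only a sketch of one common way such a parameter can be set: a modprobe.d style `options` line. The function name and the example file name are hypothetical and not taken from the PR.

```shell
#!/bin/sh
# Sketch only: the PR's actual change is not shown in this conversation.
# Emit a modprobe.d style line that sets gpu_recovery for the amdgpu module.
# The function name is a hypothetical helper; the default value is 1.
amdgpu_recovery_option() {
    printf 'options amdgpu gpu_recovery=%s\n' "${1:-1}"
}

# Print the line. An administrator could redirect it into a file such as
# /etc/modprobe.d/amdgpu.conf (hypothetical name); the parameter is read
# when the module is loaded, so it takes effect on the next module load.
amdgpu_recovery_option 1
```

The function only prints the line and writes nothing itself, so it is safe to try on any host.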

The setting is already active on fenchurch:

buczek@fenchurch:~$ cat /sys/module/amdgpu/parameters/gpu_recovery 
1
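
To run the same check on other hosts, a small wrapper around the `cat` above might look like this. It is a minimal sketch that assumes the amdgpu module is loaded and exposes the parameter in sysfs at the path shown above; the function name and the optional path argument are hypothetical.

```shell
#!/bin/sh
# Print the current amdgpu gpu_recovery value, or a note if it is unavailable.
# The function name and the optional path argument are hypothetical helpers;
# the default path is the one used in the check on fenchurch.
gpu_recovery_status() {
    param_file="${1:-/sys/module/amdgpu/parameters/gpu_recovery}"
    if [ -r "$param_file" ]; then
        printf 'gpu_recovery=%s\n' "$(cat "$param_file")"
    else
        printf 'gpu_recovery unavailable (%s not readable)\n' "$param_file"
        return 1
    fi
}

# Report the state on this host; do not fail the script if amdgpu is absent.
gpu_recovery_status || true
```

The path argument exists only so the function can be pointed at a different file, for example when trying it out on a machine without an AMD GPU.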

donald merged commit da10da9 into master on Jan 27, 2023

donald (Collaborator, Author) commented Feb 27, 2023

Instead of

fenchurch kernel: [1230177.266745] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=16634335, emitted seq=16634337
fenchurch kernel: [1230177.277767] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 718 thread Xorg:cs0 pid 719
fenchurch kernel: [1230177.288856] amdgpu 0000:01:00.0: amdgpu: GPU recovery disabled.

and a black screen, we now get

fenchurch kernel: [2626466.848775] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=44663384, emitted seq=44663386
fenchurch kernel: [2626466.859818] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 721 thread Xorg:cs0 pid 731
fenchurch kernel: [2626466.871181] amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
fenchurch kernel: [2626467.426971] amdgpu 0000:01:00.0: amdgpu: PCI CONFIG reset
fenchurch kernel: [2626467.439013] amdgpu 0000:01:00.0: amdgpu: GPU reset succeeded, trying to resume
fenchurch kernel: [2626467.446512] [drm] PCIE gen 3 link speeds already enabled
fenchurch kernel: [2626467.452714] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F4012E6000).
fenchurch kernel: [2626467.461860] [drm] VRAM is lost due to GPU reset!
fenchurch kernel: [2626468.109263] [drm] UVD initialized successfully.
fenchurch kernel: [2626483.829099] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
fenchurch kernel: [2626483.859275] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
fenchurch kernel: [2626483.912989] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
fenchurch kernel: [2626483.928472] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
fenchurch kernel: [2626487.680371] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
fenchurch kernel: [2626487.695090] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
fenchurch kernel: [2626499.264495] amdgpu 0000:01:00.0: amdgpu: recover vram bo from shadow start
fenchurch kernel: [2626499.271678] amdgpu 0000:01:00.0: amdgpu: recover vram bo from shadow done
fenchurch kernel: [2626499.278642] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.282566] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.286498] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.286619] DMAR: DRHD: handling fault status reg 3
fenchurch kernel: [2626499.290431] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.295474] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0xac077000 [fault reason 0x06] PTE Read access is not set
fenchurch kernel: [2626499.299400] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.315303] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.319226] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.319487] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
fenchurch kernel: [2626499.323150] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.334983] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.338904] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.342828] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.346751] [drm] Skip scheduling IBs!
fenchurch kernel: [2626499.408545] amdgpu 0000:01:00.0: amdgpu: GPU reset(1) succeeded!

followed by zillions of "*ERROR* Failed to initialize parser -125!" and "amdgpu_cs_ioctl: 102 callbacks suppressed" messages, and still a black screen.
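
To compare the two episodes without reading every line, the messages can be counted. The sketch below reads kernel log lines on stdin and tallies the message types seen in the excerpts above; the function name and the output labels are hypothetical. On Linux, error 125 is ECANCELED, which is why the parser failures are labelled that way.

```shell
#!/bin/sh
# Summarize an amdgpu reset episode from kernel log lines on stdin.
# The function name and output labels are hypothetical; the message strings
# are taken from the log excerpts in this conversation.
summarize_amdgpu_reset() {
    awk '
        /ring gfx timeout/                 { timeouts++ }
        /GPU recovery disabled/            { disabled++ }
        /GPU reset begin!/                 { begun++ }
        /GPU reset\([0-9]+\) succeeded!/   { succeeded++ }
        /Failed to initialize parser -125/ { canceled++ }
        END {
            printf "timeouts=%d recovery_disabled=%d resets_begun=%d resets_succeeded=%d parser_ecanceled=%d\n",
                   timeouts, disabled, begun, succeeded, canceled
        }
    '
}

# Example run on four lines copied from the second log excerpt.
summarize_amdgpu_reset <<'EOF'
fenchurch kernel: [2626466.848775] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=44663384, emitted seq=44663386
fenchurch kernel: [2626466.871181] amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
fenchurch kernel: [2626483.829099] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
fenchurch kernel: [2626499.408545] amdgpu 0000:01:00.0: amdgpu: GPU reset(1) succeeded!
EOF
```

On a live system the function could be fed from `dmesg` or `journalctl -k`. A successful reset count alongside a growing `parser_ecanceled` count matches what is described here: the reset itself completes, but the running Xorg cannot submit work afterwards.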

This is not really an improvement. I'm going to revert the commit.
