ffa0c57 to ed8d179
Where did you get the kernel config from? I had hoped that we might get config changes into the history by using individual commits for config changes, as the kernel-build cronjob does (see
I got the config from Linux 5.4.5, and there all new configs were added in separate commits.
The monitor doesn’t come back up after it is turned off, and Linux shows the messages below.

```
2020-01-06T07:12:21+01:00 amaru automount[498]: expired /project/miso
2020-01-06T07:20:15.939399+01:00 amaru kernel: [333722.724352] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
2020-01-06T07:20:15.939406+01:00 amaru kernel: [333722.724352] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
2020-01-06T07:20:15.939406+01:00 amaru kernel: [333722.724353] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
2020-01-06T07:20:15.939407+01:00 amaru kernel: [333722.724353] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
2020-01-06T07:20:15.939408+01:00 amaru kernel: [333722.724353] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
2020-01-06T07:20:15.939409+01:00 amaru kernel: [333722.724353] GPU crash dump saved to /sys/class/drm/card0/error
2020-01-06T07:20:15.940403+01:00 amaru kernel: [333722.725362] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:20:15.940409+01:00 amaru kernel: [333722.726079] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
2020-01-06T07:20:15.941346+01:00 amaru kernel: [333722.726313] i915 0000:00:02.0: Resetting chip for hang on rcs0
2020-01-06T07:20:15.942398+01:00 amaru kernel: [333722.728031] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
2020-01-06T07:20:15.943430+01:00 amaru kernel: [333722.728741] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
2020-01-06T07:20:21.955508+01:00 amaru kernel: [333728.740356] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:20:29.955470+01:00 amaru kernel: [333736.740352] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:20:30.873401+01:00 amaru kernel: [333737.658366] GpuWatchdog[7339]: segfault at 0 ip 0000562464c95bdd sp 00007fc578ea07c0 error 6 in vivaldi-bin[562460c9c000+72c3000]
2020-01-06T07:20:30.873407+01:00 amaru kernel: [333737.658369] Code: 48 c1 c9 03 48 81 f9 af 00 00 00 0f 87 c9 00 00 00 48 8d 15 18 3a 91 fb f6 04 11 80 0f 84 b8 00 00 00 be 01 00 00 00 ff 50 30 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 41 d7 a3 03 01 80 7d 8f 00
2020-01-06T07:21:30+01:00 amaru automount[498]: attempting to mount entry /project/miso
2020-01-06T07:21:30+01:00 amaru automount[498]: mounted /project/miso
2020-01-06T07:21:37.923474+01:00 amaru kernel: [333804.708221] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:21:39.971410+01:00 amaru kernel: [333806.756227] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:22:07.939475+01:00 amaru kernel: [333834.724208] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:22:09.923481+01:00 amaru kernel: [333836.708210] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:22:43.971484+01:00 amaru kernel: [333870.756149] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:22:45.955484+01:00 amaru kernel: [333872.740135] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:22:47.939483+01:00 amaru kernel: [333874.724136] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:22:57.923477+01:00 amaru kernel: [333884.708106] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:22:59.971411+01:00 amaru kernel: [333886.756071] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:23:17.955476+01:00 amaru kernel: [333904.740070] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:23:19.939470+01:00 amaru kernel: [333906.724077] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:24:59.971485+01:00 amaru kernel: [334006.755948] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:25:01.955471+01:00 amaru kernel: [334008.739944] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:25:03.939495+01:00 amaru kernel: [334010.723944] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:25:25.955505+01:00 amaru kernel: [334032.739910] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:26:25.923488+01:00 amaru kernel: [334092.707821] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:26:27.971503+01:00 amaru kernel: [334094.755805] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:27:05.923475+01:00 amaru kernel: [334132.707753] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
2020-01-06T07:27:21+01:00 amaru automount[498]: expiring path /project/miso
[…]
```

The [patch][1] below might fix it.

> Subject: [PATCH] drm/i915/gt: Detect if we miss WaIdleLiteRestore
> Date: Mon, 30 Dec 2019 11:15:30 +0000
> Message-ID: <20191230111530.3750048-1-chris@chris-wilson.co.uk> (raw)
>
> In order to avoid confusing the HW, we must never submit an empty ring
> during lite-restore, that is we should always advance the RING_TAIL
> before submitting to stay ahead of the RING_HEAD.
>
> Normally this is prevented by keeping a couple of spare NOPs in the
> request->wa_tail so that on resubmission we can advance the tail. This
> relies on the request only being resubmitted once, which is the normal
> condition as it is seen once for ELSP[1] and then later in ELSP[0]. On
> preemption, the requests are unwound and the tail reset back to the
> normal end point (as we know the request is incomplete and therefore its
> RING_HEAD is even earlier).
>
> However, if this w/a should fail we would try and resubmit the request
> with the RING_TAIL already set to the location of this request's wa_tail
> potentially causing a GPU hang. We can spot when we do try and
> incorrectly resubmit without advancing the RING_TAIL and spare any
> embarrassment by forcing the context restore.
>
> In the case of preempt-to-busy, we leave the requests running on the HW
> while we unwind. As the ring is still live, we cannot rewind our
> rq->tail without forcing a reload so leave it set to rq->wa_tail and
> only force a reload if we resubmit after a lite-restore. (Normally, the
> forced reload will be a part of the preemption event.)
>
> Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/673
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: stable@vger.kernel.org
> Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chris@chris-wilson.co.uk
> (cherry picked from commit 82c69bf58650e644c61aa2bf5100b63a1070fd2f)

[1]: https://lore.kernel.org/stable/20191230111530.3750048-1-chris@chris-wilson.co.uk/
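To check whether a test machine still shows the symptom after the patch, something like the following could be used to count the relevant i915 messages in a syslog excerpt. This is only a rough sketch: the regular expressions are derived from the message texts in the log above, while the category names (`gpu_hang`, `reset_attempt`, `reset_timeout`) and the function name are made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the kernel messages quoted in the log above.
# The category names are hypothetical labels, not anything the kernel defines.
PATTERNS = {
    "gpu_hang": re.compile(r"i915 \S+: GPU HANG:"),
    "reset_attempt": re.compile(r"i915 \S+: Resetting \w+ for hang on \w+"),
    "reset_timeout": re.compile(r"\*ERROR\* \w+ reset request timed out"),
}


def summarize_gpu_events(lines):
    """Count how many log lines match each i915 hang/reset pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_gpu_events.py < syslog-excerpt.txt
    for name, count in sorted(summarize_gpu_events(sys.stdin).items()):
        print(f"{name}: {count}")
```

A nonzero `gpu_hang` or `reset_timeout` count after a suspend/resume or monitor power cycle would suggest the hang is still present; zero counts do not prove the fix, only that these particular messages did not appear.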
ed8d179 to c138f11
If you had merged, I don't insist. It depends on how much time you want to spend on it.
My bad. Everything is okay.
Tested on rabammel and inbetweenmove.