From bc5f2ab11ca6dda4a4826e7e78d5365d7c3e1569 Mon Sep 17 00:00:00 2001 From: Rodrigo Vivi Date: Wed, 23 Sep 2015 11:32:36 -0700 Subject: [PATCH 1/4] drm/i915/skl: Don't call intel_prepare_ddi when encoder list isn't yet initialized. In case something goes wrong with power well initialization we were calling intel_prepare_ddi during boot while encoder list isnt't initilized. [ 9.618747] i915 0000:00:02.0: Invalid ROM contents [ 9.631446] [drm] failed to find VBIOS tables [ 9.720036] BUG: unable to handle kernel NULL pointer dereference at 00000000 00000058 [ 9.721986] IP: [] ddi_get_encoder_port+0x82/0x190 [i915] [ 9.723736] PGD 0 [ 9.724286] Oops: 0000 [#1] PREEMPT SMP [ 9.725386] Modules linked in: intel_powerclamp snd_hda_intel(+) coretemp crc 32c_intel snd_hda_codec snd_hda_core serio_raw snd_pcm snd_timer i915(+) parport _pc parport pinctrl_sunrisepoint pinctrl_intel nfsd nfs_acl [ 9.730635] CPU: 0 PID: 497 Comm: systemd-udevd Not tainted 4.3.0-rc2-eywa-10 967-g72de2cfd-dirty #2 [ 9.732785] Hardware name: Intel Corporation Cannonlake Client platform/Skyla ke DT DDR4 RVP8, BIOS CNLSE2R1.R00.X021.B00.1508040310 08/04/2015 [ 9.735785] task: ffff88008a704700 ti: ffff88016a1ac000 task.ti: ffff88016a1a c000 [ 9.737584] RIP: 0010:[] [] ddi_get_enco der_port+0x82/0x190 [i915] [ 9.739934] RSP: 0000:ffff88016a1af710 EFLAGS: 00010296 [ 9.741184] RAX: 000000000000004e RBX: ffff88008a9edc98 RCX: 0000000000000001 [ 9.742934] RDX: 000000000000004e RSI: ffffffff81fc1e82 RDI: 00000000ffffffff [ 9.744634] RBP: ffff88016a1af730 R08: 0000000000000000 R09: 0000000000000578 [ 9.746333] R10: 0000000000001065 R11: 0000000000000578 R12: fffffffffffffff8 [ 9.748033] R13: ffff88016a1af7a8 R14: ffff88016a1af794 R15: 0000000000000000 [ 9.749733] FS: 00007eff2e1e07c0(0000) GS:ffff88016fc00000(0000) knlGS:00000 00000000000 [ 9.751683] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9.753083] CR2: 0000000000000058 CR3: 000000016922b000 CR4: 00000000003406f0 [ 9.754782] Stack: [ 9.755332] ffff88008a9edc98 ffff88008a9ed800 ffffffffa01d07b0 00000000fffb9 09e [ 9.757232] ffff88016a1af7d8 ffffffffa0154ea7 0000000000000246 ffff88016a370 080 [ 9.759182] ffff88016a370080 ffff88008a9ed800 0000000000000246 ffff88008a9ed c98 [ 9.761132] Call Trace: [ 9.761782] [] intel_prepare_ddi+0x67/0x860 [i915] [ 9.763332] [] ? _raw_spin_unlock_irqrestore+0x26/0x40 [ 9.765031] [] ? gen9_read32+0x141/0x360 [i915] [ 9.766531] [] skl_set_power_well+0x431/0xa80 [i915] [ 9.768181] [] skl_power_well_enable+0x13/0x20 [i915] [ 9.769781] [] intel_power_well_enable+0x28/0x50 [i915] [ 9.771481] [] intel_display_power_get+0x92/0xc0 [i915] [ 9.773180] [] intel_display_set_init_power+0x3b/0x40 [i91 5] [ 9.774980] [] intel_power_domains_init_hw+0x120/0x520 [i9 15] [ 9.776780] [] i915_driver_load+0xb21/0xf40 [i915] So let's protect this case. My first attempt was to remove the intel_prepare_ddi, but Daniel had pointed out this is really needed to restore those registers values. And Imre pointed out that this case was without the flag protection and this was actually where things were going bad. So I've just checked and this indeed solves my issue. The regressing intel_prepare_ddi call was added in commit 1d2b9526a790d55b7ae870934a74937081f62de2 Author: Damien Lespiau Date: Fri Mar 6 18:50:53 2015 +0000 drm/i915/skl: Restore the DDI translation tables when enabling PW1 Cc: Imre Deak Cc: Daniel Vetter Signed-off-by: Rodrigo Vivi Reviewed-by: Imre Deak [Jani: regression reference] Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/intel_runtime_pm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c index af7fdb3bd663a..7401cf90b0dbc 100644 --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -246,7 +246,8 @@ static void skl_power_well_post_enable(struct drm_i915_private *dev_priv, } if (power_well->data == SKL_DISP_PW_1) { - intel_prepare_ddi(dev); + if (!dev_priv->power_domains.initializing) + intel_prepare_ddi(dev); gen8_irq_power_well_post_enable(dev_priv, 1 << PIPE_A); } } From dfc53c5e73f8b73abf920241e45eab87335ae742 Mon Sep 17 00:00:00 2001 From: Michel Thierry Date: Mon, 28 Sep 2015 13:25:12 +0100 Subject: [PATCH 2/4] drm/i915: Consider HW CSB write pointer before resetting the sw read pointer A previous commit resets the Context Status Buffer (CSB) read pointer in ring init commit c0a03a2e4c4e ("drm/i915: Reset CSB read pointer in ring init") This is generally correct, but this pointer is not reset after suspend/resume in some platforms (cht). In this case, the driver should read the register value instead of resetting the sw read counter to 0. Otherwise we process old events, leading to unwanted pre-emptions or something worse. But in other platforms (bdw) and also during GPU reset or power up, the CSBWP is reset to 0x7 (an invalid number), and in this case the read pointer should be set to 5 (the interrupt code will increment this counter one more time, and will start reading from CSB[0]). v2: When the CSB registers are reset, the read pointer needs to be set to 5, otherwise the first write (CSB[0]) won't be read (Mika). Replace magic numbers with GEN8_CSB_ENTRIES (6) and GEN8_CSB_PTR_MASK (0x07). Cc: Mika Kuoppala Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Lei Shen Signed-off-by: Deepak S Signed-off-by: Michel Thierry Reviewed-by: Mika Kuoppala Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/intel_lrc.c | 39 ++++++++++++++++++++++++++------ drivers/gpu/drm/i915/intel_lrc.h | 2 ++ 2 files changed, 34 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 72e0edd7bbde7..7412caedcf7f9 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -484,18 +484,18 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring) status_pointer = I915_READ(RING_CONTEXT_STATUS_PTR(ring)); read_pointer = ring->next_context_status_buffer; - write_pointer = status_pointer & 0x07; + write_pointer = status_pointer & GEN8_CSB_PTR_MASK; if (read_pointer > write_pointer) - write_pointer += 6; + write_pointer += GEN8_CSB_ENTRIES; spin_lock(&ring->execlist_lock); while (read_pointer < write_pointer) { read_pointer++; status = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + - (read_pointer % 6) * 8); + (read_pointer % GEN8_CSB_ENTRIES) * 8); status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) + - (read_pointer % 6) * 8 + 4); + (read_pointer % GEN8_CSB_ENTRIES) * 8 + 4); if (status & GEN8_CTX_STATUS_IDLE_ACTIVE) continue; @@ -521,10 +521,12 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring) spin_unlock(&ring->execlist_lock); WARN(submit_contexts > 2, "More than two context complete events?\n"); - ring->next_context_status_buffer = write_pointer % 6; + ring->next_context_status_buffer = write_pointer % GEN8_CSB_ENTRIES; I915_WRITE(RING_CONTEXT_STATUS_PTR(ring), - _MASKED_FIELD(0x07 << 8, ((u32)ring->next_context_status_buffer & 0x07) << 8)); + _MASKED_FIELD(GEN8_CSB_PTR_MASK << 8, + ((u32)ring->next_context_status_buffer & + GEN8_CSB_PTR_MASK) << 8)); } static int execlists_context_queue(struct drm_i915_gem_request *request) @@ -1422,6 +1424,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring) { struct drm_device *dev = ring->dev; struct drm_i915_private *dev_priv = dev->dev_private; + u8 next_context_status_buffer_hw; I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask)); I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff); @@ -1436,7 +1439,29 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring) _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) | _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE)); POSTING_READ(RING_MODE_GEN7(ring)); - ring->next_context_status_buffer = 0; + + /* + * Instead of resetting the Context Status Buffer (CSB) read pointer to + * zero, we need to read the write pointer from hardware and use its + * value because "this register is power context save restored". + * Effectively, these states have been observed: + * + * | Suspend-to-idle (freeze) | Suspend-to-RAM (mem) | + * BDW | CSB regs not reset | CSB regs reset | + * CHT | CSB regs not reset | CSB regs not reset | + */ + next_context_status_buffer_hw = (I915_READ(RING_CONTEXT_STATUS_PTR(ring)) + & GEN8_CSB_PTR_MASK); + + /* + * When the CSB registers are reset (also after power-up / gpu reset), + * CSB write pointer is set to all 1's, which is not valid, use '5' in + * this special case, so the first element read is CSB[0]. + */ + if (next_context_status_buffer_hw == GEN8_CSB_PTR_MASK) + next_context_status_buffer_hw = (GEN8_CSB_ENTRIES - 1); + + ring->next_context_status_buffer = next_context_status_buffer_hw; DRM_DEBUG_DRIVER("Execlists enabled for %s\n", ring->name); memset(&ring->hangcheck, 0, sizeof(ring->hangcheck)); diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h index 64f89f9982a20..3c63bb32ad81c 100644 --- a/drivers/gpu/drm/i915/intel_lrc.h +++ b/drivers/gpu/drm/i915/intel_lrc.h @@ -25,6 +25,8 @@ #define _INTEL_LRC_H_ #define GEN8_LR_CONTEXT_ALIGN 4096 +#define GEN8_CSB_ENTRIES 6 +#define GEN8_CSB_PTR_MASK 0x07 /* Execlists regs */ #define RING_ELSP(ring) ((ring)->mmio_base+0x230) From 4ad640e99e5e5514d623210bc937e665ffd8f43f Mon Sep 17 00:00:00 2001 From: Egbert Eich Date: Wed, 23 Sep 2015 16:13:00 +0200 Subject: [PATCH 3/4] drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2 drm_kms_helper_poll_enable() was converted to lock the mode_config mutex in commit 8c4ccc4ab6f64e859d4ff8d7c02c2ed2e956e07f ("drm/probe-helper: Grab mode_config.mutex in poll_init/enable"). This disregarded the cases where this function is called from a context where this mutex is already locked. Add a non-locking version as well. Changes since v1: - use function name suffix '_locked' for the function that is to be called from a locked context. Signed-off-by: Egbert Eich Reviewed-by: Daniel Vetter Signed-off-by: Jani Nikula --- drivers/gpu/drm/drm_probe_helper.c | 19 ++++++++++++++++--- include/drm/drm_crtc_helper.h | 1 + 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c index d734780b31c0f..a18164f2f6d28 100644 --- a/drivers/gpu/drm/drm_probe_helper.c +++ b/drivers/gpu/drm/drm_probe_helper.c @@ -94,7 +94,18 @@ static int drm_helper_probe_add_cmdline_mode(struct drm_connector *connector) } #define DRM_OUTPUT_POLL_PERIOD (10*HZ) -static void __drm_kms_helper_poll_enable(struct drm_device *dev) +/** + * drm_kms_helper_poll_enable_locked - re-enable output polling. + * @dev: drm_device + * + * This function re-enables the output polling work without + * locking the mode_config mutex. + * + * This is like drm_kms_helper_poll_enable() however it is to be + * called from a context where the mode_config mutex is locked + * already. + */ +void drm_kms_helper_poll_enable_locked(struct drm_device *dev) { bool poll = false; struct drm_connector *connector; @@ -113,6 +124,8 @@ static void __drm_kms_helper_poll_enable(struct drm_device *dev) if (poll) schedule_delayed_work(&dev->mode_config.output_poll_work, DRM_OUTPUT_POLL_PERIOD); } +EXPORT_SYMBOL(drm_kms_helper_poll_enable_locked); + static int drm_helper_probe_single_connector_modes_merge_bits(struct drm_connector *connector, uint32_t maxX, uint32_t maxY, bool merge_type_bits) @@ -174,7 +187,7 @@ static int drm_helper_probe_single_connector_modes_merge_bits(struct drm_connect /* Re-enable polling in case the global poll config changed. */ if (drm_kms_helper_poll != dev->mode_config.poll_running) - __drm_kms_helper_poll_enable(dev); + drm_kms_helper_poll_enable_locked(dev); dev->mode_config.poll_running = drm_kms_helper_poll; @@ -428,7 +441,7 @@ EXPORT_SYMBOL(drm_kms_helper_poll_disable); void drm_kms_helper_poll_enable(struct drm_device *dev) { mutex_lock(&dev->mode_config.mutex); - __drm_kms_helper_poll_enable(dev); + drm_kms_helper_poll_enable_locked(dev); mutex_unlock(&dev->mode_config.mutex); } EXPORT_SYMBOL(drm_kms_helper_poll_enable); diff --git a/include/drm/drm_crtc_helper.h b/include/drm/drm_crtc_helper.h index 2a747a91fdede..3febb4b9fce92 100644 --- a/include/drm/drm_crtc_helper.h +++ b/include/drm/drm_crtc_helper.h @@ -240,5 +240,6 @@ extern void drm_kms_helper_hotplug_event(struct drm_device *dev); extern void drm_kms_helper_poll_disable(struct drm_device *dev); extern void drm_kms_helper_poll_enable(struct drm_device *dev); +extern void drm_kms_helper_poll_enable_locked(struct drm_device *dev); #endif From b94be972538f4523f09243af7c726158f0bfae33 Mon Sep 17 00:00:00 2001 From: Egbert Eich Date: Wed, 23 Sep 2015 16:13:01 +0200 Subject: [PATCH 4/4] drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2 drm_kms_helper_poll_enable() is called from a context in intel_hpd_irq_storm_disable() where the the mode_config mutex is already locked. When this function was converted to lock this mutex in commit 8c4ccc4ab6f6 ("drm/probe-helper: Grab mode_config.mutex in poll_init/enable") a deadlock occurred. Call the newly implemented non-locking version of this function. Changes since v1: - use function name suffix '_locked' for the function that is to be called from a locked context. Signed-off-by: Egbert Eich Reviewed-by: Daniel Vetter Signed-off-by: Jani Nikula --- drivers/gpu/drm/i915/intel_hotplug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_hotplug.c b/drivers/gpu/drm/i915/intel_hotplug.c index 53c0173a39fe1..b17785719598c 100644 --- a/drivers/gpu/drm/i915/intel_hotplug.c +++ b/drivers/gpu/drm/i915/intel_hotplug.c @@ -180,7 +180,7 @@ static void intel_hpd_irq_storm_disable(struct drm_i915_private *dev_priv) /* Enable polling and queue hotplug re-enabling. */ if (hpd_disabled) { - drm_kms_helper_poll_enable(dev); + drm_kms_helper_poll_enable_locked(dev); mod_delayed_work(system_wq, &dev_priv->hotplug.reenable_work, msecs_to_jiffies(HPD_STORM_REENABLE_DELAY)); }