Re: [PATCH v12 2/7] drm/i915/skl: Add support for the SAGV, fix underrun hangs

From: Maarten Lankhorst
Date: Mon Aug 22 2016 - 03:27:45 EST


Op 18-08-16 om 16:05 schreef Lyude Paul:
> On Thu, 2016-08-18 at 09:39 +0200, Maarten Lankhorst wrote:
>> Hey,
>>
>> Op 17-08-16 om 21:55 schreef Lyude:
>>> Since the watermark calculations for Skylake are still broken, we're apt
>>> to hitting underruns very easily under multi-monitor configurations.
>>> While it would be lovely if this was fixed, it's not. Another problem
>>> that's been coming from this however, is the mysterious issue of
>>> underruns causing full system hangs. An easy way to reproduce this with
>>> a skylake system:
>>>
>>> - Get a laptop with a skylake GPU, and hook up two external monitors to
>>> it
>>> - Move the cursor from the built-in LCD to one of the external displays
>>> as quickly as you can
>>> - You'll get a few pipe underruns, and eventually the entire system will
>>> just freeze.
>>>
>>> After doing a lot of investigation and reading through the bspec, I
>>> found the existence of the SAGV, which is responsible for adjusting the
>>> system agent voltage and clock frequencies depending on how much power
>>> we need. According to the bspec:
>>>
>>> "The display engine access to system memory is blocked during the
>>> adjustment time. SAGV defaults to enabled. Software must use the
>>> GT-driver pcode mailbox to disable SAGV when the display engine is not
>>> able to tolerate the blocking time."
>>>
>>> The rest of the bspec goes on to explain that software can simply leave
>>> the SAGV enabled, and disable it when we use interlaced pipes/have more
>>> then one pipe active.
>>>
>>> Sure enough, with this patchset the system hangs resulting from pipe
>>> underruns on Skylake have completely vanished on my T460s. Additionally,
>>> the bspec mentions turning off the SAGV with more then one pipe
>>> enabled
>>> as a workaround for display underruns. While this patch doesn't entirely
>>> fix that, it looks like it does improve the situation a little bit so
>>> it's likely this is going to be required to make watermarks on Skylake
>>> fully functional.
>>>
>>> This will still need additional work in the future: we shouldn't be
>>> enabling the SAGV if any of the currently enabled planes can't enable WM
>>> levels that introduce latencies >= 30 Âs.
>>>
>>> Changes since v11:
>>> - Add skl_can_enable_sagv()
>>> - Make sure we don't enable SAGV when not all planes can enable
>>> watermarks >= the SAGV engine block time. I was originally going to
>>> save this for later, but I recently managed to run into a machine
>>> that was having problems with a single pipe configuration + SAGV.
>>> - Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit
>>> - Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE
>>> - Move printks outside of mutexes
>>> - Don't print error messages twice
>>> Changes since v10:
>>> - Apparently sandybridge_pcode_read actually writes values and reads
>>> them back, despite it's misleading function name. This means we've
>>> been doing this mostly wrong and have been writing garbage to the
>>> SAGV control. Because of this, we no longer attempt to read the SAGV
>>> status during initialization (since there are no helpers for this).
>>> - mlankhorst noticed that this patch was breaking on some very early
>>> pre-release Skylake machines, which apparently don't allow you to
>>> disable the SAGV. To prevent machines from failing tests due to SAGV
>>> errors, if the first time we try to control the SAGV results in the
>>> mailbox indicating an invalid command, we just disable future attempts
>>> to control the SAGV state by setting dev_priv->skl_sagv_status to
>>> I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg.
>>> - Move mutex_unlock() a little higher in skl_enable_sagv(). This
>>> doesn't actually fix anything, but lets us release the lock a little
>>> sooner since we're finished with it.
>>> Changes since v9:
>>> - Only enable/disable sagv on Skylake
>>> Changes since v8:
>>> - Add intel_state->modeset guard to the conditional for
>>> skl_enable_sagv()
>>> Changes since v7:
>>> - Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's
>>> all we use it for anyway)
>>> - Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification
>>> - Fix a styling error that snuck past me
>>> Changes since v6:
>>> - Protect skl_enable_sagv() with intel_state->modeset conditional in
>>> intel_atomic_commit_tail()
>>> Changes since v5:
>>> - Don't use is_power_of_2. Makes things confusing
>>> - Don't use the old state to figure out whether or not to
>>> enable/disable the sagv, use the new one
>>> - Split the loop in skl_disable_sagv into it's own function
>>> - Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail()
>>> Changes since v4:
>>> - Use is_power_of_2 against active_crtcs to check whether we have > 1
>>> pipe enabled
>>> - Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0
>>> enabled
>>> - Call skl_sagv_enable/disable() from pre/post-plane updates
>>> Changes since v3:
>>> - Use time_before() to compare timeout to jiffies
>>> Changes since v2:
>>> - Really apply minor style nitpicks to patch this time
>>> Changes since v1:
>>> - Added comments about this probably being one of the requirements to
>>> fixing Skylake's watermark issues
>>> - Minor style nitpicks from Matt Roper
>>> - Disable these functions on Broxton, since it doesn't have an SAGV
>>>
>>> Signed-off-by: Lyude <cpaul@xxxxxxxxxx>
>>> Cc: Matt Roper <matthew.d.roper@xxxxxxxxx>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
>>> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
>>> Cc: Ville SyrjÃlà <ville.syrjala@xxxxxxxxxxxxxxx>
>>> Cc: stable@xxxxxxxxxxxxxxx
>>> ---
>>> drivers/gpu/drm/i915/i915_drv.h | 7 ++
>>> drivers/gpu/drm/i915/i915_reg.h | 4 +
>>> drivers/gpu/drm/i915/intel_display.c | 11 +++
>>> drivers/gpu/drm/i915/intel_drv.h | 3 +
>>> drivers/gpu/drm/i915/intel_pm.c | 148
>>> +++++++++++++++++++++++++++++++++++
>>> 5 files changed, 173 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index 35caa9b..f20530bb 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -1949,6 +1949,13 @@ struct drm_i915_private {
>>> struct i915_suspend_saved_registers regfile;
>>> struct vlv_s0ix_state vlv_s0ix_state;
>>>
>>> + enum {
>>> + I915_SKL_SAGV_UNKNOWN = 0,
>>> + I915_SKL_SAGV_DISABLED,
>>> + I915_SKL_SAGV_ENABLED,
>>> + I915_SKL_SAGV_NOT_CONTROLLED
>>> + } skl_sagv_status;
>>> +
>>> struct {
>>> /*
>>> * Raw watermark latency values:
>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>> b/drivers/gpu/drm/i915/i915_reg.h
>>> index 7419fbf..be82c49 100644
>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>> @@ -7153,6 +7153,10 @@ enum {
>>> #define HSW_PCODE_DE_WRITE_FREQ_REQ 0x17
>>> #define DISPLAY_IPS_CONTROL 0x19
>>> #define HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL 0x1A
>>> +#define GEN9_PCODE_SAGV_CONTROL 0x21
>>> +#define GEN9_SAGV_DISABLE 0x0
>>> +#define GEN9_SAGV_IS_DISABLED 0x1
>>> +#define GEN9_SAGV_ENABLE 0x3
>>> #define GEN6_PCODE_DATA _MMIO(0x138128)
>>> #define GEN6_PCODE_FREQ_IA_RATIO_SHIFT 8
>>> #define GEN6_PCODE_FREQ_RING_RATIO_SHIFT 16
>>> diff --git a/drivers/gpu/drm/i915/intel_display.c
>>> b/drivers/gpu/drm/i915/intel_display.c
>>> index 781d22e..ca4b83f 100644
>>> --- a/drivers/gpu/drm/i915/intel_display.c
>>> +++ b/drivers/gpu/drm/i915/intel_display.c
>>> @@ -14156,6 +14156,13 @@ static void intel_atomic_commit_tail(struct
>>> drm_atomic_state *state)
>>> intel_state->cdclk_pll_vco != dev_priv-
>>>> cdclk_pll.vco))
>>> dev_priv->display.modeset_commit_cdclk(state);
>>>
>>> + /*
>>> + * SKL workaround: bspec recommends we disable the SAGV
>>> when we
>>> + * have more then one pipe enabled
>>> + */
>>> + if (IS_SKYLAKE(dev_priv) && !skl_can_enable_sagv(state))
>>> + skl_disable_sagv(dev_priv);
>>> +
>>> intel_modeset_verify_disabled(dev);
>>> }
>>>
>>> @@ -14229,6 +14236,10 @@ static void intel_atomic_commit_tail(struct
>>> drm_atomic_state *state)
>>> intel_modeset_verify_crtc(crtc, old_crtc_state, crtc-
>>>> state);
>>> }
>>>
>>> + if (IS_SKYLAKE(dev_priv) && intel_state->modeset &&
>>> + skl_can_enable_sagv(state))
>>> + skl_enable_sagv(dev_priv);
>>> +
>>> drm_atomic_helper_commit_hw_done(state);
>>>
>>> if (intel_state->modeset)
>>> diff --git a/drivers/gpu/drm/i915/intel_drv.h
>>> b/drivers/gpu/drm/i915/intel_drv.h
>>> index 1c700b0..d203c77 100644
>>> --- a/drivers/gpu/drm/i915/intel_drv.h
>>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>>> @@ -1722,6 +1722,9 @@ void ilk_wm_get_hw_state(struct drm_device *dev);
>>> void skl_wm_get_hw_state(struct drm_device *dev);
>>> void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv,
>>> struct skl_ddb_allocation *ddb /* out */);
>>> +bool skl_can_enable_sagv(struct drm_atomic_state *state);
>>> +int skl_enable_sagv(struct drm_i915_private *dev_priv);
>>> +int skl_disable_sagv(struct drm_i915_private *dev_priv);
>>> uint32_t ilk_pipe_pixel_rate(const struct intel_crtc_state *pipe_config);
>>> bool ilk_disable_lp_wm(struct drm_device *dev);
>>> int sanitize_rc6_option(struct drm_i915_private *dev_priv, int enable_rc6);
>>> diff --git a/drivers/gpu/drm/i915/intel_pm.c
>>> b/drivers/gpu/drm/i915/intel_pm.c
>>> index b4cf988..fed2bae8 100644
>>> --- a/drivers/gpu/drm/i915/intel_pm.c
>>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>>> @@ -2860,6 +2860,7 @@ bool ilk_disable_lp_wm(struct drm_device *dev)
>>>
>>> #define SKL_DDB_SIZE 896 /* in blocks */
>>> #define BXT_DDB_SIZE 512
>>> +#define SKL_SAGV_BLOCK_TIME 30 /* Âs */
>>>
>>> /*
>>> * Return the index of a plane in the SKL DDB and wm result
>>> arrays. Primary
>>> @@ -2883,6 +2884,153 @@ skl_wm_plane_id(const struct intel_plane *plane)
>>> }
>>> }
>>>
>>> +/*
>>> + * SAGV dynamically adjusts the system agent voltage and clock frequencies
>>> + * depending on power and performance requirements. The display engine
>>> access
>>> + * to system memory is blocked during the adjustment time. Because of the
>>> + * blocking time, having this enabled can cause full system hangs and/or
>>> pipe
>>> + * underruns if we don't meet all of the following requirements:
>>> + *
>>> + * - <= 1 pipe enabled
>>> + * - All planes can enable watermarks for latencies >= SAGV engine block
>>> time
>>> + * - We're not using an interlaced display configuration
>>> + */
>>> +int
>>> +skl_enable_sagv(struct drm_i915_private *dev_priv)
>>> +{
>>> + int ret;
>>> +
>>> + if (dev_priv->skl_sagv_status == I915_SKL_SAGV_NOT_CONTROLLED ||
>>> + dev_priv->skl_sagv_status == I915_SKL_SAGV_ENABLED)
>>> + return 0;
>>> +
>>> + DRM_DEBUG_KMS("Enabling the SAGV\n");
>>> + mutex_lock(&dev_priv->rps.hw_lock);
>>> +
>>> + ret = sandybridge_pcode_write(dev_priv, GEN9_PCODE_SAGV_CONTROL,
>>> + GEN9_SAGV_ENABLE);
>>> +
>>> + /* We don't need to wait for the SAGV when enabling */
>>> + mutex_unlock(&dev_priv->rps.hw_lock);
>>> +
>>> + /*
>>> + * Some skl systems, pre-release machines in particular,
>>> + * don't actually have an SAGV.
>>> + */
>>> + if (ret == -ENOSYS) {
>>> + DRM_DEBUG_DRIVER("No SAGV found on system, ignoring\n");
>>> + dev_priv->skl_sagv_status = I915_SKL_SAGV_NOT_CONTROLLED;
>>> + return 0;
>>> + } else if (ret < 0) {
>>> + DRM_ERROR("Failed to enable the SAGV\n");
>>> + return ret;
>>> + }
>>> +
>>> + dev_priv->skl_sagv_status = I915_SKL_SAGV_ENABLED;
>>> + return 0;
>>> +}
>>> +
>>> +static int
>>> +skl_do_sagv_disable(struct drm_i915_private *dev_priv)
>>> +{
>>> + int ret;
>>> + uint32_t temp = GEN9_SAGV_DISABLE;
>>> +
>>> + ret = sandybridge_pcode_read(dev_priv, GEN9_PCODE_SAGV_CONTROL,
>>> + &temp);
>>> + if (ret)
>>> + return ret;
>>> + else
>>> + return temp & GEN9_SAGV_IS_DISABLED;
>>> +}
>>> +
>>> +int
>>> +skl_disable_sagv(struct drm_i915_private *dev_priv)
>>> +{
>>> + int ret, result;
>>> +
>>> + if (dev_priv->skl_sagv_status == I915_SKL_SAGV_NOT_CONTROLLED ||
>>> + dev_priv->skl_sagv_status == I915_SKL_SAGV_DISABLED)
>>> + return 0;
>>> +
>>> + DRM_DEBUG_KMS("Disabling the SAGV\n");
>>> + mutex_lock(&dev_priv->rps.hw_lock);
>>> +
>>> + /* bspec says to keep retrying for at least 1 ms */
>>> + ret = wait_for(result = skl_do_sagv_disable(dev_priv), 1);
>>> + mutex_unlock(&dev_priv->rps.hw_lock);
>>> +
>>> + if (ret == -ETIMEDOUT) {
>>> + DRM_ERROR("Request to disable SAGV timed out\n");
>>> + return -ETIMEDOUT;
>>> + }
>>> +
>>> + /*
>>> + * Some skl systems, pre-release machines in particular,
>>> + * don't actually have an SAGV.
>>> + */
>>> + if (result == -ENOSYS) {
>>> + DRM_DEBUG_DRIVER("No SAGV found on system, ignoring\n");
>>> + dev_priv->skl_sagv_status = I915_SKL_SAGV_NOT_CONTROLLED;
>>> + return 0;
>>> + } else if (result < 0) {
>>> + DRM_ERROR("Failed to disable the SAGV\n");
>>> + return result;
>>> + }
>>> +
>>> + dev_priv->skl_sagv_status = I915_SKL_SAGV_DISABLED;
>>> + return 0;
>>> +}
>>> +
>>> +bool skl_can_enable_sagv(struct drm_atomic_state *state)
>>> +{
>>> + struct drm_device *dev = state->dev;
>>> + struct drm_i915_private *dev_priv = to_i915(dev);
>>> + struct intel_atomic_state *intel_state =
>>> to_intel_atomic_state(state);
>>> + struct drm_crtc *crtc;
>>> + enum pipe pipe;
>>> + int level, plane;
>>> +
>>> + /*
>>> + * SKL workaround: bspec recommends we disable the SAGV when we
>>> have
>>> + * more then one pipe enabled
>>> + *
>>> + * If there are no active CRTCs, no additional checks need be
>>> performed
>>> + */
>>> + if (hweight32(intel_state->active_crtcs) == 0)
>>> + return true;
>>> + else if (hweight32(intel_state->active_crtcs) > 1)
>>> + return false;
>>> +
>>> + /* Since we're now guaranteed to only have one active CRTC... */
>>> + pipe = ffs(intel_state->active_crtcs) - 1;
>>> + crtc = dev_priv->pipe_to_crtc_mapping[pipe];
>>> +
>>> + if (crtc->state->mode.flags & DRM_MODE_FLAG_INTERLACE)
>>> + return false;
>>> +
>>> + for_each_plane(dev_priv, pipe, plane) {
>>> + /* Skip this plane if it's not enabled */
>>> + if (intel_state->wm_results.plane[pipe][plane][0] == 0)
>>> + continue;
>>> +
>>> + /* Find the highest enabled wm level for this plane */
>>> + for (level = ilk_wm_max_level(dev);
>>> + intel_state->wm_results.plane[pipe][plane][level] ==
>>> 0;
>>> + --level);
>>> +
>>> + /*
>>> + * If any of the planes on this pipe don't enable wm levels
>>> + * that incur memory latencies higher then 30Âs we can't
>>> enable
>>> + * the SAGV
>>> + */
>>> + if (dev_priv->wm.skl_latency[level] < SKL_SAGV_BLOCK_TIME)
>>> + return false;
>> Shouldn't this check be >= BLOCK_TIME?
>>
> That's the requirement for the sagv but the conditional here is still correct.
>
> WM0 - 2ms
> WM1 - 5ms
> WM2 - 10ms
> WM3 - 20ms
> WM4+- disabled
>
> (20ms < BLOCK_TIME) == true, which indicates we don't have any watermark levels
> with latency values >= 30ms. We can't enable so return false.
>
> WM0 - 2ms
> WM1 - 5ms
> WM2 - 10ms
> WM3 - 20ms
> WM4 - 33ms
> WM5 - 50ms
> WM6 - 70ms
> WM7 - 99ms
>
> (99ms < BLOCK_TIME) == false, and 99 >= BLOCK_TIME so we end up returning true
> to indicate it's safe to enable.
Ah right, thanks for explaining.

~Maarten