Re: [PATCH v11 7/7] drm/i915/skl: Update DDB values atomically with wms/plane attrs

From: Ville Syrjälä
Date: Fri Aug 12 2016 - 09:30:43 EST


On Thu, Aug 11, 2016 at 03:54:36PM -0400, Lyude wrote:
> Now that we can hook into update_crtcs and control the order in which we
> update CRTCs at each modeset, we can finish the final step of fixing
> Skylake's watermark handling by performing DDB updates at the same time
> as plane updates and watermark updates.
>
> The first major change in this patch is skl_update_crtcs(), which
> handles ensuring that we order each CRTC update in our atomic commits
> properly so that they honor the DDB flush order.
>
> The second major change in this patch is the order in which we flush the
> pipes. While the previous order may have worked, it can't be used in
> this approach since it no longer will do the right thing. For example,
> using the old ddb flush order:
>
> We have pipes A, B, and C enabled, and we're disabling C. Initial ddb
> allocation looks like this:
>
> | A | B |xxxxxxx|
>
> Since we're performing the ddb updates after performing any CRTC
> disablements in intel_atomic_commit_tail(), the space to the right of
> pipe B is unallocated.
>
> 1. Flush pipes with new allocation contained into old space. None
> apply, so we skip this
> 2. Flush pipes having their allocation reduced, but overlapping with a
> previous allocation. None apply, so we also skip this
> 3. Flush pipes that got more space allocated. This applies to A and B,
> giving us the following update order: A, B
>
> This is wrong, since updating pipe A first will cause it to overlap with
> B and potentially burst into flames. Our new order (see the code
> comments for details) would update the pipes in the proper order: B, A.
>
> As well, we calculate the order for each DDB update during the check
> phase, and reference it later in the commit phase when we hit
> skl_update_crtcs().
>
> This long overdue patch fixes the rest of the underruns on Skylake.
>
> Changes since v1:
> - Add skl_ddb_entry_write() for cursor into skl_write_cursor_wm()
> Changes since v2:
> - Use the method for updating CRTCs that Ville suggested
> - In skl_update_wm(), only copy the watermarks for the crtc that was
> passed to us
> Changes since v3:
> - Small comment fix in skl_ddb_allocation_overlaps()
>
> Fixes: 0e8fb7ba7ca5 ("drm/i915/skl: Flush the WM configuration")
> Fixes: 8211bd5bdf5e ("drm/i915/skl: Program the DDB allocation")
> [omitting CC for stable, since this patch will need to be changed for
> such backports first]
>
> Testcase: kms_cursor_legacy
> Signed-off-by: Lyude <cpaul@xxxxxxxxxx>
> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> Cc: Radhakrishna Sripada <radhakrishna.sripada@xxxxxxxxx>
> Cc: Hans de Goede <hdegoede@xxxxxxxxxx>
> Cc: Matt Roper <matthew.d.roper@xxxxxxxxx>
> ---
> drivers/gpu/drm/i915/intel_display.c | 100 +++++++++++++++--
> drivers/gpu/drm/i915/intel_drv.h | 7 ++
> drivers/gpu/drm/i915/intel_pm.c | 207 +++++++++--------------------------
> 3 files changed, 144 insertions(+), 170 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 61a45f1..68fdbf0 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -13348,16 +13348,23 @@ static void verify_wm_state(struct drm_crtc *crtc,
> hw_entry->start, hw_entry->end);
> }
>
> - /* cursor */
> - hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
> - sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
> -
> - if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
> - DRM_ERROR("mismatch in DDB state pipe %c cursor "
> - "(expected (%u,%u), found (%u,%u))\n",
> - pipe_name(pipe),
> - sw_entry->start, sw_entry->end,
> - hw_entry->start, hw_entry->end);
> + /*
> + * cursor
> + * If the cursor plane isn't active, we may not have updated it's ddb
> + * allocation. In that case since the ddb allocation will be updated
> + * once the plane becomes visible, we can skip this check
> + */
> + if (intel_crtc->cursor_addr) {
> + hw_entry = &hw_ddb.plane[pipe][PLANE_CURSOR];
> + sw_entry = &sw_ddb->plane[pipe][PLANE_CURSOR];
> +
> + if (!skl_ddb_entry_equal(hw_entry, sw_entry)) {
> + DRM_ERROR("mismatch in DDB state pipe %c cursor "
> + "(expected (%u,%u), found (%u,%u))\n",
> + pipe_name(pipe),
> + sw_entry->start, sw_entry->end,
> + hw_entry->start, hw_entry->end);
> + }
> }
> }
>
> @@ -14109,6 +14116,72 @@ static void intel_update_crtcs(struct drm_atomic_state *state,
> }
> }
>
> +static void skl_update_crtcs(struct drm_atomic_state *state,
> + unsigned int *crtc_vblank_mask)
> +{
> + struct drm_device *dev = state->dev;
> + struct drm_i915_private *dev_priv = to_i915(dev);
> + struct intel_atomic_state *intel_state = to_intel_atomic_state(state);
> + struct drm_crtc *crtc;
> + struct drm_crtc_state *old_crtc_state;
> + struct skl_ddb_allocation *new_ddb = &intel_state->wm_results.ddb;
> + struct skl_ddb_allocation cur_ddb;
> + bool progress;
> + bool reallocated[I915_MAX_PIPES] = {};
> + enum pipe pipe;
> + int wait_vbl_pipes, i;
> +
> + /*
> + * Whenever the number of active pipes change, so does the DDB
> + * allocation. DDB allocations on pipes cannot ever overlap with
> + * eachother at any point in time, so we need to change the order we
> + * update the pipes so that we ensure they never overlap inbetween DDB
> + * updates.
> + */
> + do {
> + progress = false;
> + wait_vbl_pipes = 0;
> + cur_ddb = dev_priv->wm.skl_hw.ddb;

I guess we could do the loop twice to avoid the copy. But I'm not sure
that's really any more efficient.

> +
> + for_each_crtc_in_state(state, crtc, old_crtc_state, i) {
> + struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> + pipe = intel_crtc->pipe;
> +
> + if (!intel_crtc->active || needs_modeset(crtc->state))
> + continue;

crtc->state->active?

> + if (skl_ddb_allocation_equals(&cur_ddb, new_ddb, pipe))
> + continue;
> + if (skl_ddb_allocation_overlaps(state, &cur_ddb,
> + new_ddb, pipe))
> + continue;
> +
> + intel_update_crtc(crtc, state, old_crtc_state,
> + crtc_vblank_mask);

What did the caller want with the vblank_mask? To do more vblank waits?
That shouldn't be needed, I think, since we already wait here, no?

> +
> + wait_vbl_pipes |= drm_crtc_mask(crtc);
> + reallocated[pipe] = true;
> + progress = true;

I guess we could throw 'progress' out actually. wait_vbl_pipes != 0
should mean the same thing.

> + }
> +
> + /* Wait for each pipe's new allocation to take effect */
> + intel_atomic_wait_for_vblanks(dev, dev_priv, wait_vbl_pipes);
> + } while (progress);
> +
> + /*
> + * Now that we've handled any ddb reallocations, we can go ahead and
> + * enable any new pipes.
> + */
> + for_each_crtc_in_state(state, crtc, old_crtc_state, i) {
> + pipe = to_intel_crtc(crtc)->pipe;
> +
> + if (reallocated[pipe] || !crtc->state->active)
> + continue;

Do we need this reallocated[] flag? Isn't this the same as
'active && needs_modeset'?

> +
> + intel_update_crtc(crtc, state, old_crtc_state,
> + crtc_vblank_mask);
> + }
> +}
> +
> static void intel_atomic_commit_tail(struct drm_atomic_state *state)
> {
> struct drm_device *dev = state->dev;
> @@ -15738,8 +15811,6 @@ void intel_init_display_hooks(struct drm_i915_private *dev_priv)
> dev_priv->display.crtc_disable = i9xx_crtc_disable;
> }
>
> - dev_priv->display.update_crtcs = intel_update_crtcs;
> -
> /* Returns the core display clock speed */
> if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv))
> dev_priv->display.get_display_clock_speed =
> @@ -15829,6 +15900,11 @@ void intel_init_display_hooks(struct drm_i915_private *dev_priv)
> skl_modeset_calc_cdclk;
> }
>
> + if (dev_priv->info.gen >= 9)
> + dev_priv->display.update_crtcs = skl_update_crtcs;
> + else
> + dev_priv->display.update_crtcs = intel_update_crtcs;
> +
> switch (INTEL_INFO(dev_priv)->gen) {
> case 2:
> dev_priv->display.queue_flip = intel_gen2_queue_flip;
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 88088c3..9f9fe69 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1723,6 +1723,13 @@ void skl_ddb_get_hw_state(struct drm_i915_private *dev_priv,
> struct skl_ddb_allocation *ddb /* out */);
> int skl_enable_sagv(struct drm_i915_private *dev_priv);
> int skl_disable_sagv(struct drm_i915_private *dev_priv);
> +bool skl_ddb_allocation_equals(const struct skl_ddb_allocation *old,
> + const struct skl_ddb_allocation *new,
> + enum pipe pipe);
> +bool skl_ddb_allocation_overlaps(struct drm_atomic_state *state,
> + const struct skl_ddb_allocation *old,
> + const struct skl_ddb_allocation *new,
> + enum pipe pipe);
> void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> const struct skl_wm_values *wm);
> void skl_write_plane_wm(struct intel_crtc *intel_crtc,
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index f2e0071..6fe9e57 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3794,6 +3794,11 @@ void skl_write_plane_wm(struct intel_crtc *intel_crtc,
> wm->plane[pipe][plane][level]);
> }
> I915_WRITE(PLANE_WM_TRANS(pipe, plane), wm->plane_trans[pipe][plane]);
> +
> + skl_ddb_entry_write(dev_priv, PLANE_BUF_CFG(pipe, plane),
> + &wm->ddb.plane[pipe][plane]);
> + skl_ddb_entry_write(dev_priv, PLANE_NV12_BUF_CFG(pipe, plane),
> + &wm->ddb.y_plane[pipe][plane]);
> }
>
> void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> @@ -3810,170 +3815,49 @@ void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> wm->plane[pipe][PLANE_CURSOR][level]);
> }
> I915_WRITE(CUR_WM_TRANS(pipe), wm->plane_trans[pipe][PLANE_CURSOR]);
> -}
> -
> -static void skl_write_wm_values(struct drm_i915_private *dev_priv,
> - const struct skl_wm_values *new)
> -{
> - struct drm_device *dev = &dev_priv->drm;
> - struct intel_crtc *crtc;
>
> - for_each_intel_crtc(dev, crtc) {
> - int i;
> - enum pipe pipe = crtc->pipe;
> -
> - if ((new->dirty_pipes & drm_crtc_mask(&crtc->base)) == 0)
> - continue;
> - if (!crtc->active)
> - continue;
> -
> - for (i = 0; i < intel_num_planes(crtc); i++) {
> - skl_ddb_entry_write(dev_priv,
> - PLANE_BUF_CFG(pipe, i),
> - &new->ddb.plane[pipe][i]);
> - skl_ddb_entry_write(dev_priv,
> - PLANE_NV12_BUF_CFG(pipe, i),
> - &new->ddb.y_plane[pipe][i]);
> - }
> -
> - skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> - &new->ddb.plane[pipe][PLANE_CURSOR]);
> - }
> + skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> + &wm->ddb.plane[pipe][PLANE_CURSOR]);
> }
>
> -/*
> - * When setting up a new DDB allocation arrangement, we need to correctly
> - * sequence the times at which the new allocations for the pipes are taken into
> - * account or we'll have pipes fetching from space previously allocated to
> - * another pipe.
> - *
> - * Roughly the sequence looks like:
> - * 1. re-allocate the pipe(s) with the allocation being reduced and not
> - * overlapping with a previous light-up pipe (another way to put it is:
> - * pipes with their new allocation strickly included into their old ones).
> - * 2. re-allocate the other pipes that get their allocation reduced
> - * 3. allocate the pipes having their allocation increased
> - *
> - * Steps 1. and 2. are here to take care of the following case:
> - * - Initially DDB looks like this:
> - * | B | C |
> - * - enable pipe A.
> - * - pipe B has a reduced DDB allocation that overlaps with the old pipe C
> - * allocation
> - * | A | B | C |
> - *
> - * We need to sequence the re-allocation: C, B, A (and not B, C, A).
> - */
> -
> -static void
> -skl_wm_flush_pipe(struct drm_i915_private *dev_priv, enum pipe pipe, int pass)
> -{
> - int plane;
> -
> - DRM_DEBUG_KMS("flush pipe %c (pass %d)\n", pipe_name(pipe), pass);
> -
> - for_each_plane(dev_priv, pipe, plane) {
> - I915_WRITE(PLANE_SURF(pipe, plane),
> - I915_READ(PLANE_SURF(pipe, plane)));
> - }
> - I915_WRITE(CURBASE(pipe), I915_READ(CURBASE(pipe)));
> -}
> -
> -static bool
> -skl_ddb_allocation_included(const struct skl_ddb_allocation *old,
> - const struct skl_ddb_allocation *new,
> - enum pipe pipe)
> +bool skl_ddb_allocation_equals(const struct skl_ddb_allocation *old,
> + const struct skl_ddb_allocation *new,
> + enum pipe pipe)
> {
> - uint16_t old_size, new_size;
> -
> - old_size = skl_ddb_entry_size(&old->pipe[pipe]);
> - new_size = skl_ddb_entry_size(&new->pipe[pipe]);
> -
> - return old_size != new_size &&
> - new->pipe[pipe].start >= old->pipe[pipe].start &&
> - new->pipe[pipe].end <= old->pipe[pipe].end;
> + return new->pipe[pipe].start == old->pipe[pipe].start &&
> + new->pipe[pipe].end == old->pipe[pipe].end;
> }
>
> -static void skl_flush_wm_values(struct drm_i915_private *dev_priv,
> - struct skl_wm_values *new_values)
> +bool skl_ddb_allocation_overlaps(struct drm_atomic_state *state,
> + const struct skl_ddb_allocation *old,
> + const struct skl_ddb_allocation *new,
> + enum pipe pipe)
> {
> - struct drm_device *dev = &dev_priv->drm;
> - struct skl_ddb_allocation *cur_ddb, *new_ddb;
> - bool reallocated[I915_MAX_PIPES] = {};
> - struct intel_crtc *crtc;
> - enum pipe pipe;
> -
> - new_ddb = &new_values->ddb;
> - cur_ddb = &dev_priv->wm.skl_hw.ddb;
> -
> - /*
> - * First pass: flush the pipes with the new allocation contained into
> - * the old space.
> - *
> - * We'll wait for the vblank on those pipes to ensure we can safely
> - * re-allocate the freed space without this pipe fetching from it.
> - */
> - for_each_intel_crtc(dev, crtc) {
> - if (!crtc->active)
> - continue;
> -
> - pipe = crtc->pipe;
> -
> - if (!skl_ddb_allocation_included(cur_ddb, new_ddb, pipe))
> - continue;
> -
> - skl_wm_flush_pipe(dev_priv, pipe, 1);
> - intel_wait_for_vblank(dev, pipe);
> -
> - reallocated[pipe] = true;
> - }
> -
> -
> - /*
> - * Second pass: flush the pipes that are having their allocation
> - * reduced, but overlapping with a previous allocation.
> - *
> - * Here as well we need to wait for the vblank to make sure the freed
> - * space is not used anymore.
> - */
> - for_each_intel_crtc(dev, crtc) {
> - if (!crtc->active)
> - continue;
> -
> - pipe = crtc->pipe;
> -
> - if (reallocated[pipe])
> - continue;
> -
> - if (skl_ddb_entry_size(&new_ddb->pipe[pipe]) <
> - skl_ddb_entry_size(&cur_ddb->pipe[pipe])) {
> - skl_wm_flush_pipe(dev_priv, pipe, 2);
> - intel_wait_for_vblank(dev, pipe);
> - reallocated[pipe] = true;
> - }
> - }
> -
> - /*
> - * Third pass: flush the pipes that got more space allocated.
> - *
> - * We don't need to actively wait for the update here, next vblank
> - * will just get more DDB space with the correct WM values.
> - */
> - for_each_intel_crtc(dev, crtc) {
> - if (!crtc->active)
> - continue;
> + struct drm_device *dev = state->dev;
> + struct intel_crtc *intel_crtc;
> + enum pipe otherp;
>
> - pipe = crtc->pipe;
> + for_each_intel_crtc(dev, intel_crtc) {
> + otherp = intel_crtc->pipe;
>
> /*
> - * At this point, only the pipes more space than before are
> - * left to re-allocate.
> + * When checking for overlaps, we don't want to:
> + * - Compare against ourselves
> + * - Compare against pipes that will be/are disabled
> + * - Compare against pipes that aren't enabled yet
> */
> - if (reallocated[pipe])
> + if (otherp == pipe || !new->pipe[otherp].end ||
> + !old->pipe[otherp].end)
> continue;

Aren't we setting start=end=0 while the ddb entry is deallocated (i.e.
while a pipe is disabled)? If so, we shouldn't need to check for an empty
entry explicitly. Hmm, maybe we're not. I think we should, since that
would make things simpler.

>
> - skl_wm_flush_pipe(dev_priv, pipe, 3);
> + if ((new->pipe[pipe].start >= old->pipe[otherp].start &&
> + new->pipe[pipe].start < old->pipe[otherp].end) ||
> + (old->pipe[otherp].start >= new->pipe[pipe].start &&
> + old->pipe[otherp].start < new->pipe[pipe].end))

I'd extract this to a small helper, and simplify it a bit while at it.
So perhaps:

static bool skl_ddb_entries_overlap(const struct skl_ddb_entry *a,
				    const struct skl_ddb_entry *b)
{
	return a->start < b->end && b->start < a->end;
}
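
To make the simplification concrete, here is a minimal standalone sketch
of that overlap test, outside the driver. It assumes entries are
half-open ranges ('end' exclusive) and that a deallocated entry is
start=end=0; struct ddb_entry and the function names are illustrative
stand-ins, not the driver's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's DDB entry; 'end' is exclusive. */
struct ddb_entry {
	uint16_t start, end;
};

/* Two half-open ranges overlap iff each one starts before the other ends. */
static bool ddb_entries_overlap(const struct ddb_entry *a,
				const struct ddb_entry *b)
{
	return a->start < b->end && b->start < a->end;
}

/* Small self-check covering the cases discussed in the review. */
static void ddb_entries_overlap_selftest(void)
{
	const struct ddb_entry a = { 0, 300 }, b = { 300, 600 };
	const struct ddb_entry wide = { 0, 450 }, none = { 0, 0 };

	assert(!ddb_entries_overlap(&a, &b));    /* adjacent, no overlap */
	assert(ddb_entries_overlap(&wide, &b));  /* grows into b's space */
	assert(!ddb_entries_overlap(&none, &b)); /* zeroed entry never overlaps */
	assert(!ddb_entries_overlap(&b, &none));
}
```

Note that a zeroed entry can never satisfy both comparisons, which is
why keeping deallocated entries at start=end=0 would make the explicit
empty-entry checks unnecessary.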

> + return true;
> }
> +
> + return false;
> }
>
> static int skl_update_pipe_wm(struct drm_crtc_state *cstate,
> @@ -4160,7 +4044,7 @@ static void skl_update_wm(struct drm_crtc *crtc)
> struct skl_wm_values *hw_vals = &dev_priv->wm.skl_hw;
> struct intel_crtc_state *cstate = to_intel_crtc_state(crtc->state);
> struct skl_pipe_wm *pipe_wm = &cstate->wm.skl.optimal;
> - int pipe;
> + enum pipe pipe = intel_crtc->pipe;
>
> if ((results->dirty_pipes & drm_crtc_mask(crtc)) == 0)
> return;
> @@ -4169,15 +4053,22 @@ static void skl_update_wm(struct drm_crtc *crtc)
>
> mutex_lock(&dev_priv->wm.wm_mutex);
>
> - skl_write_wm_values(dev_priv, results);
> - skl_flush_wm_values(dev_priv, results);
> -
> /*
> - * Store the new configuration (but only for the pipes that have
> - * changed; the other values weren't recomputed).
> + * If this pipe isn't active already, we're going to be enabling it
> + * very soon. Since it's safe to update a pipe's ddb allocation while
> + * the pipe's shut off, just do so here. Already active pipes will have
> + * their watermarks updated once we update their planes.
> */
> - for_each_pipe_masked(dev_priv, pipe, results->dirty_pipes)
> - skl_copy_wm_for_pipe(hw_vals, results, pipe);
> + if (crtc->state->active_changed) {
> + int plane;
> +
> + for (plane = 0; plane < intel_num_planes(intel_crtc); plane++)
> + skl_write_plane_wm(intel_crtc, results, plane);
> +
> + skl_write_cursor_wm(intel_crtc, results);
> + }
> +
> + skl_copy_wm_for_pipe(hw_vals, results, pipe);

Hmm. So I'm thinking that if we make the ddb allocation match the
actual pipe state at all times, the algorithm will be simpler. If every
disabled pipe (even one disabled temporarily for a modeset) has its
current ddb entry zeroed, the main update loop can handle everything
for us, even enabling new pipes. Well, it won't handle disabling
pipes, but that's fine.
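
A rough standalone simulation of that idea, just to illustrate the
ordering, not driver code. It assumes disabled pipes have a zeroed
current entry and that 'end' is exclusive; MAX_PIPES, struct ddb_entry
and ddb_update_order() are made-up names for this sketch, and the
vblank wait is only a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_PIPES 3

/* Hypothetical per-pipe DDB entry; 'end' is exclusive, {0, 0} = no allocation. */
struct ddb_entry {
	uint16_t start, end;
};

static bool entries_overlap(const struct ddb_entry *a,
			    const struct ddb_entry *b)
{
	return a->start < b->end && b->start < a->end;
}

/*
 * Move each pipe from cur[] to target[] in passes, only updating a pipe
 * when its target entry overlaps no other pipe's current entry. Records
 * the update order in order[] and returns the number of pipes updated.
 */
static int ddb_update_order(struct ddb_entry cur[MAX_PIPES],
			    const struct ddb_entry target[MAX_PIPES],
			    int order[MAX_PIPES])
{
	struct ddb_entry snap[MAX_PIPES];
	unsigned int updated;
	int n = 0, p, o;

	do {
		updated = 0;
		/* Allocations change at vblank, so compare against the pass start. */
		memcpy(snap, cur, sizeof(snap));

		for (p = 0; p < MAX_PIPES; p++) {
			bool blocked = false;

			/* Pipes that end up disabled are not handled here. */
			if (target[p].start == target[p].end)
				continue;
			if (snap[p].start == target[p].start &&
			    snap[p].end == target[p].end)
				continue;

			for (o = 0; o < MAX_PIPES; o++)
				if (o != p && entries_overlap(&target[p], &snap[o]))
					blocked = true;
			if (blocked)
				continue;

			cur[p] = target[p];
			order[n++] = p;
			updated |= 1u << p;
		}
		/* A real driver would wait for vblank on the updated pipes here. */
	} while (updated);

	return n;
}
```

With cur = { {0,300}, {300,600}, {0,0} } and target = { {0,450},
{450,900}, {0,0} } (pipe C just disabled), this updates pipe 1 (B) and
then pipe 0 (A), i.e. the B, A order from the commit message. Because a
zeroed entry overlaps nothing, the same loop also covers a newly enabled
pipe without a separate pass, and the 'updated' mask doubles as the
loop condition.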

>
> mutex_unlock(&dev_priv->wm.wm_mutex);
> }
> --
> 2.7.4

--
Ville Syrjälä
Intel OTC