Re: [git pull] drm for 5.8-rc1

From: James Jones
Date: Thu Jul 02 2020 - 17:13:59 EST


On 7/2/20 1:22 AM, Daniel Stone wrote:
Hi,

On Wed, 1 Jul 2020 at 20:45, James Jones <jajones@xxxxxxxxxx> wrote:
OK, I think I see what's going on. In the Xorg modesetting driver, the
logic is basically:

if (gbm_has_modifiers && DRM_CAP_ADDFB2_MODIFIERS != 0) {
drmModeAddFB2WithModifiers(..., gbm_bo_get_modifier(bo->gbm));
} else {
drmModeAddFB(...);
}

I read this thread expecting to explain the correct behaviour we
implement in Weston and how modesetting needs to be fixed, but ...
that seems OK to me? As long as `gbm_has_modifiers` is a proxy for 'we
used gbm_(bo|surface)_create_with_modifiers to allocate the buffer'.

Yes, the hazards of reporting findings before verifying. I now see modesetting does query the DRM-KMS modifiers and attempt to allocate with them if it found any. However, I still see a lot of ways things can go wrong, but I'm not going to share my speculation again until I've actually verified it, which is taking a frustratingly long time. The modesetting driver is not my friend right now.

There's no attempt to verify the DRM-KMS device supports the modifier,
but then, why would there be? GBM presumably chose a supported modifier
at buffer creation time, and we don't know which plane the FB is going
to be used with yet. GBM doesn't actually ask the kernel which
modifiers it supports here either though.

Right, it doesn't ask, because userspace tells it which modifiers to
use. The correct behaviour is to take the list from the KMS
`IN_FORMATS` property and then pass that to
`gbm_(bo|surface)_create_with_modifiers`; GBM must then select from
that list and only that list. If that call does not succeed and Xorg
falls back to `gbm_surface_create`, then it must not call
`gbm_bo_get_modifier` - so that would be a modesetting bug. If that
call does succeed and `gbm_bo_get_modifier` subsequently reports a
modifier which was not in the list, that's a Mesa driver bug.

It just goes into Mesa via
DRI and reports the modifier (unpatched) Mesa chose on its own. Mesa
just hard-codes the modifiers in its driver backends since its thinking
in terms of a device's 3D engine, not display. In theory, Mesa's DRI
drivers could query KMS for supported modifiers if allocating from GBM
using the non-modifiers path and the SCANOUT flag is set (perhaps some
drivers do this or its equivalent? Haven't checked.), but that seems
pretty gnarly and doesn't fix the modifier-based GBM allocation path
AFAIK. Bit of a mess.

Two options for GBM users:
* call gbm_*_create_with_modifiers, it succeeds, call
gbm_bo_get_modifier, pass modifier into AddFB
* call gbm_*_create (without modifiers), it succeeds, do not call
gbm_bo_get_modifier, do not pass a modifier into AddFB

Anything else is a bug in the user. Note that falling back from 1 to 2
is fine: if `gbm_*_create_with_modifiers()` fails, you can fall back
to the non-modifier path, provided you don't later try to get a
modifier back out.

For a quick userspace fix that could probably be pushed out everywhere
(Only affects Xorg server 1.20+ AFAIK), just retrying
drmModeAddFB2WithModifiers() without the DRM_MODE_FB_MODIFIERS flag on
failure should be sufficient.

This would break other drivers.

I think this could be done in a way that wouldn't, though it wouldn't be quite as simple. Let's see what the true root cause is first though.

Still need to verify as I'm having
trouble wrangling my Xorg build at the moment and I'm pressed for time.
A more complete fix would be quite involved, as modesetting isn't really
properly plumbed to validate GBM's modifiers against KMS planes, and it
doesn't seem like GBM/Mesa/DRI should be responsible for this as noted
above given the general modifier workflow/design.

Most importantly, options I've considered for fixing from the kernel side:

-Accept "legacy" modifiers in nouveau in addition to the new modifiers,
though avoid reporting them to userspace as supported to avoid further
proliferation. This is pretty straightforward. I'll need to modify
both the AddFB2 handler (nouveau_validate_decode_mod) and the mode set
plane validation logic (nv50_plane_format_mod_supported), but it should
end up just being a few lines of code.

I do think that they should also be reported to userspace if they are
accepted. Other users can and do look at the modifier list to see if
the buffer is acceptable for a given plane, so the consistency is good
here. Of course, in Mesa you would want to prioritise the new
modifiers over the legacy ones, and not allocate or return the legacy
ones unless that was all you were asked for. This would involve
tracking the used modifier explicitly through Mesa, rather than
throwing it away at alloc time and then later divining it from the
tiling mode.

Reporting them as supported is equivalent to reporting support for a memory layout the chips don't actually support (It corresponds to a valid layout on Tegra chips, but not on discrete NV chips). This is what the new modifiers are trying to avoid in the first place: Implying buffers can be shared between these Tegra chips and discrete NV GPUs.

Thanks,
-James

Cheers,
Daniel