Re: [PATCH] amdgpu_dm: fix nonblocking atomic commit use-after-free

From: Kazlauskas, Nicholas
Date: Tue Jul 28 2020 - 13:07:24 EST


On 2020-07-28 5:22 a.m., Paul Menzel wrote:
Dear Linux folks,


Am 25.07.20 um 07:20 schrieb Mazin Rezk:
On Saturday, July 25, 2020 12:59 AM, Duncan wrote:

On Sat, 25 Jul 2020 03:03:52 +0000 Mazin Rezk wrote:

Am 24.07.20 um 19:33 schrieb Kees Cook:

There was a fix to disable the async path for this driver that
worked around the bug too, yes? That seems like a safer and more
focused change that doesn't revert the SLUB defense for all
users, and would actually provide a complete, I think, workaround

That said, I haven't seen the async disabling patch. If you could
link to it, I'd be glad to test it out and perhaps we can use that
instead.

I'm confused. Not to put words in Kees' mouth; /I/ am confused (which
admittedly could well be just because I make no claims to be a
coder and am simply reading the bug and thread, but I'd appreciate some
"unconfusing" anyway).

My interpretation of the "async disabling" reference was that it was to
comment #30 on the bug:

https://bugzilla.kernel.org/show_bug.cgi?id=207383#c30

... which (if I'm not confused on this point too) appears to be yours.
There it was stated...

I've also found that this bug exclusively occurs when commit_work is on
the workqueue. After forcing drm_atomic_helper_commit to run all of the
commits without adding to the workqueue and running the OS, the issue
seems to have disappeared.
<<<<

Would not forcing all commits to run directly, without placing them on
the workqueue, be "async disabling"? That's what I /thought/ he was
referencing.

Oh, I thought he was referring to a different patch. Kees, could I get
your confirmation on this?

The change I made actually affected all of the DRM code, although this could
easily be changed to be specific to amdgpu. (By forcing blocking on
amdgpu_dm's non-blocking commit code)

That said, I'd still need to test further because I only did test it for a
couple of hours then. Although it should work in theory.

OTOH your base/context swap idea sounds like a possibly "less
disturbance" workaround, if it works, and given the point in the
commit cycle... (But if it's out Sunday it's likely too late to test
and get it in now anyway; if it's another week, tho...)

The base/context swap idea should make the use-after-free behave how it
did in 5.6. Since the bug doesn't cause an issue in 5.6, it's less of a
"less disturbance" workaround and more of a "no disturbance" workaround.

Sorry for bothering, but is there now a solution, besides reverting the commits, to avoid freezes/crashes *without* performance regressions?


Kind regards,

Paul

Mazin's "drm/amd/display: Clear dm_state for fast updates" change accomplishes this, at least as a temporary hack.

I've started work on a more large scale fix that we could get in in after.

Regards,
Nicholas Kazlauskas