Re: [PATCH 3/4] mm/hmm: HMM should have a callback before MM is destroyed

From: John Hubbard
Date: Thu Mar 15 2018 - 21:17:30 EST


On 03/15/2018 05:54 PM, Jerome Glisse wrote:
> On Thu, Mar 15, 2018 at 03:48:29PM -0700, Andrew Morton wrote:
>> On Thu, 15 Mar 2018 14:36:59 -0400 jglisse@xxxxxxxxxx wrote:
>>
>>> From: Ralph Campbell <rcampbell@xxxxxxxxxx>
>>>
>>> The hmm_mirror_register() function registers a callback for when
>>> the CPU pagetable is modified. Normally, the device driver will
>>> call hmm_mirror_unregister() when the process using the device is
>>> finished. However, if the process exits uncleanly, the mm_struct
>>> can be destroyed with no warning to the device driver.
>>
>> The changelog doesn't tell us what the runtime effects of the bug are.
>> This makes it hard for me to answer the "did Jerome consider doing
>> cc:stable" question.
>
> The impact is low: there might be an issue only if the application is
> killed, and we don't have any upstream user yet, which is why I did
> not cc stable.
>

Hi Jerome and Andrew,

I'd claim that it is not possible to write a safe and correct device
driver without this patch. That's because, without the .release callback
that you're adding here, the driver could end up operating on a
stale mm_struct, leading to crashes and other disasters.
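
To make the failure mode concrete, here is a rough userspace model of
the pattern. It is only a sketch, not the kernel HMM API: fake_mm,
fake_mirror, fake_mirror_ops, fake_mm_exit() and the driver_*() functions
are all hypothetical names made up for illustration, and locking is
left out entirely.

```c
/* Userspace model only. Every name here is hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_mm {                 /* stands in for the kernel's mm_struct */
    bool alive;
};

struct fake_mirror;

struct fake_mirror_ops {
    /* Called once, before the address space is torn down. */
    void (*release)(struct fake_mirror *mirror);
};

struct fake_mirror {
    struct fake_mm *mm;          /* cleared once release has run */
    const struct fake_mirror_ops *ops;
    int faults_handled;
};

/* Driver side: drop the mm pointer as soon as we are told it is going away. */
void driver_release(struct fake_mirror *mirror)
{
    mirror->mm = NULL;
}

static const struct fake_mirror_ops driver_ops = { .release = driver_release };

/* Core side: notify the mirror first, then tear the mm down. */
void fake_mm_exit(struct fake_mm *mm, struct fake_mirror *mirror)
{
    assert(mm->alive);           /* teardown happens only once */
    if (mirror->ops && mirror->ops->release)
        mirror->ops->release(mirror);
    mm->alive = false;
}

/* Driver fault path: returns 0 on success, -1 if the mm is already gone. */
int driver_handle_fault(struct fake_mirror *mirror)
{
    if (mirror->mm == NULL)
        return -1;
    /* Without the release callback, mirror->mm could be stale here. */
    mirror->faults_handled++;
    return 0;
}

/* Returns 0 if a fault after process exit is refused, 1 otherwise. */
int demo(void)
{
    struct fake_mm mm = { .alive = true };
    struct fake_mirror mirror = { .mm = &mm, .ops = &driver_ops };

    int before = driver_handle_fault(&mirror);  /* mm still alive */
    fake_mm_exit(&mm, &mirror);                 /* models an unclean exit */
    int after = driver_handle_fault(&mirror);   /* must be refused */

    return (before == 0 && after == -1 && mirror.faults_handled == 1) ? 0 : 1;
}
```

If .release is left NULL in this model, the second driver_handle_fault()
call proceeds with a pointer to an mm that has already been torn down,
which is the stale-pointer situation described above. A real driver
would also need to synchronize the callback against concurrent faults
and migrations.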

Even if people think that maybe that window is "small", it's not really
any smaller than lots of race condition problems that we've seen. And
it is definitely not that hard to hit: a well-directed stress test
involving multiple threads that are doing early process termination
while also doing lots of migrations and page faults should suffice.

It is probably best to add this patch to stable, for that reason.

thanks,
--
John Hubbard
NVIDIA