Re: [PATCH 2/4] mm/memory-failure.c: fix wrong user reference report

From: Miaohe Lin
Date: Mon Mar 07 2022 - 06:31:46 EST


On 2022/3/4 16:27, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Feb 28, 2022 at 10:02:43PM +0800, Miaohe Lin wrote:
>> The dirty swapcache page is still residing in the swap cache after it's
>> hwpoisoned. So there is always one extra refcount for swap cache.
>
> The diff seems fine at a glance, but let me have a few question to
> understand the issue more.
>
> - Is the behavior described above the effect of recent change on shmem where
> dirty pagecache is pinned on hwpoison (commit a76054266661 ("mm: shmem:
> don't truncate page if memory failure happens"). Or the older kernels
> behave as the same?
>
> - Is the behavior true for normal anonymous pages (not shmem pages)?
>

The behavior described above is aimed at swapcache not pagecache. So it should be
irrelevant with the recent change on shmem.

What I try to fix here is that me_swapcache_dirty holds one extra pin via SwapCache
regardless of the return value of delete_from_lru_cache. We should try to report more
accurate extra refcount for debugging purpose.

> I'm trying to test hwpoison hitting the dirty swapcache, but it seems that
> in my testing memory_faliure() fails with "hwpoison: unhandlable page"

Maybe memory_faliure is racing with page reclaim where page is isolated?

> warning at get_any_page(). So I'm still not sure that me_pagecache_dirty()
> fixes any visible problem.

IIUC, me_pagecache_dirty can't do much except for the corresponding address_space supporting
error_remove_page which can truncate the dirty pagecache page. But this may cause silent data
loss. It's better to keep the page stay in the pagecache until the file is truncated, hole
punched or removed as commit a76054266661 pointed out.

Thanks.

> > Thanks,
> Naoya Horiguchi
>
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
>> ---
>> mm/memory-failure.c | 6 +-----
>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 0d7c58340a98..5f9503573263 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -984,7 +984,6 @@ static int me_pagecache_dirty(struct page_state *ps, struct page *p)
>> static int me_swapcache_dirty(struct page_state *ps, struct page *p)
>> {
>> int ret;
>> - bool extra_pins = false;
>>
>> ClearPageDirty(p);
>> /* Trigger EIO in shmem: */
>> @@ -993,10 +992,7 @@ static int me_swapcache_dirty(struct page_state *ps, struct page *p)
>> ret = delete_from_lru_cache(p) ? MF_FAILED : MF_DELAYED;
>> unlock_page(p);
>>
>> - if (ret == MF_DELAYED)
>> - extra_pins = true;
>> -
>> - if (has_extra_refcount(ps, p, extra_pins))
>> + if (has_extra_refcount(ps, p, true))
>> ret = MF_FAILED;
>>
>> return ret;
>> --
>> 2.23.0