Re: [RFC] mm/vmscan.c: avoid possible long latency caused by too_many_isolated()

From: Tim Chen
Date: Thu Apr 22 2021 - 17:03:00 EST

On 4/22/21 1:57 PM, Yu Zhao wrote:
> On Thu, Apr 22, 2021 at 2:38 PM Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:
>>
>>
>>
>> On 4/22/21 1:30 PM, Yu Zhao wrote:
>>>
>>> HZ/10 is purely arbitrary but that's ok because we assume normally
>>> nobody hits it. If you do often, we need to figure out why and how not
>>> to hit it so often.
>>>
>>
>> Perhaps Zhengjun can test the proposed fix in his test case to see if the timeout value
>> makes any difference.
>
> Shakeel has another test that stresses page reclaim to the point that
> the kernel can livelock for two hours because of a large number of
> concurrent reclaimers stepping on each other. He might be able to
> share that test with you in case you are interested.

That would be great. Yes, we are interested in having the test.

Tim

>
> Also it's Hugh who first noticed that migration can isolate many pages
> and in turn block page reclaim. He might be able to help too, in case
> you are interested in the interaction between migration and page
> reclaim.
>
> Thanks.
>