Re: [PATCH v2] Revert "mm: skip CMA pages when they are not available"

From: 刘海龙(LaoLiu)
Date: Tue Mar 19 2024 - 08:27:33 EST


On 2024/3/19 19:09, Barry Song wrote:
> On Tue, Mar 19, 2024 at 4:56 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>>
>> On Fri 15-03-24 16:18:03, liuhailong@xxxxxxxx wrote:
>>> From: "Hailong.Liu" <liuhailong@xxxxxxxx>
>>>
>>> This reverts
>>> commit b7108d66318a ("Multi-gen LRU: skip CMA pages when they are not eligible")
>>> commit 5da226dbfce3 ("mm: skip CMA pages when they are not available")
>>>
>>> skip_cma may cause the system to stop responding. If a large share of the
>>> pages on the LRU lists are CMA pages and the system is low on memory, many
>>> tasks enter direct reclaim and waste CPU time in isolate_lru_pages() only
>>> to return.
>>>
>>> Tested this patch on an android-5.15 8G device.
>>> Reproducer:
>>> - cma_declare_contiguous 3G pages
>>> - set /proc/sys/vm/swappiness to 0 so that direct reclaim reclaims file
>>> pages only
>>> - run a memleak process in userspace
>>
>> Does this represent a sane configuration? CMA memory is unusable for
>> kernel allocations and memleak process is also hard to reclaim due to
>> swap suppression. Isn't such a system doomed to struggle to reclaim any
>> memory?
Yes. All processes in the system are hard to reclaim as well, and all of them
enter direct reclaim. With skip_cma in place, many of the processes that have
to skip CMA pages keep retrying: they scan, skip, and retry again inside
isolate_lru_pages(). Because system processes have high priority, some normal
processes (like kswapd) are preempted.
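
To make the scan/skip/retry behaviour above easier to picture, here is a toy
userspace model. It is not kernel code: toy_page, toy_scan and toy_isolate are
hypothetical names, and the model assumes that skipped pages do not count
against the scan budget, so a caller that cannot use CMA walks the whole list
when it holds mostly CMA pages.

```c
/* Toy userspace model, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	bool is_cma;		/* page sits in a CMA area */
};

struct toy_scan {
	size_t visited;		/* pages the loop looked at */
	size_t skipped;		/* CMA pages passed over */
	size_t isolated;	/* pages handed on for reclaim */
};

/*
 * Walk the list until nr_to_scan eligible pages were isolated or the list
 * ends. Assumption of this model: skipped pages do not consume the budget,
 * so r.isolated doubles as the budget counter.
 */
static struct toy_scan toy_isolate(const struct toy_page *lru, size_t nr_pages,
				   size_t nr_to_scan, bool can_use_cma)
{
	struct toy_scan r = { 0, 0, 0 };
	size_t i;

	assert(lru != NULL || nr_pages == 0);

	for (i = 0; i < nr_pages && r.isolated < nr_to_scan; i++) {
		r.visited++;
		if (lru[i].is_cma && !can_use_cma) {
			r.skipped++;	/* work done, nothing gained */
			continue;
		}
		r.isolated++;
	}
	return r;
}
```

With a list that is mostly CMA and can_use_cma set to false, visited grows to
the full list length while isolated stays small, which is the wasted CPU time
I am describing. With can_use_cma set to true, the walk stops as soon as the
budget is met.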


>> Btw. how does the same setup behave with the regular LRU
>> implementation? My guess would be that it would struggle as well.
>
> I assume the regular LRU implementation you are talking about is the LRU
> without skip_cma()?
>
> I remember Hailong mentioned something like " it also trigger memory psi
> event to allow admin do something to release memory" and " without
> patch the devices would kill camera process". So it seems the difference
> is if a killing will occur.
>
> Hailong, would you like to provide more detail?

The PSI event is triggered after psi_memstall_leave(). Many system processes
enter perform_reclaim, scan and skip pages, and leave without reclaiming
any. Each pass is fast, so lmkd (the userspace low-memory killer) cannot work
as it did before.
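
For context, a userspace monitor can register a PSI memory trigger roughly as
sketched below, following the interface described in the kernel's PSI
documentation. This is only a sketch and not lmkd's actual code: the helper
names (psi_format_trigger, psi_register, psi_wait) are hypothetical, and any
threshold or window values passed in are the caller's assumption.

```c
/* Sketch of a PSI trigger client. Helper names are hypothetical. */
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Build a trigger string such as "some 150000 1000000": stall threshold and
 * window are both in microseconds. Returns the string length, or -1 if buf
 * is too small.
 */
static int psi_format_trigger(char *buf, size_t len, const char *kind,
			      unsigned int stall_us, unsigned int window_us)
{
	int n;

	assert(buf != NULL && kind != NULL);
	n = snprintf(buf, len, "%s %u %u", kind, stall_us, window_us);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write the trigger to a pressure file; returns an fd to poll, or -1. */
static int psi_register(const char *path, const char *trigger)
{
	int fd = open(path, O_RDWR | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (write(fd, trigger, strlen(trigger) + 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Wait for one event: 1 on event, 0 on timeout, -1 on error. */
static int psi_wait(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	int n = poll(&pfd, 1, timeout_ms);

	if (n <= 0)
		return n;
	if (pfd.revents & POLLERR)
		return -1;
	return (pfd.revents & POLLPRI) ? 1 : 0;
}
```

A monitor would call psi_register("/proc/pressure/memory", trigger) once and
then loop on psi_wait(). The event only fires when the accumulated stall time
inside the window crosses the threshold.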

>
>> --
>> Michal Hocko
>> SUSE Labs
>>