Re: [PATCH v8 1/4] mm/madvise: pass task and mm to do_madvise

From: David Rientjes
Date: Wed Jun 24 2020 - 16:00:25 EST


On Mon, 22 Jun 2020, Minchan Kim wrote:

> Patch series "introduce memory hinting API for external process", v8.
>
> We now have MADV_PAGEOUT and MADV_COLD as madvise hinting APIs. With
> them, an application can give the kernel hints about which memory ranges
> are preferred for reclaim. However, on some platforms (e.g., Android), the
> information required to make the hinting decision is not known to the app.
> Instead, it is known to a centralized userspace daemon (e.g.,
> ActivityManagerService), and that daemon must be able to initiate reclaim
> on its own without any app involvement.
>
> To address this, the patch series introduces a new syscall,
> process_madvise(2). Basically, it is the same as the madvise(2) syscall,
> with the following differences.
>
> 1. It needs a pidfd of the target process to provide the hint.
>
> 2. It supports only MADV_{COLD|PAGEOUT|MERGEABLE|UNMERGEABLE} at the
> moment. Other madvise hints will be enabled when there are explicit
> requests from the community, to prevent unexpected bugs we couldn't support.
>
> 3. Only privileged processes can act on another process's
> address space.
>
> For more details on the new API, please see the "mm: introduce external
> memory hinting API" description in this patchset.
>
> This patch (of 4):
>
> In upcoming patches, do_madvise will be called from an external process
> context, so we shouldn't assume "current" is always the hinted process's
> task_struct.
>
> Furthermore, we must not access mm_struct via task->mm, but obtain it
> via access_mm() once (in the following patch) and only use that pointer
> [1], so pass it to do_madvise() as well. Note the vma->vm_mm pointers
> are safe, so we can use them further down the call stack.
>
> So let's pass *current* and current->mm as arguments to do_madvise. This
> does not change existing behavior, but prepares for the next patch and
> makes review easier.
>
> Note: io_madvise passes NULL as the target_task argument of do_madvise
> because it cannot know who the target is.
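
For readers following along, the refactor described above can be modelled
in plain userspace C. This is only a rough sketch under stated assumptions,
not the kernel code: the struct layouts, the current_task variable, and the
names do_madvise_implicit() and do_madvise_explicit() are all hypothetical
stand-ins. It shows that passing "current" and current->mm explicitly gives
the same result as the implicit form, and that a NULL target_task (the
io_madvise case) is still workable as long as the mm pointer is passed in
rather than derived from the task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; the real definitions differ. */
struct mm_struct {
    int last_behavior;  /* last hint recorded against this address space */
    int hint_count;     /* number of hints applied so far */
};

struct task_struct {
    struct mm_struct *mm;
};

/* Models the kernel's "current" task pointer (assumption for this sketch). */
static struct task_struct *current_task;

/* Old shape: the target is implicitly the calling task. */
static int do_madvise_implicit(int behavior)
{
    struct mm_struct *mm = current_task->mm;

    mm->last_behavior = behavior;
    mm->hint_count++;
    return 0;
}

/*
 * New shape: the caller supplies the target task and its mm.
 * target_task may be NULL, so the mm is never looked up through it.
 */
static int do_madvise_explicit(struct task_struct *target_task,
                               struct mm_struct *mm, int behavior)
{
    (void)target_task;  /* unused in this sketch */

    if (mm == NULL)
        return -1;

    mm->last_behavior = behavior;
    mm->hint_count++;
    return 0;
}
```

An existing caller would then use do_madvise_explicit(current_task,
current_task->mm, behavior), which updates the same mm_struct that the
implicit form would have touched.
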
>
> [1] http://lore.kernel.org/r/CAG48ez27=pwm5m_N_988xT1huO7g7h6arTQL44zev6TD-h-7Tg@xxxxxxxxxxxxxx
>
> [vbabka@xxxxxxx: changelog tweak]
> [minchan@xxxxxxxxxx: use current->mm for io_uring]
> Link: http://lkml.kernel.org/r/20200423145215.72666-1-minchan@xxxxxxxxxx
> [akpm@xxxxxxxxxxxxxxxxxxxx: fix it for upstream changes]
> [akpm@xxxxxxxxxxxxxxxxxxxx: whoops]
> [rdunlap@xxxxxxxxxxxxx: add missing includes]
> Link: http://lkml.kernel.org/r/20200302193630.68771-2-minchan@xxxxxxxxxx
> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> Reviewed-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Cc: Tim Murray <timmurray@xxxxxxxxxx>
> Cc: Daniel Colascione <dancol@xxxxxxxxxx>
> Cc: Sandeep Patil <sspatil@xxxxxxxxxx>
> Cc: Sonny Rao <sonnyrao@xxxxxxxxxx>
> Cc: Brian Geffon <bgeffon@xxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Shakeel Butt <shakeelb@xxxxxxxxxx>
> Cc: John Dias <joaodias@xxxxxxxxxx>
> Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> Cc: Alexander Duyck <alexander.h.duyck@xxxxxxxxxxxxxxx>
> Cc: SeongJae Park <sj38.park@xxxxxxxxx>
> Cc: Christian Brauner <christian@xxxxxxxxxx>
> Cc: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>
> Cc: Oleksandr Natalenko <oleksandr@xxxxxxxxxx>
> Cc: SeongJae Park <sjpark@xxxxxxxxx>
> Cc: Christian Brauner <christian.brauner@xxxxxxxxxx>
> Cc: <linux-man@xxxxxxxxxxxxxxx>

Acked-by: David Rientjes <rientjes@xxxxxxxxxx>