Re: [PATCH mm-unstable] mm/madvise: remove CAP_SYS_ADMIN requirement for process_madvise(MADV_COLLAPSE)

From: Michal Hocko
Date: Tue Aug 02 2022 - 08:04:22 EST


On Tue 02-08-22 02:48:33, Zach O'Keefe wrote:
[...]
> "mm/madvise: add MADV_COLLAPSE to process_madvise()" in the v7 series
> ended with me mentioning a couple options, but ultimately I didn't
> present a solution, and no consensus was reached[1]. After taking a
> closer look, this is my proposal for what I believe to be the best
> path forward. It should be squashed into the original patch. What do you think?

If it is agreed that the CAP_SYS_ADMIN is too strict of a requirement
then yes, this should be squashed into the original patch. There is no
real reason to create a potential bisection headache by changing the
permission model in a later patch.

>From my POV, I would agree that CAP_SYS_ADMIN is just too strict of a
requirement.

I didn't really have time to follow recent discussions but I would argue
that the operation is not really destructive or seriously harmful. All
applications can already have their memory (almost) equally THP
collapsed by khupaged with the proposed process_madvise semantic.

NOHUGEMEM and prctl opt out from THP are both honored AFAIU and the only
difference is the global THP killswitch behavior which I do not think
warrants the strongest CAP_SYS_ADMIN capability (especially because it
doesn't really control all kinds of THPs).

If there is a userspace agent collapsing memory and causing problems
then it can be easily fixed in the userspace. And I find that easier
to do than putting the bar so high that userspace agents would be
unfeasible because of CAP_SYS_ADMIN (which is nono in many cases as it
would allow essentially full control of other stuff). So from practical
POV, risking an extended RSS is really a negligible risk to lose a
potentially useful feature for all others.

Just my 2c

> Thanks again,
> Zach
>
> [1] https://lore.kernel.org/linux-mm/Ys4aTRqWIbjNs1mI@xxxxxxxxxx/

--
Michal Hocko
SUSE Labs