Re: [PATCH mm-unstable] mm/madvise: remove CAP_SYS_ADMIN requirement for process_madvise(MADV_COLLAPSE)

From: Yang Shi
Date: Thu Aug 04 2022 - 13:46:32 EST


On Tue, Aug 2, 2022 at 12:43 PM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
>
> On Tue, Aug 2, 2022 at 5:04 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Tue 02-08-22 02:48:33, Zach O'Keefe wrote:
> > [...]
> > > "mm/madvise: add MADV_COLLAPSE to process_madvise()" in the v7 series
> > > ended with me mentioning a couple options, but ultimately I didn't
> > > present a solution, and no consensus was reached[1]. After taking a
> > > closer look, this is my proposal for what I believe to be the best
> > > path forward. It should be squashed into the original patch. What do you think?
> >
> > If it is agreed that the CAP_SYS_ADMIN is too strict of a requirement
> > then yes, this should be squashed into the original patch. There is no
> > real reason to create a potential bisection headache by changing the
> > permission model in a later patch.
>
> Sorry about the confusion here. Assumed (incorrectly) that Andrew
> would kindly squash this in mm-unstable since I added the Fixes: tag.
> Next time I'll add some explicit verbiage saying it should be
> squashed.
>
> > From my POV, I would agree that CAP_SYS_ADMIN is just too strict of a
> > requirement.
> >
> > I didn't really have time to follow recent discussions but I would argue
> > that the operation is not really destructive or seriously harmful. All
> > applications can already have their memory (almost) equally THP
> > collapsed by khugepaged with the proposed process_madvise semantic.
> >
> > MADV_NOHUGEPAGE and the prctl opt-out from THP are both honored AFAIU and the only
> > difference is the global THP killswitch behavior which I do not think
> > warrants the strongest CAP_SYS_ADMIN capability (especially because it
> > doesn't really control all kinds of THPs).
>
> Ya. In fact, I don't think ignoring the THP sysfs controls
> warrants any additional capability (let alone CAP_SYS_ADMIN), since a
> malicious program can't really inflict any more damage than they would
> with CAP_SYS_NICE and PTRACE_MODE_READ.
>
> > If there is a userspace agent collapsing memory and causing problems
> > then it can be easily fixed in the userspace. And I find that easier
> > to do than putting the bar so high that userspace agents would be
> > unfeasible because of CAP_SYS_ADMIN (which is a no-no in many cases as it
> > would allow essentially full control of other stuff). So from practical
> > POV, an extended RSS is a negligible risk compared to losing a
> > potentially useful feature for all others.
> >
>
> Agreed.

+1

>
> Thanks for taking the time, Michal!
> Zach
>
>
> > Just my 2c
> >
> > > Thanks again,
> > > Zach
> > >
> > > [1] https://lore.kernel.org/linux-mm/Ys4aTRqWIbjNs1mI@xxxxxxxxxx/
> >
> > --
> > Michal Hocko
> > SUSE Labs