Re: [PATCH] mm/page_alloc: detect allocation forbidden by cpuset and bail out early

From: Feng Tang
Date: Wed Sep 08 2021 - 04:13:01 EST


On Wed, Sep 08, 2021 at 09:06:24AM +0200, Michal Hocko wrote:
> On Wed 08-09-21 09:50:14, Feng Tang wrote:
> > On Tue, Sep 07, 2021 at 10:44:32AM +0200, Michal Hocko wrote:
> [...]
> > > While this is a good fix from the functionality POV I believe you can go
> > > a step further. Please add a detection to the cpuset code and complain
> > > to the kernel log if somebody tries to configure movable only cpuset.
> > > Once you have that in place you can easily create a static branch for
> > > cpuset_insane_setup() and have zero overhead for all reasonable
> > > configuration. There shouldn't be any reason to pay a single cpu cycle
> > > to check for something that almost nobody does.
> > >
> > > What do you think?
> >
> > I thought about the implementation, IIUC, the static_branch_enable() is
> > easy, it could be done when cpuset.mems is set with movable only nodes,
> > but disable() is much complexer,
>
> Do we care about disable at all? The point is to not have 99,999999%
> users pay overhead of the check which is irrelevant to them. Once
> somebody wants to use this "creative" setup then paying an extra check
> sounds perfectly sensible to me. If somebody cares enough then the
> disable logic could be implemented. But for now I believe we should be
> OK with only enable case.

Makes sense to me, thanks!

> > as we may need a global reference
> > counter to track the set/unset, and the unset could be the time when
> > freeing the cpuset data structure, also one cpuset.mems could be changed
> > runtime, and system could have multiple cpuset dirs (user space usage
> > could be creative or crazy :)).
> >
> > While checking cpuset code, I thought more about configuring cpuset with
> > movable only nodes, that we may still have normal usage: mallocing a big
> > trunk of memory and do some scientific calculation, or AI training. It
> > works with current code.
>
> It might work but it would be inherently subtle because a single
> non-movable allocation will throw the whole thing off the cliff.

Yes, this is a valid concern.

Though I think when there is really usage reuqirement for cpuset
binding to HBM (High Bandwidth Memory) or PMEM, we may need to
reconsider to loose the limit for GFP_HIGHUSER, as the GFP_KERNEL
type of allocation is permitted.

Thanks,
Feng

> I do not think we want to even pretend we support such a setup.
> --
> Michal Hocko
> SUSE Labs