Re: [PATCHSET cgroup/for-3.12] cgroup: make cgroup_event specific to memcg

From: Michal Hocko
Date: Mon Aug 05 2013 - 15:16:52 EST

On Mon 05-08-13 12:29:58, Tejun Heo wrote:
> Hello, Michal.
> On Mon, Aug 05, 2013 at 06:01:07PM +0200, Michal Hocko wrote:
> > Could you be more specific about what is so "overboard" about this
> > interface? I am not familiar with internals much, so I cannot judge the
> > complexity part, but I thought that eventfd was intended for this kind
> > of kernel->userspace notifications.
> It's just way over-engineered like many other things in cgroup, most
> likely misguided by the appearance that cgroup could be delegated and
> accessed by multiple actors concurrently.

I keep hearing that over and over. And I also keep hearing that there
are users who do not like many of the proposed simplifications because
they break their usecases. Users are the ones who matter to me. Hey,
some of them are even sane...

> The most clear example would be the vmpressure event. When it could
> have just called fsnotify_modify() unconditionally when the state
> changes, now it involves parsing, dynamic list of events and so on
> without actually adding any benefits.

I am neither the author nor a user of this interface, but my
understanding is that different usecases have different requirements
and it would be hard to satisfy them without a way for userspace
to tell the kernel what it is interested in. There was a recent
discussion about edge- vs. all-events-triggered signaling, for example.

Besides that, is fsnotify really an interface to be used under memory
pressure? I might be wrong, but from a quick look fsnotify depends on
GFP_KERNEL allocations, which would be a no-go for oom_control at least.
And how does the reclaim context get to the struct file to notify? I am
pretty sure we would run into more and more questions when digging further.

I am all for simplifications, but removing interfaces just because you
feel they are "over-done" is not a way to go, IMHO. In this particular
case you are removing an interface from the cgroup core which has users,
and we will have to support them for a very long time. "It is just memcg,
so move it there" is not the way different subsystems should work
together, and I am _not_ going to ack such a move. All the flexibility
that you are complaining about is hidden from the cgroup core in the
register callbacks and the rest is only the core infrastructure
(registration and event delivery).
And btw. a common notification interface at least keeps things
consistent and prevents controllers from inventing their own
single-purpose notification mechanisms.

So I am really skeptical about this patch set. It doesn't gain anything.
It just moves code which you do not like out of your sight, hoping that
something will change.

Mistakes were made in the past, and some interfaces are really too
flexible, but that doesn't mean we should be militant about everything.

> For the usage ones, configurability makes some sense but even then
> just giving it a single array of event points of limited size would be
> sufficient.

This would be a question for the users. I am not one of them, so I
cannot tell you, but I certainly cannot claim that something more
coarse-grained would be sufficient either.

> It's just way over-done.

> > So you think that vmpressure, oom notification or thresholds are
> > an abuse of this interface? What would you consider a reasonable
> > replacement for those notifications? Or do you think that controller
> > shouldn't be signaling any conditions to the userspace at all?
> I don't think the ability to generate events are an abuse, just that
> the facility itself is way over-engineered. Just generate a file
> changed event unconditionally for vmpressure and oom and maybe
implement configurable cadence or a single threshold array for
threshold events. These are things which can and should be done in a
few tens of lines of code with a far simpler interface.

These are strong words without any justification.

Michal Hocko