Re: [PATCH v2] vmpressure: implement strict mode

From: Luiz Capitulino
Date: Thu Jun 27 2013 - 09:34:51 EST


On Thu, 27 Jun 2013 11:26:16 +0200
Michal Hocko <mhocko@xxxxxxx> wrote:

> On Wed 26-06-13 23:17:12, Luiz Capitulino wrote:
> > Currently, an eventfd is notified for the level it's registered for
> > _plus_ higher levels.
> >
> > This is a problem if an application wants to implement different
> > actions for different levels. For example, an application might want
> > to release 10% of its cache on level low, 50% on medium and 100% on
> > critical. To do this, an application has to register a different
> > eventfd for each pressure level. However, fd low is always going to
> > be notified and and all fds are going to be notified on level critical.
> >
> > Strict mode solves this problem by strictly notifiying an eventfd
> > for the pressure level it registered for. This new mode is optional,
> > by default we still notify eventfds on higher levels too.
> >
> > Signed-off-by: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
>
> Two nits bellow but it looks good in general to me. You can add my
> Reviewed-by: Michal Hocko <mhocko@xxxxxxx>

Thanks.

> I still think that edge triggering makes some sense but that one might
> be rebased on top of this patch. We should still figure out whether the
> edge triggering is the right approach for the use case Hyunhee Kim wants
> it for so the strict mode should go first IMO.

I also think that edge triggering makes sense.

> > ---
> >
> > o v2
> >
> > - Improve documentation
> > - Use a bit to store mode instead of a bool
> > - Minor changelog changes
> >
> > Documentation/cgroups/memory.txt | 26 ++++++++++++++++++++++----
> > mm/vmpressure.c | 26 ++++++++++++++++++++++++--
> > 2 files changed, 46 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> > index ddf4f93..412872b 100644
> > --- a/Documentation/cgroups/memory.txt
> > +++ b/Documentation/cgroups/memory.txt
> > @@ -791,6 +791,22 @@ way to trigger. Applications should do whatever they can to help the
> > system. It might be too late to consult with vmstat or any other
> > statistics, so it's advisable to take an immediate action.
> >
> > +Applications can also choose between two notification modes when
> > +registering an eventfd for memory pressure events:
> > +
> > +When in "non-strict" mode, an eventfd is notified for the specific level
> > +it's registered for and higher levels. For example, an eventfd registered
> > +for low level is also going to be notified on medium and critical levels.
> > +This mode makes sense for applications interested on monitoring reclaim
> > +activity or implementing simple load-balacing logic. The non-strict mode
> > +is the default notification mode.
> > +
> > +When in "strict" mode, an eventfd is strictly notified for the pressure
> > +level it's registered for. For example, an eventfd registered for the low
> > +level event is not going to be notified when memory pressure gets into
> > +medium or critical levels. This allows for more complex logic based on
> > +the actual pressure level the system is experiencing.
>
> It would be also fair to mention that there is no guarantee that lower
> levels are signaled before higher so nobody should rely on seeing LOW
> before MEDIUM or CRITICAL.

I think this is implied. Actually, as an user of this interface I didn't
expect this to happen until I read the code.

>
> > +
> > The events are propagated upward until the event is handled, i.e. the
> > events are not pass-through. Here is what this means: for example you have
> > three cgroups: A->B->C. Now you set up an event listener on cgroups A, B
> > @@ -807,12 +823,14 @@ register a notification, an application must:
> >
> > - create an eventfd using eventfd(2);
> > - open memory.pressure_level;
> > -- write string like "<event_fd> <fd of memory.pressure_level> <level>"
> > +- write string like "<event_fd> <fd of memory.pressure_level> <level> [strict]"
> > to cgroup.event_control.
> >
> > -Application will be notified through eventfd when memory pressure is at
> > -the specific level (or higher). Read/write operations to
> > -memory.pressure_level are no implemented.
> > +Applications will be notified through eventfd when memory pressure is at
> > +the specific level or higher. If strict is passed, then applications
> > +will only be notified when memory pressure reaches the specified level.
> > +
> > +Read/write operations to memory.pressure_level are no implemented.
> >
> > Test:
> >
> > diff --git a/mm/vmpressure.c b/mm/vmpressure.c
> > index 736a601..ba5c17e 100644
> > --- a/mm/vmpressure.c
> > +++ b/mm/vmpressure.c
> > @@ -138,8 +138,16 @@ struct vmpressure_event {
> > struct eventfd_ctx *efd;
> > enum vmpressure_levels level;
> > struct list_head node;
> > + unsigned int mode;
>
> You would fill up a hole between level and node if you move it up on
> 64b. Doesn't matter much but why not do it...

You want me to respin?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/