Re: [PATCH] sched: psi: fix unprivileged polling against cgroups

From: Suren Baghdasaryan
Date: Thu Oct 26 2023 - 12:55:40 EST


On Thu, Oct 26, 2023 at 9:49 AM Luca Boccassi <bluca@xxxxxxxxxx> wrote:
>
> On Thu, 26 Oct 2023 at 17:41, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> >
> > 519fabc7aaba ("psi: remove 500ms min window size limitation for
> > triggers") breaks unprivileged psi polling on cgroups.
> >
> > Historically, we had a privilege check for polling in the open() of a
> > pressure file in /proc, but were erroneously missing it for the open()
> > of cgroup pressure files.
> >
> > When unprivileged polling was introduced in d82caa273565 ("sched/psi:
> > Allow unprivileged polling of N*2s period"), it needed to filter
> > privileges depending on the exact polling parameters, and as such
> > moved the CAP_SYS_RESOURCE check from the proc open() callback to
> > psi_trigger_create(). Both the proc files as well as cgroup files go
> > through this during write(). This implicitly added the missing check
> > for privileges required for HT polling for cgroups.
> >
> > When 519fabc7aaba ("psi: remove 500ms min window size limitation for
> > triggers") followed right after to remove further restrictions on the
> > RT polling window, it incorrectly assumed the cgroup privilege check
> > was still missing and added it to the cgroup open(), mirroring what we
> > used to do for proc files in the past.
> >
> > As a result, unprivileged poll requests that would be supported now
> > get rejected when opening the cgroup pressure file for writing.

Ah, I see the problem. In our discussion
https://lore.kernel.org/all/ZADj4YX4uftK%2FFrh@xxxxxxxxxxx/ we decided
to have the check in open() to fail early but we never considered
unprivileged processes which only poll and never create any triggers.
Makes sense.

> >
> > Remove the cgroup open() check. psi_trigger_create() handles it.
> >
> > Fixes: 519fabc7aaba ("psi: remove 500ms min window size limitation for triggers")
> > Cc: stable@xxxxxxxxxxxxxxx # 6.5+
> > Reported-by: Luca Boccassi <bluca@xxxxxxxxxx>
> > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>
> Acked-by: Luca Boccassi <bluca@xxxxxxxxxx>

Acked-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>

>
> Thank you very much for the quick fix - this was reported originally
> on the systemd bug tracker by Daniel Black (I do not have an email
> address):
>
> https://github.com/systemd/systemd/issues/29723
>
> It is very important for systemd services to be able to do this
> without capabilities, as using capabilities means in turn user
> namespaces cannot be used (PrivateUsers=yes in systemd parlance).
>
> Kind regards,
> Luca Boccassi