Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

From: Thomas Gleixner
Date: Fri Sep 28 2018 - 14:20:39 EST


Alexey,

On Fri, 28 Sep 2018, Alexey Budankov wrote:
> On 28.09.2018 17:02, Thomas Gleixner wrote:
> > But this does also require a proper analysis and documentation why it is
> > not a security risk to expose the i915 PMU or what potential security
> > issues this can create, so that the competent sysadmin can make a
> > judgement.
> >
> > And the same is required for all other PMUs which can be enabled in the
> > same way for unprivileged access. And we might as well come to the
> > conclusion via analysis that for some PMUs unpriviledged access is just not
> > a good idea and exclude them. I surely know a few which qualify for
> > exclusion, so the right approach is to provide this knob only when the risk
> > is analyzed and documented and the PMU has been flagged as candidate for
> > unpriviledged exposure. I.e. opt in and not opt out.
>
> If you ask me then, IMHO, unprivileged access to CBOX pmu looks unsafe
> and is now governed by traditional *core* perf_event_paranoid setting.
>
> But *core* paranoid >= 1 (per-process mode) prevents simultaneous perf
> record sampling and perf stat -I reading from IMC, UPI, PCIe and other
> uncore counters.
>
> This kind of monitoring could make process performance observability
> thru Perf subsystem more flexible and better tailored for cloud and
> cluster environments. However it requires fine-tuning control
> capabilities in order to still keep system as secure as possible.
>
> Could i915, IMC, UPI, PCIe pmus be safe enough to be governed by
> a separate perf_event_paranoid settings?

Just to make it clear. I'm not against separate settings at all. But I'm
against adding knobs for every PMU the kernel supports wholesale without
analysis and documentation just because we can and somebody wants it.

Right now we have a single knob, which is poorly documented and that should
be fixed first. But some googling gives you the information that allowing
unprivilegded access is a security risk. So the security focussed sysadmin
will deny access to the PMUs no matter what.

Now we add more knobs without documentation what the exposure and risk of
each PMU is. The proposed patch set does this wholesale for every PMU
supported by the kernel. So the GPU user will ask the sysadmin to allow him
access. How can he make an informed decision? If he grants it then the next
user comes around and wants it for something else arguing that the other
got it for the GPU already. How can he make an informed decision about
that one?

We provide the knobs, so it's also our responsibility towards our users to
give them the information about their usage and scope. And every single PMU
knob has a different scope.

The documentation of the gazillion of knobs in /proc and /sysfs is not
really brilliant, but we should really not continue this bad practice
especially not when these knobs have potentially security relevant
implcations. Yes, I know, writing documentation is work, but it's valuable
and is appreciated by our users.

To make this doable and not blocked by requiring every PMU to be analyzed
and documented at once, I suggested to make this opt-in. Do analysis for a
given PMU, decide whether it should be exposed at all. If so, document it
proper and flip the bit. That way this can be done gradually as the need
arises and we can exclude the riskier ones completely.

I don't think this is an unreasonable request as it does not require the
i915 folks to look at PMUs they are not familiar with and does not get
blocked by waiting on every PMU wizard on the planet to have time.

Start with something like Documentation/admin-guide/perf-security.rst or
whatever name fits better and add a proper documentation for the existing
knob. With the infrastructure for fine grained access control add the
general explanation for fine grained access control. With each PMU which
opt's in for the knob, add a section with guidance about scope and risk for
this particular one.

Thanks,

tglx