Re: [PATCH] x86,seccomp,prctl: Remove PR_TSC_SIGSEGV and seccomp TSC filtering

From: Peter Zijlstra
Date: Fri Oct 03 2014 - 16:43:09 EST


On Fri, Oct 03, 2014 at 01:22:22PM -0700, Andy Lutomirski wrote:
> > So the problem with the default deny is that its:
> > 1) pointless -- the attacker can do sys_perf_event_open() just fine;
>
> Not if the attacker is in a seccomp sandbox.

Clearly :-)

> > 2) and expensive -- the people trying to measure performance get the
> > penalty of the CR4 write.
>
> Does this matter for performance measuring? I'm not 100% clear on how all
> the perf_event stuff gets used in practice, but, by my very vague
> understanding, there are two main workflows:
>
> a) perf record, etc: one process creates a ringbuffer and wakes up
> rarely to record the contents. The process being recorded doesn't
> have a perf_event mapped, so the cr4 switch will only happen when
> waking up the perf process.
>
> perf record prints stuff like "[ perf record: Woken up 1 times to
> write data ]", which seems to confirm my understanding.
>
> b) self-monitoring. A task mmaps a perf_event, does rdpmc, does
> something, and does rdpmc again. In that case, there's no context
> switch.

Agreed, but with the default disable, the self-monitoring task will get
CR4 writes to its context switches and will be slower.

Whereas if we default on and only disable with seccomp then only the
seccomp tasks will suffer the burden of a CR4 write at context switch
time.

And I don't see a security implication in the default.

> > So I would suggest a default on, but allow a disable for the seccomp
> > users, which might have also disabled the syscall. Note that is is
> > possible to disable RDPMC while still allowing the syscall.
>
> Disabling RDPMC per-process while still allowing the syscall will need
> a bunch of work, right?

Not much, all you need to do is make
perf_event_mmmap_page::cap_user_rdpmc be 0 and userspace should never
use rdpmc. Currently we set that bit based on the global sysfs rdpmc
value, but we could easy bitwise AND it with a per process value.

> What happens if the same perf_event is mapped
> by two different users?

Not entirely clear on what you mean here.

> We could make the rule be that RDPMC is enabled if a perf event is
> mmapped or TIF_SECCOMP is clear, but I'd prefer to be convinced that
> there's an actual performance issue first. Ideally we can get this
> all working with no API or ABI change at all.

Well I just want to not have extra CR4 writes by default (and by default
I don't care about seccomp).

> P.S. Hey, Intel, let us context switch RDPMC accessibility of the
> individual counters, please :)

Ha sure, make it more complicated why don't you ;-) But yes, that has
come up before.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/