Re: [PATCH] x86,seccomp,prctl: Remove PR_TSC_SIGSEGV and seccomp TSC filtering

From: Andy Lutomirski
Date: Mon Oct 06 2014 - 12:44:54 EST


On Sat, Oct 4, 2014 at 1:13 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Oct 03, 2014 at 02:15:24PM -0700, Andy Lutomirski wrote:
>> On Fri, Oct 3, 2014 at 2:12 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > On Fri, Oct 03, 2014 at 02:04:53PM -0700, Andy Lutomirski wrote:
>> >> On Fri, Oct 3, 2014 at 2:02 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> >
>> >> > Something like so.. slightly less ugly and possibly with more
>> >> > complicated conditions setting the cr4 if you want to fix tsc vs seccomp
>> >> > as well.
>> >>
>> >> This will crash anything that tries rdpmc in an allow-everything
>> >> seccomp sandbox. It's also not very compatible with my grand scheme
>> >> of allowing rdtsc to be turned off without breaking clock_gettime. :)
>> >
>> > Well, we clear cap_user_rdpmc, so everybody who still tries it gets what
>> > he deserves, no problem there.
>>
>> Oh, interesting.
>>
>> To continue playing devil's advocate, what if you do perf_event_open,
>> then mmap it, then start the seccomp sandbox?
>
> We update that cap bit on every update to the self-monitor state, and in
> a perfect world people would also check the cap bit every time they try
> and read it, and fall back to the syscall. So we could just clear it..
> but I can imagine reality ruining things here.

If nothing else, the fact that rdpmc fails with SIGSEGV instead of
with some nonsense value means that this will always be racy.
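
To make the race concrete, here is a minimal sketch of a self-monitoring
read that checks the cap bit on every read and falls back to the syscall.
It is not taken from any patch in this thread. The struct is a hypothetical
stand-in for the handful of perf_event_mmap_page fields involved, and
stub_rdpmc / stub_read are placeholders for the rdpmc instruction and for
read(2) on the perf fd, so the logic can run anywhere.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a few fields of struct perf_event_mmap_page;
 * the real definition lives in <linux/perf_event.h>. */
struct fake_mmap_page {
    volatile uint32_t lock;            /* seqlock, bumped around kernel updates */
    volatile uint32_t index;           /* hardware counter index + 1, 0 if none */
    volatile int64_t  offset;          /* added to the raw counter value */
    volatile uint32_t cap_user_rdpmc;  /* nonzero if user-space rdpmc is allowed */
};

typedef uint64_t (*counter_fn)(uint32_t idx);  /* placeholder for rdpmc */
typedef uint64_t (*syscall_fn)(void);          /* placeholder for read(2) */

/* Placeholders so the sketch can be exercised without a PMU. */
static uint64_t stub_rdpmc(uint32_t idx) { return 100 + idx; }
static uint64_t stub_read(void) { return 7; }

static uint64_t read_counter(const struct fake_mmap_page *pc,
                             counter_fn fast, syscall_fn slow)
{
    uint32_t seq, idx;
    uint64_t count;

    do {
        seq = pc->lock;
        atomic_thread_fence(memory_order_seq_cst);

        idx = pc->index;
        if (!pc->cap_user_rdpmc || idx == 0)
            return slow();  /* cap bit clear: use the syscall path */

        /*
         * Race window: if the kernel revokes rdpmc after the check above
         * but before the read below, a real rdpmc faults. A nonsense value
         * would be thrown away by the retry below; a SIGSEGV is not.
         */
        count = fast(idx - 1) + (uint64_t)pc->offset;

        atomic_thread_fence(memory_order_seq_cst);
    } while (pc->lock != seq);  /* retry if the kernel updated the page */

    return count;
}
```

The seqlock retry only helps when the fast read returns something, which
is why checking the cap bit before each read cannot close the window.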

>
>> My draft patches are currently tracking the number of perf_event mmaps
>> per mm. I'm not thrilled with it, but it's straightforward. And I
>> still need to benchmark cr4 writes, which is tedious, because I can't
>> do it from user code.
>
> Should be fairly straight fwd from kernel space, get a tsc stamp,
> read+write cr4 1000 times, get another tsc read, and maybe do that
> several times. No?

I tried it. Rough numbers on my 2.7 GHz Sandy Bridge laptop:

Writing to cr4 in VMX non-root (changing PCE) takes ~48ns. RMW cr4
takes roughly 51ns. IMO neither of these is enough to be worth
worrying *that* much about when switching into or out of a perf-using
task. But you might disagree with me.
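
For scale, the same numbers can be expressed in cycles. This is a small
sketch that assumes the clock runs at the nominal 2.7 GHz quoted above;
ns_to_cycles is a helper name made up for illustration.

```c
/* Convert a duration in nanoseconds to clock cycles at a given frequency.
 * 1 GHz is one cycle per nanosecond, so the product is already in cycles. */
static double ns_to_cycles(double ns, double ghz)
{
    return ns * ghz;
}
```

With ghz = 2.7, 48ns comes to about 130 cycles and 51ns to about 138.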

Changing TSD takes 700ns, because KVM has the VMCS programmed wrong.
I'll send a patch.

I suspect that the same experiment on bare metal would run faster.
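
The measurement loop described in the quoted text (take a timestamp, do
the operation 1000 times, take another timestamp, repeat several times)
can be sketched in user space as below. This is not the code used for the
numbers above. Writing cr4 needs kernel context, so counted_op is only a
placeholder operation, and clock_gettime(CLOCK_MONOTONIC) is swapped in
for raw TSC reads. All function names here are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std= modes */
#endif
#include <stdint.h>
#include <time.h>

typedef void (*op_fn)(void);

/* Placeholder for the operation under test; the real experiment would do
 * a cr4 read+write here, which only kernel code can do. */
static volatile unsigned long op_calls;
static void counted_op(void) { op_calls++; }

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time `iters` back-to-back calls of op, repeat that `runs` times, and
 * return the lowest per-call cost in nanoseconds seen in any run. */
static double best_ns_per_op(op_fn op, unsigned iters, unsigned runs)
{
    double best = -1.0;

    for (unsigned r = 0; r < runs; r++) {
        uint64_t t0 = now_ns();
        for (unsigned i = 0; i < iters; i++)
            op();
        uint64_t t1 = now_ns();

        double per_op = (double)(t1 - t0) / iters;
        if (best < 0.0 || per_op < best)
            best = per_op;
    }
    return best;
}
```

Calling best_ns_per_op(counted_op, 1000, 5) mirrors the "1000 times,
several times" shape. Taking the minimum over runs is one way to discount
interrupts and other noise; an in-kernel version would likely want
interrupts off around the loop instead.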


--Andy