Re: [PATCH] x86, kasan: add KASAN checks to atomic operations

From: Dmitry Vyukov
Date: Mon Mar 06 2017 - 09:25:23 EST


On Mon, Mar 6, 2017 at 2:01 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, Mar 06, 2017 at 01:58:51PM +0100, Peter Zijlstra wrote:
>> On Mon, Mar 06, 2017 at 01:50:47PM +0100, Dmitry Vyukov wrote:
>> > On Mon, Mar 6, 2017 at 1:42 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>> > > KASAN uses compiler instrumentation to intercept all memory accesses.
>> > > But it does not see memory accesses done in assembly code.
>> > > One notable user of assembly code is atomic operations. Frequently,
>> > > for example, an atomic reference decrement is the last access to an
>> > > object and a good candidate for a racy use-after-free.
>> > >
>> > > Add manual KASAN checks to atomic operations.
>> > > Note: we need checks only before asm blocks and don't need them
>> > > in atomic functions composed of other atomic functions
>> > > (e.g. load-cmpxchg loops).
>> >
>> > Peter, also pointed me at arch/x86/include/asm/bitops.h. Will add them in v2.
>> >
>>
>> > > static __always_inline void atomic_add(int i, atomic_t *v)
>> > > {
>> > > + kasan_check_write(v, sizeof(*v));
>> > > asm volatile(LOCK_PREFIX "addl %1,%0"
>> > > : "+m" (v->counter)
>> > > : "ir" (i));
>>
>>
>> So the problem is doing load/stores from asm bits, and GCC
>> (traditionally) doesn't try and interpret APP asm bits.
>>
>> However, could we not write a GCC plugin that does exactly that?
>> Something that interprets the APP asm bits and generates these KASAN
>> bits that go with it?
>
> Another suspect is the per-cpu stuff, that's all asm foo as well.


+x86, Mark

Let me provide more context and design alternatives.

There are also other archs, at least arm64 for now.
There are also other tools. For KTSAN (race detector) we will
absolutely need to hook into atomic ops. For KMSAN (uses of unit
values) we also need to understand atomic ops at least to some degree.
Both of them will require different instrumentation.
For KASAN we are also more interested in cases where it's more likely
that an object is touched only by an asm, but not by normal memory
accesses (otherwise we would report the bug on the normal access,
which is fine, this makes atomic ops stand out in my opinion).

We could involve compiler (and by compiler I mean clang, because we
are not going to touch gcc, any volunteers?).
However, it's unclear if it will be simpler or not. There will
definitely will be a problem with uaccess asm blocks. Currently KASAN
relies of the fact that it does not see uaccess accesses and the user
addresses are considered bad by KASAN. There can also be a problem
with offsets/sizes, it's not possible to figure out what exactly an
asm block touches, we can only assume that it directly dereferences
the passed pointer. However, for example, bitops touch the pointer
with offset. Looking at the current x86 impl, we should be able to
handle it because the offset is computed outside of asm blocks. But
it's unclear if we hit this problem in other places.
I also see that arm64 bitops are implemented in .S files. And we won't
be able to instrument them in compiler.
There can also be other problems. Is it possible that some asm blocks
accept e.g. physical addresses? KASAN would consider them as bad.

We could also provide a parallel implementation of atomic ops based on
the new compiler builtins (__atomic_load_n and friends):
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
and enable it under KSAN. The nice thing about it is that it will
automatically support arm64 and KMSAN and KTSAN.
But it's more work.

Re per-cpu asm. I would say that it's less critical than atomic ops.
Static per-cpu slots are not subject to use-after-free. Dynamic slots
can be subject to use-after-free and it would be nice to catch bugs
there. However, I think we will need to add manual
poisoning/unpoisoning of dynamic slots as well.

Bottom line:
1. Involving compiler looks quite complex, hard to deploy, and it's
unclear if it will actually make things easier.
2. This patch is the simplest short-term option (I am leaning towards
adding bitops to this patch and leaving percpu out for now).
3. Providing an implementation of atomic ops based on compiler
builtins looks like a nice option for other archs and tools, but is
more work. If you consider this as a good solution, we can move
straight to this option.