Re: [PATCH 15/19] vfs: Add superblock notifications [ver #16]

From: Jann Horn
Date: Fri Feb 21 2020 - 10:45:11 EST


On Fri, Feb 21, 2020 at 3:24 PM David Howells <dhowells@xxxxxxxxxx> wrote:
>
> Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> > > + if (!s->s_watchers) {
> >
> > READ_ONCE() ?
>
> I'm not sure it matters. It can only be set once, and the next time we read
> it we're inside the lock. And at this point, I don't actually dereference it,
> and if it's non-NULL, it's not going to change.

I'd really like these READ_ONCE() things to be *anywhere* the value
can concurrently change, for two reasons:

First, it tells the reader "keep in mind that this value may
concurrently change in some way, don't just assume that it'll stay the
same".

But also, it tells the compiler that if it generates multiple loads
here and assumes that they return the same value, *really* bad stuff
may happen. GCC has some really fun behavior when compiling a switch()
on a value that might change concurrently without using READ_ONCE():
It sometimes generates multiple loads, where the first load is used to
test whether the value is in a specific range and then the second load
is used for actually indexing into a table of jump destinations. If
the value is concurrently mutated from an in-bounds value to an
out-of-bounds value, this code will load a jump destination from
random out-of-bounds memory.

An example:

$ cat gcc-jump.c
int blah(int *x, int y) {
switch (*x) {
case 0: return y+1;
case 1: return y*2;
case 2: return y-3;
case 3: return y^1;
case 4: return y+6;
case 5: return y-5;
case 6: return y|1;
case 7: return y&4;
case 8: return y|5;
case 9: return y-3;
case 10: return y&8;
case 11: return y|9;
default: return y;
}
}
$ gcc-9 -O2 -c -o gcc-jump.o gcc-jump.c
$ objdump -dr gcc-jump.o
[...]
0000000000000000 <blah>:
0: 83 3f 0b cmpl $0xb,(%rdi)
3: 0f 87 00 00 00 00 ja 9 <blah+0x9>
5: R_X86_64_PC32 .text.unlikely-0x4
9: 8b 07 mov (%rdi),%eax
b: 48 8d 15 00 00 00 00 lea 0x0(%rip),%rdx # 12 <blah+0x12>
e: R_X86_64_PC32 .rodata-0x4
12: 48 63 04 82 movslq (%rdx,%rax,4),%rax
16: 48 01 d0 add %rdx,%rax
19: ff e0 jmpq *%rax
[...]


Or if you want to see a full example that actually crashes:

$ cat gcc-jump-crash.c
#include <pthread.h>

int mutating_number;

__attribute__((noinline)) int blah(int *x, int y) {
switch (*x) {
case 0: return y+1;
case 1: return y*2;
case 2: return y-3;
case 3: return y^1;
case 4: return y+6;
case 5: return y-5;
case 6: return y|1;
case 7: return y&4;
case 8: return y|5;
case 9: return y-3;
case 10: return y&8;
case 11: return y|9;
default: return y;
}
}

int blah_num;
void *thread_fn(void *dummy) {
while (1) {
blah_num = blah(&mutating_number, blah_num);
}
}

int main(void) {
pthread_t thread;
pthread_create(&thread, NULL, thread_fn, NULL);
while (1) {
*(volatile int *)&mutating_number = 1;
*(volatile int *)&mutating_number = 100000000;
}
}
$ gcc-9 -O2 -pthread -o gcc-jump-crash gcc-jump-crash.c -ggdb -Wall
$ gdb ./gcc-jump-crash
[...]
(gdb) run
[...]
Thread 2 "gcc-jump-crash" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7db6700 (LWP 33237)]
0x00005555555551a2 in blah (x=0x555555558034 <mutating_number>, y=0)
at gcc-jump-crash.c:6
6 switch (*x) {
(gdb) x/10i blah
0x555555555190 <blah>: cmp DWORD PTR [rdi],0xb
0x555555555193 <blah+3>: ja 0x555555555050 <blah+4294966976>
0x555555555199 <blah+9>: mov eax,DWORD PTR [rdi]
0x55555555519b <blah+11>: lea rdx,[rip+0xe62] # 0x555555556004
=> 0x5555555551a2 <blah+18>: movsxd rax,DWORD PTR [rdx+rax*4]
0x5555555551a6 <blah+22>: add rax,rdx
0x5555555551a9 <blah+25>: jmp rax
0x5555555551ab <blah+27>: nop DWORD PTR [rax+rax*1+0x0]
0x5555555551b0 <blah+32>: lea eax,[rsi-0x3]
0x5555555551b3 <blah+35>: ret
(gdb)


Here's a presentation from Felix Wilhelm, a security researcher who
managed to find a case in the Xen hypervisor where a switch() on a
value in shared memory was exploitable to compromise the hypervisor
from inside a guest (see slides 35 and following):
<https://www.blackhat.com/docs/us-16/materials/us-16-Wilhelm-Xenpwn-Breaking-Paravirtualized-Devices.pdf>

I realize that a compiler is extremely unlikely to make such an
optimization decision for a simple "if (!a->b)" branch; but still, I
would prefer to have READ_ONCE() everywhere where it is semantically
required, not just everywhere where you can think of a concrete
compiler optimization that will break stuff.

> > > + ret = add_watch_to_object(watch, s->s_watchers);
> > > + if (ret == 0) {
> > > + spin_lock(&sb_lock);
> > > + s->s_count++;
> > > + spin_unlock(&sb_lock);
> >
> > Where is the corresponding decrement of s->s_count? I'm guessing that
> > it should be in the ->release_watch() handler, except that there isn't
> > one...
>
> Um. Good question. I think this should do the job:
>
> static void sb_release_watch(struct watch *watch)
> {
> put_super(watch->private);
> }
>
> And this then has to be set later:
>
> init_watch_list(wlist, sb_release_watch);

(And as in the other case, the s->s_count increment will probably have
to be moved above the add_watch_to_object(), unless you hold the
sb_lock around it?)