Re: Memory corruption due to word sharing

From: Jiri Kosina
Date: Wed Feb 01 2012 - 12:12:10 EST

On Wed, 1 Feb 2012, Linus Torvalds wrote:

> But the compiler turns the access to the bitfield (in a 32-bit aligned
> word) into a 64-bit access that accesses the word *next* to it.
> That word next to it might *be* the lock, for example.
> So we could literally have this kind of situation:
> struct {
> atomic_t counter;
> unsigned int val:4, other:4, data:24;
> };
> and if we write code like this:
> spin_lock(&somelock);
> s->data++;
> spin_unlock(&somelock);
> and on another CPU we might do
> atomic_inc(&counter);
> and the access to the bitfield will *corrupt* the atomic counter, even
> though both of them are perfectly fine!
> Quite frankly, if the bug is simply because gcc doesn't actually know
> or care about the underlying size of the bitmask, it is possible that
> we can find a case where gcc clearly is buggy even according to the
> original C rules.
> Honza - since you have access to the compiler in question, try
> compiling this trivial test-program:
> struct example {
> volatile int a;
> int b:1;
> };
> ..
> s->b = 1;
> ..
> and if that bitfield access actually does a 64-bit access that also
> touches 's->a', then dammit, that's a clear violation of even the
> *old* C standard, and the gcc people cannot just wave away their bugs
> by saying "we've got standads, pttthththt".
> And I suspect it really is a generic bug that can be shown even with
> the above trivial example.

I have actually tried exactly this earlier today (because while looking at
this, I had an idea that putting volatile in place could be a workaround,
causing gcc to generate a saner code), but it doesn't work either:

# cat x.c
struct x {
long a;
volatile unsigned int lock;
unsigned int full:1;

wrong(struct x *ptr)
ptr->full = 1;

int main()
# gcc -O2 x.c
# gdb -q ./a.out
Reading symbols from /root/a.out...done.
(gdb) disassemble wrong
Dump of assembler code for function wrong:
0x40000000000005c0 <+0>: [MMI] adds r32=8,r32
0x40000000000005c1 <+1>: nop.m 0x0
0x40000000000005c2 <+2>: mov r15=1;;
0x40000000000005d0 <+16>: [MMI] ld8 r14=[r32];;
0x40000000000005d1 <+17>: nop.m 0x0
0x40000000000005d2 <+18>: dep r14=r15,r14,32,1;;
0x40000000000005e0 <+32>: [MIB] st8 [r32]=r14
0x40000000000005e1 <+33>: nop.i 0x0
0x40000000000005e2 <+34>: br.ret.sptk.many b0;;

In my opinion, this is a clear bug in gcc (while the original problem,
without explitict volatile, is not a C spec violation per se, it's just
very inconvenient :) ).

Jiri Kosina
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at