Re: [patch] uninline init_waitqueue_*() functions

From: Linus Torvalds
Date: Wed Jul 05 2006 - 19:12:14 EST




On Wed, 5 Jul 2006, Linus Torvalds wrote:
> >
> > c0fb6137: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
> > c0fb613e: 00
> > c0fb613f: c7 44 24 08 01 00 00 movl $0x1,0x8(%esp)
> > c0fb6146: 00
> > c0fb6147: c7 43 60 00 00 00 00 movl $0x0,0x60(%ebx)
> > c0fb614e: 8b 44 24 08 mov 0x8(%esp),%eax
> > c0fb6152: 89 43 5c mov %eax,0x5c(%ebx)

Btw, this is even worse than usual.

I've seen the "create struct on the stack and copy it" before, but looking
closer, this is doing something I haven't noticed before: that double
assignment to the stack is _really_ strange.

I wonder why it first stores a zero to the stack location, and then
overwrites it with the proper spinlock initialized (one).

As far as I can tell, the spinlock initializer should really end up being
a perfectly normal "(struct spinlock_t) { 1 }", and I don't see where the
zero comes from.

It may actually be that we're double penalized because for the lock
"counter", we use a "volatile unsigned int", and that "0" is from some
internal gcc "initialize all base structures to zero" to make sure that
any padding gets zeroed, and then the "volatile" means that gcc ends up
not optimizing it away, even though it was a totally bogus store that
didn't even exist in the sources.

I wonder if we should remove the "volatile". There really isn't anything
_good_ that gcc can do with it, but we've seen gcc code generation do
stupid things before just because "volatile" seems to just disable even
proper normal working.

[ Test-case built as time passes ]

Try compiling this example file with "-fomit-frame-pointer -O2 -S" to see
the effect:

horrid:
subl $16, %esp
movl $1, 12(%esp)
movl 12(%esp), %eax
movl %eax, a1
movl $1, b1
addl $16, %esp
ret

notbad:
movl $1, a2
movl $1, b2
ret


I really think that "volatile" in the kernel sources is _always_ a kernel
bug. It certainly seems to be so in this case.

(But at least with the compiler version I'm using, I'm not seeing that
extra unnecessary "movl $0" - I have gcc version 4.1.1 here)

Does removing just the "volatile" on the "slock" member shrink the size of
the kernel noticeably? And do you see an extra "movl $0" in this case with
your compiler version?

Maybe removing the volatile allows us to keep the initializer the way it
is (although going by past behaviour, I've seen gcc generate horrible code
in more complicated settings, so maybe the reason it works well on this
case without the "volatile" is just that it was simple enough that gcc
wouldn't mess up?)

Linus
---
struct duh {
volatile int i;
};

void horrid(void)
{
extern struct duh a1;
extern struct duh b1;

a1 = (struct duh) { 1 };
b1.i = 1;
}

struct gaah {
int i;
};

void notbad(void)
{
extern struct gaah a2;
extern struct gaah b2;

a2 = (struct gaah) { 1 };
b2.i = 1;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/