I am seeing a really strange problem with one of my drivers (the AceNIC
Gigabit Ethernet one) when I compile it statically into the kernel using
2.2.0pre4 and gcc-2.7.2.3 as shipped with Red Hat 5.2. When compiled as
a module, the code seems to be fine.
It looks to me as if it is related to the use of __initfunc(), and I am
wondering if we have a generic __initfunc() problem on the x86. If it
hits me, I would think it could easily hit other people. The question is
whether we need to do something drastic, like disabling __initfunc() on
the x86 for gcc 2.7.2.3 and older, for 2.2?
The code in question looks like this:
__initfunc(void ace_copy(struct ace_regs *regs, void *src, u32 dest, int size))
{
        int tsize;
        u32 tdest;

        if (size <= 0)
                return;

        while (size > 0) {
                tsize = min(((~dest & (ACE_WINDOW_SIZE - 1)) + 1),
                            min(size, ACE_WINDOW_SIZE));
                tdest = dest & (ACE_WINDOW_SIZE - 1);
                regs->WinBase = dest & ~(ACE_WINDOW_SIZE - 1);
If I remove the __initfunc() from ace_copy(), the generated code looks
like this, which works fine:
0x179c <ace_copy>: pushl %ebp
0x179d <ace_copy+1>: pushl %edi
0x179e <ace_copy+2>: pushl %esi
0x179f <ace_copy+3>: pushl %ebx
0x17a0 <ace_copy+4>: movl 0x1c(%esp,1),%ebp
0x17a4 <ace_copy+8>: cmpl $0x0,0x20(%esp,1)
0x17a9 <ace_copy+13>: jle 0x1816 <ace_copy+122>
0x17ab <ace_copy+15>: nop
0x17ac <ace_copy+16>: movl %ebp,%eax
0x17ae <ace_copy+18>: notl %eax
0x17b0 <ace_copy+20>: andl $0x7ff,%eax
0x17b5 <ace_copy+25>: incl %eax
0x17b6 <ace_copy+26>: movl 0x20(%esp,1),%ecx
0x17ba <ace_copy+30>: cmpl $0x800,%ecx
0x17c0 <ace_copy+36>: jbe 0x17c7 <ace_copy+43>
With the __initfunc() in place, the code looks like this, which bears a
certain similarity to random garbage in some places:
0x35368 <ace_copy>: pushl %ebp
0x35369 <ace_copy+1>: pushl %edi
0x3536a <ace_copy+2>: pushl %esi
0x3536b <ace_copy+3>: pushl %ebx
0x3536c <ace_copy+4>: movl 0x1c(%esp,1),%ebp
0x35370 <ace_copy+8>: cmpl $0x0,0x20(%esp,1)
0x35375 <ace_copy+13>: jle 0x353e2 <ace_copy+122>
0x35377 <ace_copy+15>: addb %cl,0x25d0f7e8(%ecx) <------- !!!
0x3537d <ace_copy+21>: incl (%edi)
0x3537f <ace_copy+23>: addb %al,(%eax)
0x35381 <ace_copy+25>: incl %eax
0x35382 <ace_copy+26>: movl 0x20(%esp,1),%ecx
0x35386 <ace_copy+30>: cmpl $0x800,%ecx
0x3538c <ace_copy+36>: jbe 0x35393 <ace_copy+43>
It looks to me like it could be related to branch alignment or something
similar: when the function is put in the init section, the padding code
(nop) after a branch is not generated properly.
I would be happy to find that this is a bug in my particular code, but
I can't see how it would cause this. Oh, and I've seen this on several
machines, so I am pretty sure it is not bad memory on this particular box.
Jes