Re: More on 2.1.129 oops

Philipp Rumpf (prumpf@jcsbs.lanobis.de)
Sun, 22 Nov 1998 21:31:10 +0100


On Sun, Nov 22, 1998 at 09:45:51PM +1100, Richard Gooch wrote:
> then I get an oops. Further, having
> CONFIG_BLK_DEV_IDE=y
> CONFIG_BLK_DEV_IDEDISK=n
> CONFIG_BLK_DEV_CMD640=n
> CONFIG_BLK_DEV_RZ1000=n
> CONFIG_BLK_DEV_IDEPCI=n
> CONFIG_BLK_DEV_IDEDMA=n
> CONFIG_IDEDMA_AUTO=n
>
> is still OK. Unfortunately, this information probably doesn't help

I have a theory about that:

In do_basic_setup, we have:

kernel_thread(bdflush, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND);
kernel_thread(kswapd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND);

In assembly, this looks like:

Register setup (%eax = __NR_clone, %edx = arg, %ecx = fn, %ebx = flags | CLONE_VM):
movl $0x78,%eax
xorl %edx,%edx
movl $0xc012a58c,%ecx
movl $0xf00,%ebx

Save stack pointer:
movl %esp,%esi

Clone:
int $0x80

"Exit" (i.e. return to the caller, in the parent) if the stack pointer is unchanged:
cmpl %esp,%esi
je c01e8b20 <do_basic_setup+40>

Otherwise (child), push the argument and call the function (bdflush):
pushl %edx
call *%ecx
and exit (%eax = __NR_exit):
movl $0x1,%eax
int $0x80
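
The parent/child split in the listing above (one clone() call returns in two contexts; the parent returns, the child pushes arg, calls fn and exits) can be mimicked in user space as a rough analogy. This sketch assumes POSIX fork() and tells the two apart by fork()'s return value, whereas the inlined kernel_thread() compares %esp with the saved %esi. kernel_thread_sketch() and worker() are hypothetical names, and passing fn's return value to _exit() is only a convenience of the sketch.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), WEXITSTATUS() for callers */
#include <unistd.h>

/* Hypothetical user-space analogue of the inline kernel_thread():
 * one call returns in two processes. The parent gets the child's
 * pid; the child runs fn(arg) and exits without ever returning. */
static pid_t kernel_thread_sketch(int (*fn)(void *), void *arg)
{
    pid_t pid;

    fflush(stdout);          /* avoid duplicating buffered output in the child */
    pid = fork();            /* stands in for "int $0x80" with %eax = __NR_clone */
    if (pid != 0)            /* parent (or -1 on error): the "je" branch */
        return pid;
    _exit(fn(arg));          /* child: "pushl %edx; call *%ecx", then exit */
}

/* Example thread body; flushes stdout because _exit() does not. */
static int worker(void *arg)
{
    printf("child: worker(%s)\n", arg ? (const char *)arg : "NULL");
    fflush(stdout);
    return 7;
}
```

Calling kernel_thread_sketch(worker, "bdflush") in the parent returns the child's pid, and waitpid() on that pid reports exit status 7. The sketch has no init section, so it shows the control flow only, not the bug itself.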

The problem is that, because kernel_thread() is inlined, this
code is part of do_basic_setup() and therefore sits in the init
section, which is cleared by free_initmem(), called immediately
after do_basic_setup().

To ensure that the calls can be made (i.e., that
call *%ecx has not been overwritten by zeroes by the time
the child reaches it), we would need a guarantee that
clone() returns in the child process first.

I do not think this is guaranteed, and that would cause the
error Richard described (look at the two oopses and at the
PIDs in his initial bug report). It would also explain why
the stack pointer looked strange to Linus.

Richard may not get the error when he compiles the kernel
with CONFIG_BLK_DEV_IDE because the code in device_setup()
takes much less time when this option is disabled; with it
enabled, the children have more time to get past the init
code before free_initmem() runs.

Two workarounds come to mind:

1) make kernel_thread a real (out-of-line) function instead
of an inline one, so that its code is not freed along with
the init section.

2) add a delay, use a semaphore, or find some other way to
notify the parent process that its children are now safely
executing.
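
Workaround 2 could look roughly like the following user-space sketch. It uses POSIX threads and an unnamed POSIX semaphore rather than kernel primitives; init_table, trampoline() and basic_setup_sketch() are hypothetical names, and the table merely stands in for the init-section code that free_initmem() releases. Each child posts the semaphore once it no longer needs the "init" memory, and the parent waits for one post per child before clearing it.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define NTHREADS 2

/* Hypothetical stand-in for the init section: each child fetches its
 * entry point from here, and the parent wipes it afterwards, the way
 * free_initmem() releases the real init code. */
static void (*init_table[NTHREADS])(void);

static sem_t started;            /* posted once by each child */
static pthread_t tids[NTHREADS];
static int did_run[NTHREADS];    /* set by the thread bodies */

static void bdflush_body(void) { did_run[0] = 1; }
static void kswapd_body(void)  { did_run[1] = 1; }

/* Child side: last touch of "init" memory, then notify the parent. */
static void *trampoline(void *arg)
{
    size_t i = (size_t)arg;
    void (*fn)(void) = init_table[i];

    sem_post(&started);          /* "I am safely past the init code" */
    fn();                        /* runs without needing init_table */
    return NULL;
}

/* Parent side: start the children, wait until every one of them has
 * posted, and only then clear the table. Returns 0 on success. */
static int basic_setup_sketch(void)
{
    size_t i;

    if (sem_init(&started, 0, 0) != 0)
        return -1;
    init_table[0] = bdflush_body;
    init_table[1] = kswapd_body;
    for (i = 0; i < NTHREADS; i++)
        if (pthread_create(&tids[i], NULL, trampoline, (void *)i) != 0)
            return -1;
    for (i = 0; i < NTHREADS; i++)
        while (sem_wait(&started) != 0)
            if (errno != EINTR)  /* retry only if interrupted by a signal */
                return -1;
    for (i = 0; i < NTHREADS; i++)
        init_table[i] = NULL;    /* the free_initmem() step */
    return 0;
}
```

A kernel version would have to use the kernel's own primitives instead; the point of the design is only that the parent blocks until every child has reported in, so the ordering no longer depends on how long device_setup() happens to take.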

Philipp Rumpf
