Re: Race condition in module load causing undefined symbols

From: Steve Lord
Date: Fri Jun 10 2005 - 14:10:09 EST


Andrew Morton wrote:
Stephen Lord <lord@xxxxxxx> wrote:

I am having trouble getting any recent kernel to boot successfully
on one of my machines, a generic 2.6GHz P4 box with HT enabled
running an updated Fedora Core 3 distro. This is present in
2.6.12-rc6. It does not manifest itself with the Fedora Core
kernels which have identical initrd contents as far as the
init script and the set of modules included goes.

The problem manifests itself as various undefined symbols from
module loads.


Peculiar. Module loading is all synchronous, isn't it?


Hmm, now that I found the code, yes it is. insmod itself appears
to do no fancy footwork either.



...
The failures are different on different boots, sometimes the ata_piix
module cannot find symbols from libata, sometimes ext3 cannot find jbd
symbols, sometimes dm modules cannot find things from dm-mod, usually
it is a combination of these. End result is a panic when it cannot
find the root device.

From the behavior, it appears that a module load is returning
control to user space before the previous one has got its symbols
loaded.


I wonder if rather than the intermittency being time-based, it is
load-address-based? For example, suppose there's a bug in the symbol
lookup code?

Have you tried using a different gcc version?


Don't have one handy at the moment, I am away from the machine right
now as well. I have been updating the machine using Red Hat's update
tools, so the compiler should be the same one I have here:

gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.fc3)

That should also be a fairly common compiler variant.

I presume this is what Red Hat does its kernel builds with, so that
should be the same too. Shouldn't the memory map be pretty much
identical on each boot? Things are pretty deterministic at this
stage in the process, and the symbol match failures are not always
the same.

If this was a memory problem it seems like I would see more random
oopses than this. I added more memory to the machine a month or so
back, and had to detune the BIOS settings a little to make it stable.
It would be odd that a 2.6.11 kernel was rock solid and a 2.6.12-rc6
falls over so quickly if that was the case.

I can play with the init script some and maybe dump out the symbol
table after an insmod.
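
Something along these lines is what I have in mind for checking the dump,
as a rough sketch only. It assumes the kernel is built with CONFIG_KALLSYMS
so that /proc/kallsyms exists, and that module symbols appear there as
"address type name [module]". The function names are made up for this
example, and since the initrd will not have Python in it, the text would
have to be captured after each insmod and checked afterwards.

```python
def symbols_by_module(kallsyms_text):
    """Group symbol names by owning module from /proc/kallsyms-style text."""
    modules = {}
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Assumed format: module symbols carry a fourth field such as
        # "[libata]"; built-in kernel symbols have only three fields.
        if (len(fields) == 4
                and fields[3].startswith("[")
                and fields[3].endswith("]")):
            modules.setdefault(fields[3][1:-1], set()).add(fields[2])
    return modules


def missing_symbols(kallsyms_text, module, wanted):
    """Return the names in `wanted` that `module` does not list."""
    present = symbols_by_module(kallsyms_text).get(module, set())
    return sorted(set(wanted) - present)
```

Feeding it the dump taken right after "insmod libata" together with the
names that ata_piix later complains about would show whether the symbols
were really absent at that point, or present but not found by the lookup.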

Steve