Re: [3.8-rc3 -> 3.8-rc4 regression] Re: [PATCH] module, async: async_synchronize_full() on module init iff async is used

From: Tejun Heo
Date: Tue Dec 03 2013 - 10:20:02 EST


Hello,

On Tue, Dec 03, 2013 at 08:28:43AM -0600, Josh Hunt wrote:
> You're right. Thanks for pointing this out. I did not realize there
> was a bug in the init script. The version of initramfs-tools I was
> using had the following bug:
> https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1215911
>
> Updating to 0.99ubuntu13.4 of initramfs-tools resolved my boot hangs.
>
> I did try using the workaround as suggested by Linus. In my setup the
> dm_init() code was hit, however it still appeared to be too late at
> times. I also tried moving the call to async_synchronize_full() above
> the for loop and it still had the same issue (patch attached.) Out of
> around 10 reboot tests it failed to find root 1 or 2 times.
>
> The ubuntu scripts don't ever actually call do_mount() if it can't
> find the device. It seems to rely on some udev functionality to tell
> it when the device is present, and if that fails it just bails out.
>
> This change has introduced a regression. However, I only noticed it
> b/c my init script had a bug which caused it not to wait around for
> the device to appear.

Hmmm.... so, I read the bug report, dug and asked around a bit.
Here's the root problem - ubuntu's initramfs uses a tool to wait for
the root device, and that tool uses libudev to listen for the device
event; unfortunately, its rx buffer is not set large enough and the
receiver isn't fast enough, which means that netlink broadcast
messages from the kernel can overrun the buffer. When that happens,
the kernel sets an error on the socket, so the next recv fails with
-ENOBUFS. If that happens, the wait for root aborts immediately and
initramfs proceeds to mount a non-existent root device.
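
To illustrate the failure mode described above, here is a minimal
sketch of a receive loop that treats ENOBUFS as "events were lost,
rescan" rather than as a fatal error. It is not the Ubuntu tool's or
libudev's actual code. The names wait_for_device, wanted,
device_present and max_events are hypothetical, and the substring
match on the payload is an assumption made for brevity.

```python
import errno


def wait_for_device(sock, wanted, device_present, max_events=1000):
    """Return True once the device is seen, False if we give up.

    sock           -- object with recv(bufsize); a netlink uevent
                      socket in real use (assumption)
    wanted         -- bytes expected inside the event payload
                      (hypothetical match rule)
    device_present -- callable that rescans for the device, for
                      example by checking whether the node exists
    """
    for _ in range(max_events):
        try:
            msg = sock.recv(8192)
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise
            # Overrun: some events were dropped, possibly the one we
            # are waiting for. Rescan instead of aborting the wait.
            if device_present():
                return True
            continue
        if wanted in msg:
            return True
    return False
```

The point of the sketch is only the except branch: an overrun means
the event stream is no longer complete, so the waiter has to fall
back to checking the current state instead of giving up.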

The only thing these patches change is the timing of events. The
problem likely wasn't as exposed before because things were slow
enough that either the messages could be consumed fast enough or
there was enough delay between the libata module load and the root
device wait to hide the bug in the wait logic.

So, yeah, it's a full blown timing bug. I'm not sure what we can do
to work around it from the kernel side except for randomly slowing
things down or forcefully enlarging the rx buffer size. There really
is no interlocking to take advantage of. :(
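
For comparison, the receiver itself can ask for a larger rx buffer.
The sketch below uses the standard SO_RCVBUF socket option; libudev
exposes the same knob as udev_monitor_set_receive_buffer_size(). The
helper name enlarge_rcvbuf and the 64 KiB figure in the usage note
are illustrative assumptions, not values taken from the bug report.

```python
import socket


def enlarge_rcvbuf(sock, requested):
    """Ask for a larger receive buffer and return the effective size.

    The kernel may not grant exactly what was requested: on Linux an
    unprivileged request is capped by net.core.rmem_max, and the
    value read back includes bookkeeping overhead.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

A caller would do something like enlarge_rcvbuf(sock, 64 * 1024)
right after creating the socket. A bigger buffer only makes the
overrun less likely; the receiver still has to cope with ENOBUFS.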

Thanks.

--
tejun