Re: [3.8-rc3 -> 3.8-rc4 regression] Re: [PATCH] module, async:async_synchronize_full() on module init iff async is used

From: Josh Hunt
Date: Wed Dec 04 2013 - 18:02:00 EST


On Tue, Dec 3, 2013 at 9:19 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello,
>
> On Tue, Dec 03, 2013 at 08:28:43AM -0600, Josh Hunt wrote:
>> You're right. Thanks for pointing this out. I did not realize there
>> was a bug in the init script. The version of initramfs-tools I was
>> using had the following bug:
>> https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1215911
>>
>> Updating to 0.99ubuntu13.4 of initramfs-tools resolved my boot hangs.
>>
>> I did try using the workaround as suggested by Linus. In my setup the
>> dm_init() code was hit, however it still appeared to be too late at
>> times. I also tried moving the call to async_synchronize_full() above
>> the for loop and it still had the same issue (patch attached.) Out of
>> around 10 reboot tests it failed to find root 1 or 2 times.
>>
>> The Ubuntu scripts never actually call do_mount() if they can't
>> find the device. They seem to rely on some udev functionality to tell
>> them when the device is present, and if that fails they just bail out.
>>
>> This change has introduced a regression. However, I only noticed it
>> because my init script had a bug which caused it not to wait around for
>> the device to appear.
>
> Hmmm.... so, read the bug report, dug and asked around a bit.
> Here's the root problem - Ubuntu's initramfs uses a tool to wait for
> the root device which uses libudev to listen for the device event;
> unfortunately, its rx buffer is not set large enough and the receiver
> isn't fast enough, which means that netlink broadcast messages from
> the kernel can overrun the buffer. When that happens, it sets an
> error on the socket, so the next recv fails with -ENOBUFS. If that
> happens, the wait for root aborts immediately and initramfs proceeds
> to mount a non-existent root device.
>
> The only thing these patches change is the timing of events.
> The problem likely wasn't as exposed before because things were slow
> enough that either the messages could be consumed in time or there
> was enough delay between libata module load and the root device
> wait to hide the bug in the wait logic.
>
> So, yeah, it's a full-blown timing bug. I'm not sure what we can do
> to work around it from the kernel side except for randomly slowing
> things down or forcefully enlarging the rx buffer size. There really
> is no interlocking to take advantage of. :(
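The failure mode described above (netlink broadcasts overrunning the listener's rx buffer, the next recv() failing with ENOBUFS, and the wait for root giving up) can be illustrated with a minimal userspace sketch. It assumes a listener that reads raw kernel uevents from a NETLINK_KOBJECT_UEVENT socket instead of going through libudev, and the helper names classify_recv_error(), open_uevent_socket(), device_present() and wait_for_root() are hypothetical; this is not the initramfs-tools code nor the fix from the Launchpad bug. It shows the two mitigations mentioned: enlarging the receive buffer (SO_RCVBUFFORCE, falling back to SO_RCVBUF) and treating ENOBUFS as "events were lost, look at the device again" rather than as a reason to abort.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical classification of recv() failures on a uevent socket. */
enum recv_action { RECV_RETRY, RECV_RESCAN, RECV_ABORT };

enum recv_action classify_recv_error(int err)
{
	switch (err) {
	case EINTR:
	case EAGAIN:	/* also what an SO_RCVTIMEO timeout reports */
		return RECV_RETRY;
	case ENOBUFS:	/* broadcasts overran the rx buffer: events were lost */
		return RECV_RESCAN;
	default:
		return RECV_ABORT;
	}
}

/* Open a kernel uevent listener with a larger rx buffer and a 1 s recv timeout. */
int open_uevent_socket(int rcvbuf_bytes)
{
	struct sockaddr_nl addr;
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

	if (fd < 0)
		return -1;
	/* SO_RCVBUFFORCE may exceed net.core.rmem_max but needs CAP_NET_ADMIN;
	 * without that capability, fall back to SO_RCVBUF (capped by rmem_max). */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE,
		       &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0)
		setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
			   &rcvbuf_bytes, sizeof(rcvbuf_bytes));
	/* Bound each recv() so the wait loop below cannot block forever. */
	setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1;	/* multicast group used for kernel uevents */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

static bool device_present(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0;
}

/*
 * Wait for root_path to appear, for at most max_iterations recv() calls.
 * A lost event (ENOBUFS) only costs one extra loop iteration, because the
 * device itself is re-checked on every pass instead of trusting the events.
 */
bool wait_for_root(int fd, const char *root_path, int max_iterations)
{
	char buf[8192];

	assert(fd >= 0 && root_path != NULL);
	for (int i = 0; i < max_iterations; i++) {
		if (device_present(root_path))
			return true;
		if (recv(fd, buf, sizeof(buf), 0) >= 0)
			continue;	/* some uevent arrived: re-check the device */
		if (classify_recv_error(errno) == RECV_ABORT)
			break;		/* genuine socket error: stop listening */
		/* RECV_RETRY or RECV_RESCAN: loop and re-check the device */
	}
	return device_present(root_path);
}
```

A caller might do fd = open_uevent_socket(1 << 20) followed by wait_for_root(fd, "/dev/sda1", 30); the 1 MiB buffer, the device path and the 30-iteration budget are illustrative values only. Enlarging the buffer makes overruns less likely, but only the rescan-on-ENOBUFS logic removes the dependence on timing.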

So there used to be a call to async_synchronize_full() in
ata_host_register(), but it was removed by
f29d3b23238e1955a8094e038c72546e99308e61 as part of some fastboot
changes. Adding it back (in the attached patch) seems to resolve the
issue when using the broken initrd. I'm guessing adding it back isn't
an option, but I wanted to point it out.

--
Josh
Index: b/drivers/ata/libata-core.c
===================================================================
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6181,12 +6181,14 @@ int ata_host_register(struct ata_host *h
 	/* perform each probe asynchronously */
 	for (i = 0; i < host->n_ports; i++) {
 		struct ata_port *ap = host->ports[i];
 		async_schedule(async_port_probe, ap);
 	}
 
+	async_synchronize_full();
+
 	return 0;
 
  err_tadd:
 	while (--i >= 0) {
 		ata_tport_delete(host->ports[i]);
 	}
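The effect of the one-line change in the patch can be approximated in userspace. The sketch below is only an analogy, not the kernel's async API: it uses POSIX threads, and the names probe_port(), register_host() and N_PORTS are hypothetical. Each port probe is started on its own thread, roughly like async_schedule(async_port_probe, ap), and joining every thread before returning stands in for async_synchronize_full(). Note that the kernel call waits for all outstanding async work in the system, not just the ports of one host, so the join here is narrower.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define N_PORTS 4	/* hypothetical number of ports on one host */

static atomic_int ports_probed;

/* Stand-in for async_port_probe(): pretend each probe takes ~10 ms. */
static void *probe_port(void *arg)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	assert(arg == NULL);	/* no per-port data in this sketch */
	nanosleep(&delay, NULL);
	atomic_fetch_add(&ports_probed, 1);
	return NULL;
}

/*
 * Stand-in for ata_host_register(): start one probe per port and, if
 * 'synchronize' is set, wait for all of them before returning (the role
 * async_synchronize_full() plays in the patch). Returns how many probes
 * had finished at the moment of return, or -1 on error.
 */
int register_host(bool synchronize)
{
	pthread_t tids[N_PORTS];

	atomic_store(&ports_probed, 0);
	for (int i = 0; i < N_PORTS; i++) {
		if (pthread_create(&tids[i], NULL, probe_port, NULL) != 0) {
			while (--i >= 0)	/* unwind, like the err_tadd path */
				pthread_join(tids[i], NULL);
			return -1;
		}
	}
	for (int i = 0; i < N_PORTS; i++) {
		if (synchronize)
			pthread_join(tids[i], NULL);
		else
			pthread_detach(tids[i]);	/* fire and forget */
	}
	return atomic_load(&ports_probed);
}
```

Build with -pthread. register_host(true) returns N_PORTS, while register_host(false) will typically return 0 because the probes are still sleeping. The price of synchronizing is that the caller blocks until the slowest probe finishes, which is presumably the boot-time cost the fastboot change set out to avoid.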