Re: 3.4.0-02580-g72c04af regression on sparc64 - partitions not recognized

From: James Bottomley
Date: Wed May 23 2012 - 18:56:55 EST


On Wed, 2012-05-23 at 14:04 -0400, David Miller wrote:
> From: Meelis Roos <mroos@xxxxxxxx>
> Date: Wed, 23 May 2012 19:46:46 +0300 (EEST)
>
> CC:'ing interested parties.
>
> >> > Just tested 3.4.0-02580-g72c04af on about 10 machines. While most of
> >> > them work (including 3 different sparc64 machines with real scsi disks),
> >> > Sun Netra X1 with pata_ali and IDE disk consistently fails to boot. sda
> >> > is recognized but no partitions. 3.3.0 works fine, as did something
> >> > around 3.4-rc7 (plain 3.4 not tested yet). No other IDE machines tested
> >> > yet since I have none with remote console at the moment.
> >>
> >> If 3.4.0-final is OK, start bisecting from v3.4.0 until 72c04af. One
> >> possibility could be the sparc64 NOBOOTMEM conversion that went into
> >> the merge window.
> >
> > Bisecting leads to this commit:
> >
> > a7a20d103994fd760766e6c9d494daa569cbfe06 is the first bad commit
> > commit a7a20d103994fd760766e6c9d494daa569cbfe06
> > Author: Dan Williams <dan.j.williams@xxxxxxxxx>
> > Date: Thu Mar 22 17:05:11 2012 -0700
> >
> > [SCSI] sd: limit the scope of the async probe domain

My theory is that this is an init problem: a lot of our code assumes
that async_synchronize_full() waits for everything, even the
domain-specific async schedules, which isn't true.

The code in init that makes this assumption is wait_for_device_probe().
There's also a fun async_synchronize_full() in init_post() that assumes
it can free the init memory afterwards, which would fail badly if
anything in init used an async domain.
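
To make the assumption concrete, here is a toy user-space model of it. This
is not kernel code: every toy_* name is hypothetical, and "waiting" is
modelled as simply draining a counter. It only illustrates the behaviour
described above, where the full synchronize covers the default domain and
leaves work scheduled in a private domain outstanding.

```c
#include <stdbool.h>

/* Toy model only: a "domain" is just a count of unfinished work items. */
struct toy_domain {
    int pending;
};

/* Stands in for the default (global) async domain. */
struct toy_domain toy_global_domain;
/* Stands in for a private domain such as the sd probe domain (assumption). */
struct toy_domain toy_sd_domain;

/* Queue one work item in the given domain. */
void toy_schedule(struct toy_domain *d)
{
    d->pending++;
}

/* Model of a per-domain wait: all work in that domain finishes. */
void toy_synchronize_domain(struct toy_domain *d)
{
    d->pending = 0;
}

/* Model of the behaviour described above: only the default domain is waited on. */
void toy_synchronize_full(void)
{
    toy_synchronize_domain(&toy_global_domain);
}

/* What init actually needs before mounting root or freeing init memory. */
bool toy_all_probes_done(void)
{
    return toy_global_domain.pending == 0 && toy_sd_domain.pending == 0;
}

/* Returns the number of private-domain items still pending after a "full" sync. */
int toy_demo(void)
{
    toy_global_domain.pending = 0;
    toy_sd_domain.pending = 0;

    toy_schedule(&toy_global_domain); /* some ordinary async init work */
    toy_schedule(&toy_sd_domain);     /* disk probe in its own domain */

    toy_synchronize_full();           /* caller believes everything is done */
    return toy_sd_domain.pending;
}
```

In this model toy_demo() leaves the private-domain item pending, so
toy_all_probes_done() stays false until toy_synchronize_domain() is also
called on toy_sd_domain. That corresponds to the two ways out: wait on
the private domain explicitly, or have the full synchronize cover every
domain.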

So either we fix the assumptions or we can't use domain-specific async
schedules.

James

