Re: sata_sil24 broken since 2.6.23-rc4-mm1

From: Torsten Kaiser
Date: Thu Sep 27 2007 - 13:34:47 EST


On 9/27/07, Jeff Garzik <jeff@xxxxxxxxxx> wrote:
> Torsten Kaiser wrote:
> > I compared the dmesg form good and bad boots with -rc7-mm1 but could
> > not see any difference, so do you think that these additional
> > diagnostics could show a difference?
> > Or could you suggest any other debugging options I should try?
>
> I think since its a reproducible problem, I think it's easiest to get
> you straight to git-bisect. In this case, that would be

Sorry, but I don't think that will work.
It seems that I am able to reproduce the bug, but not reliable. And my
current best guess to make it happen involves the step "leaf the
computer powered off for 8 hours".
I estimate that even with the 8 hour pause only at ~50% of the boots
one drive fails. So I have no safe point to mark a bisect step as
'good'.

> a) start with a known good point (v2.6.22? v2.6.23?) and
> known bad point (HEAD, aka the most recent commit in
> libata-dev.git#upstream)

Known good is for me 2.6.23-rc3-mm1, the first known bad is 2.6.23-rc4-mm1.
I will try to look at the diff between these revisions some more, but
the change in sata_sil24.c looked like a perfect match for the
symptoms I was seeing.

What I just noticed, as I wanted two re-add the drive to the RAID:
This time it was not sda, but sdb that was kicked. But otherwise the
errors are perfectly identical.

I will try to make a 2.6.23-rc3.5-mm1 to narrow it down some more...

Torsten
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/