Re: sata_sil24 broken since 2.6.23-rc4-mm1
From: Tejun Heo
Date: Sun Sep 30 2007 - 13:41:40 EST
Torsten Kaiser wrote:
> That boot ended in a minimal initrd environment that normally only
> starts the RAID5 and then opens the encrypted real root it contains.
> I was just able to push the output from dmesg through the serial link,
> but had no man pages to tell me about -s ...
> And until now that kind of error was one of a kind. All other
> errors were not "internal error" but "timeout".
> But one time I had another SGT related error:
> Sep 11 19:19:24 treogen [ 33.340000] ata1.00: exception Emask 0x20 SAct 0x1 SErr 0x0 action 0x2
> Sep 11 19:19:24 treogen [ 33.340000] ata1.00: irq_stat 0x00020002, PCI master abort while fetching SGT
> Sep 11 19:19:24 treogen [ 33.340000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
> Sep 11 19:19:24 treogen [ 33.340000] res 50/00:00:af:ea:42/00:00:25:00:00/e0 Emask 0x20 (host bus error)
> What I find kind of interesting is that, while I got three different
> error codes, the cmd part of the output was always the same.
That's an NCQ write command. You'll be using it a lot if you're
rebuilding an md RAID5 array. It seems like something is going wrong
with request DMA or sg mapping. Maybe some change in block/*.[hc]?
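
For reference, the cmd line quoted above can be unpacked by hand. What
follows is only a rough sketch, assuming the field order that libata's
error handler prints (command/feature:nsect:lbal:lbam:lbah, then the
five hob_* bytes, then device) and 512-byte logical sectors. The
function name decode_ncq_cmd is made up for this illustration and is
not part of any kernel or userspace tool.

```python
# Sketch: decode the "cmd" taskfile string from a libata error report.
# Assumed field order (as printed by libata's error handler):
#   command/feature:nsect:lbal:lbam:lbah/
#   hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# decode_ncq_cmd is a hypothetical helper name, not an existing API.

NCQ_COMMANDS = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
}

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def decode_ncq_cmd(taskfile: str) -> dict:
    groups = taskfile.strip().split("/")
    if len(groups) != 4:
        raise ValueError("expected four '/'-separated groups")

    command = int(groups[0], 16)
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in groups[1].split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in groups[2].split(":")
    )
    device = int(groups[3], 16)

    if command not in NCQ_COMMANDS:
        raise ValueError("not an NCQ read/write command: 0x%02x" % command)

    # 48-bit LBA: the hob_* bytes are the upper three bytes.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    # For NCQ commands the sector count travels in the feature registers;
    # a count of 0 means 65536 sectors.
    sectors = ((hob_feature << 8) | feature) or 65536
    # The queue tag sits in bits 7:3 of the sector count register.
    tag = nsect >> 3

    return {
        "command": command,
        "name": NCQ_COMMANDS[command],
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
        "tag": tag,
        "device": device,
    }


if __name__ == "__main__":
    print(decode_ncq_cmd("61/08:00:09:d6:42/00:00:25:00:00/40"))
```

Under those assumptions the decoded length is 8 sectors, which agrees
with the "data 4096 out" in the same log line, and the tag agrees with
"tag 0".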
> It's not just 2.6.23-rc4-mm1. All -mm's after rc4 are broken for me.
> Confirmed breakage on -rc4-mm1, -rc6-mm1 and -rc8-mm1. I'm just
> narrowing in on rc4-mm1 because that was the first version to break.
> I'm currently trying to bisect 2.6.23-rc4-mm1. Here is the current status:
Have you tested 2.6.23-rc4 without the mm patches? It could be something
introduced between -rc3 and -rc4.
> [the 2.6.23-rc4-mm1 series-file has 2013 lines]
> Up to (incl.) x86_64-convert-to-clockevents.patch (line 747): 2 good boots
> Up to (incl.) x86_64-cleanup-struct-irqaction-initializers-patch (line 779): 2 good boots
> Up to (incl.) slub-optimize-cacheline-use-for-zeroing.patch (line 1045): 1 failed
> Up to (incl.) fix-discrepancy-between-vdso-based... (line 1461): 1 good, 1 failed
> Next try: up to patch fs-remove-some-aop_truncated_page.patch
> That means from the patches added to the rc4 variant of the mm-kernel
> the following are remaining:
> But due to the unreliable nature of the bug, I can't be too sure about that.
Yeah, that's what I'm worried about. Bisection is extremely difficult
if errors are intermittent and take a long time to reproduce.
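
To put a rough number on why intermittent failures hurt bisection:
a failed boot proves a step is bad, but a good boot proves very little.
The sketch below assumes every boot of a bad kernel fails independently
with the same probability. That is a simplification, and the real
failure rate here is not known. Both function names are invented for
this illustration.

```python
import math

# Sketch: how much can a run of good boots be trusted during bisection?
# Assumption: each boot of a bad kernel fails independently with the same
# probability p_fail. The real failure rate is unknown; the function names
# are hypothetical.


def false_good_probability(p_fail: float, good_boots: int) -> float:
    """Chance that a bad kernel still shows `good_boots` clean boots in a row."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if good_boots < 0:
        raise ValueError("good_boots must not be negative")
    return (1.0 - p_fail) ** good_boots


def boots_needed(p_fail: float, max_false_good: float) -> int:
    """Smallest number of clean boots that pushes the chance of wrongly
    marking a bad kernel as good down to `max_false_good` or below."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # Purely illustrative failure rates, not measurements from this report.
    for p in (0.5, 0.25, 0.1):
        print(p, boots_needed(p, 0.05))
```

With an hour of power-off time before each boot, even a handful of
extra boots per bisection step adds up quickly, and a single wrongly
marked "good" step sends the rest of the bisection in the wrong
direction.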
> Next version is compiled, now again switching the PC off for an hour...
Thanks a lot. Much appreciated.