Re: Slab corruption in 2.6.16-rc5-mm2

From: Jesper Juhl
Date: Mon Mar 06 2006 - 15:33:27 EST


On Monday 06 March 2006 21:06, Linus Torvalds wrote:
>
> On Mon, 6 Mar 2006, Linus Torvalds wrote:
> >
> > So it's either an aic7xxx bug, or it's generic SCSI.
> >
> > Considering that we've had other slab corruption issues (the reason I was
> > looking closely at yours), generic SCSI isn't out of the question.
> >
> > If you were a git user, doing a bisection run would be useful since you
> > seem to be able to recreate it at will. Oh, well. Testign that one patch
> > would still help.
>
> Hmm.. This appended patch may or may not help.
>
> It overwrites the SCSI command "req" pointer when the request has been
> done. The request cannot be used afterwards, so anybody accessing it would
> be a bug. I think.
>

With the retry code removed and your req poisoning patch on top I just got this :

Slab corruption: start=f727c5a8, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934db>](sr_do_ioctl+0x11b/0x270)
000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f727c55c, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813ee>](free_fdtable_rcu+0x6e/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f727c5f4, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813ee>](free_fdtable_rcu+0x6e/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=f727c5a8, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c02934db>](sr_do_ioctl+0x11b/0x270)
000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Prev obj: start=f727c55c, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813ee>](free_fdtable_rcu+0x6e/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=f727c5f4, len=64
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01813ee>](free_fdtable_rcu+0x6e/0x150)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

and another, probably unrelated, thing I just noticed in my dmesg output:

initcall at 0xc0428240: init_hpet_clocksource+0x0/0x90(): returned with error code -19


> HOWEVER. I noticed something else strange. Your slab corruption report
> says
>
> Slab corruption: start=f72948a0, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> ...
>
> and the scary thing is that "len=64".
>
> The thing is, SCSI uses "SCSI_SENSE_BUFFERSIZE" to determine the maximum
> sense size to copy, and what do we have, if not
>
> include/scsi/scsi_cmnd.h:#define SCSI_SENSE_BUFFERSIZE 96
>
> ie a 64-byte buffer is simply TOO DAMN SMALL!
>
> Now, the thing is, the 64 comes from "sizeof(struct request_sense)", which
> is what "struct packet_command *" uses. We can change that sizeof() to
> just use SCSI_SENSE_BUFFERSIZE, but that still makes me worry about

Building a kernel with that change on top of the other ones atm.


/ Jesper

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/