Re: New 2.6.24.2 SG_IO SCSI problems

From: Mike Christie
Date: Fri Feb 22 2008 - 17:26:57 EST


Mark Hounschell wrote:
Mark Hounschell wrote:
Mike Christie wrote:
Mike Christie wrote:
Mark Hounschell wrote:
I seem to have run into some sort of regression in the SG_IO
interface of 2.6.24.2. I have an application that up until 2.6.24
worked fine. The 2.6.23.16 kernel works fine.

During reads I get these kernel messages. Writes and other functions
_seem_ OK. Actually basic
reads are working. Its with large BC reads using an io_vec list that
the problem shows up.

Are you doing SG_IO to the sg device (/dev/sg*) or to the block device
(/dev/sdX)?
If you are doing SG_IO to the sg device, then I know of one regression
(well not regression exactly, but I fixed a bug but the patch got
partially overwritten by another patch and that caused a new bug). Both
bugs are fixed in 2.6.25-rc2. Could you try that out if you are doing
SG_IO to the sg device.

Yes, I'm using /dev/sg*. And yes again I'll checkout 2.6.25-rc2 ASIC.

Thanks
Mark
-
2.6.25-rc2 does fix the problem I'm having. I don't suppose there is a patch
lying around for 2.6.24.2??


I attached a backport of the patch from Tony (added as cc) that is in 2.6.25-rc2. Could you try it out against 2.6.24.2 just to make sure it was this patch, then we can send it to stable. Backport
76d78300a6eb8b7f08e47703b7e68a659ffc2053
to 2.6.24

>From Tony Battersby:

When sending a SCSI command to a tape drive via the SCSI Generic (sg)
driver, if the command has a data transfer length more than
scatter_elem_sz (32 KB default) and not a multiple of 512, then I either
hit BUG_ON(!valid_dma_direction(direction)) in dma_unmap_sg() or else
the command never completes (depending on the LLDD).

When constructing scatterlists, the sg driver rounds up the scatterlist
element sizes to be a multiple of 512. This can result in
sum(scatterlist lengths) > bufflen. In this case, scsi_req_map_sg()
incorrectly sets bio->bi_size to sum(scatterlist lengths) rather than to
bufflen. When the command completes, req_bio_endio() detects that
bio->bi_size != 0, and so it doesn't call bio_endio(). This causes the
command to be resubmitted, resulting in BUG_ON or the command never
completing.

This patch makes scsi_req_map_sg() set bio->bi_size to bufflen rather
than to sum(scatterlist lengths), which fixes the problem.

Signed-off-by: Mike Christie <michaelc@xxxxxxxxxxx>

--- linux-2.6.24.2/drivers/scsi/scsi_lib.c 2008-02-10 23:51:11.000000000 -0600
+++ linux-2.6.24.2.work/drivers/scsi/scsi_lib.c 2008-02-22 16:20:09.000000000 -0600
@@ -298,7 +298,6 @@ static int scsi_req_map_sg(struct reques
page = sg_page(sg);
off = sg->offset;
len = sg->length;
- data_len += len;

while (len > 0 && data_len > 0) {
/*