Re: Scatter-gather list constraints

From: Jens Axboe
Date: Thu Jun 26 2008 - 08:54:55 EST


On Thu, Jun 26 2008, FUJITA Tomonori wrote:
> On Thu, 26 Jun 2008 08:35:59 +0200
> Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
>
> > On Thu, Jun 26 2008, FUJITA Tomonori wrote:
> > > On Thu, 26 Jun 2008 11:06:03 +0900
> > > FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx> wrote:
> > >
> > > > On Wed, 25 Jun 2008 10:23:00 -0400 (EDT)
> > > > Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > > On Wed, 25 Jun 2008, FUJITA Tomonori wrote:
> > > > >
> > > > > > > For example, suppose an I/O request starts out with two S-G elements
> > > > > > > of 1536 bytes and 2048 bytes respectively, and the DMA requirement is
> > > > > > > that all elements except the last must have length divisible by 1024.
> > > > > > > Then the request could be broken up into three requests of 1024, 512,
> > > > > > > and 2048 bytes.
> > > > > >
> > > > > > I can't say that it's easy to implement a clean mechanism to break up
> > > > > > a request into multiple requests until I see a patch.
> > > > >
> > > > > And I can't write a patch without learning a lot more about how the
> > > > > block core works.
> > > > >
> > > > > > What I said is that you think that this is about extending something
> > > > > > in the block layer but it's about adding a new concept to the block
> > > > > > layer.
> > > > >
> > > > > Is it? What does the block layer do when it receives an I/O request
> > > > > > that doesn't satisfy the other constraints (max_sectors or
> > > > > dma_alignment_mask, for example)?
> > > >
> > > > As I explained, you need something new.
> > > >
> > > > I don't think that max_sectors works as you expect.
> > >
> > > The block layer looks at max_sectors when merging two things (or adding
> > > one to another). If the test fails, it doesn't merge them.
> > >
> > >
> > > > dma_alignment_mask is not used in the FS path. And I think that
> > > > dma_alignment_mask doesn't solve your problems.
> > >
> > > If the dma_alignment_mask test fails, the block layer allocates temporary
> > > buffers and does memory copies.
> >
> > I don't think adding anything in the general IO path makes a lot of
> > sense, this is a really screwy case. I don't mind adding work-arounds to
> > the block layer to cater for hardware weirdness, but this is getting a
> > little silly. We could provide a helper function for 'bouncing' this
> > request and thus reuse the block bounce buffer for this, but I'm not
> > even sure how to simply express this generically. As it is likely of no
> > use outside of this specific case, putting it in the driver (or usb
> > layer, if you expect more of these similar cases) is the best option.
>
> Yeah, agreed, as I wrote in the first mail:
>
> http://marc.info/?l=linux-kernel&m=121430416329618&w=2
>
> I guess that a generic mechanism reserving some buffers in the block
> layer might work for them. I also need such a mechanism to convert sg
> and st to use the block layer (yeah, it's overdue but still on my todo
> list).

On the fs side, just setting a hw block size of 1k should fix the
problem, since that'd be your minimum transfer size AND alignment there
even for O_DIRECT IO.

So that leaves SG_IO (and similar) issued IO, which are typically really
small (and thus not an issue, since it'll be a single sg element). For
the bigger ones, sg elements should be tightly packed (eg page size)
except the last one.
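
To make the rule concrete, here is a minimal user-space sketch of the
check being discussed: every element except the last must have a length
divisible by some granularity (1024 in this thread). The name
sg_lengths_ok() and the plain array of lengths are assumptions made for
illustration only; this is not an existing block layer or scatterlist
interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel API.
 * Returns true if every element except the last has a length that is
 * a multiple of gran. The last element may have any length.
 */
static bool sg_lengths_ok(const unsigned int *len, size_t n,
			  unsigned int gran)
{
	assert(gran != 0);

	/* i + 1 < n skips the final element and also handles n == 0. */
	for (size_t i = 0; i + 1 < n; i++)
		if (len[i] % gran != 0)
			return false;
	return true;
}
```

With gran = 1024, a tightly packed list such as { 4096, 4096, 512 }
passes, while the { 1536, 2048 } list from the quoted example does not.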

Alan, in what specific cases have you observed IO requests that violate
the rules you gave? The example of:

"For example, suppose an I/O request starts out with two S-G elements of
1536 bytes and 2048 bytes respectively, and the DMA requirement is"

really sounds concocted, have you ever seen something like that?
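
For reference, the arithmetic of that example (1536 + 2048 with a 1024
byte granularity becoming requests of 1024, 512 and 2048 bytes) can be
sketched in user space as below. This is only an illustration under
stated assumptions: split_lengths() is a hypothetical name, it works on
plain lengths rather than real requests, and it is one possible split
that reproduces the quoted numbers, not necessarily the smallest number
of requests and not something the block layer does today.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space sketch, not block layer code.
 * Splits a list of element lengths into requests so that, within each
 * request, every element except the last is a multiple of gran.
 * Stores the byte length of each resulting request in out[] and
 * returns the number of requests, or 0 if out[] is too small.
 */
static size_t split_lengths(const unsigned int *len, size_t n,
			    unsigned int gran,
			    unsigned int *out, size_t out_cap)
{
	size_t nreq = 0;
	unsigned int cur = 0;	/* bytes in the currently open request */

	assert(gran != 0);

	for (size_t i = 0; i < n; i++) {
		unsigned int rem = len[i] % gran;

		if (i + 1 == n || rem == 0) {
			/* Last or aligned element: keep it in the open request. */
			cur += len[i];
			continue;
		}

		/* Close the open request after the aligned part, if any. */
		cur += len[i] - rem;
		if (cur) {
			if (nreq == out_cap)
				return 0;
			out[nreq++] = cur;
		}

		/* The unaligned tail ends a request, so it goes out alone. */
		if (nreq == out_cap)
			return 0;
		out[nreq++] = rem;
		cur = 0;
	}

	if (cur) {
		if (nreq == out_cap)
			return 0;
		out[nreq++] = cur;
	}
	return nreq;
}
```

Feeding it { 1536, 2048 } with gran = 1024 yields three requests of
1024, 512 and 2048 bytes, matching the numbers in the quoted example.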

--
Jens Axboe
