Re: 2.6.9-rc2 bio sickness with large writes
From: Jeff V. Merkey
Date: Mon Sep 20 2004 - 14:59:15 EST
Jens Axboe wrote:
page and offset semantics in the interface are also somewhat burdensome.
Wouldn't a more reasonable
interface for async IO be:
address
length
address
length
rather than
page structure
offset in page structure
page structure
offset in page structure
No, because { address, length } cannot fully describe all memory in any
given machine.
This response I don't understand. How memory is described in a machine
for DMA addressability is pretty standard (with the exception of memory
above 4GB on 32-bit Intel machines, which needs page tables) --
a physical numerical address. But who is going to DMA into memory that
is not in the address space?
Any chunk of memory has a page associated with it, but it may not have a
kernel address mapping associated with it. So some identifier other than
a virtual address was needed. A page is as good as any, so making up a
new one would be silly.
Once you understand this, it doesn't seem so odd. You need to pass in a
single page or sg table to map for dma anyways, the sg table holds page
pointers as well.
I can assume from the interface as designed that if you pass an offset
for a page that is not page aligned, and ask for a 4K write, then you
will end up dropping the data that spans beyond the end of the page on
the floor.
What kind of bogus example is that? Asking for a 4K write from a 4K page
but asking to start 1K in that page is just stupid and not even remotely
valid.
Hardware doesn't care about page boundaries. It sees hardware addresses
and lengths, at least most SG hardware I've worked with does. For ease
of submission, an interface that takes <address,length> would suffice.
Why on earth would someone need a context pointer into the kernel's
page tables to submit an SG into a device, apart from performing
virtual-to-physical translation?
It's not difficult at all. Apparently you don't understand it so you
think it's difficult, that's only natural. But you have access to the
page mapping of any given piece of data always, or if you have the
virtual address only it's trivial to go to the { page, offset } mapping.
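For reference, the address-to-{ page, offset } step described above comes down to a shift and a mask. The following is only a userspace sketch of that arithmetic, assuming 4K pages: the names `EX_PAGE_SHIFT`, `struct page_ref`, `addr_to_page_ref()` and `page_ref_to_addr()` are made up for illustration and are not kernel interfaces. In the kernel, a directly mapped address would go through virt_to_page() to get a struct page * rather than a bare frame number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch: 4K pages, as on i386. */
#define EX_PAGE_SHIFT 12u
#define EX_PAGE_SIZE  ((uintptr_t)1 << EX_PAGE_SHIFT)

/* Illustrative stand-in for a { page, offset } pair. A page frame
 * number takes the place of the kernel's struct page pointer. */
struct page_ref {
    uintptr_t pfn;    /* which page the address falls in */
    size_t    offset; /* byte offset inside that page */
};

/* Split a linear address into page frame number and in-page offset. */
static struct page_ref addr_to_page_ref(uintptr_t addr)
{
    struct page_ref r;

    r.pfn = addr >> EX_PAGE_SHIFT;
    r.offset = (size_t)(addr & (EX_PAGE_SIZE - 1));
    return r;
}

/* Inverse operation: rebuild the linear address from the pair. */
static uintptr_t page_ref_to_addr(struct page_ref r)
{
    assert(r.offset < EX_PAGE_SIZE);
    return (r.pfn << EX_PAGE_SHIFT) | (uintptr_t)r.offset;
}
```

The round trip only works for memory that has a linear address to begin with, which is the limitation raised earlier in the thread for pages without a kernel mapping.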
No, I do understand, and using page/offset at a low level SG interface
IS burdensome.
I mean, if this is what I have to support I guess I have to use it, but
it will be just another section of code where I have another **FAT**
layer to waste more CPU cycles calculating offset/page (oh yeah, I have
to look up the struct page * structure also) when it would be much
simpler to just submit address/len on i386 systems. With this type of
interface, if I have, for instance, an on-disk structure that starts in
the middle of a 4K page due to other headers, etc. and that spans a page
boundary, I cannot just submit the address and length. I have to break
it into two bio requests instead of one, with a for () loop from hell,
and calculate the offsets and rummage around in memory looking up
struct page * addresses.
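To make the cost being argued about concrete, here is a rough userspace sketch of the loop in question: it cuts one { address, length } buffer into per-page { page, offset, length } pieces. It assumes 4K pages, and `SEG_PAGE_SIZE`, `struct seg` and `split_by_page()` are hypothetical names for this example only. In the kernel each piece would carry a struct page * instead of a frame number and would presumably be handed to something like bio_add_page().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size for this sketch (4K, as on i386). */
#define SEG_PAGE_SIZE ((size_t)4096)

/* Illustrative segment: a page frame number stands in for struct page *. */
struct seg {
    uintptr_t pfn;    /* page the segment lives in */
    size_t    offset; /* start of the segment inside that page */
    size_t    len;    /* bytes in this segment, never crossing the page end */
};

/* Cut [addr, addr + len) at page boundaries.
 * Writes at most max entries to out and returns how many were written. */
static size_t split_by_page(uintptr_t addr, size_t len,
                            struct seg *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (len > 0 && n < max) {
        size_t off = (size_t)(addr % SEG_PAGE_SIZE);
        size_t chunk = SEG_PAGE_SIZE - off; /* room left in this page */

        if (chunk > len)
            chunk = len;
        out[n].pfn = addr / SEG_PAGE_SIZE;
        out[n].offset = off;
        out[n].len = chunk;
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}
```

A 4K buffer that starts 1K into a page comes out as two segments, 3K in the first page and 1K in the next, which is the two-way split described above.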
I can only imagine that you are used to a very different interface on
some other OS so you think it's difficult to use. Most of your
complaints seem to be based on false assumptions or because you don't
understand why certain design decisions were made.
No. I am used to programming to hardware with SG devices that all OSes
use. Is there a page-based SG device (other than SCI) for disk drives
somewhere? I don't think so. I think they operate on address/len,
address/len, etc.
:-)
Jeff