Re: O_DIRECT question

From: Phillip Susi
Date: Thu Jan 11 2007 - 10:53:50 EST


Viktor wrote:
OK, madvise() used with mmap'ed file allows to have reads from a file
with zero-copy between kernel/user buffers and don't pollute cache
memory unnecessarily. But how about writes? How is to do zero-copy
writes to a file and don't pollute cache memory without using O_DIRECT?
Do I miss the appropriate interface?

Not only that but mmap/madvise do not allow you to perform async io. You still have to just try and touch the memory and take the page fault to cause a read. This is unacceptable to an application that is trying to manage multiple IO streams and keep the pipelines full; it needs to block only when there is no more work it can do.

Even with only a single io stream the application needs to keep one side reading and one side writing. To make this work without O_DIRECT would require a method to asynchronously fault pages in, and the kernel would have to utilize that method when processing aio writes so as not to block the calling process if the buffer it asked to write is not in memory.

I was rather disappointed years ago by NT's lack of an async page fault path, which prevented me from developing an ftp server that could do zero copy IO, but still use the filesystem cache and avoid NT's version of O_DIRECT.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/