Re: O_DIRECT question

From: Linus Torvalds
Date: Thu Jan 11 2007 - 14:02:15 EST




On Thu, 11 Jan 2007, Trond Myklebust wrote:
>
> For NFS, the main feature of interest when it comes to O_DIRECT is
> strictly uncached I/O. Replacing it with POSIX_FADV_NOREUSE won't help
> because it can't guarantee that the page will be thrown out of the page
> cache before some second process tries to read it. That is particularly
> true if some dopey third party process has mmapped the file.

You'd still be MUCH better off using the page cache, and just forcing the
IO (but _with_ all the page cache synchronization still active). Which is
trivial to do on the filesystem level, especially for something like NFS.

If you bypass the page cache, you just make that "dopey third party
process" problem worse. You now _guarantee_ that there are aliases with
different data.

Of course, with NFS, the _server_ will resolve any aliases anyway, so at
least you don't get file corruption, but you can get some really strange
things (like the write of one process actually happening before, but being
flushed _after_ and overriding the later write of the O_DIRECT process).

And sure, the filesystem can have its own alias avoidance too (by just
probing the page cache all the time), but the fundamental fact remains:
the problem is that O_DIRECT as a page-cache-bypassing mechanism is
BROKEN.

If you have issues with caching (but still have to allow it for other
things), the way to fix them is not to make uncached accesses, it's to
force the cache to be serialized. That's very fundamentally true.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/