Re: O_DIRECT question

From: Nick Piggin
Date: Thu Jan 11 2007 - 01:10:44 EST


Linus Torvalds wrote:

On Wed, 10 Jan 2007, Linus Torvalds wrote:

So don't use O_DIRECT. Use things like madvise() and posix_fadvise() instead.


Side note: the only reason O_DIRECT exists is because database people are too used to it, because other OS's haven't had enough taste to tell them to do it right, so they've historically hacked their OS to get out of the way.

As a result, our madvise and/or posix_fadvise interfaces may not be all that strong, because people sadly don't use them that much. It's a sad example of a totally broken interface (O_DIRECT) resulting in better interfaces not getting used, and then not getting as much development effort put into them.

So O_DIRECT not only is a total disaster from a design standpoint (just look at all the crap it results in), it also indirectly has hurt better interfaces. For example, POSIX_FADV_NOREUSE (which _could_ be a useful and clean interface to make sure we don't pollute memory unnecessarily with cached pages after they are all done) ends up being a no-op ;/

Sad. And it's one of those self-fulfilling prophecies. Still, I hope some day we can just rip the damn disaster out.

Speaking of which, why did we obsolete raw devices? And/or why not just
go with a minimal O_DIRECT on block device support? Not a rhetorical
question -- I wasn't involved in the discussions when they happened, so
I would be interested.

O_DIRECT is still crazily racy versus pagecache operations. Chris Mason's
recent patches to attempt to fix it, while actually doing quite a fine
job, are very intrusive and complex for such a sad corner case.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/