Re: O_DIRECT fails in some kernel and FS

From: Chris Mason (mason@suse.com)
Date: Mon Feb 04 2002 - 10:31:06 EST


On Monday, February 04, 2002 10:13:37 AM -0500 Jeff Garzik <jgarzik@mandrakesoft.com> wrote:

> Andi Kleen wrote:
>>
>> Jeff Garzik <garzik@havoc.gtf.org> writes:
>>
>> > On Sat, Feb 02, 2002 at 02:16:41PM -0600, Stephen Lord wrote:
>> > > Can't you fall back to buffered I/O for the tail? OK it complicates the
>> > > code, probably a lot, but it keeps things sane from the user's point of
>> > > view.
>> >
>> > For O_DIRECT, IMHO you should fail not fallback. You're simply lying
>> > to the underlying program otherwise.
>>
>> It's just impossible to write a tail which is smaller than a disk block
>> without another buffer.
>
> I argue, for reiserfs:
>
> For O_DIRECT writes, the preferred behavior is to write disk blocks
> obtained through the normal methods (get_block, etc.), and fully support
> inodes for which file tails do not exist.

Done ;-)

>
> For O_DIRECT reads, if the data is determined to be in a file tail,
> ->direct_IO should either (a) fail or (b) dump the file tail to a normal
> disk block before performing ->direct_IO.

The current patch does A. Another option is to change the reiserfs open
code to detect the tail and do an -EINVAL for o_direct. This gives the
application a better way to fall back to normal open methods than returning
an error during the read.

Just to restate, the current O_DIRECT code can never hit a reiserfs tail in
the normal case. By definition, reiserfs tails are not block aligned, and
O_DIRECT writes are. The only time it is a concern is with a screwy
interaction between expanding truncates and tails on kernels < 2.4.17.
Since most O_DIRECT users are databases, and tails are never created on
files > 16k in size, I don't expect anyone to ever see the reiserfs
triggered -EINVAL from the current patch (famous last words).

-chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Feb 07 2002 - 21:00:34 EST