Re: msync() behaviour broken for MS_ASYNC, revert patch?

From: Linus Torvalds
Date: Wed Mar 31 2004 - 19:09:36 EST




On Wed, 1 Apr 2004, Stephen C. Tweedie wrote:
>
> On Wed, 2004-03-31 at 23:37, Linus Torvalds wrote:
>
> > If you care about the data hitting the disk, you have to use fsync() or
> > similar _anyway_, and pretending anything else is just bogus.
>
> You can make the same argument for either implementation of MS_ASYNC.

Exactly.

Which is why I say that the implementation cannot matter, because user
space would be _buggy_ if it depended on some timing issue.

> And there's at least one way in which the "submit IO now" version can be
> used meaningfully --- if you've got several specific areas of data in
> one or more mappings that need flushed to disk, you'd be able to
> initiate IO with multiple MS_ASYNC calls and then wait for completion
> with either MS_SYNC or fsync().

Why wouldn't you be able to do that with the current one?

Tha advantage of the current MS_ASYNC is absolutely astoundingly HUGE:
because we don't wait for in-progress IO, it can be used to efficiently
synchronize multiple different areas, and then after that waiting for them
with _one_ single fsync().

In contrast, the "wait for queued IO" approach can't sanely do that,
exactly because it will wait in the middle, depending on other activity at
the same time. It will always have the worry that it happens to do the
msync() at the wrong time, and then wait synchronously when it shouldn't.

More importanrtly, the current behaviour makes certain patterns _possible_
that your suggested semantics simply cannot do efficiently. If we have
data records smaller than a page, and want to mark them dirty as they
happen, the current msync() allows that - it doesn't matter that another
datum was marked dirty just a moment ago. Then, you do one fsync() only
when you actually want to _commit_ a series of updates before you change
the index.

But if we want to have another flag, with MS_HALF_ASYNC, that's certainly
ok by me. I'm all for choice. It's just that I most definitely want the
choice of doing it the way we do it now, since I consider that to be the
_sane_ way.

> It's very much visible, just from a performance perspective, if you want
> to support "kick off this IO, I'm going to wait for the completion
> shortly."

That may well be worth a call of its own. It has nothing to do with memory
mapping, though - what you're really looking for is fasync().

And yes, I agree that _that_ would make sense. Havign some primitives to
start writeout of an area of a file would likely be a good thing.

I'd be perfectly happy with a set of file cache control operations,
including

- start writeback in [a,b]
- wait for [a,b] stable
- and maybe "punch hole in [a,b]"

Then you could use these for write() in addition to mmap(), and you can
first mark multiple regions dirty, and then do a single wait (which is
clearly more efficient than synchronously waiting for multiple regions).

But none of these have anything to do with what SuS or any other standard
says about MS_ASYNC.

> But whether that's a legal use of MS_ASYNC really depends on what the
> standard is requiring. I could be persuaded either way. Uli?

My argument was that a standard CANNOT say anything one way or the other,
because the behaviour IS NOT USER-VISIBLE! A program fundamentally cannot
care, since the only issue is a pure implementation issue of "which queue"
the data got queued onto.

Bringing in a standards body is irrelevant. It's like trying to use the
bible to determine whether protons have a positive charge.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/