Re: Linux 2.6.29

From: Linus Torvalds
Date: Thu Apr 02 2009 - 18:49:19 EST

On Thu, 2 Apr 2009, Jeff Garzik wrote:
>
> Dumb VM question, then: I understand the logic behind the write-throttling
> part (some of my own userland code does something similar), but,
>
> Does this imply adding fadvise to your overwrite.c example is (a) not
> noticeable, (b) potentially less efficient, (c) potentially more efficient?

For _that_ particular load it was more a case of "it wasn't the issue". I
wanted to get timely writeouts, because otherwise they bunch up and become
unmanageable (with even people who are not actually writing ending up
waiting for the writeouts).

Once the pages are clean, it just didn't matter. The VM did the balancing
right enough that I stopped caring. With other access patterns (ie if the
pages ended up on the active list) the situation might have been
different.

> Or IOW, does fadvise purely put pages on the cold list as your
> sync_file_range incantation does, or something different?

sync_file_range() doesn't actually put the pages on the inactive list, but
since the program was just a streaming one, they never even left it.

But no, fadvise actually tries to invalidate the pages (ie gets rid of
them, as opposed to moving them to the inactive list).
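
To make the difference between the two calls concrete, here is a rough
sketch of a streaming writer that uses both. This is _not_ overwrite.c
itself: the stream_write() helper and its drop_cache flag are made up for
illustration, and the "start writeout on this chunk, wait on the previous
one" pattern is an assumption about how such a program might be structured.
The fadvise in question is assumed to be POSIX_FADV_DONTNEED.

```c
#define _GNU_SOURCE             /* needed for sync_file_range() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from overwrite.c): write `chunks` buffers
 * of `chunk_size` zero bytes to `path`, pushing each chunk to disk as we go
 * so that dirty pages never pile up.
 * Returns the number of bytes written, or -1 on error.
 */
long long stream_write(const char *path, size_t chunks,
                       size_t chunk_size, int drop_cache)
{
    assert(chunk_size > 0);

    char *buf = calloc(1, chunk_size);
    if (!buf)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long long total = 0;
    for (size_t i = 0; i < chunks; i++) {
        off_t off = (off_t)i * (off_t)chunk_size;

        /* A short write is treated as an error to keep the sketch simple. */
        if (write(fd, buf, chunk_size) != (ssize_t)chunk_size) {
            total = -1;
            break;
        }
        total += (long long)chunk_size;

        /* Kick off asynchronous writeout of the chunk we just wrote.
         * Return values are ignored here: this is only a pacing hint. */
        sync_file_range(fd, off, (off_t)chunk_size, SYNC_FILE_RANGE_WRITE);

        if (i > 0) {
            off_t prev = off - (off_t)chunk_size;

            /* Block until the previous chunk has been written out.
             * This is the write-throttling part. */
            sync_file_range(fd, prev, (off_t)chunk_size,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER);

            /* Optional: ask the kernel to drop the now-clean pages from
             * the page cache instead of leaving them on the LRU lists. */
            if (drop_cache)
                posix_fadvise(fd, prev, (off_t)chunk_size,
                              POSIX_FADV_DONTNEED);
        }
    }

    close(fd);
    free(buf);
    return total;
}
```

Something like stream_write("bigfile", 128, 8 << 20, 1) would write 1 GB in
8 MB chunks with the fadvise step enabled; passing 0 as the last argument
leaves the clean pages in the cache for the VM to reclaim on its own.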

Another note: I literally used that program just for whole-disk testing,
so the behavior on an actual filesystem may or may not match. But I just
tested on ext3 on my desktop, and got

1.734 GB written in 30.38 (58 MB/s)

until I ^C'd it, and I didn't have any sound skipping or anything like
that. Of course, that's with those nice Intel SSDs, so that doesn't
really say anything.

Feel free to give it a try. It _should_ maintain good write speed while
not disturbing the system much. But I bet if you added the "fadvise()" it
would disturb things even _less_.

My only point is really that you _can_ do streaming writes well, but at
the same time I do think the kernel makes it too hard to do it with
"simple" applications. I'd love to get the same kind of high-speed
streaming behavior by just doing a simple "dd if=/dev/zero of=bigfile".

And I really think we should be able to.

And no, we clearly are _not_ able to do that now. I just tried with "dd",
and created a 1.7G file that way, and it was stuttering - even with my
nice SSD setup. I'm in my MUA writing this email (obviously), and in the
middle it just totally hung for about half a minute - because it was
obviously doing some fsync() for temporary saving etc while the "sync" was
going on.

With the "overwrite.c" thing, I do get short pauses when my MUA does
something, but they are not the "oops, everything hung for several
seconds" kind.

(Full disclosure: 'alpine' with the local mbox on one disk - I _think_
that what alpine does is fsync() temporary save-files, but it might also
be checking email in the background - I have not looked at _why_ alpine
does an fsync, but it definitely does. And 5+ second delays are very
annoying when writing emails - let alone half a minute).

Linus