Re: sync_file_range(SYNC_FILE_RANGE_WRITE) blocks?

From: Pavel Machek
Date: Sun Jun 01 2008 - 07:40:04 EST


Hi!

> > > > All I can say so far is that I find the same as you do:
> > > > SYNC_FILE_RANGE_WRITE (after writing) takes a significant amount of time,
> > > > more than half as long as when you add in SYNC_FILE_RANGE_WAIT_AFTER too.
> > > >
> > > > Which makes the sync_file_range call pretty pointless: your usage seems
> > > > perfectly reasonable to me, but somehow we've broken its behaviour.
> > > > I'll be investigating ...
> > >
> > > It will block on disk queue fullness - sysrq-W will tell.
> >
> > Ah, thank you. What a disappointment, though it's understandable.
> > Doesn't that very severely limit the usefulness of the system call?
>
> A bit. The request queue size is runtime tunable though.

Which /sys file is that? And if I set the queue size to pretty
much infinity, will memory management die horribly?
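
(If I read the block layer right, it is probably nr_requests under
/sys/block/<dev>/queue/. A minimal sketch for reading it follows; the
helper names and the "sda" device name are only my illustration, not
anything from the kernel tree.)

```c
#include <stdio.h>

/* Build "/sys/block/<dev>/queue/nr_requests" for a device name such as
 * "sda" (the device name is an assumption made by the caller).
 * Returns 0 on success, -1 if the buffer is too small. */
static int nr_requests_path(char *buf, size_t len, const char *dev)
{
    int n = snprintf(buf, len, "/sys/block/%s/queue/nr_requests", dev);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read a single decimal value from a sysfs attribute file.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
static long read_sysfs_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long val = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}
```

Calling nr_requests_path(buf, sizeof(buf), "sda") and then
read_sysfs_long(buf) would give the current limit; writing a new value
to the same file needs root.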

> I expect major users of this system call will be applications which do
> small-sized overwrites into large files, mainly databases. That is,
> once the application developers discover its existence. I'm still
> getting expressions of wonder from people who I tell about the
> five-year-old fadvise().

Hey, you have one user now; it's called s2disk. But for this call to be
useful, we'd need an asynchronous variant... is there such a thing?

Okay, I can fork and do the call from another process, but...
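
Roughly what I have in mind for the fork workaround, as a rough sketch:
the child issues the possibly-blocking call and the parent carries on.
The function names are made up for illustration, and error handling is
minimal.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start writeout of [offset, offset + nbytes) from a child process, so
 * the caller does not stall if the request queue is full.
 * Returns the child's pid, or -1 if fork() fails. */
static pid_t start_writeout_in_child(int fd, off64_t offset, off64_t nbytes)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Child: this call may block; that is the point of forking. */
        int rc = sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);

        _exit(rc == 0 ? 0 : 1);
    }
    return pid;
}

/* Reap the child. Returns 0 if the child's sync_file_range() succeeded,
 * -1 otherwise. */
static int finish_writeout(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The parent keeps writing and calls finish_writeout() only when it needs
to know the writeout was started. It costs a fork per call, which is
why I would prefer a real asynchronous interface.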

> > I admit the flag isn't called SYNC_FILE_RANGE_WRITE_WITHOUT_WAITING,
> > but I don't suppose Pavel and I are the only ones misled by it.
>
> Yup, this caveat/restriction should be in the manpage.

Michael, this is something for you, I guess?

And Andrew, something for you:

---

SYNC_FILE_RANGE_WRITE may and will block. Document that.

Signed-off-by: Pavel Machek <pavel@xxxxxxx>

---
commit 5db78da3d8e6fa527bfe384ded2ff7c835592fe2
tree 4c405e07be12f0a2260492fb43d19802ff7ebab1
parent 0ea376de01be797f9563c2c2464149f8f0af6329
author Pavel <pavel@xxxxxxxxxx> Sun, 01 Jun 2008 13:39:25 +0200
committer Pavel <pavel@xxxxxxxxxx> Sun, 01 Jun 2008 13:39:25 +0200

fs/sync.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 228e17b..54e9f20 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -139,7 +139,8 @@ asmlinkage long sys_fdatasync(unsigned i
  * before performing the write.
  *
  * SYNC_FILE_RANGE_WRITE: initiate writeout of all those dirty pages in the
- * range which are not presently under writeback.
+ * range which are not presently under writeback. Note that even this may
+ * and will block if you attempt to write more than the request queue size.
  *
  * SYNC_FILE_RANGE_WAIT_AFTER: wait upon writeout of all pages in the range
  * after performing the write.
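
To make the documented flags concrete, here is a small sketch for timing
a sync_file_range() call with a given flag set, which is roughly how one
could reproduce the observation at the top of this thread.
timed_sync_range() is a hypothetical helper name, not an existing
interface.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <time.h>

/* Call sync_file_range() on [offset, offset + nbytes) with the given
 * flags and return the elapsed wall-clock time in seconds, or -1.0 if
 * the call fails. */
static double timed_sync_range(int fd, off64_t offset, off64_t nbytes,
                               unsigned int flags)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (sync_file_range(fd, offset, nbytes, flags) != 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling it on freshly written data, once with SYNC_FILE_RANGE_WRITE and
once with SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
SYNC_FILE_RANGE_WAIT_AFTER, should show how much of the total time the
supposedly non-waiting call takes.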


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html