Re: sync_file_range(SYNC_FILE_RANGE_WRITE) blocks?

From: Pavel Machek
Date: Sun Jun 01 2008 - 18:21:25 EST


Hi!

> > > I expect major users of this system call will be applications which do
> > > small-sized overwrites into large files, mainly databases. That is,
> > > once the application developers discover its existence. I'm still
> > > getting expressions of wonder from people who I tell about the
> > > five-year-old fadvise().
> >
> > Hey, you have one user now, it's called s2disk. But for this call to be
> > useful, we'd need an asynchronous variant... is there such a thing?
>
> Well if you're asking the syscall to shove more data into the block
> layer than it can concurrently handle, sure, the block layer will
> block. It's tunable...

No, no, I don't want to overload the block layer. All I want is ...

> > Okay, I can fork and do the call from another process, but...
>
> I sense a strangeness. What are you actually trying to do with all of this?

Okay, so I have around 400MB of data that I want compressed, optionally
encrypted, and written to a partition.

Now, if I do it "naturally", I do writes, followed by fsync.

That's bad, because the kernel does not start writeout immediately, and
we waste time with an idle disk. (If the data compresses really well, or
encryption is off, this is significant.)

So we improve on this by calling sync_file_range(SYNC_FILE_RANGE_WRITE)
periodically. That keeps the disk busy, but occasionally blocks the
cpu, wasting time (which mostly hurts in the compression+encryption
case).
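
Roughly, the pattern looks like the sketch below. This is only a minimal
illustration, not the actual s2disk code: write_with_early_writeback() and
FLUSH_CHUNK are made-up names, and the 1MB chunk size is an arbitrary
assumption. Errors from sync_file_range() are deliberately ignored here,
since the call is used purely as a speed hint and the final fsync() is what
is relied on.

```c
#define _GNU_SOURCE             /* exposes sync_file_range() in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical knob: how much data to queue before nudging writeback. */
#define FLUSH_CHUNK (1 << 20)

/*
 * Write len bytes from buf at fd's current offset, asking the kernel to
 * start writeback after each FLUSH_CHUNK bytes or more, then fsync().
 * Returns 0 on success, -1 on error (errno set by write/lseek/fsync).
 */
static int write_with_early_writeback(int fd, const char *buf, size_t len)
{
    size_t written = 0;   /* bytes written so far */
    size_t flushed = 0;   /* bytes already handed to writeback */
    off_t base;

    assert(buf != NULL);

    base = lseek(fd, 0, SEEK_CUR);   /* file offset where our data starts */
    if (base < 0)
        return -1;

    while (written < len) {
        ssize_t n = write(fd, buf + written, len - written);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted, just retry */
            return -1;
        }
        written += (size_t)n;

        if (written - flushed >= FLUSH_CHUNK) {
            /*
             * Start I/O on the dirty pages in this range. This can still
             * block if the request queue is full. The result is ignored:
             * it is an optimization hint only.
             */
            (void)sync_file_range(fd, base + (off_t)flushed,
                                  (off_t)(written - flushed),
                                  SYNC_FILE_RANGE_WRITE);
            flushed = written;
        }
    }

    /* sync_file_range() does not sync metadata, so finish with fsync(). */
    return fsync(fd);
}
```

In the real thing the buffer would come out of the compression/encryption
step one piece at a time, and the blocking inside sync_file_range() is
exactly where the cpu sits idle.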

So... how can I keep _both_ cpu and disk busy?

> Bear in mind that sync_file_range() doesn't sync metadata (ie: indirect
> blocks). So if they weren't already known to have been written, the
> data isn't safe.

I'm not trying to use this for correctness; I'm optimizing for
speed. At the end, I do fsync() anyway.

> > - * range which are not presently under writeback.
> > + * range which are not presently under writeback. Notice that even this this
> > + * may and will block if you attempt to write more than request queue size.
>
> um, OK. I'll fix the grammar a bit there.

Thanks.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html