Re: [RFC 00/11] DAX fsynx/msync support

From: Dave Chinner
Date: Sun Nov 01 2015 - 18:37:23 EST


On Fri, Oct 30, 2015 at 12:51:40PM -0700, Dan Williams wrote:
> On Fri, Oct 30, 2015 at 12:43 PM, Ross Zwisler
> <ross.zwisler@xxxxxxxxxxxxxxx> wrote:
> > On Fri, Oct 30, 2015 at 11:34:07AM -0700, Dan Williams wrote:
> >> This is great to have when the flush-the-world solution ends up
> >> killing performance. However, there are a couple mitigating options
> >> for workloads that dirty small amounts and flush often that we need to
> >> collect data on:
> >>
> >> 1/ Using cache management and pcommit from userspace to skip calls to
> >> msync / fsync. Although, this does not eliminate all calls to
> >> blkdev_issue_flush as the fs may invoke it for other reasons. I
> >> suspect turning on REQ_FUA support eliminates a number of those
> >> invocations, and pmem already satisfies REQ_FUA semantics by default.
> >
> > Sure, I'll turn on REQ_FUA in addition to REQ_FLUSH - I agree that PMEM
> > already handles the requirements of REQ_FUA, but I didn't realize that it
> > might reduce the number of REQ_FLUSH bios we receive.
>
> I'll let Dave chime in, but a lot of the flush requirements come from
> guaranteeing the state of the metadata, if metadata updates can be
> done with REQ_FUA then there is no subsequent need to flush.

No need for cache flushes in this case, but we still need the IO
scheduler to order such operations correctly.

> >> 2/ Turn off DAX and use the page cache. As Dave mentions [1] we
> >> should enable this control on a per-inode basis. I'm folding in this
> >> capability as a blkdev_ioctl for the next version of the raw block DAX
> >> support patch.
> >
> > Umm...I think you just said "the way to avoid this delay is to just not use
> > DAX". :) I don't think this is where we want to go - we are trying to make
> > DAX better, not abandon it.
>
> That's a bit of an exaggeration. Avoiding DAX where it is not
> necessary is not "abandoning DAX", it's using the right tool for the
> job. Page cache is fine for many cases.

Think btrfs - any file that uses COW can't use DAX for write.
Everything has to be buffered, unless the nodatacow flag is set, and
then DAX can be used. Indeed, on ext4 if you are using file
encryption you can't use DAX.

IOWs, we already know that we have to support mixed DAX/non-DAX
access within the same filesystem, so I'm with Dan here...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/