Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io

From: Verma, Vishal L
Date: Thu May 05 2016 - 17:42:20 EST


On Thu, 2016-05-05 at 08:15 -0700, Dan Williams wrote:
> On Thu, May 5, 2016 at 7:24 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx>
> wrote:
> >
> > On Mon, May 02, 2016 at 06:41:51PM +0300, Boaz Harrosh wrote:
> > >
> > > >
> > > > All IO in a dax filesystem used to go through dax_do_io, which
> > > > cannot
> > > > handle media errors, and thus cannot provide a recovery path
> > > > that can
> > > > send a write through the driver to clear errors.
> > > >
> > > > Add a new iocb flag for DAX, and set it only for DAX mounts. In
> > > > the IO
> > > > path for DAX filesystems, use the same direct_IO path for both
> > > > DAX and
> > > > direct_io iocbs, but use the flags to identify when we are in
> > > > O_DIRECT
> > > > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the
> > > > conventional
> > > > direct_IO path instead of DAX.
> > > >
> > > > Really? What is your thinking here?
> > >
> > > > What about all the current users of O_DIRECT? You have just made
> > > > them 4 times slower and "less concurrent*" than "buffered io"
> > > > users, since the direct_IO path will queue an IO request and all.
> > > (And if it is not so slow then why do we need dax_do_io at all?
> > > [Rhetorical])
> > >
> > > I hate it that you overload the semantics of a known and expected
> > > O_DIRECT flag, for special pmem quirks. This is an incompatible
> > > and unrelated overload of the semantics of O_DIRECT.
> > > Agreed - making O_DIRECT less direct than not having it is plain
> > stupid,
> > and I somehow missed this initially.
> Of course I disagree because, as Dave argues in the msync case, we
> should do the correct thing first and make it fast later, but also,
> like Dave, I find this arguing in circles is getting tiresome.
>
> >
> > This whole DAX story turns into a major nightmare, and I fear all
> > our
> > hodge podge tweaks to the semantics aren't helping it.
> >
> > It seems like we simply need an explicit O_DAX for the read/write
> > > bypass if we can't sort out the semantics (error, writer
> > synchronization)
> > just as we need a special flag for MMAP.
> I don't see how O_DAX makes this situation better if the goal is to
> accelerate unmodified applications...
>
> Vishal, at least the "delete a file with a badblock" model will still
> work for implicitly clearing errors with your changes to stop doing
> block clearing in fs/dax.c. This combined with a new -EBADBLOCK (as
> Dave suggests) and explicit logging of I/Os that fail for this reason
> at least gives a chance to communicate errors in files to suitably
> aware applications / environments.

Agreed - I'll send out a series that has just the zeroing changes, and
drop the dax_io fallback/O_DIRECT tweak for now while we figure out the
right thing to do. That should get us to a place where we still have dax
in the presence of errors, and have _a_ path for recovery.
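
For reference, the dispatch that the dropped O_DIRECT tweak described
(quoted at the top of this thread) can be modeled roughly as below. This
is a user-space sketch for discussion only, not kernel code: the MODEL_*
flag names, their bit values, and model_pick_path() are all hypothetical
stand-ins, and the real patch may structure the check differently.

```c
/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_IOCB_DIRECT (1u << 0)  /* file was opened with O_DIRECT */
#define MODEL_IOCB_DAX    (1u << 1)  /* file lives on a DAX mount */

enum model_io_path {
    MODEL_PATH_BUFFERED,   /* ordinary page-cache I/O */
    MODEL_PATH_DIRECT_IO,  /* conventional direct_IO, goes through the driver */
    MODEL_PATH_DAX_IO      /* dax_do_io-style path, bypasses the driver */
};

/*
 * Pick an I/O path from the iocb flags, following the patch description:
 * O_DIRECT wins over DAX, so an O_DIRECT write on a DAX mount is sent
 * through the driver, which is the path that can clear media errors.
 */
static enum model_io_path model_pick_path(unsigned int flags)
{
    if (flags & MODEL_IOCB_DIRECT)
        return MODEL_PATH_DIRECT_IO;
    if (flags & MODEL_IOCB_DAX)
        return MODEL_PATH_DAX_IO;
    return MODEL_PATH_BUFFERED;
}
```

The first check is the contested part: with both bits set, the model
returns MODEL_PATH_DIRECT_IO, which is exactly the "O_DIRECT less direct
than not having it" behavior objected to above.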
