Re: [PATCH V4 07/13] fs: Add locking for a dynamic address space operations state

From: Christoph Hellwig
Date: Mon Feb 24 2020 - 12:56:08 EST


On Sat, Feb 22, 2020 at 09:44:19AM +1100, Dave Chinner wrote:
> > I am very much against this. There is a reason why we don't support
> > changes of ops vectors at runtime anywhere else, because it is
> > horribly complicated and impossible to get right. IFF we ever want
> > to change the DAX vs non-DAX mode (which I'm still not sold on) the
> > right way is to just add a few simple conditionals and merge the
> > aops, which is much easier to reason about, and less costly in overall
> > overhead.
>
> *cough*
>
> That's exactly what the original code did. And it was broken
> because page faults call multiple aops that are dependent on the
> result of the previous aop calls setting up the state correctly for
> the latter calls. And when S_DAX changes between those calls, shit
> breaks.

No, the original code was broken because it didn't serialize between
DAX and buffer access.

Take a step back and look where the problems are, and they are not
mostly with the aops. In fact the only aop useful for DAX is
is ->writepages, and how it uses ->writepages is actually a bit of
an abuse of that interface.

So what we really need it just a way to prevent switching the flag
when a file is mapped, and in the normal read/write path ensure the
flag can't be switch while I/O is going on, which could either be
done by ensuring it is only switched under i_rwsem or equivalent
protection, or by setting the DAX flag once in the iocb similar to
IOCB_DIRECT.

And they easiest way to get all this done is as a first step to
just allowing switching the flag when no blocks are allocated at
all, similar to how the rt flag works. Once that works properly
and use cases show up we can relax the requirements, and we need
to do that in a way without bloating the inode and various VFS
code paths, as DAX is a complete fringe feature, and Ñwitching
the flag and runtime is the fringe of a fringe. It just isn't worth
bloating the inode and wasting tons of developer time on it.