Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field

From: Trond Myklebust
Date: Thu Sep 08 2022 - 22:14:40 EST


On Fri, 2022-09-09 at 01:10 +0000, Trond Myklebust wrote:
> On Fri, 2022-09-09 at 11:07 +1000, NeilBrown wrote:
> > On Fri, 09 Sep 2022, NeilBrown wrote:
> > > On Fri, 09 Sep 2022, Trond Myklebust wrote:
> > >
> > > >
> > > > IOW: the minimal condition needs to be that for all cases below,
> > > > the application reads 'state B' as having occurred if any data
> > > > was committed to disk before the crash.
> > > >
> > > > Application                             Filesystem
> > > > ===========                             =========
> > > > read change attr <- 'state A'
> > > > read data <- 'state A'
> > > >                                         write data -> 'state B'
> > > >                                         <crash>+<reboot>
> > > > read change attr <- 'state B'
> > >
> > > The important thing here is to not see 'state A'.  Seeing 'state C'
> > > should be acceptable.  Worst case we could merge in wall-clock time
> > > of system boot, but the filesystem should be able to be more helpful
> > > than that.
> > >
> >
> > Actually, without the crash+reboot it would still be acceptable to
> > see "state A" at the end there - but preferably not for long.
> > From the NFS perspective, the changeid needs to update by the time
> > of a close or unlock (so it is visible to open or lock), but before
> > that it is just best-effort.
>
> Nope. That will inevitably lead to data corruption, since the
> application might decide to use the data from state A instead of
> revalidating it.
>

The point is, NFS is not the only potential use case for change
attributes. We wouldn't be bothering to discuss statx() if it were.

I could be using O_DIRECT and every other trick available to ensure
that my stock broker application (to choose one example) has access to
the very latest prices when I'm trying to execute a trade.
Suppose the filesystem then says 'the prices haven't changed since your
last read, because the change attribute on the database file is the
same' in response to a statx() request with the AT_STATX_FORCE_SYNC
flag set. Why shouldn't my application be able to assume it can serve
those prices right out of memory instead of having to go to disk?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx