Re: Nanosecond fs timestamp support: sad
From: Matt Mackall
Date: Fri Jul 22 2011 - 20:08:06 EST
On Sat, 2011-07-23 at 08:59 +1000, NeilBrown wrote:
> On Fri, 22 Jul 2011 18:31:58 -0400 "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> > On Fri, Jul 22, 2011 at 06:10:39PM -0400, bfields wrote:
> > > On Fri, Jul 22, 2011 at 11:47:32PM +0200, Andi Kleen wrote:
> > > > On Fri, Jul 22, 2011 at 04:11:42PM -0500, Matt Mackall wrote:
> > > > > On Fri, 2011-07-22 at 22:59 +0200, Andi Kleen wrote:
> > > > > > > Indeed. Only usefully exists on ext4 and requires extra system calls.
> > > > > >
> > > > > > Not sure what you mean? It's in stat(2), just like the timestamps.
> > > > >
> > > > > I don't see anything that looks like a version or generation number in
> > > > > either the man pages, the asm-generic/stat.h, or glibc's asm/stat.h.
> > > > > Pointer?
> > > >
> > > > Hmm you're right. I thought it was in there, but apparently not.
> > > > I think it should be added there though. We still have some unused
> > > > fields.
> > >
> > > But last I checked I thought it was only ext4 that actually incremented
> > > the i_version on IO, and even then only when given a (non-default) mount
> > > option.
> > >
> > > My notes on what needs to be done there:
> > >
> > > - collect data to determine whether turning on i_version causes
> > >   any significant performance regressions.
> > >     - Last I talked to him, Ted Tso recommended running
> > >       Bonnie on a local disk, since it does a lot of little
> > >       writes, which is somewhat of a worst case, as it will
> > >       generate extra metadata updates for each write.
> > >       Compare total wall-clock time, number of iops, and
> > >       number of bytes (using some kind of block tracing).
> > > - If there aren't any problems, turn it on by default, and we're
> > >   done.
> > (Well, and talk the other filesystem implementors into doing it.)
> But does anyone apart from NFSv4 actually *want* i_version as opposed to the
> more-generally-useful precise timestamps?
In theory, a microsecond timestamp (ie gtod) may already not be good
enough for all applications. But i_version also doesn't allow comparing
> If not, we probably should tell NFSv4 to use timestamps and focus on making
> them work well.
> The timestamp used doesn't need to update every nanosecond. I think if it
> were just updated on every userspace->kernel transition (or effective
> equivalents inside kernel threads) that would be enough to capture all
> causality. I wonder how that would be achieved... I wonder if RCU machinery
> could help - doesn't it keep track of when threads schedule ... or something?
- we only need to go to higher resolution when two events happen in the
same time quantum
- this applies at both the level of seconds and jiffies
- if the only file touched in a given quantum gets touched again, we don't
need to update its timestamp if stat wasn't also called on it in this
quantum
- we never need to use a higher resolution than the global
For instance, if a machine is idle, except for writing to a single file
once a second, 1s resolution suffices.
If a machine is idle, except for writing to the same file 1000 times per
second, and no one is watching it, 1s still suffices (inode is dirtied
once per second).
Any time two files are touched in the same second, the second one (and
later files) needs jiffies resolution. Similarly, any time two files are
touched in the same jiffy, the second one should use gtod().
The global status bits needed to track this could be managed fairly
efficiently with cmpxchg.
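
A minimal userspace sketch of the escalation rule above, using C11 atomics in place of the kernel's cmpxchg. Everything named here is an assumption made for illustration: the HZ value, the packing of (quantum, owner) into one 64-bit word, and the helpers sole_toucher() and pick_resolution(). It also leaves out the "was stat called on it in this quantum" check. It is not kernel code, only one way the global status bits could be tracked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC   1000000000ULL
#define HZ             1000ULL                 /* assumed tick rate */
#define NSEC_PER_JIFFY (NSEC_PER_SEC / HZ)
#define OWNER_SHARED   UINT32_MAX              /* sentinel: quantum has >1 file */

enum ts_res { TS_SECOND, TS_JIFFY, TS_FINE };

/*
 * One word per level: upper 32 bits = quantum number, lower 32 bits =
 * the only inode touched so far in that quantum, or OWNER_SHARED.
 * The zero initial state makes quantum 0 look shared, which only costs
 * some unneeded resolution at time zero.
 */
static _Atomic uint64_t sec_state;
static _Atomic uint64_t jiffy_state;

static uint64_t pack(uint32_t quantum, uint32_t owner)
{
    return ((uint64_t)quantum << 32) | owner;
}

/* True if 'ino' is (still) the only file touched in 'quantum'. */
static bool sole_toucher(_Atomic uint64_t *state, uint32_t quantum, uint32_t ino)
{
    uint64_t old = atomic_load(state);

    assert(ino != 0 && ino != OWNER_SHARED);
    for (;;) {
        uint32_t owner = (uint32_t)old;
        uint64_t next;

        if ((uint32_t)(old >> 32) != quantum)
            next = pack(quantum, ino);          /* new quantum: claim it */
        else if (owner == ino)
            return true;                        /* same file again */
        else if (owner == OWNER_SHARED)
            return false;                       /* already contended */
        else
            next = pack(quantum, OWNER_SHARED); /* second file: mark shared */

        if (atomic_compare_exchange_weak(state, &old, next))
            return (uint32_t)next == ino;
        /* Lost a race (or spurious failure): 'old' was reloaded, retry. */
    }
}

/* Coarsest resolution that still orders this touch against other files. */
static enum ts_res pick_resolution(uint32_t ino, uint64_t now_ns)
{
    uint32_t sec = (uint32_t)(now_ns / NSEC_PER_SEC);
    uint32_t jif = (uint32_t)(now_ns / NSEC_PER_JIFFY);
    /* Always record at both levels so a later file sees this touch. */
    bool alone_sec = sole_toucher(&sec_state, sec, ino);
    bool alone_jif = sole_toucher(&jiffy_state, jif, ino);

    if (alone_sec)
        return TS_SECOND;
    if (alone_jif)
        return TS_JIFFY;
    return TS_FINE;                             /* i.e. fall back to gtod() */
}
```

In this sketch, a file that is the only one touched in a second keeps 1s resolution however often it is written; the second file in that second (and every touch after it) drops to jiffies, and the second file within one jiffy drops to the finest clock.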
(Arguably, we should supply finer-than-1s resolution whether it's strictly
needed or not on filesystems with nanosecond support, so that people
casually inspecting timestamps don't wonder where their nanoseconds
went.)
Mathematics is the supreme nostalgia of our time.