Re: Proposal: Use hi-res clock for file timestamps

From: J. Bruce Fields
Date: Thu Aug 19 2010 - 18:48:26 EST


On Thu, Aug 19, 2010 at 12:44:13PM +1000, Neil Brown wrote:
> On Wed, 18 Aug 2010 22:08:03 -0400
> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
>
> > On Thu, Aug 19, 2010 at 10:52:18AM +1000, Neil Brown wrote:
> > > On Thu, 19 Aug 2010 09:41:36 +1000
> > > Neil Brown <neilb@xxxxxxx> wrote:
> > >
> > > > So I agree that this is probably more of an issue for directories than for
> > > > files, and that implementing it just for directories would be a sensible
> > > > first step with lower expected overhead - just my reasoning seems to be a bit
> > > > different.
> > >
> > > Just to be sure we are on the same page:
> > > file_update_time would always refer to current_nfsd_time, but nfsd would
> > > only update current_nfsd_time when a directory was examined (and the other
> > > conditions were met).
> > >
> > >
> > > So my current thinking on how this would look - names have been changed:
> > >
> > > - global timespec 'current_fs_precise_time' is zeroed when
> > > current_kernel_time moves backwards and is protected by a seqlock
> > >
> > > - current_fs_time would be
> > > now = max(current_kernel_time(), current_fs_precise_time)
> > > return timespec_trunc(now, sb->s_time_gran)
> > > (with appropriate seqlock protection)
> > >
> > > - new function in fs/inode.c
> > > get_precise_time(timestamp)
> >
> > Odd name for something that returns nothing of interest;
> > bump_precise_time() might be closer?
> >
> > And unique_time might be better than precise_time, since the property
> > we're asking for is that mtime on a changed file be new? (Or
> > versioned_time?)
>
> Agreed on both counts, though I'm not keen on 'bump' myself.
> got_unique_time()
> because that is what we just did... I prefer the name to reflect why the
> function is called, rather than what the function is expected to do about it.
> never_use_this_timestamp_again(timestamp)
> :-?

Maybe "retire" for a pithier version of never_use_again:

/**
* retire_timestamp - prevent a timestamp from being reused as an mtime.
* @timestamp: the timestamp to retire
*
* Advance the clock used to generate mtimes to guarantee that the
* given timestamp will not be reused on any future mtime update.
* This allows the given timestamp to be passed back to users such as
* nfs clients which need the guarantee that mtimes will always change
* on file updates.
*
* Depending on the filesystem's s_time_gran this may not be an ironclad
* guarantee.
*/

?
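To make the proposal concrete, here is a minimal single-threaded user-space model in C of the scheme as quoted in this thread: current_fs_time() returns the later of the coarse clock and current_fs_precise_time, truncated to s_time_gran, and retire_timestamp() applies the bump logic quoted below. It is a sketch under stated assumptions, not kernel code: coarse_clock, model_tick(), model_current_fs_time(), ts_cmp(), ts_trunc() and ts_next_ns() are hypothetical stand-ins; the seqlock is omitted because nothing here runs concurrently; and the body of the "cft > current_fs_precise_time" branch (set the precise clock to one nanosecond past cft) is an assumption, since the quoted pseudocode stops at that line.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for current_kernel_time(): a coarse clock that
 * only changes when model_tick() is called. */
static struct timespec coarse_clock;

/* Models current_fs_precise_time. The proposal protects it with a
 * seqlock; this single-threaded model omits locking. */
static struct timespec precise_time;

static int ts_cmp(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return a.tv_sec < b.tv_sec ? -1 : 1;
	if (a.tv_nsec != b.tv_nsec)
		return a.tv_nsec < b.tv_nsec ? -1 : 1;
	return 0;
}

/* Round down to a multiple of gran nanoseconds, as timespec_trunc() does. */
static struct timespec ts_trunc(struct timespec t, long gran)
{
	assert(gran >= 1 && gran <= 1000000000L);
	if (gran > 1)
		t.tv_nsec -= t.tv_nsec % gran;
	return t;
}

/* Advance the coarse clock; zero precise_time if time went backwards. */
static void model_tick(struct timespec now)
{
	if (ts_cmp(now, coarse_clock) < 0)
		precise_time = (struct timespec){ .tv_sec = 0, .tv_nsec = 0 };
	coarse_clock = now;
}

/* current_fs_time(): max(coarse clock, precise_time), truncated to gran. */
static struct timespec model_current_fs_time(long gran)
{
	struct timespec now =
		ts_cmp(coarse_clock, precise_time) >= 0 ? coarse_clock : precise_time;

	return ts_trunc(now, gran);
}

/* One nanosecond later, carrying into tv_sec if needed. */
static struct timespec ts_next_ns(struct timespec t)
{
	if (++t.tv_nsec == 1000000000L) {
		t.tv_nsec = 0;
		t.tv_sec++;
	}
	return t;
}

/* Ensure no later mtime equals ts (the bump logic quoted in the thread). */
static void retire_timestamp(struct timespec ts, long gran)
{
	struct timespec cft = model_current_fs_time(gran);

	if (ts_cmp(ts, cft) != 0)
		return;		/* the clock has already moved past ts */

	/* cft == precise_time: tv_nsec++; cft > precise_time: assumed to
	 * restart from cft + 1ns. Both reduce to the same assignment. */
	if (ts_cmp(cft, precise_time) >= 0)
		precise_time = ts_next_ns(cft);

	/* cft < precise_time (only possible when gran > 1) is left open,
	 * as it is in the thread. */
}
```

With s_time_gran = 1, retiring a timestamp forces the next value one nanosecond later. With s_time_gran = 100 the bumped value truncates back to the retired one, which is the cft < current_fs_precise_time case raised below.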

>
>
> >
> > > cft = current_fs_time()
> > > if (timestamp == cft)
> > /*
> > * Make sure the next mtime stored will be
> > * something different from timestamp:
> > */
> > > write_seqlock()
> > > if cft == current_fs_precise_time
> > > current_fs_precise_time.tv_nsec++
> > > else if cft > current_fs_precise_time
> >
> > What's the cft < current_fs_precise_time case?
>
> current_fs_precise_time has been incremented with a resolution higher
> than s_time_gran (i.e. s_time_gran > 1), so truncation leaves cft below it.
> I'm not really sure what we want to do about that.
> Maybe we should be incrementing tv_nsec by s_time_gran as long as that is
> significantly less than jiffies_to_usecs(1)*1000, but I don't know what I mean
> by 'significantly'.

How about just scratching "significantly" and saying "less"? As long as
we know jiffies is the default time source for mtimes, that should be
safe, shouldn't it?

> The only values I can find for s_time_gran in current code are 1, 100, 1000
> and 1000000000.

I didn't even know there were any other than 1 and a billion. OK!

> All those are either way bigger than a jiffy or significantly smaller, but
> suppose a filesystem came along that chose 1000000 (i.e. millisecond
> timestamps) - should we increment tv_nsec by 1000000, or not, or cross that
> bridge when we come to it?
>
> For reference:
> default is 1000000000 (this would cover ext2, ext3, reiserfs, fat, sysv, ...)
> cifs, smbfs, ntfs are 100
> udf, ceph are 1000
> rest (btrfs, ext4, gfs2, jfs, nilfs, ocfs2, xfs and virtual filesystems) are 1

Interesting list, thanks!
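The "increment by s_time_gran whenever that is less than one jiffy" rule can be checked against the list above with a little arithmetic. A minimal sketch, assuming one jiffy is 10^9/HZ nanoseconds and that a granularity not below one jiffy means "don't bump at all" (so the guarantee is not ironclad there); retire_step_ns() and print_retire_table() are hypothetical names, and 1000000 is the hypothetical millisecond filesystem from the question.

```c
#include <assert.h>
#include <stdio.h>

/* Granularities from the list above, plus the hypothetical 1 ms case. */
static const long listed_grans[] = { 1, 100, 1000, 1000000, 1000000000L };

/*
 * Hypothetical helper: the step by which retiring a timestamp would
 * advance the precise clock, or 0 if s_time_gran is not strictly less
 * than one jiffy (no bump is done in that case).
 */
static long retire_step_ns(long s_time_gran, long hz)
{
	long jiffy_ns;

	assert(hz > 0);
	jiffy_ns = 1000000000L / hz;
	return s_time_gran < jiffy_ns ? s_time_gran : 0;
}

/* Print the step for every listed granularity at a given HZ. */
static void print_retire_table(long hz)
{
	size_t i;

	for (i = 0; i < sizeof(listed_grans) / sizeof(listed_grans[0]); i++)
		printf("HZ=%ld gran=%ld ns -> step %ld ns\n", hz,
		       listed_grans[i], retire_step_ns(listed_grans[i], hz));
}
```

At HZ=1000 a jiffy is exactly 1 ms, so a 1 ms granularity is not "less" than a jiffy and would not be bumped; at HZ=250 (4 ms jiffy) or HZ=100 (10 ms jiffy) it would be bumped by 1 ms. Second-granularity filesystems are never bumped, and the 1, 100 and 1000 ns cases always are.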

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/