Re: explicit dcache <-> user-space cache coherency, sys_mark_dir_clean(), O_CLEAN

From: Jamie Lokier
Date: Mon Feb 23 2004 - 12:35:37 EST


Ingo Molnar wrote:
> > Another issue is that most machines don't have nanosecond resolution
> > clocks (e.g. m68k is limited to timer interrupt resolution, and some
> > x86 machines cannot use the cycle counter), [...]
>
> this is not an issue, with the monotonicity solution i suggested: if
> prev_time == curr_time then curr_time.tv_nsec++.

So did you read this paragraph?:

Important: you *don't* have to delay unless someone has _read_ the
mtime since the last time it was set, _and_ if the mtime wouldn't be
changed by the current operation, _and_ if you cannot simply increment
the low-order bits of mtime due to known limits on the system clock
resolution.

According to that logic, you are correct that the delay isn't
required, and tv_nsec++ works, except on very fast machines which
don't exist yet, _or_ on highly concurrent machines (maybe the 256-way
NUMA boxes?)

When it's called a "generation counter" or "uniqueness stamp", there
is no problem just changing the number.

When it's called a "timestamp", programs will also compare the
timestamps of _different_ files to see which order they were written.
("make" is the simplest example.) Therefore it's important to get
this order logically correct.

> > The right place to put the delay is in the kernel, when mtime is set
> > or when it is read, or both.
>
> no need to delay anything - just do the tv_nsec++ thing!

As you yourself pointed out, that doesn't work on machines in 5 years
time: Think 40GHz machines and dynamic translation which optimises
fast paths to a few highly concurrent cycles. Then it's easy to
imagine tv_nsec incrementing multiple times within a nanosecond.

Natural selection isn't the problem: programs assuming it is logically
dependable and returning erroneous results is the problem.

There are _lots_ of programs that could use a reliable indicator of
whether a file has changed. tv_nsec when it isn't stored isn't one of
them, so it would be a shame to have a half-engineered solution that
complex programs like, say, a Java generated-code-precacher must not
use because of rare logical failures.

I've offered a solution that offers correct results for _all_
filesystems and _all_ machines, and it will behave exactly like your
code if you run it with XFS or a similar filesystem, it's efficient
(the delay is minimal and only done when needed), and logically
dependable. So please consider it.

Thanks :)
-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/