Re: explicit dcache <-> user-space cache coherency, sys_mark_dir_clean(), O_CLEAN

From: Jamie Lokier
Date: Fri Feb 20 2004 - 08:27:55 EST


Ingo Molnar wrote:
> - it's a synchronous solution that avoids signals, and is thus
> usable/robust in libraries too.

I do like the robustness, due to the way the flag is associated with
dirfd.

> - dnotify _forces_ action. mark_dir_clean() you can use if there's use
> and there's no overhead if the Samba workload is completely silent
> and there are only POSIX users. I.e. it should scale better than
> dnotify.

Firstly, it doesn't scale better than dnotify in this scenario if
you're using signals and one-shot mode. With one-shot mode, there's
also no overhead if there are only POSIX users.

Secondly, dnotify is synchronous if you _block_ the dnotify signal and
use sigtimedwait() to collect events.

I'd personally like to see the robustness problem solved with epoll
or something similar extended to queue more general events to
userspace, more reliably.

Hey! That's an idea: Use select/poll on the dirfd to read this bit.
That gives you the flexibility to wait, or collect events from
multiple directories. (Philosophicaly, every event should be
accessible through select/poll/epoll, right?).

> We can export the 'directory clean bit' to userspace, via the same page
> pinning and mapping techniques used by futexes. User-space could
> register a 'clean bit' address via a new syscall, which the dcache then
> uses from that point on. Thus there would be only a single syscall when
> Samba sets up a directory cache in user-space [which needs those
> readdir() calls so performance is down the drain anyway], which syscall
> lets userspace register a machine-word address to serve as the
> 'directory is clean' flag. Userspace and kernelspace will set this flag
> possibly in parallel which is not a problem as long as userspace uses
> atomic ops. This approach introduces some page pinning allocation
> overhead but that's easy to solve. User-space would of course condense
> the pinned range. Kernel-space would see very minimal overhead from
> having the bit in an indirect pointer - at least on 64-bit systems where
> all kernel RAM is mapped.

That is _far_ too overcomplicated to do just for this one obscure bit
of information.

If you're going to do kernel-triggers-futex event transmission (like
CLONE_CLEARTID already does), it would be nice to have a framework for
more general event types, like pending-but-blocked signals, dnotify
events (individual ones, not all funnelled through one signal number
as they are now), fd-ready-to-read events and such. The sort of
things that select/poll/epoll ought to be able to wait for, and in
most cases can.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/