Re: [RFC][PATCH] inotify 0.10.0

From: Ray Lee
Date: Thu Sep 30 2004 - 20:24:56 EST


Okay, now I'm sure I'm not communicating well. Please let me try again,
and I'll see if I can do better this time.

On Thu, 2004-09-30 at 10:48 -0700, Paul Jackson wrote:
> Ray wrote:
> > However, dnotify has shown that forcing userspace to cache the
> > entire contents of all the directories it wishes to watch doesn't scale
> > well.
>
> The dnotify cache isn't just an optional performance tweak, caching
> inode to name. It's an essential tracking of much of the stat
> information of any file of interest, in order to have _any_ way of
> detecting which file changed.

Agreed.

> The inotify cache could just have the last few most recently used
> entries or some such - it's just a space vs time performance tradeoff.

Not the case.

> So passing back an inode number doesn't come close to reintroducing
> the forced tracking of all the interesting stat data of every file.

It certainly forces userspace to track all file names and inodes, at
least. Userspace wishes to know about deletes and renames. Unless it
caches everything, it won't know what was deleted, or what got renamed.
For that matter, passing back the inode number might also cause
confusion for hardlinked files in the same directory, though I'd have to
think about that for a bit to be sure.

Perhaps I'm missing something. I haven't missed anything yet today,
which means I'm overdue.

> > dnotify already forces an O(N) workload per directory onto
> > userspace, and that has shown itself to not work well.
>
> Just because there exists an O(N) solution that has been shown to fail
> doesn't mean all O(N) solutions fail.

<shrug> Okay, I'll agree with you in theory. This is in fact true for
several classes of multiplication algorithms, for example. (Who's gonna
do an FFT multiply of two two-digit numbers, I ask; no one.) So, yeah, I
get your point. Honest.

> If the constants change enough,
> then the actual numbers (how many changes per minute, how many compute
> cycles and memory bytes, what's the likely cache hit rate for a given
> size cache, ...) need to be re-examined. I see no evidence of that
> re-examination -- am I missing something?

This is one of the parts I'm not communicating well. For any 'event'
that inotify wants to relay, the kernel knows the inode and the name.
You're saying pass the inode number, as it's smaller and makes for an
easier and higher speed interface to get changes to userspace (if I
understand you correctly).

I'm saying, sure, if you look at that corner of the problem, you're
right. That's ignoring how this gets used, however, where userspace
ultimately wants to know the name of the file that's changed. So, by
your method, we scan the entire directory, looking for the inode that
matches via the getdents call.

So, the kernel ended up sending the name of the changed file *anyway*,
but userspace had to ask for it. And, oh, the kernel sent the name of
every other file in that directory too. (Okay, maybe half, but then
again maybe not if you want to catch hardlinked files.)

> > the kernel *has* that information already, ...
>
> Frustrating, isn't it. The information is right there, and we're trying
> to keep you from getting to it.
>
> Whatever ...

No, that's not my point at all. I'm not saying Big Bad Kernel Developers
won't let me have my candy. I'm saying that if the kernel has the
information already, and we're not passing it to userspace, and
userspace *needs* that information, then all we've done is guarantee
another long set of syscalls while it tries to pull the directory
contents to match up item for item against its cache of previously known
file states.

Or, in other words, I don't see how tossing away needed information is
going to make anything more efficient in this case. Userspace, like a
bulldog, will be coming back knocking on the kernel's door to get that
information one way or another.

~ ~

I've been avoiding saying this, as this really will be a bit more
complex than even my suggestions, but perhaps everyone would be happier
if we crammed all this through a netlink socket instead. Got me.

Ray

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/