I am well aware of that, I'm simply saying that sucks. Doing a recursive chmod or setfacl on a large directory tree is slow as all hell.
Doing it in the kernel won't make it any faster.
You can't safely preserve POSIX semantics that way. For example, even without *ANY* ability to read /etc/shadow, I can easily "ln /etc/shadow /tmp/shadow", assuming they are on the same filesystem. If the /etc/shadow permissions depend on inherited ACLs to enforce access then that one little command just made your shadow file world-readable/writeable. Oops.
Think about it this way:
Permissions depend on *what* something is, not *where* it is. Under
Linux you can leave the digital equivalent of a $10,000 piece of jewelry lying around in /var/www and not have to worry about it being compromised as long as you set your permissions properly (not that I recommend it). Moving the piece of jewelry around your house does not change what it *is* (and by extension does not change the protection required on it), any more than "ln /etc/shadow /tmp/shadow" (or "mv") changes what *it* is. If your /house is really extraordinarily secure then you could leave the jewelry lying around as /house/gems.bin with permissions 0777, but if somebody had a back-door to /house (an open fd, a careless typo, etc) then you'd have the same issues.
Not necessarily. When I do "vim some-file-in-current-directory", for example, the kernel does *NOT* look up the path of my current directory. It does (in pseudocode):
if (starts_with_slash(filename)) {
entry = task->cwd;
} else {
entry = task->root;
}
while (have_components_left(filename)
entry = lookup_next_component(filename);
return entry;
That's not even paying attention to functions like "fchdir" or their interactions with "chroot" and namespaces. I can probably have an open directory handle to a volume in a completely different namespace, a volume which isn't even *MOUNTED* in my current fs namespace. Using that file-handle I believe I can "fchdir", "openat", etc, in a completely different namespace. I can do the same thing with a chroot, except there I can even "escape":
/* Switch into chroot. Doesn't drop root privs */
chdir("/some/dir/somewhere");
chroot(".");
/* Malicious code later on */
chdir("/");
chroot("another_dir");
chdir("../../../../../../../../..");
chroot(".");
/* Now I'm back in the real root filesystem */
The locking penalty is because the path-lookup is *not* implied. The above chroot example shows that in detail. If you have to do the lookup in *reverse* on every open operation then you have to either:
(A) Store lots of security context with every open directory (cwd included). When a directory you have open is moved, you still have full access to everything inside it since your handle's data hasn't changed.
See above. Linux has a very *very* powerful VFS nowadays. It also has support for shared subtrees across private namespaces, allowing things like polyinstantiation. For example you can configure a Linux box so every user gets their own empty /tmp directory created and mounted on their namespace's /tmp when they log in, but all the rest of the system mounts are all shared.