Re: proper place to discuss kernel 'bloatedness'?

Alexander Viro (viro@math.psu.edu)
Mon, 8 Feb 1999 12:05:25 -0500 (EST)


On 8 Feb 1999, Derek Atkins wrote:

> [ Note: I have not been following this thread, so I've only seen the
> last, oh, 6 messages. I'm sorry if I'm replying out of context. -derek ]
Umgh... Main part is in "Re: 2.2.2-pre2". The stuff here is a
tangential flamage with Victor. He decided to spread a bit of hogwash.

> Realistically, all you need to do to _not_ screw up binary module
> compatibility to do this 'move' is:
> 1) make a vfs_generic_rename() function which is exported
> by the kernel (for use in FS modules)

Huh???

> 2) DO NOT change the inode_operations structure at all
> 3) Change the vfs rename() system call to check if the
> inode_operations rename() function is NULL, and if so have it call
> the vfs_generic_rename()
> 4) Remove the FS-specific rename functions from the filesystems
> which can use the generic method.

And those ones would be...??? WTF is 'generic' rename, after all?
Sorry, Derek, it just doesn't make sense. Proposed change: (the only one
visible from AFS) - d_move() is called from VFS now. Not from foo_rename()
as it used to do. Other things (can be silently ignored):
a) serialization of rename()'s on a single fs. If you do it
already it will not harm, if you don't - it closes an actual bug (think
rename("/a/a","/b/b/b/b/b") racing with rename("/b/b","/a/a/a/a")).
b) POSIX brokenness^W semantics wrt rename(foo,bar) where foo and
bar are links to the same inode is enforced in VFS. No need to care about
it.
c) is_subdir() stuff is checked in VFS now. You are free to
recheck it, indeed - no harm will happen.
d) Race (happened in *all* filesystems in official kernel that
do support rename()) - if you are doing rename over the existing directory
you can't just check that there is no other owners and merrily check the
emptiness. Somebody may achieve the target since it's still hashed and
start creating something in it. It's a real race and I'ld ask you to check
your code for it. Again, *all* filesystems in the official tree were
affected and it easily leads to fs corruption. That's why we need to do
the thing - now vfs_rename() unhashes the target if needed and does
d_rehash() after the call of ->rename() method. *Then* it does d_move().

Please, *check* your code. The whole idea behind this change is to
solve rename() problem once and forever. New semantics for ->rename()
method being: do fs-specific tests and either do relevant fs changes and
return 0 or return an error code. Don't move dentry - it's VFS task and it
will do that. Again, the point being to drag piece of VFS duplicated in
*all* filesystems back to VFS and spare them from that stuff forever.

> The result of this is that the kernel interfaces do not change (adding
> a new interface does not consitute a change in the interface), but you
> get the added benefit that the filesystem itself can decide whether it
> wants to use the generic code or implement rename itself.

WHAT generic code may know about layout of filesystem and
fs-specifc actions required to move the file?

> I don't see this as being a major problem, assuming it is done
> "right." I can only assume that you do not plan to remove the
> rename() inode_operation.

Indeed I don't. Sorry, I'm not an idiot.

> If you do plan to actually _change_ the inode_operation, please wait
> until 2.3.

Derek, see above. If your code doesn't do stuff mentioned above
you have to fix it anyway, right? Let's take the generic tests away from
filesystems and do them right in VFS instead of fixing the same bugs in
each fs in existence. If we'll find more races - fine, they will be fixed
in VFS not affecting fs drivers. The less part of VFS is scattered over
the fs drivers the less dependent they are on each other. It *reduces*
odds of future breakage.
ObInodeOperations: I don't know whether you did it or not, but
adding several NULLs after the end of each struct inode_operations may be
a good idea. It will reduce chances of breaking your stuff if there will
be *additions* to inode_operations.
Could you get in touch with me? I suspect that it would be useful
for everybody.
Cheers,
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/