Re: SLUB defrag pull request?

From: Christoph Lameter
Date: Wed Oct 22 2008 - 11:53:12 EST


On Wed, 22 Oct 2008, Miklos Szeredi wrote:

> On Tue, 21 Oct 2008, Christoph Lameter wrote:
> > The only way that a secure reference can be established is if the
> > slab page is locked. That requires a spinlock. The slab allocator
> > calls the get() functions while the slab lock guarantees object
> > existence. Then locks are dropped and reclaim actions can start with
> > the guarantee that the slab object will not suddenly vanish.
>
> Yes, you've made up your mind, that you want to do it this way. But
> it's the _wrong_ way, this "want to get a secure reference for use
> later" leads to madness when applied to dentries or inodes. Try for a
> minute to think outside this template.
>
> For example dcache_lock will protect against dentries moving to/from
> d_lru. So you can do this:
>
>   take dcache_lock
>   check if d_lru is non-empty

The dentry could have been freed even before we take the dcache_lock. We cannot access d_lru without a stable reference to the dentry.

>   take sb->s_umount
>   free dentry
>   release sb->s_umount
>   release dcache_lock
>
> Yeah, locking will be more complicated in reality. Still, much less
> complicated than trying to do the same across two separate phases.
>
> Why can't something like that work?

Because the slab allocator starts out with a set of objects left in a slab page. It needs to build a list of objects etc. in a way that is as independent as possible from the user of the slab page. It does that by locking the slab page so that free operations stall until the reference has been established. If it did not shut off frees, the objects could vanish under us.

We could also avoid frees by calling some cache-specific method that locks out frees before and after. But then frees would stall everywhere, and every slab cache would have to check a global lock before freeing objects (with numerous complications for RCU frees etc.).

Slab defrag only stops frees on a particular slab page.
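
To make the two phases concrete, here is a minimal userspace sketch of the scheme described above. It is a model under stated assumptions, not the code from the defrag patches: struct slab_page, defrag_get(), defrag_kick() and obj_put() are hypothetical names, a C11 atomic_flag stands in for the per-page slab lock, and a plain counter stands in for whatever refcount the cache really uses.

```c
#include <stdatomic.h>
#include <stddef.h>

#define OBJS_PER_PAGE 4

/* Hypothetical model of one slab page; not the kernel's data structures. */
struct slab_page {
    atomic_flag lock;          /* stands in for the per-page slab lock */
    int refs[OBJS_PER_PAGE];   /* 0 = free slot, otherwise references held */
};

static void page_lock(struct slab_page *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;   /* spin until the lock is released */
}

static void page_unlock(struct slab_page *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

static void page_init(struct slab_page *p, const int refs[OBJS_PER_PAGE])
{
    atomic_flag_clear(&p->lock);
    for (size_t i = 0; i < OBJS_PER_PAGE; i++)
        p->refs[i] = refs[i];
}

/* Ordinary release path: it stalls only while this page's lock is held. */
static void obj_put(struct slab_page *p, size_t i)
{
    page_lock(p);
    if (p->refs[i] > 0)
        p->refs[i]--;
    page_unlock(p);
}

/* Phase 1: with frees on this page shut off, pin every live object. */
static size_t defrag_get(struct slab_page *p, size_t pinned[OBJS_PER_PAGE])
{
    size_t n = 0;

    page_lock(p);
    for (size_t i = 0; i < OBJS_PER_PAGE; i++) {
        if (p->refs[i] > 0) {
            p->refs[i]++;          /* the pin keeps the object alive */
            pinned[n++] = i;
        }
    }
    page_unlock(p);
    return n;
}

/*
 * Phase 2: the lock was dropped after phase 1, but the pinned objects
 * cannot have vanished. Drop each pin and reclaim objects that at most
 * the cache itself still references. Returns the number reclaimed.
 */
static size_t defrag_kick(struct slab_page *p, const size_t *pinned, size_t n)
{
    size_t reclaimed = 0;

    for (size_t k = 0; k < n; k++) {
        size_t i = pinned[k];

        page_lock(p);
        p->refs[i]--;              /* drop the pin taken in phase 1 */
        if (p->refs[i] <= 1) {     /* no other user is left */
            p->refs[i] = 0;
            reclaimed++;
        }
        page_unlock(p);
    }
    return reclaimed;
}
```

In this model an obj_put() that races in between the two phases only lowers the count; the pin taken in defrag_get() keeps the slot occupied until defrag_kick() runs. Frees on any other page never touch this lock.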

The slab defrag approach also allows the slab cache (dentries or inodes here) to do something other than free the object. It would be possible, for example, to move the object by allocating a new entry and copying the information to the new dentry. That would actually be better since it would preserve the objects and just move them into the same slab page.
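
A rough illustration of the "move instead of free" idea, again as a userspace sketch with hypothetical names (struct entry, struct page, page_alloc(), move_entries()). It assumes nothing else points at the entries; a real dentry or inode move would also have to fix up every pointer to the object (hash chains, parent/child links and so on), which this model ignores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SLOTS 4

/* Hypothetical stand-in for a cached object such as a dentry. */
struct entry {
    bool in_use;
    char name[16];
};

/* Hypothetical stand-in for one slab page. */
struct page {
    struct entry slots[SLOTS];
};

/* Allocate a slot in pg and fill it in; returns NULL if the page is full. */
static struct entry *page_alloc(struct page *pg, const char *name)
{
    for (size_t i = 0; i < SLOTS; i++) {
        struct entry *e = &pg->slots[i];

        if (!e->in_use) {
            e->in_use = true;
            snprintf(e->name, sizeof(e->name), "%s", name);
            return e;
        }
    }
    return NULL;
}

/* Number of live entries in pg. */
static size_t page_live(const struct page *pg)
{
    size_t n = 0;

    for (size_t i = 0; i < SLOTS; i++)
        n += pg->slots[i].in_use;
    return n;
}

/*
 * Move live entries from src into free slots of dst instead of freeing
 * them. Entries that do not fit stay where they are. Returns the number
 * of entries moved.
 */
static size_t move_entries(struct page *src, struct page *dst)
{
    size_t moved = 0;

    for (size_t i = 0; i < SLOTS; i++) {
        struct entry *old = &src->slots[i];

        if (!old->in_use)
            continue;
        if (!page_alloc(dst, old->name))
            break;                 /* dst is full */
        old->in_use = false;       /* the old slot is now free */
        moved++;
    }
    return moved;
}
```

If every live entry of the sparse page fits into the denser one, the sparse page ends up empty and can be returned to the page allocator, while the cached information survives.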