Re: [PATCH 17/18] fs: icache remove inode_lock

From: Nick Piggin
Date: Fri Oct 15 2010 - 16:51:00 EST


On Fri, Oct 15, 2010 at 09:59:43PM +1100, Dave Chinner wrote:
> On Fri, Oct 15, 2010 at 05:41:50PM +1100, Nick Piggin wrote:
> > You're worried about mere mortals reviewing and understanding it...
> > I don't really know. If you understand inode locking today, you
> > can understand the inode scaling series quite easily. Ditto for
> > dcache. rcu-walk path walking is trickier, but it is described in
> > detail in documentation and changelog.
> >
> > And you can understand the high level approach without exactly
> > digesting every detail at once. The inode locking work goes to
> > break up all global locks:
> >
> > - a single inode object is protected (to the same level as
> > inode_lock) with i_lock. This makes it really trivial for
> > filesystems to lock down the object without taking a global
> > lock.
>
> Which is unnecessarily wide, and results in i_lock having to have
> list locks nested inside it, and that leads to the lock
> inversion try-lock mess that several people have complained about.

Gee, you keep repeating this so often that you have me doubting
myself a tiny bit, so I have to check.

$ grep spin_trylock fs/inode.c fs/fs-writeback.c
fs/inode.c: if (!spin_trylock(&inode->i_lock)) {
fs/inode.c: if (!spin_trylock(&old->i_lock)) {
fs/inode.c: if (!spin_trylock(&old->i_lock)) {
fs/fs-writeback.c: if (!spin_trylock(&inode->i_lock)) {
fs/fs-writeback.c: if (!spin_trylock(&inode->i_lock)) {
fs/fs-writeback.c: if (!spin_trylock(&inode->i_lock)) {

This is your unmaintainable mess? You decided to rewrite your own
vfs-scale tree because I wanted i_lock to protect the icache state (and
offered you very good reason for)?

Well, surely they must be horrible complex and unmaintainable
beasts....

repeat:
/*
* Don't use RCU walks, common case of no old inode
* found requires hash lock.
*/
spin_lock_bucket(b);
hlist_bl_for_each_entry(old, node, &b->head, i_hash) {
if (old->i_ino != ino)
continue;
if (old->i_sb != sb)
continue;
if (old->i_state & (I_FREEING|I_WILL_FREE))
continue;
if (!spin_trylock(&old->i_lock)) {
spin_unlock_bucket(b);
cpu_relax();
goto repeat;
}

Nope, no big deal. The rest are much the same. So thanks for the
repeated suggestion, but I'll actually prefer to keep my regular i_lock
locking scheme where you don't need to look up the documentation and
think hard about coherency between protected and unprotected parts of
the inode whenever you use it. I didn't stumble upon my locking design
by chance.

If you think a few trylocks == impending doom, then xfs is looking
pretty poorly at this point. So I would ask that you stop making things
up about my patch series. If you dislike the trylocks so much that you
think it is worth breaking the i_lock regularity or using RCU or
whatever, then please propose them as incremental patches to the end
of my series where you can see they logically will fit. You know I will
argue that locking consistency is more important for maintainability
than these few trylocks, however.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/