Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

From: Al Viro
Date: Sun Sep 01 2013 - 19:30:40 EST


On Sun, Sep 01, 2013 at 03:48:01PM -0700, Linus Torvalds wrote:
> I made DEFINE_LGLOCK use DEFINE_PER_CPU_SHARED_ALIGNED for the
> spinlock, so that each local lock gets its own cacheline, and the
> total loops jumped to 62M (from 52-54M before). So when I looked at
> the numbers, I thought "oh, that helped".
>
> But then I looked closer, and realized that I just see a fair amount
> of boot-to-boot variation anyway (probably a lot to do with cache
> placement and how dentries got allocated etc). And it didn't actually
> help at all, the problem is still there, and lg_local_lock is still
> really really high on the profile, at 8% cpu time:
>
> -   8.00%  lg_local_lock
>    - lg_local_lock
>       + 64.83% mntput_no_expire
>       + 33.81% path_init
>       + 0.78% mntput
>       + 0.58% path_lookupat
>
> which just looks insane. And no, no lg_global_lock visible anywhere..
>
> So it's not false sharing. But something is bouncing *that* particular
> lock around.

Hrm... It excludes sharing between the locks, all right. AFAICS, that
won't exclude sharing with plain per-cpu vars, will it? Could you
tell what vfsmount_lock is sharing with on that build? The stuff between
it and files_lock doesn't have any cross-CPU writers, but with that
change it's the stuff after it that becomes interesting...
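
To illustrate the concern about what follows the lock, here is a minimal
userspace sketch. It is not the kernel's per-cpu section layout: the struct
names, the 64-byte CACHELINE value and the helper functions are all
assumptions made for this example. It only shows that aligning the start of
a small object to a cache line does not stop the next variable from landing
in the remainder of that same line, whereas padding the object out to a full
line does.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed cache line size, not taken from the thread */

/* Hypothetical layout: an aligned lock followed directly by unrelated data. */
struct aligned_only {
	alignas(CACHELINE) int lock;	/* starts on a line boundary */
	long neighbour;			/* lands in the rest of the same line */
};

/* Same layout, but the lock is padded out to a full line. */
struct aligned_and_padded {
	alignas(CACHELINE) int lock;
	char pad[CACHELINE - sizeof(int)];
	long neighbour;			/* pushed onto the next line */
};

/* Compile-time check that the padding really moves the neighbour. */
static_assert(offsetof(struct aligned_and_padded, neighbour) == CACHELINE,
	      "padding should push neighbour to the next line");

/* Nonzero if two byte offsets fall within the same assumed cache line. */
static int same_line(size_t a, size_t b)
{
	return a / CACHELINE == b / CACHELINE;
}

/* Does the neighbour share a line with the lock when only aligned? */
static int aligned_only_shares(void)
{
	return same_line(offsetof(struct aligned_only, lock),
			 offsetof(struct aligned_only, neighbour));
}

/* Does the neighbour share a line with the lock when also padded? */
static int padded_shares(void)
{
	return same_line(offsetof(struct aligned_and_padded, lock),
			 offsetof(struct aligned_and_padded, neighbour));
}
```

In this sketch aligned_only_shares() reports sharing and padded_shares() does
not. Whether the real vfsmount_lock has such a neighbour on a given build
depends on what the linker places after it, which is the question asked above.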