Re: [PATCH 02/17] fs: icache lock s_inodes list

From: Nick Piggin
Date: Sat Oct 16 2010 - 22:04:00 EST


On Sat, Oct 16, 2010 at 08:42:57PM -0400, Christoph Hellwig wrote:
> On Sun, Oct 17, 2010 at 04:09:11AM +1100, Nick Piggin wrote:
> > If you want it to be scalable within a single sb, it needs to be
> > per cpu. If it is per-cpu it does not need to be per-sb as well
> > which just adds bloat.
>
> Right now the patches split up the inode lock and do not add
> per-cpu magic. It's not any more work to move from per-sb lists
> to per-cpu locking if we eventually do it than moving from global
> to per-cpu.

But it's more work to do per-sb lists than a global list, and as
I'm going to per-cpu locking anyway, it's a strange transition
to go from per-sb to per-cpu (rather than per-sb, per-cpu). In short,
the fact that I build up the locking transformations starting with
global locks is just not something that can be held against my
patch set (unless you really disagree with the whole concept of
how the series is structured).

>
> I'm not entirely convinced moving s_inodes to a per-cpu list is a good
> idea. For now per-sb is just fine for disk filesystems as they have
> much more fs-wide cachelines they touch for inode creation/deletion
> anyway, and for sockets/pipes a variant of your patch to not ever
> add them to s_inodes sounds like the better approach.

Traditional filesystems on slow spinning disk are not the main
problem. The problem is very fast SSDs and storage servers. XFS, with
its per-AG lock splitting, can already hit per-sb scalability
bottlenecks on small servers with not-incredibly-fast storage.

And if the VFS is not scalable, then the contention doesn't even
get pushed into the filesystem so the fs developers never even _see_
the locking problems to fix them.

I'm telling you it will increasingly be a problem because core counts
and storage speeds continue to increase, and also people want to manage
more storage with fewer filesystems. It's obvious that it will be a
problem.

I've already got per-cpu locking for the vfsmount and files locks, so
it's not magic.
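
For illustration only, here is a minimal userspace sketch of what
per-cpu list locking amounts to. It is not code from the patch set.
All names (struct shard, shard_add, shard_del, shard_count_all,
NR_SHARDS) are hypothetical, pthread mutexes stand in for kernel
spinlocks, and a fixed shard count stands in for the number of CPUs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_SHARDS 4	/* assumption: stand-in for the number of CPUs */

struct node {
	struct node *prev, *next;
	int shard;		/* which shard's lock protects this node */
};

struct shard {
	pthread_mutex_t lock;
	struct node head;	/* sentinel of a circular doubly linked list */
};

static struct shard shards[NR_SHARDS];

static void shards_init(void)
{
	for (int i = 0; i < NR_SHARDS; i++) {
		pthread_mutex_init(&shards[i].lock, NULL);
		shards[i].head.prev = shards[i].head.next = &shards[i].head;
	}
}

/* Fast path: add on the caller's "CPU"; only that shard's lock is taken. */
static void shard_add(struct node *n, int cpu)
{
	assert(cpu >= 0 && cpu < NR_SHARDS);
	struct shard *s = &shards[cpu];

	pthread_mutex_lock(&s->lock);
	n->shard = cpu;
	n->next = s->head.next;
	n->prev = &s->head;
	s->head.next->prev = n;
	s->head.next = n;
	pthread_mutex_unlock(&s->lock);
}

/* Removal may run anywhere; the node remembers which shard it is on. */
static void shard_del(struct node *n)
{
	struct shard *s = &shards[n->shard];

	pthread_mutex_lock(&s->lock);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
	pthread_mutex_unlock(&s->lock);
}

/* Slow path: walk every shard in turn, holding one lock at a time. */
static size_t shard_count_all(void)
{
	size_t total = 0;

	for (int i = 0; i < NR_SHARDS; i++) {
		pthread_mutex_lock(&shards[i].lock);
		for (struct node *n = shards[i].head.next;
		     n != &shards[i].head; n = n->next)
			total++;
		pthread_mutex_unlock(&shards[i].lock);
	}
	return total;
}
```

The trade-off in this sketch is that add and delete contend only on
one shard's lock, while a full walk has to visit every shard. That is
acceptable when full walks are rare.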


> If we eventually hit the limit for disk filesystems I have some better
> ideas to solve this. One is to abuse whatever data structure we use
> for the inode hash also for iterating over all inodes - we only
> iterate over them in very few places, and none of them is a fast path.

Doing your handwaving about changing data types and better ideas
is just not helpful. _If_ you do have some better ideas, and _if_ we
change the data structure, _then_ it's trivial to change from percpu
locking to your better idea. It just doesn't work as an argument to
slow progress.
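
For reference, the alternative raised in the quoted text (reusing the
inode hash to iterate over all inodes) might look roughly like the
following userspace sketch. It is an illustration under stated
assumptions, not anyone's proposed patch: the names (struct ino,
struct sb, ino_hash_insert, for_each_ino_of_sb, HASH_BUCKETS) are
hypothetical, the hash is a plain modulo, and all locking is left out.

```c
#include <stddef.h>

#define HASH_BUCKETS 8		/* assumption: tiny table for illustration */

struct sb {
	int id;
};

struct ino {
	struct ino *hash_next;	/* singly linked hash chain */
	struct sb *sb;		/* owning superblock */
	unsigned long nr;	/* inode number, used as the hash key */
};

static struct ino *hash_table[HASH_BUCKETS];

/* Insert at the head of the bucket selected by the inode number. */
static void ino_hash_insert(struct ino *i)
{
	size_t b = i->nr % HASH_BUCKETS;

	i->hash_next = hash_table[b];
	hash_table[b] = i;
}

/*
 * Visit every hashed inode that belongs to sb by scanning all buckets.
 * fn may be NULL. Returns the number of inodes visited.
 */
static size_t for_each_ino_of_sb(struct sb *sb, void (*fn)(struct ino *))
{
	size_t visited = 0;

	for (size_t b = 0; b < HASH_BUCKETS; b++) {
		for (struct ino *i = hash_table[b]; i; i = i->hash_next) {
			if (i->sb != sb)
				continue;
			if (fn)
				fn(i);
			visited++;
		}
	}
	return visited;
}
```

In this sketch a walk costs time proportional to the whole table, not
to the number of inodes on one superblock, and it only sees inodes
that were actually hashed.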
