Re: [patch 1/6] fs: icache RCU free inodes

From: Nick Piggin
Date: Sun Nov 14 2010 - 23:21:27 EST


On Mon, Nov 15, 2010 at 12:00:27PM +1100, Dave Chinner wrote:
> On Fri, Nov 12, 2010 at 12:24:21PM +1100, Nick Piggin wrote:
> > On Wed, Nov 10, 2010 at 9:05 AM, Nick Piggin <npiggin@xxxxxxxxx> wrote:
> > > On Tue, Nov 09, 2010 at 09:08:17AM -0800, Linus Torvalds wrote:
> > >> On Tue, Nov 9, 2010 at 8:21 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> > >> >
> > >> > You can see problems using this fancy thing :
> > >> >
> > >> > - Need to use slab ctor() to not overwrite some sensitive fields of
> > >> > reused inodes.
> > >> >  (spinlock, next pointer)
> > >>
> > >> Yes, the downside of using SLAB_DESTROY_BY_RCU is that you really
> > >> cannot initialize some fields in the allocation path, because they may
> > >> end up being still used while allocating a new (well, re-used) entry.
> > >>
> > >> However, I think that in the long run we pretty much _have_ to do that
> > >> anyway, because the "free each inode separately with RCU" is a real
> > >> overhead (Nick reports 10-20% cost). So it just makes my skin crawl to
> > >> go that way.
> > >
> > > This is a creat/unlink loop on a tmpfs filesystem. Any real filesystem
> > > is going to be *much* heavier in creat/unlink (so that 10-20% cost would
> > > look more like a few %), and any real workload is going to have much
> > > less intensive pattern.
> >
> > So to get some more precise numbers, on a new kernel, and on a nehalem
> > class CPU, creat/unlink busy loop on ramfs (worst possible case for inode
> > RCU), then inode RCU costs 12% more time.
> >
> > If we go to ext4 over ramdisk, it's 4.2% slower. Btrfs is 4.3% slower, XFS
> > is about 4.9% slower.
>
> That is actually significant because in the current XFS performance
> using delayed logging for pure metadata operations is not that far
> off ramdisk results. Indeed, the simple test:
>
> while (i++ < 1000 * 1000) {
> int fd = open("foo", O_CREAT|O_RDWR, 777);
> unlink("foo");
> close(fd);
> }
>
> Running 8 instances of the above on XFS, each in their own
> directory, on a single sata drive with delayed logging enabled with
> my current working XFS tree (includes SLAB_DESTROY_BY_RCU inode
> cache and XFS inode cache, and numerous other XFS scalability
> enhancements) currently runs at ~250k files/s. It took ~33s for 8 of
> those loops above to complete in parallel, and was 100% CPU bound...

David,

This is 30K inodes per second per CPU, versus nearly 800K per second
number that I measured the 12% slowdown with. About 25x slower. How you
are trying to FUD this as doing anything but confirming my hypothesis, I
don't know and honestly I don't want to know so don't try to tell me.

That you are still at this campaign of negative and destructive crap
baffles me. All the effort you've put into negativity and obstruction,
you could have gone and got some *actual* numbers. But no, you're
obviously more interested in FUD.

I don't know what is funnier, that I keep responding to you, or that you
keep expecting me to reply when you ignore all _my_ comments about your
patches and ignore all the reasons I have given to want to merge things
my way (eg. my response to SLAB_DESTROY_BY_RCU patch where you ignored
all my feedback, and you ignore this entire thread about how and why I
want to approach rcu-walk in the way I do).

But that's it. I have explained my position, offered reasonable answers
to all questions and objections, shown good numbers, and given
strategies that regressions can be solved with. That's all I need to do.

I acknowledge the very small potential for regressions with inode-RCU
for a very small number of users. I also weigh that against complexity
and reviewability, and against the very large speedups for very many
users that rcu-walk can give. And also offered approaches for ways that
future work can resolve any regressions. You ignored all that.

You show me no respect or cortesy and seem to take me as a big joke. So
at this point I'm not interested in your handwaving or opinions. Is that
clear? Until you 1) treat me the way you expect to be treated, and 2)
actaully have something constructive, do do not cc me. I do not care.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/