Re: [BUG] Lockless patches cause hardlock under heavy IO

From: Nick Piggin
Date: Mon Jun 23 2008 - 20:14:41 EST


On Monday 23 June 2008 23:05, Paul E. McKenney wrote:
> On Mon, Jun 23, 2008 at 09:54:52PM +1000, Nick Piggin wrote:
> > On Monday 23 June 2008 13:51, Ryan Hope wrote:
> > > Well, I get the hardlock on -mm without using reiser4; I am pretty
> > > sure it is swap related.
> >
> > The guys seeing hangs don't use PREEMPT_RCU, do they?
> >
> > In my swapping tests, I found -mm3 to be stable with classic RCU, but
> > on a hunch, I tried PREEMPT_RCU and it crashed a couple of times rather
> > quickly. First crash was in find_get_pages so I suspected lockless
> > pagecache doing something subtly wrong with the RCU API, but I just got
> > another crash in __d_lookup:
>
> Could you please send me a repeat-by? (At least Alexey is no longer
> alone!)

OK, I had DEBUG_PAGEALLOC in the .config, which I think is probably
important for reproducing it. That said, the fact that I'm reproducing
oopses with objects much smaller than PAGE_SIZE, like dentries and radix
tree nodes, indicates that there is even more free-before-grace activity
going undetected. If you construct a test case using full pages, it
might become even easier to detect with DEBUG_PAGEALLOC.

Test machine: a 2-socket, 8-core x86 system.

I mounted two tmpfs filesystems. One contains a single large file,
formatted as ext3 with a 1K block size and mounted via loopback; the
other is used directly. The Linux kernel source is unpacked on each
mount, with a concurrent make -j128 running on each. This pushes the
system pretty hard into swap. Classic RCU survived another 5 hours of
this last night.
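
A rough sketch of the setup above, written as a Python script that only
prints the commands (a dry run, nothing is executed). The base path,
tmpfs size, image size, tarball name, and the "linux" source directory
name are all assumptions chosen for illustration; none of them come from
the actual test. It also assumes a kernel .config has been put in place
in each tree before the builds start, and that the tmpfs sizes are large
enough relative to RAM for the builds to push into swap.

```python
import shlex


def build_plan(base="/mnt/rcutest", tmpfs_size="2g", image_mb=1024,
               tarball="linux.tar.bz2", jobs=128):
    """Return (setup, builds) as lists of argv lists.

    setup  -- commands to run in order, as root.
    builds -- the two make invocations, meant to run concurrently.
    Every default value here is a placeholder, not a value from the report.
    """
    direct = f"{base}/direct"      # tmpfs used directly
    backing = f"{base}/backing"    # tmpfs holding the ext3 image file
    loop = f"{base}/loop"          # mount point for the loopback ext3
    image = f"{backing}/ext3.img"

    setup = [
        ["mkdir", "-p", direct, backing, loop],
        ["mount", "-t", "tmpfs", "-o", f"size={tmpfs_size}", "tmpfs", direct],
        ["mount", "-t", "tmpfs", "-o", f"size={tmpfs_size}", "tmpfs", backing],
        # One large file on tmpfs, formatted as ext3 with 1K blocks.
        ["dd", "if=/dev/zero", f"of={image}", "bs=1M", f"count={image_mb}"],
        ["mkfs.ext3", "-F", "-b", "1024", image],
        ["mount", "-t", "ext3", "-o", "loop", image, loop],
        # Unpack the kernel source on both mounts.
        ["tar", "-xf", tarball, "-C", direct],
        ["tar", "-xf", tarball, "-C", loop],
    ]
    # Assumes the tarball unpacks into a directory named "linux".
    builds = [["make", "-C", f"{d}/linux", f"-j{jobs}"] for d in (direct, loop)]
    return setup, builds


if __name__ == "__main__":
    setup, builds = build_plan()
    for cmd in setup:
        print(shlex.join(cmd))
    # The two builds run at the same time, so print them as background jobs.
    for cmd in builds:
        print(shlex.join(cmd) + " &")
    print("wait")
```

The printed lines can be reviewed and adjusted by hand before being run
as root on a scratch machine.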

But that's a fairly convoluted test for an RCU problem. I expect it
should be easier to trigger with something more targeted...