Re: [BUG] Lockless patches cause hardlock under heavy IO

From: Peter Zijlstra
Date: Sun Jun 22 2008 - 11:07:53 EST


On Sun, 2008-06-22 at 10:37 -0400, Ryan Hope wrote:
> Well, I couldn't stop playing with this... I am pretty sure the cause
> of the hardlocks is in the second half of the patches (the speculative
> page ref patches). I reverted all of those patches so that just the
> GUP patches were included, and there were no more hardlocks... Then I
> applied the concurrent page cache patches from the -rt branch, including
> one OLD speculative page ref patch, and this caused hardlocks for people
> again. However, enabling heap randomization fixed the hardlocks for one
> of the users, and disabling swap fixed the issue for the other user. I
> hope this helps.

What are people doing to make it hang?

> On Thu, Jun 19, 2008 at 4:19 AM, Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
> > On Thursday 19 June 2008 18:12, Peter Zijlstra wrote:
> >> On Wed, 2008-06-18 at 17:15 -0400, Ryan Hope wrote:
> >> > I applied the following patches from 2.6.26-rc5-mm3 to 2.6.26-rc6 and
> >> > they caused a hardlock under heavy IO:
> >>
> >> What kind of machine, how much memory, how many spindles, what
> >> filesystem and what is heavy load?
> >>
> >> Furthermore, try the NMI watchdog with serial/net-console to capture its
> >> output.
> >
> >
> > Good suggestions. A trace would be really helpful.
> >
> > As Arjan suggested, debug options, especially CONFIG_DEBUG_VM, would be
> > a good idea to turn on if you haven't already.
> >
> > BTW, what was the reason for applying those patches? Did you hit the
> > problem with -mm also, and hope to narrow it down?
> >
> >
> >> > x86-implement-pte_special.patch
> >> > mm-introduce-get_user_pages_fast.patch
> >> > mm-introduce-get_user_pages_fast-fix.patch
> >> > mm-introduce-get_user_pages_fast-checkpatch-fixes.patch
> >> > x86-lockless-get_user_pages_fast.patch
> >> > x86-lockless-get_user_pages_fast-checkpatch-fixes.patch
> >> > x86-lockless-get_user_pages_fast-fix.patch
> >> > x86-lockless-get_user_pages_fast-fix-2.patch
> >> > x86-lockless-get_user_pages_fast-fix-2-fix-fix.patch
> >> > x86-lockless-get_user_pages_fast-fix-warning.patch
> >> > dio-use-get_user_pages_fast.patch
> >> > splice-use-get_user_pages_fast.patch
> >> > x86-support-1gb-hugepages-with-get_user_pages_lockless.patch
> >> > #
> >> > mm-readahead-scan-lockless.patch
> >> > radix-tree-add-gang_lookup_slot-gang_lookup_slot_tag.patch
> >> > #mm-speculative-page-references.patch: clameter saw bustage
> >> > mm-speculative-page-references.patch
> >> > mm-speculative-page-references-fix.patch
> >> > mm-speculative-page-references-fix-fix.patch
> >> > mm-speculative-page-references-hugh-fix3.patch
> >> > mm-lockless-pagecache.patch
> >> > mm-spinlock-tree_lock.patch
> >> > powerpc-implement-pte_special.patch
> >> >
> >> > I am on an x86_64. I don't know what other info you need...
> >
> > Can you isolate it to one of the two groups of patches? I suspect it
> > might be the latter, so you might try that first -- this version of
> > speculative page references is very nice in theory, but the slowpaths
> > are a little more complex to implement, so there could be an error there.
> >
