Re: [BUG] Lockless patches cause hardlock under heavy IO

From: Nick Piggin
Date: Thu Jun 19 2008 - 04:20:18 EST


On Thursday 19 June 2008 18:12, Peter Zijlstra wrote:
> On Wed, 2008-06-18 at 17:15 -0400, Ryan Hope wrote:
> > I applied the following patches from 2.6.26-rc5-mm3 to 2.6.26-rc6 and
> > they caused a hardlock under heavy IO:
>
> What kind of machine, how much memory, how many spindles, what
> filesystem and what is heavy load?
>
> Furthermore, try the NMI watchdog with serial/net-console to capture its
> output.


Good suggestions. A trace would be really helpful.

As Arjan suggested, it would be a good idea to turn on debug options,
especially CONFIG_DEBUG_VM, if you haven't already.

BTW, what was the reason for applying those patches? Did you also hit
the problem with -mm, and were you hoping to narrow it down?


> > x86-implement-pte_special.patch
> > mm-introduce-get_user_pages_fast.patch
> > mm-introduce-get_user_pages_fast-fix.patch
> > mm-introduce-get_user_pages_fast-checkpatch-fixes.patch
> > x86-lockless-get_user_pages_fast.patch
> > x86-lockless-get_user_pages_fast-checkpatch-fixes.patch
> > x86-lockless-get_user_pages_fast-fix.patch
> > x86-lockless-get_user_pages_fast-fix-2.patch
> > x86-lockless-get_user_pages_fast-fix-2-fix-fix.patch
> > x86-lockless-get_user_pages_fast-fix-warning.patch
> > dio-use-get_user_pages_fast.patch
> > splice-use-get_user_pages_fast.patch
> > x86-support-1gb-hugepages-with-get_user_pages_lockless.patch
> > #
> > mm-readahead-scan-lockless.patch
> > radix-tree-add-gang_lookup_slot-gang_lookup_slot_tag.patch
> > #mm-speculative-page-references.patch: clameter saw bustage
> > mm-speculative-page-references.patch
> > mm-speculative-page-references-fix.patch
> > mm-speculative-page-references-fix-fix.patch
> > mm-speculative-page-references-hugh-fix3.patch
> > mm-lockless-pagecache.patch
> > mm-spinlock-tree_lock.patch
> > powerpc-implement-pte_special.patch
> >
> > I am on an x86_64. I don't know what other info you need...

Can you isolate it to one of the two groups of patches? I suspect it
might be the latter (the group starting with mm-readahead-scan-lockless.patch),
so you might try that first: this version of speculative page references
is very nice in theory, but its slowpaths are a little more complex to
implement, so there could be an error there.
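For readers unfamiliar with the technique, here is a minimal userspace sketch of the reader side of a speculative page reference, loosely modeled on the idea behind the kernel's page_cache_get_speculative(): take a reference only if the count is still non-zero, then recheck that the cache slot still points at the same page, and retry otherwise. This is an illustration under stated assumptions, not the code from the patches: struct page, the single slot standing in for a radix-tree entry, and find_get_page_lockless() are hypothetical simplifications, while get_page_unless_zero() and put_page() borrow kernel names but are reimplemented here with C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct page (not the kernel's). */
struct page {
    atomic_int count; /* reference count; 0 means the page is being freed */
};

/* Take a reference only if the count is currently non-zero. */
static bool get_page_unless_zero(struct page *page)
{
    int old = atomic_load(&page->count);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&page->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; dropping one we never held is a bug. */
static void put_page(struct page *page)
{
    int old = atomic_fetch_sub(&page->count, 1);

    assert(old > 0);
}

/*
 * Hypothetical lockless lookup of one cache slot (standing in for a
 * radix-tree entry): read the slot without a lock, speculatively take a
 * reference, then recheck that the slot still holds the same page.
 */
static struct page *find_get_page_lockless(_Atomic(struct page *) *slot)
{
    for (;;) {
        struct page *page = atomic_load(slot);

        if (page == NULL)
            return NULL;          /* nothing cached in this slot */
        if (!get_page_unless_zero(page))
            continue;             /* page is being freed: reload the slot */
        if (page != atomic_load(slot)) {
            put_page(page);       /* page was removed or replaced: retry */
            continue;
        }
        return page;              /* caller now holds a reference */
    }
}
```

The sketch covers only the reader fast path. The writer-side paths that remove or replace pages, which the message above suggests are the harder part to get right, must cooperate with this recheck so that a reader can never keep a reference to a page that has left the cache; they are not shown here.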