Re: SLUB BUG: check_slab called with interrupts enabled

From: Rik van Riel
Date: Wed Jun 15 2011 - 11:17:02 EST


On 06/15/2011 11:03 AM, Christoph Lameter wrote:
> On Wed, 15 Jun 2011, Rik van Riel wrote:
>
> > Hi Christoph,
> >
> > last night I got an interesting backtrace running 3.0-rc3
> > (Fedora Rawhide kernel package). Unfortunately netconsole
> > seems to be incompatible with KVM at the moment, so I had
> > to capture the oops on my digital camera and will be
> > transcribing just the backtrace.
> >
> > Essentially, kernel 3.0-rc3 hit this bug:
> >
> > static int check_slab(struct kmem_cache *s, struct page *page)
> > {
> > 	int maxobj;
> >
> > 	VM_BUG_ON(!irqs_disabled());
> >
> > The call trace:
> >
> > check_slab
> > alloc_debug_processing
> > __slab_alloc
> > kmem_cache_alloc
> > bvec_alloc_bs
> > bio_alloc_bioset
> > bio_alloc
> > mpage_alloc
> > do_mpage_readpage
> > ... followed by ext4 and VFS code, obviously innocent
>
> __slab_alloc() disables interrupts so alloc_debug_processing() should not
> run into this issue.
>
> There are no additional special slub patches applied right? Because some
> of the patches under discussion change the interrupt disable handling a
> bit.

Just the two attached ones, which don't seem to touch the
code path in question...
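
For reference, the invariant behind the report can be modeled in userspace. The sketch below only illustrates the expected ordering (disable interrupts first, then run the debug check). Every `model_*` name is a hypothetical stand-in rather than a kernel interface, and `assert()` plays the role of `VM_BUG_ON()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt flag; not a kernel API. */
static bool model_irqs_enabled = true;

/* Save the current state and disable "interrupts" (cf. local_irq_save()). */
static bool model_irq_save(void)
{
	bool was_enabled = model_irqs_enabled;

	model_irqs_enabled = false;
	return was_enabled;
}

/* Put the saved state back (cf. local_irq_restore()). */
static void model_irq_restore(bool was_enabled)
{
	model_irqs_enabled = was_enabled;
}

/* True when the precondition tested at the top of check_slab() holds. */
static bool model_check_slab(void)
{
	return !model_irqs_enabled;
}

/* Slow path as described in the thread: irqs off, then the debug check. */
static void model_slab_alloc(void)
{
	bool saved = model_irq_save();

	/* Stand-in for VM_BUG_ON(!irqs_disabled()). */
	assert(model_check_slab());

	model_irq_restore(saved);
}
```

Calling `model_slab_alloc()` passes the check, while calling `model_check_slab()` directly with interrupts enabled returns false, which corresponds to the condition the backtrace reports.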

--
All rights reversed
From linux-fsdevel-owner@xxxxxxxxxxxxxxx Fri May 13 10:04:18 2011
From: Mel Gorman <mgorman@xxxxxxx>
To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>,
Colin King <colin.king@xxxxxxxxxxxxx>,
Raghavendra D Prabhu <raghu.prabhu13@xxxxxxxxx>,
Jan Kara <jack@xxxxxxx>, Chris Mason <chris.mason@xxxxxxxxxx>,
Christoph Lameter <cl@xxxxxxxxx>,
Pekka Enberg <penberg@xxxxxxxxxx>,
Rik van Riel <riel@xxxxxxxxxx>,
Johannes Weiner <hannes@xxxxxxxxxxx>,
linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>,
linux-mm <linux-mm@xxxxxxxxx>,
linux-kernel <linux-kernel@xxxxxxxxxxxxxxx>,
linux-ext4 <linux-ext4@xxxxxxxxxxxxxxx>,
Mel Gorman <mgorman@xxxxxxx>
Subject: [PATCH 3/4] mm: slub: Do not take expensive steps for SLUBs speculative high-order allocations
Date: Fri, 13 May 2011 15:03:23 +0100
Message-Id: <1305295404-12129-4-git-send-email-mgorman@xxxxxxx>
X-Mailing-List: linux-fsdevel@xxxxxxxxxxxxxxx

To avoid locking and per-cpu overhead, SLUB optimistically uses
high-order allocations and falls back to lower allocations if they
fail. However, by simply trying to allocate, the caller can enter
compaction or reclaim - both of which are likely to cost more than the
benefit of using high-order pages in SLUB. On a desktop system, two
users report that the system is getting stalled with kswapd using large
amounts of CPU.

This patch prevents SLUB taking any expensive steps when trying to use
high-order allocations. Instead, it is expected to fall back to smaller
orders more aggressively. Testing was somewhat inconclusive on how much
this helped but it makes sense that falling back to order-0 allocations
is faster than entering compaction or direct reclaim.

Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
---
mm/page_alloc.c | 3 ++-
mm/slub.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9f8a97b..057f1e2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1972,6 +1972,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 {
 	int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 	const gfp_t wait = gfp_mask & __GFP_WAIT;
+	const gfp_t can_wake_kswapd = !(gfp_mask & __GFP_NO_KSWAPD);
 
 	/* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */
 	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
@@ -1984,7 +1985,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	 */
 	alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH);
 
-	if (!wait) {
+	if (!wait && can_wake_kswapd) {
 		/*
 		 * Not worth trying to allocate harder for
 		 * __GFP_NOMEMALLOC even if it can't schedule.
diff --git a/mm/slub.c b/mm/slub.c
index 98c358d..c5797ab 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1170,7 +1170,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	 * Let the initial higher-order allocation fail under memory pressure
 	 * so we fall-back to the minimum order allocation.
 	 */
-	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL;
+	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NO_KSWAPD) &
+			~(__GFP_NOFAIL | __GFP_WAIT | __GFP_REPEAT);
 
 	page = alloc_slab_page(alloc_gfp, node, oo);
 	if (unlikely(!page)) {
--
1.7.3.4
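
The flag arithmetic in the two hunks above can be checked in isolation. This is a rough userspace sketch, assuming made-up bit values: the `M_GFP_*` constants, `model_gfp_t`, and both function names are hypothetical, and the real definitions live in the kernel's gfp header. It shows that the speculative mask can no longer wait, and that such a mask skips the `!wait` branch in `gfp_to_alloc_flags()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's actual __GFP_* values. */
typedef unsigned int model_gfp_t;

#define M_GFP_WAIT      0x01u
#define M_GFP_HIGH      0x02u
#define M_GFP_REPEAT    0x04u
#define M_GFP_NOFAIL    0x08u
#define M_GFP_NOWARN    0x10u
#define M_GFP_NO_KSWAPD 0x20u

/* Mask for the speculative high-order attempt, mirroring the mm/slub.c hunk. */
static model_gfp_t speculative_gfp(model_gfp_t flags)
{
	model_gfp_t gfp = (flags | M_GFP_NOWARN | M_GFP_NO_KSWAPD) &
			  ~(M_GFP_NOFAIL | M_GFP_WAIT | M_GFP_REPEAT);

	/* The result must never be allowed to sleep or retry. */
	assert(!(gfp & (M_GFP_WAIT | M_GFP_REPEAT | M_GFP_NOFAIL)));
	return gfp;
}

/* Condition from the mm/page_alloc.c hunk: "!wait && can_wake_kswapd". */
static bool takes_nowait_branch(model_gfp_t gfp_mask)
{
	bool wait = (gfp_mask & M_GFP_WAIT) != 0;
	bool can_wake_kswapd = !(gfp_mask & M_GFP_NO_KSWAPD);

	return !wait && can_wake_kswapd;
}
```

With these definitions, `speculative_gfp(M_GFP_WAIT | M_GFP_NOFAIL | M_GFP_HIGH)` keeps only the high-priority bit plus the two added flags, and `takes_nowait_branch()` returns false for the result because `M_GFP_NO_KSWAPD` is set.
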
From linux-fsdevel-owner@xxxxxxxxxxxxxxx Fri May 13 10:04:00 2011
From: Mel Gorman <mgorman@xxxxxxx>
To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>,
Colin King <colin.king@xxxxxxxxxxxxx>,
Raghavendra D Prabhu <raghu.prabhu13@xxxxxxxxx>,
Jan Kara <jack@xxxxxxx>, Chris Mason <chris.mason@xxxxxxxxxx>,
Christoph Lameter <cl@xxxxxxxxx>,
Pekka Enberg <penberg@xxxxxxxxxx>,
Rik van Riel <riel@xxxxxxxxxx>,
Johannes Weiner <hannes@xxxxxxxxxxx>,
linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>,
linux-mm <linux-mm@xxxxxxxxx>,
linux-kernel <linux-kernel@xxxxxxxxxxxxxxx>,
linux-ext4 <linux-ext4@xxxxxxxxxxxxxxx>,
Mel Gorman <mgorman@xxxxxxx>
Subject: [PATCH 2/4] mm: slub: Do not wake kswapd for SLUBs speculative high-order allocations
Date: Fri, 13 May 2011 15:03:22 +0100
Message-Id: <1305295404-12129-3-git-send-email-mgorman@xxxxxxx>
X-Mailing-List: linux-fsdevel@xxxxxxxxxxxxxxx

To avoid locking and per-cpu overhead, SLUB optimistically uses
high-order allocations and falls back to lower allocations if they
fail. However, by simply trying to allocate, kswapd is woken up to
start reclaiming at that order. On a desktop system, two users report
that the system is getting locked up with kswapd using large amounts
of CPU. Using SLAB instead of SLUB made this problem go away.

This patch prevents kswapd being woken up for high-order allocations.
Testing indicated that with this patch applied, the system was much
harder to hang and even when it did, it eventually recovered.

Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
---
mm/slub.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 9d2e5e4..98c358d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1170,7 +1170,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	 * Let the initial higher-order allocation fail under memory pressure
 	 * so we fall-back to the minimum order allocation.
 	 */
-	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
+	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL;
 
 	page = alloc_slab_page(alloc_gfp, node, oo);
 	if (unlikely(!page)) {
--
1.7.3.4
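
The "try a high order cheaply, then fall back" pattern that both patch descriptions rely on can be sketched outside the kernel as follows. This is a minimal illustration under stated assumptions, not the code in `allocate_slab()`: `model_alloc_fn`, `struct model_slab`, `model_allocate_slab()`, and the demo allocator `model_fragmented_alloc()` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator hook; returns NULL on failure. Not a kernel API. */
typedef void *(*model_alloc_fn)(unsigned int order, int cheap_only);

struct model_slab {
	void *page;		/* backing memory, or NULL if both attempts failed */
	unsigned int order;	/* order that was actually used */
};

/* Demo allocator simulating fragmentation: only order-0 requests succeed. */
static void *model_fragmented_alloc(unsigned int order, int cheap_only)
{
	static char model_page[4096];

	(void)cheap_only;	/* a real allocator would skip reclaim here */
	return order == 0 ? model_page : NULL;
}

static struct model_slab model_allocate_slab(model_alloc_fn alloc,
					     unsigned int preferred_order,
					     unsigned int min_order)
{
	struct model_slab slab = { NULL, preferred_order };

	assert(min_order <= preferred_order);

	/* Speculative attempt: expected to fail fast instead of reclaiming. */
	slab.page = alloc(preferred_order, 1);
	if (!slab.page) {
		/* Fall back to the minimum order, this time with full effort. */
		slab.order = min_order;
		slab.page = alloc(min_order, 0);
	}
	return slab;
}
```

Calling `model_allocate_slab(model_fragmented_alloc, 3, 0)` fails the order-3 attempt and returns an order-0 slab, which is the behaviour the patches aim to make cheap.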