[PATCH] kmemleak: Fix scheduling-while-atomic bug

From: Ingo Molnar
Date: Wed Jul 01 2009 - 03:54:02 EST



* Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx> wrote:

> Gitweb: http://git.kernel.org/linus/acf4968ec9dea49387ca8b3d36dfaa0850bdb2d5
> Commit: acf4968ec9dea49387ca8b3d36dfaa0850bdb2d5
> Parent: 4698c1f2bbe44ce852ef1a6716973c1f5401a4c4
> Author: Catalin Marinas <catalin.marinas@xxxxxxx>
> AuthorDate: Fri Jun 26 17:38:29 2009 +0100
> Committer: Catalin Marinas <catalin.marinas@xxxxxxx>
> CommitDate: Fri Jun 26 17:38:29 2009 +0100
>
> kmemleak: Slightly change the policy on newly allocated objects

I think one of the kmemleak fixes that went upstream yesterday
caused the following scheduling-while-holding-the-tasklist-lock
regression/crash on x86:

BUG: sleeping function called from invalid context at mm/kmemleak.c:795
in_atomic(): 1, irqs_disabled(): 0, pid: 1737, name: kmemleak
2 locks held by kmemleak/1737:
#0: (scan_mutex){......}, at: [<c10c4376>] kmemleak_scan_thread+0x45/0x86
#1: (tasklist_lock){......}, at: [<c10c3bb4>] kmemleak_scan+0x1a9/0x39c
Pid: 1737, comm: kmemleak Not tainted 2.6.31-rc1-tip #59266
Call Trace:
[<c105ac0f>] ? __debug_show_held_locks+0x1e/0x20
[<c102e490>] __might_sleep+0x10a/0x111
[<c10c38d5>] scan_yield+0x17/0x3b
[<c10c3970>] scan_block+0x39/0xd4
[<c10c3bc6>] kmemleak_scan+0x1bb/0x39c
[<c10c4331>] ? kmemleak_scan_thread+0x0/0x86
[<c10c437b>] kmemleak_scan_thread+0x4a/0x86
[<c104d73e>] kthread+0x6e/0x73
[<c104d6d0>] ? kthread+0x0/0x73
[<c100959f>] kernel_thread_helper+0x7/0x10
kmemleak: 834 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

The bit causing it is highly dubious:

static void scan_yield(void)
{
	might_sleep();

	if (time_is_before_eq_jiffies(next_scan_yield)) {
		schedule();
		next_scan_yield = jiffies + jiffies_scan_yield;
	}
}

It is called deep inside the codepath and in a conditional way, and
that is what crapped up when one of the new scan_block() uses grew a
tasklist_lock dependency. Also, we don't need another 'yield'
primitive in the MM code; we have priorities and other scheduling
mechanisms to throttle background scanning just fine.
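
To make the constraint concrete, here is a minimal sketch (not the
kmemleak code itself; scan_one_stack() is a hypothetical stand-in for
the scan_block() call on each task stack) of why the tasklist_lock
section must never reach schedule():

#include <linux/sched.h>
#include <linux/spinlock.h>

/* hypothetical stand-in for kmemleak's scan_block(); must not sleep */
static void scan_one_stack(struct task_struct *task)
{
	/* ... look for pointers in the task's stack page ... */
}

static void scan_task_stacks_sketch(void)
{
	struct task_struct *task;

	/*
	 * tasklist_lock is a rwlock, so everything between read_lock()
	 * and read_unlock() runs in atomic context; any callee that can
	 * end up in schedule(), like scan_yield() above, triggers the
	 * might_sleep() splat quoted in the report.
	 */
	read_lock(&tasklist_lock);
	for_each_process(task)
		scan_one_stack(task);
	read_unlock(&tasklist_lock);
}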

The minimal fix below removes scan_yield() and adds a cond_resched()
to the outermost (safe) place of the scanning thread. This solves the
regression.
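
In sketch form, the shape of that fix is roughly the following (an
illustration only, not the patch below; scan_thread_sketch() and
SCAN_WAIT_JIFFIES are placeholder names, and kmemleak_scan() stands
for the scan pass in mm/kmemleak.c seen in the backtrace):

#include <linux/kthread.h>
#include <linux/sched.h>

#define SCAN_WAIT_JIFFIES (600 * HZ)	/* placeholder scan period */

static int scan_thread_sketch(void *arg)
{
	while (!kthread_should_stop()) {
		/* the full scan; no sleeping primitives inside it */
		kmemleak_scan();

		/* outermost level, no locks held: safe to give up the CPU */
		cond_resched();

		/* sleep until the next scan period */
		schedule_timeout_interruptible(SCAN_WAIT_JIFFIES);
	}
	return 0;
}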

Ingo

----------------->