Re: Linux 3.18 released

From: Andy Lutomirski
Date: Wed Dec 10 2014 - 19:39:08 EST


On 12/08/2014 10:39 AM, Vince Weaver wrote:
> On Sun, 7 Dec 2014, Linus Torvalds wrote:
>
>> I'd love to say that we've figured out the problem that plagues 3.17
>> for a couple of people, but we haven't. At the same time, there's
>> absolutely no point in having everybody else twiddling their thumbs
>> when a couple of people are actively trying to bisect an older issue,
>> so holding up the release just didn't make sense. Especially since
>> that would just have then held things up entirely over the holiday
>> break.
>>
>> So the merge window for 3.19 is open, and DaveJ will hopefully get his
>> bisection done (or at least narrow things down sufficiently that we
>> have that "Ahaa" moment) over the next week. But in solidarity with
>> Dave (and to make my life easier too ;) let's try to avoid introducing
>> any _new_ nasty issues, ok?
>
> It's probably unrelated to DaveJ's issue, but my perf_event fuzzer still
> quickly locks the kernel pretty solid on 3.18.
>
> Just 5 minutes of testing managed to trip over the following issue that
> dates back to at least 3.15-rc7

Out of curiosity, can you see if this:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/paranoid-and-more&id=38e49874d0ab18276f753f5784420b091f4be6eb

makes the problem much worse? (Don't take the whole series there --
just cherry-pick the one patch.)

--Andy

>
> My notes say that the last time I tracked this down, I got this far:
>
> What happens is that in find_get_context() (kernel/events/core.c),
> perf_lock_task_context() somehow returns NULL
> due to !atomic_inc_not_zero(&ctx->refcount)
> but task->perf_event_ctxp[ctxn] still has a valid value.
>
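
A rough userspace sketch of the failure mode described above, for anyone
following along. This is not the kernel code: struct ctx, inc_not_zero(),
lock_ctx() and find_ctx() are made-up names that only loosely model
perf_event_context, atomic_inc_not_zero(), perf_lock_task_context() and
find_get_context(), and max_tries exists only so that the demo
terminates. It assumes the caller retries whenever the lookup fails but
the per-task slot is still occupied, which would fit the repeating
alloc_perf_context / put_ctx traces further down.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a perf context; only the refcount matters. */
struct ctx {
	atomic_int refcount;
};

/* Model of atomic_inc_not_zero(): take a reference unless the count is 0. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Model of the lookup: NULL if the slot is empty or the context is dying. */
static struct ctx *lock_ctx(struct ctx **slot)
{
	struct ctx *c = *slot;

	if (c && !inc_not_zero(&c->refcount))
		return NULL;
	return c;
}

/*
 * Model of the caller's retry loop. Returns the number of attempts used,
 * or -1 if max_tries attempts were not enough (the "wedged" case).
 * Simplified: a freshly installed context starts with one reference.
 */
static int find_ctx(struct ctx **slot, struct ctx *fresh, int max_tries,
		    struct ctx **out)
{
	assert(max_tries > 0);

	for (int tries = 1; tries <= max_tries; tries++) {
		struct ctx *c = lock_ctx(slot);

		if (c) {
			*out = c;
			return tries;
		}
		if (*slot == NULL) {
			/* Nothing installed yet: install the fresh context. */
			atomic_store(&fresh->refcount, 1);
			*slot = fresh;
			*out = fresh;
			return tries;
		}
		/*
		 * Slot still points at a context whose refcount is 0.
		 * Neither branch can make progress, so we go around again.
		 */
	}
	*out = NULL;
	return -1;
}
```

If the slot keeps pointing at a context whose refcount already dropped
to zero, every pass through the loop fails the same way, so without the
artificial max_tries bound the caller would spin forever.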
> There are multiple perf related issues like this that are hard to track
> down. They are borderline heisenbugs that are possibly race conditions,
> so bisecting doesn't work and even things like enabling ftrace will make
> the issue go away (or crash ftrace itself).
>
> This particular manifestation of the bug (or bugs) wedges things but I can
> use alt-sysrq from the serial console to see where it is stuck (see
> below; the CPU is stuck in a loop).
>
>
> [ 2225.916004] [<ffffffff810e61e9>] ? get_page_from_freelist+0x55/0x781
> [ 2225.916004] [<ffffffff810e6a7c>] __alloc_pages_nodemask+0x167/0x6dc
> [ 2225.916004] [<ffffffff8101a4a3>] ? intel_pmu_enable_all+0x28/0xa4
> [ 2225.916004] [<ffffffff8111f0b3>] kmem_getpages+0x58/0xec
> [ 2225.916004] [<ffffffff81120278>] cache_grow+0xad/0x1d8
> [ 2225.916004] [<ffffffff81120021>] ____cache_alloc+0x237/0x2ce
> [ 2225.916004] [<ffffffff811216b9>] __kmalloc+0x8f/0xf2
> [ 2225.916004] [<ffffffff810dc35d>] ? T.1336+0xe/0x10
> [ 2225.916004] [<ffffffff810dc35d>] T.1336+0xe/0x10
> [ 2225.916004] [<ffffffff810dc8ca>] alloc_perf_context+0x20/0x51
> [ 2225.916004] [<ffffffff810dca33>] find_get_context+0x138/0x1c7
> [ 2225.916004] [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
> [ 2225.916004] [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
> [ 2225.916004] [<ffffffff81560016>] system_call_fastpath+0x16/0x1b
>
> [ 2256.708004] [<ffffffff810d7e36>] ? put_ctx+0x40/0x61
> [ 2256.708004] [<ffffffff810dcaa4>] find_get_context+0x1a9/0x1c7
> [ 2256.708004] [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
> [ 2256.708004] [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
> [ 2256.708004] [<ffffffff81560016>] system_call_fastpath+0x16/0x1b
>
> [ 2303.796003] [<ffffffff810fa6cb>] ? kmalloc_slab+0x7f/0x8d
> [ 2303.796003] [<ffffffff81121653>] __kmalloc+0x29/0xf2
> [ 2303.796003] [<ffffffff810dc35d>] ? T.1336+0xe/0x10
> [ 2303.796003] [<ffffffff810dc35d>] T.1336+0xe/0x10
> [ 2303.796003] [<ffffffff810dc8ca>] alloc_perf_context+0x20/0x51
> [ 2303.796003] [<ffffffff810dca33>] find_get_context+0x138/0x1c7
> [ 2303.796003] [<ffffffff810dd029>] SYSC_perf_event_open+0x48b/0x870
> [ 2303.796003] [<ffffffff810dd41c>] SyS_perf_event_open+0xe/0x10
> [ 2303.796003] [<ffffffff81560016>] system_call_fastpath+0x16/0x1b
>
> Vince
>
