Re: Kernel OOPS in function_graph_tracer due to the 44259b1. More oopses in tracing...

From: Andrew Lutomirski
Date: Mon May 30 2011 - 21:24:18 EST


On Mon, May 30, 2011 at 7:46 PM, Witold Baryluk
<baryluk@xxxxxxxxxxxxxxxx> wrote:
> On 05-30 16:14, Andrew Lutomirski wrote:
>> On Mon, May 30, 2011 at 12:10 PM, Witold Baryluk
>> <baryluk@xxxxxxxxxxxxxxxx> wrote:
>> > Hi,
>> >
> >> > Yesterday I found a problem when booting a 32-bit system on a Pentium-M.
>> >
> >> > I got approximately this:
>> >
>> > [    2.459170] Testing tracer function_graph:
>> > [    2.466979] BUG: unable to handle kernel paging request at e421cc10
>>
>> >
> >> > Reverting commit 44259b1abfaa8bb819d25d41d71e8e33e25dd36a on top of the current
> >> > kernel makes the bug disappear.
>> >
> >> > Disabling CONFIG_FUNCTION_GRAPH_TRACER also makes the bug disappear.
>> >
>>
>> Of course, the most trivial of my patches was the one with the most
>> significant bug.  Can you try this fix:
>>
>> http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commitdiff;h=89e1be50c68eb5e58b873dce87bbac627ee18d1f
>>
>> --Andy
>
> Well, to add more.
>
> It fixed most of the crashes, and definitely the one with the function graph tracer.
>
> However, in 1 out of 10 boots I still got some kind of crash, oops, or panic.
>


>
> ....
> [    0.035682] CPU: Intel Pentium III (Katmai) stepping 03
> [    0.038048] ftrace: allocating 6263 entries in 13 pages
> [    0.050386] BUG: unable to handle kernel paging request at 8a51553a
> [    0.051031] IP: [<c10587cd>] tick_handle_periodic+0x1d/0x90
> [    0.051705] *pdpt = 0000000000000000 *pde = f000ff53f000ff53
> ...
> CRASH

This is oops1.txt. The faulting code is:

000003a0 <tick_handle_periodic>:
 3a0:   55                      push   %ebp
 3a1:   89 e5                   mov    %esp,%ebp
 3a3:   57                      push   %edi
 3a4:   56                      push   %esi
 3a5:   53                      push   %ebx
 3a6:   83 ec 0c                sub    $0xc,%esp
 3a9:   e8 fc ff ff ff          call   3aa <tick_handle_periodic+0xa>
 3ae:   89 c7                   mov    %eax,%edi
 3b0:   e8 fc ff ff ff          call   3b1 <tick_handle_periodic+0x11>
 3b5:   89 45 f0                mov    %eax,-0x10(%ebp)
 3b8:   e8 63 ff ff ff          call   320 <tick_periodic>
 3bd:   83 7f 28 03             cmpl   $0x3,0x28(%edi)

^^^ fault was in the dereference of edi + 0x28.

 3c1:   74 0d                   je     3d0 <tick_handle_periodic+0x30>
 3c3:   83 c4 0c                add    $0xc,%esp
 3c6:   5b                      pop    %ebx
 3c7:   5e                      pop    %esi
 3c8:   5f                      pop    %edi
 3c9:   5d                      pop    %ebp
 3ca:   c3                      ret

The stack trace is garbage, though.

The offending C code is probably this:

if (dev->mode != CLOCK_EVT_MODE_ONESHOT)
        return;

I would guess that this isn't related to the vdso changes, and I'm
mostly out of ideas.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/