Re: Imon module Oops and kernel hang

From: Jarod Wilson
Date: Mon Jul 18 2011 - 11:04:57 EST


On Jul 17, 2011, at 9:36 AM, Andy Walls wrote:

> On Sun, 2011-07-17 at 11:38 +1000, Chris W wrote:
>> On 17/07/11 10:43, Andy Walls wrote:
>>> This is an obviously repeatable NULL pointer dereference in
>>> rc_g_keycode_from_table(). The faulting line of code in both cases
>>> disasembles to:
>>>
>>> 1e: 8b 80 dc 00 00 00 mov 0xdc(%eax),%eax
>>>
>>> %eax obviously holds the value 0 (NULL). But I'm having a hard time
>>> telling to where exactly that line of assembly corresponds in
>>> rc_g_keycode_from_table(). And I can't tell from the source which data
>>> structure has something at offset 0xdc that gets derefernced early:
>>> struct rc_dev or struct rc_map.
>>>
>>> Could you provide the output of
>>>
>>> $ locate rc-core.ko
>>> $ objdump -d -l /blah/blah/drivers/media/rc/rc-core.ko
>>>
>>> for the rc_g_keycode_from_table() function?
>>
>>
>> I have a few copies lying about now.
>
>> This is from my current running kernel
>> /lib/modules/2.6.38-gentoo-r6/kernel/drivers/media/rc/rc-core.ko
>>
>> and the partial objdump and corresponding oops/crash output:
>
> Thanks.
>
>
>> 00000450 <rc_g_keycode_from_table>:
>> rc_g_keycode_from_table():
>> 450: 55 push %ebp
>> 451: 89 e5 mov %esp,%ebp
>> 453: 57 push %edi
>> 454: 56 push %esi
>> 455: 53 push %ebx
>> 456: 83 ec 24 sub $0x24,%esp
>
> Store the (struct rc_dev *) "dev" input argument on the stack
>> 459: 89 45 e8 mov %eax,-0x18(%ebp)
>
> local_irq_save(flags):
>> 45c: 9c pushf
>> 45d: 8f 45 ec popl -0x14(%ebp)
>> 460: fa cli
>
> preempt_disable():
>> 461: 89 e0 mov %esp,%eax
>> 463: 25 00 e0 ff ff and $0xffffe000,%eax
>> 468: ff 40 14 incl 0x14(%eax)
>
> Move (struct rc_dev *) dev into %eax
>> 46b: 8b 45 e8 mov -0x18(%ebp),%eax
>
> &dev->rc_map->lock
>> 46e: 8b 80 d4 00 00 00 mov 0xd4(%eax),%eax
>
> But that's where the Oops always happens, so the "dev" input argument to
> the function is NULL.
>
> Someone familiar with the driver/media/rc/imon.c file needs to figure
> out how rc_g_keycode_from_table() can be called with a NULL first
> argument: ictx->rdev is NULL when
> rc_g_keycode_from_table(ictx->rdev,...) is called.
>
> There might be some race at driver initialization, given that at first
> look ictx-rdev being NULL seems impossible. Your stack backtraces
> always indicate some sort of IRQ context, so maybe that matters.

Thanks much for the analysis, Andy. I see the problem now. The intf0 urb
callback is wired up before the rc_dev is, and the callback is what makes
the rc_g_keycode_from_table call. And as far as I know, all 0xffdc devices
have the nasty trait of constantly triggering interrupts, even when there's
no valid keydata. (SoundGraph fixed that in later devices). The code has
been like this for some time, and I do have 0xffdc devices, none of which
have hit this somehow, but the fix looks simple and obvious enough. I'll
try to get something sent along later today.

--
Jarod Wilson
jarod@xxxxxxxxxxxx



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/