Re: Kernel oops with bluetooth usb dongle

From: Thomas Gleixner
Date: Sat Feb 16 2008 - 19:00:08 EST


On Sat, 16 Feb 2008, Quel Qun wrote:
> > Please also enable CONFIG_DEBUG_LIST=y, which should catch the place
> > where a list corruption happens.
>
> Thank you for the hand holding. I must admit I do not anything
> about kernel debugging.
>
> With or without nohz=off, the crashes are very similar. It looks
> like it fails to execute list_add_tail(&timer->entry, vec), line 294
> of kernel/timer.c.

Yup, that's what I suspected. The list is corrupted:

> list_add corruption. prev->next should be next (c047e704), but was 00000000. (prev=ddcca118).

> list_add corruption. prev->next should be next (c047e77c), but was 00000001. (prev=f644bd98).

Unfortunately we only see that the list is corrupted but not which
code caused it. This looks like something forgot to delete the timer
before freeing the datastructure which contains it.

Can you please enable CONFIG_SLUB_DEBUG=y and CONFIG_SLUB_DEBUG_ON=y
and give it another try?

If we can not catch it that way, I'll whip up a patch which points us
to the code which added the offending timer.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/