Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007

From: Linus Torvalds
Date: Fri Nov 10 2017 - 15:08:11 EST


On Wed, Nov 8, 2017 at 9:19 PM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
>
> Yes it's accessing the list. Here is the faddr2line output.

Ok, so it's a corrupted timer list. Which is not a big surprise.

It's

    next->pprev = pprev;

in __hlist_del(), and the trapping instruction decodes as

    mov %rdx,0x8(%rax)

with %rax having the value dead000000000200, which is just LIST_POISON2.
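
For reference, the arithmetic behind that value can be sketched in user space. This assumes the x86-64 default CONFIG_ILLEGAL_POINTER_VALUE of 0xdead000000000000 and the 0x100/0x200 poison offsets used by this kernel. The macro and struct names mirror the kernel's, but pprev_store_addr() and print_fault_address() are hypothetical helpers made up for the illustration:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: x86-64 default CONFIG_ILLEGAL_POINTER_VALUE */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 (0x100ULL + POISON_POINTER_DELTA)
#define LIST_POISON2 (0x200ULL + POISON_POINTER_DELTA)

/* Same layout as the kernel's hlist_node: pprev follows next */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* Hypothetical helper: address that "next->pprev = pprev" stores to */
static uint64_t pprev_store_addr(uint64_t next)
{
	return next + offsetof(struct hlist_node, pprev);
}

/* Hypothetical helper: print the poison value and the store target */
static void print_fault_address(void)
{
	/* This is the %rax value from the oops */
	assert(LIST_POISON2 == 0xdead000000000200ULL);

	printf("LIST_POISON2 = %016" PRIx64 "\n", (uint64_t)LIST_POISON2);
	printf("store target = %016" PRIx64 "\n",
	       pprev_store_addr(LIST_POISON2));
}
```

On a 64-bit build the pprev member sits one pointer past next, which is where the 0x8 displacement in the mov comes from. Call print_fault_address() from a main() of your own to see the two values.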

So we've deleted that entry twice - LIST_POISON2 is what hlist_del()
sets pprev to after already deleting it once.

Although in this case it might not be hlist_del(), because
detach_timer() also sets entry->next to LIST_POISON2.
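
To illustrate how a second detach of the same entry runs into the poisoned next pointer, here is a minimal user-space sketch. It is not the kernel code: checked_hlist_del(), detach() and double_detach_is_caught() are hypothetical names, and the poison check is added only so that the sketch can report the double delete instead of faulting (the real __hlist_del() has no such check). The poison value again assumes the x86-64 default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* Assumption: x86-64 default poison delta, 0x200 offset for POISON2 */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON2 \
	((struct hlist_node *)(uintptr_t)(0x200ULL + POISON_POINTER_DELTA))

/*
 * Same unlink steps as __hlist_del(), but returns false instead of
 * storing through a poisoned next pointer.
 */
static bool checked_hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	if (next == LIST_POISON2)
		return false;		/* entry was already detached */
	*pprev = next;
	if (next)
		next->pprev = pprev;	/* the store that trapped */
	return true;
}

/* Follows detach_timer(): unlink, then poison next with LIST_POISON2 */
static bool detach(struct hlist_node *entry, bool clear_pending)
{
	if (!checked_hlist_del(entry))
		return false;
	if (clear_pending)
		entry->pprev = NULL;
	entry->next = LIST_POISON2;
	return true;
}

/* Detach one entry twice; true if only the first detach succeeded */
static bool double_detach_is_caught(void)
{
	struct hlist_node *head = NULL;
	struct hlist_node timer = { .next = NULL, .pprev = &head };

	head = &timer;			/* single-entry list */
	bool first = detach(&timer, false);
	bool second = detach(&timer, false);

	assert(timer.next == LIST_POISON2);
	return first && head == NULL && !second;
}
```

After the first detach() the entry's next pointer holds LIST_POISON2, so an unchecked second pass would load that value as "next" and store to next->pprev, which matches the trapping instruction above.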

Which is pretty bogus: we are supposed to use LIST_POISON1 for the
"next" pointer. Oh well. Nobody cares, except for the list entry
debugging code, which isn't run on the hlist cases.

Adding Thomas Gleixner to the cc. It should not be possible to delete
the same timer twice.

Linus