Re: [Linux v4.2] workqueue: llvmlinux: acpid: BUG: sleeping function called from invalid context at kernel/workqueue.c:2680

From: Jiri Kosina
Date: Fri Sep 25 2015 - 08:00:20 EST


On Thu, 24 Sep 2015, Sedat Dilek wrote:

> >> >> > >> [ 24.705767] [<ffffffff8149287d>] dump_stack+0x7d/0xa0
> >> >> > >> [ 24.705774] [<ffffffff810cbf7a>] ___might_sleep+0x28a/0x2a0
> >> >> > >> [ 24.705779] [<ffffffff810cbc7f>] __might_sleep+0x4f/0xc0
> >> >> > >> [ 24.705784] [<ffffffff810ae8ff>] start_flush_work+0x2f/0x290
> >> >> > >> [ 24.705789] [<ffffffff810ae8ac>] flush_work+0x5c/0x80
> >> >> > >> [ 24.705792] [<ffffffff810ae86a>] ? flush_work+0x1a/0x80
> >> >> > >> [ 24.705799] [<ffffffff810eddcd>] ? trace_hardirqs_off+0xd/0x10
> >> >> > >> [ 24.705804] [<ffffffff810ad938>] ? try_to_grab_pending+0x48/0x360
> >> >> > >> [ 24.705810] [<ffffffff81917e13>] ? _raw_spin_lock_irqsave+0x73/0x80
> >> >> > >> [ 24.705814] [<ffffffff810aecf9>] __cancel_work_timer+0x179/0x260
> >> >>
> >> >> This one is even more strange. It says that flush_work() is being called
> >> >> from __cancel_work_timer() with IRQs disabled, but flags are explicitly
> >> >> restored just one statement before that, and usbhid_close() explicitly
> >> >> calls cancel_work_sync() after unconditionally enabling interrupts.
> >> >>
> >> >> So I am not able to make any sense of either of the traces really.
> >> >>
> >> >> Are you seeing this with the same .config with GCC-compiled kernel as
> >> >> well?
> >> >
> >> > Actually could you please provide disassembly of your
> >> > __cancel_work_timer()?
> >> >
> >>
> >> Disassembly of which file - corresponding workqueue or hid file?
> >
> > make kernel/workqueue.o
> > objdump -Dr kernel/workqueue.o
> >
> > and copy/paste output for __cancel_work_timer function.
> >
>
> It was a bit long, so I attached the full output.

216a: ff 14 25 00 00 00 00 callq *0x0
216d: R_X86_64_32S pv_irq_ops+0x8
2171: 4c 89 ef mov %r13,%rdi
2174: e8 00 00 00 00 callq 2179 <__cancel_work_timer+0x189>
2175: R_X86_64_PC32 flush_work-0x4

So the ordering generated by the compiler is correct here.

This however:

[ 24.824639] hardirqs last enabled at (7913): [<ffffffff8192a2b2>] _raw_spin_unlock_irq+0x32/0x60
[ 24.824646] hardirqs last disabled at (7914): [<ffffffff81120e37>] del_timer_sync+0x37/0x110

combined with the stack trace above, still doesn't make much sense; IRQ
disabling/enabling is properly paired in del_timer_sync() and in the
timer base lock handling.

Could you look into the generated assembly whether the sequence

local_irq_save(flags);
lock_map_acquire(&timer->lockdep_map);
lock_map_release(&timer->lockdep_map);
local_irq_restore(flags);

in del_timer_sync() hasn't been optimized in some damaging way?
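The objdump recipe from earlier in the thread can be pointed at the timer code as well (assuming the object file lives at kernel/time/timer.o, as it does in v4.2 trees); the awk filter is only there to cut the dump down to the del_timer_sync function itself:

```shell
make kernel/time/timer.o
objdump -dr kernel/time/timer.o | \
  awk '/<del_timer_sync>:/ {p=1} p {print} p && /ret/ {exit}'
```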

--
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/