Re: Crashes with 874bbfe600a6 in 3.18.25

From: Jiri Slaby
Date: Wed Feb 03 2016 - 04:35:52 EST


On 01/26/2016, 02:09 PM, Thomas Gleixner wrote:
> On Tue, 26 Jan 2016, Petr Mladek wrote:
>> On Tue 2016-01-26 10:34:00, Jan Kara wrote:
>>> On Sat 23-01-16 17:11:54, Thomas Gleixner wrote:
>>>> On Sat, 23 Jan 2016, Ben Hutchings wrote:
>>>>> On Fri, 2016-01-22 at 11:09 -0500, Tejun Heo wrote:
>>>>>>> Looks like it requires more than trivial backport (I think). Tejun?
>>>>>>
>>>>>> The timer migration has changed quite a bit. Given that we've never
>>>>>> seen vmstat work crashing in 3.18 era, I wonder whether the right
>>>>>> thing to do here is reverting 874bbfe600a6 from 3.18 stable?
>>>>>
>>>>> It's not just 3.18 that has this; 874bbfe600a6 was backported to all
>>>>> stable branches from 3.10 onward. Only the 4.2-ckt branch has
>>>>> 22b886dd10180939.
>>>>
>>>> 22b886dd10180939 fixes a bug which was introduced with the timer wheel
>>>> overhaul in 4.2. So only 4.2/3 should have it backported.
>>>
>>> Thanks for explanation. So do I understand right that timers are always run
>>> on the calling CPU in kernels prior to 4.2 and thus commit 874bbfe600a6 (to
>>> run timer for delayed work on the calling CPU) doesn't make sense there? If
>>> that is true then reverting the commit from older stable kernels is
>>> probably the easiest way to resolve the crashes.
>>
>> The commit 874bbfe600a6 ("workqueue: make sure delayed work run in
>> local cpu") forces the timer to run on the local CPU. It might be correct
>> for vmstat. But I wonder if it might break some other delayed work
>> user that depends on running on different CPU.
>
> The default of add_timer() is to run on the current cpu. It only moves the
> timer to a different cpu when the power saving code says so. So 874bbfe600a6
> enforces that the timer runs on the cpu on which queue_delayed_work() is
> called, but before that commit it was likely that the timer was queued on the
> calling cpu. So there is nothing which can depend on running on a different
> CPU, except callers of queue_delayed_work_on() which provide the target cpu
> explicitly. 874bbfe600a6 does not affect those callers at all.
>
> Now, what's different is:
>
> +	if (cpu == WORK_CPU_UNBOUND)
> +		cpu = raw_smp_processor_id();
> 	dwork->cpu = cpu;
>
> So before that change dwork->cpu was set to WORK_CPU_UNBOUND. Now it's set to
> the current cpu, but I can't see how that matters.

What happens in later kernels when the CPU is offlined before the
delayed_work timer fires? In stable 3.12, with the patch, this scenario
results in an oops:
 #5 [ffff8c03fdd63d80] page_fault at ffffffff81523a88
    [exception RIP: __queue_work+121]
    RIP: ffffffff81071989  RSP: ffff8c03fdd63e30  RFLAGS: 00010086
    RAX: ffff88048b96bc00  RBX: ffff8c03e9bcc800  RCX: ffff880473820478
    RDX: 0000000000000400  RSI: 0000000000000004  RDI: ffff880473820458
    RBP: 0000000000000000   R8: ffff8c03fdd71f40   R9: ffff8c03ea4c4002
    R10: 0000000000000000  R11: 0000000000000005  R12: ffff880473820458
    R13: 00000000000000a8  R14: 000000000000e328  R15: 00000000000000a8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff8c03fdd63e68] call_timer_fn at ffffffff81065611
 #7 [ffff8c03fdd63e98] run_timer_softirq at ffffffff810663b7
 #8 [ffff8c03fdd63f00] __do_softirq at ffffffff8105e2c5
 #9 [ffff8c03fdd63f68] call_softirq at ffffffff8152cf9c
#10 [ffff8c03fdd63f80] do_softirq at ffffffff81004665
#11 [ffff8c03fdd63fa0] smp_apic_timer_interrupt at ffffffff8152d835
#12 [ffff8c03fdd63fb0] apic_timer_interrupt at ffffffff8152c2dd

The CPU was 168, and that one was offlined in the meantime. So
__queue_work fails at:
	if (!(wq->flags & WQ_UNBOUND))
		pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
	else
		pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
		^^^                           ^^^^ NODE is -1
		  \ pwq is NULL

	if (last_pool && last_pool != pwq->pool) { <--- BOOM
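
To make the sequence concrete, here is a rough userspace model of it. This
is only a sketch, not kernel code: every model_* name and the 4-CPU/2-node
layout are made up for illustration, and the assumption that the lookup
yields NULL for node -1 is simply taken from the crash above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS  4
#define MODEL_NR_NODES 2
#define MODEL_NO_NODE  (-1)

/* Hypothetical stand-ins for the worker pool and pool_workqueue. */
struct model_pool { int id; };
struct model_pwq { struct model_pool *pool; };

struct model_pool model_pools[MODEL_NR_NODES] = { { 0 }, { 1 } };
struct model_pwq model_pwqs[MODEL_NR_NODES] = {
	{ &model_pools[0] }, { &model_pools[1] },
};

/* CPU -> node map; an offlined CPU is modeled as having no node. */
int model_node_of_cpu[MODEL_NR_CPUS] = { 0, 0, 1, 1 };

void model_cpu_offline(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_node_of_cpu[cpu] = MODEL_NO_NODE;
}

int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_node_of_cpu[cpu];
}

/* Assumption: an invalid node gives NULL, as observed in the oops. */
struct model_pwq *model_pwq_by_node(int node)
{
	if (node < 0 || node >= MODEL_NR_NODES)
		return NULL;
	return &model_pwqs[node];
}

/*
 * Timer-callback side: uses the CPU that was recorded at queue time.
 * Returns 0 when a pwq was found, -1 where the real code would
 * dereference the NULL pwq.
 */
int model_queue_work(int recorded_cpu)
{
	struct model_pwq *pwq;

	pwq = model_pwq_by_node(model_cpu_to_node(recorded_cpu));
	if (!pwq)
		return -1;	/* pwq->pool would oops here */
	return pwq->pool ? 0 : -1;
}
```

Calling model_queue_work(2) succeeds while CPU 2 is online; after
model_cpu_offline(2) the same call hits the NULL path, which is the
window between queue_delayed_work() and the timer firing.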

Any ideas?

thanks,
--
js
suse labs