Re: workqueue panic in 3.4 kernel

From: Lei Wen
Date: Thu Mar 07 2013 - 10:22:46 EST


Tejun,

On Thu, Mar 7, 2013 at 9:15 AM, Lei Wen <adrian.wenl@xxxxxxxxx> wrote:
> Hi Tejun,
>
> On Thu, Mar 7, 2013 at 3:14 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
>> Hello, Lei.
>>
>> On Wed, Mar 06, 2013 at 10:39:15PM +0800, Lei Wen wrote:
>>> We find a race condition as below:
>>> CPU0                                            CPU1
>>> timer interrupt happen
>>> __run_timers
>>> __run_timers::spin_lock_irq(&base->lock)
>>> __run_timers::spin_unlock_irq(&base->lock)
>>>                                                 __cancel_work_timer
>>>                                                 __cancel_work_timer::del_timer
>>>                                                 __cancel_work_timer::wait_on_work
>>>                                                 __cancel_work_timer::clear_work_data
>>> __run_timers::call_timer_fn(timer, fn, data);
>>>   delayed_work_timer_fn::get_work_cwq
>>> __run_timers::spin_lock_irq(&base->lock)
>>>
>>> It is possible for __cancel_work_timer to be run on CPU1 __BEFORE__ CPU0
>>> is ready to run the timer callback, which is delayed_work_timer_fn in our
>>> case.
>>
>> If del_timer() happens after the timer starts running, del_timer()
>> would return 0 and try_to_grab_pending() will be called, which will
>> return >=0 iff it successfully steals the PENDING bit (ie. it's the
>> sole owner of the work item). If del_timer() happens before the timer
>> starts running, the timer function would never run.
>
> If del_timer() happens before __run_timers() is called, but after the timer
> irq has already fired, would it return 1 since the timer is still not
> detached in __run_timers()? If that is possible, then try_to_grab_pending()
> is never called, and work->data gets cleared by clear_work_data() in this way.
>
>>
>> clear_work_data() happens iff the work item is confirmed to be idle.
>> At this point, I'm pretty skeptical this is a bug in workqueue itself
>> and strongly suggest looking at the crashing workqueue user.
>
> Also, I am not very familiar with the workqueue mechanism; how many places
> in the kernel would clear work->data besides clear_work_data()?
>
> From the memory dump, I cannot find any hint of the work structure being destroyed.
> So the only possibility, it seems to me, is that work->data was set by someone on purpose.
>
> crash> struct delayed_work 0xbf03d544 -x
> struct delayed_work {
>   work = {
>     data = {
>       counter = 0x300
>     },
>     entry = {
>       next = 0xbf03d548,
>       prev = 0xbf03d548
>     },
>     func = 0xbf014b00
>   },
>   timer = {
>     entry = {
>       next = 0x0,
>       prev = 0x200200
>     },
>     expires = 0x12b638b,
>     base = 0xc0844e01,
>     function = 0xc014c7a0 <delayed_work_timer_fn>,
>     data = 0xbf03d544,
>     slack = 0xffffffff,
>     start_pid = 0xffffffff,
>     start_site = 0x0,
>     start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
>   }
> }

I captured a trace log, which confirms my previous suspicion:
__cancel_work_timer is called before delayed_work_timer_fn, so that
work->data is cleared.

And when __cancel_work_timer is called, the timer is still pending,
so del_timer returns 1 and try_to_grab_pending is never called.
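
For reference, this is roughly the cancel path as I read it in 3.4
kernel/workqueue.c (a simplified sketch from memory, the exact source may
differ slightly):

/* Simplified sketch of the 3.4 cancel path; details may differ
 * from the exact source. */
static bool __cancel_work_timer(struct work_struct *work,
                                struct timer_list *timer)
{
        int ret;

        do {
                /* If the timer is still pending, del_timer() returns 1
                 * and try_to_grab_pending() is skipped entirely. */
                ret = (timer && likely(del_timer(timer)));
                if (!ret)
                        ret = try_to_grab_pending(work);
                wait_on_work(work);
        } while (unlikely(ret < 0));

        /* In our trace this is reached with ret == 1: work->data is
         * cleared even though CPU0 has not yet run the timer callback. */
        clear_work_data(work);

        return ret;
}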

But it is very strange that in __run_timers, it still gets the same timer.
Its callback, delayed_work_timer_fn, is then called, which causes
the issue.
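
If I read the 3.4 code correctly, that would explain the panic: once
work->data has been cleared, get_work_cwq() returns NULL and the timer
callback dereferences it. Roughly (again a simplified sketch, not the
exact source):

/* Sketch of the 3.4 timer callback; get_work_cwq() returns NULL
 * once the CWQ bit in work->data has been cleared. */
void delayed_work_timer_fn(unsigned long __data)
{
        struct delayed_work *dwork = (struct delayed_work *)__data;
        struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);

        /* cwq is NULL here if work->data was already cleared, so the
         * cwq->wq dereference below is a NULL pointer access -> panic. */
        __queue_work(smp_processor_id(), cwq->wq, &dwork->work);
}

If I read the flag bits right, the dump above matches this: data.counter =
0x300 has no CWQ bit set, so get_work_cwq() could only return NULL here.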

The detach_timer called from __cancel_work_timer (via del_timer) should
already have removed the timer from all lists, am I right?
Could the timer_list have been queued twice, e.g. queued on two cpus?
If not, how could it happen?
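
For what it is worth, detach_timer() in kernel/timer.c looks roughly like
below (a sketch from my reading of the 3.4 source), and the dump above does
show a detached timer already: timer.entry.next = 0x0 and prev = 0x200200,
which is LIST_POISON2:

/* Rough sketch of detach_timer() from 3.4 kernel/timer.c; details
 * may differ from the exact source. */
static inline void detach_timer(struct timer_list *timer, int clear_pending)
{
        struct list_head *entry = &timer->entry;

        debug_deactivate(timer);

        /* Unlink from whatever base list the timer is on. */
        __list_del(entry->prev, entry->next);
        if (clear_pending)
                entry->next = NULL;        /* matches next = 0x0 in the dump */
        entry->prev = LIST_POISON2;        /* matches prev = 0x200200 in the dump */
}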

Thanks,
Lei