Re: GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd)

From: Oleg Nesterov
Date: Wed Nov 11 2009 - 15:32:18 EST


On 11/11, Rafael J. Wysocki wrote:
>
> On Wednesday 11 November 2009, Oleg Nesterov wrote:
> >
> > Rafael, could you reproduce the problem with the debugging patch below?
> > It tries to detect the case when the pending work was corrupted and
> > prints its work->func (saved in the previous item). It should work
> > if the work_struct was freed and poisoned, or if it was re-initialized.
> > See ck_work().
>
> I applied the patch and this is the result of 'dmesg | grep ERR' after 10-or-so
> consecutive suspend-resume and hibernate-resume cycles:
>
> [ 129.008689] ERR!! btusb_waker+0x0/0x27 [btusb]
> [ 166.477373] ERR!! btusb_waker+0x0/0x27 [btusb]
> [ 203.983665] ERR!! btusb_waker+0x0/0x27 [btusb]
> [ 241.636547] ERR!! btusb_waker+0x0/0x27 [btusb]
>
> which kind of confirms my previous observation that the problem was not
> reproducible without Bluetooth.

Great, thanks.

> So, it looks like the bug is in btusb_destruct(), which should call
> cancel_work_sync() on data->waker before freeing 'data'. I guess it should
> do the same for data->work.

Or: btusb_suspend() and btusb_close() already call cancel_work_sync(&data->work);
perhaps they should cancel data->waker as well, I dunno.
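
To make the ordering concrete, here is a small userspace model of the
cancel-before-free idea. It is not kernel code and not the actual btusb fix.
Every model_* name is hypothetical; model_cancel_work() only stands in for the
guarantee cancel_work_sync() gives (the item is no longer queued on return)
and does not model waiting for a callback that is already running.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only: none of these names are kernel APIs. */
struct model_work {
    void (*func)(struct model_work *);
    bool pending;
};

#define MODEL_QUEUE_MAX 8

struct model_queue {
    struct model_work *slots[MODEL_QUEUE_MAX];
    size_t count;
};

/* Queue a work item unless it is already pending or the queue is full. */
void model_queue_work(struct model_queue *q, struct model_work *w)
{
    if (w->pending || q->count == MODEL_QUEUE_MAX)
        return;
    w->pending = true;
    q->slots[q->count++] = w;
}

/*
 * Hypothetical stand-in for cancel_work_sync(): unlink the item if it is
 * still queued. Returns true if the item was pending.
 */
bool model_cancel_work(struct model_queue *q, struct model_work *w)
{
    for (size_t i = 0; i < q->count; i++) {
        if (q->slots[i] == w) {
            /* Fill the hole with the last entry; order is irrelevant here. */
            q->slots[i] = q->slots[--q->count];
            w->pending = false;
            return true;
        }
    }
    return false;
}

/* Models a driver-private struct that embeds two work items. */
struct model_btusb_data {
    struct model_work work;
    struct model_work waker;
};

/*
 * Teardown in the order suggested in this thread: cancel both embedded
 * work items first, and only then free the memory that contains them.
 * Freeing first would leave the queue pointing into freed memory.
 */
void model_destruct(struct model_queue *q, struct model_btusb_data *data)
{
    model_cancel_work(q, &data->work);
    model_cancel_work(q, &data->waker);
    free(data);
}
```

If the waker is queued and model_destruct() is then called, the queue ends up
empty, so nothing is left pointing into the freed struct. Skipping the cancel
calls leaves a stale pointer in q->slots, which is the kind of corruption the
debugging patch flagged.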

I added Oliver to cc.

Oleg.
