Re: Soft lockups in __cancel_work_timer()

From: Tejun Heo
Date: Fri Feb 06 2015 - 13:01:45 EST


Hello,

On Fri, Feb 06, 2015 at 06:11:56PM +0100, Rabin Vincent wrote:
> task1 task2 worker
>
> add to busy hash
> clear pending
> exec work
>
> __cancel_work_timer()
> try_to_grab_pending()
> get pending, return 0
> set work cancelling
> flush_work()
> wait_for_completion()
>
> remove from busy_hash
>
> __cancel_work_timer()
> while (forever) {
> try_to_grab_pending()
> return -ENOENT due to cancelling
> flush_work
> return false due to already gone
> }
>
>
> On kernels with CONFIG_PREEMPT disabled, this causes a permanent soft
> lockup of the system.

Ah, drat.

> Adding a cond_resched() to the loop in cancel_delayed_work_sync(), like
> below, helps the simple !PREEMPT case, but does not completely fix the
> problem, because the problem can still be seen if the thread which sees
> the ENOENT has for example SCHED_FIFO scheduling class, both on systems
> with CONFIG_PREEMPT enabled and on !PREEMPT with the cond_resched().
>
> We've seen this problem with the cancel_delayed_work() call in
> jffs2_sync_fs(), but I've attached a testing patch which forces the
> problem on current mainline without the need for jffs2.
...
> @@ -2741,6 +2741,9 @@ static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
> */
> if (unlikely(ret == -ENOENT))
> flush_work(work);
> +
> + if (ret < 0)
> + cond_resched();
> } while (unlikely(ret < 0));

Well, an obvious thing to do would be

if (unlikely(ret == -ENOENT)) {
if (!flush_work(work))
schedule_timeout(1);
}

It was gonna block for the work item anyway so I don't think
schedule_timeout() there is a problem. That said, given that we're
guaranteed to be able to dereference @work there, there probably is a
better way. I'll think more about it.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/