Re: [PATCH] workqueue: give a protection when get_work_pwq return NULL.

From: Tejun Heo
Date: Fri Sep 19 2014 - 09:24:42 EST


Hello,

On Fri, Sep 19, 2014 at 12:12:04PM +0800, jun.zhang@xxxxxxxxx wrote:
> From: zhang jun <jun.zhang@xxxxxxxxx>
>
> if pwq==NULL, system could panic. next is the panic log.
> [12973.660792] BUG: unable to handle kernel NULL pointer dereference at 00000004
> [12973.668787] IP: [<c125b8ab>] process_one_work+0x2b/0x3e0

Well, it shouldn't be NULL for a pending work_struct.

> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1947,9 +1947,19 @@ __acquires(&pool->lock)
>  {
>  	struct pool_workqueue *pwq = get_work_pwq(work);
>  	struct worker_pool *pool = worker->pool;
> -	bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
>  	int work_color;
>  	struct worker *collision;
> +
> +	if (pwq == NULL) {
> +		pr_err("BUG: pwq is NULL. data: 0x%08lx @ work: 0x%p\n",
> +			atomic_long_read(&work->data), work);
> +		WARN_ON(1);
> +		move_linked_works(work, &worker->scheduled, NULL);
> +		return;
> +	}
> +
> +	bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;

Jesus, please don't do this. The problem must be root-caused before
adding random workaround code. This is most likely the queued
work_struct being corrupted somehow (e.g. being prematurely freed or
whatnot). It makes zero sense to add a random check in workqueue code
for that.
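
To illustrate the kind of lifetime bug meant above, here is a minimal
user-space sketch. It is not kernel code and not a diagnosis of the
reported crash: every toy_* name is hypothetical, the owner pointer
only stands in for the pwq recorded in work->data, and memset() only
stands in for the containing object being freed and reused. It shows
an item that is wiped while still queued turning up later with no
owner, and the same teardown behaving correctly once the item is
cancelled first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space toy model; every name here is hypothetical, not a kernel API. */
struct toy_queue;

struct toy_work {
	struct toy_queue *owner;	/* stands in for the pwq kept in work->data */
	struct toy_work *next;
};

struct toy_queue {
	struct toy_work *head;
};

/* An object that embeds its work item, as drivers commonly do. */
struct toy_device {
	int id;
	struct toy_work work;
};

/* Record the owning queue in the item and link it at the head. */
static void toy_queue_work(struct toy_queue *q, struct toy_work *w)
{
	assert(q && w);
	w->owner = q;
	w->next = q->head;
	q->head = w;
}

/* Unlink a pending item; returns 1 if it was pending, 0 otherwise. */
static int toy_cancel_work(struct toy_queue *q, struct toy_work *w)
{
	struct toy_work **pp;

	for (pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == w) {
			*pp = w->next;
			w->owner = NULL;
			w->next = NULL;
			return 1;
		}
	}
	return 0;
}

/* Drain the queue; count items whose owner was wiped while still queued. */
static int toy_process_all(struct toy_queue *q)
{
	int corrupted = 0;

	while (q->head) {
		struct toy_work *w = q->head;

		q->head = w->next;
		if (!w->owner)
			corrupted++;	/* the real code would dereference NULL here */
	}
	return corrupted;
}

/*
 * Tear the device down while its work is queued. The memset() models
 * free-and-reuse without committing a real use-after-free.
 * Returns the number of corrupted items seen while draining the queue.
 */
static int toy_run_demo(int cancel_first)
{
	struct toy_queue q = { NULL };
	struct toy_device dev = { 1, { NULL, NULL } };

	toy_queue_work(&q, &dev.work);
	if (cancel_first)
		toy_cancel_work(&q, &dev.work);
	memset(&dev, 0, sizeof(dev));
	return toy_process_all(&q);
}
```

Calling toy_run_demo(0) tears the device down with the item still
queued and reports one corrupted item; toy_run_demo(1) cancels first
and reports none. In the kernel the analogous step is usually
cancel_work_sync() or flush_work() on the work item before its
containing object is freed, which is a fix in the caller rather than
a check in process_one_work().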

Nacked-by: Tejun Heo <tj@xxxxxxxxxx>

Thanks.

--
tejun