Re: [Workqueue] crash in process_one_work

From: Arun KS
Date: Wed Oct 08 2014 - 08:00:45 EST


Hello Tejun,

On Mon, Oct 6, 2014 at 9:02 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello, Arun.
>
> On Mon, Sep 29, 2014 at 09:40:50PM +0530, Arun KS wrote:
> ...
>> The value of data is 0xffffffe0, which is basically the value after an
>> INIT_WORK() or WORK_DATA_INIT().
>> This can happen if a driver calls INIT_WORK on same struct work again
>> after queuing it.
>>
>> The above details of the work_struct show that the work was
>> queued from kernel/async.c. async_schedule dynamically allocates the
>> work_struct and queues it to system_unbound_wq, so there is no
>> possibility of calling INIT_WORK on the same work again.
>>
>> After inspecting ramdump for async_entry structure in kernel/async.c
>>
>> crash> struct async_entry ed7cf140
>> struct async_entry {
>>   domain_list = {
>>     next = 0xed7cf140,
>>     prev = 0xed7cf140
>>   },
>>   global_list = {
>>     next = 0xed7cf148,
>>     prev = 0xed7cf148
>>   },
>>   work = {
>>     data = {
>>       counter = 0xffffffe0
>>     },
>>     entry = {
>>       next = 0xed7cf154,
>>       prev = 0xed7cf154
>>     },
>>     func = 0xc0140ac4 <async_run_entry_fn>
>>   },
>>   cookie = 0x263e5,
>>   func = 0xc074dda0 <dapm_post_sequence_async>,
>>   data = 0xed48432c,
>>   domain = 0xe5457dec
>> }
>>
>> The func points to dapm_post_sequence_async, and you can see that
>> domain_list and global_list are empty, which shows that the work has
>> finished execution and there is no pending execution in async.
>>
>> But how did this work_struct end up still linked into the workqueue
>> data structures? Is there any corner case in workqueue which can miss
>> unlinking the work_struct from the pool_workqueue after executing it?
>
> I sure hope not. How reproducible is the issue? Can you try w/
> CONFIG_DEBUG_OBJECTS_WORK enabled?

Thanks for replying.
That was a problem with one of our drivers. It was freeing the
memory (the work_struct) without flushing the workqueue.
We caught the faulty driver by adding a BUG_ON() in INIT_WORK and looking
at the func pointer in the work_struct (which points to the
faulty driver's work function).

1) The faulty driver calls queue_work() on system_unbound_wq.
2) It frees the work_struct memory while the work is still queued in
the workqueue.
3) Another driver requests memory from SLAB, gets the same memory, and
calls INIT_WORK on it.
4) process_one_work() tries to execute the work queued by the faulty
driver, which results in a crash.
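
The four steps above come down to a teardown-ordering rule: a work item has to be unlinked from the queue before its memory is released. What follows is only a userspace sketch of that rule, not kernel code. Every name in it (struct work, queue_work, cancel_work, run_all, toy_dev and so on) is a hypothetical stand-in invented for illustration. In a real driver the equivalent step would be a call such as cancel_work_sync() or flush_work() before kfree().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of a work queue; not the kernel API. */
struct list_node { struct list_node *next, *prev; };

struct work {
    struct list_node entry;          /* must stay the first member */
    void (*func)(struct work *);
    bool pending;
};

struct queue { struct list_node head; };

static void queue_init(struct queue *q)
{
    q->head.next = q->head.prev = &q->head;
}

/* Rough analogue of INIT_WORK: entry points at itself, nothing pending. */
static void work_init(struct work *w, void (*func)(struct work *))
{
    w->entry.next = w->entry.prev = &w->entry;
    w->func = func;
    w->pending = false;
}

/* Append to the tail; refuse to queue an item that is already pending. */
static bool queue_work(struct queue *q, struct work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    w->entry.prev = q->head.prev;
    w->entry.next = &q->head;
    q->head.prev->next = &w->entry;
    q->head.prev = &w->entry;
    return true;
}

/* Unlink a pending item so the queue no longer references its memory. */
static bool cancel_work(struct work *w)
{
    if (!w->pending)
        return false;
    w->entry.prev->next = w->entry.next;
    w->entry.next->prev = w->entry.prev;
    w->entry.next = w->entry.prev = &w->entry;
    w->pending = false;
    return true;
}

/* Execute everything queued; returns the number of items run. */
static int run_all(struct queue *q)
{
    int n = 0;
    while (q->head.next != &q->head) {
        /* Valid cast because entry is the first member of struct work. */
        struct work *w = (struct work *)q->head.next;
        cancel_work(w);
        w->func(w);
        n++;
    }
    return n;
}

/* Hypothetical driver object that embeds its work item. */
struct toy_dev { struct work work; int runs; };

static void toy_fn(struct work *w)
{
    /* Valid cast because work is the first member of struct toy_dev. */
    ((struct toy_dev *)w)->runs++;
}

/* Correct teardown order: unlink first, then free. */
static void toy_dev_destroy(struct toy_dev *d)
{
    cancel_work(&d->work);
    assert(!d->work.pending);
    free(d);
}
```

If toy_dev_destroy() skipped the cancel_work() call, the queue head would keep pointing into freed memory, which is the situation in steps 2 to 4.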


Thanks,
Arun

>
> Thanks.
>
> --
> tejun