Re: [PATCH v3] kthread: Work could not be queued when worker being destroyed

From: Zhang, Qiang
Date: Mon Jul 06 2020 - 21:27:34 EST


I'm very sorry, there are some problems with my change. It triggers the
following warning:

[ 1.203300] loop: module loaded
[ 1.204599] megasas: 07.714.04.00-rc1
[ 1.211124] spi_qup 78b7000.spi: IN:block:16, fifo:64, OUT:block:16, fifo:64
[ 1.211509] ------------[ cut here ]------------
[ 1.217238] WARNING: CPU: 0 PID: 1 at kernel/kthread.c:819 kthread_queue_work+0x90/0xa0
[ 1.221832] Modules linked in:
[ 1.229554] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc3-next-20200706 #1
[ 1.232683] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 1.240237] pstate: 40000085 (nZcv daIf -PAN -UAO BTYPE=--)
[ 1.246918] pc : kthread_queue_work+0x90/0xa0
[ 1.252211] lr : kthread_queue_work+0x2c/0xa0
[ 1.256722] sp : ffff80001002ba50
[ 1.261061] x29: ffff80001002ba50 x28: ffff00003b868000
[ 1.264363] x27: ffff00003fcf63c0 x26: ffff00003b868680
[ 1.269744] x25: ffff00003b868400 x24: ffff00003d116810
[ 1.275039] x23: ffff800012025304 x22: ffff00003b8683bc
[ 1.280335] x21: 0000000000000000 x20: ffff00003b8683f8
[ 1.285630] x19: ffff00003b8683b8 x18: 0000000000000000
[ 1.290925] x17: 0000000000000000 x16: ffff800011167420
[ 1.296220] x15: ffff00000eb90480 x14: 0000000000000267
[ 1.301515] x13: 0000000000000004 x12: 0000000000000000
[ 1.306810] x11: 0000000000000000 x10: 0000000000000003
[ 1.312105] x9 : ffff00003fcbac10 x8 : ffff00003fcba240
[ 1.317400] x7 : ffff00003bc3c800 x6 : 0000000000000003
[ 1.322696] x5 : 0000000000000000 x4 : 0000000000000000
[ 1.327991] x3 : ffff00003b8683bc x2 : 0000000000000001
[ 1.333285] x1 : 0000000000000000 x0 : 0000000000000000
[ 1.338583] Call trace:
[ 1.343875] kthread_queue_work+0x90/0xa0
[ 1.346050] spi_start_queue+0x50/0x78
[ 1.350213] spi_register_controller+0x458/0x820
[ 1.353860] devm_spi_register_controller+0x44/0xa0
[ 1.358638] spi_qup_probe+0x5d8/0x638
[ 1.363235] platform_drv_probe+0x54/0xa8
[ 1.367053] really_probe+0xd8/0x320
[ 1.371133] driver_probe_device+0x58/0xb8
[ 1.374779] device_driver_attach+0x74/0x80
[ 1.378685] __driver_attach+0x58/0xe0
[ 1.382766] bus_for_each_dev+0x70/0xc0
[ 1.386583] driver_attach+0x24/0x30
[ 1.390317] bus_add_driver+0x14c/0x1f0
[ 1.394137] driver_register+0x64/0x120
[ 1.397696] __platform_driver_register+0x48/0x58
[ 1.401519] spi_qup_driver_init+0x1c/0x28
[ 1.406378] do_one_initcall+0x54/0x1a0
[ 1.410372] kernel_init_freeable+0x1d4/0x254
[ 1.414106] kernel_init+0x14/0x110
[ 1.418616] ret_from_fork+0x10/0x34
[ 1.421918] ---[ end trace 4b59f327623c9e10 ]---
[ 1.426526] spi_qup 78b9000.spi: IN:block:16, fifo:64, OUT:block:16, fifo:64
[ 1.430721] ------------[ cut here ]------------
[ 1.437374] WARNING: CPU: 0 PID: 1 at kernel/kthread.c:819

In spi_init_queue():

    kthread_init_worker(&ctlr->kworker);    /* worker->task = NULL */
    ctlr->kworker_task = kthread_run(kthread_worker_fn, &ctlr->kworker,
                                     "%s", dev_name(&ctlr->dev));

In spi_start_queue():

    kthread_queue_work(&ctlr->kworker, &ctlr->pump_messages);

Because kthread_worker_fn() has not started running yet at that point,
worker->task is still NULL when the work is queued, so "!worker->task" is
true and the WARN is triggered.
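
The ordering described above can be replayed with a small userspace model.
This is only a sketch, not kernel code: the model_* names are hypothetical,
and it assumes the patched queueing path rejects work and warns whenever
worker->task is NULL, as the warning location suggests.

```c
/* Userspace model only; the model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_worker {
    const void *task;   /* stands in for worker->task */
    int nr_queued;      /* work items accepted so far */
    int nr_warned;      /* times the "!task" check fired */
};

/* Mirrors kthread_init_worker(): the task pointer starts out NULL. */
static void model_init_worker(struct model_worker *w)
{
    assert(w != NULL);
    w->task = NULL;
    w->nr_queued = 0;
    w->nr_warned = 0;
}

/* Stands in for the worker thread starting to run and publishing itself. */
static void model_worker_fn_begin(struct model_worker *w, const void *self)
{
    assert(w != NULL);
    w->task = self;
}

/* Queueing with the assumed "!worker->task" check: count a warning, reject. */
static bool model_queue_work(struct model_worker *w)
{
    assert(w != NULL);
    if (!w->task) {
        w->nr_warned++;
        return false;
    }
    w->nr_queued++;
    return true;
}
```

Calling model_init_worker() and then model_queue_work() before
model_worker_fn_begin() corresponds to spi_start_queue() running before the
new thread has been scheduled: the work is rejected and a warning is counted.
Once model_worker_fn_begin() has run, the same call is accepted.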

________________________________________
From: Tejun Heo <htejun@xxxxxxxxx> on behalf of Tejun Heo <tj@xxxxxxxxxx>
Sent: July 6, 2020 22:59
To: Zhang, Qiang
Cc: ben.dooks@xxxxxxxxxxxxxxx; bfields@xxxxxxxxxx; cl@xxxxxxxxxxxxxx; peterz@xxxxxxxxxxxxx; pmladek@xxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; mm-commits@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: [PATCH v3] kthread: Work could not be queued when worker being destroyed

On Sun, Jul 05, 2020 at 09:30:18AM +0800, qiang.zhang@xxxxxxxxxxxxx wrote:
> From: Zhang Qiang <qiang.zhang@xxxxxxxxxxxxx>
>
> Before the work is put into the queue of the worker thread,
> the state of the worker thread needs to be detected, because
> the worker thread may be in the destruction state at this time.
>
> Signed-off-by: Zhang Qiang <qiang.zhang@xxxxxxxxxxxxx>
> Suggested-by: Petr Mladek <pmladek@xxxxxxxx>
> Reviewed-by: Petr Mladek <pmladek@xxxxxxxx>

Andrew already brought this up but can you please provide some context on
why you're making this change?

Thanks.

--
tejun