Re: ata: BUG in ata_sff_hsm_move

From: Dmitry Vyukov
Date: Fri Jan 29 2016 - 08:41:03 EST


On Fri, Jan 29, 2016 at 2:18 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Fri, Jan 29, 2016 at 1:23 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
>> Hello, Dmitry.
>>
>> On Fri, Jan 29, 2016 at 12:59:49PM +0100, Dmitry Vyukov wrote:
>>> > Hmmm... the port interrupt handler checks for IDLE before calling into
>>> > hsm_move, so the only explanation would be that something is resetting
>>> > it to IDLE inbetween. ce7514526742 ("libata: prevent HSM state change
>>> > race between ISR and PIO") describes and fixes the same problem. The
>>> > fix seems correct and I can't find anywhere else where this can
>>> > happen. :(
>>> >
>>> > Can you please post the kernel log leading to the BUG? Also, I don't
>>> > think that condition needs to be BUG. I'll change it to WARN.
>>>
>>> Here are two logs, in both cases no kernel messages prior to the bug:
>>> https://gist.githubusercontent.com/dvyukov/5087d633e3620280b6c7/raw/31c9ab1ced92ac5f85cfb15eaf48ec5793c2c3a1/gistfile1.txt
>>> https://gist.githubusercontent.com/dvyukov/825b2e3d5fb80ae08a9a/raw/03c5a4f4c4bd9d0a304a71cda2da4c92f4b7f1ba/gistfile1.txt
>>
>> lol, this is kinda embarrassing. It looks like the poll path wasn't
>> doing any locking. Can you please verify the following patch at least
>> doesn't crash the machine immediately and if so keep it applied to the
>> test kernel so that we can verify that the problem actually goes away?
>
>
> Great that you managed to debug it without a repro!
> I've applied this patch to my tree and will rerun fuzzer. I will
> notify you if I see this warning again.
> Thanks

It now deadlocks at this stack. It is probably not OK to call
ata_sff_hsm_move holding the spinlock.

[ 8.168809] NMI backtrace for cpu 2
[ 8.168809] CPU: 2 PID: 17 Comm: kworker/2:0 Not tainted 4.5.0-rc1+ #302
[ 8.168809] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[ 8.168809] Workqueue: ata_sff ata_sff_pio_task
[ 8.168809] task: ffff88006d24c740 ti: ffff88006d090000 task.ti:
ffff88006d090000
[ 8.168809] RIP: 0010:[<ffffffff82c0e388>] [<ffffffff82c0e388>]
delay_tsc+0x18/0x70
[ 8.168809] RSP: 0000:ffff88006d097a60 EFLAGS: 00000086
[ 8.168809] RAX: 000000008edaeede RBX: ffff880036e88028 RCX: 000000000000001e
[ 8.168809] RDX: 0000000000000033 RSI: 000000338edaee9c RDI: 0000000000000001
[ 8.168809] RBP: ffff88006d097a60 R08: 0000000000000002 R09: 0000000000000000
[ 8.168809] R10: fffffbfff11baf82 R11: 1ffffffff134edc5 R12: ffff880036e88038
[ 8.168809] R13: 000000009a9d2d40 R14: ffff880036e88030 R15: 0000000099383d98
[ 8.168809] FS: 0000000000000000(0000) GS:ffff88006d600000(0000)
knlGS:0000000000000000
[ 8.168809] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 8.168809] CR2: 00000000ffffffff CR3: 0000000007a4b000 CR4: 00000000000006e0
[ 8.168809] Stack:
[ 8.168809] ffff88006d097a70 ffffffff82c0e2da ffff88006d097aa8
ffffffff814661a9
[ 8.168809] ffff880036e88028 0000000000000082 ffff880036e90008
ffff880036e90010
[ 8.168809] 0000000000000001 ffff88006d097ad0 ffffffff86652c47
ffffffff83a80c4c
[ 8.168809] Call Trace:
[ 8.168809] [<ffffffff82c0e2da>] __delay+0xa/0x10
[ 8.168809] [<ffffffff814661a9>] do_raw_spin_lock+0x149/0x2b0
[ 8.168809] [<ffffffff86652c47>] _raw_spin_lock_irqsave+0xa7/0xd0
[ 8.168809] [<ffffffff83a80c4c>] ? ata_hsm_qc_complete+0x12c/0x420
[ 8.168809] [<ffffffff83a80c4c>] ata_hsm_qc_complete+0x12c/0x420
[ 8.168809] [<ffffffff83a82f42>] ata_sff_hsm_move+0x1e2/0x1c60
[ 8.168809] [<ffffffff83a85f17>] ? ata_sff_pio_task+0x87/0x530
[ 8.168809] [<ffffffff83a86296>] ata_sff_pio_task+0x406/0x530
[ 8.168809] [<ffffffff813a2386>] process_one_work+0x796/0x1440
[ 8.168809] [<ffffffff813a22ba>] ? process_one_work+0x6ca/0x1440
[ 8.168809] [<ffffffff813d6190>] ? finish_task_switch+0x120/0x5f0
[ 8.168809] [<ffffffff813a1bf0>] ? pwq_dec_nr_in_flight+0x2e0/0x2e0
[ 8.168809] [<ffffffff813a310b>] worker_thread+0xdb/0xfc0
[ 8.168809] [<ffffffff813b637f>] kthread+0x23f/0x2d0
[ 8.168809] [<ffffffff813a3030>] ? process_one_work+0x1440/0x1440
[ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0
[ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0
[ 8.168809] [<ffffffff8665362f>] ret_from_fork+0x3f/0x70
[ 8.168809] [<ffffffff813b6140>] ? kthread_create_on_node+0x3b0/0x3b0




>> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
>> index 608677d..6991efc 100644
>> --- a/drivers/ata/libata-sff.c
>> +++ b/drivers/ata/libata-sff.c
>> @@ -1362,12 +1362,14 @@ static void ata_sff_pio_task(struct work_struct *work)
>> u8 status;
>> int poll_next;
>>
>> + spin_lock_irq(ap->lock);
>> +
>> BUG_ON(ap->sff_pio_task_link == NULL);
>> /* qc can be NULL if timeout occurred */
>> qc = ata_qc_from_tag(ap, link->active_tag);
>> if (!qc) {
>> ap->sff_pio_task_link = NULL;
>> - return;
>> + goto out_unlock;
>> }
>>
>> fsm_start:
>> @@ -1382,11 +1384,14 @@ static void ata_sff_pio_task(struct work_struct *work)
>> */
>> status = ata_sff_busy_wait(ap, ATA_BUSY, 5);
>> if (status & ATA_BUSY) {
>> + spin_unlock_irq(ap->lock);
>> ata_msleep(ap, 2);
>> + spin_lock_irq(ap->lock);
>> +
>> status = ata_sff_busy_wait(ap, ATA_BUSY, 10);
>> if (status & ATA_BUSY) {
>> ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE);
>> - return;
>> + goto out_unlock;
>> }
>> }
>>
>> @@ -1403,6 +1408,8 @@ static void ata_sff_pio_task(struct work_struct *work)
>> */
>> if (poll_next)
>> goto fsm_start;
>> +out_unlock:
>> + spin_unlock_irq(ap->lock);
>> }
>>
>> /**
>>
>> --
>> You received this message because you are subscribed to the Google Groups "syzkaller" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@xxxxxxxxxxxxxxxxx
>> For more options, visit https://groups.google.com/d/optout.