Re: skd: disable discard support

From: Mike Snitzer
Date: Wed Feb 12 2014 - 19:08:02 EST


On Wed, Feb 12 2014 at 5:18pm -0500,
Mike Snitzer <snitzer@xxxxxxxxxx> wrote:

> The skd driver has never handled discards reliably.
>
> The kernel will BUG as a result of issuing discards to the skd device.
> Disable the skd driver's discard support until it is proven reliable.
>
> The device-mapper-test-suite test that exposed this bug just issues a
> discard that covers a portion of the skd device that was previously
> written through a dm-thin device. The discard spans the entire 1GB thin
> device (logical sector 0 through 2097152).
>
> dmtest run --profile stec --suite thin-provisioning -n /discard_fully_provisioned_device/
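
For reference, the sector range quoted above can be checked with a little arithmetic. This is only a sketch: it assumes 512-byte logical sectors and that "1GB" means 2**30 bytes, neither of which is stated in the report. The helper name sectors_for is made up for illustration.

```python
SECTOR_SIZE = 512  # bytes; assumed logical sector size


def sectors_for(size_bytes, sector_size=SECTOR_SIZE):
    """Return the number of whole sectors that cover size_bytes."""
    if size_bytes % sector_size:
        raise ValueError("size is not sector-aligned")
    return size_bytes // sector_size


thin_dev_bytes = 1 << 30  # "1GB" taken as 2**30 bytes (assumption)
nr_sectors = sectors_for(thin_dev_bytes)
first, last = 0, nr_sectors - 1
print(nr_sectors, first, last)
```

Under those assumptions the device holds 2097152 sectors, so the range in the description reads most naturally as half-open: sectors 0 up to, but not including, 2097152.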

I retested after applying these linux-block.git commits on top of
3.14-rc1:

5cb8850c9c4a block: Explicitly handle discard/write same segments
8423ae3d7a3c block: Fix cloning of discard/write same bios

And got this:

request botched: dev skd0: type=1, flags=12248081
sector 8390784, nr/cnr 0/128
bio ffff88033169cba0, biotail ffff88032e42bb60, buffer (null), len 0
------------[ cut here ]------------
kernel BUG at block/blk-core.c:2693!
invalid opcode: 0000 [#1] SMP
Modules linked in: dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge autofs4 target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc fcoe libfcoe 8021q libfc garp stp llc scsi_transport_fc scsi_tgt sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio ses enclosure sg acpi_cpufreq dm_mod ext4 jbd2 mbcache sr_mod cdrom pata_acpi ata_generic ata_piix skd sd_mod crc_t10dif crct10dif_common megaraid_sas
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 3.14.0-rc1.snitm+ #5
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
task: ffff88033299e150 ti: ffff8803329a4000 task.ti: ffff8803329a4000
RIP: 0010:[<ffffffff81252f1a>] [<ffffffff81252f1a>] __blk_end_request_all+0x2a/0x40
RSP: 0018:ffff88033fc43cf8 EFLAGS: 00010002
RAX: 0000000000000001 RBX: ffff88032e636ac8 RCX: 0000000000000006
RDX: 0000000000000001 RSI: ffff88033169cba0 RDI: ffff88032ec755c0
RBP: ffff88033fc43cf8 R08: 0000000000000002 R09: 0000000000000000
R10: 00000000000006f3 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88033195faa8 R14: ffff8800ba396000 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88033fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003bfea13000 CR3: 000000032fbdc000 CR4: 00000000000007e0
Stack:
ffff88033fc43d58 ffffffffa0037b85 ffff88033fc43d48 ffffffff8129ca09
ffff88033fc43d28 ffff88032e636ac8 ffff8800ba396000 ffff88032e650080
ffff8800ba396000 ffff88032e650080 ffff88032e636ac8 0000000000003c17
Call Trace:
<IRQ>
[<ffffffffa0037b85>] skd_end_request+0x55/0x160 [skd]
[<ffffffff8129ca09>] ? swiotlb_unmap_sg_attrs+0x69/0x80
[<ffffffffa003c513>] skd_isr_completion_posted+0x1e3/0x5d0 [skd]
[<ffffffff810930a3>] ? __wake_up+0x53/0x70
[<ffffffffa003d1b2>] skd_isr+0x122/0x280 [skd]
[<ffffffff810a73ed>] handle_irq_event_percpu+0x6d/0x200
[<ffffffff810a75c2>] handle_irq_event+0x42/0x70
[<ffffffff810aad19>] handle_edge_irq+0x69/0x120
[<ffffffff81005aec>] handle_irq+0x5c/0x150
[<ffffffff815470f2>] ? __atomic_notifier_call_chain+0x12/0x20
[<ffffffff81547116>] ? atomic_notifier_call_chain+0x16/0x20
[<ffffffff8154d91e>] do_IRQ+0x5e/0x110
[<ffffffff8154366a>] common_interrupt+0x6a/0x6a
<EOI>
[<ffffffff8144d5e3>] ? cpuidle_enter_state+0x53/0xd0
[<ffffffff8144d5df>] ? cpuidle_enter_state+0x4f/0xd0
[<ffffffff8144d7a7>] cpuidle_idle_call+0xc7/0x160
[<ffffffff8100cf5e>] arch_cpu_idle+0xe/0x30
[<ffffffff810a696a>] cpu_idle_loop+0x9a/0x240
[<ffffffff810b9e64>] ? clockevents_register_device+0xc4/0x130
[<ffffffff810a6b33>] cpu_startup_entry+0x23/0x30
[<ffffffff81032d5a>] start_secondary+0x7a/0x80
Code: 00 55 48 89 e5 66 66 66 66 90 48 8b 87 78 01 00 00 48 85 c0 75 10 31 c9 8b 57 64 e8 91 ff ff ff 84 c0 75 07 c9 c3 8b 48 64 eb ed <0f> 0b 0f 1f 40 00 eb fa 66 66 66 66 66 2e 0f 1f 84 00 00 00 00
RIP [<ffffffff81252f1a>] __blk_end_request_all+0x2a/0x40
RSP <ffff88033fc43cf8>
---[ end trace 494de22d0f0be0f8 ]---
INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.394 msecs
INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.402 msecs
INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.405 msecs
INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.410 msecs
INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.414 msecs
INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.417 msecs
INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.421 msecs
INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.424 msecs
INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.428 msecs
INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.431 msecs
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
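
As an aid to reading the "request botched" lines at the top of the log, here is a minimal sketch that pulls out the fields and converts the sector counts to bytes. It rests on assumptions not stated in the log itself: 512-byte sectors, that the flags value is printed in hexadecimal, and that nr/cnr are the sectors left in the whole request versus the current segment (which is how blk_dump_rq_flags() appears to print them). The function name parse_botched and the dictionary keys are hypothetical.

```python
import re

# The two relevant lines, copied from the log above.
LOG = """request botched: dev skd0: type=1, flags=12248081
sector 8390784, nr/cnr 0/128
"""

SECTOR_SIZE = 512  # bytes; assumed logical sector size


def parse_botched(log):
    """Extract device, flags and sector counts from a 'request botched' dump."""
    head = re.search(
        r"request botched: dev ([^:\s]+): type=([0-9a-f]+), flags=([0-9a-f]+)", log
    )
    pos = re.search(r"sector (\d+), nr/cnr (\d+)/(\d+)", log)
    if head is None or pos is None:
        raise ValueError("log does not contain a 'request botched' dump")
    sector, nr, cnr = (int(x) for x in pos.groups())
    return {
        "dev": head.group(1),
        "type": int(head.group(2), 16),   # assumed hex
        "flags": int(head.group(3), 16),  # assumed hex
        "sector": sector,
        "nr_sectors": nr,                 # sectors left in the whole request
        "cur_sectors": cnr,               # sectors in the current segment
        "byte_offset": sector * SECTOR_SIZE,
        "cur_bytes": cnr * SECTOR_SIZE,
    }


req = parse_botched(LOG)
for key, value in req.items():
    print(f"{key}: {value}")
```

Read this way, the request reports zero sectors in total while its current segment still claims 128 sectors (64 KiB), and len is 0 in the bio line. That mismatch is what "botched" flags, and it is consistent with the completion path then hitting the BUG in __blk_end_request_all().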