xHCI corruption after double URB submit

From: Laura Abbott
Date: Tue Jul 28 2015 - 18:40:20 EST


Hi,

While debugging an issue with another driver (xpad), I've hit some list
corruption in xHCI. I'm not sure whether the corruption is directly caused by
the first warning below or whether the warning is exposing an issue with the
driver. The issue I was actually trying to debug was a URB double submit:

------------[ cut here ]------------
WARNING: CPU: 3 PID: 3563 at drivers/usb/core/urb.c:339 usb_submit_urb+0x2ad/0x5a0()
URB ffff8804078ac240 submitted while active
Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun
nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge ebtable_filter
ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_security iptable_raw bnep xpad snd_hda_codec_realtek
snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec
snd_hda_core snd_hwdep snd_seq intel_rapl iosf_mbi snd_seq_device
x86_pkg_temp_thermal coretemp snd_pcm kvm uvcvideo iTCO_wdt iTCO_vendor_support
iwlwifi videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common btusb videodev btrtl btbcm thinkpad_acpi snd_timer btintel mei_me
rtsx_pci_ms cfg80211 bluetooth pcspkr mei media memstick joydev snd tpm_tis
shpchp ie31200_edac i2c_i801 tpm lpc_ich edac_core nfsd rfkill wmi soundcore
auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc dm_crypt i915 8021q garp stp
llc mrp i2c_algo_bit drm_kms_helper drm rtsx_pci_sdmmc mmc_core e1000e
crct10dif_pclmul crc32_pclmul crc32c_intel rtsx_pci ghash_clmulni_intel ptp
serio_raw pps_core mfd_core video
CPU: 3 PID: 3563 Comm: led_test.sh Not tainted 4.2.0-rc4-xpad+ #14
Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS GMET62WW (2.10 ) 03/19/2014
0000000000000000 0000000017a45bc6 ffff8800c9a0fbd8 ffffffff81758a11
0000000000000000 ffff8800c9a0fc30 ffff8800c9a0fc18 ffffffff8109b656
0000000000000002 ffff8804078ac240 00000000000000d0 ffff8800c9806d60
Call Trace:
[<ffffffff81758a11>] dump_stack+0x45/0x57
[<ffffffff8109b656>] warn_slowpath_common+0x86/0xc0
[<ffffffff8109b6e5>] warn_slowpath_fmt+0x55/0x70
[<ffffffff8120f218>] ? do_truncate+0x88/0xc0
[<ffffffff815427fd>] usb_submit_urb+0x2ad/0x5a0
[<ffffffff81230df4>] ? mntput+0x24/0x40
[<ffffffff8121b667>] ? terminate_walk+0xc7/0xe0
[<ffffffffa0430877>] xpad_send_led_command+0xc7/0x110 [xpad]
[<ffffffffa04308d5>] xpad_led_set+0x15/0x20 [xpad]
[<ffffffff815f9678>] led_set_brightness+0x88/0xc0
[<ffffffff815f9b0e>] brightness_store+0x7e/0xc0
[<ffffffff814b7478>] dev_attr_store+0x18/0x30
[<ffffffff8128bba7>] sysfs_kf_write+0x37/0x40
[<ffffffff8128b15d>] kernfs_fop_write+0x11d/0x170
[<ffffffff81210d17>] __vfs_write+0x37/0x100
[<ffffffff81213b28>] ? __sb_start_write+0x58/0x110
[<ffffffff813124dd>] ? security_file_permission+0x3d/0xc0
[<ffffffff81211696>] vfs_write+0xa6/0x1a0
[<ffffffff8120e93a>] ? filp_close+0x5a/0x80
[<ffffffff81212385>] SyS_write+0x55/0xc0
[<ffffffff8175f0ae>] entry_SYSCALL_64_fastpath+0x12/0x71
---[ end trace f573b768c94a66d6 ]---

I've found several places where a double submit can happen, so I mostly have
a handle on that. Shortly after the double submit, though, I see corruption:

------------[ cut here ]------------
WARNING: CPU: 3 PID: 0 at lib/list_debug.c:36 __list_add+0xb4/0xc0()
list_add double add: new=ffff8804078ac260, prev=ffff8804078ac260, next=ffff88040456c358.
Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun
nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge ebtable_filter
ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_security iptable_raw bnep xpad snd_hda_codec_realtek
snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec
snd_hda_core snd_hwdep snd_seq intel_rapl iosf_mbi snd_seq_device
x86_pkg_temp_thermal coretemp snd_pcm kvm uvcvideo iTCO_wdt iTCO_vendor_support
iwlwifi videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common btusb videodev btrtl btbcm thinkpad_acpi snd_timer btintel mei_me
rtsx_pci_ms cfg80211 bluetooth pcspkr mei media memstick joydev snd tpm_tis
shpchp ie31200_edac i2c_i801 tpm lpc_ich edac_core nfsd rfkill wmi soundcore
auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc dm_crypt i915 8021q garp stp
llc mrp i2c_algo_bit drm_kms_helper drm rtsx_pci_sdmmc mmc_core e1000e
crct10dif_pclmul crc32_pclmul crc32c_intel rtsx_pci ghash_clmulni_intel ptp
serio_raw pps_core mfd_core video
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W 4.2.0-rc4-xpad+ #14
Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS GMET62WW (2.10 ) 03/19/2014
0000000000000000 b173c41eaf0978cf ffff88041e2c3918 ffffffff81758a11
0000000000000000 ffff88041e2c3970 ffff88041e2c3958 ffffffff8109b656
0000000000000000 ffff8804078ac260 ffff8804078ac260 ffff88040456c358
Call Trace:
<IRQ> [<ffffffff81758a11>] dump_stack+0x45/0x57
[<ffffffff8109b656>] warn_slowpath_common+0x86/0xc0
[<ffffffff8109b6e5>] warn_slowpath_fmt+0x55/0x70
[<ffffffff813a75b4>] __list_add+0xb4/0xc0
[<ffffffff8153eb64>] usb_hcd_link_urb_to_ep+0x74/0x90
[<ffffffff8157cbe8>] prepare_transfer+0xa8/0x120
[<ffffffff8157dff3>] xhci_queue_bulk_tx+0xb3/0x750
[<ffffffff8148f820>] ? mix_pool_bytes+0x50/0x90
[<ffffffff8157e746>] xhci_queue_intr_tx+0xb6/0x150
[<ffffffff811f3fe0>] ? __kmalloc+0x200/0x260
[<ffffffff81575fa9>] xhci_urb_enqueue+0x4d9/0x6a0
[<ffffffff81540c66>] usb_hcd_submit_urb+0xa6/0xac0
[<ffffffff810d4927>] ? find_busiest_group+0x47/0x4e0
[<ffffffff8154294c>] usb_submit_urb+0x3fc/0x5a0
[<ffffffffa043042e>] xpad_irq_out+0x7e/0xc0 [xpad]
[<ffffffff8153f515>] __usb_hcd_giveback_urb+0x85/0x130
[<ffffffff8153f6ef>] usb_hcd_giveback_urb+0x3f/0xe0
[<ffffffff8158121f>] xhci_irq+0xd9f/0x20f0
[<ffffffff81582581>] xhci_msi_irq+0x11/0x20
[<ffffffff810f0c14>] handle_irq_event_percpu+0x74/0x180
[<ffffffff810f0d50>] handle_irq_event+0x30/0x60
[<ffffffff810f3f1f>] handle_edge_irq+0x6f/0x130
[<ffffffff81016ed2>] handle_irq+0x72/0x120
[<ffffffff810ba0ca>] ? atomic_notifier_call_chain+0x1a/0x20
[<ffffffff81761cdf>] do_IRQ+0x4f/0xe0
[<ffffffff8175fbeb>] common_interrupt+0x6b/0x6b
<EOI> [<ffffffff81102f5f>] ? hrtimer_start_range_ns+0x1bf/0x3b0
[<ffffffff815f7210>] ? cpuidle_enter_state+0x130/0x270
[<ffffffff815f71eb>] ? cpuidle_enter_state+0x10b/0x270
[<ffffffff815f7387>] cpuidle_enter+0x17/0x20
[<ffffffff810dc542>] call_cpuidle+0x32/0x60
[<ffffffff815f7363>] ? cpuidle_select+0x13/0x20
[<ffffffff810dc7d8>] cpu_startup_entry+0x268/0x320
[<ffffffff8104cdd3>] start_secondary+0x183/0x1c0
---[ end trace f573b768c94a66d7 ]---
------------[ cut here ]------------
WARNING: CPU: 3 PID: 0 at lib/list_debug.c:33 __list_add+0x91/0xc0()
list_add corruption. prev->next should be next (ffff8800350e3240), but was ffff8804064fca20. (prev=ffff8804064fca20).
Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun
nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge ebtable_filter
ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_security iptable_raw bnep xpad snd_hda_codec_realtek
snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec
snd_hda_core snd_hwdep snd_seq intel_rapl iosf_mbi snd_seq_device
x86_pkg_temp_thermal coretemp snd_pcm kvm uvcvideo iTCO_wdt iTCO_vendor_support
iwlwifi videobuf2_vmalloc videobuf2_core videobuf2_memops
v4l2_common btusb videodev btrtl btbcm thinkpad_acpi snd_timer btintel mei_me
rtsx_pci_ms cfg80211 bluetooth pcspkr mei media memstick joydev snd tpm_tis
shpchp ie31200_edac i2c_i801 tpm lpc_ich edac_core nfsd rfkill wmi soundcore
auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc dm_crypt i915 8021q garp stp
llc mrp i2c_algo_bit drm_kms_helper drm rtsx_pci_sdmmc mmc_core e1000e
crct10dif_pclmul crc32_pclmul crc32c_intel rtsx_pci ghash_clmulni_intel ptp
serio_raw pps_core mfd_core video
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W 4.2.0-rc4-xpad+ #14
Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS GMET62WW (2.10 ) 03/19/2014
0000000000000000 b173c41eaf0978cf ffff88041e2c3938 ffffffff81758a11
0000000000000000 ffff88041e2c3990 ffff88041e2c3978 ffffffff8109b656
ffff880407109bb0 ffff8804064fca20 ffff8804064fca20 ffff8800350e3240
Call Trace:
<IRQ> [<ffffffff81758a11>] dump_stack+0x45/0x57
[<ffffffff8109b656>] warn_slowpath_common+0x86/0xc0
[<ffffffff8109b6e5>] warn_slowpath_fmt+0x55/0x70
[<ffffffff813a7591>] __list_add+0x91/0xc0
[<ffffffff8157cc02>] prepare_transfer+0xc2/0x120
[<ffffffff8157dff3>] xhci_queue_bulk_tx+0xb3/0x750
[<ffffffff8148f820>] ? mix_pool_bytes+0x50/0x90
[<ffffffff8157e746>] xhci_queue_intr_tx+0xb6/0x150
[<ffffffff811f3fe0>] ? __kmalloc+0x200/0x260
[<ffffffff81575fa9>] xhci_urb_enqueue+0x4d9/0x6a0
[<ffffffff81540c66>] usb_hcd_submit_urb+0xa6/0xac0
[<ffffffff810d4927>] ? find_busiest_group+0x47/0x4e0
[<ffffffff8154294c>] usb_submit_urb+0x3fc/0x5a0
[<ffffffffa043042e>] xpad_irq_out+0x7e/0xc0 [xpad]
[<ffffffff8153f515>] __usb_hcd_giveback_urb+0x85/0x130
[<ffffffff8153f6ef>] usb_hcd_giveback_urb+0x3f/0xe0
[<ffffffff8158121f>] xhci_irq+0xd9f/0x20f0
[<ffffffff81582581>] xhci_msi_irq+0x11/0x20
[<ffffffff810f0c14>] handle_irq_event_percpu+0x74/0x180
[<ffffffff810f0d50>] handle_irq_event+0x30/0x60
[<ffffffff810f3f1f>] handle_edge_irq+0x6f/0x130
[<ffffffff81016ed2>] handle_irq+0x72/0x120
[<ffffffff810ba0ca>] ? atomic_notifier_call_chain+0x1a/0x20
[<ffffffff81761cdf>] do_IRQ+0x4f/0xe0
[<ffffffff8175fbeb>] common_interrupt+0x6b/0x6b
<EOI> [<ffffffff81102f5f>] ? hrtimer_start_range_ns+0x1bf/0x3b0
[<ffffffff815f7210>] ? cpuidle_enter_state+0x130/0x270
[<ffffffff815f71eb>] ? cpuidle_enter_state+0x10b/0x270
[<ffffffff815f7387>] cpuidle_enter+0x17/0x20
[<ffffffff810dc542>] call_cpuidle+0x32/0x60
[<ffffffff815f7363>] ? cpuidle_select+0x13/0x20
[<ffffffff810dc7d8>] cpu_startup_entry+0x268/0x320
[<ffffffff8104cdd3>] start_secondary+0x183/0x1c0
---[ end trace f573b768c94a66d8 ]---

which will repeat itself until the kernel GPFs or crashes somewhere
else. The test case for reproducing this sends commands to the USB
driver very rapidly, so the double submit happens fairly frequently.
Is this a case where the driver is questionable enough that all bets
are off, or is this an issue in the xHCI layer? I've never seen any
other double submit reports indicating this type of corruption.

Thanks,
Laura