Re: Observing Softlockup's while running heavy IOs

From: Sreekanth Reddy
Date: Fri Aug 19 2016 - 08:10:41 EST


First of all thanks for Robert and Bart for reply.

Robert,
Thanks for the URL, I have gone though this URL.
Yes I am also observing that all the interrupts are routed to one CPU.
But still I observing softlockups (sometime hardlockups) even when I
set rq_affinity to 2.

Is their any way to route the interrupts the same CPUs which has
submitted the corresponding IOs?
or
Is their any way/option in the irqbalance/kernel which can route
interrupts to CPUs (enabled in affinity_hint) in round robin manner
after specific time period.


Bart,
I have tried with your path and here is logs,

[Aug19 13:48] __blk_mq_run_hw_queue() finished after 1 ms
[ +1.196454] __blk_mq_run_hw_queue() finished after 11 ms
[Aug19 13:49] __blk_mq_run_hw_queue() finished after 20 ms
[ +14.173018] __blk_mq_run_hw_queue() finished after 278 ms
[ +14.066448] __blk_mq_run_hw_queue() finished after 588 ms
[ +5.394698] __blk_mq_run_hw_queue() finished after 1360 ms
[Aug19 13:51] __blk_mq_run_hw_queue() finished after 1539 ms
[Aug19 13:54] __blk_mq_run_hw_queue() finished after 1762 ms
[Aug19 13:55] __blk_mq_run_hw_queue() finished after 2087 ms
[Aug19 13:57] __blk_mq_run_hw_queue() finished after 2915 ms
[Aug19 14:06] perf interrupt took too long (2509 > 2500), lowering
kernel.perf_event_max_sample_rate to 50000
[Aug19 14:10] __blk_mq_run_hw_queue() finished after 3266 ms
[Aug19 14:15] __blk_mq_run_hw_queue() finished after 3953 ms
[Aug19 14:22] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s!
[llvmpipe-9:3152]
[ +0.000439] __blk_mq_run_hw_queue() finished after 10058 ms
[ +0.007206] Modules linked in: mpt3sas(OE) xt_CHECKSUM
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
nf_conntrack tun bridge stp llc ebtable_filter ebtables
ip6table_filter ip6_tables raid456 async_raid6_recov async_memcpy
async_pq async_xor xor intel_rapl async_tx iosf_mbi
x86_pkg_temp_thermal raid6_pq coretemp kvm crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel mei_me joydev pcspkr sb_edac i2c_i801
mei iTCO_wdt iTCO_vendor_support edac_core lpc_ich ipmi_ssif ipmi_si
ipmi_msghandler shpchp tpm_tis ioatdma acpi_power_meter tpm wmi
acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c ast
i2c_algo_bit drm_kms_helper ttm ixgbe drm mdio vxlan ip6_udp_tunnel
udp_tunnel crc32c_intel raid_class ptp scsi_transport_sas
[ +0.000063] pps_core nvme dca [last unloaded: mpt3sas]
[ +0.000007] CPU: 6 PID: 3152 Comm: llvmpipe-9 Tainted: G W
OE 4.2.0 #1
[ +0.000002] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+,
BIOS 1.1 07/22/2015
[ +0.000003] task: ffff883f5cf557c0 ti: ffff883f5afd8000 task.ti:
ffff883f5afd8000
[ +0.000002] RIP: 0010:[<ffffffff810a159b>] [<ffffffff810a159b>]
__do_softirq+0x7b/0x290
[ +0.000008] RSP: 0000:ffff883f7f183f08 EFLAGS: 00000206
[ +0.000002] RAX: ffff883f5afdc000 RBX: ffff883f7f190080 RCX: 00000000000006e0
[ +0.000002] RDX: 0000000000003508 RSI: 0000000071c139c0 RDI: ffff883f5cf557c0
[ +0.000002] RBP: ffff883f7f183f68 R08: 00003508717f8da4 R09: ffff883f7f183d80
[ +0.000002] R10: 0000000000000000 R11: 0000000000000004 R12: ffff883f7f183e78
[ +0.000001] R13: ffffffff8177304b R14: ffff883f7f183f68 R15: 0000000000000000
[ +0.000003] FS: 00007fa76b7f6700(0000) GS:ffff883f7f180000(0000)
knlGS:0000000000000000

Message from syslogd@dhcp-135-24-192-127 at Aug 19 14:22:42 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [llvmpipe-9:3152]
[ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000002] CR2: 00007fa03a2d90c0 CR3: 0000003f61fe4000 CR4: 00000000001406e0
[ +0.000002] Stack:
[ +0.000001] ffff883f7f18fc40 404040407f18fce8 ffff883f5afdc000
0000000100aecc1a
[ +0.000004] 00000ab10000000a 000002027f18fc58 ffff883f7f183f48
0000000000000000
[ +0.000003] ffff883f7f1967c0 ffff883f448d57c0 0000000000000000
0000000000000001
[ +0.000003] Call Trace:
[ +0.000002] <IRQ>
[ +0.000005] [<ffffffff810a19c6>] irq_exit+0x116/0x120
[ +0.000005] [<ffffffff81774eb6>] smp_apic_timer_interrupt+0x46/0x60
[ +0.000005] [<ffffffff8177304b>] apic_timer_interrupt+0x6b/0x70
[ +0.000002] <EOI>
[ +0.000005] [<ffffffff810c392b>] ? finish_task_switch+0x6b/0x200
[ +0.000006] [<ffffffff8176dabc>] __schedule+0x36c/0x950
[ +0.000002] [<ffffffff8176e0d7>] schedule+0x37/0x80
[ +0.000006] [<ffffffff81116a1c>] futex_wait_queue_me+0xbc/0x120
[ +0.000004] [<ffffffff81116da9>] futex_wait+0x119/0x270
[ +0.000004] [<ffffffff81116800>] ? futex_wake+0x90/0x180
[ +0.000003] [<ffffffff81118d6b>] do_futex+0x12b/0xb00
[ +0.000005] [<ffffffff810d348e>] ? set_next_entity+0x23e/0x440
[ +0.000007] [<ffffffff810136f1>] ? __switch_to+0x261/0x4b0
[ +0.000004] [<ffffffff811197c1>] SyS_futex+0x81/0x180
[ +0.000002] [<ffffffff8176e0d7>] ? schedule+0x37/0x80
[ +0.000004] [<ffffffff817721ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[ +0.000001] Code: 7e 00 01 00 00 65 48 8b 04 25 c4 3c 01 00 c7 45 c0
0a 00 00 00 48 89 45 b0 65 c7 05 6c 26 f7 7e 00 00 00 00 fb 66 0f 1f
44 00 00 <b8> ff ff ff ff 49 c7 c4 c0 a0 c0 81 0f bc 45 cc 83 c0 01 89
45

and here is the 'sar -u ALL 1 -P ALL 1'
02:22:43 PM CPU %usr %nice %sys %iowait %steal
%irq %soft %guest %gnice %idle
02:22:44 PM all 5.73 0.60 29.31 14.90 0.00
0.00 1.57 0.00 0.00 47.89
02:22:44 PM 0 8.82 0.00 48.53 38.24 0.00
0.00 0.00 0.00 0.00 4.41
02:22:44 PM 1 10.29 0.00 63.24 26.47 0.00
0.00 0.00 0.00 0.00 0.00
02:22:44 PM 2 8.96 0.00 65.67 25.37 0.00
0.00 0.00 0.00 0.00 0.00
02:22:44 PM 3 11.94 0.00 61.19 26.87 0.00
0.00 0.00 0.00 0.00 0.00
02:22:44 PM 4 8.96 0.00 64.18 26.87 0.00
0.00 0.00 0.00 0.00 0.00
02:22:44 PM 5 7.46 0.00 67.16 25.37 0.00
0.00 0.00 0.00 0.00 0.00
02:22:44 PM 6 0.00 0.00 0.00 0.00 0.00
0.00 100.00 0.00 0.00 0.00
02:22:44 PM 7 1.33 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 98.67
02:22:44 PM 8 5.19 0.00 1.30 0.00 0.00
0.00 0.00 0.00 0.00 93.51
02:22:44 PM 9 2.67 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 97.33
02:22:44 PM 10 2.67 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 97.33
02:22:44 PM 11 3.95 0.00 1.32 1.32 0.00
0.00 0.00 0.00 0.00 93.42
02:22:44 PM 12 11.11 0.00 83.33 5.56 0.00
0.00 0.00 0.00 0.00 0.00
02:22:44 PM 13 8.70 0.00 52.17 36.23 0.00
0.00 0.00 0.00 0.00 2.90
02:22:44 PM 14 6.15 0.00 53.85 38.46 0.00
0.00 0.00 0.00 0.00 1.54
02:22:44 PM 15 5.97 0.00 55.22 37.31 0.00
0.00 0.00 0.00 0.00 1.49
02:22:44 PM 16 8.70 0.00 52.17 36.23 0.00
0.00 0.00 0.00 0.00 2.90
02:22:44 PM 17 7.35 0.00 55.88 35.29 0.00
0.00 0.00 0.00 0.00 1.47
02:22:44 PM 18 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 100.00
02:22:44 PM 19 2.63 0.00 1.32 2.63 0.00
0.00 0.00 0.00 0.00 93.42
02:22:44 PM 20 2.67 0.00 1.33 0.00 0.00
0.00 0.00 0.00 0.00 96.00
02:22:44 PM 21 7.89 5.26 2.63 0.00 0.00
0.00 0.00 0.00 0.00 84.21
02:22:44 PM 22 1.32 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 98.68
02:22:44 PM 23 4.05 8.11 2.70 4.05 0.00
0.00 0.00 0.00 0.00 81.08

Still I am continuing my investigation on this.

Note:
I am taking vacation next week, please expect some delay for response.

Thanks,
Sreekanth

On Fri, Aug 19, 2016 at 2:38 AM, Elliott, Robert (Persistent Memory)
<elliott@xxxxxxx> wrote:
>
>
>> -----Original Message-----
>> From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-
>> owner@xxxxxxxxxxxxxxx] On Behalf Of Sreekanth Reddy
>> Sent: Thursday, August 18, 2016 12:56 AM
>> Subject: Observing Softlockup's while running heavy IOs
>>
>> Problem statement:
>> Observing softlockups while running heavy IOs on 8 SSD drives
>> connected behind our LSI SAS 3004 HBA.
>>
> ...
>> Observing a loop in the IO path, i.e only one CPU is busy with
>> processing the interrupts and other CPUs (in the affinity_hint mask)
>> are busy with sending the IOs (these CPUs are not yet all receiving
>> any interrupts). For example, only CPU6 is busy with processing the
>> interrupts from IRQ 219 and remaining CPUs i.e CPU 7,8,9,10 & 11 are
>> just busy with pumping the IOs and they never processed any IO
>> interrupts from IRQ 219. So we are observing softlockups due to
>> existence this loop in the IO Path.
>>
>> We may not observe these softlockups if irqbalancer might have
>> balanced the interrupts among the CPUs enabled in the particular
>> irq's
>> affinity_hint mask. so that all the CPUs are equaly busy with send
>> IOs
>> and processing the interrupts. I am not sure how irqbalancer balance
>> the load among the CPUs, but here I see only one CPU from irq's
>> affinity_hint mask is busy with interrupts and remaining CPUs won't
>> receive any interrupts from this IRQ.
>>
>> Please help me with any suggestions/recomendations to slove/limit
>> these kind of softlockups. Also please let me known if I have missed
>> any setting in the irqbalance.
>>
>
> The CPUs need to be forced to self-throttle by processing interrupts for
> their own submissions, which reduces the time they can submit more IOs.
>
> See https://lkml.org/lkml/2014/9/9/931 for discussion of this
> problem when blk-mq was added.
>
>
> ---
> Robert Elliott, HPE Persistent Memory
>
>
>