Question on threaded handlers for managed interrupts

From: John Garry
Date: Thu Apr 22 2021 - 12:13:43 EST


Hi Thomas,

I am finding that I can pretty easily trigger a system hang in certain scenarios with my storage controller.

So I'm getting something like this when running a moderately heavy data-throughput workload:

Starting 6 processes
[70.656622] sched: RT throttling activated
[ 207.632161] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 207.638261] rcu: 0-...!: (1 GPs behind) idle=312/1/0x4000000000000000 softirq=508/512 fqs=0
[ 207.646777] rcu: 1-...!: (1 GPs behind) idle=694/0/0x0

It ends pretty badly - see [0].

The multi-queue storage controller (see [1] for a refresher; note that I can also trigger this on a PCI host controller) uses managed interrupts and threaded handlers. Since the threaded handler runs as SCHED_FIFO and, with managed interrupts, follows its vector's fixed CPU affinity, aren't we always vulnerable to this kind of CPU starvation with the managed interrupt and threaded handler combination? Would the advice be to simply use irq polling here?
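
To make the question concrete, here is a toy userspace model (not kernel code; run_handler(), DRAIN_ALL and the queue model are all hypothetical). It contrasts a threaded handler that drains its completion queue until it is empty with an irq_poll-style poller that stops after a fixed budget (in the real irq_poll library the budget comes from the weight passed to irq_poll_init()). It deliberately ignores RT throttling and preemption, so it only shows why an unbudgeted drain loop has no natural stopping point under sustained load.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical: a "budget" so large that only an empty queue stops the
 * loop, i.e. the drain-until-empty behaviour of a threaded handler. */
#define DRAIN_ALL LONG_MAX

/*
 * Toy model, not kernel code. `pending` completions are queued when the
 * handler starts; while it runs, the device posts one new completion for
 * every `refill` completions handled (refill == 0: no new arrivals).
 * The handler keeps the CPU until the queue is empty or it has handled
 * `budget` completions. `hard_cap` only exists so the model terminates.
 * RT throttling and preemption are ignored.
 *
 * Returns the number of completions handled in this one uninterrupted run.
 */
static long run_handler(long pending, long refill, long budget, long hard_cap)
{
    long handled = 0;

    assert(pending >= 0 && refill >= 0 && budget > 0 && hard_cap > 0);

    while (pending > 0 && handled < budget && handled < hard_cap) {
        pending--;                       /* reap one completion */
        handled++;
        if (refill > 0 && handled % refill == 0)
            pending++;                   /* device tops the queue back up */
    }
    return handled;
}
```

Under sustained load (one new completion per completion handled), run_handler(100, 1, DRAIN_ALL, 1000000) only stops at the model's hard cap, while run_handler(100, 1, 32, 1000000) hands the CPU back after 32 completions. With no new arrivals, run_handler(100, 0, DRAIN_ALL, 1000000) returns 100: the drain loop terminates once the queue is empty.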

I tried unsuccessfully to trigger the same on NVMe PCI; however, I have only one card, so I am hardly overloading the system.

Thanks,
John

[0] https://lore.kernel.org/rcu/412926e8-d3e1-3071-8cb9-098a7f49b64c@xxxxxxxxxx/T/#mbd60463c543e04f87090d89301e1a5f10de958dd

[1] https://lore.kernel.org/linux-scsi/1606905417-183214-1-git-send-email-john.garry@xxxxxxxxxx/#t