Re: [PATCH v2] x86/apic/vector: Move pr_warn() out of vector_lock

From: Waiman Long
Date: Mon Mar 29 2021 - 15:58:32 EST


On 3/29/21 8:42 AM, Thomas Gleixner wrote:
Waiman,

On Sun, Mar 28 2021 at 20:52, Waiman Long wrote:
It was found that the following circular locking dependency warning
could happen in some systems:

[ 218.097878] ======================================================
[ 218.097879] WARNING: possible circular locking dependency detected
[ 218.097880] 4.18.0-228.el8.x86_64+debug #1 Not tainted
Reports have to be against latest mainline and not against the random
distro frankenkernel of the day. That's nothing new.

Plus I was asking you to provide a full splat to look at so this can be
discussed _upfront_. Oh well...

That was the full splat that I can see except the following trailing data:

[  218.098064] RIP: 0033:0x7ff4ee7620d6
[  218.098066] Code: 89 54 24 08 e8 9b f4 ff ff 8b 74 24 0c 48 8b 3c 24 41 89 c0 44 8b 54 24 08 b8 01 01 00 00 89 f2 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 30 44 89 c7 89 44 24 08 e8 c6 f4 ff ff 8b 44
[  218.098067] RSP: 002b:00007ffdda1116a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[  218.098069] RAX: ffffffffffffffda RBX: 0000564733953f70 RCX: 00007ff4ee7620d6
[  218.098070] RDX: 0000000000080902 RSI: 00005647339560e0 RDI: 00000000ffffff9c
[  218.098071] RBP: 0000000000000015 R08: 0000000000000000 R09: 000000000000004b
[  218.098072] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000080902
[  218.098074] R13: 00005647339560e0 R14: 0000564733953a90 R15: 0000000000000002
[  218.098460] irq 3: Affinity broken due to vector space exhaustion.


[ 218.097914] -> #2 (&irq_desc_lock_class){-.-.}:
[ 218.097917] _raw_spin_lock_irqsave+0x48/0x81
[ 218.097918] __irq_get_desc_lock+0xcf/0x140
[ 218.097919] __dble_irq_nosync+0x6e/0x110
This function does not even exist in mainline and never existed...

[ 218.097967]
[ 218.097967] Chain exists of:
[ 218.097968] console_oc_lock_class --> vector_lock
[ 218.097972]
[ 218.097973] Possible unsafe locking scenario:
[ 218.097973]
[ 218.097974] CPU0 CPU1
[ 218.097975] ---- ----
[ 218.097975] lock(vector_lock);
[ 218.097977] lock(&irq_desc_lock_class);
[ 218.097980] lock(vector_lock);
[ 218.097981] lock(console_owner);
[ 218.097983]
[ 218.097984] *** DEADLOCK ***
[ 218.097984]
[ 218.097985] 6 locks held by systemd/1:
[ 218.097986] #0: ffff88822b5cc1e8 (&tty->legacy_mutex){+.+.}, at: tty_init_dev+0x79/0x440
[ 218.097989] #1: ffff88832ee00770 (&port->mutex){+.+.}, at: tty_port_open+0x85/0x190
[ 218.097993] #2: ffff88813be85a88 (&desc->request_mutex){+.+.}, at: __setup_irq+0x249/0x1e60
[ 218.097996] #3: ffff88813be858c0 (&irq_desc_lock_class){-.-.}, at: __setup_irq+0x2d9/0x1e60
[ 218.098000] #4: ffffffff84afca78 (vector_lock){-.-.}, at: x86_vector_activate+0xca/0xab0
[ 218.098003] #5: ffffffff84c27e20 (console_lock){+.+.}, at: vprintk_emit+0x13a/0x450
This is a more fundamental problem than just vector lock and the same
problem exists with any other printk over serial which is nested in the
interrupt activation chain not only on X86.

That is true. This problem is more generic than just that. I am hoping that the new printk rewrite may address this problem. I have been waiting for a while and that work is still not upstream yet. So what is your current timeline for that? If that will happen soon, I probably don't need this patch. I send this patch out as I am uncertain about it.


-static int activate_reserved(struct irq_data *irqd)
+static int activate_reserved(struct irq_data *irqd, char *wbuf, size_t wsize)
{
...

if (!cpumask_subset(irq_data_get_effective_affinity_mask(irqd),
irq_data_get_affinity_mask(irqd))) {
- pr_warn("irq %u: Affinity broken due to vector space exhaustion.\n",
- irqd->irq);
+ snprintf(wbuf, wsize, KERN_WARNING
+ "irq %u: Affinity broken due to vector space exhaustion.\n",
+ irqd->irq);
This is not really any more tasteful than the previous one and it does
not fix the fundamental underlying problem.

But, because I'm curious and printk is a constant source of trouble, I
just added unconditional pr_warns into those functions under vector_lock
on 5.12-rc5.

Still waiting for the lockdep splat to show up while enjoying the
trickle of printks over serial.

If you really think this is an upstream problem then please provide a
corresponding lockdep splat on plain 5.12-rc5 along with a .config and
the scenario which triggers this. Not less, not more.

I will try to reproduce this problem with an upstream kernel.

Thanks,
Longman