On Sun, Jun 01, 2025 at 07:24:14PM -0400, Sasha Levin wrote:
From: Brian Norris <briannorris@xxxxxxxxxxxx>
[ Upstream commit 788019eb559fd0b365f501467ceafce540e377cc ]
Affinity-managed interrupts can be shut down and restarted during CPU
hotunplug/plug. Thereby the interrupt may be left in an unexpected state.
Specifically:
1. Interrupt is affine to CPU N
2. disable_irq() -> depth is 1
3. CPU N goes offline
4. irq_shutdown() -> depth is set to 1 (again)
5. CPU N goes online
6. irq_startup() -> depth is set to 0 (BUG! driver expects that the interrupt
still disabled)
7. enable_irq() -> depth underflow / unbalanced enable_irq() warning
This is only a problem for managed interrupts and CPU hotplug, all other
cases like request()/free()/request() truly needs to reset a possibly stale
disable depth value.
Provide a startup function, which takes the disable depth into account, and
invoked it for the managed interrupts in the CPU hotplug path.
This requires to change irq_shutdown() to do a depth increment instead of
setting it to 1, which allows to retain the disable depth, but is harmless
for the other code paths using irq_startup(), which will still reset the
disable depth unconditionally to keep the original correct behaviour.
A kunit tests will be added separately to cover some of these aspects.
[ tglx: Massaged changelog ]
Suggested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Brian Norris <briannorris@xxxxxxxxxxxx>
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Link: https://lore.kernel.org/all/20250514201353.3481400-2-briannorris@xxxxxxxxxxxx
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
This one breaks suspend of laptops like the Lenovo ThinkPad T14s. Issue
was just reported here by Alex:
https://lore.kernel.org/lkml/24ec4adc-7c80-49e9-93ee-19908a97ab84@xxxxxxxxx/
Please drop from all stable queues for now.