RE: system hung up when offlining CPUs

From: Thomas Gleixner
Date: Tue Oct 31 2017 - 20:47:27 EST


On Mon, 30 Oct 2017, Shivasharan Srikanteshwara wrote:

> In the managed-interrupts case, interrupts which were affine to the
> offlined CPU are not getting migrated to another available CPU. But the
> documentation at the link below says that "all interrupts" are migrated to
> a new CPU. So not all interrupts are getting migrated to a new CPU then.

Correct.

> https://www.kernel.org/doc/html/v4.11/core-api/cpu_hotplug.html#the-offline-case
> "- All interrupts targeted to this CPU are migrated to a new CPU"

Well, documentation is not always up to date :)

> Once the last CPU in the affinity mask is offlined and a particular IRQ
> is shut down, is there currently a way for the device driver to get a
> callback to complete all outstanding requests on that queue?

No, and I have no idea how the other drivers deal with that.

The way you can do that is to have your own hotplug callback which is
invoked when the CPU goes down, but well before the interrupt is shut
down, which is one of the last steps. Ideally this would be a callback in
the generic block code which then calls out to all instances, like it's
done for the CPU dead state.
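
A rough, untested sketch of such a driver-private callback, assuming the
dynamic CPUHP_AP_ONLINE_DYN state is early enough in the teardown sequence
for your purposes. All mydrv_* names are hypothetical, and the drain helper
is a placeholder for whatever the driver needs to do to complete its
outstanding requests:

```c
#include <linux/module.h>
#include <linux/cpuhotplug.h>

/* Hypothetical: dynamic hotplug state number handed back by the core. */
static enum cpuhp_state mydrv_hp_state;

/*
 * Hypothetical driver helper. A real driver would check whether @cpu is
 * the last online CPU in a queue's affinity mask and, if so, wait for or
 * complete every request still outstanding on that queue.
 */
static void mydrv_drain_queues_for_cpu(unsigned int cpu)
{
	/* driver specific, intentionally empty in this sketch */
}

/* Startup callback: nothing to do when a CPU comes online. */
static int mydrv_cpu_online(unsigned int cpu)
{
	return 0;
}

/*
 * Teardown callback: invoked while @cpu is going down. States in the
 * AP online section are torn down on the outgoing CPU before it is
 * taken out and its interrupts are dealt with.
 */
static int mydrv_cpu_offline(unsigned int cpu)
{
	mydrv_drain_queues_for_cpu(cpu);
	return 0;
}

static int __init mydrv_init(void)
{
	int ret;

	/* For CPUHP_AP_ONLINE_DYN a positive return is the allocated state. */
	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mydrv:online",
				mydrv_cpu_online, mydrv_cpu_offline);
	if (ret < 0)
		return ret;

	mydrv_hp_state = ret;
	return 0;
}

static void __exit mydrv_exit(void)
{
	/* Unregister without invoking the teardown callback on online CPUs. */
	cpuhp_remove_state_nocalls(mydrv_hp_state);
}

module_init(mydrv_init);
module_exit(mydrv_exit);
MODULE_LICENSE("GPL");
```

A driver with several device instances would more likely want the
multi-instance variant of the hotplug state API, so that one registered
state calls out to each instance. That is the same pattern a generic block
layer callback would follow.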

Jens, Christoph?

Thanks,

tglx