Re: [PATCH v7 4/4] nmi_backtrace: generate one-line reports for idle cpus

From: Chris Metcalf
Date: Tue Aug 09 2016 - 13:58:53 EST


On 8/9/2016 6:37 AM, Lorenzo Pieralisi wrote:
On Mon, Aug 08, 2016 at 05:48:28PM +0100, Mark Rutland wrote:
Hi,

[adding Lorenzo]

On Mon, Aug 08, 2016 at 12:03:38PM -0400, Chris Metcalf wrote:
When doing an nmi backtrace of many cores, most of which are idle,
the output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the
interrupted PC to see if it lies within that section.
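
As a rough illustration of the check described above (a standalone sketch,
not the code from this patch), the test reduces to a half-open range
comparison on the interrupted PC. The names struct text_range, pc_in_range()
and format_backtrace_report() are hypothetical; in the kernel the bounds
would come from linker-script symbols bracketing .cpuidle.text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the bounds of the .cpuidle.text section. */
struct text_range {
	uintptr_t start;	/* first address inside the section */
	uintptr_t end;		/* one past the last address */
};

/* True if pc lies within [start, end). */
static bool pc_in_range(const struct text_range *r, uintptr_t pc)
{
	return pc >= r->start && pc < r->end;
}

/*
 * Write the report header for one cpu into buf.  Returns true if the
 * cpu was summarized in one line because it was interrupted in idle
 * code, false if a full backtrace should follow.
 */
static bool format_backtrace_report(char *buf, size_t len, int cpu,
				    uintptr_t pc,
				    const struct text_range *idle)
{
	if (pc_in_range(idle, pc)) {
		snprintf(buf, len,
			 "NMI backtrace for %d skipped: idling at pc 0x%lx",
			 cpu, (unsigned long)pc);
		return true;
	}
	snprintf(buf, len, "NMI backtrace for %d", cpu);
	return false;
}
```

The range is keyed on the PC alone, so a cpu is only summarized when it
was interrupted in code that was explicitly tagged as idle code.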

This commit suitably tags x86, arm64, and tile idle routines,
and only adds in the minimal framework for other architectures.
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 659963d40bb4..fe7f93b7b11b 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -122,6 +122,7 @@ SECTIONS
ENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
HYPERVISOR_TEXT
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5bb61de23201..64f088ca3192 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -48,11 +48,13 @@
*
* Idle the processor (wait for interrupt).
*/
+ .pushsection ".cpuidle.text","ax"
ENTRY(cpu_do_idle)
dsb sy // WFI may enter a low-power mode
wfi
ret
ENDPROC(cpu_do_idle)
+ .popsection
From a quick scan it looks like we only call this with interrupts
disabled, and we have no NMI. So shouldn't we be annotating
arch_cpu_idle(), which calls this and subsequently enables interrupts?

You're right - I made a quick mental mapping between the arch/tile
_cpu_idle assembly and the arch/arm64 cpu_do_idle. But on tile the
way it works is we can racelessly enable interrupts and then issue the
"nap" instruction; it is similar to WFI except that you actually take
the interrupt right from the nap instruction itself, and then have to
manually bump forward the PC in the handler if you want the nap to act
more like a WFI. I see on closer examination that you're right, we
won't interrupt in the cpu_do_idle assembly anyway.
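
To make the "bump forward the PC" step concrete, here is a minimal sketch
of the idea, not the arch/tile handler itself. The struct saved_regs type,
the nap_addr parameter and the insn_size parameter are all assumptions
made for illustration.

```c
#include <stdint.h>

/* Hypothetical saved register state; only the PC matters here. */
struct saved_regs {
	uintptr_t pc;
};

/*
 * If the interrupt was taken directly from the nap instruction,
 * advance the saved PC past it so that returning from the handler
 * continues after the nap, the way a WFI would behave.
 */
static void skip_nap_on_interrupt(struct saved_regs *regs,
				  uintptr_t nap_addr, uintptr_t insn_size)
{
	if (regs->pc == nap_addr)
		regs->pc += insn_size;
}
```

Any other saved PC is left untouched, so the handler returns to the
interrupted code as usual.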

You're also right that there is no support for remote stack dump on
arm64 right now. I added the arm64 "support" just because I am
hacking on arm64 most of the day at this point anyway, and felt like
the cpu_idle tracking knowledge might as well be there if/when support
for some kind of NMI-style remote interrupt was added to the Linux
implementation.

The Tile architecture also has no "NMI" per se, but we use individual
bitmasks to enable and disable interrupts, so the Linux irq_disable()
just amounts to "write a particular bitmask into the enable
register". The bitmask itself is just a per-cpu variable that changes
as interrupt sources are configured, and there are a few (a couple of
performance interrupts, and a synthetic one used for cross-core ipi)
that we never mark as maskable.
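
The masking scheme described above can be modeled roughly as follows.
This is a simplified userspace sketch under stated assumptions, not the
arch/tile code: struct cpu_irq_state, the interrupt numbers and the
sketch_* helpers are all hypothetical, and the "enabled" field merely
stands in for the hardware enable register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interrupt source numbers. */
enum { INT_TIMER = 0, INT_DEVICE = 1, INT_PERF = 2, INT_IPI = 3 };

/* Hypothetical per-cpu state. */
struct cpu_irq_state {
	uint64_t configured;	/* every source configured on this cpu */
	uint64_t maskable;	/* subset that an irq-disable may mask */
	uint64_t enabled;	/* stand-in for the enable register */
};

/* Configure a source; sources with maskable == false are never masked. */
static void sketch_irq_configure(struct cpu_irq_state *s, int irq,
				 bool maskable)
{
	s->configured |= UINT64_C(1) << irq;
	if (maskable)
		s->maskable |= UINT64_C(1) << irq;
}

/* "Disable interrupts": write the mask that leaves only the
 * never-maskable sources enabled. */
static void sketch_local_irq_disable(struct cpu_irq_state *s)
{
	s->enabled = s->configured & ~s->maskable;
}

/* "Enable interrupts": every configured source is enabled again. */
static void sketch_local_irq_enable(struct cpu_irq_state *s)
{
	s->enabled = s->configured;
}

static bool sketch_irq_is_enabled(const struct cpu_irq_state *s, int irq)
{
	return (s->enabled >> irq) & 1;
}
```

With the IPI source configured as non-maskable, it stays enabled across
sketch_local_irq_disable(), which is what lets it behave like an NMI for
remote backtrace purposes.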

I'm also not sure what you need to do for PSCI, which is the preferred
(FW-backed) idle mechanism for arm64. The infrastructure for that is
spread over a few files:

arch/arm64/kernel/sleep.S
arch/arm64/kernel/smccc-call.S
arch/arm64/kernel/suspend.c
drivers/cpuidle/cpuidle-arm.c
drivers/firmware/psci.c

I'm not sure where we'd be in an interruptible state, and therefore I'm
not immediately sure what we should annotate.
I am probably missing something here, but let me add that I am not
sure I understand how this patch can be used on ARM/ARM64 systems.
Annotating the ARM platform idle back-end code is basically useless,
given that it is code that can't be preempted anyway (and even if
it could, the PC range check can still fail, given that we may execute
some code with the MMU off, i.e. running from physical addresses).

I think this is all fair enough, and I will back out the arm64 "support" for my next
patch series.

What's the purpose of this cpu idle tracking? Can't it be implemented
in a simpler way (i.e. with the RCU API)?

The cpu idle tracking here is done solely to make the "backtrace all cpus" output
less crazy-verbose. We annotate functions because claiming "there's nothing
interesting to see here; go away" is not something you want to do unless you're
really quite sure that there's nothing interesting going on there. In particular, if
the RCU stuff is screwed up, you want to see backtraces out of the RCU code if you
happen to be somehow stuck there, even if some RCU state claims you are idle.

See e.g. the discussion with Peter Zijlstra starting around here:

https://lkml.org/lkml/2016/3/7/681

Thanks!

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com