Re: [PATCH] trace: adjust code layout in get_recursion_context

From: Peter Zijlstra
Date: Tue Aug 22 2017 - 11:20:34 EST


On Tue, Aug 22, 2017 at 05:14:10PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 22, 2017 at 04:40:24PM +0200, Jesper Dangaard Brouer wrote:
> > In an XDP redirect application using tracepoint xdp:xdp_redirect to
> > diagnose TX overrun, I noticed perf_swevent_get_recursion_context()
> > was consuming 2% CPU. This was reduced to 1.6% with this simple
> > change.
>
> It is also incorrect. What do you suppose it now returns when the NMI
> hits a hard IRQ which hit during a Soft IRQ?

Does this help any? I can imagine the compiler could struggle to CSE
preempt_count() seeing how it's an asm thing.
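
To make the nesting question concrete, here is a minimal user-space sketch,
not kernel code. The mask values are an assumption modelled on the
include/linux/preempt.h layout (8 preempt bits, 8 softirq bits, 4 hardirq
bits, 1 NMI bit), and both helper names are hypothetical. The first helper
tests a single snapshot of the count from the outermost context down, as the
patch below does. The second shows one way a reordering could go wrong; it
is only a guess at the kind of change being objected to, since the original
patch is not quoted here.

```c
/* Illustrative user-space model; values below are assumed, not authoritative. */

/* Assumed layout: bits 0-7 preempt, 8-15 softirq, 16-19 hardirq, bit 20 NMI. */
#define SOFTIRQ_OFFSET	0x00000100u	/* one level of softirq processing */
#define HARDIRQ_OFFSET	0x00010000u	/* one level of hardirq nesting */
#define HARDIRQ_MASK	0x000f0000u
#define NMI_OFFSET	0x00100000u
#define NMI_MASK	0x00100000u

/*
 * Hypothetical helper: decode the recursion context from one snapshot
 * of the count, testing NMI first, then hardirq, then softirq.
 */
static int rctx_from_count(unsigned int pc)
{
	if (pc & NMI_MASK)
		return 3;
	if (pc & HARDIRQ_MASK)
		return 2;
	if (pc & SOFTIRQ_OFFSET)
		return 1;
	return 0;
}

/*
 * Hypothetical counter-example: testing softirq first. The softirq bit
 * stays set while a hardirq or NMI runs on top of it, so nested
 * contexts all collapse to 1.
 */
static int rctx_softirq_first(unsigned int pc)
{
	if (pc & SOFTIRQ_OFFSET)
		return 1;
	if (pc & HARDIRQ_MASK)
		return 2;
	if (pc & NMI_MASK)
		return 3;
	return 0;
}
```

With pc = SOFTIRQ_OFFSET | HARDIRQ_OFFSET | NMI_OFFSET (an NMI hitting a
hard IRQ that hit during a soft IRQ), rctx_from_count() yields 3 while
rctx_softirq_first() yields 1, so the NMI would share a recursion slot with
the interrupted softirq.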

---
kernel/events/internal.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 486fd78eb8d5..e0b5b8fa83a2 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -206,13 +206,14 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs);

 static inline int get_recursion_context(int *recursion)
 {
+	unsigned int pc = preempt_count();
 	int rctx;

-	if (in_nmi())
+	if (pc & NMI_MASK)
 		rctx = 3;
-	else if (in_irq())
+	else if (pc & HARDIRQ_MASK)
 		rctx = 2;
-	else if (in_softirq())
+	else if (pc & SOFTIRQ_OFFSET)
 		rctx = 1;
 	else
 		rctx = 0;