Re: [RFC][PATCH v6]trace,x86: add x86 irq vector tracepoints

From: Steven Rostedt
Date: Mon Dec 17 2012 - 22:02:03 EST


On Tue, 2012-12-18 at 01:34 +0000, Seiji Aguchi wrote:
> Change log
>
> v5 -> v6
> - Rebased to 3.7
>
> v4 -> v5
> - Rebased to 3.6.0
>
> - Introduce a logic switching IDT at enabling/disabling TP time
> so that a time penalty makes a zero when tracepoints are disabled.
> This IDT is created only when CONFIG_TRACEPOINTS is enabled.
>
> - Remove arch_irq_vector_entry/exit and add followings again
> so that we can add each tracepoint in a generic way.
> - error_apic_vector
> - thermal_apic_vector
> - threshold_apic_vector
> - spurious_apic_vector
> - x86_platform_ipi_vector
>
> - Drop nmi tracepoints to begin with apic interrupts and discuss a logic switching
> IDT first.
>
> - Move irq_vectors.h in the directory of arch/x86/include/asm/trace because
> I'm not sure if a logic switching IDT is sharable with other architectures.
>
> v3 -> v4
> - Add a latency measurement of each tracepoint
> - Rebased to 3.6-rc6
>
> v2 -> v3
> - Remove an invalidate_tlb_vector event because it was replaced by a call function vector
> in a following commit.
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
>
> v1 -> v2
> - Modify variable name from irq to vector.
> - Merge arch-specific tracepoints below to an arch_irq_vector_entry/exit.
> - error_apic_vector
> - thermal_apic_vector
> - threshold_apic_vector
> - spurious_apic_vector
> - x86_platform_ipi_vector
>
> [Purpose of this patch]
>
> As Vaibhav explained in the thread below, tracepoints for irq vectors
> are useful.
>
> http://www.spinics.net/lists/mm-commits/msg85707.html
>
> <snip>
> The current interrupt traces from irq_handler_entry and irq_handler_exit
> provide when an interrupt is handled. They provide good data about when
> the system has switched to kernel space and how it affects the currently
> running processes.
>
> There are some IRQ vectors which trigger the system into kernel space,
> which are not handled in generic IRQ handlers. Tracing such events gives
> us the information about IRQ interaction with other system events.
>
> The trace also tells where the system is spending its time. We want to
> know which cores are handling interrupts and how they are affecting other
> processes in the system. Also, the trace provides information about when
> the cores are idle and which interrupts are changing that state.
> <snip>
>
> On the other hand, my usecase is tracing just local timer event and
> getting a value of instruction pointer.
>
> I suggested to add an argument local timer event to get instruction pointer before.
> But there is another way to get it with external module like systemtap.
> So, I don't need to add any argument to irq vector tracepoints now.
>
> [Patch Description]
>
> Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
> But there is an above use case to trace specific irq_vector rather than tracing all events.
> In this case, we are concerned about overhead due to unwanted events.
>
> This patch adds following tracepoints instead of introducing irq_vector_entry/exit.
> so that we can enable them independently.
> - local_timer_vector
> - reschedule_vector
> - call_function_vector
> - call_function_single_vector
> - irq_work_entry_vector
> - error_apic_vector
> - thermal_apic_vector
> - threshold_apic_vector
> - spurious_apic_vector
> - x86_platform_ipi_vector
>
> Also, it introduces a logic switching IDT at enabling/disabling time so that a time penalty makes
> a complete zero when tracepoints are disabled. Detailed explanations are as follows.
> - Create new irq handlers inserted tracepoints by using macros.
> - Create a new IDT, trace_idt_table, at boot time by duplicating original IDT, idt table, and
> registering the new handers for tracpoints.
> - Switch IDT to new one at enabling TP time.
> - Restore to an original IDT at disabling TP time.
> The new IDT is created only when CONFIG_TRACEPOINTS is enabled to avoid being used for other purposes.
>
> Signed-off-by: Seiji Aguchi <seiji.aguchi@xxxxxxx>
> ---
> arch/x86/include/asm/desc.h | 27 +++++
> arch/x86/include/asm/entry_arch.h | 32 +++++
> arch/x86/include/asm/hw_irq.h | 14 +++
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/apic/apic.c | 186 +++++++++++++++++-------------
> arch/x86/kernel/cpu/mcheck/therm_throt.c | 26 +++--
> arch/x86/kernel/cpu/mcheck/threshold.c | 27 +++--
> arch/x86/kernel/entry_64.S | 33 ++++++
> arch/x86/kernel/head_64.S | 6 +
> arch/x86/kernel/irq.c | 44 ++++---
> arch/x86/kernel/irq_work.c | 22 +++-
> arch/x86/kernel/irqinit.c | 2 +
> arch/x86/kernel/smp.c | 68 ++++++++----
> 13 files changed, 345 insertions(+), 143 deletions(-)
>
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index 8bf1c06..52becf4 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -345,6 +345,33 @@ static inline void set_intr_gate(unsigned int n, void *addr)
> _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
> }
>
> +#ifdef CONFIG_TRACEPOINTS
> +extern gate_desc trace_idt_table[];
> +extern void trace_idt_table_init(void);
> +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
> + unsigned dpl, unsigned ist, unsigned seg)
> +{
> + gate_desc s;
> +
> + pack_gate(&s, type, (unsigned long)addr, dpl, ist, seg);
> + /*
> + * does not need to be atomic because it is only done once at
> + * setup time
> + */
> + write_idt_entry(trace_idt_table, gate, &s);
> +}
> +
> +static inline void trace_set_intr_gate(unsigned int n, void *addr)
> +{
> + BUG_ON((unsigned)n > 0xFF);
> + _trace_set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
> +}
> +#else
> +static inline void trace_idt_table_init(void)
> +{
> +}
> +#endif
> +
> extern int first_system_vector;
> /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
> extern unsigned long used_vectors[];
> diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
> index 40afa00..8ef3900 100644
> --- a/arch/x86/include/asm/entry_arch.h
> +++ b/arch/x86/include/asm/entry_arch.h
> @@ -45,3 +45,35 @@ BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR)
> #endif
>
> #endif
> +
> +#ifdef CONFIG_TRACEPOINTS
> +#ifdef CONFIG_SMP
> +BUILD_INTERRUPT(trace_reschedule_interrupt, RESCHEDULE_VECTOR)
> +BUILD_INTERRUPT(trace_call_function_interrupt, CALL_FUNCTION_VECTOR)
> +BUILD_INTERRUPT(trace_call_function_single_interrupt,
> + CALL_FUNCTION_SINGLE_VECTOR)
> +#endif
> +
> +BUILD_INTERRUPT(trace_x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
> +
> +#ifdef CONFIG_X86_LOCAL_APIC
> +
> +BUILD_INTERRUPT(trace_apic_timer_interrupt, LOCAL_TIMER_VECTOR)
> +BUILD_INTERRUPT(trace_error_interrupt, ERROR_APIC_VECTOR)
> +BUILD_INTERRUPT(trace_spurious_interrupt, SPURIOUS_APIC_VECTOR)
> +
> +#ifdef CONFIG_IRQ_WORK
> +BUILD_INTERRUPT(trace_irq_work_interrupt, IRQ_WORK_VECTOR)
> +#endif
> +
> +#ifdef CONFIG_X86_THERMAL_VECTOR
> +BUILD_INTERRUPT(trace_thermal_interrupt, THERMAL_APIC_VECTOR)
> +#endif
> +
> +#ifdef CONFIG_X86_MCE_THRESHOLD
> +BUILD_INTERRUPT(trace_threshold_interrupt, THRESHOLD_APIC_VECTOR)
> +#endif
> +
> +#endif

Um, you could save all the above duplication if you just did:

in arch/x86/kernel/entry_32.S

#ifdef CONFIG_TRACING
#define BUILD_TRACE_INTERRUPT(name, nr) \
BUILD_INTERRUPT3(trace_##name, nr, smp_trace_##name)
#else
#define BUILD_TRACE_INTERRUPT(name, nr)
#endif

#define BUILD_INTERRUPT(name, nr) \
BUILD_INTERRUPT3(name, nr, smp_##name) \
BUILD_TRACE_INTERRUPT(name, nr)

#include <asm/entry_arch.h>


> +
> +#endif /* CONFIG_TRACEPOINTS */
> diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
> index eb92a6e..4472a78 100644
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h
> @@ -76,6 +76,20 @@ extern void threshold_interrupt(void);
> extern void call_function_interrupt(void);
> extern void call_function_single_interrupt(void);
>
> +#ifdef CONFIG_TRACEPOINTS
> +/* Interrupt handlers registered during init_IRQ */
> +extern void trace_apic_timer_interrupt(void);
> +extern void trace_x86_platform_ipi(void);
> +extern void trace_error_interrupt(void);
> +extern void trace_irq_work_interrupt(void);
> +extern void trace_spurious_interrupt(void);
> +extern void trace_thermal_interrupt(void);
> +extern void trace_reschedule_interrupt(void);
> +extern void trace_threshold_interrupt(void);
> +extern void trace_call_function_interrupt(void);
> +extern void trace_call_function_single_interrupt(void);
> +#endif /* CONFIG_TRACEPOINTS */
> +
> /* IOAPIC */
> #define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs))
> extern unsigned long io_apic_irqs;
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 34e923a..33504ea 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -100,6 +100,7 @@ obj-$(CONFIG_OF) += devicetree.o
> obj-$(CONFIG_UPROBES) += uprobes.o
>
> obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
> +obj-$(CONFIG_TRACEPOINTS) += tracepoint.o
>
> ###
> # 64 bit specific files
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index b994cc8..20c4b1f 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -55,6 +55,9 @@
> #include <asm/tsc.h>
> #include <asm/hypervisor.h>
>
> +#define CREATE_TRACE_POINTS
> +#include <asm/trace/irq_vectors.h>
> +
> unsigned int num_processors;
>
> unsigned disabled_cpus __cpuinitdata;
> @@ -912,27 +915,34 @@ static void local_apic_timer_interrupt(void)
> * [ if a single-CPU system runs an SMP kernel then we call the local
> * interrupt as well. Thus we cannot inline the local irq ... ]
> */
> -void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
> -{
> - struct pt_regs *old_regs = set_irq_regs(regs);
> -
> - /*
> - * NOTE! We'd better ACK the irq immediately,
> - * because timer handling can be slow.
> - */
> - ack_APIC_irq();
> - /*
> - * update_process_times() expects us to have done irq_enter().
> - * Besides, if we don't timer interrupts ignore the global
> - * interrupt lock, which is the WrongThing (tm) to do.
> - */
> - irq_enter();
> - exit_idle();
> - local_apic_timer_interrupt();
> - irq_exit();
> -
> - set_irq_regs(old_regs);
> -}
> +#define SMP_APIC_TIMER_INTERRUPT(trace, trace_enter, trace_exit) \
> +void __irq_entry smp_##trace##apic_timer_interrupt(struct pt_regs *regs)\
> +{ \
> + struct pt_regs *old_regs = set_irq_regs(regs); \
> + \
> + /* \
> + * NOTE! We'd better ACK the irq immediately, \
> + * because timer handling can be slow. \
> + */ \
> + ack_APIC_irq(); \
> + /* \
> + * update_process_times() expects us to have done irq_enter(). \
> + * Besides, if we don't timer interrupts ignore the global \
> + * interrupt lock, which is the WrongThing (tm) to do. \
> + */ \
> + irq_enter(); \
> + exit_idle(); \
> + trace_enter; \
> + local_apic_timer_interrupt(); \
> + trace_exit; \
> + irq_exit(); \
> + \
> + set_irq_regs(old_regs); \
> +}
> +
> +SMP_APIC_TIMER_INTERRUPT(,,)
> +SMP_APIC_TIMER_INTERRUPT(trace_, trace_local_timer_entry(LOCAL_TIMER_VECTOR),
> + trace_local_timer_exit(LOCAL_TIMER_VECTOR))


These macros are just plan ugly as all hell. What about using static
inlines and something like:

static inline void entering_irq(void)
{
irq_enter();
exit_idle();
}

static inline void exiting_irq(void)
{
irq_exit();
}

-- Just duplicate the two apic timer interrupts because it has the
needed ack first.


Then the rest could be:

static inline void __smp_spurious_interrupt(struct pt_regs *regs)
{
u32 v;


/*
* Check if this really is a spurious interrupt and ACK it
* if it is a vectored one. Just in case...
* Spurious interrupts should not be ACKed.
*/
v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1));
if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f)))
ack_APIC_irq();

inc_irq_stat(irq_spurious_count);

/* see sw-dev-man vol 3, chapter 7.4.13.5 */
pr_info("spurious APIC interrupt on CPU#%d, "
"should never happen.\n", smp_processor_id());

}

void smp_spurious_interrupt(struct pt_regs *regs)
{
entering_irq();
__smp_spurious_interrupt(regs);
exiting_irq();
}

void smp_trace_spurious_interrupt(struct pt_regs *regs)
{
entering_irq();
trace_spurious_apic_enter(SPURIOUS_APIC_VECTOR);
__smp_spurious_interrupt(regs);
trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR);
exiting_irq();
}

This is so much more readable (and maintainable) than using those ugly
macros.

-- Steve

>
> int setup_profiling_timer(unsigned int multiplier)
> {
> @@ -1908,71 +1918,91 @@ int __init APIC_init_uniprocessor(void)
> /*
> * This interrupt should _never_ happen with our APIC/SMP architecture
> */
> -void smp_spurious_interrupt(struct pt_regs *regs)
> -{
> - u32 v;
> -
> - irq_enter();
> - exit_idle();
> - /*
> - * Check if this really is a spurious interrupt and ACK it
> - * if it is a vectored one. Just in case...
> - * Spurious interrupts should not be ACKed.
> - */
> - v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1));
> - if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f)))
> - ack_APIC_irq();
> -
> - inc_irq_stat(irq_spurious_count);
> -
> - /* see sw-dev-man vol 3, chapter 7.4.13.5 */
> - pr_info("spurious APIC interrupt on CPU#%d, "
> - "should never happen.\n", smp_processor_id());
> - irq_exit();
> -}
> +#define SMP_SPURIOUS_INTERRUPT(trace, trace_enter, trace_exit) \
> +void smp_##trace##spurious_interrupt(struct pt_regs *regs) \
> +{ \
> + u32 v; \
> + \
> + irq_enter(); \
> + exit_idle(); \
> + trace_enter; \
> + /* \
> + * Check if this really is a spurious interrupt and ACK it \
> + * if it is a vectored one. Just in case... \
> + * Spurious interrupts should not be ACKed. \
> + */ \
> + v = apic_read(APIC_ISR + ((SPURIOUS_APIC_VECTOR & ~0x1f) >> 1));\
> + if (v & (1 << (SPURIOUS_APIC_VECTOR & 0x1f))) \
> + ack_APIC_irq(); \
> + \
> + inc_irq_stat(irq_spurious_count); \
> + \
> + /* see sw-dev-man vol 3, chapter 7.4.13.5 */ \
> + pr_info("spurious APIC interrupt on CPU#%d, " \
> + "should never happen.\n", smp_processor_id()); \
> + trace_exit; \
> + irq_exit(); \
> +}
> +
> +SMP_SPURIOUS_INTERRUPT(,,)
> +SMP_SPURIOUS_INTERRUPT(trace_, trace_spurious_apic_entry(SPURIOUS_APIC_VECTOR),
> + trace_spurious_apic_exit(SPURIOUS_APIC_VECTOR))
>
> /*
> * This interrupt should never happen with our APIC/SMP architecture
> */
> -void smp_error_interrupt(struct pt_regs *regs)
> -{
> - u32 v0, v1;
> - u32 i = 0;
> - static const char * const error_interrupt_reason[] = {
> - "Send CS error", /* APIC Error Bit 0 */
> - "Receive CS error", /* APIC Error Bit 1 */
> - "Send accept error", /* APIC Error Bit 2 */
> - "Receive accept error", /* APIC Error Bit 3 */
> - "Redirectable IPI", /* APIC Error Bit 4 */
> - "Send illegal vector", /* APIC Error Bit 5 */
> - "Received illegal vector", /* APIC Error Bit 6 */
> - "Illegal register address", /* APIC Error Bit 7 */
> - };
> -
> - irq_enter();
> - exit_idle();
> - /* First tickle the hardware, only then report what went on. -- REW */
> - v0 = apic_read(APIC_ESR);
> - apic_write(APIC_ESR, 0);
> - v1 = apic_read(APIC_ESR);
> - ack_APIC_irq();
> - atomic_inc(&irq_err_count);
> -
> - apic_printk(APIC_DEBUG, KERN_DEBUG "APIC error on CPU%d: %02x(%02x)",
> - smp_processor_id(), v0 , v1);
> -
> - v1 = v1 & 0xff;
> - while (v1) {
> - if (v1 & 0x1)
> - apic_printk(APIC_DEBUG, KERN_CONT " : %s", error_interrupt_reason[i]);
> - i++;
> - v1 >>= 1;
> - }
> -
> - apic_printk(APIC_DEBUG, KERN_CONT "\n");
> -
> - irq_exit();
> -}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/