Re: [PATCH] tracing/perf: disable preemption in perf_trace calls
From: Aditya Chillara
Date: Tue Jul 01 2025 - 11:21:07 EST
On 7/1/2025 12:02 AM, Steven Rostedt wrote:
> On Mon, 30 Jun 2025 22:35:28 +0530
> Aditya Chillara <quic_achillar@xxxxxxxxxxx> wrote:
>
>> It is possible that perf_trace_sched_switch call is rescheduled causing
>> NULL access in perf_trace_buf_alloc because of perf_trace_event_unreg
>> called in between, leading to data abort:
>>
>> [ 363.202351][ T0] Unable to handle kernel paging request at virtual address ffffffba56d0a03c
>> [ 363.202355][ T0] Mem abort info:
>> [ 363.202355][ T0] ESR = 0x0000000096000045
>> [ 363.202357][ T0] EC = 0x25: DABT (current EL), IL = 32 bits
>> [ 363.202359][ T0] SET = 0, FnV = 0
>> [ 363.202360][ T0] EA = 0, S1PTW = 0
>> [ 363.202361][ T0] FSC = 0x05: level 1 translation fault
>> [ 363.202363][ T0] Data abort info:
>> [ 363.202364][ T0] ISV = 0, ISS = 0x00000045, ISS2 = 0x00000000
>> [ 363.202365][ T0] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
>> [ 363.202367][ T0] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> [ 363.202369][ T0] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000c9795000
>> [ 363.202371][ T0] [ffffffba56d0a03c] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
>> [ 363.202377][ T0] Internal error: Oops: 0000000096000045 [#1] PREEMPT SMP
>> [ 363.202383][ T0] Dumping ftrace buffer:
>> [ 363.202385][ T0] (ftrace buffer empty)
>> .
>> .
>> [ 363.203036][ T0] CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Tainted: G W O \
>> 6.12.23-android16-5-maybe-dirty-4k #1 6d534d25cf9b12caecb741d5004c181a6a432b53
>> [ 363.203042][ T0] Tainted: [W]=WARN, [O]=OOT_MODULE
>> [ 363.203045][ T0] pstate: 834000c5 (Nzcv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
>> [ 363.203048][ T0] pc : perf_trace_buf_alloc+0x90/0xe0
>> [ 363.203057][ T0] lr : perf_trace_buf_alloc+0x40/0xe0
>> [ 363.203060][ T0] sp : ffffffc08016bc90
>> [ 363.203062][ T0] x29: ffffffc08016bc90 x28: 0000000000000000 x27: ffffff88083d0000
>> [ 363.203066][ T0] x26: ffffffba56d0a000 x25: 00000044684e8bd8 x24: 1198f2d01457f4e4
>> [ 363.203070][ T0] x23: 0000000000000000 x22: ffffff88083d0000 x21: ffffffc08016bccc
>> [ 363.203073][ T0] x20: ffffffc08016bcd0 x19: 0000000000000044 x18: ffffffd01580bec0
>> [ 363.203077][ T0] x17: 00000000d2dc9ceb x16: 00000000d2dc9ceb x15: 0000000000000010
>> [ 363.203080][ T0] x14: 0000000000000100 x13: 0000000000000100 x12: 0000000000000100
>> [ 363.203083][ T0] x11: 0000000000ff0100 x10: 0000000000000001 x9 : ffffffba56d0a000
>> [ 363.203087][ T0] x8 : ffffffba56d0a044 x7 : 0000000000000001 x6 : 000000548fd2bf93
>> [ 363.203090][ T0] x5 : 00000054908be990 x4 : 0000000000000000 x3 : ffffff8808329640
>> [ 363.203094][ T0] x2 : ffffffc08016bccc x1 : ffffffc08016bcd0 x0 : ffffffba56d0a000
>> [ 363.203097][ T0] Call trace:
>> [ 363.203098][ T0] perf_trace_buf_alloc+0x90/0xe0
>> [ 363.203102][ T0] perf_trace_sched_switch+0x74/0x190
>> [ 363.203107][ T0] __schedule+0xbb0/0xeac
>> [ 363.203112][ T0] schedule_idle+0x24/0x48
>> [ 363.203114][ T0] do_idle+0x214/0x25c
>> [ 363.203117][ T0] cpu_startup_entry+0x34/0x3c
>> [ 363.203119][ T0] secondary_start_kernel+0x130/0x150
>> [ 363.203124][ T0] __secondary_switched+0xc0/0xc4
>> [ 363.203129][ T0] Code: f8605908 d538d089 8b080120 8b33c008 (f81f811f)
>> [ 363.203131][ T0] ---[ end trace 0000000000000000 ]---
>> [ 363.203133][ T0] Kernel panic - not syncing: Oops: Fatal exception
>> [ 363.203135][ T0] SMP: stopping secondary CPUs
>> [ 363.203276][ T0]
>> [ 363.203276][ T0] Total active clients count: 0
>> [ 363.203280][ T0] Dumping ftrace buffer:
>> [ 363.203281][ T0] (ftrace buffer empty)
>>
>> Fix the issue by disabling preemption in perf_trace calls.
>
> Does it really fix the issue?
No, wanted to confirm that perf_trace_##call() should not get
scheduled out.
The following is also present in the logs:
[<ffffffd4cde44348>] do_mem_abort+0x54
[<ffffffd4cef79b5c>] el1_abort+0x3c
[<ffffffd4cef79ad4>] el1h_64_sync_handler[jt]+0xb4
[<ffffffd4cde11298>] ret_to_kernel[jt]+0x0
[<ffffffd4ce05a19c>] perf_trace_buf_alloc+0x90
[<ffffffd4cdf137fc>] perf_trace_sched_process_template+0x64
[<ffffffd4cdf115f4>] __traceiter_sched_process_free+0x48
[<ffffffd4cded7eb8>] delayed_put_task_struct+0x94
[<ffffffd4cdfb1564>] rcu_do_batch+0x1f0
[<ffffffd4cdfb3ec8>] rcu_nocb_cb_kthread+0x250
[<ffffffd4cdf08420>] kthread+0x11c
[<ffffffd4cde1f174>] ret_from_fork+0x10
>
> 1. perf_trace_##call() gets called either by DECLARE_EVENT_CLASS or by
> DECLARE_EVENT_SYSCALL_CLASS. The latter already disables preemption
> but the former does not because it is called by the tracepoint logic
> that has:
>
> #define __DECLARE_TRACE(name, proto, args, cond, data_proto) \
> __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), PARAMS(data_proto)) \
> static inline void __do_trace_##name(proto) \
> { \
> if (cond) { \
> guard(preempt_notrace)(); \
> __DO_TRACE_CALL(name, TP_ARGS(args)); \
> } \
> }
>
> Where preemption should already be disabled.
DECLARE_EVENT_SYSCALL_CLASS in not present in our tree.
>
> 2. trace_sched_switch() is called with interrupts disabled. So there
> should be no scheduling out in this trace event. So how did it get
> scheduled out in that trace event?
Thank you for confirming that trace_sched_switch() should not be
scheduled out.
>
> I think there's something else going on.
I will try my luck with CONFIG_DEBUG_PREEMPT to check for any
preemption underflows. Any other idea what could be going on?
>
> -- Steve
>
>
>>
>> Signed-off-by: Aditya Chillara <quic_achillar@xxxxxxxxxxx>
>> ---
>> Change-Id: I787347a7db4554cfd2cb74e0fc985699f1824c92
>> ---
>> include/trace/perf.h | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/include/trace/perf.h b/include/trace/perf.h
>> index 5800d13146c3d0cdf4eb474a3bb895dd71753de4..76994a5d8c55f4ca071070ff551a53a84fb1d95c 100644
>> --- a/include/trace/perf.h
>> +++ b/include/trace/perf.h
>> @@ -66,6 +66,8 @@ perf_trace_##call(void *__data, proto) \
>> int __data_size; \
>> int rctx; \
>> \
>> + guard(preempt_notrace)(); \
>> + \
>> __data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
>> \
>> head = this_cpu_ptr(event_call->perf_events); \
>>
>> ---
>> base-commit: 70575e77839f4c5337ce2653b39b86bb365a870e
>> change-id: 20250630-fix-perf-trace-null-access-9661f910df72
>>
>> Best regards,
>