Re: [PATCH V3 4/4] Sched: Add tracepoint for sched time slice extension

From: Prakash Sangappa
Date: Sun May 04 2025 - 21:44:28 EST




> On May 2, 2025, at 2:14 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, May 02, 2025 at 01:59:55AM +0000, Prakash Sangappa wrote:
>> Trace task's preemption due to a wakeups, getting delayed. Which
>> can occurs when the running task requested extra time on cpu.
>> Also, indicate the NEED_RESCHED flag set on the task getting
>> cleared.
>>
>> Signed-off-by: Prakash Sangappa <prakash.sangappa@xxxxxxxxxx>
>> ---
>> include/trace/events/sched.h | 28 ++++++++++++++++++++++++++++
>> kernel/entry/common.c | 8 ++++++--
>> 2 files changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
>> index 8994e97d86c1..4aa04044b14a 100644
>> --- a/include/trace/events/sched.h
>> +++ b/include/trace/events/sched.h
>> @@ -296,6 +296,34 @@ TRACE_EVENT(sched_migrate_task,
>> __entry->orig_cpu, __entry->dest_cpu)
>> );
>>
>> +/*
>> + * Tracepoint for delayed resched requested by task:
>> + */
>> +TRACE_EVENT(sched_delay_resched,
>> +
>> + TP_PROTO(struct task_struct *p, unsigned int resched_flg),
>> +
>> + TP_ARGS(p, resched_flg),
>> +
>> + TP_STRUCT__entry(
>> + __array( char, comm, TASK_COMM_LEN )
>> + __field( pid_t, pid )
>> + __field( int, cpu )
>> + __field( int, flg )
>> + ),
>> +
>> + TP_fast_assign(
>> + memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
>> + __entry->pid = p->pid;
>> + __entry->cpu = task_cpu(p);
>> + __entry->flg = resched_flg;
>> + ),
>> +
>> + TP_printk("comm=%s pid=%d cpu=%d resched_flg_cleared=0x%x",
>> + __entry->comm, __entry->pid, __entry->cpu, __entry->flg)
>> +
>> +);
>> +
>> DECLARE_EVENT_CLASS(sched_process_template,
>>
>> TP_PROTO(struct task_struct *p),
>> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
>> index b26adccb32df..1951e6a4e9bc 100644
>> --- a/kernel/entry/common.c
>> +++ b/kernel/entry/common.c
>> @@ -12,6 +12,7 @@
>>
>> #include "common.h"
>>
>> +#include <trace/events/sched.h>
>> #define CREATE_TRACE_POINTS
>> #include <trace/events/syscalls.h>
>>
>> @@ -100,9 +101,12 @@ __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>> local_irq_enable_exit_to_user(ti_work);
>>
>> if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
>> - if (irq && rseq_delay_resched())
>> + if (irq && rseq_delay_resched()) {
>> clear_tsk_need_resched(current);
>> - else
>> + trace_sched_delay_resched(current,
>> + ti_work & (_TIF_NEED_RESCHED |
>> + _TIF_NEED_RESCHED_LAZY));
>> + } else
>> schedule();
>> }
>
> This is horrific coding style. But really why do we need this? I'm not,
> in general, a fan of tracepoints.

I will move the trace_sched_delay* call towards the end of the routine,
assuming we are keeping this tracepoint.