Re: [PATCH] sched/core: Minor optimize ttwu_runnable()

From: Valentin Schneider
Date: Fri Nov 04 2022 - 14:21:14 EST


On 02/11/22 18:23, Chengming Zhou wrote:
> ttwu_runnable() is used as a fast wakeup path when the wakee task
> is between set_current_state() and schedule(), in which case all
> we need to do is change p->state back to TASK_RUNNING. So we don't
> need to update_rq_clock() and check_preempt_curr() in this case.
>
> Some performance numbers using mmtests/perfpipe on Intel Xeon server:
>
> linux-next patched
> Min Time 8.67 ( 0.00%) 8.66 ( 0.13%)
> 1st-qrtle Time 8.83 ( 0.00%) 8.72 ( 1.19%)
> 2nd-qrtle Time 8.90 ( 0.00%) 8.76 ( 1.57%)
> 3rd-qrtle Time 8.98 ( 0.00%) 8.82 ( 1.82%)
> Max-1 Time 8.67 ( 0.00%) 8.66 ( 0.13%)
> Max-5 Time 8.67 ( 0.00%) 8.66 ( 0.13%)
> Max-10 Time 8.79 ( 0.00%) 8.69 ( 1.09%)
> Max-90 Time 9.01 ( 0.00%) 8.84 ( 1.94%)
> Max-95 Time 9.02 ( 0.00%) 8.85 ( 1.86%)
> Max-99 Time 9.02 ( 0.00%) 8.88 ( 1.56%)
> Max Time 9.59 ( 0.00%) 8.89 ( 7.29%)
> Amean Time 8.92 ( 0.00%) 8.77 * 1.65%*
>
> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
> ---
> kernel/sched/core.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 87c9cdf37a26..3785418de127 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3718,9 +3718,8 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
>
> rq = __task_rq_lock(p, &rf);
> if (task_on_rq_queued(p)) {
> - /* check_preempt_curr() may use rq clock */
> - update_rq_clock(rq);
> - ttwu_do_wakeup(rq, p, wake_flags, &rf);
> + WRITE_ONCE(p->__state, TASK_RUNNING);
> + trace_sched_wakeup(p);

This also loses the class->task_woken() call, AFAICT we could have
!p->on_cpu here (e.g. an IRQ hit before the task got to schedule()), but
then again if there is a need to push we should have done that at the IRQ
preempt via set_next_task_{rt, dl}()... So I'm starting to think this is
OK, but that needs elaborating in the changelog.


> ret = 1;
> }
> __task_rq_unlock(rq, &rf);
> --
> 2.37.2