Re: [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

From: Qais Yousef
Date: Wed Apr 22 2020 - 09:13:55 EST


On 04/22/20 12:59, Dietmar Eggemann wrote:
> On 21/04/2020 13:27, Qais Yousef wrote:
> > On 04/21/20 13:18, Dietmar Eggemann wrote:
> >> On 20/04/2020 17:13, Qais Yousef wrote:
> >>> On 04/20/20 10:29, Dietmar Eggemann wrote:
> >>>> On 03.04.20 14:30, Qais Yousef wrote:
> >>>>
> >>>> [...]
> >>>>
> >>>>> @@ -924,6 +945,14 @@ uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> >>>>> return uc_req;
> >>>>> }
> >>>>>
> >>>>> +static void uclamp_rt_sync_default_util_min(struct task_struct *p)
> >>>>> +{
> >>>>> + struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> >>>>> +
> >>>>> + if (!uc_se->user_defined)
> >>>>> + uclamp_se_set(uc_se, sysctl_sched_rt_default_uclamp_util_min, false);
> >>>>> +}
> >>>>> +
> >>>>> unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id)
> >>>>> {
> >>>>> struct uclamp_se uc_eff;
> >>>>> @@ -1030,6 +1059,12 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> >>>>> if (unlikely(!p->sched_class->uclamp_enabled))
> >>>>> return;
> >>>>>
> >>>>> + /*
> >>>>> + * When sysctl_sched_rt_default_uclamp_util_min value is changed by the
> >>>>> + * user, we apply any new value on the next wakeup, which is here.
> >>>>> + */
> >>>>> + uclamp_rt_sync_default_util_min(p);
> >>>>> +
> >>>>
> >>>> Does this have to be an extra function? Can we not reuse
> >>>> uclamp_tg_restrict() by slightly rename it to uclamp_restrict()?
>
> Btw, there was an issue in my little snippet. I used uc_req.user_defined
> uninitialized in uclamp_restrict().
>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f3706dad32ce..7e6b2b7cd1e5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -903,12 +903,11 @@ uclamp_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> {
> struct uclamp_se uc_req, __maybe_unused uc_max;
>
> - if (unlikely(rt_task(p)) && clamp_id == UCLAMP_MIN &&
> - !uc_req.user_defined) {
> + if (unlikely(rt_task(p)) && clamp_id == UCLAMP_MIN) {
> struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> int rt_min = sysctl_sched_rt_default_uclamp_util_min;
>
> - if (uc_se->value != rt_min) {
> + if (!uc_se->user_defined && uc_se->value != rt_min) {
> uclamp_se_set(uc_se, rt_min, false);
> printk("uclamp_restrict() [%s %d] p->uclamp_req[%d].value=%d\n",
> p->comm, p->pid, clamp_id, uc_se->value);
>
> >>> Hmm the thing is that we're not restricting here. In contrary we're boosting,
> >>> so the name would be misleading.
> >>
> >> I always thought that we're restricting p->uclamp_req[UCLAMP_MIN].value (default 1024) to
> >> sysctl_sched_rt_default_uclamp_util_min (0-1024)?
> >
> > The way I look at it is that we're *setting* it to
> > sysctl_sched_rt_default_uclamp_util_min if !user_defined.
> >
> > The restriction mechanism that ensures this set value doesn't escape
> > cgroup/global restrictions setup.
>
> I guess we overall agree here.
>
> I see 3 restriction levels: (!user_defined) task -> taskgroup -> system
>
> I see sysctl_sched_rt_default_uclamp_util_min (min_rt_default) as a
> restriction on task level.

Hmm from code perspective it is a request. One that is applied by default if
the user didn't make any request.

Since restriction has a different meaning from code point of view, I think
interchanging both could be confusing.

A restriction from the code view is that you can request 1024, but if cgroup or
global settings doesn't allow you, the system will automatically crop it.

The new sysctl doesn't result in any cropping. It is equivalent to a user
making a sched_setattr() call to change UCLAMP_MIN value of a task. It's just
this request is applied automatically by default, if !user_defined.

But if you meant your high level view of how it works you think of it as
a restriction, then yeah, it could be abstracted in this way. The terminology
just conflicts with the code.

>
> It's true that the task level restriction is setting the value at the same time.
>
> For CFS (id=UCLAMP_[MIN\|MAX]) and RT (id=UCLAMP_MAX) we use
> uclamp_none(id) and those values (0, 1024) are fixed so these task level
> values don't need to be further restricted.

I wouldn't think of these as restriction. They're default requests, if
I understood what you're saying correctly, by default:

cfs_task->util_min = 0
cfs_task->util_max = 1024

rt_task->util_min = 1024
rt_task->util_max = 1024

Which are the requested value.

sysctl_util_clamp_{min,max} are the default restriction which by default would
allow the tasks to request any value within the full range.

The root taskgroup will inherit this value by default. And new cgroups will
inherit from the root taskgroup.

>
> For RT (id=UCLAMP_MIN) we use 'min_rt_default' and since it can change
> we have to check the task level restriction in 'uclamp_eff_get() ->
> uclamp_(tg)_restrict()'.

Yes. If we take the approach to apply the default request in uclamp_eff_get(),
then this must be applied before uclamp_tg_restrict() call.

>
> root@h960:~# echo 999 > /proc/sys/kernel/sched_rt_default_util_clamp_min
>
> [ 2540.507236] uclamp_eff_get() [rtkit-daemon 419] tag=0 uclamp_id=0 uc_req.value=1024
> [ 2540.514947] uclamp_eff_get() [rtkit-daemon 419] tag=1 uclamp_id=0 uc_req.value=1024
> [ 2548.015208] uclamp_restrict() [rtkit-daemon 419] p->uclamp_req[0].value=999
>
> root@h960:~# echo 666 > /proc/sys/kernel/sched_util_clamp_min
>
> [ 2548.022219] uclamp_eff_get() [rtkit-daemon 419] tag=0 uclamp_id=0 uc_req.value=999
> [ 2548.029825] uclamp_eff_get() [rtkit-daemon 419] tag=1 uclamp_id=0 uc_req.value=999
> [ 2553.479509] uclamp_eff_get() [rtkit-daemon 419] tag=0 uclamp_id=0 uc_max.value=666
> [ 2553.487131] uclamp_eff_get() [rtkit-daemon 419] tag=1 uclamp_id=0 uc_max.value=666
>
> Haven't tried to put an rt task into a taskgroup other than root.

I do run a test that Patrick had which checks for cgroup values. One of the
tests checks if RT are boosted to max, and it fails when I change the default
RT max-boost value :)

I think we're in agreement, but the terminology is probably making things a bit
confusing.

Thanks

--
Qais Yousef