Re: [PATCH] dl_server: Reset DL server params when rd changes
From: Waiman Long
Date: Thu Nov 07 2024 - 23:40:44 EST
On 11/6/24 1:05 PM, Waiman Long wrote:
On 11/6/24 11:08 AM, Juri Lelli wrote:
On 04/11/24 17:41, Joel Fernandes wrote:
On Mon, Nov 04, 2024 at 11:54:36AM +0100, Juri Lelli wrote:
...
I added a printk in __dl_server_attach_root, which is called after the
dynamic rd is built to transfer bandwidth to it.
__dl_server_attach_root came with d741f297bceaf ("sched/fair: Fair
server interface"); do you have this change in your backport?
You nailed it! Our 5.15 backport appears to be slightly older and is
missing this from topology.c, as you mentioned. Thanks for clarifying!
/*
 * Because the rq is not a task, dl_add_task_root_domain() did not
 * move the fair server bw to the rd if it already started.
 * Add it now.
 */
if (rq->fair_server.dl_server)
	__dl_server_attach_root(&rq->fair_server, rq);
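For reference, here is a rough sketch of what __dl_server_attach_root()
does (not the exact upstream code; helpers like dl_bw_of(), dl_bw_cpus()
and __dl_add() are recalled from memory): it simply accounts the fair
server's bandwidth in the dl_bw of the root domain the rq has just been
attached to.

/* Rough sketch, not the upstream implementation. */
static void dl_server_attach_root_sketch(struct sched_dl_entity *dl_se,
					 struct rq *rq)
{
	struct dl_bw *dl_b = dl_bw_of(cpu_of(rq));
	int cpus = dl_bw_cpus(cpu_of(rq));

	raw_spin_lock(&dl_b->lock);
	if (cpus)
		/* Add the server's bandwidth to the new rd's accounting. */
		__dl_add(dl_b, dl_se->dl_bw, cpus);
	raw_spin_unlock(&dl_b->lock);
}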
So if rd changes during boot initialization, the correct dl_bw has to be
updated AFAICS. Also, if cpusets are used, the rd for a CPU may change.
cpuset changes are something that I still need to double check. Will do.
Sounds good, that would be good to verify.
So, I played a little bit with it and came up with a simple set of ops
that points out an issue (default Fedora Server install):
echo Y >/sys/kernel/debug/sched/verbose
echo +cpuset >/sys/fs/cgroup/cgroup.subtree_control
echo 0-7 > /sys/fs/cgroup/user.slice/cpuset.cpus
echo 6-7 > /sys/fs/cgroup/user.slice/cpuset.cpus.exclusive
echo root >/sys/fs/cgroup/user.slice/cpuset.cpus.partition
The domains are rebuilt correctly, but we end up with a null total_bw.
The conditional call above correctly takes care of adding back the
per-rq dl_server bandwidth when we pass from the single domain to the
two exclusive ones, but I noticed that we go through
partition_sched_domains_locked() twice for a single write of 'root',
and the second pass, since it's not actually destroying/rebuilding
anything, resets total_bw without adding the dl_server contribution
back.
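To make the failure mode concrete, a rough sketch (from memory, not the
exact upstream code) of what happens on that second pass when the old
and new domains match: the root domain is kept but its dl_bw is
cleared, and only real DL tasks are re-accounted afterwards, so the
per-rq fair server bandwidth is lost.

static void matched_domain_sketch(struct root_domain *rd)
{
	unsigned long flags;

	/* dl_clear_root_domain()-style reset of the kept rd. */
	raw_spin_lock_irqsave(&rd->dl_bw.lock, flags);
	rd->dl_bw.total_bw = 0;
	raw_spin_unlock_irqrestore(&rd->dl_bw.lock, flags);

	/*
	 * DL tasks get re-accounted via dl_add_task_root_domain(),
	 * but nothing adds rq->fair_server's bandwidth back here,
	 * so total_bw stays null.
	 */
}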
Now, I'm not completely sure why we need to go through
partition_sched_domains_locked() twice, as we have (it also looks like
a pattern from other call paths):

update_prstate()
  -> update_cpumasks_hier()
       -> rebuild_sched_domains_locked()   <- right at the end
  -> update_partition_sd_lb()
       -> rebuild_sched_domains_locked()   <- right after the above call
Removing the first call does indeed fix the issue and the domains look
OK, but I'm pretty sure I'm missing all sorts of details and corner
cases. Waiman (now Cc-ed), maybe you can help here with understanding
why the two back-to-back calls are needed?
Thanks for letting me know about this case.
I am aware that rebuild_sched_domains_locked() can be called more than
once. I have addressed the hotplug case, but it can happen in some
other corner cases as well. The problem with multiple
rebuild_sched_domains_locked() calls is that the intermediate ones may
run while the internal state is not yet consistent. I am going to work
on a fix for this by making sure that rebuild_sched_domains_locked()
is only called once.
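Roughly, the idea would be something along these lines (purely
illustrative; the flag and helper names below are made up): the
intermediate cpuset updates only mark that a rebuild is needed, and
the actual rebuild_sched_domains_locked() call happens once at the
end of the operation, when the cpuset state is consistent.

static bool sd_rebuild_pending;		/* hypothetical flag */

static void mark_sched_domains_dirty(void)
{
	sd_rebuild_pending = true;	/* defer, do not rebuild yet */
}

static void flush_sched_domains_rebuild(void)
{
	if (sd_rebuild_pending) {
		sd_rebuild_pending = false;
		rebuild_sched_domains_locked();	/* single rebuild */
	}
}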
I am working on a set of cpuset patches to eliminate redundant
rebuild_sched_domains_locked() calls. However, my cpuset test script
fails after the change due to the presence of test cases where the only
CPU in a 1-cpu partition is being offlined. So I sent out a
sched/deadline patch [1] to work around this particular corner case.
[1]
https://lore.kernel.org/lkml/20241108042924.520458-1-longman@xxxxxxxxxx/T/#u
Apparently, the null total_bw bug caused by multiple
rebuild_sched_domains_locked() calls masks this problem.
Anyway, I should be able to post the cpuset patch series next week after
further testing. Please review my sched/deadline patch to see if you are
OK with this minor change.
Cheers,
Longman